Python Split a String with at Least 2 Whitespaces

>>> import re
>>> text = '10DEUTSCH        GGS Neue Heide 25-27     Wahn-Heide   -1      -1'
>>> re.split(r'\s{2,}', text)
['10DEUTSCH', 'GGS Neue Heide 25-27', 'Wahn-Heide', '-1', '-1']

Where

  • \s matches any whitespace character: a space, \t, \n, \r, \f, \v and, for str patterns, other Unicode whitespace
  • {2,} is a repetition, meaning "2 or more"
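
One thing the pattern above does not handle on its own is padding at the start or end of the string: a leading or trailing run of two or more whitespace characters is itself a separator, so re.split returns an empty string on that side. A minimal sketch, assuming a made-up padded input (the variable names are illustrative, not from the original answer):

```python
import re

# Hypothetical input with leading and trailing padding (not from the original post).
padded = '   10DEUTSCH    GGS Neue Heide 25-27   Wahn-Heide  '

# Splitting directly leaves an empty string where each padded edge was.
raw_fields = re.split(r'\s{2,}', padded)

# Stripping first removes the padding, so only the real fields remain.
clean_fields = re.split(r'\s{2,}', padded.strip())

print(raw_fields)    # ['', '10DEUTSCH', 'GGS Neue Heide 25-27', 'Wahn-Heide', '']
print(clean_fields)  # ['10DEUTSCH', 'GGS Neue Heide 25-27', 'Wahn-Heide']
```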

Split on more than one space?

If you want to split on any whitespace, you can use str.split (mystr is the string being split):

mystr.split()

# ['IDNumber', 'Firstname', 'Lastname', 'GPA', 'Credits']

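To split only on runs of two or more spaces (note that a run with an odd number of spaces leaves a stray leading space on the next field):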

list(filter(None, mystr.split('  ')))

# ['IDNumber', 'Firstname Lastname', 'GPA', 'Credits']
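
The split('  ') approach consumes spaces two at a time, so its result depends on how many spaces each run contains. A short sketch of the difference, assuming a hypothetical header line whose runs are 5, 4 and 6 spaces long (the exact spacing in the original question is not shown here):

```python
import re

# Hypothetical header line; the runs of spaces have lengths 5, 4 and 6.
mystr = 'IDNumber     Firstname Lastname    GPA      Credits'

# split('  ') eats spaces in pairs, so the 5-space run leaves one space behind.
naive = list(filter(None, mystr.split('  ')))

# Stripping each piece (and dropping empty ones) cleans that up.
stripped = [part.strip() for part in mystr.split('  ') if part.strip()]

# A regex treats each whole run of two or more spaces as a single separator.
by_regex = re.split(r' {2,}', mystr.strip())

print(naive)     # ['IDNumber', ' Firstname Lastname', 'GPA', 'Credits']
print(stripped)  # ['IDNumber', 'Firstname Lastname', 'GPA', 'Credits']
print(by_regex)  # ['IDNumber', 'Firstname Lastname', 'GPA', 'Credits']
```

The regex version is usually the least surprising of the three, since it does not care whether a run is odd or even in length.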

Split a string with unknown number of spaces as separator in Python

If you don't pass any arguments to str.split(), it will treat runs of whitespace as a single separator:

>>> ' 1234    Q-24 2010-11-29         563   abc  a6G47er15'.split()
['1234', 'Q-24', '2010-11-29', '563', 'abc', 'a6G47er15']

Split string on whitespace in Python

The str.split() method without an argument splits on whitespace:

>>> "many   fancy word \nhello    \thi".split()
['many', 'fancy', 'word', 'hello', 'hi']

Split pandas dataframe string by multiple whitespaces

Try this regex:

(\s{4,})

  • \s matches a whitespace character
  • {4,} is a repetition, meaning "4 or more"
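
The parentheses in that pattern matter when it is passed to a split function. With the standard library's re.split, a capturing group causes the matched separators to be returned alongside the fields. The sketch below uses plain re rather than pandas, with a made-up row, to show the effect; whether a given pandas string method behaves the same way is worth checking against its documentation.

```python
import re

# Hypothetical row with columns separated by four or more spaces.
row = 'Alice Smith      3.7        120'

# With a capturing group, the whitespace separators are kept in the result.
with_group = re.split(r'(\s{4,})', row)

# Without the group, only the fields are returned.
without_group = re.split(r'\s{4,}', row)

print(with_group)     # fields interleaved with the runs of spaces
print(without_group)  # ['Alice Smith', '3.7', '120']
```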

Split vs Strip in Python to remove redundant white space

According to the documentation:

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.

This means the logic of strip() is already built into split() when no separator is given, so I think your teacher is wrong. (Note that this no longer holds if you pass a non-default separator.)
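
A quick way to check both halves of that statement, using a made-up padded string (the variable names are only for illustration):

```python
# Hypothetical input: 2 leading spaces, 3 spaces in the middle, 2 trailing spaces.
line = '  alpha   beta  '

# With no separator, strip() before split() changes nothing.
default_split = line.split()
stripped_then_split = line.strip().split()

# With an explicit separator, every single space is a boundary,
# so padding and repeated spaces produce empty strings.
explicit_sep = line.split(' ')
explicit_sep_stripped = line.strip().split(' ')

print(default_split)          # ['alpha', 'beta']
print(stripped_then_split)    # ['alpha', 'beta']
print(explicit_sep)           # ['', '', 'alpha', '', '', 'beta', '', '']
print(explicit_sep_stripped)  # ['alpha', '', '', 'beta']
```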

Python split string exactly on one space. if double space make word not word

Another regex solution: to split on only the left-most whitespace character of each gap, use \s? to match zero or one whitespace character, then capture any remaining whitespace together with the non-whitespace characters that follow.

One very important step: call rstrip on the input string before running the regex to remove all trailing whitespace; otherwise performance degrades badly on inputs that end in a long run of whitespace.

import re
words = "this is a  book and i  like it"
print(re.findall(r'\s?(\s*\S+)', words.rstrip()))
# => ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']

re.findall returns only the captured substrings, and since the pattern has a single capturing group, the result is a list of those captures.

Pattern details:

  • \s? - 1 or 0 (due to ? quantifier) whitespaces
  • (\s*\S+) - Capturing group #1 matching

    • \s* - zero or more (due to the * quantifier) whitespace
    • \S+ - 1 or more (due to + quantifier) non-whitespace symbols.
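
As a sanity check on the pattern, here is a small sketch that wraps it in a helper (split_on_first_space is a hypothetical name, not part of the original answer) and confirms that, for space-only input without leading whitespace, joining the tokens with a single space rebuilds the stripped sentence:

```python
import re

def split_on_first_space(text):
    """Split on the first whitespace of each gap; any extra whitespace
    stays attached to the front of the following word."""
    # rstrip() first, as advised above, so trailing whitespace is never scanned.
    return re.findall(r'\s?(\s*\S+)', text.rstrip())

# Hypothetical input: a double space, a triple space, and trailing spaces.
sentence = "this is a  book and i   like it  "
tokens = split_on_first_space(sentence)
print(tokens)
# ['this', 'is', 'a', ' book', 'and', 'i', '  like', 'it']

# Each gap lost exactly one space, so a single-space join restores it.
rebuilt = ' '.join(tokens)
print(rebuilt == sentence.rstrip())  # True
```

The round trip only holds under those assumptions: with tabs or leading whitespace, \s? would consume a character that a single-space join does not put back.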

