python split a string with at least 2 whitespaces
>>> import re
>>> text = '10DEUTSCH   GGS Neue Heide 25-27  Wahn-Heide  -1  -1'
>>> re.split(r'\s{2,}', text)
['10DEUTSCH', 'GGS Neue Heide 25-27', 'Wahn-Heide', '-1', '-1']
Where
\s matches any whitespace character, like \t \n \r \f \v and more
{2,} is a repetition, meaning "2 or more"
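One thing the pattern above does not handle is leading or trailing whitespace, which makes re.split return empty strings at the ends. A minimal sketch that strips first, assuming that behavior is wanted (the helper name split_on_wide_gaps and the padded sample row are made up for illustration):

```python
import re

def split_on_wide_gaps(line):
    # strip() first so leading/trailing whitespace does not produce
    # empty strings at either end of the result
    return re.split(r'\s{2,}', line.strip())

# Hypothetical row: fields separated by 2+ spaces, single spaces inside a field
row = '  10DEUTSCH   GGS Neue Heide 25-27  Wahn-Heide    -1  -1  '
print(split_on_wide_gaps(row))
# ['10DEUTSCH', 'GGS Neue Heide 25-27', 'Wahn-Heide', '-1', '-1']
```

Single spaces inside a field are left alone because {2,} needs at least two whitespace characters in a row.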
Split on more than one space?
If you want to split by any whitespace, you can use str.split:
mystr.split()
# ['IDNumber', 'Firstname', 'Lastname', 'GPA', 'Credits']
For two or more spaces:
list(filter(None, mystr.split('  ')))
# ['IDNumber', 'Firstname Lastname', 'GPA', 'Credits']
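A caveat with splitting on a literal two-space string: a run with an odd number of spaces leaves one stray space attached to the next field. The sketch below compares the three approaches on a made-up mystr (the spacing is an assumption, built explicitly so the run lengths are visible); a regex such as ' {2,}' avoids the stray space.

```python
import re

# Hypothetical input: runs of 3, 2 and 4 spaces between fields
mystr = 'IDNumber' + ' ' * 3 + 'Firstname Lastname' + ' ' * 2 + 'GPA' + ' ' * 4 + 'Credits'

any_whitespace = mystr.split()
two_space_literal = list(filter(None, mystr.split(' ' * 2)))
two_or_more = re.split(r' {2,}', mystr)

print(any_whitespace)
# ['IDNumber', 'Firstname', 'Lastname', 'GPA', 'Credits']
print(two_space_literal)
# ['IDNumber', ' Firstname Lastname', 'GPA', 'Credits']   <- note the leading space
print(two_or_more)
# ['IDNumber', 'Firstname Lastname', 'GPA', 'Credits']
```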
Split a string with unknown number of spaces as separator in Python
If you don't pass any arguments to str.split(), it will treat runs of whitespace as a single separator:
>>> ' 1234 Q-24 2010-11-29 563 abc a6G47er15'.split()
['1234', 'Q-24', '2010-11-29', '563', 'abc', 'a6G47er15']
Split string on whitespace in Python
The str.split() method without an argument splits on whitespace:
>>> "many fancy word \nhello \thi".split()
['many', 'fancy', 'word', 'hello', 'hi']
Split pandas dataframe string by multiple whitespaces
Try this regex:
(\s{4,})
\s matches a whitespace character
{4,} means at least 4 times
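Worth knowing before using that pattern as-is: with the parentheses, re.split also returns the separators themselves, because captured groups are included in the result. A small sketch using only the standard re module (the sample line is made up; in pandas the pattern would typically go to Series.str.split, which is not shown here):

```python
import re

# Hypothetical line: fields separated by runs of 4 and 6 spaces
line = 'alpha beta' + ' ' * 4 + 'gamma' + ' ' * 6 + 'delta epsilon'

with_group = re.split(r'(\s{4,})', line)     # capturing group: separators are kept
without_group = re.split(r'\s{4,}', line)    # no group: separators are dropped

print(with_group)
# ['alpha beta', '    ', 'gamma', '      ', 'delta epsilon']
print(without_group)
# ['alpha beta', 'gamma', 'delta epsilon']
```

If only the fields are wanted, dropping the parentheses is probably the simpler choice.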
Split vs Strip in Python to remove redundant white space
According to the documentation:
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
This means that the logic of strip() is already included in split(), so I think your teacher is wrong. (Note that this changes if you use a non-default separator.)
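A quick check of both claims, using made-up sample strings: with no argument, strip() before split() changes nothing, while with an explicit separator the surrounding whitespace and empty fields are kept.

```python
padded = '  alpha  beta   gamma  '

# Default separator: strip() is redundant
default_split = padded.split()
stripped_then_split = padded.strip().split()
print(default_split == stripped_then_split)   # True

# Explicit separator: whitespace around fields is preserved...
csv_like = ' a, b ,c '
raw_fields = csv_like.split(',')
print(raw_fields)                             # [' a', ' b ', 'c ']

# ...so each field has to be stripped separately
clean_fields = [field.strip() for field in raw_fields]
print(clean_fields)                           # ['a', 'b', 'c']

# Explicit separator also keeps empty strings for adjacent or edge separators
edge_fields = ',a,,b,'.split(',')
print(edge_fields)                            # ['', 'a', '', 'b', '']
```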
Python split string exactly on one space. if double space make word not word
Just another regex solution: if you need to split at the single left-most whitespace character of each run, use \s? to match one or zero whitespace characters, and then capture any remaining whitespace together with the subsequent non-whitespace characters.
One very important step: run rstrip on the input string before running the regex to remove all trailing whitespace, since otherwise performance will degrade greatly.
import re
words = "this is a  book and i  like it"
print(re.findall(r'\s?(\s*\S+)', words.rstrip()))
# => ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']
re.findall returns just the captured substrings, and since there is only one capturing group, the result is a list of those captures.
Details:
\s? - 1 or 0 (due to the ? quantifier) whitespace characters
(\s*\S+) - Capturing group #1 matching:
    \s* - zero or more (due to the * quantifier) whitespace characters
    \S+ - 1 or more (due to the + quantifier) non-whitespace characters.
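The same idea wrapped in a small helper, as a sketch only (the function name split_on_single_whitespace and the extra inputs are made up for illustration). It shows that only the first whitespace character of each run is consumed, and any extra whitespace stays attached to the following word:

```python
import re

def split_on_single_whitespace(text):
    # rstrip() first, as advised above, so the pattern never has to
    # backtrack through a long run of trailing whitespace
    return re.findall(r'\s?(\s*\S+)', text.rstrip())

print(split_on_single_whitespace("this is a  book and i  like it"))
# ['this', 'is', 'a', ' book', 'and', 'i', ' like', 'it']

# A run of three spaces: one is consumed, two stay with the next word
print(split_on_single_whitespace('a' + ' ' * 3 + 'b' + ' ' * 2))
# ['a', '  b']
```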