When splitting an empty string in Python, why does split() return an empty list while split('\n') returns ['']?
Question: I am using split('\n') to get lines in one string, and found that ''.split() returns an empty list, [], while ''.split('\n') returns [''].
The str.split() method has two algorithms. If no separator is given (or it is None), it splits on runs of whitespace. If a separator is given, it is treated as a single delimiter, and repeated runs are not collapsed. When splitting an empty string, the first mode (no argument) returns an empty list because nothing remains once the whitespace is eaten, so there are no values to put in the result list.
In contrast, the second mode (with an argument such as '\n') produces a first, empty field. Had you written '\n'.split('\n'), you would get two fields (one split gives you two halves).
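Running both modes on the same short inputs makes the difference concrete. This is only a small demonstration; the input strings are arbitrary.

```python
# Mode 1: no separator (sep=None). Runs of whitespace count as one
# separator, and leading/trailing whitespace yields no empty fields.
print(''.split())         # []
print('\n'.split())       # []

# Mode 2: explicit separator. Every occurrence is a cut, so even an
# empty string yields one (empty) field.
print(''.split('\n'))     # ['']
print('\n'.split('\n'))   # ['', '']
```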
Question: Is there any specific reason for such a difference?
The first mode is useful when data is aligned in columns with variable amounts of whitespace. For example:
>>> data = '''\
Shasta      California     14,200
McKinley    Alaska         20,300
Fuji        Japan          12,400
'''
>>> for line in data.splitlines():
...     print(line.split())
...
['Shasta', 'California', '14,200']
['McKinley', 'Alaska', '20,300']
['Fuji', 'Japan', '12,400']
The second mode is useful for delimited data such as CSV, where repeated commas denote empty fields. For example:
>>> data = '''\
Guido,BDFL,,Amsterdam
Barry,FLUFL,,USA
Tim,,,USA
'''
>>> for line in data.splitlines():
...     print(line.split(','))
...
['Guido', 'BDFL', '', 'Amsterdam']
['Barry', 'FLUFL', '', 'USA']
['Tim', '', '', 'USA']
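The contrast between the two modes is easiest to see on a single line with repeated, leading, and trailing spaces. The sample string below is made up for illustration.

```python
line = '  Shasta    California  '

# Whitespace mode collapses runs and ignores the edges.
print(line.split())      # ['Shasta', 'California']

# Delimiter mode treats every single space as its own cut,
# so 8 spaces produce 9 fields, most of them empty.
print(line.split(' '))   # ['', '', 'Shasta', '', '', '', 'California', '', '']
```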
Note that the number of result fields is one greater than the number of delimiters. Think of cutting a rope: if you make no cuts, you have one piece; one cut gives two pieces; two cuts give three pieces. And so it is with Python's str.split(delimiter) method:
>>> ''.split(',') # No cuts
['']
>>> ','.split(',') # One cut
['', '']
>>> ',,'.split(',') # Two cuts
['', '', '']
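The rope analogy amounts to the invariant len(s.split(sep)) == s.count(sep) + 1 whenever sep is given. The loop below spot-checks it on a few hand-picked strings; it is an illustration, not a proof.

```python
samples = ['', ',', ',,', 'a,b', 'a,,b,', 'no commas here']

for s in samples:
    fields = s.split(',')
    cuts = s.count(',')
    # One more piece than there are cuts, every time.
    assert len(fields) == cuts + 1, (s, fields)
    print(repr(s), '->', fields)
```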
Question: And is there any more convenient way to count lines in a string?
Yes, there are a couple of easy ways. One uses str.count() and the other uses str.splitlines(). Both give the same answer unless the final line is missing its trailing \n; in that case, the str.splitlines() approach gives the accurate answer. A faster technique that is also accurate uses the count method but then corrects it for the missing final newline:
>>> data = '''\
Line 1
Line 2
Line 3
Line 4'''
>>> data.count('\n') # Inaccurate
3
>>> len(data.splitlines()) # Accurate, but slow
4
>>> data.count('\n') + (not data.endswith('\n')) # Accurate and fast
4
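One caveat with the one-liner: an empty string does not end with '\n', so it reports 1 line where splitlines() reports 0. A minimal sketch that guards that case follows; count_lines is a hypothetical helper name, and it assumes '\n' is the only line terminator (str.splitlines() also breaks on '\r', '\r\n', '\x0b', '\u2028' and several other characters, so the two can disagree on such text).

```python
def count_lines(text: str) -> int:
    # Hypothetical helper. Assumes '\n' is the only line terminator.
    if not text:
        # The one-liner would return 1 here, since '' does not end with '\n'.
        return 0
    # Each '\n' ends a line; add one more if the last line is unterminated.
    return text.count('\n') + (not text.endswith('\n'))


for sample in ['', 'one', 'one\n', 'a\nb', 'a\nb\n', '\n\n']:
    assert count_lines(sample) == len(sample.splitlines()), repr(sample)
```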
Question from @Kaz: Why the heck are two very different algorithms shoe-horned into a single function?
The signature for str.split is about 20 years old, and a number of the APIs from that era are strictly pragmatic. While not perfect, the method signature isn't "terrible" either. For the most part, Guido's API design choices have stood the test of time.
The current API is not without advantages. Consider strings such as:
ps_aux_header = 'USER PID %CPU %MEM VSZ'
patient_header = 'name,age,height,weight'
When asked to break these strings into fields, people tend to describe both using the same English word, "split". When asked to read code such as fields = line.split() or fields = line.split(','), people tend to correctly interpret the statements as "splits a line into fields".
Microsoft Excel's Text to Columns tool made a similar API choice and incorporates both splitting algorithms in the same tool. People seem to mentally model field-splitting as a single concept even though more than one algorithm is involved.
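Applied to the two sample headers, the same method name covers both cases; only the argument changes.

```python
ps_aux_header = 'USER PID %CPU %MEM VSZ'
patient_header = 'name,age,height,weight'

# Whitespace-aligned header: no argument.
print(ps_aux_header.split())      # ['USER', 'PID', '%CPU', '%MEM', 'VSZ']

# Comma-delimited header: explicit separator.
print(patient_header.split(','))  # ['name', 'age', 'height', 'weight']
```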
Why are empty strings returned in split() results?
str.split complements str.join, so "/".join(['', 'segment', 'segment', '']) gets you back the original string, "/segment/segment/". If the empty strings were not there, the first and last '/' would be missing after the join().
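The round trip can be checked directly. The sketch below (sample strings are arbitrary) shows that join() undoes split() for an explicit separator, and that dropping the empty strings breaks it.

```python
sep = '/'
for original in ['/segment/segment/', 'a//b', '', '/']:
    parts = original.split(sep)
    # join() puts exactly one separator between neighbouring parts,
    # so it rebuilds the original string, empty fields included.
    assert sep.join(parts) == original

parts = '/segment/segment/'.split('/')
print(parts)                              # ['', 'segment', 'segment', '']
print('/'.join(p for p in parts if p))    # segment/segment
```

The guarantee runs in the split-then-join direction only: join-then-split does not round-trip if an element already contains the separator, e.g. '/'.join(['a/b']).split('/') gives ['a', 'b'].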
python split empty string
Based on the Python documentation:
str.split([sep[, maxsplit]])
If sep is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, '1,,2'.split(',') returns ['1', '', '2']). The sep argument may consist of multiple characters (for example, '1<>2<>3'.split('<>') returns ['1', '2', '3']). Splitting an empty string with a specified separator returns [''].
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a None separator returns [].
For more explanation, read this answer too: https://stackoverflow.com/a/16645307/2867928
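The documented cases are easy to confirm in a script. Note that passing None explicitly selects the whitespace algorithm, exactly like omitting the argument.

```python
# Explicit separator: consecutive delimiters give empty strings.
assert '1,,2'.split(',') == ['1', '', '2']
# A multi-character separator is matched as a whole.
assert '1<>2<>3'.split('<>') == ['1', '2', '3']
# Empty string with an explicit separator: one empty field.
assert ''.split(',') == ['']

# sep omitted or None: whitespace runs, no empty strings at the edges.
assert ''.split() == [] and ''.split(None) == []
assert ' \t\n '.split() == []
assert '  1  2   3  '.split(None) == ['1', '2', '3']
print('all documented cases hold')
```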
Why does split on an empty string return a non-empty array?
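For the same reason that ",test" split on ',' and ",test," split on ',' both return an array of size 2 in Java, whose String.split discards trailing empty strings (in Python, ",test,".split(',') returns three fields, ['', 'test', '']). Everything before the first match is returned as the first element.
Why does splitting a string on itself return an empty slice with a length of two?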
As I understand it, the split function returns everything before the / (which is nothing) in the first item, and everything after the / (also nothing) in the second item. Hence, two empty strings. As for why you get empty strings at all, it's so that split() can basically be the opposite of join, as explained here:
Why are empty strings returned in split() results?
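Python behaves the same way when a string is split on itself: one cut, two pieces, both empty. The separators below are arbitrary examples.

```python
for s in ['/', ',', '<>']:
    parts = s.split(s)
    print(repr(s), '->', parts)
    # One cut yields two halves, and both are empty.
    assert parts == ['', '']
    # join() restores the original string from the two empty halves.
    assert s.join(parts) == s
```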
How to get rid of empty strings in Python when splitting a list?
readfile = ['Some name____2.0 2.1 1.3', 'Some other name_____2.2 3.4 1.1']
data = []
for line in readfile:
    # Drop the empty strings produced by runs of '_'
    first_split = [part for part in line.split('_') if part != '']
    # Keep the name, and split the space-separated numbers into a list
    data.append([first_split[0], first_split[1].split(' ')])
print(data)
I think this does what you wanted, if I understood you correctly. It prints out:
[['Some name', ['2.0', '2.1', '1.3']], ['Some other name', ['2.2', '3.4', '1.1']]]
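A shorter way to drop the empty strings is the built-in filter() with None as the function, which keeps only truthy items. Since '' is the only falsy str, this removes exactly the empty fields. A sketch that should produce the same output as the loop above:

```python
readfile = ['Some name____2.0 2.1 1.3', 'Some other name_____2.2 3.4 1.1']

data = []
for line in readfile:
    # filter(None, ...) discards falsy items, i.e. the '' fields.
    parts = list(filter(None, line.split('_')))
    data.append([parts[0], parts[1].split(' ')])

print(data)
```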