Split string on whitespace in Python
The str.split() method without an argument splits on whitespace:
>>> "many fancy word \nhello \thi".split()
['many', 'fancy', 'word', 'hello', 'hi']
Does Python's split function split by a newline or a whitespace by default?
If sep is not specified or is None, a different splitting algorithm is
applied: runs of consecutive whitespace are regarded as a single
separator and the result will contain no empty strings at the start
or end if the string has leading or trailing whitespace.
Tabs (\t), newlines (\n), spaces, etc. all count as whitespace characters, since they all serve the same purpose: to space things out.
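A quick way to check this is to compare the default call with an explicit ' ' separator. This is only a sketch; the sample string is made up for illustration:

```python
# Hypothetical sample mixing double spaces, a tab, and newlines.
text = "alpha  beta\tgamma\n\ndelta "

# No argument: any run of whitespace counts as one separator.
default_parts = text.split()

# Explicit ' ': splits at every single space and nothing else.
space_parts = text.split(" ")

print(default_parts)  # ['alpha', 'beta', 'gamma', 'delta']
print(space_parts)    # ['alpha', '', 'beta\tgamma\n\ndelta', '']
```

With the explicit separator, the tab and newlines stay inside the third item, and the doubled and trailing spaces produce empty strings.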
Split vs Strip in Python to remove redundant white space
According to the documentation:
If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.
This means that the logic of strip() is already included in split(), so I think your teacher is wrong. (Note that this no longer holds if you use a non-default separator.)
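Both halves of that claim can be checked directly. A small sketch, with made-up inputs:

```python
padded = "  spam   eggs  "  # hypothetical input with extra whitespace

# Default separator: stripping first changes nothing.
same = padded.split() == padded.strip().split()

# Explicit separator: leading/trailing separators yield empty strings,
# so stripping them first does change the result.
csv_like = ",a,b,"
without_strip = csv_like.split(",")
with_strip = csv_like.strip(",").split(",")

print(same)           # True
print(without_strip)  # ['', 'a', 'b', '']
print(with_strip)     # ['a', 'b']
```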
split string by arbitrary number of white spaces
Just use my_str.split() without ' '.
You can also limit how many splits are performed by passing a second argument (maxsplit):
>>> ' 1 2 3 4 '.split(None, 2)
['1', '2', '3 4 ']
>>> ' 1 2 3 4 '.split(None, 1)
['1', '2 3 4 ']
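In Python 3 the same thing can be written with the maxsplit keyword, which is handy for peeling a key off the front of a line. A sketch with a made-up record:

```python
record = "name    John Smith"  # hypothetical "key value" line

# Split once on the first run of whitespace; the rest stays intact.
key, value = record.split(maxsplit=1)

print(key)    # name
print(value)  # John Smith
```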
Python: Split a string, including the whitespace character
You can pass the string to list():
>>> list('the string')
['t', 'h', 'e', ' ', 's', 't', 'r', 'i', 'n', 'g']
Split string into a list on whitespace, excluding single spaces when the next character is not a dash
A less haphazard approach would be to interpret the headers on the first line as column indicators, and split on those widths.
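import sys
import re

def col_widths(s):
    # Shamelessly adapted from https://stackoverflow.com/a/33090071/874188
    # Each column is a header word plus the whitespace that follows it.
    cols = re.findall(r'\S+\s+', s)
    return [len(col) for col in cols]

# The first line of input holds the headers that define the column widths.
widths = col_widths(next(sys.stdin))
for line in sys.stdin:
    line = line.rstrip('\n')
    fields = []
    # Slice off every column except the last by its header width.
    for col_max in widths[:-1]:
        fields.append(line[0:col_max].strip())
        line = line[col_max:]
    # Whatever remains is the last column.
    fields.append(line)
    print(fields)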
Demo: https://ideone.com/ASANjn
This seems to provide a better interpretation of e.g. the LDate column, where the dates are sometimes padded with more than one space. The penultimate column preserves the final dash as part of the column value; this seems more consistent with the apparent intent of the author of the original table, though you could split the dash off from that column separately if that's not to your liking.
If you don't want to read sys.stdin, just wrap this in with open(filename) as handle: and replace sys.stdin with handle everywhere.
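As a rough sketch of that handle-based variant, the loop can be moved into a generator that accepts any open file. The function name split_fixed_width and the sample table are made up for illustration, and io.StringIO stands in for a real file:

```python
import io
import re

def col_widths(header):
    # Each column is a header word plus the whitespace that follows it.
    return [len(col) for col in re.findall(r'\S+\s+', header)]

def split_fixed_width(handle):
    """Yield one list of fields per data line, using the header for widths."""
    widths = col_widths(next(handle))
    for line in handle:
        line = line.rstrip('\n')
        fields = []
        for width in widths[:-1]:
            fields.append(line[:width].strip())
            line = line[width:]
        fields.append(line)  # the remainder is the last column
        yield fields

# Hypothetical table; ljust() pads each cell to its column width.
sample = io.StringIO(
    "Name".ljust(7) + "LDate".ljust(12) + "Note\n"
    + "abc".ljust(7) + "2020-01-01".ljust(12) + "first -\n"
    + "de".ljust(7) + "2020-02-03".ljust(12) + "second\n"
)
rows = list(split_fixed_width(sample))
print(rows)
# [['abc', '2020-01-01', 'first -'], ['de', '2020-02-03', 'second']]
```

For a real file, the call would be: with open(filename) as handle: rows = list(split_fixed_width(handle)).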
python split string on whitespace
I see that you sometimes have several \t characters in a row. I'd use the re module to split correctly:
import re

for line in lines:
    # Treat any run of consecutive tabs as a single separator.
    linedata = re.split(r'\t+', line)
    print(",".join(linedata))
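To see why the + matters, here is a self-contained sketch with made-up lines, comparing the regex against a plain split('\t'):

```python
import re

# Hypothetical input: fields separated by one or more tabs.
lines = ["a\t\tb\tc", "1\t2\t\t\t3"]

# Runs of tabs collapse into one separator.
rows = [",".join(re.split(r'\t+', line)) for line in lines]

# A plain split on a single tab leaves empty fields behind.
naive = lines[0].split("\t")

print(rows)   # ['a,b,c', '1,2,3']
print(naive)  # ['a', '', 'b', 'c']
```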
How can I split a string by whitespace and underscore?
You want the split function in the re package:
>>> import re
>>> mystring = "Tom Dough__________8.5 7.5 9.5"
>>> re.split(' |_', mystring)
['Tom', 'Dough', '', '', '', '', '', '', '', '', '', '8.5', '7.5', '9.5']
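If the empty strings are unwanted, one option (a variation, not part of the answer above) is to match runs of separators with a character class and +:

```python
import re

mystring = "Tom Dough__________8.5 7.5 9.5"

# [ _]+ treats any run of spaces and/or underscores as one separator.
parts = re.split(r'[ _]+', mystring)

print(parts)  # ['Tom', 'Dough', '8.5', '7.5', '9.5']
```

Note that a separator at the very start or end of the string would still produce an empty string at that end.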
Preserve whitespaces when using split() and join() in python
You want to use re.split() in that case, with a group:
re.split(r'(\s+)', line)
This returns both the columns and the whitespace, so you can rejoin the line later with the same amount of whitespace.
Example:
>>> re.split(r'(\s+)', line)
['BBP1', ' ', '0.000000', ' ', '-0.150000', ' ', '2.033000', ' ', '0.00', ' ', '-0.150', ' ', '1.77']
You probably do want to remove the newline from the end.
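A sketch of the full round trip, assuming a made-up line in the same style: strip the newline, split with the group, edit one column, and join everything back together.

```python
import re

# Hypothetical line with uneven spacing between columns.
line = "BBP1   0.000000  -0.150000    2.033000\n"

# Fields land at even indices, the whitespace between them at odd indices.
tokens = re.split(r'(\s+)', line.rstrip('\n'))

# Change one column, then rejoin with the original spacing untouched.
tokens[2] = "1.000000"
rebuilt = ''.join(tokens)

print(tokens[0::2])  # ['BBP1', '1.000000', '-0.150000', '2.033000']
print(rebuilt)       # BBP1   1.000000  -0.150000    2.033000
```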