Split String on Whitespace in Python

Split string on whitespace in Python

The str.split() method without an argument splits on whitespace:

>>> "many   fancy word \nhello    \thi".split()
['many', 'fancy', 'word', 'hello', 'hi']

Does Python's split function split by a newline or a whitespace by default?

If sep is not specified or is None, a different splitting algorithm is
applied: runs of consecutive whitespace are regarded as a single
separator and the result will contain no empty strings at the start
or end if the string has leading or trailing whitespace.

Tabs (\t), newlines (\n), spaces, and so on all count as whitespace characters, because they all serve the same purpose: spacing things out.
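A quick way to check this is to split a string that mixes several whitespace characters. This is a small sketch; the sample string is made up for illustration:

```python
# Sample string (made up for illustration) mixing space, tab, newline,
# carriage return, vertical tab and form feed.
sample = "a b\tc\nd\r\ne\x0bf\x0cg"

# With no separator, every run of whitespace acts as one delimiter.
parts = sample.split()
print(parts)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# str.isspace() reports whether a character counts as whitespace.
all_space = all(ch.isspace() for ch in " \t\n\r\x0b\x0c")
print(all_space)  # True
```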

Split vs Strip in Python to remove redundant white space

According to the documentation:

If sep is not specified or is None, a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace.

This means the logic of strip() is already included in split(), so I think your teacher is wrong. (Note that this changes if you use a non-default separator.)
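The difference can be seen in a short sketch (the sample text is made up for illustration):

```python
text = "  hello   world  "

# With the default separator, stripping first makes no difference.
default_parts = text.split()
stripped_parts = text.strip().split()
print(default_parts)   # ['hello', 'world']
print(stripped_parts)  # ['hello', 'world']

# With an explicit separator, leading, trailing and repeated separators
# all produce empty strings, so strip() is no longer redundant.
explicit_parts = text.split(" ")
explicit_stripped = text.strip().split(" ")
print(explicit_parts)     # ['', '', 'hello', '', '', 'world', '', '']
print(explicit_stripped)  # ['hello', '', '', 'world']
```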

split string by arbitrary number of white spaces

Just call my_str.split() with no argument, rather than my_str.split(' ').


You can also limit how many splits are performed by passing a second parameter (maxsplit):

>>> ' 1 2 3 4  '.split(None, 2)
['1', '2', '3 4  ']
>>> ' 1 2 3 4 '.split(None, 1)
['1', '2 3 4 ']
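In Python 3 the same limit can be given by keyword, and rsplit() applies it from the right-hand side. A brief sketch, reusing the string from the second example:

```python
s = " 1 2 3 4 "

# maxsplit can be passed by keyword, so None need not be spelled out.
left = s.split(maxsplit=1)
print(left)   # ['1', '2 3 4 ']

# rsplit() counts splits from the right; the unsplit remainder keeps
# its leading whitespace.
right = s.rsplit(maxsplit=1)
print(right)  # [' 1 2 3', '4']
```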

Python: Split a string, including the whitespace character

You can pass the string to list(), which splits it into individual characters, whitespace included:

>>> list('the string')
['t', 'h', 'e', ' ', 's', 't', 'r', 'i', 'n', 'g']

Split string into a list on whitespace, excluding single spaces when the next character is not a dash

A less haphazard approach would be to interpret the headers on the first line as column indicators, and split on those widths.

import sys
import re

def col_widths(s):
    # Shamelessly adapted from https://stackoverflow.com/a/33090071/874188
    cols = re.findall(r'\S+\s+', s)
    return [len(col) for col in cols]

widths = col_widths(next(sys.stdin))

for line in sys.stdin:
    line = line.rstrip('\n')
    fields = []
    for col_max in widths[:-1]:
        fields.append(line[0:col_max].strip())
        line = line[col_max:]
    fields.append(line)
    print(fields)

Demo: https://ideone.com/ASANjn

This seems to provide a better interpretation of, e.g., the LDate column, where the dates are sometimes padded with more than one space. The penultimate column preserves the final dash as part of the column value; this seems more consistent with the apparent intent of the author of the original table, though you could split the dash off from that specific column separately if that's not to your liking.

If you don't want to read sys.stdin, just wrap this in with open(filename) as handle: and replace sys.stdin with handle everywhere.
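One way to make that swap easier is to wrap the same logic in a generator that accepts any iterable of lines. This is only a sketch: the name split_fixed_width and the sample table are hypothetical, not part of the original answer or question.

```python
import re

def col_widths(header):
    # Width of each column = header word plus the whitespace after it.
    return [len(col) for col in re.findall(r'\S+\s+', header)]

def split_fixed_width(lines):
    """Yield one list of fields per data line; the first line is the header."""
    lines = iter(lines)
    widths = col_widths(next(lines))
    for line in lines:
        line = line.rstrip('\n')
        fields = []
        for width in widths[:-1]:
            fields.append(line[:width].strip())
            line = line[width:]
        fields.append(line)  # whatever is left is the last column
        yield fields

# Hypothetical sample table, padded to widths 7 and 12 for illustration.
sample = [
    f"{'Name':<7}{'LDate':<12}Qty\n",
    f"{'foo':<7}{'2020-01-01':<12}3\n",
    f"{'bar b':<7}{'2020-01-02':<12}10\n",
]
rows = list(split_fixed_width(sample))
print(rows)  # [['foo', '2020-01-01', '3'], ['bar b', '2020-01-02', '10']]
```

With a file, the call would be split_fixed_width(handle) inside a with open(filename) as handle: block.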

python split string on whitespace

I see that you have several \t sometimes. I'd use the re module to split correctly:

import re

# lines is assumed to be an iterable of strings, e.g. an open file
for line in lines:
    linedata = re.split(r'\t+', line)
    print(",".join(linedata))

How can I split a string by whitespace and underscore?

You want the split function in the re package:

>>> import re
>>> mystring = "Tom Dough__________8.5 7.5 9.5"
>>> re.split(' |_', mystring)
['Tom', 'Dough', '', '', '', '', '', '', '', '', '', '8.5', '7.5', '9.5']
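If the empty strings are unwanted, one option (a variation, not part of the original answer) is to match runs of separators with a character class and +:

```python
import re

mystring = "Tom Dough__________8.5 7.5 9.5"

# [ _]+ treats any run of spaces and underscores as a single separator,
# so consecutive underscores no longer produce empty strings.
fields = re.split(r'[ _]+', mystring)
print(fields)  # ['Tom', 'Dough', '8.5', '7.5', '9.5']
```

Note that a separator at the very start or end of the string would still yield an empty string there.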

Preserve whitespaces when using split() and join() in python

You want to use re.split() in that case, with a group:

re.split(r'(\s+)', line)

would return both the columns and the whitespace so you can rejoin the line later with the same amount of whitespace included.

Example:

>>> re.split(r'(\s+)', line)
['BBP1', ' ', '0.000000', ' ', '-0.150000', ' ', '2.033000', ' ', '0.00', ' ', '-0.150', ' ', '1.77']

You probably do want to remove the newline from the end.
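A round trip might look like the following sketch. The input line is hypothetical, modeled on the output above, and the edited column is chosen arbitrarily:

```python
import re

# Hypothetical input line with uneven spacing between columns.
line = "BBP1   0.000000  -0.150000    2.033000\n"

# Strip the newline first so it does not end up as a trailing token.
parts = re.split(r'(\s+)', line.rstrip('\n'))

# Even indexes hold the columns, odd indexes hold the whitespace runs.
columns = parts[0::2]
print(columns)  # ['BBP1', '0.000000', '-0.150000', '2.033000']

# Change one column, then rejoin with the original spacing intact.
parts[2] = "1.000000"
rebuilt = ''.join(parts)
print(rebuilt)
```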


