How to Read File with Space Separated Values in Pandas

How to read file with space separated values in pandas

Add the delim_whitespace=True argument; it is faster than a general regex separator.

Read a .txt file into a pandas DataFrame when the number of spaces separating the values varies

You can use regex as the delimiter:

pd.read_csv(data_file, header=None, delimiter=r"\s+", names='Col_a Col_b Col_c'.split(' '))

Or you can use the delim_whitespace=True argument, which is faster than a general regex separator (it is deprecated since pandas 2.2 in favor of sep=r"\s+"):

pd.read_csv(data_file, header=None, delim_whitespace=True, names='Col_a Col_b Col_c'.split(' '))
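As a minimal, self-contained sketch of the regex-separator approach, the snippet below reads an in-memory sample instead of a file. The sample text and the use of io.StringIO are assumptions made for illustration; the column names are the ones used above.

```python
import io

import pandas as pd

# Hypothetical sample: fields separated by a varying mix of spaces and tabs.
sample = "1 2   3\n4\t5  6\n"

df = pd.read_csv(
    io.StringIO(sample),   # stands in for data_file
    header=None,           # the sample has no header row
    sep=r"\s+",            # one or more whitespace characters
    names=["Col_a", "Col_b", "Col_c"],
)
print(df)
```

With a real file, replacing io.StringIO(sample) with the file path should behave the same way.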

Reference: How to read file with space separated values in pandas

python pandas reading space separated data

You can read each line into a single column by choosing a separator that does not occur in the text, such as |, and then create the new columns with Series.str.split, passing the n parameter and no separator, because whitespace is the default separator:

data = pd.read_csv("twitter_file_path.txt", sep="|", names=['data'])
print (data)
                                                data
0  702377236289228800 2016-02-24 09:19:17 +03 <Aa...

data = data['data'].str.split(n=5, expand=True)
data.columns = ["seq", "date", "Hour", "GMT","userID","text"]
print (data)
                  seq        date      Hour  GMT            userID  \
0  702377236289228800  2016-02-24  09:19:17  +03  <Aadil_Siddiqui>   

                                                text  
0  #HECRanking Rs71 Bil bdget alloctd 2 HEC is no...
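The same two-step idea can be tried without the original file. The sketch below is only an illustration: the input line is a made-up record with the same six-field layout, and io.StringIO stands in for the text file.

```python
import io

import pandas as pd

# Hypothetical line with the same layout: five whitespace-separated fields,
# followed by free text that itself contains spaces.
raw = "1001 2016-02-24 09:19:17 +03 <some_user> a short message with spaces\n"

# Step 1: read every line into one column by using a separator ("|")
# that is assumed not to occur anywhere in the text.
lines = pd.read_csv(io.StringIO(raw), sep="|", names=["data"])

# Step 2: split on whitespace at most 5 times, so the sixth piece keeps
# the rest of the line, internal spaces included.
parts = lines["data"].str.split(n=5, expand=True)
parts.columns = ["seq", "date", "Hour", "GMT", "userID", "text"]
print(parts)
```

Limiting the split with n=5 is what keeps the message text in one column. Note that every resulting column holds strings, so numeric or date columns would still need converting afterwards.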

Read Space-separated Data with Pandas

Your original line:

pd.read_csv(filename, sep=' ',header=None)

specified the separator as a single space. Because your files can contain runs of spaces or tabs, you can pass a regular expression to the sep parameter instead:

pd.read_csv(filename, sep=r'\s+', header=None)

This defines the separator as one or more whitespace characters, so any run of spaces or tabs counts as a single delimiter.
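To see why the single-space separator misbehaves, the following sketch compares the two settings on a small in-memory sample. The sample data and variable names are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample: two values per line, separated by TWO spaces.
raw = "1  2\n3  4\n"

# A single-space separator treats each space as its own delimiter,
# so the gap between the values produces an extra, empty column.
single = pd.read_csv(io.StringIO(raw), sep=" ", header=None)

# A whitespace regex collapses the whole run into one delimiter.
collapsed = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

print(single.shape)     # three columns, the middle one all NaN
print(collapsed.shape)  # two columns
```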

Reading string data separated by spaces in Pandas

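Use a regex lookbehind and lookahead as the separator, sep=r"(?<=\w) (?=\d)", so that the line is split only at a space that follows a word character and precedes a digit.

Example: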

import pandas as pd

# A regex separator like this one needs the Python parsing engine.
df = pd.read_csv(filename, sep=r"(?<=\w) (?=\d)", names=["Name", "Fraction"], engine="python")
print(df)

Output:

                                          Name  Fraction
0  Balkrishna Industries Ltd. Auto Ancillaries      3.54
1        Aurobindo Pharma Ltd. Pharmaceuticals      3.36
2              NIIT Technologies Ltd. Software      3.31
3                Sonata Software Ltd. Software      3.21
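For a version that runs without the input file, the sketch below rebuilds two input lines from the output shown above. The exact layout of the original file is an assumption here, as is the use of io.StringIO.

```python
import io

import pandas as pd

# Input lines reconstructed from the printed output (assumed layout:
# a free-text name containing spaces, one space, then a number).
raw = (
    "Balkrishna Industries Ltd. Auto Ancillaries 3.54\n"
    "Aurobindo Pharma Ltd. Pharmaceuticals 3.36\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    sep=r"(?<=\w) (?=\d)",   # split only at a space between a word char and a digit
    names=["Name", "Fraction"],
    engine="python",         # regex separators require the Python engine
)
print(df)
```

This pattern relies on the name never containing a word character followed by a space and then a digit. A name such as "Company 3M" would be split early, so the approach is worth checking against the actual data.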
