How to read file with space separated values in pandas
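Add the delim_whitespace=True argument, which treats any run of spaces or tabs as a single delimiter. It behaves the same as sep=r"\s+", and since pandas 2.2 delim_whitespace is deprecated in favor of that spelling.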
Read a .txt file into a pandas DataFrame when the values are separated by a varying number of spaces
You can use regex as the delimiter:
pd.read_csv(data_file, header=None, delimiter=r"\s+", names='Col_a Col_b Col_c'.split(' '))
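Or you can use the delim_whitespace=True argument. The C engine treats it the same as sep=r"\s+", so it is not actually faster than the regex above, and it is deprecated since pandas 2.2: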
pd.read_csv(data_file, header=None, delim_whitespace=True, names='Col_a Col_b Col_c'.split(' '))
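A self-contained sketch of the regex approach, assuming the data arrives as text with a varying number of spaces between values. The sample rows are made up, and io.StringIO stands in for data_file:

```python
import io

import pandas as pd

# Hypothetical sample: values separated by one or more spaces
raw = "1   2 3\n4 5    6\n7  8  9\n"

# r"\s+" treats any run of whitespace as a single delimiter
df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    sep=r"\s+",
    names=["Col_a", "Col_b", "Col_c"],
)
print(df)
```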
Reference: How to read file with space separated values in pandas
python pandas reading space separated data
You can read all the data into one column by using a separator that does not occur in the text, like |, and then create the new columns with Series.str.split using the n parameter and no separator, because whitespace is the default separator:
data = pd.read_csv("twitter_file_path.txt", sep="|", names=['data'])
print(data)
data
0 702377236289228800 2016-02-24 09:19:17 +03 <Aa...
data = data['data'].str.split(n=5, expand=True)
data.columns = ["seq", "date", "Hour", "GMT","userID","text"]
print(data)
seq date Hour GMT userID \
0 702377236289228800 2016-02-24 09:19:17 +03 <Aadil_Siddiqui>
text
0 #HECRanking Rs71 Bil bdget alloctd 2 HEC is no...
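A runnable sketch of the same two-step idea, assuming tweet-like lines where the first five fields never contain spaces and the tweet text does. The sample lines are hypothetical, and io.StringIO stands in for "twitter_file_path.txt":

```python
import io

import pandas as pd

# Hypothetical tweet-like lines: five single-token fields, then free text with spaces
raw = (
    "1 2016-02-24 09:19:17 +03 <user_a> hello world from pandas\n"
    "2 2016-02-24 09:20:05 +03 <user_b> another tweet with spaces\n"
)

# "|" never occurs in the text, so each whole line lands in a single column
data = pd.read_csv(io.StringIO(raw), sep="|", names=["data"])

# Split on whitespace at most 5 times, so the rest of the line stays together as "text"
data = data["data"].str.split(n=5, expand=True)
data.columns = ["seq", "date", "Hour", "GMT", "userID", "text"]
print(data)
```

Capping the splits with n=5 is what keeps the spaces inside the tweet text intact; without it, every word would become its own column.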
Read Space-separated Data with Pandas
Your original line:
pd.read_csv(filename, sep=' ',header=None)
was specifying the separator as a single space. Because your CSVs can have spaces or tabs, you can pass a regular expression to the sep parameter like so:
pd.read_csv(filename, sep=r'\s+', header=None)
This defines the separator as one or more whitespace characters (spaces or tabs). The raw string r'...' keeps Python from treating \s as a string escape. There is a handy cheatsheet that lists regular expressions.
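A minimal sketch with hypothetical input that mixes tabs and runs of spaces, using io.StringIO in place of filename:

```python
import io

import pandas as pd

# Hypothetical input mixing tabs and runs of spaces between values
raw = "1\t2 3\n4  5\t\t6\n"

# sep=" " would not split on tabs and would yield empty fields for double spaces;
# r"\s+" treats any run of spaces/tabs as one delimiter
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)
print(df)
```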
Reading string data separated by spaces in Pandas
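Use a regex lookbehind and lookahead as the separator, sep=r"(?<=\w) (?=\d)", so that only a space preceded by a word character and followed by a digit counts as a delimiter. Spaces inside the name are left alone.
Example: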
import pandas as pd
df = pd.read_csv(filename, sep=r"(?<=\w) (?=\d)", names=["Name", "Fraction"], engine="python")
print(df)
Output:
Name Fraction
0 Balkrishna Industries Ltd. Auto Ancillaries 3.54
1 Aurobindo Pharma Ltd. Pharmaceuticals 3.36
2 NIIT Technologies Ltd. Software 3.31
3 Sonata Software Ltd. Software 3.21
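A self-contained version of the lookaround example, assuming the input lines look like the printed rows above (the two lines here are reconstructed from that output; io.StringIO replaces filename). Regex separators are only supported by the Python parsing engine, so it is requested explicitly:

```python
import io

import pandas as pd

# Assumed input, reconstructed from the printed output above
raw = (
    "Balkrishna Industries Ltd. Auto Ancillaries 3.54\n"
    "Aurobindo Pharma Ltd. Pharmaceuticals 3.36\n"
)

# Split only on a space that follows a word character and precedes a digit,
# i.e. the space before the trailing number
df = pd.read_csv(
    io.StringIO(raw),
    sep=r"(?<=\w) (?=\d)",
    names=["Name", "Fraction"],
    engine="python",
)
print(df)
```

This only works while no company name contains a space followed by a digit; such a name would be split as well.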
Reading txt file with more than one space as a delimiter in Python
Another solution, using pandas:
import pandas as pd
df = pd.read_csv("your_file.txt", sep=r"\s{2,}", engine="python", header=None)
print(df)
Prints:
0 1 2 3
0 aaaxx 123 A xyz 456 BB
1 zcbb a b XYZ xtz 1
2 cdddtr a 111 tddw
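A self-contained sketch of the two-or-more-spaces separator, using hypothetical rows where single spaces occur inside fields and io.StringIO stands in for "your_file.txt":

```python
import io

import pandas as pd

# Hypothetical rows: fields are separated by two or more spaces,
# while single spaces inside a field are kept
raw = (
    "John Smith  42   New York\n"
    "Jane Doe    37  San Francisco\n"
)

df = pd.read_csv(io.StringIO(raw), sep=r"\s{2,}", engine="python", header=None)
print(df)
```

The approach assumes the file never uses a single space between two fields; where it does, those fields end up merged.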