Pandas: Error could possibly be due to quotes being ignored when a multi-char delimiter is used
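We can work around this by setting our separator to "3 or more whitespace characters" (a raw string keeps the backslash intact, and naming the Python engine avoids the fallback warning):
df = pd.read_csv(text, sep=r"\s{3,}", header=None, engine="python")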
print(df)
                   0                                        1       2    3    4
0  option19971675181                     ACHILLE BLA BLA BLA1  blabla   88  498
1  option19971675182                    ACHILLE BLA BLA BLA 1  blabla  176  498
2  option19971675183                     ACHILLE BLA BLA BLA1  blabla  191  498
3  option19971675184                     ACHILLE BLA BLA BLA1  blabla  521  498
4  option19971675185                     ACHILLE BLA BLA BLA1  blabla  919  498
5  option19971675186             ACHILLE BLA BLA BLA134234531  blabla   10  498
6  option19971675187        ACHILLE BLA BLA BLA134234531 7 65  blabla    0    0
7  option19971675188         ACHILLE BLA BLA BLA1342 90345 31  blabla    0    0
8  option19971675189  ACHILLE BLA BLA BLA 134 23N 094 87OP531  blabla    0    0
9  option19971675190  ACHILLE BLA BLA BLA 134 23N 094 87 OP53  blabla    0    0
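For anyone who wants to try this without the original file, here is a minimal sketch. The sample rows are made up for illustration (they are not the OP's data); the only assumption is that real fields are separated by at least three spaces, while single spaces may occur inside a field.

```python
import io

import pandas as pd

# Hypothetical sample: fields are separated by runs of 3+ spaces,
# while single spaces occur inside the second field.
sample = (
    "opt1   ACHILLE BLA BLA1    blabla   88   498\n"
    "opt2   ACHILLE BLA BLA 1   blabla   176   498\n"
)

# A regex separator is only supported by the Python parsing engine.
df = pd.read_csv(
    io.StringIO(sample),
    sep=r"\s{3,}",
    header=None,
    engine="python",
)

print(df.shape)
print(df[1].tolist())
```

The single spaces inside the second field survive because the pattern only matches runs of three or more whitespace characters.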
Note: although this solution works, your file looks more like a fixed-width format, so pd.read_fwf is the better fit:
df = pd.read_fwf(text, colspecs="infer", header=None)
print(df)
0 1 2 3
0 option19971675181 ACHILLE BLA BLA BLA1 blabla 88 498
1 option19971675182 ACHILLE BLA BLA BLA 1 blabla 176 498
2 option19971675183 ACHILLE BLA BLA BLA1 blabla 191 498
3 option19971675184 ACHILLE BLA BLA BLA1 blabla 521 498
4 option19971675185 ACHILLE BLA BLA BLA1 blabla 919 498
5 option19971675186 ACHILLE BLA BLA BLA134234531 blabla 10 498
6 option19971675187 ACHILLE BLA BLA BLA134234531 7 65 blabla 0 0
7 option19971675188 ACHILLE BLA BLA BLA1342 90345 31 blabla 0 0
8 option19971675189 ACHILLE BLA BLA BLA 134 23N 094 87OP531 blabla 0 0
9 option19971675190 ACHILLE BLA BLA BLA 134 23N 094 87 OP53 blabla 0 0
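One caveat with colspecs="infer": pandas guesses the column boundaries from positions that are blank in every sampled row, so a space that happens to line up inside a field can be taken for a boundary. If the widths are known, passing them explicitly is more predictable. The sketch below swaps in explicit colspecs instead of "infer"; the sample rows and the character offsets are assumptions made up for this example, not taken from the OP's file.

```python
import io

import pandas as pd

# Hypothetical fixed-width sample; every field starts at the same offset.
sample = (
    "opt1  ACHILLE BLA 1   blabla   88  498\n"
    "opt2  ACHILLE BLA12   blabla  176  498\n"
)

# Half-open (start, end) character offsets for each field (assumed layout).
colspecs = [(0, 4), (6, 19), (22, 28), (30, 33), (35, 38)]

df = pd.read_fwf(io.StringIO(sample), colspecs=colspecs, header=None)

print(df.shape)
print(df[1].tolist())
```

With "infer" on this particular sample, the space after ACHILLE is blank in both rows, so the second field would likely be split in two.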
Python pandas read_csv with custom separator
From the pandas.read_csv() documentation:
sep:str, default ‘,’ : Delimiter to use. ... In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine.
And | is a special character in regex syntax (it means OR), so it has to be escaped. Use a raw string so the backslashes reach the regex engine unchanged:
df = pd.read_csv('data_analyst_assignment.csv', sep=r'\|\|/', engine='python')
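If you would rather not escape the delimiter by hand, re.escape from the standard library can build the pattern for you. A small sketch, assuming the delimiter really is the literal three characters ||/ and using made-up sample data in place of data_analyst_assignment.csv:

```python
import io
import re

import pandas as pd

# Hypothetical sample using the literal three-character delimiter "||/".
sample = "id||/name||/score\n1||/alice||/3.5\n2||/bob||/4.0\n"

# Escaped by hand, as in the answer above.
df_manual = pd.read_csv(io.StringIO(sample), sep=r"\|\|/", engine="python")

# Escaped with re.escape, so no regex metacharacter is missed.
df_escaped = pd.read_csv(
    io.StringIO(sample),
    sep=re.escape("||/"),
    engine="python",
)

print(df_manual)
print(df_manual.equals(df_escaped))
```

Both calls should produce the same frame; re.escape is just the safer habit when the delimiter comes from a variable.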
pandas read_csv() for multiple delimiters
From this question, Handling Variable Number of Columns with Pandas - Python: one workaround for "pandas.errors.ParserError: Expected 29 fields in line 11, saw 45" is to tell read_csv in advance how many columns to expect.
my_cols = [str(i) for i in range(45)]  # create some column names
df_user_key_word_org = pd.read_csv(filepath + "user_key_word.txt",
                                   sep=r"\s+|;|:",  # whitespace, ';' or ':'
                                   names=my_cols,
                                   header=None,
                                   engine="python")
# I tested with s = StringIO(text_from_OP) on my computer
Hope this works.
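If you don't know the maximum number of fields up front, you can count it first with the same pattern. This is only a sketch on made-up data: sample and n_cols are names I introduced for illustration, and it assumes the file is small enough to scan twice.

```python
import io
import re

import pandas as pd

# Hypothetical ragged sample: lines have 3, 5 and 2 fields respectively.
sample = "a b;c\n1 2;3:4 5\nx;y\n"
pattern = r"\s+|;|:"  # whitespace, ';' or ':'

# First pass: find the widest line so every row fits.
n_cols = max(len(re.split(pattern, line.strip())) for line in sample.splitlines())
names = [str(i) for i in range(n_cols)]

# Second pass: short rows are padded with NaN on the right.
df = pd.read_csv(
    io.StringIO(sample),
    sep=pattern,
    names=names,
    header=None,
    engine="python",
)

print(n_cols)
print(df)
```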