pandas.errors.ParserError: Error could possibly be due to quotes being ignored when a multi-char delimiter is used

We can work around this by setting the separator to "three or more whitespace characters":

df = pd.read_csv(text, sep=r"\s{3,}", header=None, engine="python")
print(df)
                   0                                        1       2    3    4
0  option19971675181                     ACHILLE BLA BLA BLA1  blabla   88  498
1  option19971675182                    ACHILLE BLA BLA BLA 1  blabla  176  498
2  option19971675183                     ACHILLE BLA BLA BLA1  blabla  191  498
3  option19971675184                     ACHILLE BLA BLA BLA1  blabla  521  498
4  option19971675185                     ACHILLE BLA BLA BLA1  blabla  919  498
5  option19971675186             ACHILLE BLA BLA BLA134234531  blabla   10  498
6  option19971675187        ACHILLE BLA BLA BLA134234531 7 65  blabla    0    0
7  option19971675188         ACHILLE BLA BLA BLA1342 90345 31  blabla    0    0
8  option19971675189  ACHILLE BLA BLA BLA 134 23N 094 87OP531  blabla    0    0
9  option19971675190  ACHILLE BLA BLA BLA 134 23N 094 87 OP53  blabla    0    0
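
For anyone who wants to try the regex separator without the original file, here is a minimal sketch. The sample text in `raw` is made up for illustration (it is not the asker's data) and assumes fields are separated by at least three spaces while single spaces occur inside a field.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: three or more spaces separate fields,
# single spaces belong to the second field.
raw = (
    "option1   ACHILLE BLA BLA 1   blabla   88   498\n"
    "option2   ACHILLE BLA BLA   blabla   176   498\n"
)

# A regex separator requires the Python parsing engine; naming it
# explicitly avoids the fallback warning from the default C engine.
df = pd.read_csv(StringIO(raw), sep=r"\s{3,}", header=None, engine="python")
print(df)
```

The raw string prefix (`r"..."`) keeps Python from treating `\s` as a string escape sequence.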

Note: although this solution works, your file looks more like a fixed-width format, so pd.read_fwf is the better fit:

df = pd.read_fwf(text, colspecs="infer", header=None)
print(df)
                   0                                        1       2        3
0  option19971675181                     ACHILLE BLA BLA BLA1  blabla   88 498
1  option19971675182                    ACHILLE BLA BLA BLA 1  blabla  176 498
2  option19971675183                     ACHILLE BLA BLA BLA1  blabla  191 498
3  option19971675184                     ACHILLE BLA BLA BLA1  blabla  521 498
4  option19971675185                     ACHILLE BLA BLA BLA1  blabla  919 498
5  option19971675186             ACHILLE BLA BLA BLA134234531  blabla   10 498
6  option19971675187        ACHILLE BLA BLA BLA134234531 7 65  blabla      0 0
7  option19971675188         ACHILLE BLA BLA BLA1342 90345 31  blabla      0 0
8  option19971675189  ACHILLE BLA BLA BLA 134 23N 094 87OP531  blabla      0 0
9  option19971675190  ACHILLE BLA BLA BLA 134 23N 094 87 OP53  blabla      0 0
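
If the inferred column boundaries are not the ones you want, read_fwf also accepts explicit colspecs as a list of half-open (start, end) character offsets. The sketch below uses made-up rows and widths (6, 16 and 5 characters), which are assumptions for illustration and not the layout of the asker's file.

```python
from io import StringIO

import pandas as pd

# Hypothetical fixed-width rows: 6 chars, 16 chars, then a 5-char
# right-aligned number. These widths are illustrative only.
rows = [("opt1", "ACHILLE BLA 1", 88), ("opt2", "ACHILLE BLA", 176)]
raw = "".join(f"{a:<6}{b:<16}{c:>5}\n" for a, b, c in rows)

# Each tuple is a half-open interval [start, end) of character positions.
colspecs = [(0, 6), (6, 22), (22, 27)]

df = pd.read_fwf(StringIO(raw), colspecs=colspecs, header=None)
print(df)
```

Because the split is by position rather than by a delimiter, spaces inside a field never affect where one column ends and the next begins.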

Python pandas read_csv with custom separator

From the pandas.read_csv() documentation:

sep : str, default ',' : Delimiter to use. ... In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine.

Since | is a special character in regular expressions (it means OR), each | in the separator must be escaped with a backslash:

df = pd.read_csv('data_analyst_assignment.csv', sep=r'\|\|/', engine='python')
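
Instead of escaping by hand, you can let re.escape from the standard library build the pattern from the literal delimiter. A minimal sketch, assuming the delimiter is the literal string "||/" and using made-up sample data in place of the CSV file:

```python
import re
from io import StringIO

import pandas as pd

delimiter = "||/"               # literal multi-character delimiter
pattern = re.escape(delimiter)  # escapes regex metacharacters such as |

# Hypothetical sample content, standing in for the real file.
raw = "id||/name||/score\n1||/alice||/3.5\n2||/bob||/4.0\n"

df = pd.read_csv(StringIO(raw), sep=pattern, engine="python")
print(df)
```

This keeps the delimiter readable in the source and avoids forgetting an escape when the delimiter changes.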

pandas read_csv() for multiple delimiters

From this question, Handling Variable Number of Columns with Pandas - Python, one workaround for pandas.errors.ParserError: Expected 29 fields in line 11, saw 45. is to tell read_csv in advance how many columns to expect.

my_cols = [str(i) for i in range(45)]  # create some column names
df_user_key_word_org = pd.read_csv(filepath + "user_key_word.txt",
                                   sep=r"\s+|;|:",
                                   names=my_cols,
                                   header=None,
                                   engine="python")
# I tested with s = StringIO(text_from_OP) on my computer
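
If you do not know the widest row in advance, one option is to scan the text once with the same regex and size the column list from that. This is only a sketch on made-up data; the two-pass approach is my own suggestion, not something from the linked question.

```python
import re
from io import StringIO

import pandas as pd

sep = r"\s+|;|:"

# Hypothetical ragged input: the first row has 3 fields, the second has 5.
raw = "a b;c\nd e;f:g h\n"

# First pass: find the largest field count using the same separator regex.
max_fields = max(len(re.split(sep, line)) for line in raw.splitlines())
my_cols = [str(i) for i in range(max_fields)]

# Second pass: with enough names supplied, shorter rows are padded with NaN.
df = pd.read_csv(StringIO(raw), sep=sep, names=my_cols, header=None,
                 engine="python")
print(df)
```

For a large file the first pass means reading it twice, so a generous fixed upper bound (as in the snippet above) may be cheaper.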

Hope this works.


