Import text to pandas with multiple delimiters
One way might be to use the regex separators permitted by the python engine. For example:
>>> !cat castle.dat
c stuff
c more header
c begin data
1 1:.5
1 2:6.5
1 3:5.3
>>> import pandas as pd
>>> df = pd.read_csv('castle.dat', skiprows=3, names=['a', 'b', 'c'],
...                  sep=' |:', engine='python')
>>> df
   a  b    c
0  1  1  0.5
1  1  2  6.5
2  1  3  5.3
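To try this without the file on disk, here is a self-contained sketch that feeds the same castle.dat contents through io.StringIO (the in-memory buffer is the only assumption). The separator ' |:' is a regex alternation that splits on either a single space or a colon.

```python
import io

import pandas as pd

# In-memory stand-in for castle.dat, with the same lines as the cat output above.
castle_dat = io.StringIO(
    "c stuff\n"
    "c more header\n"
    "c begin data\n"
    "1 1:.5\n"
    "1 2:6.5\n"
    "1 3:5.3\n"
)

# skiprows=3 drops the three "c ..." header lines; ' |:' splits on a space OR a colon.
# Multi-character separators like this one are treated as regexes by the python engine.
df = pd.read_csv(castle_dat, skiprows=3, names=["a", "b", "c"],
                 sep=" |:", engine="python")
print(df)
```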
Import .txt to Pandas Dataframe With Multiple Delimiters
You can start by setting names on your existing columns, and then applying a regex to the data to create the new columns.
To fix the "single space delimiter" issue in your output, you can use "at least 2 whitespace characters", e.g. [\s]{2,}, as the delimiter, which keeps "St. Elf" intact in the City names.
An example:
import pandas as pd
import re

# Two or more whitespace characters separate fields, so single spaces
# inside a value (e.g. "St. Elf") are kept. Raw string avoids the invalid "\s" escape.
df = pd.read_csv(
    'test.txt',
    sep=r'[\s]{2,}',
    engine='python',
    header=None,
    index_col=False,
    names=[
        "FirstN", "LastN", "FULLSID", "TeacherData", "TeacherLastN"
    ]
)

# FULLSID: 9-digit SID, then a date such as 2008-12-15, then the city
sid_pattern = re.compile(r'(\d{9})(\d+-\d+-\d+)(.*)', re.IGNORECASE)
df['SID'] = df.apply(lambda row: sid_pattern.search(row.FULLSID).group(1), axis=1)
df['Birth'] = df.apply(lambda row: sid_pattern.search(row.FULLSID).group(2), axis=1)
df['City'] = df.apply(lambda row: sid_pattern.search(row.FULLSID).group(3), axis=1)

# TeacherData: 2 characters (state), an alphanumeric run ending in a digit
# (its last 4 characters become Postal), then the rest (teacher's first name)
teacherdata_pattern = re.compile(r'(.{2})([\dA-Z]+\d)(.*)', re.IGNORECASE)
df['States'] = df.apply(lambda row: teacherdata_pattern.search(row.TeacherData).group(1), axis=1)
df['Postal'] = df.apply(lambda row: teacherdata_pattern.search(row.TeacherData).group(2)[-4:], axis=1)
df['TeacherFirstN'] = df.apply(lambda row: teacherdata_pattern.search(row.TeacherData).group(3), axis=1)

# drop the raw combined columns
del df['FULLSID']
del df['TeacherData']
print(df)
Output:
   FirstN  LastN  TeacherLastN        SID       Birth        City  States  Postal  TeacherFirstN
0     Ann   Gosh          Ryan  123456789  2008-12-15      Irvine      CA    A9Z5          Steve
1    Yosh   Dave          Tuck  987654321  2009-04-18     St. Elf      NY    P8G0           Brad
2   Clair  Simon          John  324567457  2008-12-29  New Jersey      NJ    R9B3            Dan
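Calling df.apply with a lambda runs the regex once per row for every new column. A possibly simpler alternative, sketched here, is pandas' vectorized Series.str.extract, which returns one column per capture group; unlike .search(...).group(...), a row that doesn't match becomes NaN instead of raising AttributeError. The two input strings are hypothetical, reconstructed from the output above to fit sid_pattern; the real test.txt layout may differ.

```python
import pandas as pd

# Hypothetical FULLSID values: 9-digit SID + birth date + city, fused together.
fullsid = pd.Series(["1234567892008-12-15Irvine", "9876543212009-04-18St. Elf"])

# One regex pass over the whole column; each capture group becomes a column.
parts = fullsid.str.extract(r"(\d{9})(\d+-\d+-\d+)(.*)")
parts.columns = ["SID", "Birth", "City"]
print(parts)
```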
pandas read_csv() for multiple delimiters
Following the question "Handling Variable Number of Columns with Pandas - Python", one workaround for pandas.errors.ParserError: Expected 29 fields in line 11, saw 45. is to tell read_csv in advance how many columns to expect.
import pandas as pd

my_cols = [str(i) for i in range(45)]  # create some column names
df_user_key_word_org = pd.read_csv(filepath + "user_key_word.txt",
                                   sep=r"\s+|;|:",  # raw string: whitespace run, ';' or ':'
                                   names=my_cols,
                                   header=None,
                                   engine="python")
# I tested with s = StringIO(text_from_OP) on my computer
Hope this works.
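A small self-contained check of the idea, using made-up input since user_key_word.txt isn't shown: the first line has 3 fields and the second has 5. Naming 6 columns up front keeps pandas from sizing the table off the first line, and the missing fields are filled with NaN.

```python
import io

import pandas as pd

# Hypothetical input mixing whitespace, ';' and ':' separators, with ragged rows.
text = "u1 a;b\nu2 c;d:e f\n"

# More names than the longest row: shorter rows are padded with NaN.
my_cols = [str(i) for i in range(6)]
df = pd.read_csv(io.StringIO(text), sep=r"\s+|;|:", names=my_cols,
                 header=None, engine="python")
print(df)
```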
Convert text file into DataFrame with multiple custom delimiters in Python
It's hard to know exactly what your splitting rules are, but you can use a regex as the delimiter.
Here is a working example that splits the lists and the date into columns; you'll probably have to tweak it to your exact rules:
df = pd.read_csv('output.txt', sep=r'(?:,\s*|^)(?:\d+: \d+x\d+|Done[^)]+\)\s*)',
header=None, engine='python', names=(None, 'a', 'b', 'date')).iloc[:, 1:]
output:
                                     a                    b                    date
0            2 persons, 1 cat, 1 clock   2 persons, 1 chair  Tue, 05 April 03:54:02
1  3 persons, 1 cat, 1 laptop, 1 clock  4 persons, 2 chairs  Tue, 05 April 03:54:05
2                   3 persons, 1 chair  4 persons, 2 chairs  Tue, 05 April 03:54:07
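With a regex separator, the python engine splits each line much like Python's re.split. These stdlib calls on a made-up line show why the alternatives are wrapped in a non-capturing (?:...) group, and why a separator that can match at ^ produces an empty first field, which names=(None, ...) together with .iloc[:, 1:] then throws away.

```python
import re

line = "0: foo, 1: bar"  # hypothetical line, not the OP's data

# Non-capturing group: only the fields are returned.
# The leading ^ alternative matches "0: " at the start, leaving an empty first field.
fields = re.split(r"(?:,\s*|^)\d+: ", line)
print(fields)       # ['', 'foo', 'bar']

# Capturing group: re.split also returns the captured separator text.
with_groups = re.split(r"(,\s*|^)\d+: ", line)
print(with_groups)  # ['', '', 'foo', ', ', 'bar']
```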
How to read txt file in pandas with multiple delimiters?
The \s+ delimiter would work:
df = pd.read_csv(os.path.join(maindir, 'EDMA_1_rcp26_2025_1_output.rsv'),
                 skiprows=9, delimiter=r'\s+', header=None)
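A self-contained sketch with hypothetical data mixing single spaces, runs of spaces and tabs: r'\s+' collapses any whitespace run into one delimiter. (delimiter is an alias for sep, and this particular pattern is handled without engine='python'.)

```python
import io

import pandas as pd

# Hypothetical whitespace-aligned rows: single spaces, space runs and tabs mixed.
text = "1   2.5\t3\n4 5.0  \t6\n"

# Any run of whitespace counts as a single separator.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)
print(df)
```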
Pretty simple, actually.
Convert text file containing multiple delimiters to CSV
Your regex needs a tweak: r"[ \t]+" matches any run of spaces and tabs (one or more). Additionally, pandas uses the first line of the file to determine how many columns there are. Your example starts with 4 columns and then adds another later on. That's too late: pandas has already created 4-element rows. You can solve that by supplying your own column names, which tells pandas how many columns there really are. In this example I'm just using integers, but you could give them more useful names.
df = pd.read_csv('Water level.txt', sep=r'[ \t]+', encoding='GBK',
                 engine='python', names=range(5))
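A self-contained sketch of both fixes, using made-up data in place of Water level.txt (the GBK encoding is dropped because the text is already a Python string). r"[ \t]+" treats a run of spaces and tabs as one delimiter, whereas r"[ \t]" would split "5  6" into three fields with an empty one in the middle; names=range(5) makes room for the longer last line.

```python
import io

import pandas as pd

# Hypothetical data: space/tab runs between fields, and the last line has a fifth field.
text = "1 2\t3 4\n5  6\t\t7 8\n9 10\t11 12 13\n"

# Five names up front: the 4-field rows get NaN in column 4.
df = pd.read_csv(io.StringIO(text), sep=r"[ \t]+", engine="python",
                 names=range(5))
print(df)
```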
Importing CSV file with Multiple Delimiters in Python
I tried the file you provided, and it was actually giving me an encoding error.
Try the following encoding:
pd.read_csv('ses_awards.csv', encoding = 'ISO-8859-1')
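A sketch of why this helps, on hypothetical bytes rather than the real ses_awards.csv: ISO-8859-1 (Latin-1) assigns a character to every possible byte value, so decoding can't fail, while UTF-8 rejects a lone 0xE9 byte. The catch is that a file actually saved in some other encoding may decode to the wrong characters, so it's worth checking the result.

```python
import io

import pandas as pd

# Hypothetical file contents: 0xE9 is "é" in ISO-8859-1 but invalid UTF-8 here.
raw = b"name,award\ncaf\xe9,1\n"

try:
    pd.read_csv(io.BytesIO(raw))  # default encoding is UTF-8
except UnicodeDecodeError as err:
    print("UTF-8 failed:", err)

# Every byte maps to some character in ISO-8859-1, so this always decodes.
df = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")
print(df)
```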
Parsing txt file with multiple delimiters
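A crude "solution" (which assumes the data file is perfectly formatted):
import numpy as np

with open('matrix.dat', 'r') as data_file:
    # first line: keep only the numeric tokens as the dimensions
    rows, cols = [int(c) for c in data_file.readline().split() if c.isnumeric()]
    # rest of the file: whitespace-separated numbers, reshaped into the matrix
    array = np.fromstring(data_file.read(), sep=' ').reshape(rows, cols)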
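A self-contained run of this approach, assuming a hypothetical matrix.dat whose first line holds the dimensions ("2 3") and whose remaining lines hold the values; io.StringIO stands in for the open file.

```python
import io

import numpy as np

# Hypothetical matrix.dat: dimensions on line 1, values afterwards.
data_file = io.StringIO("2 3\n1 2 3\n4 5 6\n")

rows, cols = [int(c) for c in data_file.readline().split() if c.isnumeric()]
# With a whitespace sep, np.fromstring parses the rest of the text as numbers,
# treating newlines like any other whitespace.
array = np.fromstring(data_file.read(), sep=" ").reshape(rows, cols)
print(array)
```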
And here's a probably unnecessary alternative which avoids reading the entire file as a single string:
import itertools
import numpy as np

chainstar = itertools.chain.from_iterable

with open('matrix.dat', 'r') as data_file:
    rows, cols = [int(c)
                  for c in data_file.readline().split()
                  if c.isnumeric()]
    # np.float was removed in NumPy 1.24; the builtin float is equivalent
    array = np.fromiter(chainstar(map(lambda s: s.split(), data_file)),
                        dtype=float,
                        count=rows*cols).reshape(rows, cols)
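The streaming version can be checked the same way. This sketch assumes a hypothetical matrix.dat whose first line holds the dimensions, uses io.StringIO in place of the file, and converts each token with float explicitly. Passing count lets np.fromiter allocate the output array once instead of growing it.

```python
import io
import itertools

import numpy as np

chainstar = itertools.chain.from_iterable

# Hypothetical matrix.dat: dimensions on line 1, values afterwards.
data_file = io.StringIO("2 3\n1 2 3\n4 5 6\n")

rows, cols = [int(c) for c in data_file.readline().split() if c.isnumeric()]
# Lines are split lazily; tokens are flattened and converted one at a time.
tokens = chainstar(line.split() for line in data_file)
array = np.fromiter(map(float, tokens), dtype=float,
                    count=rows * cols).reshape(rows, cols)
print(array)
```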