Load data from txt with pandas
You can do as:
import pandas as pd
df = pd.read_csv('file_location\filename.txt', delimiter = "\t")
(like, df = pd.read_csv('F:\Desktop\ds\text.txt', delimiter = "\t")
Get specific data from txt file to pandas dataframe
First you can process the txt file to a list of list. Inner list means each hunk lines. Outer list means different hunks:
import pandas as pd
with open('data.txt', 'r') as f:
res = f.read()
records = [list(map(str.strip, line.strip().split('\n'))) for line in res.split('\n\n')]
print(records)
[['Wed Mar 23 16:59:25 GMT 2022', '1 State', '1 ESTAB'], ['Wed Mar 23 16:59:26 GMT 2022', '1 State', '1 ESTAB', '1 CLOSE-WAIT'], ['Wed Mar 23 16:59:27 GMT 2022', '1 State', '1 ESTAB', '10 FIN-WAIT'], ['Wed Mar 23 16:59:28 GMT 2022', '1 State', '1 CLOSE-WAIT', '102 ESTAB']]
Then you can turn the list of list to list of dictionary by manually define each key and value
l = []
for record in records:
d = {}
d['timestamp'] = record[0]
for r in record[1:]:
key = r.split(' ')[1]
value = r.split(' ')[0]
d[key] = value
l.append(d)
print(l)
[{'timestamp': 'Wed Mar 23 16:59:25 GMT 2022', 'State': '1', 'ESTAB': '1'}, {'timestamp': 'Wed Mar 23 16:59:26 GMT 2022', 'State': '1', 'ESTAB': '1', 'CLOSE-WAIT': '1'}, {'timestamp': 'Wed Mar 23 16:59:27 GMT 2022', 'State': '1', 'ESTAB': '1', 'FIN-WAIT': '10'}, {'timestamp': 'Wed Mar 23 16:59:28 GMT 2022', 'State': '1', 'CLOSE-WAIT': '1', 'ESTAB': '102'}]
At last you can feed this dictionary into dataframe and fill the nan cell
df = pd.DataFrame(l).fillna(0)
print(df)
timestamp State ESTAB CLOSE-WAIT FIN-WAIT
0 Wed Mar 23 16:59:25 GMT 2022 1 1 0 0
1 Wed Mar 23 16:59:26 GMT 2022 1 1 1 0
2 Wed Mar 23 16:59:27 GMT 2022 1 1 0 10
3 Wed Mar 23 16:59:28 GMT 2022 1 102 1 0
How to read a .txt in Pandas that isn't properly delimited
You could pre-parse each line and then create the DataFrame, for example:
import pandas as pd
with open('input.txt') as f_input:
data = [line.strip().split(' ', 4) for line in f_input]
df = pd.DataFrame(data, columns=['c1', 'c2', 'date', 'time', 'desc'])
print(df)
Giving you:
c1 c2 date time desc
0 09 4 10/11/2021 22:21:17 The PLC reported that sorter SS02 has E-stopped.
1 08 4 10/11/2021 22:21:17 The PLC reported that sorter SS02 has stopped.
2 08 4 10/11/2021 22:21:18 The PLC reported that sorter SS01 has stopped.
3 20 5 10/11/2021 22:21:18 The PLC reported that purge mode was disabled for sorter SS02.
4 20 5 10/11/2021 22:21:18 The PLC reported that purge mode was disabled for sorter SS01.
5 23 5 10/11/2021 22:21:19 AUX Sortation has been enabled for sorter SS02.
6 23 5 10/11/2021 22:21:20 AUX Sortation has been enabled for sorter SS01.
A datetime column could be added by combining the date
and time
columns and converting them into a datetime:
df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'])
Related Topics
How to Count Occurrence of Unique Values Inside a List
Most Efficient Way of Making an If-Elif-Elif-Else Statement When the Else Is Done the Most
Convert Unix Time to Readable Date in Pandas Dataframe
How to Validate a Url in Python? (Malformed or Not)
How to Find All Positions of the Maximum Value in a List
How to Download a File on a Click Event Using Selenium
Working with Big Data in Python and Numpy, Not Enough Ram, How to Save Partial Results on Disc
Returning the Product of a List
Isprime Function for Python Language
How to Expand a List to Function Arguments in Python
Matplotlib Colorbar for Scatter
Intersection of Two Graphs in Python, Find the X Value
Run a Program from Python, and Have It Continue to Run After the Script Is Killed
Read File from Line 2 or Skip Header Row
How to Decrypt Openssl Aes-Encrypted Files in Python
Remove and Replace Printed Items