Load Data from Txt with Pandas

Load data from txt with pandas

You can do as:

import pandas as pd
df = pd.read_csv('file_location\filename.txt', delimiter = "\t")

(like, df = pd.read_csv('F:\Desktop\ds\text.txt', delimiter = "\t")

Get specific data from txt file to pandas dataframe

First you can process the txt file to a list of list. Inner list means each hunk lines. Outer list means different hunks:

import pandas as pd

with open('data.txt', 'r') as f:
res = f.read()

records = [list(map(str.strip, line.strip().split('\n'))) for line in res.split('\n\n')]
print(records)

[['Wed Mar 23 16:59:25 GMT 2022', '1 State', '1 ESTAB'], ['Wed Mar 23 16:59:26 GMT 2022', '1 State', '1 ESTAB', '1 CLOSE-WAIT'], ['Wed Mar 23 16:59:27 GMT 2022', '1 State', '1 ESTAB', '10 FIN-WAIT'], ['Wed Mar 23 16:59:28 GMT 2022', '1 State', '1 CLOSE-WAIT', '102 ESTAB']]

Then you can turn the list of list to list of dictionary by manually define each key and value

l = []
for record in records:
d = {}
d['timestamp'] = record[0]
for r in record[1:]:
key = r.split(' ')[1]
value = r.split(' ')[0]
d[key] = value

l.append(d)
print(l)

[{'timestamp': 'Wed Mar 23 16:59:25 GMT 2022', 'State': '1', 'ESTAB': '1'}, {'timestamp': 'Wed Mar 23 16:59:26 GMT 2022', 'State': '1', 'ESTAB': '1', 'CLOSE-WAIT': '1'}, {'timestamp': 'Wed Mar 23 16:59:27 GMT 2022', 'State': '1', 'ESTAB': '1', 'FIN-WAIT': '10'}, {'timestamp': 'Wed Mar 23 16:59:28 GMT 2022', 'State': '1', 'CLOSE-WAIT': '1', 'ESTAB': '102'}]

At last you can feed this dictionary into dataframe and fill the nan cell

df = pd.DataFrame(l).fillna(0)
print(df)

timestamp State ESTAB CLOSE-WAIT FIN-WAIT
0 Wed Mar 23 16:59:25 GMT 2022 1 1 0 0
1 Wed Mar 23 16:59:26 GMT 2022 1 1 1 0
2 Wed Mar 23 16:59:27 GMT 2022 1 1 0 10
3 Wed Mar 23 16:59:28 GMT 2022 1 102 1 0

How to read a .txt in Pandas that isn't properly delimited

You could pre-parse each line and then create the DataFrame, for example:

import pandas as pd

with open('input.txt') as f_input:
data = [line.strip().split(' ', 4) for line in f_input]

df = pd.DataFrame(data, columns=['c1', 'c2', 'date', 'time', 'desc'])
print(df)

Giving you:

   c1 c2        date      time                                                            desc
0 09 4 10/11/2021 22:21:17 The PLC reported that sorter SS02 has E-stopped.
1 08 4 10/11/2021 22:21:17 The PLC reported that sorter SS02 has stopped.
2 08 4 10/11/2021 22:21:18 The PLC reported that sorter SS01 has stopped.
3 20 5 10/11/2021 22:21:18 The PLC reported that purge mode was disabled for sorter SS02.
4 20 5 10/11/2021 22:21:18 The PLC reported that purge mode was disabled for sorter SS01.
5 23 5 10/11/2021 22:21:19 AUX Sortation has been enabled for sorter SS02.
6 23 5 10/11/2021 22:21:20 AUX Sortation has been enabled for sorter SS01.

A datetime column could be added by combining the date and time columns and converting them into a datetime:

df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'])


Related Topics



Leave a reply



Submit