Python Convert Comma Separated List to Pandas Dataframe

Python convert comma separated list to pandas dataframe

You need to split each string in your list:

import pandas as pd

# l is the list of comma-separated strings from the question
df = pd.DataFrame([sub.split(",") for sub in l])
print(df)

Output:

    0         1   2               3         4               5         6
0  AN  2__AS000  26  20150826113000  -283.000  20150826120000  -283.000
1  AN   2__A000  26  20150826113000     0.000  20150826120000     0.000
2  AN  2__AE000  26  20150826113000  -269.000  20150826120000  -269.000
3  AN  2__AE000  26  20150826113000  -255.000  20150826120000  -255.000
4  AN   2__AE00  26  20150826113000  -254.000  20150826120000  -254.000
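For a version you can run as-is, here is a minimal sketch. The list name lines and the two sample rows are assumptions for illustration (the rows are copied from the data above). Note that str.split only ever produces strings, so numeric columns have to be converted explicitly afterwards:

```python
import pandas as pd

# Assumed sample input: a list of comma-separated strings.
lines = [
    "AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000",
    "AN,2__A000,26,20150826113000,0.000,20150826120000,0.000",
]

# Split every string on commas; each resulting list becomes one row.
df = pd.DataFrame([line.split(",") for line in lines])

# Every cell is still a string here, so cast the numeric columns explicitly.
df = df.astype({4: float, 6: float})
print(df.dtypes)
```

If the data lives in a file rather than a list, read_csv is usually the simpler route because it infers the numeric types for you.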

If you know how many metadata lines precede the data in your CSV, you can do it all with read_csv by passing skiprows=lines_of_metadata:

import pandas as pd

# skip the first 3 lines; header=None because the data has no header row
df = pd.read_csv("in.csv", skiprows=3, header=None)
print(df)
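To try skiprows without a file on disk, here is a small sketch that feeds read_csv from an in-memory buffer. The three metadata lines are made up for illustration; only the data rows come from the question:

```python
from io import StringIO

import pandas as pd

# Hypothetical file contents: three metadata lines, then the data rows.
raw = StringIO(
    "exported by tool\n"
    "station 26\n"
    "units unknown\n"
    "AN,2__AS000,26,20150826113000,-283.000\n"
    "AN,2__A000,26,20150826113000,0.000\n"
)

# skiprows=3 discards the metadata; header=None keeps row 0 as data.
df = pd.read_csv(raw, skiprows=3, header=None)
print(df)
```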

Or, if each metadata line starts with a certain character, you can use comment:

df = pd.read_csv("in.csv", header=None, comment="#")
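A quick sketch of the comment option on in-memory data (the metadata lines are invented for the example). Keep in mind that comment applies anywhere in a line, not only at the start, so it is only safe when the character never occurs inside your data:

```python
from io import StringIO

import pandas as pd

# Hypothetical input: metadata lines marked with a leading "#".
raw = StringIO(
    "# exported by tool\n"
    "# station 26\n"
    "AN,2__AS000,26,-283.000\n"
    "AN,2__A000,26,0.000\n"
)

# Lines starting with "#" are ignored altogether.
df = pd.read_csv(raw, header=None, comment="#")
print(df)
```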

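If the marker is more than one character long (comment accepts only a single character), you can use itertools.dropwhile, which drops the leading lines that start with the marker:

import csv
from itertools import dropwhile

import pandas as pd

with open("in.csv", newline="") as f:
    # skip the leading run of lines that start with "#!!"
    data_lines = dropwhile(lambda x: x.startswith("#!!"), f)
    r = csv.reader(data_lines)
    # build the frame inside the with block: csv.reader reads lazily
    df = pd.DataFrame.from_records(r)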

Using your input data with some added lines starting with #!!:

#!! various
#!! metadata
#!! lines
AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000
AN,2__A000,26,20150826113000,0.000,20150826120000,0.000
AN,2__AE000,26,20150826113000,-269.000,20150826120000,-269.000
AN,2__AE000,26,20150826113000,-255.000,20150826120000,-255.000
AN,2__AE00,26,20150826113000,-254.000,20150826120000,-254.000

Outputs:

    0         1   2               3         4               5         6
0  AN  2__AS000  26  20150826113000  -283.000  20150826120000  -283.000
1  AN   2__A000  26  20150826113000     0.000  20150826120000     0.000
2  AN  2__AE000  26  20150826113000  -269.000  20150826120000  -269.000
3  AN  2__AE000  26  20150826113000  -255.000  20150826120000  -255.000
4  AN   2__AE00  26  20150826113000  -254.000  20150826120000  -254.000
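One thing to watch: dropwhile stops testing as soon as it meets the first line that does not match, so a marker line that appears later in the file is kept. The sketch below compares it with a generator expression that filters every line. The sample text, including the "late note" line, is assumed for illustration:

```python
import csv
from itertools import dropwhile

import pandas as pd

# Hypothetical input with a marker line in the middle of the data.
text = (
    "#!! various\n"
    "#!! metadata\n"
    "AN,2__AS000,26,-283.000\n"
    "#!! late note\n"
    "AN,2__A000,26,0.000\n"
)
lines = text.splitlines()

# dropwhile only removes the leading run of marker lines.
leading_only = list(csv.reader(dropwhile(lambda s: s.startswith("#!!"), lines)))

# A generator expression removes marker lines wherever they occur.
everywhere = list(csv.reader(s for s in lines if not s.startswith("#!!")))

df = pd.DataFrame(everywhere)
print(len(leading_only), len(everywhere))
```

If your metadata is guaranteed to sit at the top of the file, dropwhile is enough; otherwise the filtering variant is the safer choice.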

Dataframe: Cell Level: Convert Comma Separated String to List

  • Use pandas.Series.str.split to split the string into a list.
# use str split on the column
df.mgrs_grids = df.mgrs_grids.str.split(',')

# display(df)
driver_code journey_code mgrs_grids
0 7211863 7211863-140 [18TWL927129, 18TWL888113, 18TWL888113, 18TWL887113, 18TWL888113, 18TWL887113, 18TWL887113, 18TWL887113, 18TWL903128]
1 7211863 7211863-105 [18TWL927129, 18TWL939112, 18TWL939112, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL960111, 18TWL960112]
2 7211863 7211863-50 [18TWL927129, 18TWL889085, 18TWL889085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL890085]
3 7211863 7211863-109 [18TWL927129, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952105, 18TWL951103]

print(type(df.loc[0, 'mgrs_grids']))
[out]:
<class 'list'>
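Here is a self-contained sketch of the same step. The two-row frame is a cut-down, assumed version of the data above, just enough to show that each cell becomes a Python list:

```python
import pandas as pd

# Reduced sample built from values shown above.
df = pd.DataFrame({
    "journey_code": ["7211863-140", "7211863-50"],
    "mgrs_grids": [
        "18TWL927129,18TWL888113",
        "18TWL927129,18TWL889085,18TWL890085",
    ],
})

# Split each comma-separated string into a list of grid references.
df["mgrs_grids"] = df["mgrs_grids"].str.split(",")

# Count the entries per row to confirm the cells are lists.
counts = df["mgrs_grids"].map(len)
print(counts.tolist())
```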
Separate row per value
  • After creating a column of lists.
  • Use pandas.DataFrame.explode to create separate rows for each value in the list.
# get a separate row for each value
df = df.explode('mgrs_grids').reset_index(drop=True)

# display(df.head())
  driver_code journey_code   mgrs_grids
0     7211863  7211863-140  18TWL927129
1     7211863  7211863-140  18TWL888113
2     7211863  7211863-140  18TWL888113
3     7211863  7211863-140  18TWL887113
4     7211863  7211863-140  18TWL888113
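The reset_index(drop=True) call matters because explode repeats the original row label for every value it produces. A minimal sketch on assumed sample data (a reduced version of the frame above) shows the index before and after:

```python
import pandas as pd

# Reduced sample built from values shown above.
df = pd.DataFrame({
    "journey_code": ["7211863-140", "7211863-50"],
    "mgrs_grids": [
        "18TWL927129,18TWL888113",
        "18TWL927129,18TWL889085,18TWL890085",
    ],
})
df["mgrs_grids"] = df["mgrs_grids"].str.split(",")

# One row per list element; the original index labels are repeated.
exploded = df.explode("mgrs_grids")

# Replace the repeated labels with a fresh 0..n-1 index.
long_df = exploded.reset_index(drop=True)
print(long_df)
```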
Update
  • Here is another option, which prepends 'journey_code' to 'mgrs_grids' and then splits the combined string into a list.
    • The list is assigned back to 'mgrs_grids' here, but it can also be assigned to a new column.
# add the journey code to mgrs_grids and then split
df.mgrs_grids = (df.journey_code + ',' + df.mgrs_grids).str.split(',')

# display(df.head())
driver_code journey_code mgrs_grids
0 7211863 7211863-140 [7211863-140, 18TWL927129, 18TWL888113, 18TWL888113, 18TWL887113, 18TWL888113, 18TWL887113, 18TWL887113, 18TWL887113, 18TWL903128]
1 7211863 7211863-105 [7211863-105, 18TWL927129, 18TWL939112, 18TWL939112, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL960111, 18TWL960112]
2 7211863 7211863-50 [7211863-50, 18TWL927129, 18TWL889085, 18TWL889085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL890085]
3 7211863 7211863-109 [7211863-109, 18TWL927129, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952105, 18TWL951103]

# output to nested list
df.mgrs_grids.tolist()

[out]:
[['7211863-140', '18TWL927129', '18TWL888113', '18TWL888113', '18TWL887113', '18TWL888113', '18TWL887113', '18TWL887113', '18TWL887113', '18TWL903128'],
['7211863-105', '18TWL927129', '18TWL939112', '18TWL939112', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL960111', '18TWL960112'],
['7211863-50', '18TWL927129', '18TWL889085', '18TWL889085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL890085'],
['7211863-109', '18TWL927129', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952105', '18TWL951103']]
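If you would rather keep the original string column untouched, a sketch of the new-column variant could look like this. The column name code_and_grids is a hypothetical choice and the frame is a reduced, assumed sample:

```python
import pandas as pd

# Reduced sample built from values shown above.
df = pd.DataFrame({
    "journey_code": ["7211863-140", "7211863-50"],
    "mgrs_grids": ["18TWL927129,18TWL888113", "18TWL927129,18TWL889085"],
})

# code_and_grids is a hypothetical column name; mgrs_grids stays a plain string.
df["code_and_grids"] = (df["journey_code"] + "," + df["mgrs_grids"]).str.split(",")

# Nested list with the journey code as the first element of each row.
nested = df["code_and_grids"].tolist()
print(nested)
```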

Converting string with comma delimited data and newline character to pandas dataframe

This works fine when you wrap the string in io.StringIO and feed it to pandas.read_csv:

import pandas as pd
from io import StringIO

mystr = StringIO("""2018-06-11 09:31:00,968.250,965.000,968.000,965.250,17220,1160\n2018-06-11 09:32:00,965.250,964.250,965.250,964.750,17872,611\n2018-06-11 09:33:00,965.000,963.250,965.000,963.500,18851,547\n""")

df = pd.read_csv(mystr, index_col=0, header=None)
df.index = pd.to_datetime(df.index)

print(df)

                          1       2       3       4      5     6
0
2018-06-11 09:31:00  968.25  965.00  968.00  965.25  17220  1160
2018-06-11 09:32:00  965.25  964.25  965.25  964.75  17872   611
2018-06-11 09:33:00  965.00  963.25  965.00  963.50  18851   547

print(df.dtypes)

1    float64
2    float64
3    float64
4    float64
5      int64
6      int64
dtype: object
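The dtypes above are the main reason to prefer read_csv over splitting the string by hand. As a rough comparison on a shortened, assumed sample of the same data, a manual split leaves every value as a string, while read_csv infers floats and ints:

```python
from io import StringIO

import pandas as pd

# Shortened sample: timestamp, two prices, one integer column.
text = (
    "2018-06-11 09:31:00,968.250,965.000,17220\n"
    "2018-06-11 09:32:00,965.250,964.250,17872\n"
)

# Manual split: one row per line, but every cell stays a string.
manual = pd.DataFrame([row.split(",") for row in text.splitlines()])

# read_csv parses the same text and infers numeric column types.
parsed = pd.read_csv(StringIO(text), header=None)

print(manual.loc[0, 1], parsed.loc[0, 1])
```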

Convert Comma Separated String to Pandas Dataframe

You can try a regex that pulls out the key values and the age values separately and then pairs them up:

import re
import pandas as pd

s = "Key=xxxx, age=11, key=yyyy , age=22,Key=zzzz, age=01, key=qqqq, age=21,Key=wwwww, age=91, key=pppp, age=22"

df = pd.DataFrame(zip(re.findall(r'Key=([^,\s]+)', s, re.IGNORECASE),
                      re.findall(r'age=([^,\s]+)', s, re.IGNORECASE)),
                  columns=['key', 'age'])

df
     key age
0   xxxx  11
1   yyyy  22
2   zzzz  01
3   qqqq  21
4  wwwww  91
5   pppp  22
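Zipping two separate findall results pairs values purely by position, so a key without an age would silently shift everything after it. A possible variation (not from the original answer) is a single pattern that captures each key together with the age that follows it. This sketch assumes every key is immediately followed by its age, and uses a shortened sample string:

```python
import re

import pandas as pd

# Shortened sample in the same format as the string above.
s = "Key=xxxx, age=11, key=yyyy , age=22,Key=zzzz, age=01"

# One pattern, two groups: findall returns a list of (key, age) tuples.
pair_pattern = re.compile(r"key=([^,\s]+)\s*,\s*age=([^,\s]+)", re.IGNORECASE)
pairs = pair_pattern.findall(s)

df = pd.DataFrame(pairs, columns=["key", "age"])

# The captured ages are strings ("01"); cast them if you need numbers.
df["age"] = df["age"].astype(int)
print(df)
```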

convert a string delimited by comma into list in pandas

You can use str.split to split a comma-separated string into a list. If I understand correctly, you can also use apply(set) to turn each list into a set and take the row-wise difference between the two columns:

(df['Col1'].str.split(',').apply(set) - df['Col2'].str.split(',').apply(set)).tolist()

[out]

[{'a', 'b', 'c'}, {'g'}, set()]
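Since the question's data is not shown here, this is a sketch with made-up values for Col1 and Col2, chosen so the result matches the output above. The difference is written as a plain comprehension over the two columns instead of Series subtraction, only to make the row-by-row step explicit:

```python
import pandas as pd

# Hypothetical sample data chosen to reproduce the output shown above.
df = pd.DataFrame({
    "Col1": ["a,b,c,d", "e,g", "h"],
    "Col2": ["d", "e", "h"],
})

# Turn each comma-separated string into a set of items.
left = df["Col1"].str.split(",").apply(set)
right = df["Col2"].str.split(",").apply(set)

# Row-wise set difference: items in Col1 that are not in Col2.
diff = [a - b for a, b in zip(left, right)]
print(diff)
```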

