Python Convert Comma Separated List to Pandas Dataframe

Split each string in your list on the comma and pass the resulting list of lists to the DataFrame constructor (l is your list of strings):

import pandas as pd

df = pd.DataFrame([sub.split(",") for sub in l])
print(df)

Output:

    0         1   2               3         4               5         6
0  AN  2__AS000  26  20150826113000  -283.000  20150826120000  -283.000
1  AN   2__A000  26  20150826113000     0.000  20150826120000     0.000
2  AN  2__AE000  26  20150826113000  -269.000  20150826120000  -269.000
3  AN  2__AE000  26  20150826113000  -255.000  20150826120000  -255.000
4  AN   2__AE00  26  20150826113000  -254.000  20150826120000  -254.000
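For reference, here is a minimal self-contained sketch of the same idea. The `lines` list is made-up sample data standing in for `l`, and the float conversion is an optional extra step, included because `split` always yields strings:

```python
import pandas as pd

# Hypothetical sample data standing in for the list `l` from the question
lines = [
    "AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000",
    "AN,2__A000,26,20150826113000,0.000,20150826120000,0.000",
]

# Split every string on commas; each resulting list becomes one row
df = pd.DataFrame([line.split(",") for line in lines])

# Every value is still a string here, so convert numeric columns explicitly
df = df.astype({4: float, 6: float})
print(df.dtypes)
```

Columns you do not convert keep the object dtype, which is usually what you want for identifiers such as the second field.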

If the data comes from a csv file and you know how many metadata lines precede it, read_csv can do it all; pass that count as skiprows:

import pandas as pd

df = pd.read_csv("in.csv", skiprows=3, header=None)
print(df)

Or, if each metadata line starts with a certain single character, you can use comment:

df = pd.read_csv("in.csv", header=None, comment="#")
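To try both read_csv options without a file on disk, the sketch below feeds an in-memory string through `io.StringIO`. The file contents are invented for illustration:

```python
import pandas as pd
from io import StringIO

# Hypothetical file contents: three metadata lines, then the data
raw = (
    "# various\n"
    "# metadata\n"
    "# lines\n"
    "AN,2__AS000,26,20150826113000,-283.000\n"
    "AN,2__A000,26,20150826113000,0.000\n"
)

# Option 1: skip a known number of leading lines
by_count = pd.read_csv(StringIO(raw), skiprows=3, header=None)

# Option 2: treat '#' as a comment marker; fully commented lines are dropped
by_comment = pd.read_csv(StringIO(raw), header=None, comment="#")

print(by_count.shape, by_comment.shape)
```

Keep in mind that comment discards the rest of a line wherever the character appears, so it is only safe when that character cannot occur inside the data itself.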

If the marker is longer than one character, you can use itertools.dropwhile, which skips the leading lines that start with a given prefix (here #!!):

import csv
from itertools import dropwhile

import pandas as pd

with open("in.csv", newline="") as f:
    # skip leading lines until the first one that does not start with "#!!"
    lines = dropwhile(lambda x: x.startswith("#!!"), f)
    r = csv.reader(lines)
    df = pd.DataFrame.from_records(r)

Using your input data with some added lines starting with #!!:

#!! various
#!! metadata
#!! lines
AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000
AN,2__A000,26,20150826113000,0.000,20150826120000,0.000
AN,2__AE000,26,20150826113000,-269.000,20150826120000,-269.000
AN,2__AE000,26,20150826113000,-255.000,20150826120000,-255.000
AN,2__AE00,26,20150826113000,-254.000,20150826120000,-254.000

Outputs:

    0         1   2               3         4               5         6
0  AN  2__AS000  26  20150826113000  -283.000  20150826120000  -283.000
1  AN   2__A000  26  20150826113000     0.000  20150826120000     0.000
2  AN  2__AE000  26  20150826113000  -269.000  20150826120000  -269.000
3  AN  2__AE000  26  20150826113000  -255.000  20150826120000  -255.000
4  AN   2__AE00  26  20150826113000  -254.000  20150826120000  -254.000
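One caveat: dropwhile only removes matching lines at the start of the input. If metadata lines can also appear between data rows, a generator expression that tests every line may be a better fit. The sketch below contrasts the two; the input string and the `is_meta` helper are hypothetical:

```python
import csv
from io import StringIO
from itertools import dropwhile

import pandas as pd

# Hypothetical input: a metadata line also appears in the middle of the data
raw = (
    "#!! various\n"
    "AN,2__AS000,26\n"
    "#!! trailing note\n"
    "AN,2__A000,26\n"
)


def is_meta(line):
    """Return True for lines carrying the metadata prefix."""
    return line.startswith("#!!")


# dropwhile stops filtering at the first non-matching line,
# so the later "#!!" line survives as a one-field row
leading_only = list(csv.reader(dropwhile(is_meta, StringIO(raw))))

# a generator expression tests every line instead
everywhere = list(csv.reader(ln for ln in StringIO(raw) if not is_meta(ln)))

df = pd.DataFrame(everywhere)
print(len(leading_only), len(everywhere))
```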

Dataframe: Cell Level: Convert Comma Separated String to List

Use pandas.Series.str.split to split the string into a list.

# use str split on the column
df.mgrs_grids = df.mgrs_grids.str.split(',')

# display(df)
   driver_code journey_code                                                                                                                                       mgrs_grids
0      7211863  7211863-140                            [18TWL927129, 18TWL888113, 18TWL888113, 18TWL887113, 18TWL888113, 18TWL887113, 18TWL887113, 18TWL887113, 18TWL903128]
1      7211863  7211863-105  [18TWL927129, 18TWL939112, 18TWL939112, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL960111, 18TWL960112]
2      7211863   7211863-50                            [18TWL927129, 18TWL889085, 18TWL889085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL890085]
3      7211863  7211863-109               [18TWL927129, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952105, 18TWL951103]

print(type(df.loc[0, 'mgrs_grids']))
[out]:
list

Separate row per value

After creating the column of lists, use pandas.DataFrame.explode to create a separate row for each value in the list.

# get a separate row for each value
df = df.explode('mgrs_grids').reset_index(drop=True)

# display(df.head())
   driver_code journey_code   mgrs_grids
0      7211863  7211863-140  18TWL927129
1      7211863  7211863-140  18TWL888113
2      7211863  7211863-140  18TWL888113
3      7211863  7211863-140  18TWL887113
4      7211863  7211863-140  18TWL888113
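The split and explode steps can be tried end to end with a small stand-in frame. The journey codes below are invented; only the column names follow the question:

```python
import pandas as pd

# Hypothetical two-row frame with a comma-separated string column
df = pd.DataFrame({
    "journey_code": ["J-1", "J-2"],
    "mgrs_grids": ["18TWL927129,18TWL888113", "18TWL939112"],
})

# Step 1: turn each string into a list
df["mgrs_grids"] = df["mgrs_grids"].str.split(",")

# Step 2: one row per list element; the other columns are repeated
long_df = df.explode("mgrs_grids").reset_index(drop=True)
print(long_df)
```

Without reset_index, the exploded rows keep the index label of the row they came from, which is handy if you later need to group them back together.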

Update

Here is another option, which prepends 'journey_code' to 'mgrs_grids' and then splits the combined string into a list. Run it on the original string column, not on a column that has already been split.
- The list is assigned back to 'mgrs_grids' here, but it can also be assigned to a new column.

# add the journey code to mgrs_grids and then split
df.mgrs_grids = (df.journey_code + ',' + df.mgrs_grids).str.split(',')

# display(df.head())
   driver_code journey_code                                                                                                                                                    mgrs_grids
0      7211863  7211863-140                            [7211863-140, 18TWL927129, 18TWL888113, 18TWL888113, 18TWL887113, 18TWL888113, 18TWL887113, 18TWL887113, 18TWL887113, 18TWL903128]
1      7211863  7211863-105  [7211863-105, 18TWL927129, 18TWL939112, 18TWL939112, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL960111, 18TWL960112]
2      7211863   7211863-50                             [7211863-50, 18TWL927129, 18TWL889085, 18TWL889085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL890085]
3      7211863  7211863-109               [7211863-109, 18TWL927129, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952105, 18TWL951103]

# output to nested list
df.mgrs_grids.tolist()

[out]:
[['7211863-140', '18TWL927129', '18TWL888113', '18TWL888113', '18TWL887113', '18TWL888113', '18TWL887113', '18TWL887113', '18TWL887113', '18TWL903128'],
 ['7211863-105', '18TWL927129', '18TWL939112', '18TWL939112', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL960111', '18TWL960112'],
 ['7211863-50', '18TWL927129', '18TWL889085', '18TWL889085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL890085'],
 ['7211863-109', '18TWL927129', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952105', '18TWL951103']]

Converting string with comma delimited data and newline character to pandas dataframe

This works fine if you wrap the string in io.StringIO and feed it to pandas.read_csv:

import pandas as pd
from io import StringIO

mystr = StringIO("""2018-06-11 09:31:00,968.250,965.000,968.000,965.250,17220,1160\n2018-06-11 09:32:00,965.250,964.250,965.250,964.750,17872,611\n2018-06-11 09:33:00,965.000,963.250,965.000,963.500,18851,547\n""")

df = pd.read_csv(mystr, index_col=0, header=None)
df.index = pd.to_datetime(df.index)

print(df)

                          1       2       3       4      5     6
0                                                               
2018-06-11 09:31:00  968.25  965.00  968.00  965.25  17220  1160
2018-06-11 09:32:00  965.25  964.25  965.25  964.75  17872   611
2018-06-11 09:33:00  965.00  963.25  965.00  963.50  18851   547

print(df.dtypes)

1    float64
2    float64
3    float64
4    float64
5      int64
6      int64
dtype: object
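If you would rather name the columns and parse the timestamps in the same call, something like the following sketch should work. The column names are hypothetical, since the source does not say what each field means:

```python
import pandas as pd
from io import StringIO

raw = (
    "2018-06-11 09:31:00,968.250,965.000,968.000,965.250,17220,1160\n"
    "2018-06-11 09:32:00,965.250,964.250,965.250,964.750,17872,611\n"
)

# Hypothetical column names, chosen only for illustration
cols = ["timestamp", "c1", "c2", "c3", "c4", "c5", "c6"]

df = pd.read_csv(
    StringIO(raw),
    header=None,               # the data has no header row
    names=cols,                # supply our own labels
    index_col="timestamp",     # use the first field as the index
    parse_dates=["timestamp"], # and parse it as datetimes while reading
)
print(df.index.dtype)
```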

Convert Comma Separated String to Pandas Dataframe

You can try a regular expression:

import re
import pandas as pd

s = "Key=xxxx, age=11, key=yyyy , age=22,Key=zzzz, age=01, key=qqqq, age=21,Key=wwwww, age=91, key=pppp, age=22"

df = pd.DataFrame(zip(re.findall(r'Key=([^,\s]+)', s, re.IGNORECASE), re.findall(r'age=([^,\s]+)', s, re.IGNORECASE)),
                 columns=['key', 'age'])

df

     key age
0   xxxx  11
1   yyyy  22
2   zzzz  01
3   qqqq  21
4  wwwww  91
5   pppp  22
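Zipping two separate findall results assumes every key has a matching age in the same order. A variation, offered only as a sketch, captures each key/age pair with a single pattern so the two values cannot drift apart. The shortened input string and the integer conversion are assumptions added here:

```python
import re

import pandas as pd

# Shortened, hypothetical version of the input string
s = "Key=xxxx, age=11, key=yyyy , age=22,Key=zzzz, age=01"

# One pattern with two groups: each match is a (key, age) tuple
pattern = re.compile(r"key=([^,\s]+)\s*,\s*age=([^,\s]+)", re.IGNORECASE)
pairs = pattern.findall(s)

df = pd.DataFrame(pairs, columns=["key", "age"])

# Optional: ages arrive as strings; converting to int drops leading zeros
df["age"] = df["age"].astype(int)
print(df)
```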

convert a string delimited by comma into list in pandas

You can use str.split to split a comma-separated string into a list. You can also use apply(set) for your specific purpose, IIUC, and subtract the sets to get the values in Col1 that are missing from Col2:

(df['Col1'].str.split(',').apply(set) - df['Col2'].str.split(',').apply(set)).tolist()

[out]

[{'a', 'b', 'c'}, {'g'}, set()]
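The same row-wise set difference can also be written as a plain list comprehension, which avoids relying on arithmetic between object-dtype Series. The frame below is made-up data chosen to reproduce the output shown above:

```python
import pandas as pd

# Hypothetical data; only the column names come from the question
df = pd.DataFrame({
    "Col1": ["a,b,c", "f,g", "x"],
    "Col2": ["d", "f", "x"],
})

# For each row: items present in Col1 but not in Col2
result = [
    set(left.split(",")) - set(right.split(","))
    for left, right in zip(df["Col1"], df["Col2"])
]
print(result)
```

Sets are unordered, so the printed order of elements within each set may vary between runs.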
