Python convert comma separated list to pandas dataframe
You need to split each string in your list:
import pandas as pd
# l is your list of comma-separated strings
df = pd.DataFrame([sub.split(",") for sub in l])
print(df)
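Here is a minimal self-contained sketch that uses two rows from the output below as the list l. Every cell comes back as a string, so numeric columns need an explicit conversion. Which columns to convert (2 to 6 here) is an assumption about your data. Series.str.split(",", expand=True) builds the same frame inside pandas.

```python
import pandas as pd

# Two sample rows taken from the output shown in this answer
l = [
    "AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000",
    "AN,2__A000,26,20150826113000,0.000,20150826120000,0.000",
]

# Approach from the answer: split each string, build the frame from the lists
df = pd.DataFrame([sub.split(",") for sub in l])

# Vectorised alternative: split inside pandas and expand into columns
df_expand = pd.Series(l).str.split(",", expand=True)

# All cells are strings at this point; convert the numeric-looking columns
# (assumption: columns 2 to 6 hold numbers in your data)
for col in range(2, 7):
    df[col] = pd.to_numeric(df[col])

print(df.dtypes)
```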
Output:
0 1 2 3 4 5 6
0 AN 2__AS000 26 20150826113000 -283.000 20150826120000 -283.000
1 AN 2__A000 26 20150826113000 0.000 20150826120000 0.000
2 AN 2__AE000 26 20150826113000 -269.000 20150826120000 -269.000
3 AN 2__AE000 26 20150826113000 -255.000 20150826120000 -255.000
4 AN 2__AE00 26 20150826113000 -254.000 20150826120000 -254.000
If you know how many lines of metadata to skip in your CSV, you can do it all with read_csv using skiprows=lines_of_metadata:
import pandas as pd
df = pd.read_csv("in.csv",skiprows=3,header=None)
print(df)
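To try skiprows without a file on disk, the sketch below passes an in-memory string to read_csv through io.StringIO. The three metadata lines are made up for illustration.

```python
import io
import pandas as pd

# Stand-in for in.csv: three hypothetical metadata lines, then the data rows
raw = """generated by some tool
exported 2015-08-26
units: hypothetical
AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000
AN,2__A000,26,20150826113000,0.000,20150826120000,0.000
"""

# skiprows=3 drops the first three physical lines before parsing
df = pd.read_csv(io.StringIO(raw), skiprows=3, header=None)
print(df)
```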
Or, if each metadata line starts with a certain character, you can use comment:
df = pd.read_csv("in.csv",header=None,comment="#")
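A small in-memory sketch of the comment approach, assuming the metadata lines start with "#". Keep in mind that comment also cuts off everything after a "#" in the middle of a data line, so it is only safe when "#" never appears inside your values.

```python
import io
import pandas as pd

# Hypothetical metadata lines marked with '#', followed by the data rows
raw = """# metadata line 1
# metadata line 2
AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000
AN,2__A000,26,20150826113000,0.000,20150826120000,0.000
"""

# Lines that start with '#' are ignored entirely
df = pd.read_csv(io.StringIO(raw), header=None, comment="#")
print(df)
```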
If you need to match more than one character, you can use itertools.dropwhile together with csv.reader, which will drop the leading lines starting with "#!!":
import pandas as pd
from itertools import dropwhile
import csv
with open("in.csv") as f:
    # skip the leading metadata lines that start with "#!!"
    f = dropwhile(lambda x: x.startswith("#!!"), f)
    r = csv.reader(f)
    df = pd.DataFrame.from_records(r)
print(df)
Using your input data with some lines starting with #!! added:
#!! various
#!! metadata
#!! lines
AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000
AN,2__A000,26,20150826113000,0.000,20150826120000,0.000
AN,2__AE000,26,20150826113000,-269.000,20150826120000,-269.000
AN,2__AE000,26,20150826113000,-255.000,20150826120000,-255.000
AN,2__AE00,26,20150826113000,-254.000,20150826120000,-254.000
Outputs:
0 1 2 3 4 5 6
0 AN 2__AS000 26 20150826113000 -283.000 20150826120000 -283.000
1 AN 2__A000 26 20150826113000 0.000 20150826120000 0.000
2 AN 2__AE000 26 20150826113000 -269.000 20150826120000 -269.000
3 AN 2__AE000 26 20150826113000 -255.000 20150826120000 -255.000
4 AN 2__AE00 26 20150826113000 -254.000 20150826120000 -254.000
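Note that dropwhile only removes the leading lines. Once it sees a line that does not start with "#!!", it passes everything after it through, including any later "#!!" lines. The sketch below reproduces the answer in memory with io.StringIO. For comparison, it also uses a generator expression that filters out every "#!!" line wherever it appears. The trailing metadata line is hypothetical.

```python
import csv
import io
from itertools import dropwhile

import pandas as pd

raw = """#!! various
#!! metadata
AN,2__AS000,26,20150826113000,-283.000,20150826120000,-283.000
AN,2__A000,26,20150826113000,0.000,20150826120000,0.000
#!! trailing note
"""

# dropwhile: skips lines only until the first non-matching line
lines = dropwhile(lambda x: x.startswith("#!!"), io.StringIO(raw))
df_drop = pd.DataFrame.from_records(csv.reader(lines))

# generator filter: removes matching lines anywhere in the input
lines = (x for x in io.StringIO(raw) if not x.startswith("#!!"))
df_filter = pd.DataFrame.from_records(csv.reader(lines))

print(df_drop)
print(df_filter)
```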
Dataframe: Cell Level: Convert Comma Separated String to List
- Use pandas.Series.str.split to split the string into a list.
# use str split on the column
df.mgrs_grids = df.mgrs_grids.str.split(',')
# display(df)
driver_code journey_code mgrs_grids
0 7211863 7211863-140 [18TWL927129, 18TWL888113, 18TWL888113, 18TWL887113, 18TWL888113, 18TWL887113, 18TWL887113, 18TWL887113, 18TWL903128]
1 7211863 7211863-105 [18TWL927129, 18TWL939112, 18TWL939112, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL960111, 18TWL960112]
2 7211863 7211863-50 [18TWL927129, 18TWL889085, 18TWL889085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL890085]
3 7211863 7211863-109 [18TWL927129, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952105, 18TWL951103]
print(type(df.loc[0, 'mgrs_grids']))
[out]:
<class 'list'>
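Because df is not defined in the answer, here is a minimal self-contained sketch with a made-up two-row frame (values shortened from the output above). It also shows str.split(",", expand=True), which puts each value in its own column instead of a list. Rows with fewer values are padded with missing values.

```python
import pandas as pd

# Hypothetical sample shaped like the question's data
df = pd.DataFrame({
    "driver_code": [7211863, 7211863],
    "journey_code": ["7211863-140", "7211863-50"],
    "mgrs_grids": ["18TWL927129,18TWL888113,18TWL903128",
                   "18TWL927129,18TWL890085"],
})

# Alternative: one column per value (shorter rows are padded)
wide = df["mgrs_grids"].str.split(",", expand=True)

# Approach from the answer: one list per cell
df["mgrs_grids"] = df["mgrs_grids"].str.split(",")
print(type(df.loc[0, "mgrs_grids"]))  # <class 'list'>
print(wide)
```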
separate row per value
- After creating a column of lists, use pandas.DataFrame.explode to create a separate row for each value in the list.
# get a separate row for each value
df = df.explode('mgrs_grids').reset_index(drop=True)
# display(df.head())
driver_code journey_code mgrs_grids
0 7211863 7211863-140 18TWL927129
1 7211863 7211863-140 18TWL888113
2 7211863 7211863-140 18TWL888113
3 7211863 7211863-140 18TWL887113
4 7211863 7211863-140 18TWL888113
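On pandas 1.1 or later, explode can renumber the index itself through ignore_index=True, which replaces the separate reset_index(drop=True) call. A self-contained sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical frame that already holds a list per cell
df = pd.DataFrame({
    "journey_code": ["7211863-140", "7211863-50"],
    "mgrs_grids": [["18TWL927129", "18TWL888113"], ["18TWL890085"]],
})

# One row per list element; ignore_index=True gives a fresh 0..n-1 index
long_df = df.explode("mgrs_grids", ignore_index=True)

# Same result as the two-step version in the answer
same = df.explode("mgrs_grids").reset_index(drop=True)
print(long_df)
```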
Update
- Here is another option, which adds 'journey_code' to the front of 'mgrs_grids' and then splits the string into a list.
- This list is assigned back to 'mgrs_grids', but can also be assigned to a new column.
# add the journey code to mgrs_grids and then split
df.mgrs_grids = (df.journey_code + ',' + df.mgrs_grids).str.split(',')
# display(df.head())
driver_code journey_code mgrs_grids
0 7211863 7211863-140 [7211863-140, 18TWL927129, 18TWL888113, 18TWL888113, 18TWL887113, 18TWL888113, 18TWL887113, 18TWL887113, 18TWL887113, 18TWL903128]
1 7211863 7211863-105 [7211863-105, 18TWL927129, 18TWL939112, 18TWL939112, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL939113, 18TWL960111, 18TWL960112]
2 7211863 7211863-50 [7211863-50, 18TWL927129, 18TWL889085, 18TWL889085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL888085, 18TWL890085]
3 7211863 7211863-109 [7211863-109, 18TWL927129, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952106, 18TWL952105, 18TWL951103]
# output to nested list
df.mgrs_grids.tolist()
[out]:
[['7211863-140', '18TWL927129', '18TWL888113', '18TWL888113', '18TWL887113', '18TWL888113', '18TWL887113', '18TWL887113', '18TWL887113', '18TWL903128'],
['7211863-105', '18TWL927129', '18TWL939112', '18TWL939112', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL939113', '18TWL960111', '18TWL960112'],
['7211863-50', '18TWL927129', '18TWL889085', '18TWL889085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL888085', '18TWL890085'],
['7211863-109', '18TWL927129', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952106', '18TWL952105', '18TWL951103']]
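The same prepend-and-split step can also be written with Series.str.cat, which joins two string columns with a separator. This minimal sketch uses hypothetical data and assumes both columns hold strings (cast with astype(str) first if journey_code is numeric).

```python
import pandas as pd

# Hypothetical data with both columns as strings
df = pd.DataFrame({
    "journey_code": ["7211863-140", "7211863-50"],
    "mgrs_grids": ["18TWL927129,18TWL888113", "18TWL890085"],
})

# Join journey_code and mgrs_grids with a comma, then split into lists
combined = df["journey_code"].str.cat(df["mgrs_grids"], sep=",").str.split(",")

# The '+' version used in the answer, for comparison
plus = (df.journey_code + "," + df.mgrs_grids).str.split(",")

nested = combined.tolist()
print(nested)
```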
Converting string with comma delimited data and newline character to pandas dataframe
This works fine when you wrap the string in io.StringIO and feed it to pandas.read_csv:
import pandas as pd
from io import StringIO
mystr = StringIO("""2018-06-11 09:31:00,968.250,965.000,968.000,965.250,17220,1160\n2018-06-11 09:32:00,965.250,964.250,965.250,964.750,17872,611\n2018-06-11 09:33:00,965.000,963.250,965.000,963.500,18851,547\n""")
df = pd.read_csv(mystr, index_col=0, header=None)
df.index = pd.to_datetime(df.index)
print(df)
1 2 3 4 5 6
0
2018-06-11 09:31:00 968.25 965.00 968.00 965.25 17220 1160
2018-06-11 09:32:00 965.25 964.25 965.25 964.75 17872 611
2018-06-11 09:33:00 965.00 963.25 965.00 963.50 18851 547
print(df.dtypes)
1 float64
2 float64
3 float64
4 float64
5 int64
6 int64
dtype: object
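If you also want column labels and a parsed index in one call, read_csv accepts names= and parse_dates=True, which tries to parse the index as datetimes. The labels below are placeholders, since the meaning of each field is not given in the question.

```python
import io
import pandas as pd

mystr = io.StringIO(
    "2018-06-11 09:31:00,968.250,965.000,968.000,965.250,17220,1160\n"
    "2018-06-11 09:32:00,965.250,964.250,965.250,964.750,17872,611\n"
    "2018-06-11 09:33:00,965.000,963.250,965.000,963.500,18851,547\n"
)

# Placeholder labels (hypothetical): rename them to whatever the fields mean
cols = ["time", "col1", "col2", "col3", "col4", "col5", "col6"]

# index_col="time" plus parse_dates=True turns the index into datetimes
df = pd.read_csv(mystr, header=None, names=cols, index_col="time", parse_dates=True)
print(df.dtypes)
```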
Convert Comma Separated String to Pandas Dataframe
You can try a regex:
import re
import pandas as pd
s = "Key=xxxx, age=11, key=yyyy , age=22,Key=zzzz, age=01, key=qqqq, age=21,Key=wwwww, age=91, key=pppp, age=22"
df = pd.DataFrame(zip(re.findall(r'Key=([^,\s]+)', s, re.IGNORECASE), re.findall(r'age=([^,\s]+)', s, re.IGNORECASE)),
columns=['key', 'age'])
df
key age
0 xxxx 11
1 yyyy 22
2 zzzz 01
3 qqqq 21
4 wwwww 91
5 pppp 22
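A variation, assuming every key is immediately followed by its age as in the sample string: capture both values with one pattern, so each match yields a (key, age) pair. With two separate findall calls, a missing field would shift one list against the other. With a single pattern, an unmatched key is simply skipped. The values stay strings (note '01'); use astype(int) on age if you need numbers.

```python
import re
import pandas as pd

s = "Key=xxxx, age=11, key=yyyy , age=22,Key=zzzz, age=01, key=qqqq, age=21,Key=wwwww, age=91, key=pppp, age=22"

# One pattern, two groups: each match yields a (key, age) tuple
pairs = re.findall(r"key=([^,\s]+)\s*,\s*age=([^,\s]+)", s, re.IGNORECASE)
df = pd.DataFrame(pairs, columns=["key", "age"])
print(df)
```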
convert a string delimited by comma into list in pandas
You can use str.split to split a comma-separated string into a list. For your specific purpose (IIUC), you can also use apply(set) and subtract the sets row by row:
(df['Col1'].str.split(',').apply(set) - df['Col2'].str.split(',').apply(set)).tolist()
[out]
[{'a', 'b', 'c'}, {'g'}, set()]
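The Col1/Col2 data is not shown in the answer, so the frame below is hypothetical, chosen to reproduce the output above. This sketch computes the same set difference with a plain list comprehension. It also includes an order-preserving variant that returns lists instead of sets.

```python
import pandas as pd

# Hypothetical data chosen to match the output shown above
df = pd.DataFrame({"Col1": ["a,b,c", "e,f,g", "x,y"],
                   "Col2": ["d", "e,f", "x,y"]})

# Set difference per row: items in Col1 that are not in Col2
diff_sets = [set(a.split(",")) - set(b.split(","))
             for a, b in zip(df["Col1"], df["Col2"])]


def ordered_diff(a, b):
    """Keep Col1's order, dropping any item that also appears in Col2."""
    exclude = set(b.split(","))
    return [x for x in a.split(",") if x not in exclude]


diff_lists = [ordered_diff(a, b) for a, b in zip(df["Col1"], df["Col2"])]
print(diff_sets)
print(diff_lists)
```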