Getting List of Lists into Pandas Dataframe

Getting list of lists into pandas DataFrame

Call the pd.DataFrame constructor directly:

df = pd.DataFrame(table, columns=headers)
df

Heading1 Heading2
0 1 2
1 3 4

How to turn a list of lists into columns of a pandas dataframe?

You can try using df.explode and df.apply:

import pandas as pd

df = pd.DataFrame(data= {'Generation': 0, 'Route_set':[[[20., 19., 47., 56.], [21., 34., 78., 34.]]]})
df['route1']=df['Route_set'].apply(lambda x: x[0])
df['route2']=df['Route_set'].apply(lambda x: x[1])
df = df.explode(['route1', 'route2'], ignore_index=True)
df2 = df[df.columns.difference(['Route_set', 'Generation'])]
|    |   route1 |   route2 |
|---:|---------:|---------:|
| 0 | 20 | 21 |
| 1 | 19 | 34 |
| 2 | 47 | 78 |
| 3 | 56 | 34 |

Or you can just create a new dataframe with the values like this:

import pandas as pd

df = pd.DataFrame(data= {'Generation': 0, 'Route_set':[[[20., 19., 47., 56.], [21., 34., 78., 34.]]]})
df1 = pd.DataFrame.from_dict(dict(zip(['route1', 'route2'], df.Route_set.to_numpy()[0])), orient='index').transpose()
|    |   route1 |   route2 |
|---:|---------:|---------:|
| 0 | 20 | 21 |
| 1 | 19 | 34 |
| 2 | 47 | 78 |
| 3 | 56 | 34 |

Update 1:

import pandas as pd

df = pd.DataFrame(data= {'Generation': 0, 'Route_set':[
[[20.0, 19.0, 47.0, 56.0, 43.0, 53.0, 18.0, -1.0, -1.0, -1.0, -1.0, -1.0], [20.0, 51.0, 46.0, 37.0, 2.0, 57.0, 49.0, 36.0, 25.0, 5.0, 4.0, 34.0], [54.0, 23.0, 5.0, 46.0, 34.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [57.0, 48.0, 46.0, 35.0, 25.0, 27.0, 52.0, 8.0, 39.0, 22.0, 51.0, 28.0], [57.0, 16.0, 45.0, 25.0, 49.0, 38.0, 0.0, 46.0, 13.0, 18.0, 19.0, 20.0], [21.0, 11.0, 6.0, 33.0, 25.0, 49.0, 57.0, 29.0, 12.0, 3.0, -1.0, -1.0], [9.0, 15.0, 47.0, 42.0, 25.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [51.0, 25.0, 22.0, 14.0, 39.0, 8.0, 40.0, 0.0, 10.0, 26.0, 32.0, 47.0], [1.0, 33.0, 24.0, 46.0, 56.0, 30.0, 48.0, 51.0, -1.0, -1.0, -1.0, -1.0], [25.0, 31.0, 50.0, 17.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [57.0, 12.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [20.0, 41.0, 47.0, 15.0, 46.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [14.0, 44.0, 39.0, 25.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [20.0, 51.0, 25.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [57.0, 49.0, 5.0, 20.0, 37.0, 46.0, 36.0, 25.0, 39.0, 51.0, 48.0, -1.0], [5.0, 0.0, 33.0, 55.0, 25.0, 48.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [51.0, 32.0, 33.0, 24.0, 35.0, 8.0, 25.0, 4.0, 46.0, 1.0, 7.0, -1.0], [5.0, 25.0, 34.0, 46.0, 1.0, 9.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [38.0, 57.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0], [12.0, 57.0, 49.0, 25.0, 9.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0]],
]})

data = df.Route_set.to_numpy()[0]

df = pd.DataFrame.from_dict(dict(zip(['route{}'.format(i) for i in range(1, len(data)+1)], [data[i] for i in range(len(data))])), orient='index').transpose()
df = df.apply(lambda x: x.explode() if 'route' in x.name else x)

df[sorted(df.columns)]
print(df.to_markdown())
|    |   route1 |   route2 |   route3 |   route4 |   route5 |   route6 |   route7 |   route8 |   route9 |   route10 |   route11 |   route12 |   route13 |   route14 |   route15 |   route16 |   route17 |   route18 |   route19 |   route20 |
|---:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|----------:|
| 0 | 20 | 20 | 54 | 57 | 57 | 21 | 9 | 51 | 1 | 25 | 57 | 20 | 14 | 20 | 57 | 5 | 51 | 5 | 38 | 12 |
| 1 | 19 | 51 | 23 | 48 | 16 | 11 | 15 | 25 | 33 | 31 | 12 | 41 | 44 | 51 | 49 | 0 | 32 | 25 | 57 | 57 |
| 2 | 47 | 46 | 5 | 46 | 45 | 6 | 47 | 22 | 24 | 50 | -1 | 47 | 39 | 25 | 5 | 33 | 33 | 34 | -1 | 49 |
| 3 | 56 | 37 | 46 | 35 | 25 | 33 | 42 | 14 | 46 | 17 | -1 | 15 | 25 | -1 | 20 | 55 | 24 | 46 | -1 | 25 |
| 4 | 43 | 2 | 34 | 25 | 49 | 25 | 25 | 39 | 56 | -1 | -1 | 46 | -1 | -1 | 37 | 25 | 35 | 1 | -1 | 9 |
| 5 | 53 | 57 | -1 | 27 | 38 | 49 | -1 | 8 | 30 | -1 | -1 | -1 | -1 | -1 | 46 | 48 | 8 | 9 | -1 | -1 |
| 6 | 18 | 49 | -1 | 52 | 0 | 57 | -1 | 40 | 48 | -1 | -1 | -1 | -1 | -1 | 36 | -1 | 25 | -1 | -1 | -1 |
| 7 | -1 | 36 | -1 | 8 | 46 | 29 | -1 | 0 | 51 | -1 | -1 | -1 | -1 | -1 | 25 | -1 | 4 | -1 | -1 | -1 |
| 8 | -1 | 25 | -1 | 39 | 13 | 12 | -1 | 10 | -1 | -1 | -1 | -1 | -1 | -1 | 39 | -1 | 46 | -1 | -1 | -1 |
| 9 | -1 | 5 | -1 | 22 | 18 | 3 | -1 | 26 | -1 | -1 | -1 | -1 | -1 | -1 | 51 | -1 | 1 | -1 | -1 | -1 |
| 10 | -1 | 4 | -1 | 51 | 19 | -1 | -1 | 32 | -1 | -1 | -1 | -1 | -1 | -1 | 48 | -1 | 7 | -1 | -1 | -1 |
| 11 | -1 | 34 | -1 | 28 | 20 | -1 | -1 | 47 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |

pandas: convert list of lists to dataframe

The apostrophe means that the data is string type in the list, but can be extracted as the first element using my_list[0]. Need to process each list using list comprehension before putting into the dataframe.

There seems some typo (missing coordinates) in the last line of data, so I corrected it by adding 'null'.

import pandas as pd

data = [['1,er,2,Fado de Padd,1\'18"1,H,6,2600,J. Dekker,17 490 €,A. De Wrede,1,6'],
['2,e,7,Elixir Normand,1\'18"2,H,7,2600,S. Schoonhoven,24 755 €,S. Schoonhoven,14'],
['3,e,3,Give You All of Me,1\'18"2,H,5,2600,JF. Van Dooyeweerd,17 600 €,JF. Van Dooyeweerd,10'],
['4,e,4,Gouritch,1\'18"3,H,5,2600,BJ. Crebas,20 700 €,BJ. Crebas,32'],
['5,e,1,Franky du Cap Vert,1\'18"4,H,6,2600,JH. Mieras,15 536 €,N. De Vreede,65'],
['6,e,10,Défi Magik,1\'18"0,H,8,2620,F. Verkaik,44 865 €,AW. Bosscha,6,3'],
['7,e,9,Fleuron,1\'18"2,H,6,2620,M. Brouwer,44 830 €,D. Brouwer,7,3'],
['8,e,8,Dream Gibus,1\'18"6,H,8,2620,R. Ebbinge,33 330 €,Mme A. Lehmann,36'],
['9,e,5,Beau Gaillard,1\'19"5,H,10,2600,A. Bakker,20 140 €,N. De Vreede,44'],
['0,DAI,6,Bikini de Larcy,null,H,10,2600,D. Den Dubbelden,21 834 €,N. Rip,52']]

df = pd.DataFrame([line[0].split(',') for line in data])
print(df)

Output

   0    1   2                   3       4  5   6     7                   8   \
0 1 er 2 Fado de Padd 1'18"1 H 6 2600 J. Dekker
1 2 e 7 Elixir Normand 1'18"2 H 7 2600 S. Schoonhoven
2 3 e 3 Give You All of Me 1'18"2 H 5 2600 JF. Van Dooyeweerd
3 4 e 4 Gouritch 1'18"3 H 5 2600 BJ. Crebas
4 5 e 1 Franky du Cap Vert 1'18"4 H 6 2600 JH. Mieras
5 6 e 10 Défi Magik 1'18"0 H 8 2620 F. Verkaik
6 7 e 9 Fleuron 1'18"2 H 6 2620 M. Brouwer
7 8 e 8 Dream Gibus 1'18"6 H 8 2620 R. Ebbinge
8 9 e 5 Beau Gaillard 1'19"5 H 10 2600 A. Bakker
9 0 DAI 6 Bikini de Larcy null H 10 2600 D. Den Dubbelden

9 10 11 12
0 17 490 € A. De Wrede 1 6
1 24 755 € S. Schoonhoven 14 None
2 17 600 € JF. Van Dooyeweerd 10 None
3 20 700 € BJ. Crebas 32 None
4 15 536 € N. De Vreede 65 None
5 44 865 € AW. Bosscha 6 3
6 44 830 € D. Brouwer 7 3
7 33 330 € Mme A. Lehmann 36 None
8 20 140 € N. De Vreede 44 None
9 21 834 € N. Rip 52 None

Second method with the same output:

df = pd.DataFrame(data)[0].str.split(',', expand=True)

Third method with similar output:

from io import StringIO

stringdata = StringIO('\n'.join([line[0] for line in data]))
df = pd.read_csv(stringdata, sep=',', header=None)

However, please note that the first method (list comprehension) is still the most efficient!

convert list of lists to pandas data frame

You can flatten the list and drop duplicates from your dataframe.

# import toolboxes
import pandas as pd
from itertools import chain

# get data
data = [[{'timestamp': 1648558320942, 'price': 47876.0},
{'timestamp': 1648558320942, 'price': 47876.0}],
[{'timestamp': 1648558321945, 'price': 47881.0},
{'timestamp': 1648558321945, 'price': 47881.0},
{'timestamp': 1648558321945, 'price': 47881.0}],
[{'timestamp': 1648558326768, 'price': 47876.0}]]

# flatten, create df and drop duplicates
a = list(chain.from_iterable(data))
df = pd.DataFrame(a)
df = df.drop_duplicates()

Output:

print(df)
timestamp price
0 1648558320942 47876.0
2 1648558321945 47881.0
5 1648558326768 47876.0

Pandas DataFrame to List of Lists

You could access the underlying array and call its tolist method:

>>> df = pd.DataFrame([[1,2,3],[3,4,5]])
>>> lol = df.values.tolist()
>>> lol
[[1L, 2L, 3L], [3L, 4L, 5L]]

How to convert list of lists into a Pandas dataframe in python

If need only columns pass mylist:

df = pd.DataFrame(mylist,columns=columns)
print (df)

year score_1 score_2 score_3 score_4 score_5
0 2000 0.5 0.3 0.8 0.9 0.8
1 2001 0.5 0.6 0.8 0.9 0.9
2 2002 0.5 0.3 0.8 0.8 0.8
3 2003 0.9 0.9 0.9 0.9 0.8

But if need index by years use dictionary comprehension with DataFrame.from_dict:

df = pd.DataFrame.from_dict({x[0]: x[1:] for x in mylist},columns=columns[1:], orient='index')
print (df)

score_1 score_2 score_3 score_4 score_5
2000 0.5 0.3 0.8 0.9 0.8
2001 0.5 0.6 0.8 0.9 0.9
2002 0.5 0.3 0.8 0.8 0.8
2003 0.9 0.9 0.9 0.9 0.8

And if need set index names add DataFrame.rename_axis:

d = {x[0]: x[1:] for x in mylist}
df = pd.DataFrame.from_dict(d,columns=columns[1:], orient='index').rename_axis(columns[0])
print (df)

score_1 score_2 score_3 score_4 score_5
year
2000 0.5 0.3 0.8 0.9 0.8
2001 0.5 0.6 0.8 0.9 0.9
2002 0.5 0.3 0.8 0.8 0.8
2003 0.9 0.9 0.9 0.9 0.8

How do I extract a list of lists from a Pandas DataFrame?

Try:

print(df.groupby("Person").agg(list)["Movies"].to_list())

Prints:

[['ET'], ['Apollo 13', '12 Angry Men'], ['Citizen Kane']]


Related Topics



Leave a reply



Submit