Pandas DataFrame to List of Lists

You could access the underlying array and call its tolist method:

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, 3], [3, 4, 5]])
>>> lol = df.values.tolist()
>>> lol
[[1, 2, 3], [3, 4, 5]]

How to convert a pandas DataFrame to a list of lists?

Loop over the columns of your DataFrame, select each column, and convert it to a list. Note that this gives one inner list per column, not per row:

lst = [df[i].tolist() for i in df.columns]

Example:

df = pd.DataFrame({'a': [1, 2, 3, 4],
                   'b': [5, 6, 7, 8]})

print(df)
print('list', [df[i].tolist() for i in df.columns])

Output:

   a  b
0  1  5
1  2  6
2  3  7
3  4  8
list [[1, 2, 3, 4], [5, 6, 7, 8]]
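
Since the comprehension above works column by column while df.values.tolist() works row by row, it may help to see the two side by side. The following is a minimal sketch on the same example frame; the variable names by_column, by_row and by_column_via_transpose are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4],
                   'b': [5, 6, 7, 8]})

# One inner list per column (what the comprehension above produces).
by_column = [df[col].tolist() for col in df.columns]

# One inner list per row.
by_row = df.values.tolist()

# Transposing first turns the row-wise call into a column-wise one.
by_column_via_transpose = df.T.values.tolist()

print(by_column)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
print(by_row)     # [[1, 5], [2, 6], [3, 7], [4, 8]]
```

Which orientation you want depends on whether downstream code expects records (rows) or series (columns).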

pandas: convert list of lists to dataframe

The quotes show that each inner list holds a single string, which you can extract as the first element with my_list[0]. Each of those strings needs to be split (for example in a list comprehension) before the data goes into the DataFrame.

There seems to be a typo (a missing field) in the last row of the data, so I corrected it by adding 'null'.

import pandas as pd

data = [['1,er,2,Fado de Padd,1\'18"1,H,6,2600,J. Dekker,17 490 €,A. De Wrede,1,6'],
['2,e,7,Elixir Normand,1\'18"2,H,7,2600,S. Schoonhoven,24 755 €,S. Schoonhoven,14'],
['3,e,3,Give You All of Me,1\'18"2,H,5,2600,JF. Van Dooyeweerd,17 600 €,JF. Van Dooyeweerd,10'],
['4,e,4,Gouritch,1\'18"3,H,5,2600,BJ. Crebas,20 700 €,BJ. Crebas,32'],
['5,e,1,Franky du Cap Vert,1\'18"4,H,6,2600,JH. Mieras,15 536 €,N. De Vreede,65'],
['6,e,10,Défi Magik,1\'18"0,H,8,2620,F. Verkaik,44 865 €,AW. Bosscha,6,3'],
['7,e,9,Fleuron,1\'18"2,H,6,2620,M. Brouwer,44 830 €,D. Brouwer,7,3'],
['8,e,8,Dream Gibus,1\'18"6,H,8,2620,R. Ebbinge,33 330 €,Mme A. Lehmann,36'],
['9,e,5,Beau Gaillard,1\'19"5,H,10,2600,A. Bakker,20 140 €,N. De Vreede,44'],
['0,DAI,6,Bikini de Larcy,null,H,10,2600,D. Den Dubbelden,21 834 €,N. Rip,52']]

df = pd.DataFrame([line[0].split(',') for line in data])
print(df)

Output

   0    1   2                   3       4  5   6     7                   8   \
0  1   er   2        Fado de Padd  1'18"1  H   6  2600           J. Dekker
1  2    e   7      Elixir Normand  1'18"2  H   7  2600      S. Schoonhoven
2  3    e   3  Give You All of Me  1'18"2  H   5  2600  JF. Van Dooyeweerd
3  4    e   4            Gouritch  1'18"3  H   5  2600          BJ. Crebas
4  5    e   1  Franky du Cap Vert  1'18"4  H   6  2600          JH. Mieras
5  6    e  10          Défi Magik  1'18"0  H   8  2620          F. Verkaik
6  7    e   9             Fleuron  1'18"2  H   6  2620          M. Brouwer
7  8    e   8         Dream Gibus  1'18"6  H   8  2620          R. Ebbinge
8  9    e   5       Beau Gaillard  1'19"5  H  10  2600           A. Bakker
9  0  DAI   6     Bikini de Larcy    null  H  10  2600    D. Den Dubbelden

          9                  10  11    12
0  17 490 €         A. De Wrede   1     6
1  24 755 €      S. Schoonhoven  14  None
2  17 600 €  JF. Van Dooyeweerd  10  None
3  20 700 €          BJ. Crebas  32  None
4  15 536 €        N. De Vreede  65  None
5  44 865 €         AW. Bosscha   6     3
6  44 830 €          D. Brouwer   7     3
7  33 330 €      Mme A. Lehmann  36  None
8  20 140 €        N. De Vreede  44  None
9  21 834 €              N. Rip  52  None

Second method with the same output:

df = pd.DataFrame(data)[0].str.split(',', expand=True)

Third method, with similar output (read_csv infers numeric dtypes instead of leaving every cell as a string):

from io import StringIO

stringdata = StringIO('\n'.join([line[0] for line in data]))
df = pd.read_csv(stringdata, sep=',', header=None)

However, please note that the first method (the list comprehension) is likely still the most efficient of the three.
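
To illustrate how the parsed result of read_csv differs from a manual split, here is a small sketch on a shortened, hypothetical stand-in for the race data (the two rows below are invented, and the names df_split and df_csv are assumptions for this example only).

```python
from io import StringIO

import pandas as pd

# Hypothetical, shortened stand-in for the race data above.
data = [['1,er,2600'],
        ['2,e,2620']]

# Manual split: every cell stays a string.
df_split = pd.DataFrame([line[0].split(',') for line in data])

# read_csv parses the joined text and infers numeric columns.
df_csv = pd.read_csv(StringIO('\n'.join(line[0] for line in data)),
                     sep=',', header=None)

print(df_split[0].tolist())  # ['1', '2']
print(df_csv[0].tolist())    # [1, 2]
```

If you need numbers later on, the manual split leaves the conversion to you, while read_csv does it up front.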

How do I extract a list of lists from a Pandas DataFrame?

Try:

print(df.groupby("Person").agg(list)["Movies"].to_list())

Prints:

[['ET'], ['Apollo 13', '12 Angry Men'], ['Citizen Kane']]
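
The answer above does not show the input frame. A minimal sketch with hypothetical data that reproduces the printed result could look like this; the person names are invented, and the column is selected before aggregating, which should give the same lists.

```python
import pandas as pd

# Hypothetical data; the names in "Person" are invented for illustration.
df = pd.DataFrame({
    "Person": ["Alice", "Bob", "Bob", "Carol"],
    "Movies": ["ET", "Apollo 13", "12 Angry Men", "Citizen Kane"],
})

# Group by person and collect each person's movies into a list.
# groupby sorts the group keys by default, so the outer order follows "Person".
movies_per_person = df.groupby("Person")["Movies"].agg(list).to_list()

print(movies_per_person)
# [['ET'], ['Apollo 13', '12 Angry Men'], ['Citizen Kane']]
```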

Convert pandas df to list of lists with varying length

A groupby object exposes a dict-like groups attribute; you can use it to avoid agg and speed things up further:

In [229]: [v.tolist() for v in df.set_index('1').groupby('0').groups.values()]
Out[229]: [[4.3, 3.2, 2.1], [9.1, 2.0], [2.8, 1.7, 0.8, 0.2]]
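
For readers who want to try this, here is a sketch with hypothetical input chosen to mirror the output above. The group keys 'a', 'b' and 'c' are assumptions; only the column names '0' and '1' come from the snippet.

```python
import pandas as pd

# Hypothetical data chosen to mirror the output shown above. The column
# names '0' and '1' are strings, as in the original snippet.
df = pd.DataFrame({
    '0': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c'],
    '1': [4.3, 3.2, 2.1, 9.1, 2.0, 2.8, 1.7, 0.8, 0.2],
})

# .groups maps each group key to the index labels of its rows. After
# set_index('1') those labels are exactly the values of column '1'.
via_groups = [v.tolist() for v in df.set_index('1').groupby('0').groups.values()]

# The more conventional route, for comparison.
via_agg = df.groupby('0')['1'].agg(list).tolist()

print(via_groups)  # [[4.3, 3.2, 2.1], [9.1, 2.0], [2.8, 1.7, 0.8, 0.2]]
```

The trick relies on moving the value column into the index first, so it only fits cases where a single column is being collected.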

Timing on 90K rows

df = pd.concat([df] * 10000)

%timeit [v.tolist() for v in df.set_index('1').groupby('0').groups.values()]
15.2 ms ± 425 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.groupby('0')['1'].agg(list).tolist()
32.8 ms ± 623 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [236]: %%timeit
     ...: d_tuples = [*list(zip(df['0'], df['1']))]
     ...: keys = df['0'].unique()
     ...: list_of_lists = []
     ...: for key in keys:
     ...:     list_of_lists += [[tup[1] for tup in d_tuples if tup[0] == key]]
     ...:
69.4 ms ± 754 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Comparing two lists of lists with a DataFrame column in Python

Answer

result = []
for l1, l2 in zip(list1, list2):
    res = df.loc[df["rid"].isin(l1) & df["pid"].isin(l2)]["value"].tolist()
    result.append(res)
print(result)
[['chocolate', 'milk'], ['bread']]

Explanation

  • zip pairs up the two lists element by element, which is equivalent to
    for i in range(len(list1)):
        l1 = list1[i]
        l2 = list2[i]
  • df["rid"].isin(l1) & df["pid"].isin(l2) combines the two conditions with the element-wise AND operator &

Attention

  • list1 and list2 must have the same length; otherwise zip silently ignores the remaining elements of the longer list.
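
The inputs are not shown in the answer, so the following sketch uses hypothetical values for df, list1 and list2, picked so that the result matches the output above. It also demonstrates the truncation behaviour of zip.

```python
import pandas as pd

# Hypothetical data, chosen so the result matches the output above.
df = pd.DataFrame({
    "rid": [1, 2, 3, 4],
    "pid": [10, 20, 30, 40],
    "value": ["chocolate", "milk", "bread", "eggs"],
})
list1 = [[1, 2], [3]]
list2 = [[10, 20], [30]]

result = []
for l1, l2 in zip(list1, list2):
    # Keep rows whose rid is in l1 AND whose pid is in l2.
    mask = df["rid"].isin(l1) & df["pid"].isin(l2)
    result.append(df.loc[mask, "value"].tolist())

print(result)  # [['chocolate', 'milk'], ['bread']]

# zip stops at the shorter input, so an unmatched trailing sublist is dropped.
pairs = list(zip(list1 + [[4]], list2))
print(len(pairs))  # 2
```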

Compare each element in a list of lists with a column in a DataFrame in Python

Build a dict:

d = df.set_index('rid').to_dict()['pid']

And use it to build the Dataframe:

pd.DataFrame(((x, [d[el] for el in x]) for x in groups_rids), columns=['groups_rid', 'pid'])
         groups_rid            pid
0        [AX1, AX2]       [P2, P0]
1  [AX6, AX5, AX17]  [P3, P9, P13]
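
A self-contained version of the same idea is sketched below. The contents of df and groups_rids are hypothetical, reverse-engineered from the output above, and the dict is built from the single pid column rather than from the whole frame.

```python
import pandas as pd

# Hypothetical lookup table and groups, inferred from the output above.
df = pd.DataFrame({
    "rid": ["AX1", "AX2", "AX5", "AX6", "AX17"],
    "pid": ["P2", "P0", "P9", "P3", "P13"],
})
groups_rids = [["AX1", "AX2"], ["AX6", "AX5", "AX17"]]

# Map each rid to its pid.
d = df.set_index("rid")["pid"].to_dict()

# Translate every rid in every group through the dict.
out = pd.DataFrame(
    [(group, [d[rid] for rid in group]) for group in groups_rids],
    columns=["groups_rid", "pid"],
)
print(out)
```

Note that d[rid] raises a KeyError for an rid that is missing from df; d.get(rid) would return None instead, if that suits your data better.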

Filtering a pandas DataFrame based on a list of lists

Use tuples on both sides of the comparison: convert each value in the column to a tuple, and convert the lists to tuples as well:

t = [tuple(x) for x in slist]
df = df[df['path'].apply(lambda x: tuple(eval(str(x).lower()))).isin(t)]

Or, without eval, if the column already holds lists:

df = df[df['path'].apply(lambda x: tuple([y.lower() for y in x])).isin(t)]


print (df)
    id                                           path
1  102  [Activities (DEV), public, behavior_trackers]
2  103    [Activities (DEV), public, journal_entries]
4  105          [pg-prd (DEV-RR), public, activities]
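
Here is a runnable sketch of the eval-free variant. The frame and slist are hypothetical: rows 102, 103 and 105 follow the output above, the other two rows are invented, and slist is assumed to be lowercase already.

```python
import pandas as pd

# Hypothetical data; rows 102, 103 and 105 follow the output above,
# the remaining rows are invented so that the filter has something to drop.
df = pd.DataFrame({
    "id": [101, 102, 103, 104, 105],
    "path": [
        ["Activities (DEV)", "public", "users"],
        ["Activities (DEV)", "public", "behavior_trackers"],
        ["Activities (DEV)", "public", "journal_entries"],
        ["pg-prd (DEV-RR)", "public", "users"],
        ["pg-prd (DEV-RR)", "public", "activities"],
    ],
})
slist = [
    ["activities (dev)", "public", "behavior_trackers"],
    ["activities (dev)", "public", "journal_entries"],
    ["pg-prd (dev-rr)", "public", "activities"],
]

# Lists are unhashable, so compare tuples on both sides.
t = [tuple(x) for x in slist]
keys = df["path"].apply(lambda parts: tuple(p.lower() for p in parts))
filtered = df[keys.isin(t)]

print(filtered["id"].tolist())  # [102, 103, 105]
```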

Create a DataFrame from list in lists (Pandas)

You could fix this with a for loop:

overly_nested = [[['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 559.64, 8.01, 0.5520765512479038]],
[['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 520.34, 7.44, 0.5393857093988743]],
[['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 556.72, 7.96, 0.5410827096899603]],
[['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 688.67, 9.84, 0.5845350761787548]],
[['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 625.3, 8.94, 0.5612954767824924]]]

# Replace each one-element wrapper list with the row it contains.
for i, sub_list in enumerate(overly_nested):
    overly_nested[i] = sub_list[0]
df = pd.DataFrame(overly_nested)
print(df)

I'm sure there's a way to do this with zip(); let me experiment and I'll edit if I find it.
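
As an alternative that leaves the input list untouched, a list comprehension or itertools.chain can do the unwrapping. This is not the zip() approach mentioned above, just a sketch of another option; the data is a shortened, hypothetical version of the rows in the question, with 'A' standing in for the long label.

```python
from itertools import chain

import pandas as pd

# Hypothetical, shortened version of the data above.
overly_nested = [[['A', 559.64, 8.01]],
                 [['A', 520.34, 7.44]],
                 [['A', 556.72, 7.96]]]

# Pull the single inner row out of each wrapper list.
rows = [sub_list[0] for sub_list in overly_nested]

# chain.from_iterable removes one level of nesting and gives the same rows.
rows_via_chain = list(chain.from_iterable(overly_nested))

df = pd.DataFrame(rows)
print(df.shape)  # (3, 3)
```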


