Pandas Dataframe to List of Lists

You could access the underlying array and call its tolist method:

>>> df = pd.DataFrame([[1,2,3],[3,4,5]])
>>> lol = df.values.tolist()
>>> lol
[[1, 2, 3], [3, 4, 5]]
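
On recent pandas versions the same row-wise conversion can also be written with to_numpy(), which the pandas documentation recommends over .values. A minimal sketch, assuming pandas 0.24 or later and Python 3 (where the integers print without an L suffix):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [3, 4, 5]])

# One inner list per row; to_numpy() is the documented replacement for .values
rows = df.to_numpy().tolist()
print(rows)
```

Both spellings return plain Python lists, so the result can be passed to code that knows nothing about pandas or NumPy.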

How to convert a Python Dataframe to List of Lists?

Loop over the columns of your DataFrame, select each one, and convert it to a list. Note that this gives one inner list per column, not one per row:

lst = [df[i].tolist() for i in df.columns]

Example:

df = pd.DataFrame({'a': [1, 2, 3, 4],
                   'b': [5, 6, 7, 8]})

print(df)
print('list', [df[i].tolist() for i in df.columns])

Output:

   a  b
0  1  5
1  2  6
2  3  7
3  4  8
list [[1, 2, 3, 4], [5, 6, 7, 8]]
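
The comprehension in this answer yields one inner list per column. If one list per row is wanted instead, the column lists can be transposed with zip. A small sketch using the same example frame (the names by_column and by_row are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4],
                   'b': [5, 6, 7, 8]})

# One inner list per column, as in the answer
by_column = [df[c].tolist() for c in df.columns]

# zip(*...) transposes the column lists into rows
by_row = [list(row) for row in zip(*by_column)]

print(by_column)
print(by_row)
```

The row-wise result matches what df.values.tolist() returns for this frame.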

pandas: convert list of lists to dataframe

The quotes show that each inner list holds a single string, which you can get as its first element with my_list[0]. Split each of those strings with a list comprehension before putting the data into the DataFrame.

There seems to be a typo (a missing value) in the last line of the data, so I corrected it by adding 'null'.

import pandas as pd

data = [['1,er,2,Fado de Padd,1\'18"1,H,6,2600,J. Dekker,17 490 €,A. De Wrede,1,6'],
 ['2,e,7,Elixir Normand,1\'18"2,H,7,2600,S. Schoonhoven,24 755 €,S. Schoonhoven,14'],
 ['3,e,3,Give You All of Me,1\'18"2,H,5,2600,JF. Van Dooyeweerd,17 600 €,JF. Van Dooyeweerd,10'],
 ['4,e,4,Gouritch,1\'18"3,H,5,2600,BJ. Crebas,20 700 €,BJ. Crebas,32'],
 ['5,e,1,Franky du Cap Vert,1\'18"4,H,6,2600,JH. Mieras,15 536 €,N. De Vreede,65'],
 ['6,e,10,Défi Magik,1\'18"0,H,8,2620,F. Verkaik,44 865 €,AW. Bosscha,6,3'],
 ['7,e,9,Fleuron,1\'18"2,H,6,2620,M. Brouwer,44 830 €,D. Brouwer,7,3'],
 ['8,e,8,Dream Gibus,1\'18"6,H,8,2620,R. Ebbinge,33 330 €,Mme A. Lehmann,36'],
 ['9,e,5,Beau Gaillard,1\'19"5,H,10,2600,A. Bakker,20 140 €,N. De Vreede,44'],
 ['0,DAI,6,Bikini de Larcy,null,H,10,2600,D. Den Dubbelden,21 834 €,N. Rip,52']]

df = pd.DataFrame([line[0].split(',') for line in data])
print(df)

Output:

   0    1   2                   3       4  5   6     7                   8   \
0  1   er   2        Fado de Padd  1'18"1  H   6  2600           J. Dekker   
1  2    e   7      Elixir Normand  1'18"2  H   7  2600      S. Schoonhoven   
2  3    e   3  Give You All of Me  1'18"2  H   5  2600  JF. Van Dooyeweerd   
3  4    e   4            Gouritch  1'18"3  H   5  2600          BJ. Crebas   
4  5    e   1  Franky du Cap Vert  1'18"4  H   6  2600          JH. Mieras   
5  6    e  10          Défi Magik  1'18"0  H   8  2620          F. Verkaik   
6  7    e   9             Fleuron  1'18"2  H   6  2620          M. Brouwer   
7  8    e   8         Dream Gibus  1'18"6  H   8  2620          R. Ebbinge   
8  9    e   5       Beau Gaillard  1'19"5  H  10  2600           A. Bakker   
9  0  DAI   6     Bikini de Larcy    null  H  10  2600    D. Den Dubbelden   

          9                  10  11    12  
0  17 490 €         A. De Wrede   1     6  
1  24 755 €      S. Schoonhoven  14  None  
2  17 600 €  JF. Van Dooyeweerd  10  None  
3  20 700 €          BJ. Crebas  32  None  
4  15 536 €        N. De Vreede  65  None  
5  44 865 €         AW. Bosscha   6     3  
6  44 830 €          D. Brouwer   7     3  
7  33 330 €      Mme A. Lehmann  36  None  
8  20 140 €        N. De Vreede  44  None  
9  21 834 €              N. Rip  52  None

Second method with the same output:

df = pd.DataFrame(data)[0].str.split(',', expand=True)

Third method with similar output:

from io import StringIO

stringdata = StringIO('\n'.join([line[0] for line in data]))
df = pd.read_csv(stringdata, sep=',', header=None)

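Note, however, that the first method (the list comprehension) is likely still the most efficient of the three, since it works on plain Python strings and skips both the string accessor and the CSV parser.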
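
Rows that split into fewer fields than the longest row are padded with a missing value, which is why columns 11 and 12 contain None in the output above. A cut-down sketch with two made-up rows (the values are illustrative, not taken from the data above) showing that the first two methods produce the same shape:

```python
import pandas as pd

# Hypothetical, shortened rows in the same one-string-per-list shape
data = [['1,er,Fado,6'],
        ['2,e,Elixir']]

# Method 1: split each string, then build the frame
df1 = pd.DataFrame([line[0].split(',') for line in data])

# Method 2: build a one-column frame, then split with the string accessor
df2 = pd.DataFrame(data)[0].str.split(',', expand=True)

print(df1)
print(df1.shape)  # the short row is padded to the width of the longest row
```

Every value is still a string at this point, so numeric columns would need an explicit conversion afterwards.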

How do I extract a list of lists from a Pandas DataFrame?

Try:

print(df.groupby("Person").agg(list)["Movies"].to_list())

Prints:

[['ET'], ['Apollo 13', '12 Angry Men'], ['Citizen Kane']]
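
The DataFrame itself is not shown in this answer. A self-contained sketch, assuming a frame with one row per (Person, Movies) pair; the person names are made up, and only the column labels come from the snippet above:

```python
import pandas as pd

# Hypothetical data: names are invented for illustration
df = pd.DataFrame({
    "Person": ["Bob", "Alice", "Bob", "Carol"],
    "Movies": ["Apollo 13", "ET", "12 Angry Men", "Citizen Kane"],
})

# groupby sorts the group keys, so the lists come out in Alice, Bob, Carol order
movies = df.groupby("Person")["Movies"].agg(list).to_list()
print(movies)
```

Selecting the Movies column before calling agg is a slight variant of the one-liner above; it avoids aggregating columns that are not needed.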

Convert pandas df to list of lists with varying length

The .groups attribute of a groupby object is a dict that maps each group key to the index labels in that group. You can use it directly and skip agg, which is faster:

In [229]: [v.tolist() for v in df.set_index('1').groupby('0').groups.values()]
Out[229]: [[4.3, 3.2, 2.1], [9.1, 2.0], [2.8, 1.7, 0.8, 0.2]]
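
The input frame is not shown here. A sketch that reproduces the call, assuming a frame with string column labels '0' (the grouping key) and '1' (the values); the key letters are made up:

```python
import pandas as pd

# Hypothetical 9-row frame: '0' is the key column, '1' holds the values
df = pd.DataFrame({
    '0': ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c'],
    '1': [4.3, 3.2, 2.1, 9.1, 2.0, 2.8, 1.7, 0.8, 0.2],
})

# After set_index('1') the index labels are the values themselves,
# so each entry of .groups holds the values that belong to one key
groups = df.set_index('1').groupby('0').groups
result = [v.tolist() for v in groups.values()]
print(result)
```

The trick relies on moving the values into the index first; without set_index, .groups would hold the original row labels instead.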

Timing on 90K rows

df = pd.concat([df] * 10000)

%timeit [v.tolist() for v in df.set_index('1').groupby('0').groups.values()]
15.2 ms ± 425 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit df.groupby('0')['1'].agg(list).tolist()
32.8 ms ± 623 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [236]: %%timeit
     ...: d_tuples = [*list(zip(df['0'],df['1']))]
     ...: keys = df['0'].unique()
     ...: list_of_lists = []
     ...: for key in keys:
     ...:     list_of_lists+=[[tup[1] for tup in d_tuples if tup[0] == key]]
     ...:
69.4 ms ± 754 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

comparing two list of lists with a dataframe column python

Answer

result = []
for l1, l2 in zip(list1, list2):
    res = df.loc[df["rid"].isin(l1) & df["pid"].isin(l2)]["value"].tolist()
    result.append(res)

[['chocolate', 'milk'], ['bread']]

Explanation

zip pairs up the elements of the two lists, so the loop is equivalent to:

for i in range(len(list1)):
    l1 = list1[i]
    l2 = list2[i]

df["rid"].isin(l1) & df["pid"].isin(l2) combines the two conditions with the element-wise AND operator &.

Attention

The lengths of list1 and list2 must be equal; otherwise, zip ignores the remaining elements of the longer list.
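
A runnable sketch of the whole answer, assuming made-up data (only the column names rid, pid and value come from the answer) and adding an explicit length check so that zip cannot silently drop elements:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "rid":   [1, 2, 3, 4],
    "pid":   [10, 20, 30, 40],
    "value": ["chocolate", "milk", "bread", "eggs"],
})
list1 = [[1, 2], [3]]
list2 = [[10, 20], [30]]

# Guard against zip silently dropping the tail of the longer list
assert len(list1) == len(list2), "list1 and list2 must have the same length"

result = []
for l1, l2 in zip(list1, list2):
    # A row is kept only if its rid is in l1 AND its pid is in l2
    mask = df["rid"].isin(l1) & df["pid"].isin(l2)
    result.append(df.loc[mask, "value"].tolist())

print(result)
```

Using df.loc[mask, "value"] selects rows and column in one step, which is equivalent here to the chained indexing in the answer.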

Compare each element in list of lists with a column in a dataframe python

Build a dict:

d = df.set_index('rid').to_dict()['pid']

And use it to build the DataFrame:

pd.DataFrame(((x, [d[el] for el in x]) for x in groups_rids), columns=['groups_rid', 'pid'])

         groups_rid            pid
0        [AX1, AX2]       [P2, P0]
1  [AX6, AX5, AX17]  [P3, P9, P13]
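
Neither df nor groups_rids is shown in this answer. A sketch with a made-up lookup table whose rid/pid pairs mirror the output above, assuming every rid is unique and every id in groups_rids is present in the table:

```python
import pandas as pd

# Hypothetical lookup table
df = pd.DataFrame({
    'rid': ['AX1', 'AX2', 'AX5', 'AX6', 'AX17'],
    'pid': ['P2', 'P0', 'P9', 'P3', 'P13'],
})
groups_rids = [['AX1', 'AX2'], ['AX6', 'AX5', 'AX17']]

# Map each rid to its pid (assumes rid values are unique)
d = df.set_index('rid')['pid'].to_dict()

# One output row per group: the group itself and the looked-up pids
out = pd.DataFrame(
    ((x, [d[el] for el in x]) for x in groups_rids),
    columns=['groups_rid', 'pid'],
)
print(out)
```

A rid that is missing from the table raises KeyError with d[el]; d.get(el) would return None instead if that is preferable.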

Filtering a pandas dataframe based of list of lists

Lists are not hashable, so convert both sides to tuples before filtering: the values in the column and the lists in slist:

t = [tuple(x) for x in slist]
df = df[df['path'].apply(lambda x: tuple(eval(str(x).lower()))).isin(t)]

Or, without eval:

df = df[df['path'].apply(lambda x: tuple([y.lower() for y in x])).isin(t)]

print(df)
    id                                           path
1  102  [Activities (DEV), public, behavior_trackers]
2  103    [Activities (DEV), public, journal_entries]
4  105          [pg-prd (DEV-RR), public, activities]
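
A self-contained sketch of the tuple-based filter, assuming the path column holds real Python lists and slist contains lower-cased lists to keep; the rows are made up and shortened:

```python
import pandas as pd

# Hypothetical data; the path column holds real Python lists
df = pd.DataFrame({
    'id': [101, 102, 103],
    'path': [['Sales (DEV)', 'public', 'orders'],
             ['Activities (DEV)', 'public', 'behavior_trackers'],
             ['Activities (DEV)', 'public', 'journal_entries']],
})
# Lists to keep, already lower-cased
slist = [['activities (dev)', 'public', 'behavior_trackers'],
         ['activities (dev)', 'public', 'journal_entries']]

# Compare tuples on both sides, lower-casing the column values first
wanted = [tuple(x) for x in slist]
keys = df['path'].apply(lambda p: tuple(s.lower() for s in p))
filtered = df[keys.isin(wanted)]
print(filtered)
```

The original path values keep their capitalization in the result, because only the temporary keys Series is lower-cased.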

Create a DataFrame from list in lists (Pandas)

You could fix this with a for loop:

overly_nested = [[['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 559.64, 8.01, 0.5520765512479038]],
[['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 520.34, 7.44, 0.5393857093988743]],
[['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 556.72, 7.96, 0.5410827096899603]],
[['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 688.67, 9.84, 0.5845350761787548]],
[['TOTAL DAS DESPESAS DE CUSTEIO DA LAVOURA (A)', 625.3, 8.94, 0.5612954767824924]]]

for i, sub_list in enumerate(overly_nested):
    overly_nested[i]=sub_list[0]
df = pd.DataFrame(overly_nested)
print(df)

I'm sure there's a way to do this with zip(); let me experiment and I'll edit if I find it.
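
A list comprehension is one way to do the same unwrapping without mutating the input. This is a sketch rather than the zip() version the answer was looking for, and it uses a shortened stand-in for overly_nested:

```python
import pandas as pd

# Shortened stand-in: each row is wrapped in an extra single-element list
overly_nested = [[['TOTAL (A)', 559.64, 8.01]],
                 [['TOTAL (A)', 520.34, 7.44]]]

# Take the single inner row out of each wrapper
rows = [sub_list[0] for sub_list in overly_nested]

df = pd.DataFrame(rows)
print(df)
```

This assumes every wrapper holds exactly one row; wrappers with several rows would need itertools.chain.from_iterable(overly_nested) instead.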
