Get List from Pandas Dataframe Column or Row

Get list from pandas dataframe column or row?

A pandas DataFrame column is a pandas Series when you pull it out, so you can call x.tolist() on it to turn it into a Python list. Alternatively, you can cast it with list(x).

import pandas as pd

data_dict = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
             'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(data_dict)

print(f"DataFrame:\n{df}\n")
print(f"column types:\n{df.dtypes}")

col_one_list = df['one'].tolist()

col_one_arr = df['one'].to_numpy()

print(f"\ncol_one_list:\n{col_one_list}\ntype:{type(col_one_list)}")
print(f"\ncol_one_arr:\n{col_one_arr}\ntype:{type(col_one_arr)}")

Output:

DataFrame:
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4

column types:
one    float64
two      int64
dtype: object

col_one_list:
[1.0, 2.0, 3.0, nan]
type:<class 'list'>

col_one_arr:
[ 1.  2.  3. nan]
type:<class 'numpy.ndarray'>
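The question also asks about rows, which the example above does not cover. A minimal sketch, assuming a small all-integer frame invented for illustration (the names `df`, `col_list`, `col_list_alt` and `row_list` are not from the answer): selecting a row with .loc also returns a Series, so the same tolist() call should apply.

```python
import pandas as pd

# Hypothetical data, chosen only to keep the example short.
df = pd.DataFrame({"one": [1, 2, 3], "two": [4, 5, 6]}, index=["a", "b", "c"])

# Column -> list, using the two spellings mentioned above.
col_list = df["one"].tolist()
col_list_alt = list(df["one"])

# Row -> list: .loc with a single label returns a Series as well.
row_list = df.loc["a"].tolist()

print(col_list, col_list_alt, row_list)
```

In a frame with mixed dtypes the row Series is upcast to a common dtype, so the values in a row list may come back as floats or generic objects rather than the original column types.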

Pandas column tolist() while each row data being list of strings?

Your problem lies with the saving method. A CSV file cannot natively store lists: each list is written out as a string, so you have to parse it back yourself after reading.

Would it be possible for you to save time and effort by saving in another format instead? JSON natively supports lists and is also a format that humans can read easily.

Here is an obligatory snippet for you:

import pandas as pd
df = pd.DataFrame([{"sentence":['aa', 'bb', 'cc']},{"sentence":['dd', 'ee', 'ff']}])

df.to_json("myfile.json")
df2 = pd.read_json("myfile.json")

Giving the following result:

>>> df2
       sentence
0  [aa, bb, cc]
1  [dd, ee, ff]
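If the data has to stay in CSV, the "parse after reading" route mentioned above could look roughly like this. It is only a sketch: the in-memory buffer stands in for a real file, and using ast.literal_eval as a converter is one possible choice, not something the answer prescribes.

```python
import ast
import io

import pandas as pd

df = pd.DataFrame({"sentence": [["aa", "bb", "cc"], ["dd", "ee", "ff"]]})

# Write to an in-memory buffer instead of a file on disk (illustration only).
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# A plain read gives back strings such as "['aa', 'bb', 'cc']".
buffer.seek(0)
raw = pd.read_csv(buffer)

# A converter parses each cell of the column back into a Python list.
buffer.seek(0)
parsed = pd.read_csv(buffer, converters={"sentence": ast.literal_eval})

print(type(raw["sentence"].iloc[0]), type(parsed["sentence"].iloc[0]))
```

ast.literal_eval only evaluates Python literals, which makes it a safer choice than eval for data read from a file.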

Pandas column of lists, create a row for each list element

UPDATE: the solution below was helpful for older pandas versions, where DataFrame.explode() wasn't available. Starting from pandas 0.25.0 you can simply use DataFrame.explode().



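import numpy as np
import pandas as pd

# df is assumed to hold scalar columns (subject, trial_num) plus one list column
lst_col = 'samples'

r = pd.DataFrame({
      col: np.repeat(df[col].values, df[lst_col].str.len())
      for col in df.columns.drop(lst_col)}
    ).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]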

Result:

In [103]: r
Out[103]:
    samples  subject  trial_num
0      0.10        1          1
1     -0.20        1          1
2      0.05        1          1
3      0.25        1          2
4      1.32        1          2
5     -0.17        1          2
6      0.64        1          3
7     -0.22        1          3
8     -0.71        1          3
9     -0.03        2          1
10    -0.65        2          1
11     0.76        2          1
12     1.77        2          2
13     0.89        2          2
14     0.65        2          2
15    -0.98        2          3
16     0.65        2          3
17    -0.30        2          3

PS here you may find a bit more generic solution


UPDATE: some explanations: IMO the easiest way to understand this code is to try to execute it step-by-step:

In the following line we repeat the values in one column N times, where N is the length of the corresponding list:

In [10]: np.repeat(df['trial_num'].values, df[lst_col].str.len())
Out[10]: array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int64)

This can be generalized to all columns containing scalar values:

In [11]: pd.DataFrame({
    ...:     col:np.repeat(df[col].values, df[lst_col].str.len())
    ...:     for col in df.columns.drop(lst_col)}
    ...: )
Out[11]:
    trial_num  subject
0           1        1
1           1        1
2           1        1
3           2        1
4           2        1
5           2        1
6           3        1
..        ...      ...
11          1        2
12          2        2
13          2        2
14          2        2
15          3        2
16          3        2
17          3        2

[18 rows x 2 columns]

Using np.concatenate() we can flatten all values in the list column (samples) and get a 1D vector:

In [12]: np.concatenate(df[lst_col].values)
Out[12]: array([-1.04, -0.58, -1.32, 0.82, -0.59, -0.34, 0.25, 2.09, 0.12, 0.83, -0.88, 0.68, 0.55, -0.56, 0.65, -0.04, 0.36, -0.31])

Putting all this together:

In [13]: pd.DataFrame({
    ...:     col:np.repeat(df[col].values, df[lst_col].str.len())
    ...:     for col in df.columns.drop(lst_col)}
    ...: ).assign(**{lst_col:np.concatenate(df[lst_col].values)})
Out[13]:
    trial_num  subject  samples
0           1        1    -1.04
1           1        1    -0.58
2           1        1    -1.32
3           2        1     0.82
4           2        1    -0.59
5           2        1    -0.34
6           3        1     0.25
..        ...      ...      ...
11          1        2     0.68
12          2        2     0.55
13          2        2    -0.56
14          2        2     0.65
15          3        2    -0.04
16          3        2     0.36
17          3        2    -0.31

[18 rows x 3 columns]

Indexing the result with [df.columns], as in pd.DataFrame(...)[df.columns], guarantees that the columns are selected in their original order.
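For pandas 0.25.0 or later, the DataFrame.explode() route mentioned in the update could be sketched as follows. The frame here is a shortened, made-up stand-in for the data in the answer (two rows, three samples each), so the numbers are illustrative only.

```python
import pandas as pd

# Hypothetical, shortened version of the frame used above.
df = pd.DataFrame({
    "subject": [1, 2],
    "trial_num": [1, 1],
    "samples": [[0.10, -0.20, 0.05], [-0.03, -0.65, 0.76]],
})

# explode() emits one row per list element and repeats the scalar columns.
# The original index labels are repeated too, so reset them afterwards.
exploded = df.explode("samples").reset_index(drop=True)

print(exploded)
```

The exploded column keeps the object dtype, so a follow-up such as exploded["samples"].astype(float) may be needed before doing numeric work on it.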

Python pandas dataframe to list by column instead of row

Just use the transpose (.T) attribute:

lst=df.T.values.tolist()

Or use the transpose() method:

lst=df.transpose().values.tolist()

If you print lst you will get:

[['Apple', 'Orange', 'Kiwi', 'Mango'], [220, 200, 1000, 800], ['a', 'o', 'k', 'm']]
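A self-contained sketch of the same idea, with hypothetical column names (`fruit`, `price`, `code`) since the original frame is not shown; the values mirror the printed result. It also shows a per-column comprehension as a possible alternative that avoids transposing the whole frame.

```python
import pandas as pd

# Column names are assumptions; only the values come from the output above.
df = pd.DataFrame({
    "fruit": ["Apple", "Orange", "Kiwi", "Mango"],
    "price": [220, 200, 1000, 800],
    "code": ["a", "o", "k", "m"],
})

# Transpose, then convert: one inner list per original column.
by_transpose = df.T.values.tolist()

# Alternative: build each column's list separately.
by_column = [df[col].tolist() for col in df.columns]

print(by_transpose)
print(by_column)
```

Transposing a frame with mixed dtypes goes through an object array, so the per-column version may be preferable when each column should be converted from its own dtype.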

Get a list from Pandas DataFrame column headers

You can get the values as a list by doing:

list(my_dataframe.columns.values)

Also you can simply use (as shown in Ed Chum's answer):

list(my_dataframe)
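A quick check of both spellings on a made-up frame (the column names `x`, `y`, `z` are placeholders). The sketch also includes columns.tolist(), another way to get the same list.

```python
import pandas as pd

# Hypothetical frame; only the column names matter here.
my_dataframe = pd.DataFrame({"x": [1, 2], "y": [3, 4], "z": [5, 6]})

headers_from_values = list(my_dataframe.columns.values)
headers_from_iter = list(my_dataframe)  # iterating a DataFrame yields its column labels
headers_from_index = my_dataframe.columns.tolist()

print(headers_from_values, headers_from_iter, headers_from_index)
```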

Lookup a Dataframe column with list and return list matching the row of another column?

You could make Col1 the index and use .loc:

q = df.set_index('Col1').loc[query]

Output:

>>> q
      Col2  Col3
Col1
A        5     9
D        8    12
D        8    12
B        6    10
B        6    10

>>> q['Col2'].tolist()
[5, 8, 8, 6, 6]

>>> q['Col3'].tolist()
[9, 12, 12, 10, 10]
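The frame and the query list are not shown in the answer, so the following is a reconstruction under assumptions: `df` and `query` are guessed from the printed output, and the row for C is invented to fill the gap.

```python
import pandas as pd

# Assumed inputs, inferred from the output above (the C row is a guess).
df = pd.DataFrame({
    "Col1": ["A", "B", "C", "D"],
    "Col2": [5, 6, 7, 8],
    "Col3": [9, 10, 11, 12],
})
query = ["A", "D", "D", "B", "B"]

# With Col1 as the index, .loc returns rows in query order, repeats included.
q = df.set_index("Col1").loc[query]

col2_matches = q["Col2"].tolist()
col3_matches = q["Col3"].tolist()
print(col2_matches, col3_matches)
```

Note that .loc raises a KeyError if the query contains a label that is missing from Col1, so unknown labels may need to be filtered out first.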

Dataframe with a column which every row is a list of values

Here's an alternative approach:

result = pd.concat(
    [df[["name", "favourite_fruits"]],
     pd.DataFrame(lst for lst in df["votes"]).rename(columns=lambda n: f"vote{n + 1}")],
    axis=1
)
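To see what this produces, here is a runnable sketch on invented data. The column names come from the snippet above, but the rows are hypothetical, and df["votes"].tolist() is used in place of the generator purely for readability.

```python
import pandas as pd

# Hypothetical rows; the real data is not shown in the answer.
df = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "favourite_fruits": ["apple", "kiwi"],
    "votes": [[3, 5], [1, 4]],
})

# Expand each list into its own columns (0, 1, ...) and rename to vote1, vote2, ...
votes_wide = pd.DataFrame(df["votes"].tolist()).rename(columns=lambda n: f"vote{n + 1}")

# Glue the scalar columns and the expanded votes side by side.
result = pd.concat([df[["name", "favourite_fruits"]], votes_wide], axis=1)

print(result)
```

The side-by-side concat aligns on the index, so this relies on df having a default 0..n-1 index; otherwise passing index=df.index when building the votes frame would be needed.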

Find rows in dataframe that must contain at least 2 elements from a list

You can use set operations:

S = set(list1)

out = df[[len(set(l.split())&S)>=2 for l in df['A']]]

# or
# out = df[[len(S.intersection(l.split()))>=2 for l in df['A']]]

Output:

               A      B
2  brit dave red  terri
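A self-contained version of the set approach, using made-up inputs: only row 2 is taken from the output above, while the other rows and the contents of `list1` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical inputs; row 2 mirrors the printed output.
df = pd.DataFrame({
    "A": ["alan brit", "carl dave", "brit dave red"],
    "B": ["x", "y", "terri"],
})
list1 = ["dave", "red", "zed"]

S = set(list1)

# Keep rows whose whitespace-separated words share at least 2 items with S.
mask = [len(set(words.split()) & S) >= 2 for words in df["A"]]
out = df[mask]

print(out)
```

Because both sides are sets, a word repeated within one cell is counted only once toward the threshold.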

