Pandas Dataframe Column to List

Get list from pandas dataframe column or row?

A pandas DataFrame column is a pandas Series when you select it, so you can call x.tolist() on it to get a Python list. Alternatively, you can convert it with list(x).

import pandas as pd

data_dict = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
             'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(data_dict)

print(f"DataFrame:\n{df}\n")
print(f"column types:\n{df.dtypes}")

col_one_list = df['one'].tolist()

col_one_arr = df['one'].to_numpy()

print(f"\ncol_one_list:\n{col_one_list}\ntype:{type(col_one_list)}")
print(f"\ncol_one_arr:\n{col_one_arr}\ntype:{type(col_one_arr)}")

Output:

DataFrame:
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4

column types:
one float64
two int64
dtype: object

col_one_list:
[1.0, 2.0, 3.0, nan]
type:<class 'list'>

col_one_arr:
[ 1. 2. 3. nan]
type:<class 'numpy.ndarray'>
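
The question also asks about rows. A minimal sketch, reusing the same example data, of how a row can be turned into a list in the same way: selecting a row with .loc or .iloc also returns a Series, so .tolist() works there too. The variable names below are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
})

# Select a row by label; the result is a Series indexed by column name.
row_a_list = df.loc['a'].tolist()

# Select a row by position instead of by label (here: the last row).
last_row_list = df.iloc[-1].tolist()

# list() on a column iterates over its values.
col_two_list = list(df['two'])

print(row_a_list)
print(last_row_list)
print(col_two_list)
```

Because the row mixes a float64 column and an int64 column, the row Series is upcast to float, so the integers come back as floats.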

Convert List to Pandas Dataframe Column

Use:

L = ['Thanks You', 'Its fine no problem', 'Are you sure']

#create new df
df = pd.DataFrame({'col':L})
print (df)

col
0 Thanks You
1 Its fine no problem
2 Are you sure

df = pd.DataFrame({'oldcol':[1,2,3]})

#add column to existing df
df['col'] = L
print (df)
oldcol col
0 1 Thanks You
1 2 Its fine no problem
2 3 Are you sure

You can also pass the list directly to the constructor (thanks to DYZ for the suggestion):

#default column name 0
df = pd.DataFrame(L)
print (df)
0
0 Thanks You
1 Its fine no problem
2 Are you sure
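
One thing to watch for when adding a list to an existing DataFrame is length. As a rough sketch (short_list and error_message are made-up names for illustration): assigning a plain list whose length differs from the number of rows raises a ValueError, while wrapping it in a pd.Series aligns on the index and fills the missing rows with NaN.

```python
import pandas as pd

df = pd.DataFrame({'oldcol': [1, 2, 3]})
short_list = ['Thanks You', 'Its fine no problem']  # one item fewer than the rows

# A plain list must match the number of rows exactly.
error_message = None
try:
    df['col'] = short_list
except ValueError as exc:
    error_message = str(exc)

# A Series is aligned on the index instead; the unmatched row becomes NaN.
df['col'] = pd.Series(short_list)
print(df)
```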

Read lists into columns of pandas DataFrame

Someone just recommended creating a dictionary from the data then loading that into the DataFrame like this:

In [8]: data = pd.DataFrame({'x': x, 'sin(x)': y})
In [9]: data
Out[9]:
x sin(x)
0 0.000000 0.000000e+00
1 0.349066 3.420201e-01
2 0.698132 6.427876e-01
3 1.047198 8.660254e-01
4 1.396263 9.848078e-01
5 1.745329 9.848078e-01
6 2.094395 8.660254e-01
7 2.443461 6.427876e-01
8 2.792527 3.420201e-01
9 3.141593 1.224647e-16

[10 rows x 2 columns]

Note that before Python 3.7 a dictionary did not guarantee the order of its key-value pairs. If you care about the column order, you can pass a list of the keys in the order you want via the columns argument (you can also use this list to include only some of the dict entries):

data = pd.DataFrame({'x': x, 'sin(x)': y}, columns=['x', 'sin(x)'])
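
The snippet above does not show how x and y were created. A self-contained sketch, assuming x is ten evenly spaced points between 0 and pi (which is consistent with the values printed above) and y is their sine:

```python
import numpy as np
import pandas as pd

# Assumed inputs: the original post does not show how x and y were defined.
x = np.linspace(0, np.pi, 10)
y = np.sin(x)

# columns= fixes the column order explicitly.
data = pd.DataFrame({'x': x, 'sin(x)': y}, columns=['x', 'sin(x)'])

# The same argument can select only some of the dict entries.
only_sin = pd.DataFrame({'x': x, 'sin(x)': y}, columns=['sin(x)'])

print(data.head())
print(only_sin.columns.tolist())
```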

Pandas column of lists, create a row for each list element

UPDATE: the solution below was helpful for older Pandas versions, because the DataFrame.explode() wasn’t available. Starting from Pandas 0.25.0 you can simply use DataFrame.explode().
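
For reference, a minimal sketch of the explode() approach on a small DataFrame shaped like the one in the question. The data here is made up for illustration and is not the original poster's.

```python
import pandas as pd

# Hypothetical data: each cell of 'samples' holds a list.
df = pd.DataFrame({
    'subject': [1, 1],
    'trial_num': [1, 2],
    'samples': [[0.10, -0.20, 0.05], [0.25, 1.32, -0.17]],
})

# One row per list element; the other columns are repeated.
# explode() keeps the original index labels, so reset them here.
exploded = df.explode('samples').reset_index(drop=True)

# The exploded column has object dtype; cast it if numeric operations are needed.
exploded['samples'] = exploded['samples'].astype(float)

print(exploded)
```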



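For older versions, the same result can be built with NumPy (df is the original DataFrame, whose samples column holds lists):

import numpy as np
import pandas as pd

lst_col = 'samples'

# Repeat each scalar column once per list element, then add the flattened list column.
r = pd.DataFrame({
    col: np.repeat(df[col].values, df[lst_col].str.len())
    for col in df.columns.drop(lst_col)}
).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]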

Result:

In [103]: r
Out[103]:
samples subject trial_num
0 0.10 1 1
1 -0.20 1 1
2 0.05 1 1
3 0.25 1 2
4 1.32 1 2
5 -0.17 1 2
6 0.64 1 3
7 -0.22 1 3
8 -0.71 1 3
9 -0.03 2 1
10 -0.65 2 1
11 0.76 2 1
12 1.77 2 2
13 0.89 2 2
14 0.65 2 2
15 -0.98 2 3
16 0.65 2 3
17 -0.30 2 3


UPDATE: some explanations. The easiest way to understand this code is to execute it step by step.

In the following line, each value in a column is repeated N times, where N is the length of the corresponding list:

In [10]: np.repeat(df['trial_num'].values, df[lst_col].str.len())
Out[10]: array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int64)

This can be generalized to all columns that contain scalar values:

In [11]: pd.DataFrame({
...: col:np.repeat(df[col].values, df[lst_col].str.len())
...: for col in df.columns.drop(lst_col)}
...: )
Out[11]:
trial_num subject
0 1 1
1 1 1
2 1 1
3 2 1
4 2 1
5 2 1
6 3 1
.. ... ...
11 1 2
12 2 2
13 2 2
14 2 2
15 3 2
16 3 2
17 3 2

[18 rows x 2 columns]

Using np.concatenate(), we can flatten all values in the list column (samples) into a 1D vector:

In [12]: np.concatenate(df[lst_col].values)
Out[12]: array([-1.04, -0.58, -1.32, 0.82, -0.59, -0.34, 0.25, 2.09, 0.12, 0.83, -0.88, 0.68, 0.55, -0.56, 0.65, -0.04, 0.36, -0.31])

Putting all this together:

In [13]: pd.DataFrame({
...: col:np.repeat(df[col].values, df[lst_col].str.len())
...: for col in df.columns.drop(lst_col)}
...: ).assign(**{lst_col:np.concatenate(df[lst_col].values)})
Out[13]:
trial_num subject samples
0 1 1 -1.04
1 1 1 -0.58
2 1 1 -1.32
3 2 1 0.82
4 2 1 -0.59
5 2 1 -0.34
6 3 1 0.25
.. ... ... ...
11 1 2 0.68
12 2 2 0.55
13 2 2 -0.56
14 2 2 0.65
15 3 2 -0.04
16 3 2 0.36
17 3 2 -0.31

[18 rows x 3 columns]

Finally, indexing the result with [df.columns] guarantees that the columns come out in their original order.

How to transform a DataFrame column containing lists of values into individual columns with a count of occurrences?

Edited based on feedback from Henry Ecker in the comments, so that the better answer is here:

You can use DataFrame.explode() to turn every element of the lists into a separate row, and then use pd.crosstab() to count the occurrences.

df = presence_data.explode('presence')
pd.crosstab(index=df['id'],columns=df['presence'])

This gave me the following:

presence  A  B  C  G  I
id
id1 2 1 1 0 0
id2 1 2 0 1 1
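
The original presence_data is not shown. A runnable sketch with a hypothetical presence_data, chosen so that it reproduces the counts above:

```python
import pandas as pd

# Hypothetical input: one list of labels per id.
presence_data = pd.DataFrame({
    'id': ['id1', 'id2'],
    'presence': [['A', 'B', 'C', 'A'], ['G', 'A', 'B', 'I', 'B']],
})

# One row per (id, label) pair.
exploded = presence_data.explode('presence')

# Count how often each label occurs for each id; missing pairs become 0.
counts = pd.crosstab(index=exploded['id'], columns=exploded['presence'])
print(counts)
```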

Get a list from Pandas DataFrame column headers

You can get the values as a list by doing:

list(my_dataframe.columns.values)

Also you can simply use (as shown in Ed Chum's answer):

list(my_dataframe)
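
A quick sketch comparing these options on a small made-up DataFrame. It also shows columns.tolist(), another common spelling that gives the same result here.

```python
import pandas as pd

# Hypothetical DataFrame used only to show the header names.
my_dataframe = pd.DataFrame({'one': [1, 2], 'two': [3, 4], 'three': [5, 6]})

headers_from_values = list(my_dataframe.columns.values)
headers_from_iter = list(my_dataframe)               # iterating a DataFrame yields column labels
headers_from_index = my_dataframe.columns.tolist()   # Index.tolist()

print(headers_from_values, headers_from_iter, headers_from_index)
```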

Pandas convert column where every cell is list of strings to list of integers

You can try

df['l'] = df['l'].apply(lambda lst: list(map(int, lst)))
print(df)

C1 C2 l
0 1 7 [5, 9, 1]
1 3 1 [7, 1, 6]
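
The input DataFrame is not shown in the answer. A self-contained sketch, assuming the l column starts out as lists of digit strings:

```python
import pandas as pd

# Assumed input: every cell of 'l' is a list of strings.
df = pd.DataFrame({
    'C1': [1, 3],
    'C2': [7, 1],
    'l': [['5', '9', '1'], ['7', '1', '6']],
})

# Convert each string in each list to an int.
# A list comprehension is equivalent to list(map(int, lst)).
df['l'] = df['l'].apply(lambda lst: [int(s) for s in lst])
print(df)
```

Note that int() raises a ValueError for strings that are not valid integers, so dirty data would need cleaning first.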

Pandas: Convert columns of lists into a single list

You can use a nested list comprehension:

dataSet['combined'] = [[e for l in x for e in l]
for _,x in dataSet.filter(like='value').iterrows()]

Output:

   key                    valueA                    valueB                    valueN                                                                  combined
0 1_1 [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
1 1_2 [7, 8, 9, 10, 11, 12] [7, 8, 9, 10, 11, 12] [7, 8, 9, 10, 11, 12] [7, 8, 9, 10, 11, 12, 7, 8, 9, 10, 11, 12, 7, 8, 9, 10, 11, 12]
2 1_3 [13, 14, 15, 16, 17, 18] [13, 14, 15, 16, 17, 18] [13, 14, 15, 16, 17, 18] [13, 14, 15, 16, 17, 18, 13, 14, 15, 16, 17, 18, 13, 14, 15, 16, 17, 18]

Timing comparison with repeated addition (100 rows, 100 columns, 1000 items per list):

# repeated addition of the lists
8.66 s ± 309 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# nested list comprehension
729 ms ± 285 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
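
A smaller, runnable sketch of the nested list comprehension, using a made-up dataSet with shorter lists than the one above. It also shows an equivalent variant based on itertools.chain, which is not part of the original answer and was not included in the timings.

```python
from itertools import chain

import pandas as pd

# Hypothetical data: three list-valued columns whose names contain 'value'.
dataSet = pd.DataFrame({
    'key': ['1_1', '1_2'],
    'valueA': [[1, 2], [7, 8]],
    'valueB': [[3, 4], [9, 10]],
    'valueN': [[5, 6], [11, 12]],
})

value_cols = dataSet.filter(like='value')

# Nested list comprehension from the answer: flatten the lists of each row.
combined_nested = [[e for l in x for e in l] for _, x in value_cols.iterrows()]

# Equivalent variant using itertools.chain on plain row tuples.
combined_chain = [list(chain.from_iterable(row))
                  for row in value_cols.itertuples(index=False)]

dataSet['combined'] = combined_nested
print(dataSet)
```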

