How to Convert Column with List of Values into Rows in Pandas Dataframe

You can do it this way:

In [84]: df
Out[84]:
               A               B
0     some value      [[L1, L2]]
1  another value  [[L3, L4, L5]]

In [85]: (df['B'].apply(lambda x: pd.Series(x[0]))
....: .stack()
....: .reset_index(level=1, drop=True)
....: .to_frame('B')
....: .join(df[['A']], how='left')
....: )
Out[85]:
    B              A
0  L1     some value
0  L2     some value
1  L3  another value
1  L4  another value
1  L5  another value
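
On pandas 0.25 or later the same result can probably be reached more directly with DataFrame.explode. A minimal sketch, assuming the frame looks like the one printed above (each B cell is a list wrapped in an outer one-element list); the frame construction is a reconstruction, not taken from the original post:

```python
import pandas as pd

# Reconstructed sample data (assumption): each cell of B holds a list
# wrapped in an outer one-element list, as in the frame shown above.
df = pd.DataFrame({
    "A": ["some value", "another value"],
    "B": [[["L1", "L2"]], [["L3", "L4", "L5"]]],
})

# Unwrap the outer list with .str[0], then give every element its own row.
out = df.assign(B=df["B"].str[0]).explode("B")
print(out)
```

The original index values (0, 0, 1, 1, 1) are kept, which matches the output of the stack-based version.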

UPDATE: a more generic solution

Pandas column of lists, create a row for each list element

UPDATE: the solution below was helpful for older Pandas versions, because DataFrame.explode() wasn't available yet. Starting from Pandas 0.25.0 you can simply use DataFrame.explode().
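
For reference, a minimal sketch of the explode() route. The sample values are made up for illustration; only the column names (samples, subject, trial_num) follow this answer:

```python
import pandas as pd

# Hypothetical sample data: one list of samples per (subject, trial_num) pair.
df = pd.DataFrame({
    "subject": [1, 1, 2],
    "trial_num": [1, 2, 1],
    "samples": [[0.10, -0.20], [0.25, 1.32], [-0.03]],
})

# explode() repeats the scalar columns for every list element.
# ignore_index=True (pandas 1.1+) renumbers the rows 0..n-1.
r = df.explode("samples", ignore_index=True)

# explode() leaves the column with object dtype, so cast it back if needed.
r["samples"] = r["samples"].astype(float)
print(r)
```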



lst_col = 'samples'

r = pd.DataFrame({
col:np.repeat(df[col].values, df[lst_col].str.len())
for col in df.columns.drop(lst_col)}
).assign(**{lst_col:np.concatenate(df[lst_col].values)})[df.columns]

Result:

In [103]: r
Out[103]:
samples subject trial_num
0 0.10 1 1
1 -0.20 1 1
2 0.05 1 1
3 0.25 1 2
4 1.32 1 2
5 -0.17 1 2
6 0.64 1 3
7 -0.22 1 3
8 -0.71 1 3
9 -0.03 2 1
10 -0.65 2 1
11 0.76 2 1
12 1.77 2 2
13 0.89 2 2
14 0.65 2 2
15 -0.98 2 3
16 0.65 2 3
17 -0.30 2 3

PS here you may find a bit more generic solution


UPDATE: some explanations: IMO the easiest way to understand this code is to try to execute it step-by-step:

In the following line we repeat the values of one column N times, where N is the length of the corresponding list:

In [10]: np.repeat(df['trial_num'].values, df[lst_col].str.len())
Out[10]: array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int64)

This can be generalized to all columns that contain scalar values:

In [11]: pd.DataFrame({
...: col:np.repeat(df[col].values, df[lst_col].str.len())
...: for col in df.columns.drop(lst_col)}
...: )
Out[11]:
trial_num subject
0 1 1
1 1 1
2 1 1
3 2 1
4 2 1
5 2 1
6 3 1
.. ... ...
11 1 2
12 2 2
13 2 2
14 2 2
15 3 2
16 3 2
17 3 2

[18 rows x 2 columns]

Using np.concatenate() we can flatten all values in the list column (samples) and get a 1D vector:

In [12]: np.concatenate(df[lst_col].values)
Out[12]: array([-1.04, -0.58, -1.32, 0.82, -0.59, -0.34, 0.25, 2.09, 0.12, 0.83, -0.88, 0.68, 0.55, -0.56, 0.65, -0.04, 0.36, -0.31])

putting all this together:

In [13]: pd.DataFrame({
...: col:np.repeat(df[col].values, df[lst_col].str.len())
...: for col in df.columns.drop(lst_col)}
...: ).assign(**{lst_col:np.concatenate(df[lst_col].values)})
Out[13]:
trial_num subject samples
0 1 1 -1.04
1 1 1 -0.58
2 1 1 -1.32
3 2 1 0.82
4 2 1 -0.59
5 2 1 -0.34
6 3 1 0.25
.. ... ... ...
11 1 2 0.68
12 2 2 0.55
13 2 2 -0.56
14 2 2 0.65
15 3 2 -0.04
16 3 2 0.36
17 3 2 -0.31

[18 rows x 3 columns]

Indexing the result with [df.columns] guarantees that the columns come back in their original order.
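
The steps above can be wrapped into a small function. This is only a sketch: the name explode_list_column and the sample data are hypothetical, and the body is the same np.repeat / np.concatenate approach, not a pandas API:

```python
import numpy as np
import pandas as pd


def explode_list_column(df, lst_col):
    """Hypothetical helper: one output row per element of df[lst_col]."""
    lens = df[lst_col].str.len()
    # Repeat each scalar column once per list element.
    repeated = {
        col: np.repeat(df[col].values, lens)
        for col in df.columns.drop(lst_col)
    }
    # Flatten the list column and restore the original column order.
    flat = np.concatenate(df[lst_col].values)
    return pd.DataFrame(repeated).assign(**{lst_col: flat})[df.columns]


# Made-up sample data for illustration.
df = pd.DataFrame({
    "samples": [[0.1, -0.2, 0.05], [0.25, 1.32]],
    "subject": [1, 1],
    "trial_num": [1, 2],
})
r = explode_list_column(df, "samples")
print(r)
```

One difference to keep in mind: a row whose list is empty is repeated zero times here and so disappears, whereas DataFrame.explode() keeps such a row with NaN in the list column.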

How to convert dataframe columns with list of values into rows in Pandas DataFrame

You can chain explode calls like this:

df.explode('A').explode('B').explode('C').reset_index(drop=True)

   A  B   C
0  X  1  aa
1  X  1  bb
2  X  1  cc
3  Y  2  xx
4  Y  2  yy

Alternatively, you can apply pd.Series.explode to the DataFrame like this:

df.apply(pd.Series.explode).reset_index(drop=True)

In pandas 1.3+ you can pass a list of columns to explode. Within each row, the lists in those columns must have the same number of elements. The code then looks like this:

df.explode(['A', 'B', 'C']).reset_index(drop=True)
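
A small runnable sketch of the multi-column form. The input frame is hypothetical (the question's data is not shown here); A is left as a scalar column and only B and C hold lists:

```python
import pandas as pd

# Hypothetical input: within each row, B and C hold lists of equal length.
df = pd.DataFrame({
    "A": ["X", "Y"],
    "B": [[1, 2, 3], [4, 5]],
    "C": [["aa", "bb", "cc"], ["xx", "yy"]],
})

# Requires pandas 1.3+; the lists are exploded in parallel, row by row.
out = df.explode(["B", "C"]).reset_index(drop=True)
print(out)
```

On this input, chaining df.explode('B').explode('C') would instead pair every B element with every C element of the same row (13 rows rather than 5), so the two forms are only interchangeable when at most one column per row has more than one element.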

converting list like column values into multiple rows using Pandas DataFrame

You can use findall or extractall to get the lists from the hobbies column, then flatten them with chain.from_iterable and repeat the other columns:

a = df['hobbies'].str.findall("'(.*?)'").astype(object)
lens = a.str.len()

from itertools import chain

df1 = pd.DataFrame({
'Location_City' : df['Location_City'].values.repeat(lens),
'Location_State' : df['Location_State'].values.repeat(lens),
'Name' : df['Name'].values.repeat(lens),
'hobbies' : list(chain.from_iterable(a.tolist())),
})

Or create a Series with extractall, drop the second index level (the match number), and join it back to the original DataFrame:

df1 = (df.join(df.pop('hobbies').str.extractall("'(.*?)'")[0]
.reset_index(level=1, drop=True)
.rename('hobbies'))
.reset_index(drop=True))

print (df1)

  Location_City Location_State  Name   hobbies
0   Los Angeles             CA  John     Music
1   Los Angeles             CA  John   Running
2         Texas             TX  Jack  Swimming
3         Texas             TX  Jack  Trekking
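
With pandas 0.25+ the flattening step can likely be left to DataFrame.explode. A sketch, assuming hobbies is stored as the string form of a list; the input frame is reconstructed from the output above and is not part of the original answer:

```python
import pandas as pd

# Hypothetical input reconstructed from the output above: hobbies is stored
# as the string form of a list, not as an actual list.
df = pd.DataFrame({
    "Location_City": ["Los Angeles", "Texas"],
    "Location_State": ["CA", "TX"],
    "Name": ["John", "Jack"],
    "hobbies": ["['Music', 'Running']", "['Swimming', 'Trekking']"],
})

# Pull the quoted items out of each string, then give each its own row.
df1 = (df.assign(hobbies=df["hobbies"].str.findall(r"'(.*?)'"))
         .explode("hobbies")
         .reset_index(drop=True))
print(df1)
```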

Get list from pandas dataframe column or row?

A Pandas DataFrame column is a Pandas Series when you pull it out, so you can call x.tolist() on it to turn it into a Python list. Alternatively, you can convert it with list(x).

import pandas as pd

data_dict = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}

df = pd.DataFrame(data_dict)

print(f"DataFrame:\n{df}\n")
print(f"column types:\n{df.dtypes}")

col_one_list = df['one'].tolist()

col_one_arr = df['one'].to_numpy()

print(f"\ncol_one_list:\n{col_one_list}\ntype:{type(col_one_list)}")
print(f"\ncol_one_arr:\n{col_one_arr}\ntype:{type(col_one_arr)}")

Output:

DataFrame:
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4

column types:
one float64
two int64
dtype: object

col_one_list:
[1.0, 2.0, 3.0, nan]
type:<class 'list'>

col_one_arr:
[ 1. 2. 3. nan]
type:<class 'numpy.ndarray'>
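
The question also asks about rows. A row selected with .loc or .iloc is a Series too, so the same call should work there. A short sketch on a smaller, made-up frame:

```python
import pandas as pd

# Small illustrative frame (not the one from the example above).
df = pd.DataFrame({"one": [1.0, 2.0, 3.0], "two": [1, 2, 3]},
                  index=["a", "b", "c"])

# A single row is also a Series, so tolist() works the same way.
# Mixed float/int columns are upcast to a common dtype (float here).
row_a = df.loc["a"].tolist()        # select by index label
row_first = df.iloc[0].tolist()     # select by position

# All rows at once, as a list of lists.
all_rows = df.to_numpy().tolist()
print(row_a, row_first, all_rows)
```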

Pandas: Convert columns of lists into a single list

You can use a nested list comprehension:

dataSet['combined'] = [[e for l in x for e in l]
for _,x in dataSet.filter(like='value').iterrows()]

Output:

   key                    valueA                    valueB                    valueN                                                                  combined
0  1_1        [1, 2, 3, 4, 5, 6]        [1, 2, 3, 4, 5, 6]        [1, 2, 3, 4, 5, 6]                    [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
1  1_2     [7, 8, 9, 10, 11, 12]     [7, 8, 9, 10, 11, 12]     [7, 8, 9, 10, 11, 12]           [7, 8, 9, 10, 11, 12, 7, 8, 9, 10, 11, 12, 7, 8, 9, 10, 11, 12]
2  1_3  [13, 14, 15, 16, 17, 18]  [13, 14, 15, 16, 17, 18]  [13, 14, 15, 16, 17, 18]  [13, 14, 15, 16, 17, 18, 13, 14, 15, 16, 17, 18, 13, 14, 15, 16, 17, 18]
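
A self-contained version to try this out. The frame is a shortened, made-up stand-in for dataSet, and the itertools.chain variant is an equivalent I am adding for comparison, not part of the original answer:

```python
from itertools import chain

import pandas as pd

# Shortened, hypothetical stand-in for dataSet.
dataSet = pd.DataFrame({
    "key": ["1_1", "1_2"],
    "valueA": [[1, 2], [7, 8]],
    "valueB": [[3, 4], [9, 10]],
    "valueN": [[5, 6], [11, 12]],
})

value_cols = dataSet.filter(like="value")

# Nested list comprehension, as in the answer above.
combined = [[e for l in x for e in l] for _, x in value_cols.iterrows()]

# Same flattening with itertools.chain; each row is a tuple of lists.
combined_chain = [list(chain.from_iterable(row))
                  for row in value_cols.itertuples(index=False, name=None)]

dataSet["combined"] = combined
print(dataSet)
```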

Timing comparison with repeated addition (100 rows, 100 columns, 1000 items per list):

# repeated addition of the lists
8.66 s ± 309 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# nested list comprehension
729 ms ± 285 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Dataframe with a column which every row is a list of values

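Here's an alternative approach: build a DataFrame from the votes lists (one column per list position), rename its columns to vote1, vote2, and so on, and concatenate it with the remaining columns: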

result = pd.concat(
[df[["name", "favourite_fruits"]],
pd.DataFrame(lst for lst in df["votes"]).rename(columns=lambda n: f"vote{n + 1}")],
axis=1
)
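
A runnable sketch of the same idea. The column names come from the snippet above, the values are made up, and passing index=df.index is my addition so that pd.concat still lines the rows up when df does not have a default 0..n-1 index:

```python
import pandas as pd

# Hypothetical input: column names from the snippet above, values made up.
df = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "favourite_fruits": ["apple", "pear"],
    "votes": [[1, 0, 1], [0, 1, 1]],
})

# One column per list position, aligned to df's own index.
votes = pd.DataFrame(df["votes"].tolist(), index=df.index)
votes = votes.rename(columns=lambda n: f"vote{n + 1}")

result = pd.concat([df[["name", "favourite_fruits"]], votes], axis=1)
print(result)
```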

How to transform dataframe column containing list of values in to its own individual column with count of occurrence?

edited based on feedback from Henry Ecker in comments, might as well have the better answer here:

You can use DataFrame.explode() to turn every element of the lists into a separate row, and then use pd.crosstab() to count the occurrences.

df = presence_data.explode('presence')
pd.crosstab(index=df['id'],columns=df['presence'])

This gave me the following:

presence  A  B  C  G  I
id
id1       2  1  1  0  0
id2       1  2  0  1  1
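
To try this end to end, here is a sketch with a hypothetical presence_data frame chosen so that it reproduces the counts shown above:

```python
import pandas as pd

# Hypothetical input chosen to reproduce the counts shown above.
presence_data = pd.DataFrame({
    "id": ["id1", "id2"],
    "presence": [["A", "A", "B", "C"], ["A", "B", "B", "G", "I"]],
})

# One row per list element, then count id/presence combinations.
df = presence_data.explode("presence")
counts = pd.crosstab(index=df["id"], columns=df["presence"])
print(counts)
```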

Python dataframe : converting columns into rows

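If the number at the end of a column name can be larger than 9 (more than one digit), use Series.str.extract to split each column name into its text prefix and its integer suffix, build a MultiIndex on the columns from the two parts, and then use DataFrame.stack: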

df = df.set_index('Movie')
df1 = df.columns.to_series().str.extract(r'([a-zA-Z]+)(\d+)')
df.columns = pd.MultiIndex.from_arrays([df1[0], df1[1].astype(int)])

df = df.rename_axis((None, None), axis=1).stack().reset_index(level=1, drop=True).reset_index()
print (df)
                      Movie FirstName  ID LastName
0  The Shawshank Redemption       Tim  TM  Robbins
1  The Shawshank Redemption    Morgan  MF  Freeman
2             The Godfather    Marlon  MB   Brando
3             The Godfather        Al  AP   Pacino

If the suffix is always a single digit, you can instead use string indexing to split each column name into its last character and everything before it, and pass both parts to MultiIndex.from_arrays:

df = df.set_index('Movie')
df.columns = pd.MultiIndex.from_arrays([df.columns.str[:-1], df.columns.str[-1].astype(int)])
df = df.stack().reset_index(level=1, drop=True).reset_index()
print (df)
                      Movie FirstName  ID LastName
0  The Shawshank Redemption       Tim  TM  Robbins
1  The Shawshank Redemption    Morgan  MF  Freeman
2             The Godfather    Marlon  MB   Brando
3             The Godfather        Al  AP   Pacino
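
Neither snippet shows the input, so here is a self-contained sketch of the regex variant. The wide frame (columns FirstName1, LastName1, ID1, FirstName2, ...) is my guess at the input, reconstructed from the output above:

```python
import pandas as pd

# Hypothetical wide input reconstructed from the output above.
df = pd.DataFrame({
    "Movie": ["The Shawshank Redemption", "The Godfather"],
    "FirstName1": ["Tim", "Marlon"],
    "LastName1": ["Robbins", "Brando"],
    "ID1": ["TM", "MB"],
    "FirstName2": ["Morgan", "Al"],
    "LastName2": ["Freeman", "Pacino"],
    "ID2": ["MF", "AP"],
})

wide = df.set_index("Movie")

# Split each column name into (text prefix, integer suffix).
parts = wide.columns.to_series().str.extract(r"([a-zA-Z]+)(\d+)")
wide.columns = pd.MultiIndex.from_arrays([parts[0], parts[1].astype(int)])

# Move the numeric level into the rows, then discard it.
tidy = (wide.rename_axis((None, None), axis=1)
            .stack()
            .reset_index(level=1, drop=True)
            .reset_index())
print(tidy)
```

The order of the FirstName, ID and LastName columns in the result may differ between pandas versions, so select columns by name rather than by position.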

