Pandas Expand Rows from List Data Available in Column

Pandas column of lists, create a row for each list element

UPDATE: the solution below was written for older pandas versions, where DataFrame.explode() was not yet available. Starting with pandas 0.25.0 you can simply use DataFrame.explode().
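For reference, a minimal sketch of the explode() route. The small DataFrame here is a hypothetical stand-in shaped like the data in this answer (the original input is not shown), so treat the values as placeholders:

```python
import pandas as pd

# Hypothetical data: two scalar columns and one column of lists
df = pd.DataFrame({
    "subject": [1, 1, 2],
    "trial_num": [1, 2, 1],
    "samples": [[0.10, -0.20, 0.05], [0.25, 1.32, -0.17], [-0.03, -0.65, 0.76]],
})

# explode() emits one row per list element and repeats the scalar columns;
# the original index labels are repeated too, so reset them afterwards
r = df.explode("samples").reset_index(drop=True)

# the exploded column comes back with object dtype; cast it for numeric work
r["samples"] = r["samples"].astype(float)
print(r)
```

The explicit cast is a design choice rather than a requirement: without it the samples column stays as Python objects.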



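import numpy as np
import pandas as pd

lst_col = 'samples'

# repeat every scalar column once per list element, then attach the flattened lists
r = pd.DataFrame({
    col: np.repeat(df[col].values, df[lst_col].str.len())
    for col in df.columns.drop(lst_col)}
).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]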

Result:

In [103]: r
Out[103]:
    samples  subject  trial_num
0      0.10        1          1
1     -0.20        1          1
2      0.05        1          1
3      0.25        1          2
4      1.32        1          2
5     -0.17        1          2
6      0.64        1          3
7     -0.22        1          3
8     -0.71        1          3
9     -0.03        2          1
10    -0.65        2          1
11     0.76        2          1
12     1.77        2          2
13     0.89        2          2
14     0.65        2          2
15    -0.98        2          3
16     0.65        2          3
17    -0.30        2          3

PS here you may find a bit more generic solution


UPDATE: some explanations. IMO the easiest way to understand this code is to execute it step by step.

In the following line we repeat the values of one column N times, where N is the length of the corresponding list:

In [10]: np.repeat(df['trial_num'].values, df[lst_col].str.len())
Out[10]: array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int64)

This can be generalized to all columns containing scalar values:

In [11]: pd.DataFrame({
    ...:     col: np.repeat(df[col].values, df[lst_col].str.len())
    ...:     for col in df.columns.drop(lst_col)}
    ...: )
Out[11]:
    trial_num  subject
0           1        1
1           1        1
2           1        1
3           2        1
4           2        1
5           2        1
6           3        1
..        ...      ...
11          1        2
12          2        2
13          2        2
14          2        2
15          3        2
16          3        2
17          3        2

[18 rows x 2 columns]

Using np.concatenate() we can flatten all values in the list column (samples) into a 1D vector:

In [12]: np.concatenate(df[lst_col].values)
Out[12]: array([-1.04, -0.58, -1.32, 0.82, -0.59, -0.34, 0.25, 2.09, 0.12, 0.83, -0.88, 0.68, 0.55, -0.56, 0.65, -0.04, 0.36, -0.31])

Putting all this together:

In [13]: pd.DataFrame({
    ...:     col: np.repeat(df[col].values, df[lst_col].str.len())
    ...:     for col in df.columns.drop(lst_col)}
    ...: ).assign(**{lst_col: np.concatenate(df[lst_col].values)})
Out[13]:
    trial_num  subject  samples
0           1        1    -1.04
1           1        1    -0.58
2           1        1    -1.32
3           2        1     0.82
4           2        1    -0.59
5           2        1    -0.34
6           3        1     0.25
..        ...      ...      ...
11          1        2     0.68
12          2        2     0.55
13          2        2    -0.56
14          2        2     0.65
15          3        2    -0.04
16          3        2     0.36
17          3        2    -0.31

[18 rows x 3 columns]

Finally, selecting [df.columns] on the resulting DataFrame guarantees that the columns come back in their original order.
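The steps above can be wrapped in a small helper. This is only a sketch: the function name expand_list_column and the sample data are made up for illustration, and it assumes every cell of the list column really holds a list.

```python
import numpy as np
import pandas as pd

def expand_list_column(df, lst_col):
    """Return a new DataFrame with one row per element of df[lst_col]."""
    # number of elements in each list, i.e. how often each row is repeated
    lengths = df[lst_col].str.len().values
    # repeat every scalar column N times, N being the list length of that row
    scalars = {
        col: np.repeat(df[col].values, lengths)
        for col in df.columns.drop(lst_col)
    }
    # flatten all lists into one 1D vector
    flat = np.concatenate(df[lst_col].values)
    # reassemble and restore the original column order
    return pd.DataFrame(scalars).assign(**{lst_col: flat})[df.columns]

# Hypothetical input with lists of different lengths
df = pd.DataFrame({
    "samples": [[0.1, -0.2], [0.25], [1.5, 0.9, 0.65]],
    "subject": [1, 1, 2],
    "trial_num": [1, 2, 1],
})
r = expand_list_column(df, "samples")
print(r)
```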

Is there any way to split the columns of list to rows in pandas

Use explode:

df = df.explode('Column B')
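Two details of explode are easy to miss: the original index labels are repeated, and an empty list turns into a single NaN row. A short sketch, where the third row and the empty list are hypothetical additions to the example data:

```python
import pandas as pd

df = pd.DataFrame({
    "Column A": ["A", "B", "C"],
    "Column B": [[1, 2, 3], [], [4]],  # hypothetical: row "B" has an empty list
})

exploded = df.explode("Column B")
# the index labels of the source rows are repeated, not renumbered
print(exploded.index.tolist())

# the empty list became one NaN row; drop it if it is not wanted,
# then renumber the rows
cleaned = exploded.dropna(subset=["Column B"]).reset_index(drop=True)
print(cleaned)
```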

Another way via list comprehension:

d = {'Column A': {0: 'A', 1: 'B'}, 'Column B': {0: [1, 2, 3], 1: [4, 5, 6]}}
df = pd.DataFrame(d)

df = pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)

Pandas: list of lists to expanded rows

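It seems you need to flatten the nested lists first:

from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10

def flatten(coll):
    # recursively yield the items of arbitrarily nested iterables, keeping strings whole
    for i in coll:
        if isinstance(i, Iterable) and not isinstance(i, str):
            for subc in flatten(i):
                yield subc
        else:
            yield i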

d1['column1'] = d1['column1'].apply(lambda x: list(flatten(x)))
print (d1)
                 column1  column2  column3
0    [ana, bob, 1, 2, 3]       10       44
1   [dona, elf, 4, 5, 6]       20       55
2  [gear, hope, 7, 8, 9]       30       66

And then use your solution:

d2 = (pd.DataFrame(d1.column1.tolist())
        .stack()
        .reset_index(level=1, drop=True)
        .rename('column1'))

d1_d2 = (d1.drop('column1', axis=1)
           .join(d2)
           .reset_index(drop=True)[['column1', 'column2', 'column3']])

print (d1_d2)
    column1  column2  column3
0       ana       10       44
1       bob       10       44
2         1       10       44
3         2       10       44
4         3       10       44
5      dona       20       55
6       elf       20       55
7         4       20       55
8         5       20       55
9         6       20       55
10     gear       30       66
11     hope       30       66
12        7       30       66
13        8       30       66
14        9       30       66
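On pandas 0.25 or later, the stack/join step can likely be replaced by explode. A minimal sketch, assuming a hypothetical d1 whose column1 holds nested lists (the original input is not shown in this answer):

```python
from collections.abc import Iterable

import pandas as pd

def flatten(coll):
    # recursively yield items of nested iterables, keeping strings whole
    for item in coll:
        if isinstance(item, Iterable) and not isinstance(item, str):
            yield from flatten(item)
        else:
            yield item

# Hypothetical input: each cell mixes strings with an inner list
d1 = pd.DataFrame({
    "column1": [["ana", "bob", [1, 2, 3]], ["dona", "elf", [4, 5, 6]]],
    "column2": [10, 20],
    "column3": [44, 55],
})

# first flatten each cell to a plain list, then emit one row per element
d1["column1"] = d1["column1"].apply(lambda x: list(flatten(x)))
d1_d2 = d1.explode("column1").reset_index(drop=True)
print(d1_d2)
```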

Expand a column of lists into multiple rows in Pandas

Use np.hstack to stack the lists in the players column horizontally and create a new DataFrame:

df1 = pd.DataFrame(np.hstack(df['players']).tolist())

Or use Series.explode (available in pandas >= 0.25):

df1 = pd.DataFrame(df['players'].explode().tolist())

Another option uses itertools.chain, as suggested by @cs95:

from itertools import chain

df1 = pd.DataFrame(chain.from_iterable(df['players']))

Result:

print(df1)

      first     last
0       jon  McSmith
1  Jennifer   Foobar
2       dan  Raizman
3     Alden     Lowe
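All three variants above discard the other columns of df. If those need to stay attached to each player, one possible approach is to explode first and then expand the dicts. In this sketch the team column and the exact shape of the input (a list of dicts per row) are assumptions, since the original data is not shown:

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["red", "blue"],  # hypothetical extra column to carry along
    "players": [
        [{"first": "jon", "last": "McSmith"}, {"first": "Jennifer", "last": "Foobar"}],
        [{"first": "dan", "last": "Raizman"}, {"first": "Alden", "last": "Lowe"}],
    ],
})

# one dict per row, with a fresh 0..n-1 index so the join below is unambiguous
exploded = df.explode("players").reset_index(drop=True)

# turn the dicts into columns; the default index lines up with `exploded`
players = pd.DataFrame(exploded["players"].tolist())

df1 = exploded.drop(columns="players").join(players)
print(df1)
```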

Expand lists in a dataframe, but with two columns containing the lists

Apply the one-column solution to each list column to get two Series, concatenate them, and finally join the result back to the original DataFrame:

s1 = pd.DataFrame(df.pop('next_d').values.tolist(),
                  index=df.index).stack().rename('next_d').reset_index(level=2, drop=True)
s2 = pd.DataFrame(df.pop('next_p').values.tolist(),
                  index=df.index).stack().rename('next_p').reset_index(level=2, drop=True)

df = df.join(pd.concat([s1, s2], axis=1))
print (df)
               begin         end   comp  p_n    next_d  next_p
c_n ml
1   1234  2013-09-02  2014-12-16  comp1  111   20000.0    0.01
    1234  2013-09-02  2014-12-16  comp1  111   25000.0    0.01
    1234  2013-09-02  2014-12-16  comp1  111   50000.0    0.01
    1235  2013-09-02  2014-12-16  comp2  222   25000.0    0.10
    1235  2013-09-02  2014-12-16  comp2  222   50000.0    0.10
    1235  2013-09-02  2014-12-16  comp2  222   75000.0    0.10
    1235  2013-09-02  2014-12-16  comp2  222  100000.0    0.10
2   1236  2013-09-02  2014-12-16  comp3  333    5000.0    0.10
    1236  2013-09-02  2014-12-16  comp3  333   10000.0    0.10
    1236  2013-09-02  2014-12-16  comp3  333   15000.0    0.10
    1236  2013-09-02  2014-12-16  comp3  333  170000.0    0.10
    1236  2013-09-02  2014-12-16  comp3  333   25000.0    0.10
    1237  2013-09-02  2014-12-16  comp4  444    5000.0    0.01
    1237  2013-09-02  2014-12-16  comp4  444   10000.0    0.01
    1237  2013-09-02  2014-12-16  comp4  444   25000.0    0.01
    1237  2013-09-02  2014-12-16  comp4  444   50000.0    0.01
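On newer pandas (1.3 or later, as far as I know) DataFrame.explode also accepts a list of columns, which may be simpler when the lists in both columns have the same length in every row. A sketch with hypothetical data, not the frame from the question:

```python
import pandas as pd

# Hypothetical data: next_d and next_p hold equally long lists per row
df = pd.DataFrame({
    "comp": ["comp1", "comp2"],
    "next_d": [[20000, 25000], [25000, 50000, 75000]],
    "next_p": [[0.01, 0.02], [0.10, 0.20, 0.30]],
})

# both columns are exploded in lockstep, element i with element i
out = df.explode(["next_d", "next_p"]).reset_index(drop=True)
print(out)
```

If the list lengths differ within a row, this call raises an error, so the stack/join approach above remains the safer choice for ragged data.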

Expand pandas column list of string values into multiple columns

Assuming this example input:

df = pd.DataFrame({'col1': ['X', 'Y'],
                   'col2': [['ABC', 'DEF'], ['GHI', 'JLK', 'MNO']]})

# col1 col2
# 0 X [ABC, DEF]
# 1 Y [GHI, JLK, MNO]

You could apply(pd.Series) and add a custom prefix with add_prefix before doing a join with the original dataframe:

out = (df.drop(columns=['col2'])
         .join(df['col2'].apply(pd.Series).add_prefix('col2_'))
         .fillna('')  # optional
       )

output:

  col1 col2_0 col2_1 col2_2
0    X    ABC    DEF
1    Y    GHI    JLK    MNO
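apply(pd.Series) builds one Series per row, which tends to be slow on large frames. A possible alternative, sketched on the same example input, is to hand the lists to the DataFrame constructor directly:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['X', 'Y'],
                   'col2': [['ABC', 'DEF'], ['GHI', 'JLK', 'MNO']]})

# build the wide frame in one call; shorter lists are padded with missing values
wide = pd.DataFrame(df['col2'].tolist(), index=df.index).add_prefix('col2_')

out = (df.drop(columns=['col2'])
         .join(wide)
         .fillna(''))  # optional, replaces the padding with empty strings
print(out)
```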

Pandas expand number of rows

Try reindex over the full index range and fill the new NaN rows with 0:

df.reindex(range(df.index.min(), df.index.max()+1)).fillna(0)

Output:

          M0      M2   M3
Index
0      121.0  2520.0 -3.0
1        0.0     0.0  0.0
2        0.0     0.0  0.0
3        0.0     0.0  0.0
4      121.0  2521.0 -3.0
5      161.0  2321.0 -2.0
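Because the new rows pass through NaN before fillna(0), integer columns come back as floats, as the output above shows. If the integer dtype matters, passing fill_value to reindex is one way around that. A sketch, with the input frame reconstructed from the output above and therefore an assumption:

```python
import pandas as pd

# Input reconstructed from the printed result (rows 0, 4 and 5 exist)
df = pd.DataFrame(
    {"M0": [121, 121, 161], "M2": [2520, 2521, 2321], "M3": [-3, -3, -2]},
    index=pd.Index([0, 4, 5], name="Index"),
)

# every integer label between the smallest and largest existing one
full_index = range(df.index.min(), df.index.max() + 1)

# fill_value inserts 0 directly, so no NaN is created and ints stay ints
out = df.reindex(full_index, fill_value=0)
print(out)
```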

