Efficient way to unnest (explode) multiple list columns in a pandas DataFrame
pandas >= 0.25
Assuming the lists in each row have the same length across all columns, you can call Series.explode on each column.
df.set_index(['A']).apply(pd.Series.explode).reset_index()
A B C D E
0 x1 v1 c1 d1 e1
1 x1 v2 c2 d2 e2
2 x2 v3 c3 d3 e3
3 x2 v4 c4 d4 e4
4 x3 v5 c5 d5 e5
5 x3 v6 c6 d6 e6
6 x4 v7 c7 d7 e7
7 x4 v8 c8 d8 e8
The idea is to first set as the index all columns that must NOT be exploded, then reset the index afterwards.
It's also faster than the apply(pd.Series) + stack approach.
%timeit df.set_index(['A']).apply(pd.Series.explode).reset_index()
2.22 ms ± 98.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
(df.set_index('A')
   .apply(lambda x: x.apply(pd.Series).stack())
   .reset_index()
   .drop(columns='level_1'))
9.14 ms ± 329 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
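The set_index / explode / reset_index pattern above can be tried on a small frame. This is a minimal sketch: the two-row input is made up for illustration (the original input frame is not shown), and it assumes every list in a row has the same length.

```python
import pandas as pd

# Hypothetical two-row frame; column names mirror the example above.
df = pd.DataFrame({
    "A": ["x1", "x2"],
    "B": [["v1", "v2"], ["v3", "v4"]],
    "C": [["c1", "c2"], ["c3", "c4"]],
})

# Park the scalar column in the index, explode every remaining column,
# then bring the index back as a regular column.
out = df.set_index("A").apply(pd.Series.explode).reset_index()
print(out)
```

Any column that should stay untouched has to go into set_index, otherwise apply would try to explode it too.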
Pandas explode multiple columns
You could set col1 as the index and apply pd.Series.explode across the columns:
df.set_index('col1').apply(pd.Series.explode).reset_index()
Or:
df.apply(pd.Series.explode)
col1 col2 col3
0 aa 1 1.1
1 aa 2 2.2
2 aa 3 3.3
3 bb 4 4.4
4 bb 5 5.5
5 bb 6 6.6
6 cc 7 7.7
7 cc 8 8.8
8 cc 9 9.9
9 cc 7 7.7
10 cc 8 8.8
11 cc 9 9.9
Explode multiple list columns pairs to more rows in Pandas
You can first explode the dataframe with id and words, then the dataframe with id and tags, and then concat the two.
import pandas as pd
df = pd.DataFrame(
{"id":[1,2,3,4],
"words":[['Φ', '20mm'],['Φ', '80mm'], ['EVA'], ['Q345']],
"tags": [['xc', 'PER'], ['xc', 'm'], ['nz'], ['nz']]})
a = df[["id", "words"]].explode("words")
b = df[["id", "tags"]].explode("tags")
pd.concat([a, b], axis=1)
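As an alternative sketch, assuming pandas >= 1.3 (where DataFrame.explode accepts a list of columns), the paired lists can be exploded in one call. This keeps a single id column instead of the duplicated one that the concat produces.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "words": [["Φ", "20mm"], ["Φ", "80mm"], ["EVA"], ["Q345"]],
    "tags": [["xc", "PER"], ["xc", "m"], ["nz"], ["nz"]],
})

# Explode both list columns together; element i of "words" stays
# paired with element i of "tags" within each original row.
paired = df.explode(["words", "tags"], ignore_index=True)
print(paired)
```

This only works when the lists in words and tags have the same length in every row.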
How to unnest (explode) a column in a pandas DataFrame, into multiple rows
I know that object dtype columns make the data hard to convert with pandas functions. When I receive data like this, the first thing that comes to mind is to "flatten" or unnest the columns.
I am using pandas and Python functions for this type of question. If you are worried about the speed of these solutions, check out user3483203's answer, since it uses numpy, and numpy is faster most of the time. I recommend Cython or numba if speed matters.
Method 0 [pandas >= 0.25]
Starting from pandas 0.25, if you only need to explode one column, you can use the pandas.DataFrame.explode function:
df.explode('B')
A B
0 1 1
0 1 2
1 2 1
1 2 2
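A small sketch of what explode does to the index. The input frame is an assumption reconstructed from the outputs in this answer: each exploded row keeps the index label of the row it came from, and reset_index(drop=True) renumbers the rows if that is not wanted.

```python
import pandas as pd

# Assumed input: two rows, each holding the list [1, 2] in column B.
df = pd.DataFrame({"A": [1, 2], "B": [[1, 2], [1, 2]]})

exploded = df.explode("B")
# The original index labels are repeated, one per list element.
print(exploded.index.tolist())

# Renumber the rows 0..n-1 if the repeated labels are not needed.
renumbered = exploded.reset_index(drop=True)
print(renumbered)
```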
Now consider a dataframe with an empty list or a NaN in the column. An empty list will not cause an issue; a NaN can be filled with an empty list so that every cell in the column holds a list.
df = pd.DataFrame({'A': [1, 2, 3, 4],'B': [[1, 2], [1, 2], [], np.nan]})
df.B = df.B.fillna({i: [] for i in df.index}) # replace NaN with []
df.explode('B')
A B
0 1 1
0 1 2
1 2 1
1 2 2
2 3 NaN
3 4 NaN
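As a quick check of how explode itself treats these two cases, the sketch below explodes the same frame without the fillna step. It relies on the documented behaviour of DataFrame.explode (scalars are returned unchanged, empty list-likes become NaN). The reading that the fill matters mostly for the length-based methods that follow is an interpretation, not something stated in the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4],
                   "B": [[1, 2], [1, 2], [], np.nan]})

# No fillna here: the empty list and the scalar NaN each yield one NaN row.
out = df.explode("B")
print(out)
```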
Method 1
apply + pd.Series (easy to understand, but not recommended in terms of performance)
df.set_index('A').B.apply(pd.Series).stack().reset_index(level=0).rename(columns={0:'B'})
Out[463]:
A B
0 1 1
1 1 2
0 2 1
1 2 2
Method 2
Using repeat with the DataFrame constructor, re-create your dataframe (good performance, but not well suited to multiple columns)
df=pd.DataFrame({'A':df.A.repeat(df.B.str.len()),'B':np.concatenate(df.B.values)})
df
Out[465]:
A B
0 1 1
0 1 2
1 2 1
1 2 2
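The repeat + concatenate idea also copes with lists of different lengths, since each value of A is repeated exactly as many times as its list is long. A minimal sketch with a made-up ragged input:

```python
import numpy as np
import pandas as pd

# Hypothetical input with lists of unequal length.
df = pd.DataFrame({"A": [1, 2], "B": [[1, 2], [3]]})

lengths = df["B"].str.len()          # number of elements per row: 2, 1
flat = pd.DataFrame({
    "A": df["A"].repeat(lengths),    # repeat A once per list element
    "B": np.concatenate(df["B"].values),  # flatten all lists into one array
})
print(flat)
```

Note that this sketch assumes every cell in B is a non-missing list; a NaN in B would break the length computation.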
Method 2.1
Suppose that besides A we have A.1, ..., A.n. With Method 2 above it is hard to re-create the columns one by one.
Solution: join or merge on the index after unnesting the single column
s=pd.DataFrame({'B':np.concatenate(df.B.values)},index=df.index.repeat(df.B.str.len()))
s.join(df.drop(columns='B'),how='left')
Out[477]:
B A
0 1 1
0 2 1
1 1 2
1 2 2
If you need the column order exactly the same as before, add reindex at the end.
s.join(df.drop(columns='B'),how='left').reindex(columns=df.columns)
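A self-contained sketch of this join-on-index approach with one extra scalar column. The column name A1 and the sample values are made up for illustration, and drop(columns=...) is used so the snippet also runs on pandas 2.x, where the positional axis argument is no longer accepted.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two scalar columns plus one list column.
df = pd.DataFrame({"A": [1, 2], "A1": ["p", "q"], "B": [[1, 2], [3, 4]]})

# Unnest only B, keeping the original index label for every element.
s = pd.DataFrame({"B": np.concatenate(df["B"].values)},
                 index=df.index.repeat(df["B"].str.len()))

# Join the remaining columns back on the index, then restore column order.
out = s.join(df.drop(columns="B"), how="left").reindex(columns=df.columns)
print(out)
```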
Method 3
recreate the list
pd.DataFrame([[x] + [z] for x, y in df.values for z in y],columns=df.columns)
Out[488]:
A B
0 1 1
1 1 2
2 2 1
3 2 2
If there are more than two columns, use
s=pd.DataFrame([[x] + [z] for x, y in zip(df.index,df.B) for z in y])
s.merge(df,left_on=0,right_index=True)
Out[491]:
0 1 A B
0 0 1 1 [1, 2]
1 0 2 1 [1, 2]
2 1 1 2 [1, 2]
3 1 2 2 [1, 2]
Method 4
using reindex or loc
df.reindex(df.index.repeat(df.B.str.len())).assign(B=np.concatenate(df.B.values))
Out[554]:
A B
0 1 1
0 1 2
1 2 1
1 2 2
#df.loc[df.index.repeat(df.B.str.len())].assign(B=np.concatenate(df.B.values))
Method 5
when the list only contains unique values:
df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]]})
from collections import ChainMap
d = dict(ChainMap(*map(dict.fromkeys, df['B'], df['A'])))
pd.DataFrame(list(d.items()),columns=df.columns[::-1])
Out[574]:
B A
0 1 1
1 2 1
2 3 2
3 4 2
Method 6
using numpy for high performance:
newvalues=np.dstack((np.repeat(df.A.values,list(map(len,df.B.values))),np.concatenate(df.B.values)))
pd.DataFrame(data=newvalues[0],columns=df.columns)
A B
0 1 1
1 1 2
2 2 1
3 2 2
Method 7
using the itertools functions cycle and chain: a pure Python solution, just for fun
from itertools import cycle,chain
l=df.values.tolist()
l1=[list(zip([x[0]], cycle(x[1])) if len([x[0]]) > len(x[1]) else list(zip(cycle([x[0]]), x[1]))) for x in l]
pd.DataFrame(list(chain.from_iterable(l1)),columns=df.columns)
A B
0 1 1
1 1 2
2 2 1
3 2 2
Generalizing to multiple columns
df=pd.DataFrame({'A':[1,2],'B':[[1,2],[3,4]],'C':[[1,2],[3,4]]})
df
Out[592]:
A B C
0 1 [1, 2] [1, 2]
1 2 [3, 4] [3, 4]
User-defined function:
def unnesting(df, explode):
    # Repeat each index label once per element of the first exploded column
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([
        pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    return df1.join(df.drop(columns=explode), how='left')
unnesting(df,['B','C'])
Out[609]:
B C A
0 1 1 1
0 2 2 1
1 3 3 2
1 4 4 2
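The function above takes the row index from the first column in explode only, so it silently assumes that the lists in all exploded columns have matching lengths in each row. A hedged sketch of a pre-check follows; lengths_match is a hypothetical helper, not part of the original answer or of pandas.

```python
import pandas as pd

def lengths_match(df, columns):
    """Return True if, in every row, the lists in `columns` share one length."""
    # One column of list lengths per exploded column.
    lengths = pd.concat([df[c].str.len() for c in columns], axis=1)
    # A row is consistent when it holds exactly one distinct length.
    return bool((lengths.nunique(axis=1) == 1).all())

good = pd.DataFrame({"A": [1, 2], "B": [[1, 2], [3, 4]], "C": [[1, 2], [3, 4]]})
bad = pd.DataFrame({"A": [1, 2], "B": [[1, 2], [3, 4]], "C": [[1], [3, 4]]})

print(lengths_match(good, ["B", "C"]))
print(lengths_match(bad, ["B", "C"]))
```

Running such a check before unnesting turns a confusing misalignment into an explicit, early failure.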
Column-wise Unnesting
All of the above methods perform vertical unnesting (explode). If you need to expand the list horizontally, use the pd.DataFrame constructor
df.join(pd.DataFrame(df.B.tolist(),index=df.index).add_prefix('B_'))
Out[33]:
A B C B_0 B_1
0 1 [1, 2] [1, 2] 1 2
1 2 [3, 4] [3, 4] 3 4
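When the lists differ in length, the same constructor-based expansion pads the shorter rows with missing values. A minimal sketch with a made-up ragged input:

```python
import pandas as pd

# Hypothetical input: the second list is shorter than the first.
df = pd.DataFrame({"A": [1, 2], "B": [[1, 2], [3]]})

# One new column per list position; missing positions are filled with NaN.
wide = df.join(pd.DataFrame(df["B"].tolist(), index=df.index).add_prefix("B_"))
print(wide)
```

The padded column becomes float because NaN cannot be stored in an integer column.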
Updated function
def unnesting(df, explode, axis):
    if axis == 1:
        # axis=1: explode the lists into rows
        idx = df.index.repeat(df[explode[0]].str.len())
        df1 = pd.concat([
            pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
        df1.index = idx
        return df1.join(df.drop(columns=explode), how='left')
    else:
        # otherwise: expand the lists into columns
        df1 = pd.concat([
            pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x) for x in explode], axis=1)
        return df1.join(df.drop(columns=explode), how='left')
Test Output
unnesting(df, ['B','C'], axis=0)
Out[36]:
B0 B1 C0 C1 A
0 1 2 1 2 1
1 3 4 3 4 2
Update 2021-02-17 with original explode function
def unnesting(df, explode, axis):
    if axis == 1:
        # axis=1: explode the lists into rows with the built-in Series.explode
        df1 = pd.concat([df[x].explode() for x in explode], axis=1)
        return df1.join(df.drop(columns=explode), how='left')
    else:
        # otherwise: expand the lists into columns
        df1 = pd.concat([
            pd.DataFrame(df[x].tolist(), index=df.index).add_prefix(x) for x in explode], axis=1)
        return df1.join(df.drop(columns=explode), how='left')
Partially exploding dataframe with nested lists items
Assuming you really have lists of lists, a simple explode on all columns should work:
df.explode(df.columns.to_list())
output:
A B C
0 [1, 2] [5, 6] [9, 10]
0 [3, 4] [7, 8] [11, 12]
used input:
df = pd.DataFrame([[[[1,2],[3,4]], [[5,6],[7,8]], [[9,10],[11,12]]]],
columns=['A', 'B', 'C'])
Unpack 2 columns for list in Dataframe to get its corresponding values to rows
Try:
df_test.explode(['Time', 'Values'])
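A short sketch of what to expect from the multi-column call, assuming pandas >= 1.3. The frames below are made up, since df_test is not shown: the lists must have matching lengths row by row, otherwise explode raises a ValueError.

```python
import pandas as pd

# Hypothetical data with matching list lengths in every row.
ok = pd.DataFrame({"Time": [[1, 2], [3]], "Values": [[10, 20], [30]]})
rows = ok.explode(["Time", "Values"])
print(rows)

# Hypothetical data where the second row has 1 time but 2 values.
bad = pd.DataFrame({"Time": [[1, 2], [3]], "Values": [[10, 20], [30, 40]]})
try:
    bad.explode(["Time", "Values"])
    raised = False
except ValueError:
    # pandas refuses to pair up lists of different lengths.
    raised = True
print(raised)
```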
unnest (explode) multiple list 2.0
Just fix your output by adding ffill
df.set_index('Ban').apply(lambda x: x.apply(pd.Series).stack()).groupby(level=0).ffill().reset_index(drop=True)
Out[794]:
Ban App C D E
0 v1 x1 c2 d1 e1
1 v1 x1 c2 d2 e2
2 v2 x2 c3 d3 e3
3 v2 x2 c4 d4 e4
4 v3 x3 c5 d5 e5
5 v3 x3 c6 d6 e6
6 v4 x4 c7 d7 e7
7 v4 x4 c8 d8 e8