Pandas column of lists, create a row for each list element
UPDATE: the solution below was useful for older Pandas versions, in which DataFrame.explode() was not available. Starting from Pandas 0.25.0 you can simply use DataFrame.explode().
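A minimal sketch of the explode() route, assuming pandas >= 0.25 and a small hypothetical frame shaped like the one in the question (the sample values are made up):

```python
import pandas as pd

# Hypothetical sample data: one list column ('samples') plus scalar columns
df = pd.DataFrame({
    'subject': [1, 2],
    'trial_num': [1, 1],
    'samples': [[0.10, -0.20, 0.05], [-0.03, -0.65, 0.76]],
})

# One row per list element; the scalar columns are repeated for each element.
# reset_index(drop=True) replaces the repeated original index with 0..n-1.
r = df.explode('samples').reset_index(drop=True)
print(r)
```

The exploded column keeps object dtype, so you may want r['samples'].astype(float) afterwards. Unlike the np.repeat approach, explode() also keeps rows whose list is empty, as a single NaN row.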
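import numpy as np
import pandas as pd

lst_col = 'samples'
# repeat every scalar column N times (N = length of the list in that row),
# add the flattened list column, then restore the original column order
r = pd.DataFrame({
        col: np.repeat(df[col].values, df[lst_col].str.len())
        for col in df.columns.drop(lst_col)}
    ).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]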
Result:
In [103]: r
Out[103]:
samples subject trial_num
0 0.10 1 1
1 -0.20 1 1
2 0.05 1 1
3 0.25 1 2
4 1.32 1 2
5 -0.17 1 2
6 0.64 1 3
7 -0.22 1 3
8 -0.71 1 3
9 -0.03 2 1
10 -0.65 2 1
11 0.76 2 1
12 1.77 2 2
13 0.89 2 2
14 0.65 2 2
15 -0.98 2 3
16 0.65 2 3
17 -0.30 2 3
PS: here you may find a slightly more generic solution.

UPDATE: some explanations. IMO the easiest way to understand this code is to execute it step by step.
In the following line we repeat the values of one column N times, where N is the length of the corresponding list:
In [10]: np.repeat(df['trial_num'].values, df[lst_col].str.len())
Out[10]: array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int64)
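The same step in isolation, on a hypothetical three-row frame whose lists have lengths 3, 2 and 0, so the repeat counts are easy to follow:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: list lengths are 3, 2 and 0
df = pd.DataFrame({'trial_num': [1, 2, 3],
                   'samples': [[0.1, 0.2, 0.3], [0.4, 0.5], []]})

counts = df['samples'].str.len()                      # 3, 2, 0
repeated = np.repeat(df['trial_num'].values, counts)
print(repeated)  # [1 1 1 2 2]  (a zero count drops that row entirely)
```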
This can be generalized to all columns containing scalar values:
In [11]: pd.DataFrame({
...: col:np.repeat(df[col].values, df[lst_col].str.len())
...: for col in df.columns.drop(lst_col)}
...: )
Out[11]:
trial_num subject
0 1 1
1 1 1
2 1 1
3 2 1
4 2 1
5 2 1
6 3 1
.. ... ...
11 1 2
12 2 2
13 2 2
14 2 2
15 3 2
16 3 2
17 3 2
[18 rows x 2 columns]
Using np.concatenate() we can flatten all values in the list column (samples) and get a 1D vector:
In [12]: np.concatenate(df[lst_col].values)
Out[12]: array([-1.04, -0.58, -1.32, 0.82, -0.59, -0.34, 0.25, 2.09, 0.12, 0.83, -0.88, 0.68, 0.55, -0.56, 0.65, -0.04, 0.36, -0.31])
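A standalone version of the flattening step, using a hypothetical numeric list column with made-up values. The length of the flat vector equals the sum of the list lengths, which is why it lines up with the np.repeat output of the previous step:

```python
import numpy as np
import pandas as pd

# Hypothetical list column with made-up values; the last list is empty
s = pd.Series([[0.1, 0.2, 0.3], [0.4, 0.5], []])

flat = np.concatenate(s.values)   # one 1-D vector, elements in row order
print(flat)  # [0.1 0.2 0.3 0.4 0.5]
```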
Putting all this together:
In [13]: pd.DataFrame({
...: col:np.repeat(df[col].values, df[lst_col].str.len())
...: for col in df.columns.drop(lst_col)}
...: ).assign(**{lst_col:np.concatenate(df[lst_col].values)})
Out[13]:
trial_num subject samples
0 1 1 -1.04
1 1 1 -0.58
2 1 1 -1.32
3 2 1 0.82
4 2 1 -0.59
5 2 1 -0.34
6 3 1 0.25
.. ... ... ...
11 1 2 0.68
12 2 2 0.55
13 2 2 -0.56
14 2 2 0.65
15 3 2 -0.04
16 3 2 0.36
17 3 2 -0.31
[18 rows x 3 columns]
Using pd.DataFrame(...)[df.columns] guarantees that the columns are selected in the original order.
Is there any way to split the columns of list to rows in pandas
Use explode:
df = df.explode('Column B')
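A small sketch with the sample data from the next answer. By default explode() repeats the original index labels; the ignore_index parameter (assuming pandas >= 1.1) renumbers the result instead:

```python
import pandas as pd

df = pd.DataFrame({'Column A': ['A', 'B'], 'Column B': [[1, 2, 3], [4, 5, 6]]})

kept = df.explode('Column B')                       # index repeats: 0, 0, 0, 1, 1, 1
fresh = df.explode('Column B', ignore_index=True)   # index renumbered 0..5
print(fresh)
```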
Another way, via a list comprehension:
import pandas as pd

d = {'Column A': {0: 'A', 1: 'B'}, 'Column B': {0: [1, 2, 3], 1: [4, 5, 6]}}
df = pd.DataFrame(d)
# pair each scalar x with every element z of its list y
df = pd.DataFrame([[x, z] for x, y in df.values for z in y], columns=df.columns)
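The comprehension above assumes exactly two columns. A sketch of one way to generalize it to any number of scalar columns; the extra 'Column C' is hypothetical, and the list column is moved to the end so each record can be unpacked as (*scalars, items):

```python
import pandas as pd

# Hypothetical frame: two scalar columns plus the list column 'Column B'
df = pd.DataFrame({'Column A': ['A', 'B'],
                   'Column C': [10, 20],
                   'Column B': [[1, 2, 3], [4, 5, 6]]})

lst_col = 'Column B'
others = [c for c in df.columns if c != lst_col]   # every scalar column

# One output row per list element, repeating the scalar values
rows = [(*scalars, item)
        for *scalars, items in df[others + [lst_col]].itertuples(index=False)
        for item in items]
out = pd.DataFrame(rows, columns=others + [lst_col])
print(out)
```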
Pandas: list of lists to expanded rows
It seems you need to flatten the nested lists:
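from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10

def flatten(coll):
    # recursively yield the items of nested iterables, treating strings as scalars
    for i in coll:
        if isinstance(i, Iterable) and not isinstance(i, str):
            for subc in flatten(i):
                yield subc
        else:
            yield i

d1['column1'] = d1['column1'].apply(lambda x: list(flatten(x)))
print (d1)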
column1 column2 column3
0 [ana, bob, 1, 2, 3] 10 44
1 [dona, elf, 4, 5, 6] 20 55
2 [gear, hope, 7, 8, 9] 30 66
And then use your solution:
d2 = (pd.DataFrame(d1.column1.tolist())
        .stack()
        .reset_index(level=1, drop=True)
        .rename('column1'))
d1_d2 = (d1.drop('column1', axis=1)
           .join(d2)
           .reset_index(drop=True)[['column1', 'column2', 'column3']])
print (d1_d2)
column1 column2 column3
0 ana 10 44
1 bob 10 44
2 1 10 44
3 2 10 44
4 3 10 44
5 dona 20 55
6 elf 20 55
7 4 20 55
8 5 20 55
9 6 20 55
10 gear 30 66
11 hope 30 66
12 7 30 66
13 8 30 66
14 9 30 66
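Once column1 holds flat lists, pandas >= 0.25 can replace the stack/join step with explode(). A sketch, assuming d1 looks like the flattened frame printed above:

```python
import pandas as pd

# Assumed input: d1 after flattening, as printed above
d1 = pd.DataFrame({
    'column1': [['ana', 'bob', 1, 2, 3], ['dona', 'elf', 4, 5, 6], ['gear', 'hope', 7, 8, 9]],
    'column2': [10, 20, 30],
    'column3': [44, 55, 66],
})

# explode keeps column positions, so no column reordering is needed afterwards
d1_d2 = d1.explode('column1').reset_index(drop=True)
print(d1_d2)
```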
Expand a column of lists into multiple rows in Pandas
Use np.hstack to stack the lists in column players horizontally and create a new dataframe:
df1 = pd.DataFrame(np.hstack(df['players']).tolist())
Or use Series.explode (available in pandas >= 0.25):
df1 = pd.DataFrame(df['players'].explode().tolist())
Another option, using itertools.chain, as suggested by @cs95:
from itertools import chain
df1 = pd.DataFrame(chain.from_iterable(df['players']))
Result:
print(df1)
first last
0 jon McSmith
1 Jennifer Foobar
2 dan Raizman
3 Alden Lowe
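The question's input is not shown here, so this sketch assumes a hypothetical df whose players cells are lists of dicts with first/last keys, which is consistent with the result above. It checks that the chain and explode variants build the same frame:

```python
from itertools import chain

import pandas as pd

# Hypothetical input: each 'players' cell is a list of dicts
df = pd.DataFrame({'team': ['red', 'blue'],
                   'players': [[{'first': 'jon', 'last': 'McSmith'},
                                {'first': 'Jennifer', 'last': 'Foobar'}],
                               [{'first': 'dan', 'last': 'Raizman'},
                                {'first': 'Alden', 'last': 'Lowe'}]]})

# chain.from_iterable yields the dicts one by one; list() materializes them
df1 = pd.DataFrame(list(chain.from_iterable(df['players'])))
# The same frame via Series.explode (pandas >= 0.25)
df2 = pd.DataFrame(df['players'].explode().tolist())
print(df1)
```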
Expand lists in a dataframe, but with two columns containing the lists
Apply the single-column solution to both list columns to get two Series, concat them side by side, and finally join the result back to the original frame:
s1 = pd.DataFrame(df.pop('next_d').values.tolist(),
index=df.index).stack().rename('next_d').reset_index(level=2, drop=True)
s2 = pd.DataFrame(df.pop('next_p').values.tolist(),
index=df.index).stack().rename('next_p').reset_index(level=2, drop=True)
df = df.join(pd.concat([s1, s2], axis=1))
print (df)
begin end comp p_n next_d next_p
c_n ml
1 1234 2013-09-02 2014-12-16 comp1 111 20000.0 0.01
1234 2013-09-02 2014-12-16 comp1 111 25000.0 0.01
1234 2013-09-02 2014-12-16 comp1 111 50000.0 0.01
1235 2013-09-02 2014-12-16 comp2 222 25000.0 0.10
1235 2013-09-02 2014-12-16 comp2 222 50000.0 0.10
1235 2013-09-02 2014-12-16 comp2 222 75000.0 0.10
1235 2013-09-02 2014-12-16 comp2 222 100000.0 0.10
2 1236 2013-09-02 2014-12-16 comp3 333 5000.0 0.10
1236 2013-09-02 2014-12-16 comp3 333 10000.0 0.10
1236 2013-09-02 2014-12-16 comp3 333 15000.0 0.10
1236 2013-09-02 2014-12-16 comp3 333 170000.0 0.10
1236 2013-09-02 2014-12-16 comp3 333 25000.0 0.10
1237 2013-09-02 2014-12-16 comp4 444 5000.0 0.01
1237 2013-09-02 2014-12-16 comp4 444 10000.0 0.01
1237 2013-09-02 2014-12-16 comp4 444 25000.0 0.01
1237 2013-09-02 2014-12-16 comp4 444 50000.0 0.01
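With pandas >= 1.3, explode() accepts a list of columns and expands them in lockstep. A sketch on a hypothetical, cut-down version of the question's frame (only comp, next_d and next_p are kept, with made-up rows):

```python
import pandas as pd

# Hypothetical data; within each row both list columns have the same length
df = pd.DataFrame({'comp': ['comp1', 'comp2'],
                   'next_d': [[20000, 25000, 50000], [25000, 50000]],
                   'next_p': [[0.01, 0.01, 0.01], [0.10, 0.10]]})

out = df.explode(['next_d', 'next_p'], ignore_index=True)
print(out)
```

If the two lists in a row have different lengths, pandas raises a ValueError instead of padding, whereas the stack/concat approach above aligns them by position.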
Expand pandas column list of string values into multiple columns
Assuming this example input:
df = pd.DataFrame({'col1': ['X', 'Y'],
'col2': [['ABC', 'DEF'],['GHI', 'JLK', 'MNO']]})
# col1 col2
# 0 X [ABC, DEF]
# 1 Y [GHI, JLK, MNO]
You could apply(pd.Series) and add a custom prefix with add_prefix before doing a join with the original dataframe:
out = (df.drop(columns=['col2'])
         .join(df['col2'].apply(pd.Series).add_prefix('col2_'))
         .fillna('')  # optional
      )
Output:
  col1 col2_0 col2_1 col2_2
0 X ABC DEF
1 Y GHI JLK MNO
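A swapped-in alternative, not the answer's own method: build the wide frame directly from the lists with the DataFrame constructor instead of calling pd.Series once per row, which is usually faster on large frames. Shorter lists are padded with None, and fillna('') blanks those out:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['X', 'Y'],
                   'col2': [['ABC', 'DEF'], ['GHI', 'JLK', 'MNO']]})

# Ragged lists become rows of a 2x3 frame; missing cells are None
wide = pd.DataFrame(df['col2'].tolist(), index=df.index).add_prefix('col2_')
out = df.drop(columns=['col2']).join(wide).fillna('')
print(out)
```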
Pandas expand number of rows
Try reindex and fill the new NaN rows with 0:
df.reindex(range(df.index.min(), df.index.max()+1)).fillna(0)
Output:
          M0      M2   M3
Index
0 121.0 2520.0 -3.0
1 0.0 0.0 0.0
2 0.0 0.0 0.0
3 0.0 0.0 0.0
4 121.0 2521.0 -3.0
5 161.0 2321.0 -2.0
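A sketch on hypothetical data rebuilt from the output above (only index 0, 4 and 5 exist). reindex() introduces NaN first, which is why fillna(0) leaves float columns; passing fill_value=0 to reindex() fills the new rows directly and keeps the integer dtype:

```python
import pandas as pd

# Hypothetical input reconstructed from the output above
df = pd.DataFrame({'M0': [121, 121, 161],
                   'M2': [2520, 2521, 2321],
                   'M3': [-3, -3, -2]},
                  index=pd.Index([0, 4, 5], name='Index'))

full = range(df.index.min(), df.index.max() + 1)
out_float = df.reindex(full).fillna(0)      # NaN rows first, so columns become float
out_int = df.reindex(full, fill_value=0)    # new rows filled with 0, integers kept
print(out_int)
```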