Pandas Expand Rows from List Data Available in Column

Pandas column of lists, create a row for each list element

UPDATE: the solution below was useful for older Pandas versions, where DataFrame.explode() wasn't available. Starting from Pandas 0.25.0 you can simply use DataFrame.explode().

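import numpy as np
import pandas as pd

lst_col = 'samples'

# repeat every scalar column by the length of the list in each row,
# then attach the flattened list column and restore the original column order
r = pd.DataFrame({
      col: np.repeat(df[col].values, df[lst_col].str.len())
      for col in df.columns.drop(lst_col)}
    ).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]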

Result:

In [103]: r
Out[103]:
    samples  subject  trial_num
0      0.10        1          1
1     -0.20        1          1
2      0.05        1          1
3      0.25        1          2
4      1.32        1          2
5     -0.17        1          2
6      0.64        1          3
7     -0.22        1          3
8     -0.71        1          3
9     -0.03        2          1
10    -0.65        2          1
11     0.76        2          1
12     1.77        2          2
13     0.89        2          2
14     0.65        2          2
15    -0.98        2          3
16     0.65        2          3
17    -0.30        2          3
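For a self-contained check, here is a minimal sketch that builds a small hypothetical frame with the same column names (the sample values are made up, not the data above) and compares the repeat/concatenate approach with DataFrame.explode():

```python
import numpy as np
import pandas as pd

# Hypothetical input: a list column plus two scalar columns
df = pd.DataFrame({
    'samples': [[0.10, -0.20, 0.05], [0.25, 1.32]],
    'subject': [1, 1],
    'trial_num': [1, 2],
})
lst_col = 'samples'

# Pre-0.25 approach: repeat every scalar column by the list lengths,
# then attach the flattened list column and restore the column order
lens = df[lst_col].str.len()
r = pd.DataFrame({
    col: np.repeat(df[col].values, lens)
    for col in df.columns.drop(lst_col)
}).assign(**{lst_col: np.concatenate(df[lst_col].values)})[df.columns]

# pandas >= 0.25: explode repeats the original index labels, so reset them,
# and cast the exploded (object dtype) column back to float for comparison
e = df.explode(lst_col).reset_index(drop=True)
e[lst_col] = e[lst_col].astype(float)

print(r)
print(r.equals(e))
```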


UPDATE: some explanation. IMO the easiest way to understand this code is to execute it step by step:

In the following line we repeat each value of one column N times, where N is the length of the corresponding list:

In [10]: np.repeat(df['trial_num'].values, df[lst_col].str.len())
Out[10]: array([1, 1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=int64)
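The same repeat step on a tiny hypothetical column: Series.str.len() also works on list elements (it returns each list's length), and np.repeat accepts one repeat count per element:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one scalar column and one list column
df = pd.DataFrame({'trial_num': [1, 2, 3],
                   'samples': [[0.1, 0.2], [0.3], [0.4, 0.5, 0.6]]})

# .str.len() returns the length of each list: 2, 1, 3
counts = df['samples'].str.len()

# each trial_num value is repeated as many times as its list is long
repeated = np.repeat(df['trial_num'].values, counts)
print(repeated)  # [1 1 2 3 3 3]
```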

This can be generalized to all columns containing scalar values:

In [11]: pd.DataFrame({
    ...:           col:np.repeat(df[col].values, df[lst_col].str.len())
    ...:           for col in df.columns.drop(lst_col)}
    ...:         )
Out[11]:
    trial_num  subject
0           1        1
1           1        1
2           1        1
3           2        1
4           2        1
5           2        1
6           3        1
..        ...      ...
11          1        2
12          2        2
13          2        2
14          2        2
15          3        2
16          3        2
17          3        2

[18 rows x 2 columns]

Using np.concatenate() we can flatten all values in the list column (samples) into a 1D vector:

In [12]: np.concatenate(df[lst_col].values)
Out[12]: array([-1.04, -0.58, -1.32,  0.82, -0.59, -0.34,  0.25,  2.09,  0.12,  0.83, -0.88,  0.68,  0.55, -0.56,  0.65, -0.04,  0.36, -0.31])

Putting it all together:

In [13]: pd.DataFrame({
    ...:           col:np.repeat(df[col].values, df[lst_col].str.len())
    ...:           for col in df.columns.drop(lst_col)}
    ...:         ).assign(**{lst_col:np.concatenate(df[lst_col].values)})
Out[13]:
    trial_num  subject  samples
0           1        1    -1.04
1           1        1    -0.58
2           1        1    -1.32
3           2        1     0.82
4           2        1    -0.59
5           2        1    -0.34
6           3        1     0.25
..        ...      ...      ...
11          1        2     0.68
12          2        2     0.55
13          2        2    -0.56
14          2        2     0.65
15          3        2    -0.04
16          3        2     0.36
17          3        2    -0.31

[18 rows x 3 columns]

Finally, selecting [df.columns] on the result (pd.DataFrame(...)[df.columns]) guarantees that the columns come back in the original order.

Is there any way to split a column of lists into rows in pandas?

Use explode (pandas >= 0.25):

df = df.explode('Column B')
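A runnable sketch using the same two-column example data as the list-comprehension answer; ignore_index=True (pandas >= 1.1) renumbers the result instead of repeating the original index labels:

```python
import pandas as pd

d = {'Column A': {0: 'A', 1: 'B'}, 'Column B': {0: [1, 2, 3], 1: [4, 5, 6]}}
df = pd.DataFrame(d)

# each list element gets its own row; 'Column A' is repeated alongside it
out = df.explode('Column B', ignore_index=True)
print(out)
```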

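Another way, via a list comprehension (this version assumes exactly two columns: the scalar one first, the list one second):

import pandas as pd

d = {'Column A': {0: 'A', 1: 'B'}, 'Column B': {0: [1, 2, 3], 1: [4, 5, 6]}}
df = pd.DataFrame(d)

# one output row [scalar, element] per list element
df = pd.DataFrame([[x, z] for x, y in df.values for z in y], columns=df.columns)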
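The comprehension above unpacks exactly two columns per row. A minimal sketch of a generalization to any number of scalar columns; explode_list_column is a hypothetical helper name, not a pandas API, and 'Column C' is made-up extra data:

```python
import pandas as pd

def explode_list_column(df, lst_col):
    """Hypothetical helper: one output row per list element, copying all other columns."""
    rows = [
        {**rec, lst_col: item}  # overwrite the list with a single element
        for rec in df.to_dict('records')
        for item in rec[lst_col]
    ]
    # keep the original column order
    return pd.DataFrame(rows, columns=df.columns)

df = pd.DataFrame({'Column A': ['A', 'B'], 'Column C': [True, False],
                   'Column B': [[1, 2, 3], [4, 5, 6]]})
res = explode_list_column(df, 'Column B')
print(res)
```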

Pandas: list of lists to expanded rows

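It seems you need to flatten the nested lists first:

from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10

def flatten(coll):
    # recursively yield scalars from nested iterables, treating strings as scalars
    for i in coll:
        if isinstance(i, Iterable) and not isinstance(i, str):
            yield from flatten(i)
        else:
            yield i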

d1['column1'] = d1['column1'].apply(lambda x: list(flatten(x)))
print (d1)
                 column1  column2  column3
0    [ana, bob, 1, 2, 3]       10       44
1   [dona, elf, 4, 5, 6]       20       55
2  [gear, hope, 7, 8, 9]       30       66

And then use your solution:

d2 = (pd.DataFrame(d1.column1.tolist())
        .stack()
        .reset_index(level=1, drop=True)
        .rename('column1'))

d1_d2 = (d1.drop('column1', axis=1)
          .join(d2)
          .reset_index(drop=True)[['column1','column2', 'column3']])

print (d1_d2)
   column1  column2  column3
0      ana       10       44
1      bob       10       44
2        1       10       44
3        2       10       44
4        3       10       44
5     dona       20       55
6      elf       20       55
7        4       20       55
8        5       20       55
9        6       20       55
10    gear       30       66
11    hope       30       66
12       7       30       66
13       8       30       66
14       9       30       66
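On pandas >= 1.1 the stack/join step can be replaced by explode. A minimal sketch; the nested input below is hypothetical, since the original d1 before flattening is not shown:

```python
from collections.abc import Iterable
import pandas as pd

def flatten(coll):
    # recursively yield scalars from nested iterables, treating strings as scalars
    for i in coll:
        if isinstance(i, Iterable) and not isinstance(i, str):
            yield from flatten(i)
        else:
            yield i

# Hypothetical nested input
d1 = pd.DataFrame({
    'column1': [[['ana', 'bob'], [1, 2, 3]],
                [['dona', 'elf'], [4, 5, 6]],
                [['gear', 'hope'], [7, 8, 9]]],
    'column2': [10, 20, 30],
    'column3': [44, 55, 66],
})

# flatten each cell, then let explode do the row expansion
d1['column1'] = d1['column1'].apply(lambda x: list(flatten(x)))
d1_d2 = d1.explode('column1', ignore_index=True)
print(d1_d2)
```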

Expand a column of lists into multiple rows in Pandas

Use np.hstack to stack the lists in column players horizontally and create a new DataFrame:

df1 = pd.DataFrame(np.hstack(df['players']).tolist())

Or use Series.explode (available in pandas >= 0.25):

df1 = pd.DataFrame(df['players'].explode().tolist())

Another option, using itertools.chain as suggested by @cs95:

from itertools import chain

df1 = pd.DataFrame(chain.from_iterable(df['players']))

Result:

print(df1)

      first     last
0       jon  McSmith
1  Jennifer   Foobar
2       dan  Raizman
3     Alden     Lowe
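A self-contained check that the three approaches agree, on hypothetical input shaped like the output above (a players column whose cells are lists of dicts; the team column is made up):

```python
from itertools import chain
import numpy as np
import pandas as pd

# Hypothetical input: each cell holds a list of player dicts
df = pd.DataFrame({'team': ['x', 'y'],
                   'players': [[{'first': 'jon', 'last': 'McSmith'},
                                {'first': 'Jennifer', 'last': 'Foobar'}],
                               [{'first': 'dan', 'last': 'Raizman'},
                                {'first': 'Alden', 'last': 'Lowe'}]]})

# each approach produces one flat sequence of dicts, one dict per output row
via_hstack = pd.DataFrame(np.hstack(df['players']).tolist())
via_explode = pd.DataFrame(df['players'].explode().tolist())
via_chain = pd.DataFrame(chain.from_iterable(df['players']))

print(via_hstack)
```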

Expand lists in a dataframe, but with two columns containing the lists

Apply the one-column solution to each list column, concat the two resulting Series, and finally join them back (this assumes both lists have the same length in every row):

s1 = pd.DataFrame(df.pop('next_d').values.tolist(), 
                   index=df.index).stack().rename('next_d').reset_index(level=2, drop=True)
s2 = pd.DataFrame(df.pop('next_p').values.tolist(), 
                   index=df.index).stack().rename('next_p').reset_index(level=2, drop=True)

df = df.join(pd.concat([s1, s2], axis=1))
print (df)
               begin         end   comp  p_n    next_d  next_p
c_n ml                                                        
1   1234  2013-09-02  2014-12-16  comp1  111   20000.0    0.01
    1234  2013-09-02  2014-12-16  comp1  111   25000.0    0.01
    1234  2013-09-02  2014-12-16  comp1  111   50000.0    0.01
    1235  2013-09-02  2014-12-16  comp2  222   25000.0    0.10
    1235  2013-09-02  2014-12-16  comp2  222   50000.0    0.10
    1235  2013-09-02  2014-12-16  comp2  222   75000.0    0.10
    1235  2013-09-02  2014-12-16  comp2  222  100000.0    0.10
2   1236  2013-09-02  2014-12-16  comp3  333    5000.0    0.10
    1236  2013-09-02  2014-12-16  comp3  333   10000.0    0.10
    1236  2013-09-02  2014-12-16  comp3  333   15000.0    0.10
    1236  2013-09-02  2014-12-16  comp3  333  170000.0    0.10
    1236  2013-09-02  2014-12-16  comp3  333   25000.0    0.10
    1237  2013-09-02  2014-12-16  comp4  444    5000.0    0.01
    1237  2013-09-02  2014-12-16  comp4  444   10000.0    0.01
    1237  2013-09-02  2014-12-16  comp4  444   25000.0    0.01
    1237  2013-09-02  2014-12-16  comp4  444   50000.0    0.01
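On pandas >= 1.3, DataFrame.explode accepts a list of columns and explodes them in lockstep, which replaces the stack/concat/join steps for this case. A minimal sketch on a small hypothetical frame (not the data above); explode raises a ValueError if the lists in a row have different lengths:

```python
import pandas as pd

# Hypothetical frame with two list columns whose lists match in length row by row
df = pd.DataFrame({
    'comp': ['comp1', 'comp2'],
    'next_d': [[20000, 25000], [5000, 10000, 15000]],
    'next_p': [[0.01, 0.01], [0.10, 0.10, 0.10]],
})

# explode both list columns together; scalar columns are repeated
out = df.explode(['next_d', 'next_p'], ignore_index=True)
print(out)
```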

Expand pandas column list of string values into multiple columns

Assuming this example input:

df = pd.DataFrame({'col1': ['X', 'Y'],
                   'col2': [['ABC', 'DEF'],['GHI', 'JLK', 'MNO']]})

#   col1             col2
# 0    X       [ABC, DEF]
# 1    Y  [GHI, JLK, MNO]

You could expand the lists with apply(pd.Series), add a custom prefix with add_prefix, and then join the result back to the original DataFrame without col2:

out = (df.drop(columns=['col2'])
         .join(df['col2'].apply(pd.Series).add_prefix('col2_'))
         .fillna('') # optional
      )

output:

  col1 col2_0 col2_1 col2_2
0    X    ABC    DEF       
1    Y    GHI    JLK    MNO
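An alternative that is typically faster than apply(pd.Series) on large frames is to build the wide frame directly from the lists with the DataFrame constructor. A sketch on the same example input; shorter lists are padded with missing values, which fillna('') then blanks out:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['X', 'Y'],
                   'col2': [['ABC', 'DEF'], ['GHI', 'JLK', 'MNO']]})

# one column per list position; missing positions are padded
wide = pd.DataFrame(df['col2'].tolist(), index=df.index).add_prefix('col2_')

out = df.drop(columns=['col2']).join(wide).fillna('')
print(out)
```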

Pandas expand number of rows

Try reindex, then fill the newly created NaN rows with 0:

df.reindex(range(df.index.min(), df.index.max()+1)).fillna(0)

Output:

          M0      M2   M3
Index
0      121.0  2520.0 -3.0
1        0.0     0.0  0.0
2        0.0     0.0  0.0
3        0.0     0.0  0.0
4      121.0  2521.0 -3.0
5      161.0  2321.0 -2.0
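If the frame may already contain NaN values that should be kept, reindex's fill_value argument fills only the newly created rows, whereas fillna(0) replaces every NaN. A sketch on a hypothetical frame built from the non-zero rows of the output above:

```python
import pandas as pd

# Hypothetical frame with gaps in its integer index
df = pd.DataFrame({'M0': [121.0, 121.0, 161.0],
                   'M2': [2520.0, 2521.0, 2321.0],
                   'M3': [-3.0, -3.0, -2.0]},
                  index=pd.Index([0, 4, 5], name='Index'))

# add the missing labels 1..3 and fill only those new rows with 0
full = df.reindex(range(df.index.min(), df.index.max() + 1), fill_value=0)
print(full)
```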