Split Cell into Multiple Rows in Pandas Dataframe

Split cell into multiple rows in pandas dataframe

Here's one way using numpy.repeat and itertools.chain. Conceptually, this is exactly what you want to do: repeat some values and chain others. It is recommended for a small number of columns; otherwise, stack-based methods may fare better.

import numpy as np
import pandas as pd
from itertools import chain

# df has columns order_id, order_date, package and package_code,
# where package and package_code hold comma-separated strings

# return list from series of comma-separated strings
def chainer(s):
    return list(chain.from_iterable(s.str.split(',')))

# calculate lengths of splits
lens = df['package'].str.split(',').map(len)

# create new dataframe, repeating or chaining as appropriate
res = pd.DataFrame({'order_id': np.repeat(df['order_id'], lens),
                    'order_date': np.repeat(df['order_date'], lens),
                    'package': chainer(df['package']),
                    'package_code': chainer(df['package_code'])})

print(res)

order_id order_date package package_code
0 1 20/5/2018 p1 #111
0 1 20/5/2018 p2 #222
0 1 20/5/2018 p3 #333
1 3 22/5/2018 p4 #444
2 7 23/5/2018 p5 #555
2 7 23/5/2018 p6 #666
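To try the approach end to end, here is a self-contained sketch. The input frame is hypothetical, reconstructed from the printed result above. One deliberate change: it repeats plain NumPy arrays and passes df.index.repeat(lens) as the index, instead of repeating the Series themselves, so the result does not rely on pandas aligning several Series that share a duplicated index.

```python
import numpy as np
import pandas as pd
from itertools import chain

# Hypothetical input, reconstructed from the printed result above
df = pd.DataFrame({
    'order_id': [1, 3, 7],
    'order_date': ['20/5/2018', '22/5/2018', '23/5/2018'],
    'package': ['p1,p2,p3', 'p4', 'p5,p6'],
    'package_code': ['#111,#222,#333', '#444', '#555,#666'],
})

def chainer(s):
    """Flatten a Series of comma-separated strings into a single list."""
    return list(chain.from_iterable(s.str.split(',')))

# Number of pieces each row splits into: [3, 1, 2]
lens = df['package'].str.split(',').map(len)

res = pd.DataFrame(
    {
        # repeated columns: each value appears once per piece in its row
        'order_id': np.repeat(df['order_id'].to_numpy(), lens),
        'order_date': np.repeat(df['order_date'].to_numpy(), lens),
        # chained columns: all pieces laid end to end
        'package': chainer(df['package']),
        'package_code': chainer(df['package_code']),
    },
    # keep the original row labels, repeated like the values
    index=df.index.repeat(lens),
)

print(res)
```

This assumes package and package_code split into the same number of pieces on every row; if they do not, the chained lists will not line up with the repeated columns.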

Split (explode) pandas dataframe string entry to separate rows

How about something like this:

In [55]: pd.concat([pd.Series(row['var2'], row['var1'].split(','))
    ...:            for _, row in a.iterrows()]).reset_index()
Out[55]:
index 0
0 a 1
1 b 1
2 c 1
3 d 2
4 e 2
5 f 2

Then you just have to rename the columns.
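A runnable sketch of this approach with the rename step included. The frame a is hypothetical, reconstructed from the Out[55] result above. Because the concatenated Series has an unnamed index and no name of its own, reset_index produces the columns 'index' and 0, which are the two labels being renamed.

```python
import pandas as pd

# Hypothetical input, reconstructed from the Out[55] result above
a = pd.DataFrame({'var1': ['a,b,c', 'd,e,f'], 'var2': [1, 2]})

# One Series per row: the var2 value broadcast over the split var1 labels
res = (pd.concat([pd.Series(row['var2'], row['var1'].split(','))
                  for _, row in a.iterrows()])
       .reset_index()
       # reset_index produced columns named 'index' and 0
       .rename(columns={'index': 'var1', 0: 'var2'}))

print(res)
```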

Split cells into multiple rows and make groupby counts in Pandas

You can split both columns by ', ' (a comma followed by a space), then build the Cartesian product of the resulting lists for each row, and finally count the combinations:

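import pandas as pd
from itertools import product

# split every cell of both columns into a list of fruits
dt = df[['fruit1','fruit2']].apply(lambda x: x.str.split(', '))

# one row for every (fruit1, fruit2) pair within each original row
dt = pd.DataFrame([j for i in dt.to_numpy() for j in product(*i)],
                  columns=['fruit1','fruit2'])

df = dt.groupby(['fruit1','fruit2']).size().reset_index(name='counts')

print(df)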
          fruit1        fruit2  counts
0          apple         apple       1
1          apple       organge       1
2          apple        others       1
3   dragon fruit  dragon fruit       1
4   dragon fruit       organge       1
5   dragon fruit    watermelon       1
6        organge         apple       1
7        organge  dragon fruit       1
8        organge       organge       2
9        organge    watermelon       2
10        others        others       1
11        others    watermelon       1
12    watermelon  dragon fruit       1
13    watermelon       organge       1
14    watermelon    watermelon       1
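A sketch of an alternative that skips itertools.product, assuming pandas 0.25 or later for DataFrame.explode (this swaps in a different technique from the answer). Exploding the first list column and then the second yields, within each original row, every (fruit1, fruit2) pair, which is the same per-row Cartesian product. The input frame is hypothetical, since the answer does not show its data.

```python
import pandas as pd

# Hypothetical input: each cell holds a ', '-separated list of fruits
df = pd.DataFrame({
    'fruit1': ['apple, orange', 'orange'],
    'fruit2': ['orange, watermelon', 'orange, watermelon'],
})

# Turn both columns into lists, then explode them one after the other;
# the second explode pairs every fruit1 value with every fruit2 value of its row
pairs = (df.assign(fruit1=df['fruit1'].str.split(', '),
                   fruit2=df['fruit2'].str.split(', '))
           .explode('fruit1')
           .explode('fruit2'))

counts = pairs.groupby(['fruit1', 'fruit2']).size().reset_index(name='counts')
print(counts)
```

DataFrame.explode works on a temporary default index internally, so the duplicated labels left by the first explode do not get in the way of the second.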

How to split text in a column into multiple rows

This splits the Seatblocks column on spaces and gives each block its own row.

In [43]: df
Out[43]:
   CustNum     CustomerName  ItemQty  Item                 Seatblocks  ItemExt
0    32363  McCartney, Paul        3   F04               2:218:10:4,6       60
1    31316     Lennon, John       25   F01  1:13:36:1,12 1:13:37:1,13      300

In [44]: s = df['Seatblocks'].str.split(' ').apply(pd.Series).stack()

In [45]: s.index = s.index.droplevel(-1) # to line up with df's index

In [46]: s.name = 'Seatblocks' # needs a name to join

In [47]: s
Out[47]:
0 2:218:10:4,6
1 1:13:36:1,12
1 1:13:37:1,13
Name: Seatblocks, dtype: object

In [48]: del df['Seatblocks']

In [49]: df.join(s)
Out[49]:
   CustNum     CustomerName  ItemQty  Item  ItemExt    Seatblocks
0    32363  McCartney, Paul        3   F04       60  2:218:10:4,6
1    31316     Lennon, John       25   F01      300  1:13:36:1,12
1    31316     Lennon, John       25   F01      300  1:13:37:1,13

Or, to put each colon-separated field in its own column:

In [50]: df.join(s.apply(lambda x: pd.Series(x.split(':'))))
Out[50]:
   CustNum     CustomerName  ItemQty  Item  ItemExt  0    1   2     3
0    32363  McCartney, Paul        3   F04       60  2  218  10   4,6
1    31316     Lennon, John       25   F01      300  1   13  36  1,12
1    31316     Lennon, John       25   F01      300  1   13  37  1,13

This is a little ugly, but maybe someone will chime in with a prettier solution.
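With pandas 0.25 or later, Series.explode can stand in for the apply/stack/droplevel steps, because it already repeats the original index labels and keeps the Series name. The sketch below swaps in explode (it is not the answer's method) and uses the two rows from the In [43] display as input.

```python
import pandas as pd

# Input copied from the In [43] display above
df = pd.DataFrame({
    'CustNum': [32363, 31316],
    'CustomerName': ['McCartney, Paul', 'Lennon, John'],
    'ItemQty': [3, 25],
    'Item': ['F04', 'F01'],
    'Seatblocks': ['2:218:10:4,6', '1:13:36:1,12 1:13:37:1,13'],
    'ItemExt': [60, 300],
})

# One element per space-separated block; index labels repeat (0, 1, 1)
# and the Series keeps the name 'Seatblocks', so it can be joined directly
blocks = df['Seatblocks'].str.split(' ').explode()

rest = df.drop(columns='Seatblocks')

# One row per seat block, like df.join(s) above
per_block = rest.join(blocks)

# Or one column per colon-separated field, like In [50] above
per_field = rest.join(blocks.str.split(':', expand=True))

print(per_block)
print(per_field)
```

Joining onto rest, whose index is unique, gives the one-to-many match wanted here; joining two frames that both carry the duplicated labels would instead multiply the rows.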

Split cells in one column by comma into multiple rows in Pandas

Another solution is to extract the column with DataFrame.pop, split it, stack the pieces into a Series, and DataFrame.join that Series back to the original:

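# pop removes office_number from df and returns it as a Series
s = (df.pop('office_number')
     .str.split(',', expand=True)   # one column per comma-separated value
     .stack()                        # back to one Series, keyed by (row, piece)
     .reset_index(1, drop=True)      # keep only the original row label
     .rename('office_number'))       # join needs a named Series

# join repeats each original row once per office number
res = df.join(s).reset_index(drop=True)
# pop moved the column to the end, so restore the original column order
result = res[['id', 'building_name', 'floor', 'office_number', 'company_name']]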

print(result)
            id  building_name  floor  office_number          company_name
0   1010084420              A      1        101-105  Ariel Resources Ltd.
1   1010084420              A      1            106          A.O. Tatneft
2   1010084420              A      2        201-203
3   1010084420              A      2            205
4   1010084420              A      2            208
5   1010084421     East Tower     10      1001-1005           Agrium Inc.
6   1010084421     East Tower     10           1006    Creo Products Inc.
7   1010084421     East Tower     10           1008    Creo Products Inc.
8   1010084421     East Tower     10           1010    Creo Products Inc.
9   1010084421     West Tower     11      1101-1103            Cott Corp.
10  1010084425             T1     11      1101-1105    Creo Products Inc.
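A sketch of the same reshaping with DataFrame.explode instead of the pop/stack/join chain (a swapped-in technique; ignore_index needs pandas 1.1 or later). Because assign overwrites office_number in place, the column keeps its position and no reordering step is needed, and ignore_index=True takes the place of reset_index(drop=True). The two input rows are hypothetical, chosen only to be consistent with rows 6 to 8 and row 10 of the output above.

```python
import pandas as pd

# Hypothetical input rows (not the original data)
df = pd.DataFrame({
    'id': [1010084421, 1010084425],
    'building_name': ['East Tower', 'T1'],
    'floor': [10, 11],
    'office_number': ['1006,1008,1010', '1101-1105'],
    'company_name': ['Creo Products Inc.', 'Creo Products Inc.'],
})

# Replace the column with lists, then give each list element its own row
result = (df.assign(office_number=df['office_number'].str.split(','))
            .explode('office_number', ignore_index=True))

print(result)
```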
