Split cell into multiple rows in pandas dataframe
Here's one way using numpy.repeat and itertools.chain. Conceptually, this is exactly what you want to do: repeat some values and chain others. It is recommended for a small number of columns; otherwise, stack-based methods may fare better.
import numpy as np
import pandas as pd
from itertools import chain
# return a flat list from a Series of comma-separated strings
def chainer(s):
    return list(chain.from_iterable(s.str.split(',')))
# calculate lengths of splits
lens = df['package'].str.split(',').map(len)
# create new dataframe, repeating or chaining as appropriate
res = pd.DataFrame({'order_id': np.repeat(df['order_id'], lens),
                    'order_date': np.repeat(df['order_date'], lens),
                    'package': chainer(df['package']),
                    'package_code': chainer(df['package_code'])})
print(res)
order_id order_date package package_code
0 1 20/5/2018 p1 #111
0 1 20/5/2018 p2 #222
0 1 20/5/2018 p3 #333
1 3 22/5/2018 p4 #444
2 7 23/5/2018 p5 #555
2 7 23/5/2018 p6 #666
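On pandas 1.3 or later, DataFrame.explode accepts a list of columns, so the same reshaping can be sketched without NumPy. The input frame below is reconstructed from the printed result above, which is an assumption about the original data. For a multi-column explode, the split lists in each row must have equal lengths.

```python
import pandas as pd

# Input reconstructed from the printed result above (an assumption)
df = pd.DataFrame({'order_id': [1, 3, 7],
                   'order_date': ['20/5/2018', '22/5/2018', '23/5/2018'],
                   'package': ['p1,p2,p3', 'p4', 'p5,p6'],
                   'package_code': ['#111,#222,#333', '#444', '#555,#666']})

# str.split turns each cell into a list, which is what explode expects
split = df.assign(package=df['package'].str.split(','),
                  package_code=df['package_code'].str.split(','))

# Explode both list columns in lockstep (requires pandas >= 1.3)
res = split.explode(['package', 'package_code'])
print(res)
```

The index repeats (0, 0, 0, 1, 2, 2) as in the numpy.repeat version; passing ignore_index=True to explode would give a fresh 0..n-1 index instead.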
Split (explode) pandas dataframe string entry to separate rows
How about something like this:
In [55]: pd.concat([pd.Series(row['var2'], row['var1'].split(','))
                    for _, row in a.iterrows()]).reset_index()
Out[55]:
index 0
0 a 1
1 b 1
2 c 1
3 d 2
4 e 2
5 f 2
Then you just have to rename the columns (index and 0).
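A self-contained sketch of the same idea, with the rename done in the same chain. The frame a is reconstructed from the output above (an assumption), and the new column names var1 and var2 are simply reused from the original columns.

```python
import pandas as pd

# Reconstructed from the output above (an assumption)
a = pd.DataFrame({'var1': ['a,b,c', 'd,e,f'], 'var2': [1, 2]})

# One Series per row: the scalar var2 is broadcast over the split var1 labels
pieces = [pd.Series(row['var2'], index=row['var1'].split(','))
          for _, row in a.iterrows()]

# reset_index yields columns 'index' and 0, which are then renamed
res = (pd.concat(pieces)
         .reset_index()
         .rename(columns={'index': 'var1', 0: 'var2'}))
print(res)
```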
Split cells into multiple rows and make groupby counts in Pandas
You can split both columns on ', ' (a comma followed by a space), then create the product of the resulting lists in each row, and finally count each combination:
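from itertools import product
import pandas as pd
# split both columns into lists on ', '
dt = df[['fruit1','fruit2']].apply(lambda x: x.str.split(', '))
# every (fruit1, fruit2) combination within each row
dt = pd.DataFrame([j for i in dt.to_numpy() for j in product(*i)],
                  columns=['fruit1','fruit2'])
# count how often each combination occurs
df = dt.groupby(['fruit1','fruit2']).size().reset_index(name='counts')
print(df)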
fruit1 fruit2 counts
0 apple apple 1
1 apple organge 1
2 apple others 1
3 dragon fruit dragon fruit 1
4 dragon fruit organge 1
5 dragon fruit watermelon 1
6 organge apple 1
7 organge dragon fruit 1
8 organge organge 2
9 organge watermelon 2
10 others others 1
11 others watermelon 1
12 watermelon dragon fruit 1
13 watermelon organge 1
14 watermelon watermelon 1
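As an alternative to itertools.product, exploding the two list columns one after the other yields the same per-row combinations. The sketch below uses a small hypothetical input (not the asker's data) and assumes pandas 1.1 or later for the ignore_index argument, which gives the intermediate frame a unique index before the second explode.

```python
import pandas as pd

# Hypothetical input: each cell holds ', '-separated fruit names
df = pd.DataFrame({'fruit1': ['apple, orange', 'watermelon', 'orange'],
                   'fruit2': ['orange', 'apple, watermelon', 'orange']})

# Explode fruit1, then fruit2: each original row becomes its Cartesian product
pairs = (df.assign(fruit1=df['fruit1'].str.split(', '),
                   fruit2=df['fruit2'].str.split(', '))
           .explode('fruit1', ignore_index=True)
           .explode('fruit2'))

counts = pairs.groupby(['fruit1', 'fruit2']).size().reset_index(name='counts')
print(counts)
```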
How to split text in a column into multiple rows
This splits the Seatblocks by space and gives each its own row.
In [43]: df
Out[43]:
CustNum CustomerName ItemQty Item Seatblocks ItemExt
0 32363 McCartney, Paul 3 F04 2:218:10:4,6 60
1 31316 Lennon, John 25 F01 1:13:36:1,12 1:13:37:1,13 300
In [44]: s = df['Seatblocks'].str.split(' ').apply(pd.Series).stack()
In [45]: s.index = s.index.droplevel(-1) # to line up with df's index
In [46]: s.name = 'Seatblocks' # needs a name to join
In [47]: s
Out[47]:
0 2:218:10:4,6
1 1:13:36:1,12
1 1:13:37:1,13
Name: Seatblocks, dtype: object
In [48]: del df['Seatblocks']
In [49]: df.join(s)
Out[49]:
CustNum CustomerName ItemQty Item ItemExt Seatblocks
0 32363 McCartney, Paul 3 F04 60 2:218:10:4,6
1 31316 Lennon, John 25 F01 300 1:13:36:1,12
1 31316 Lennon, John 25 F01 300 1:13:37:1,13
Or, to put each colon-separated field in its own column:
In [50]: df.join(s.apply(lambda x: pd.Series(x.split(':'))))
Out[50]:
CustNum CustomerName ItemQty Item ItemExt 0 1 2 3
0 32363 McCartney, Paul 3 F04 60 2 218 10 4,6
1 31316 Lennon, John 25 F01 300 1 13 36 1,12
1 31316 Lennon, John 25 F01 300 1 13 37 1,13
This is a little ugly, but maybe someone will chime in with a prettier solution.
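On pandas 0.25 or later, a somewhat tidier variant replaces apply(pd.Series) plus stack with Series.explode, and the per-field split with str.split(..., expand=True). The two rows are taken from the example above. Because the remaining columns of df have a unique index, the left join repeats each customer row once per seat block.

```python
import pandas as pd

df = pd.DataFrame({'CustNum': [32363, 31316],
                   'CustomerName': ['McCartney, Paul', 'Lennon, John'],
                   'ItemQty': [3, 25],
                   'Item': ['F04', 'F01'],
                   'Seatblocks': ['2:218:10:4,6', '1:13:36:1,12 1:13:37:1,13'],
                   'ItemExt': [60, 300]})

# One seat block per element; explode keeps the original index (0, 1, 1)
# and the Series name 'Seatblocks', which join needs
s = df['Seatblocks'].str.split(' ').explode()

rest = df.drop(columns='Seatblocks')

# One row per seat block
long_form = rest.join(s)

# Or split each block on ':' into columns 0..3 before joining
wide_form = rest.join(s.str.split(':', expand=True))
print(long_form)
print(wide_form)
```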
Split cells in one column by comma into multiple rows in Pandas
Another solution is to extract the column with DataFrame.pop, split it, stack it into a Series, and DataFrame.join the result back to the original:
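s = (df.pop('office_number')              # remove the column from df
       .str.split(',', expand=True)       # one column per comma-separated value
       .stack()                           # stack into a long Series (MultiIndex)
       .reset_index(1, drop=True)         # drop the inner level to align with df
       .rename('office_number'))          # join needs a named Series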
res = df.join(s).reset_index(drop=True)
result = res[['id', 'building_name', 'floor', 'office_number', 'company_name']]
print(result)
id building_name floor office_number company_name
0 1010084420 A 1 101-105 Ariel Resources Ltd.
1 1010084420 A 1 106 A.O. Tatneft
2 1010084420 A 2 201-203
3 1010084420 A 2 205
4 1010084420 A 2 208
5 1010084421 East Tower 10 1001-1005 Agrium Inc.
6 1010084421 East Tower 10 1006 Creo Products Inc.
7 1010084421 East Tower 10 1008 Creo Products Inc.
8 1010084421 East Tower 10 1010 Creo Products Inc.
9 1010084421 West Tower 11 1101-1103 Cott Corp.
10 1010084425 T1 11 1101-1105 Creo Products Inc.
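The same result can be sketched with DataFrame.explode (the ignore_index argument requires pandas 1.1 or later). Because assign overwrites office_number in place, the original column order is kept without reselecting columns. The two input rows are hypothetical, chosen to resemble the output above.

```python
import pandas as pd

# Hypothetical input resembling the output above
df = pd.DataFrame({'id': [1010084420, 1010084421],
                   'building_name': ['A', 'East Tower'],
                   'floor': [2, 10],
                   'office_number': ['201-203,205,208', '1006,1008,1010'],
                   'company_name': [None, 'Creo Products Inc.']})

# Split into lists, then give each office number its own row with a fresh index
res = (df.assign(office_number=df['office_number'].str.split(','))
         .explode('office_number', ignore_index=True))
print(res)
```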