How to Split Text in a Column into Multiple Rows

How to split text in a column into multiple rows

This splits the Seatblocks by space and gives each its own row.

In [43]: df
Out[43]: 
   CustNum     CustomerName  ItemQty Item                 Seatblocks  ItemExt
0    32363  McCartney, Paul        3  F04               2:218:10:4,6       60
1    31316     Lennon, John       25  F01  1:13:36:1,12 1:13:37:1,13      300

In [44]: s = df['Seatblocks'].str.split(' ').apply(Series, 1).stack()

In [45]: s.index = s.index.droplevel(-1) # to line up with df's index

In [46]: s.name = 'Seatblocks' # needs a name to join

In [47]: s
Out[47]: 
0    2:218:10:4,6
1    1:13:36:1,12
1    1:13:37:1,13
Name: Seatblocks, dtype: object

In [48]: del df['Seatblocks']

In [49]: df.join(s)
Out[49]: 
   CustNum     CustomerName  ItemQty Item  ItemExt    Seatblocks
0    32363  McCartney, Paul        3  F04       60  2:218:10:4,6
1    31316     Lennon, John       25  F01      300  1:13:36:1,12
1    31316     Lennon, John       25  F01      300  1:13:37:1,13

Or, to give each colon-separated string in its own column:

In [50]: df.join(s.apply(lambda x: Series(x.split(':'))))
Out[50]: 
   CustNum     CustomerName  ItemQty Item  ItemExt  0    1   2     3
0    32363  McCartney, Paul        3  F04       60  2  218  10   4,6
1    31316     Lennon, John       25  F01      300  1   13  36  1,12
1    31316     Lennon, John       25  F01      300  1   13  37  1,13

This is a little ugly, but maybe someone will chime in with a prettier solution.

Split cell into multiple rows in pandas dataframe

Here's one way using numpy.repeat and itertools.chain. Conceptually, this is exactly what you want to do: repeat some values, chain others. Recommended for small numbers of columns, otherwise stack based methods may fare better.

import numpy as np
from itertools import chain

# return list from series of comma-separated strings
def chainer(s):
    return list(chain.from_iterable(s.str.split(',')))

# calculate lengths of splits
lens = df['package'].str.split(',').map(len)

# create new dataframe, repeating or chaining as appropriate
res = pd.DataFrame({'order_id': np.repeat(df['order_id'], lens),
                    'order_date': np.repeat(df['order_date'], lens),
                    'package': chainer(df['package']),
                    'package_code': chainer(df['package_code'])})

print(res)

   order_id order_date package package_code
0         1  20/5/2018      p1         #111
0         1  20/5/2018      p2         #222
0         1  20/5/2018      p3         #333
1         3  22/5/2018      p4         #444
2         7  23/5/2018      p5         #555
2         7  23/5/2018      p6         #666

Split (explode) pandas dataframe string entry to separate rows

How about something like this:

In [55]: pd.concat([Series(row['var2'], row['var1'].split(','))              
                    for _, row in a.iterrows()]).reset_index()
Out[55]: 
  index  0
0     a  1
1     b  1
2     c  1
3     d  2
4     e  2
5     f  2

Then you just have to rename the columns

Splitting a column into multiple rows

You can first split Code column on comma , then explode it to get the desired output.

df['Code']=df['Code'].str.split(',')
df=df.explode('Code')

OUTPUT:

  ID  A  B  C  D Code
0  1  a  z  s  m   AB
0  1  a  z  s  m   BC
0  1  a  z  s  m    A
1  2  b  x  d  j   AD
1  2  b  x  d  j   KL
2  3  c  y  w  j   AD
2  3  c  y  w  j   KL
3  4  a  x     h   AB
3  4  a  x     h   BC
4  5  b  y  s  m    A
5  6  b  z  s  h    A
6  7  c  x  s  h    B

If needed, you can replace empty string by NaN

How do I split text in a column into multiple rows?

With your shown samples could you please try following.

import pandas as pd
df=pd.DataFrame({'time':['07:2507:3007:57:21:39','07:1817:2517:5521:23','07:2018:35']})
pd.DataFrame(list(df['time'].str.findall(r'\d{2}:\d{2}')))

Output will be as follows:

       0      1      2      3
0  07:25  07:30  07:57  21:39
1  07:18  17:25  17:55  21:23
2  07:20  18:35   None   None

Split delimited strings in multiple columns and separate them into rows

We may do this in an easier way if we make the delimiter same

library(dplyr)
library(tidyr)
library(stringr)
to_expand %>% 
    mutate(first = str_replace(first, "~", "|")) %>% 
    separate_rows(first, second, sep = "\\|")
# A tibble: 2 x 2
  first second
  <chr> <chr> 
1 a     1~2~3 
2 b     4~5~6

Fast way to split column into multiple rows in Pandas

TBH I think we need a fast built-in way of normalizing elements like this.. although since I've been out of the loop for a bit for all I know there is one by now, and I just don't know it. :-) In the meantime I've been using methods like this:

def create(n):
    df = pd.DataFrame({ 'gene':["foo",
                                "bar // lal",
                                "qux",
                                "woz"], 
                        'cell1':[5,9,1,7], 'cell2':[12,90,13,87]})
    df = df[["gene","cell1","cell2"]]
    df = pd.concat([df]*n)
    df = df.reset_index(drop=True)
    return df

def orig(df):
    s = df["gene"].str.split(' // ').apply(pd.Series,1).stack()
    s.index = s.index.droplevel(-1)
    s.name = "Genes"
    del df["gene"]
    return df.join(s)

def faster(df):
    s = df["gene"].str.split(' // ', expand=True).stack()
    i = s.index.get_level_values(0)
    df2 = df.loc[i].copy()
    df2["gene"] = s.values
    return df2

which gives me

>>> df = create(1)
>>> df
         gene  cell1  cell2
0         foo      5     12
1  bar // lal      9     90
2         qux      1     13
3         woz      7     87
>>> %time orig(df.copy())
CPU times: user 12 ms, sys: 0 ns, total: 12 ms
Wall time: 10.2 ms
   cell1  cell2 Genes
0      5     12   foo
1      9     90   bar
1      9     90   lal
2      1     13   qux
3      7     87   woz
>>> %time faster(df.copy())
CPU times: user 16 ms, sys: 0 ns, total: 16 ms
Wall time: 12.4 ms
  gene  cell1  cell2
0  foo      5     12
1  bar      9     90
1  lal      9     90
2  qux      1     13
3  woz      7     87

for comparable speeds at low sizes, and

>>> df = create(10000)
>>> %timeit z = orig(df.copy())
1 loops, best of 3: 14.2 s per loop
>>> %timeit z = faster(df.copy())
1 loops, best of 3: 231 ms per loop

a 60-fold speedup in the larger case. Note that the only reason I'm using df.copy() here is because orig is destructive.

How to split strings in multiple columns into multiple rows

In your situation, how about the following modified script?

Modified script:

const result = range =>
  range.flatMap(([a, b, c, d, e, f, g, ...v]) => {
    const { vv, len } = v.reduce((o, c) => {
      const t = typeof c != "string" ? c.toString().split(",") : c.split(",");
      o.vv.push(t);
      o.len = o.len < t.length ? t.length : o.len;
      return o;
    }, { vv: [], len: 0 });
    const temp = vv.map(e => e.concat(Array(len - e.length).fill("")));
    return temp[0].map((_, i) => [...(i == 0 ? [a, b, c, d] : Array(4).fill("")), e, f, g, ...temp.map(r => isNaN(r[i].trim()) ? r[i].trim() : r[i].trim() && Number(r[i]))]);
  });

In this modification, the columns "A" to "D" are not processed. And, the columns "E" to "G" are copied for every row. And, the columns after "H" split the values.

Result:

When this script is run, the following result is obtained.

Sample Image

Note:

In this script, your sample Spreadsheet is used. So when your actual Spreadsheet is different from your sample one, the script might not be able to be used. Please be careful about this.

References:

map()
reduce()

How to Split Text in a Column into Multiple Rows