Separate a Row of Strings into Separate Rows

Split (explode) pandas dataframe string entry to separate rows

How about something like this:

In [55]: pd.concat([pd.Series(row['var2'], row['var1'].split(','))
    ...:            for _, row in a.iterrows()]).reset_index()
Out[55]:
  index  0
0     a  1
1     b  1
2     c  1
3     d  2
4     e  2
5     f  2

Then you just need to rename the columns (here, index and 0).
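As a rough, self-contained sketch of the whole step including the rename: the input frame a below is an assumption reconstructed from the output shown above, and the target column names var1 and var2 are chosen only for illustration.

```python
import pandas as pd

# Hypothetical input, reconstructed from the output shown above.
a = pd.DataFrame({"var1": ["a,b,c", "d,e,f"], "var2": [1, 2]})

# One Series per original row: the split pieces become the index,
# and the var2 value is repeated for each piece.
pieces = [
    pd.Series(row["var2"], index=row["var1"].split(","))
    for _, row in a.iterrows()
]

# reset_index() yields the columns "index" and 0, which are then renamed.
result = (
    pd.concat(pieces)
    .reset_index()
    .rename(columns={"index": "var1", 0: "var2"})
)
print(result)
```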

Split delimited strings in a column and insert as new rows

Here is another way of doing it, using strsplit in base R:

df <- read.table(textConnection("1|a,b,c\n2|a,c\n3|b,d\n4|e,f"), header = FALSE, sep = "|", stringsAsFactors = FALSE)

df
## V1 V2
## 1 1 a,b,c
## 2 2 a,c
## 3 3 b,d
## 4 4 e,f

s <- strsplit(df$V2, split = ",")
data.frame(V1 = rep(df$V1, sapply(s, length)), V2 = unlist(s))
## V1 V2
## 1 1 a
## 2 1 b
## 3 1 c
## 4 2 a
## 5 2 c
## 6 3 b
## 7 3 d
## 8 4 e
## 9 4 f
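For readers working in pandas rather than R, the same "repeat each key by the number of pieces" idea might look roughly like the sketch below. It is an illustrative translation, not part of the original answer; np.repeat plays the role of rep and the flattened list plays the role of unlist.

```python
import numpy as np
import pandas as pd

# Same data as the R example above.
df = pd.DataFrame({"V1": [1, 2, 3, 4], "V2": ["a,b,c", "a,c", "b,d", "e,f"]})

parts = df["V2"].str.split(",")   # one list of pieces per row
lengths = parts.str.len()         # number of pieces per row

long_df = pd.DataFrame({
    # Repeat each V1 value once per piece, like rep(df$V1, lengths).
    "V1": np.repeat(df["V1"].to_numpy(), lengths.to_numpy()),
    # Flatten the lists of pieces, like unlist(s).
    "V2": [item for sub in parts for item in sub],
})
print(long_df)
```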

Split pandas dataframe string into separate rows

Try explode (available in pandas 0.25 and later):

df=df_input.assign(var2=df_input.var2.str.split('/')).explode('var2')
  var1 var2  var3
0    A    x  abc1
0    A    y  abc1
0    A    z  abc1
1    B   xx  abc2
1    B   yy  abc2
2    c   zz  abcd

Then use groupby and shift to replace var1 with the previous var2 value within each original row:

df.var1=df.groupby(level=0).var2.shift().fillna(df.var1)
df
  var1 var2  var3
0    A    x  abc1
0    x    y  abc1
0    y    z  abc1
1    B   xx  abc2
1   xx   yy  abc2
2    c   zz  abcd

Split delimited strings in multiple columns and separate them into rows

We can do this more easily if we first make the delimiters the same:

library(dplyr)
library(tidyr)
library(stringr)
to_expand %>%
mutate(first = str_replace(first, "~", "|")) %>%
separate_rows(first, second, sep = "\\|")
# A tibble: 2 x 2
first second
<chr> <chr>
1 a 1~2~3
2 b 4~5~6

Split strings into separate rows excluding some pattern matches

We could do this in base R with strsplit, splitting the 'IV' column at each comma while skipping the characters inside parentheses (via the PCRE verbs (*SKIP)(*FAIL)), and then replicating the rows of the data by the lengths of the list that strsplit returns:

lst1 <-  strsplit(df1$IV, "\\([^)]+(*SKIP)(*FAIL)|,\\s*", perl = TRUE)
df2 <- transform(df1[setdiff(names(df1), "IV")][rep(seq_len(nrow(df1)),
lengths(lst1)),], IV = unlist(lst1))[names(df1)]

-output

> df2
Article.Title Sample IV Moderator Mediator DV
1 Random title Sample information Union voice <NA> <NA> Performance
2 Random title Sample information HRM practices (participation, teams, incentives, development, recruitment) <NA> <NA> Performance
3 Random title Sample information implict contracts <NA> <NA> Performance
4 Random title Sample information Crisis impact <NA> <NA> Performance
5 Random title Sample information dominant individual or family owner <NA> <NA> Performance
6 Random title Sample information no dominant individual or family owner <NA> <NA> Performance
7 Random title Sample information market growth <NA> <NA> Performance
8 Random title Sample information no market growth <NA> <NA> Performance

Or use the same regex in separate_rows (as suggested in the comments). Note that this version returns a ninth row with an empty IV:

library(tidyr)
separate_rows(df1, IV, sep = "\\([^)]+(*SKIP)(*FAIL)|,\\s*")

-output

# A tibble: 9 × 6
Article.Title Sample IV Moderator Mediator DV
<chr> <chr> <chr> <chr> <chr> <chr>
1 Random title Sample information "Union voice" <NA> <NA> Performance
2 Random title Sample information "HRM practices (participation, teams, incentives, development, recruitment)" <NA> <NA> Performance
3 Random title Sample information "implict contracts" <NA> <NA> Performance
4 Random title Sample information "Crisis impact" <NA> <NA> Performance
5 Random title Sample information "dominant individual or family owner" <NA> <NA> Performance
6 Random title Sample information "no dominant individual or family owner" <NA> <NA> Performance
7 Random title Sample information "market growth" <NA> <NA> Performance
8 Random title Sample information "no market growth" <NA> <NA> Performance
9 Random title Sample information "" <NA> <NA> Performance
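Python's built-in re module does not support (*SKIP)(*FAIL), so a pandas version needs a different pattern. The sketch below uses a negative lookahead that rejects commas followed by a closing parenthesis before any opening one. It assumes parentheses are never nested, and the input frame is a shortened, hypothetical stand-in for df1, which is not shown in the text. Empty pieces are dropped, which avoids the trailing empty row seen in the separate_rows output.

```python
import re
import pandas as pd

# Hypothetical, shortened stand-in for df1 (note the trailing ", ").
df1 = pd.DataFrame({
    "Article.Title": ["Random title"],
    "IV": ["Union voice, HRM practices (participation, teams, incentives), Crisis impact, "],
    "DV": ["Performance"],
})

# A comma (plus following whitespace) that is NOT inside parentheses:
# the lookahead fails if a ")" appears before the next "(".
pattern = re.compile(r",\s*(?![^()]*\))")

def split_outside_parens(text):
    """Split on top-level commas and drop empty pieces."""
    return [piece for piece in pattern.split(text) if piece.strip()]

df2 = (
    df1.assign(IV=df1["IV"].map(split_outside_parens))
    .explode("IV")
    .reset_index(drop=True)
)
print(df2)
```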

Split pandas dataframe column string with multiple values into separate rows

Here is one way: join the two columns, explode the result, then shift within each original row:

df_input['New']=df_input[['var1','var2']].agg('/'.join,1).str.split('/')
df=df_input.explode('New')
df['New2']=df.groupby(level=0).New.shift(-1)
df=df.dropna(subset=['New2'],axis=0)
df
   var1   var2  var3 New New2
0  A/A1  x/y/z  abc1   A   A1
0  A/A1  x/y/z  abc1  A1    x
0  A/A1  x/y/z  abc1   x    y
0  A/A1  x/y/z  abc1   y    z
1     B  xx/yy  abc2   B   xx
1     B  xx/yy  abc2  xx   yy
2     c     zz  abcd   c   zz
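A self-contained version of these steps could look like the sketch below. The contents of df_input are an assumption reconstructed from the output above, and apply is used for the row-wise join in place of agg purely for explicitness.

```python
import pandas as pd

# Hypothetical input, reconstructed from the output shown above.
df_input = pd.DataFrame({
    "var1": ["A/A1", "B", "c"],
    "var2": ["x/y/z", "xx/yy", "zz"],
    "var3": ["abc1", "abc2", "abcd"],
})

# Join var1 and var2 into one "/"-delimited string per row, then split it.
joined = df_input[["var1", "var2"]].apply("/".join, axis=1)
df_input["New"] = joined.str.split("/")

# One row per piece; the original row label is kept in the index.
df = df_input.explode("New")

# Pair each piece with the next piece from the same original row.
df["New2"] = df.groupby(level=0)["New"].shift(-1)

# The last piece of each row has no successor, so drop it.
df = df.dropna(subset=["New2"])
print(df)
```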

Splitting a string into new rows in R

Try the cSplit function (as you are already using @Ananda's package). Note that it will return a data.table object, so make sure you have that package installed. You can convert back to a data.frame (if you want to) with something like setDF(df2).

library(splitstackshape)
df2 <- cSplit(df1, "Item.Code", sep = "/", direction = "long")
df2
# Country Region Molecule Item.Code
# 1: IND NA PB102 FR206985511
# 2: THAI AP PB103 BA-107603
# 3: THAI AP PB103 F000113361
# 4: THAI AP PB103 107603
# 5: LUXE NA PB105 1012701
# 6: LUXE NA PB105 SGP-1012701
# 7: LUXE NA PB105 F041701000
# 8: IND AP PB106 AU206985211
# 9: IND AP PB106 CA-F206985211
# 10: THAI HP PB107 F034702000
# 11: THAI HP PB107 1010701
# 12: THAI HP PB107 SGP-1010701
# 13: BANG NA PB108 F000007970
# 14: BANG NA PB108 25781
# 15: BANG NA PB108 20009021

Split a string in R into rows and columns

We could use separate_rows to split the column at the whitespace before each run of digits, then use separate to split into two columns at the first space:

library(dplyr)
library(tidyr)
tibble(col1 = rows) %>%
separate_rows(col1, sep="\\s+(?=[0-9])") %>%
separate(col1, into = c("Code", "Item"), extra = 'merge')
# A tibble: 4 x 2
# Code Item
# <chr> <chr>
#1 70150 Markers, Times, Places
#2 72588 Times, Places, Things
#3 51256 Items, Shelves, Cats
#4 99201 Widget, Places, Locations
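The same two-stage split can be sketched in Python for comparison. The input string rows is hypothetical, reconstructed from the output above (the original input is not shown), and it is assumed here to be a single string.

```python
import re
import pandas as pd

# Hypothetical input, reconstructed from the output shown above.
rows = (
    "70150 Markers, Times, Places 72588 Times, Places, Things "
    "51256 Items, Shelves, Cats 99201 Widget, Places, Locations"
)

# Stage 1: split into records at whitespace that precedes a digit.
records = re.split(r"\s+(?=[0-9])", rows)

# Stage 2: split each record at its first whitespace only,
# so the remaining text stays together in the second field.
pairs = [record.split(maxsplit=1) for record in records]

out = pd.DataFrame(pairs, columns=["Code", "Item"])
print(out)
```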

Split one row into multiple rows based on comma-separated string column

Use UNNEST on the array returned by SPLIT:

SELECT a,split_b 
FROM tbl
CROSS JOIN UNNEST(SPLIT(b,',')) AS t (split_b)

