Split (explode) pandas dataframe string entry to separate rows
How about something like this:
In [55]: pd.concat([pd.Series(row['var2'], row['var1'].split(','))
                    for _, row in a.iterrows()]).reset_index()
Out[55]:
index 0
0 a 1
1 b 1
2 c 1
3 d 2
4 e 2
5 f 2
Then you just have to rename the columns.
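To make the row-expansion logic explicit, here is a minimal plain-Python sketch that needs no pandas. The sample rows and the helper name explode_rows are assumptions for illustration (the original frame `a` is not shown; the values are guessed from the output above).

```python
# Hypothetical sample rows guessed from the output above: var1 holds a
# comma-separated string, var2 holds the value to repeat for each piece.
rows = [
    {"var1": "a,b,c", "var2": 1},
    {"var1": "d,e,f", "var2": 2},
]

def explode_rows(rows, column, sep=","):
    """Return one output row per delimited piece of row[column]."""
    out = []
    for row in rows:
        for piece in row[column].split(sep):
            new_row = dict(row)       # copy so the input rows are not mutated
            new_row[column] = piece   # replace the joined string with one piece
            out.append(new_row)
    return out

exploded = explode_rows(rows, "var1")
for r in exploded:
    print(r["var1"], r["var2"])
```

Each input row contributes as many output rows as it has pieces, and every other column is simply repeated.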
Split delimited strings in a column and insert as new rows
Here is another way of doing it:
df <- read.table(textConnection("1|a,b,c\n2|a,c\n3|b,d\n4|e,f"), header = FALSE, sep = "|", stringsAsFactors = FALSE)
df
## V1 V2
## 1 1 a,b,c
## 2 2 a,c
## 3 3 b,d
## 4 4 e,f
s <- strsplit(df$V2, split = ",")
data.frame(V1 = rep(df$V1, sapply(s, length)), V2 = unlist(s))
## V1 V2
## 1 1 a
## 2 1 b
## 3 1 c
## 4 2 a
## 5 2 c
## 6 3 b
## 7 3 d
## 8 4 e
## 9 4 f
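For readers more at home in Python, the same "split, repeat the key by the piece count, then flatten" idea can be sketched with the standard library. This is only an illustration of the logic, not a translation of the R internals; the variable names are made up, and the data is the same as in the R example.

```python
from itertools import chain, repeat

v1 = [1, 2, 3, 4]
v2 = ["a,b,c", "a,c", "b,d", "e,f"]

# Split each string into a list of pieces (the role strsplit plays above).
split_lists = [s.split(",") for s in v2]

# Repeat each key once per piece (the role of rep(V1, lengths)).
v1_long = list(chain.from_iterable(
    repeat(key, len(parts)) for key, parts in zip(v1, split_lists)
))

# Flatten the list of lists (the role of unlist).
v2_long = list(chain.from_iterable(split_lists))

result = list(zip(v1_long, v2_long))
print(result)
```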
Split pandas dataframe string into separate rows
Try with explode
df=df_input.assign(var2=df_input.var2.str.split('/')).explode('var2')
var1 var2 var3
0 A x abc1
0 A y abc1
0 A z abc1
1 B xx abc2
1 B yy abc2
2 c zz abcd
Then groupby + shift:
df.var1=df.groupby(level=0).var2.shift().fillna(df.var1)
df
var1 var2 var3
0 A x abc1
0 x y abc1
0 y z abc1
1 B xx abc2
1 xx yy abc2
2 c zz abcd
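If it is not obvious what the shift().fillna(df.var1) step does, this plain-Python sketch reproduces the same result without pandas. The input rows are an assumption, reconstructed from the exploded output shown above.

```python
# Assumed input, reconstructed from the output above: (var1, var2, var3).
rows = [
    ("A", "x/y/z", "abc1"),
    ("B", "xx/yy", "abc2"),
    ("c", "zz", "abcd"),
]

result = []
for var1, var2, var3 in rows:
    pieces = var2.split("/")
    # Shift the pieces down by one within the row and fill the gap at the
    # top with the original var1; this mirrors shift().fillna(df.var1).
    parents = [var1] + pieces[:-1]
    for parent, piece in zip(parents, pieces):
        result.append((parent, piece, var3))

for line in result:
    print(*line)
```

So every piece after the first gets the previous piece as its var1, which turns each row into a parent/child chain.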
Split delimited strings in multiple columns and separate them into rows
This is easier if we first make the delimiters the same:
library(dplyr)
library(tidyr)
library(stringr)
to_expand %>%
mutate(first = str_replace(first, "~", "|")) %>%
separate_rows(first, second, sep = "\\|")
# A tibble: 2 x 2
first second
<chr> <chr>
1 a 1~2~3
2 b 4~5~6
Split strings into separate rows excluding some pattern matches
We could do this in base R with strsplit, splitting the 'IV' column at each comma while skipping (via (*SKIP)(*FAIL)) the characters inside the parentheses, and then replicating the rows of the data by the lengths of the list created with strsplit:
lst1 <- strsplit(df1$IV, "\\([^)]+(*SKIP)(*FAIL)|,\\s*", perl = TRUE)
df2 <- transform(df1[setdiff(names(df1), "IV")][rep(seq_len(nrow(df1)),
lengths(lst1)),], IV = unlist(lst1))[names(df1)]
-output
> df2
Article.Title Sample IV Moderator Mediator DV
1 Random title Sample information Union voice <NA> <NA> Performance
2 Random title Sample information HRM practices (participation, teams, incentives, development, recruitment) <NA> <NA> Performance
3 Random title Sample information implict contracts <NA> <NA> Performance
4 Random title Sample information Crisis impact <NA> <NA> Performance
5 Random title Sample information dominant individual or family owner <NA> <NA> Performance
6 Random title Sample information no dominant individual or family owner <NA> <NA> Performance
7 Random title Sample information market growth <NA> <NA> Performance
8 Random title Sample information no market growth <NA> <NA> Performance
Or use the same regex in separate_rows (as in the comments):
library(tidyr)
separate_rows(df1, IV, sep = "\\([^)]+(*SKIP)(*FAIL)|,\\s*")
-output
# A tibble: 9 × 6
Article.Title Sample IV Moderator Mediator DV
<chr> <chr> <chr> <chr> <chr> <chr>
1 Random title Sample information "Union voice" <NA> <NA> Performance
2 Random title Sample information "HRM practices (participation, teams, incentives, development, recruitment)" <NA> <NA> Performance
3 Random title Sample information "implict contracts" <NA> <NA> Performance
4 Random title Sample information "Crisis impact" <NA> <NA> Performance
5 Random title Sample information "dominant individual or family owner" <NA> <NA> Performance
6 Random title Sample information "no dominant individual or family owner" <NA> <NA> Performance
7 Random title Sample information "market growth" <NA> <NA> Performance
8 Random title Sample information "no market growth" <NA> <NA> Performance
9 Random title Sample information "" <NA> <NA> Performance
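As a side note for Python users: the standard re module does not support the (*SKIP)(*FAIL) verbs, so the regex above cannot be reused as-is. One possible substitute, sketched below, is a negative lookahead that refuses to split on a comma when a closing parenthesis follows before any opening one. This is a different technique from the one in the answer, and it assumes parentheses are balanced and not nested; the helper name and the sample string are made up for illustration.

```python
import re

# Split on a comma (plus optional whitespace) unless the text that follows
# reaches a ")" before any "(", i.e. unless the comma sits inside parentheses.
OUTSIDE_PARENS_COMMA = re.compile(r",\s*(?![^()]*\))")

def split_outside_parens(text):
    """Split at top-level commas, dropping empty pieces such as the one
    produced by a trailing comma."""
    return [piece for piece in OUTSIDE_PARENS_COMMA.split(text) if piece]

sample = "Union voice, HRM practices (participation, teams, incentives), implict contracts, "
pieces = split_outside_parens(sample)
for p in pieces:
    print(p)
```

Dropping the empty pieces avoids the blank last row that shows up in the separate_rows output above.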
Split pandas dataframe column string with multiple values into separate rows
Here is one way, using join + explode and then shift:
df_input['New']=df_input[['var1','var2']].agg('/'.join,1).str.split('/')
df=df_input.explode('New')
df['New2']=df.groupby(level=0).New.shift(-1)
df=df.dropna(subset=['New2'],axis=0)
df
var1 var2 var3 New New2
0 A/A1 x/y/z abc1 A A1
0 A/A1 x/y/z abc1 A1 x
0 A/A1 x/y/z abc1 x y
0 A/A1 x/y/z abc1 y z
1 B xx/yy abc2 B xx
1 B xx/yy abc2 xx yy
2 c zz abcd c zz
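The New / New2 columns above are just consecutive pairs of the joined tokens. A small plain-Python sketch of that idea, with a hypothetical helper name and no pandas involved:

```python
def consecutive_pairs(var1, var2, sep="/"):
    """Join the tokens of both columns into one chain and pair each token
    with its successor (the effect of explode followed by shift(-1))."""
    tokens = var1.split(sep) + var2.split(sep)
    # zip stops at the shorter input, so the last token, which has no
    # successor, is dropped; that corresponds to the dropna step.
    return list(zip(tokens, tokens[1:]))

print(consecutive_pairs("A/A1", "x/y/z"))
print(consecutive_pairs("B", "xx/yy"))
print(consecutive_pairs("c", "zz"))
```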
Splitting a string into new rows in R
Try the cSplit function (as you are already using @Ananda's package). Note that it will return a data.table object, so make sure you have that package installed. You can convert back to a data.frame (if you want to) by doing something like setDF(df2).
library(splitstackshape)
df2 <- cSplit(df1, "Item.Code", sep = "/", direction = "long")
df2
# Country Region Molecule Item.Code
# 1: IND NA PB102 FR206985511
# 2: THAI AP PB103 BA-107603
# 3: THAI AP PB103 F000113361
# 4: THAI AP PB103 107603
# 5: LUXE NA PB105 1012701
# 6: LUXE NA PB105 SGP-1012701
# 7: LUXE NA PB105 F041701000
# 8: IND AP PB106 AU206985211
# 9: IND AP PB106 CA-F206985211
# 10: THAI HP PB107 F034702000
# 11: THAI HP PB107 1010701
# 12: THAI HP PB107 SGP-1010701
# 13: BANG NA PB108 F000007970
# 14: BANG NA PB108 25781
# 15: BANG NA PB108 20009021
Split a string in R into rows and columns
We could use separate_rows to split the column at the whitespace before each run of digits, then separate into two columns at the first space:
library(dplyr)
library(tidyr)
tibble(col1 = rows) %>%
separate_rows(col1, sep="\\s+(?=[0-9])") %>%
separate(col1, into = c("Code", "Item"), extra = 'merge')
# A tibble: 4 x 2
# Code Item
# <chr> <chr>
#1 70150 Markers, Times, Places
#2 72588 Times, Places, Things
#3 51256 Items, Shelves, Cats
#4 99201 Widget, Places, Locations
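The same two-step split can be sketched in Python with the standard re module: first split at whitespace that is followed by a digit, then split each record once at its first run of whitespace. The input string below is an assumption (the original `rows` object is not shown) built from the first two output rows.

```python
import re

# Assumed input: several "code item, item, item" records run together.
text = "70150 Markers, Times, Places 72588 Times, Places, Things"

# Step 1: split at whitespace followed by a digit (lookahead keeps the digit).
records = re.split(r"\s+(?=[0-9])", text)

# Step 2: split each record once, at the first run of whitespace, into
# (code, item); the rest of the record stays together in item.
pairs = [tuple(record.split(maxsplit=1)) for record in records]

for code, item in pairs:
    print(code, "|", item)
```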
Split one row into multiple rows based on comma-separated string column
Use unnest on the array returned by split.
SELECT a,split_b
FROM tbl
CROSS JOIN UNNEST(SPLIT(b,',')) AS t (split_b)