Separate a Column into Multiple Columns Using tidyr::separate with sep=""

tidyr: separate column while retaining delimiter in the first column

You can use tidyr::extract with capture groups.

tidyr::extract(duplicates, sample, c("strain", "sample"), '(.*_)(\\w+)')

#   strain sample
#1    a_1_     b1
#2   a1_2_     b1
#3 a1_c_1_     b2

The same regex can also be used with strcapture in base R:

strcapture('(.*_)(\\w+)', duplicates$sample,
           proto = list(strain = character(), sample = character()))
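
To try the pattern without the original data, here is a minimal sketch. The duplicates data frame is hypothetical, reconstructed from the output shown above. The greedy (.*_) group runs up to the last underscore, which is why the delimiter stays in the first column.

```r
# Hypothetical input, reconstructed from the output shown above
duplicates <- data.frame(sample = c("a_1_b1", "a1_2_b1", "a1_c_1_b2"),
                         stringsAsFactors = FALSE)

# Group 1 is greedy and ends at the last underscore;
# group 2 takes the word characters that follow it
res <- strcapture("(.*_)(\\w+)", duplicates$sample,
                  proto = list(strain = character(), sample = character()))
print(res)
```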

Separate a column into multiple columns using tidyr::separate with sep=""

You could do this with extract from tidyr

library(tidyr)
extract(df, sequence, into=paste0('V', 1:5), '(.)(.)(.)(.)(.)')
#  category V1 V2 V3 V4 V5
#1        X  A  A  T  .  G
#2        Y  C  C  G  -  T

Or create a delimiter with gsub and pass it as sep to separate

library(dplyr)
library(tidyr)
df %>%
  mutate(sequence = gsub('(?<=.)(?=.)', ',', sequence, perl = TRUE)) %>%
  separate(sequence, into = paste0('V', 1:5), sep = ",")
#  category V1 V2 V3 V4 V5
#1        X  A  A  T  .  G
#2        Y  C  C  G  -  T
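
To see what the gsub step produces before separate is called, here is a small base R sketch. The df below is hypothetical, reconstructed from the output above, and the strsplit/cbind part is only a stand-in for separate, not the method used in the answer.

```r
# Hypothetical input, reconstructed from the output shown above
df <- data.frame(category = c("X", "Y"),
                 sequence = c("AAT.G", "CCG-T"),
                 stringsAsFactors = FALSE)

# The lookbehind/lookahead pair matches the empty position between
# any two characters, so a comma is inserted between every pair
delimited <- gsub("(?<=.)(?=.)", ",", df$sequence, perl = TRUE)
print(delimited)

# Base R stand-in for separate(): split on the comma and bind as columns
parts <- do.call(rbind, strsplit(delimited, ",", fixed = TRUE))
colnames(parts) <- paste0("V", 1:5)
result <- cbind(df["category"], parts)
print(result)
```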

Or you can use cSplit

library(splitstackshape)
library(data.table) # setnames() comes from data.table
setnames(cSplit(df, 'sequence', '', stripWhite = FALSE),
         2:6, paste0('V', 1:5))[]
#   category V1 V2 V3 V4 V5
#1:        X  A  A  T  .  G
#2:        Y  C  C  G  -  T

How to use separate in tidyverse to split a column?

We can use the extra argument. Also, by default, sep is interpreted as a regular expression. According to the ?separate documentation:

sep - If character, sep is interpreted as a regular expression. The default value is a regular expression that matches any sequence of non-alphanumeric values.

Here . is a metacharacter that matches any character, so we need to either escape it (\\.) or place it in square brackets ([.]). Also, based on the dput, the column is a list, which should be unnested before calling separate.

library(dplyr)
library(tidyr)
jimma3 %>%
  select(Enterdateofexam2, Enterdayofexam, UniqueKey, MEDICALRECORD) %>%
  unnest(Enterdateofexam2) %>%
  separate(Enterdateofexam2, into = c("day", "month"),
           sep = "\\.", convert = TRUE, extra = "merge") %>%
  na.omit

Output:

# A tibble: 6 x 5
    day month Enterdayofexam UniqueKey MEDICALRECORD
  <int> <int> <chr>          <chr>     <chr>
1     7     6 1              530       577207
2     8     6 2              530       577207
3     9     6 3              530       577207
4     2    12 1              531       575333
5     3    12 2              531       575333
6     4    12 3              531       575333

Basically, with sep = ".", the unescaped dot matches every character, so separate splits at every character, and that is why the warning popped up.
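
The difference between the unescaped and the escaped dot can be checked quickly in base R. This sketch uses strsplit as a stand-in for separate (an assumption for illustration; both interpret a character pattern as a regular expression by default).

```r
x <- "7.06"

# Unescaped dot: the regex matches every character,
# so no "7" / "06" pieces survive the split
unescaped <- strsplit(x, ".")[[1]]

# Escaped dot, bracketed dot, and fixed = TRUE all split on the literal "."
escaped   <- strsplit(x, "\\.")[[1]]
bracketed <- strsplit(x, "[.]")[[1]]
literal   <- strsplit(x, ".", fixed = TRUE)[[1]]

print(unescaped)
print(escaped)
```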

data

jimma3 <- structure(list(Enterdateofexam2 = list(c("", "7.06"), c("", "8.06"
), c("", "9.06"), c("", "2.12"), c("", "3.12"), c("", "4.12")),
Enterdayofexam = c("1", "2", "3", "1", "2", "3"), UniqueKey = c("530",
"530", "530", "531", "531", "531"), MEDICALRECORD = c("577207",
"577207", "577207", "575333", "575333", "575333")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
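
For a smaller view of what unnest and separate do to the list column, here is a sketch on a hypothetical two-row version of the data (the toy tibble and the exam column name are made up for illustration). The empty strings in each list element turn into NA rows, which is why the answer ends with na.omit.

```r
library(tidyr)
library(tibble)

# Hypothetical two-row version of the data above
toy <- tibble(UniqueKey = c("530", "531"),
              exam = list(c("", "7.06"), c("", "2.12")))

# Each list element has two entries, so unnest() yields 2 x 2 = 4 rows
long <- unnest(toy, exam)

# Split on the literal dot; the empty strings give NA (with a warning
# about missing pieces), and na.omit() then drops those rows
out <- separate(long, exam, into = c("day", "month"),
                sep = "\\.", convert = TRUE, extra = "merge")
out <- na.omit(out)
print(out)
```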

Split column at delimiter in data frame

@Taesung Shin is right, but it takes a bit more magic to turn the result into a data.frame.
I added a "x|y" line to avoid ambiguities:

df <- data.frame(ID=11:13, FOO=c('a|b','b|c','x|y'))
foo <- data.frame(do.call('rbind', strsplit(as.character(df$FOO),'|',fixed=TRUE)))

Or, if you want to replace the columns in the existing data.frame:

within(df, FOO<-data.frame(do.call('rbind', strsplit(as.character(FOO), '|', fixed=TRUE))))

Which produces:

  ID FOO.X1 FOO.X2
1 11      a      b
2 12      b      c
3 13      x      y
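
The fixed=TRUE argument matters here because | is the alternation operator in a regular expression. A minimal base R sketch of the difference, using the same FOO values as above:

```r
x <- c("a|b", "b|c", "x|y")

# fixed = TRUE treats "|" as a literal character
literal <- do.call(rbind, strsplit(x, "|", fixed = TRUE))

# Escaping the pipe gives the same result in regex mode
escaped <- do.call(rbind, strsplit(x, "\\|"))

# Without either, "|" is read as regex alternation, so the
# split no longer happens at the pipe alone
unescaped <- strsplit(x, "|")

print(literal)
print(unescaped[[1]])
```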

tidyr: Separate a column into a variable number of columns

You can first reshape the data to long format with separate_rows, then split each entry into separate columns with separate. After that, create a row-number column within each original row and pivot the data to wide format.

library(dplyr)
library(tidyr)

data %>%
  mutate(id = row_number()) %>%
  separate_rows(variables, sep = ',') %>%
  separate(variables, c('question', 'time'), sep = ':') %>%
  group_by(id) %>%
  mutate(time = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = question, values_from = time, names_prefix = 'pos_') %>%
  select(-id)

# A tibble: 3 x 5
#  pos_q1 pos_q2 pos_q3 pos_q4 pos_q5
#   <int>  <int>  <int>  <int>  <int>
#1      1      2      3      4      5
#2      2      1      3      5      4
#3      1      2     NA     NA      3
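
The input is not shown in the answer, so here is a hypothetical data object for experimenting with the first half of the pipeline. The "q:time" values are invented; only the order of the questions in each row is chosen to be consistent with the output above.

```r
library(dplyr)
library(tidyr)

# Hypothetical input: the time values are invented for illustration
data <- tibble(variables = c("q1:10,q2:20,q3:30,q4:40,q5:50",
                             "q2:5,q1:7,q3:9,q5:11,q4:13",
                             "q1:3,q2:4,q5:8"))

# Long format: one row per "question:time" entry
long <- data %>%
  mutate(id = row_number()) %>%
  separate_rows(variables, sep = ",") %>%
  separate(variables, c("question", "time"), sep = ":")

# One row per question per original row: 5 + 5 + 3
print(nrow(long))
print(count(long, id))
```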

How do I separate a string with different (& repeated) separators into multiple columns?

Many good answers already; here is one more variation.

library(dplyr)
library(tidyr)
library(stringr)
# replace all punctuation with a space, then separate
df %>%
  mutate(game = str_replace_all(game, "[:punct:]", " ")) %>%
  separate(col = game, into = c("year", "day", "month", "monthday", "site", "team", "decision", "runs1", "runs2"))
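
The same idea can be sketched in base R, which may help when checking how many pieces a string yields. The game string below is hypothetical (the original data is not shown), and gsub with strsplit is a swapped-in stand-in for str_replace_all and separate.

```r
# Hypothetical game string with mixed, repeated separators
game <- "2016-Fri,Apr.15;home:BOS;W(5-3)"

# Replace every punctuation character with a space
cleaned <- gsub("[[:punct:]]", " ", game)

# Split on runs of whitespace, ignoring leading/trailing blanks
pieces <- strsplit(trimws(cleaned), "[[:space:]]+")[[1]]
names(pieces) <- c("year", "day", "month", "monthday", "site",
                   "team", "decision", "runs1", "runs2")
print(pieces)
```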

