tidyr: separate column while retaining delimiter in the first column
You can use tidyr::extract with capture groups:
tidyr::extract(duplicates, sample, c("strain", "sample"), '(.*_)(\\w+)')
# strain sample
#1 a_1_ b1
#2 a1_2_ b1
#3 a1_c_1_ b2
The same regex can also be used with strcapture in base R:
strcapture('(.*_)(\\w+)', duplicates$sample,
proto = list(strain = character(), sample = character()))
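The input data is not shown in the answer, so here is a minimal base R sketch on an assumed duplicates data frame whose values were picked to reproduce the output above. It illustrates why the delimiter stays in the first column: (.*_) is greedy and therefore runs up to and including the last underscore.

```r
# Hypothetical input, chosen to match the output shown above
duplicates <- data.frame(sample = c("a_1_b1", "a1_2_b1", "a1_c_1_b2"),
                         stringsAsFactors = FALSE)

# (.*_) is greedy: it captures everything up to and including the last "_"
# (\\w+) captures the word characters that follow that underscore
res <- strcapture("(.*_)(\\w+)", duplicates$sample,
                  proto = list(strain = character(), sample = character()))
print(res)
```

If the delimiter should be dropped instead, moving the underscore outside the first group, as in '(.*)_(\\w+)', should be enough.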
Separate a column into multiple columns using tidyr::separate with sep=
You could do this with extract from tidyr:
library(tidyr)
extract(df, sequence, into=paste0('V', 1:5), '(.)(.)(.)(.)(.)')
# category V1 V2 V3 V4 V5
#1 X A A T . G
#2 Y C C G - T
Or insert a delimiter between the characters with gsub and pass that delimiter to separate as sep:
library(dplyr)
library(tidyr)
df %>%
  mutate(sequence = gsub('(?<=.)(?=.)', ',', sequence, perl = TRUE)) %>%
  separate(sequence, into = paste0('V', 1:5), sep = ",")
# category V1 V2 V3 V4 V5
#1 X A A T . G
#2 Y C C G - T
Or you can use cSplit from splitstackshape (setnames comes from data.table):
library(splitstackshape)
library(data.table)
setnames(cSplit(df, 'sequence', '', stripWhite=FALSE),
2:6, paste0('V', 1:5))[]
# category V1 V2 V3 V4 V5
#1: X A A T . G
#2: Y C C G - T
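For comparison, a package-free sketch of the same split. The df below is reconstructed from the output shown above, and the approach assumes every sequence has the same number of characters; it is a base R alternative, not one of the methods from the answer.

```r
# Data reconstructed from the printed output (an assumption)
df <- data.frame(category = c("X", "Y"),
                 sequence = c("AAT.G", "CCG-T"),
                 stringsAsFactors = FALSE)

# strsplit with an empty pattern splits each string into single characters
chars <- do.call(rbind, strsplit(df$sequence, ""))
colnames(chars) <- paste0("V", seq_len(ncol(chars)))

# Bind the character matrix back onto the category column
out <- data.frame(category = df$category, chars, stringsAsFactors = FALSE)
print(out)
```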
How to use separate in tidyverse to split a column?
We can use the extra argument. Also, by default, sep is interpreted as a regex. According to the ?separate documentation:
sep - If character, sep is interpreted as a regular expression. The default value is a regular expression that matches any sequence of non-alphanumeric values.
Here, . is a metacharacter that matches any character. Therefore, we need to either escape it (\\.) or place it in square brackets ([.]). Also, based on the dput, the column is a list, which should be unnested before calling separate:
library(dplyr)
library(tidyr)
jimma3 %>%
  select(Enterdateofexam2, Enterdayofexam, UniqueKey, MEDICALRECORD) %>%
  unnest(Enterdateofexam2) %>%
  separate(Enterdateofexam2, into = c("day", "month"),
           sep = "\\.", convert = TRUE, extra = "merge") %>%
  na.omit()
Output:
# A tibble: 6 x 5
day month Enterdayofexam UniqueKey MEDICALRECORD
<int> <int> <chr> <chr> <chr>
1 7 6 1 530 577207
2 8 6 2 530 577207
3 9 6 3 530 577207
4 2 12 1 531 575333
5 3 12 2 531 575333
6 4 12 3 531 575333
Basically, with sep = ".", the regex matches every single character, so the string is split at each one, which is why the warning appeared.
Data:
jimma3 <- structure(list(Enterdateofexam2 = list(c("", "7.06"), c("", "8.06"
), c("", "9.06"), c("", "2.12"), c("", "3.12"), c("", "4.12")),
Enterdayofexam = c("1", "2", "3", "1", "2", "3"), UniqueKey = c("530",
"530", "530", "531", "531", "531"), MEDICALRECORD = c("577207",
"577207", "577207", "575333", "575333", "575333")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
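The difference between an unescaped and an escaped dot can be checked without tidyr. This is a small base R sketch using strsplit, which also treats its pattern as a regex by default; the variable names are only illustrative.

```r
x <- "7.06"

# Unescaped: "." matches every character, so only empty pieces remain
regex_dot <- strsplit(x, ".")[[1]]

# Three equivalent ways to split on a literal dot
escaped   <- strsplit(x, "\\.")[[1]]
bracketed <- strsplit(x, "[.]")[[1]]
literal   <- strsplit(x, ".", fixed = TRUE)[[1]]

print(regex_dot)
print(escaped)
```

Note that separate has no fixed argument, so escaping or bracketing is the way to go there.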
Split column at delimiter in data frame
@Taesung Shin is right, but it takes a little more magic to turn the result into a data.frame.
I added an "x|y" row to avoid ambiguities:
df <- data.frame(ID=11:13, FOO=c('a|b','b|c','x|y'))
foo <- data.frame(do.call('rbind', strsplit(as.character(df$FOO),'|',fixed=TRUE)))
Or, if you want to replace the columns in the existing data.frame:
within(df, FOO<-data.frame(do.call('rbind', strsplit(as.character(FOO), '|', fixed=TRUE))))
Which produces:
ID FOO.X1 FOO.X2
1 11 a b
2 12 b c
3 13 x y
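The fixed=TRUE matters because | is the regex alternation operator. A short sketch, reusing the df from above, showing that escaping the pipe gives the same pieces and how the new columns could be named explicitly (FOO_1 and FOO_2 are made-up names for illustration):

```r
df <- data.frame(ID = 11:13, FOO = c("a|b", "b|c", "x|y"),
                 stringsAsFactors = FALSE)

# Literal match versus escaped regex: both split on the pipe character
parts_fixed <- strsplit(as.character(df$FOO), "|", fixed = TRUE)
parts_regex <- strsplit(as.character(df$FOO), "\\|")

# Stack the pieces into a matrix and give the columns explicit names
parts <- do.call(rbind, parts_fixed)
colnames(parts) <- c("FOO_1", "FOO_2")  # hypothetical column names

out <- data.frame(ID = df$ID, parts, stringsAsFactors = FALSE)
print(out)
```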
tidyr: Separate a column into a variable number of columns
You can first get the data into long format with separate_rows, then separate each entry into question and time columns, number the entries within each original row, and finally pivot back to wide format:
library(dplyr)
library(tidyr)
data %>%
  mutate(id = row_number()) %>%
  separate_rows(variables, sep = ',') %>%
  separate(variables, c('question', 'time'), sep = ':') %>%
  group_by(id) %>%
  mutate(time = row_number()) %>%
  ungroup() %>%
  pivot_wider(names_from = question, values_from = time, names_prefix = 'pos_') %>%
  select(-id)
# A tibble: 3 x 5
# pos_q1 pos_q2 pos_q3 pos_q4 pos_q5
# <int> <int> <int> <int> <int>
#1 1 2 3 4 5
#2 2 1 3 5 4
#3 1 2 NA NA 3
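The input is not shown in the answer, so the following base R sketch runs the same idea on made-up data: a character vector where each element lists question:time pairs in the order they were answered. The values and the names variables, positions and wide are assumptions for illustration only.

```r
# Hypothetical input: question:time pairs, comma-separated, per row
variables <- c("q1:10,q2:20,q3:5", "q2:7,q1:3", "q1:1,q3:9")

# Split each row into its pairs (the "long" step)
pairs <- strsplit(variables, ",", fixed = TRUE)

# For each row, record the position at which each question appears
positions <- lapply(pairs, function(p) {
  question <- sub(":.*$", "", p)          # drop the ":time" part
  setNames(seq_along(question), question)
})

# Align every row on the full set of questions (the "wide" step);
# questions missing from a row become NA
all_q <- sort(unique(unlist(lapply(positions, names))))
wide <- t(vapply(positions, function(pos) unname(pos[all_q]),
                 integer(length(all_q))))
colnames(wide) <- paste0("pos_", all_q)
print(wide)
```

As in the tidyr version, rows with fewer questions simply get NA in the missing columns.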
How do I separate a string with different (& repeated) separators into multiple columns?
Many good answers already; here is one more variation:
library(dplyr)
library(tidyr)
library(stringr)
# Replace all punctuation with a space, then separate
df %>%
  mutate(game = str_replace_all(game, "[:punct:]", " ")) %>%
  separate(col = game,
           into = c("year", "day", "month", "monthday", "site",
                    "team", "decision", "runs1", "runs2"))
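Since the original game column is not shown, here is a base R sketch of the same two steps on an invented string with mixed and repeated separators. The string and its layout are assumptions, made up so that it yields the nine fields named above.

```r
# Hypothetical value with several different separators
game <- "2019_Thu.Mar.28@BOS:NYY(W)12-4"

# Step 1: turn every punctuation character into a space
spaced <- gsub("[[:punct:]]", " ", game)

# Step 2: split on runs of spaces
fields <- strsplit(trimws(spaced), " +")[[1]]
names(fields) <- c("year", "day", "month", "monthday", "site",
                   "team", "decision", "runs1", "runs2")
print(fields)
```

Splitting on " +" rather than a single space keeps adjacent separators, such as the ")" followed directly by a digit, from producing empty fields.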