Tidyr Separate Only First N Instances

tidyr separate only first n instances

Set the extra argument to "merge". With this option separate() splits at most length(into) - 1 times, so everything after the last split stays in the final column.

separate(df, V1, c("V1", "V2", "V3", "V4"), extra = "merge")

     V1 V2  V3                             V4
1 Value is the                       best_one
2  This is the prettiest_thing_I've_ever_seen
3  Here is the    next_example_of_what_I_want
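
For a reproducible version, here is a minimal sketch. The input data frame is an assumption, reconstructed from the output above, and sep = "_" is set explicitly so that the apostrophe in the second value can never act as a separator.

```r
library(tidyr)

# Hypothetical input, reconstructed from the printed result
df <- data.frame(
  V1 = c("Value_is_the_best_one",
         "This_is_the_prettiest_thing_I've_ever_seen",
         "Here_is_the_next_example_of_what_I_want"),
  stringsAsFactors = FALSE
)

# Four target columns, so at most three splits; the remainder stays in V4
res <- separate(df, V1, c("V1", "V2", "V3", "V4"), sep = "_", extra = "merge")
print(res)
```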

tidyr separate only last n instances

tidyr::extract can do the job. The regex below anchors the last two underscore-separated pieces to the end of the string and leaves everything before them in the first capture group:

tmp2 %>%
  extract(varTreatName, c("varName", "treatment", "canopyPosition"),
          regex = "(.*)_([^_]+)_([^_]+)$")

yielding the expected result:

  varName treatment canopyPosition
1    resp      Nadd    belowCanopy
2    resp     NPadd    belowCanopy
3 resp_sd      Nadd    belowCanopy
4 resp_sd     NPadd    belowCanopy
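
A self-contained sketch of the same call follows. The contents of tmp2 are an assumption, rebuilt from the result shown above; the call is written as tidyr::extract so it cannot be confused with other functions named extract.

```r
library(tidyr)

# Hypothetical input, reconstructed from the printed result
tmp2 <- data.frame(
  varTreatName = c("resp_Nadd_belowCanopy", "resp_NPadd_belowCanopy",
                   "resp_sd_Nadd_belowCanopy", "resp_sd_NPadd_belowCanopy"),
  stringsAsFactors = FALSE
)

# The greedy (.*) takes everything up to the second-to-last underscore
res <- tidyr::extract(
  tmp2, varTreatName,
  into = c("varName", "treatment", "canopyPosition"),
  regex = "(.*)_([^_]+)_([^_]+)$"
)
print(res)
```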

Applying tidyr separate only to specific rows

Another approach:

# values of 'text' that identify the rows to split
cols_to_split <- c("here_do")

clean_df <- df %>%
  filter(text %in% cols_to_split) %>%
  tidyr::separate(text, into = c("first", "sec"), sep = "_", remove = FALSE) %>%
  bind_rows(filter(df, !text %in% cols_to_split))

#   var_a var_b    text first  sec
# 1     b     7 here_do  here   do
# 2     a    26 foo_bla  <NA> <NA>
# 3     c    23  oh_yes  <NA> <NA>
# 4     d     2     baa  <NA> <NA>
# 5     e    67    land  <NA> <NA>

If the rows that were not split should keep their original text in column 'first', you can use:

clean_df <- df %>%
  filter(text %in% cols_to_split) %>%
  tidyr::separate(text, into = c("first", "sec"), sep = "_", remove = FALSE) %>%
  bind_rows(filter(df, !text %in% cols_to_split)) %>%
  mutate(first = ifelse(is.na(first), as.character(text), first))

#   var_a var_b    text   first  sec
# 1     b     7 here_do    here   do
# 2     a    26 foo_bla foo_bla <NA>
# 3     c    23  oh_yes  oh_yes <NA>
# 4     d     2     baa     baa <NA>
# 5     e    67    land    land <NA>
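
Because bind_rows appends the unsplit rows after the split ones, the row order changes. If the order matters, one possible alternative (not the method above, just a sketch) is to build both columns with mutate and base R's sub. The data frame and the name rows_to_split are hypothetical, chosen to mirror the printed output.

```r
library(dplyr)

# Hypothetical input mirroring the printed output
df <- data.frame(
  var_a = c("a", "b", "c", "d", "e"),
  var_b = c(26, 7, 23, 2, 67),
  text  = c("foo_bla", "here_do", "oh_yes", "baa", "land"),
  stringsAsFactors = FALSE
)
rows_to_split <- c("here_do")

clean_df <- df %>%
  mutate(
    split_me = text %in% rows_to_split,
    # part before the first underscore, or the full text for other rows
    first = ifelse(split_me, sub("_.*$", "", text), text),
    # part after the first underscore, or NA for other rows
    sec = ifelse(split_me, sub("^[^_]*_", "", text), NA_character_)
  ) %>%
  select(-split_me)
print(clean_df)
```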

Specify separator character in separate function from package tidyr

To split only at the first underscore when each value contains two of them, pass a lookahead pattern as sep:

d %>% separate(var, into = c("newcol1", "newcol2"), sep = "_(?=.*_)")

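Here, the regex _(?=.*_) matches an underscore that is followed, somewhere later in the string, by another underscore. In a value such as A_1_a only the first underscore qualifies.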

The result:

# A tibble: 5 x 2
  newcol1 newcol2
  <chr>   <chr>
1 A       1_a
2 B       2_b
3 C       3_c
4 D       4_d
5 E       5_e
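
A runnable sketch of this, assuming the input column held values like A_1_a (inferred from the result; a plain data frame is used here instead of a tibble):

```r
library(tidyr)

# Hypothetical input, inferred from the printed result
d <- data.frame(
  var = c("A_1_a", "B_2_b", "C_3_c", "D_4_d", "E_5_e"),
  stringsAsFactors = FALSE
)

# Split at underscores that still have another underscore after them
res <- separate(d, var, into = c("newcol1", "newcol2"), sep = "_(?=.*_)")
print(res)
```

Note that for values with more than two underscores the pattern matches every underscore except the last, so such values would produce more than two pieces.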

Using regex and tidyr in R to split column variable on first instance of match

Set the extra argument to "merge":

library(tidyr)
df %>% separate(date, c("day", "date"), extra = "merge")

#   game       day  date
# 1    1    Monday Apr 3
# 2    2   Tuesday Apr 4
# 3    3 Wednesday Apr 5
# 4    4  Thursday Apr 6
# 5    5    Friday Apr 7
# 6    6  Saturday Apr 8
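
The call above passes no sep, so separate() falls back to its default, which splits on any run of non-alphanumeric characters. A small sketch of how that plays out, assuming the original strings looked like "Monday, Apr 3" (the exact input format is a guess):

```r
library(tidyr)

# Hypothetical input; the original date strings are not shown
df <- data.frame(
  game = 1:3,
  date = c("Monday, Apr 3", "Tuesday, Apr 4", "Wednesday, Apr 5"),
  stringsAsFactors = FALSE
)

# Default sep treats ", " as a single separator; "merge" stops after one split,
# so the space inside "Apr 3" is left alone
res <- separate(df, date, c("day", "date"), extra = "merge")
print(res)
```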

How to split a dataframe column by the first instance of a character in its values

Another option might be to use tidyr::separate:

separate(x, a, into = c("b", "c"), sep = "_", remove = FALSE, extra = "merge")

Separate column into three columns with grouping

Use the extra argument:

# dummy data
df1 <- data.frame(x = c(
"some name1",
"justOneName",
"some three name",
"Abdullaeva Mehseti Nuraddin Kyzy"))

library(tidyr)
library(dplyr)

df1 %>%
  separate(x, c("a1", "a2", "a3"), extra = "merge")
#            a1      a2            a3
# 1        some   name1          <NA>
# 2 justOneName    <NA>          <NA>
# 3        some   three          name
# 4  Abdullaeva Mehseti Nuraddin Kyzy
# Warning message:
# Too few values at 2 locations: 1, 2

From manual:

extra

If sep is a character vector, this controls what happens when
there are too many pieces. There are three valid options:

- "warn" (the default): emit a warning and drop extra values.

- "drop": drop any extra values without a warning.

- "merge": only splits at most length(into) times
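
To see the options side by side, here is a minimal sketch on a made-up one-row data frame (the value "a_b_c_d" and the column names p1 and p2 are purely illustrative). "warn" is left out because it returns the same values as "drop" and only adds a warning.

```r
library(tidyr)

# Illustrative input: four pieces, but only two target columns
df <- data.frame(x = "a_b_c_d", stringsAsFactors = FALSE)

# "drop": pieces beyond length(into) are discarded silently
dropped <- separate(df, x, c("p1", "p2"), sep = "_", extra = "drop")

# "merge": one split only, the remainder stays in the last column
merged <- separate(df, x, c("p1", "p2"), sep = "_", extra = "merge")

print(dropped)
print(merged)
```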


