Tidyr Separate Only First N Instances

tidyr separate only first n instances

Use the extra argument with the "merge" option. The string is then split into at most as many pieces as there are new columns, and whatever remains stays in the last column.

separate(df, V1, c("V1", "V2", "V3", "V4"), extra = "merge")

     V1 V2  V3                             V4
1 Value is the                       best_one
2  This is the prettiest_thing_I've_ever_seen
3  Here is the    next_example_of_what_I_want
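
For a self-contained run, here is a minimal sketch. The input data frame is an assumption, reconstructed from the output above, and sep = "_" is set explicitly rather than relying on the default separator.

```r
library(tidyr)

# Hypothetical input, reconstructed from the output shown above.
df <- data.frame(
  V1 = c("Value_is_the_best_one",
         "This_is_the_prettiest_thing_I've_ever_seen",
         "Here_is_the_next_example_of_what_I_want"),
  stringsAsFactors = FALSE
)

# Four target columns, so each string yields at most four pieces;
# everything after the third underscore stays together in V4.
res <- separate(df, V1, c("V1", "V2", "V3", "V4"), sep = "_", extra = "merge")
res
```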

tidyr separate only last n instances

While looking through similar, already answered questions, I found tidyr::extract, which does the job:

tmp2 %>% extract(
  "varTreatName",
  c("varName", "treatment", "canopyPosition"),
  regex = "(.*)_([^_]+)_([^_]+)$"
)

yielding the expected result:

  varName treatment canopyPosition
1    resp      Nadd    belowCanopy
2    resp     NPadd    belowCanopy
3 resp_sd      Nadd    belowCanopy
4 resp_sd     NPadd    belowCanopy
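
To see why the regex keeps only the last two fields separate, the capture groups can be inspected with base R's sub(). This is a sketch, not part of the original answer; the input vector is an assumption reconstructed from the result above.

```r
# Hypothetical input values, reconstructed from the result shown above.
x <- c("resp_Nadd_belowCanopy", "resp_NPadd_belowCanopy",
       "resp_sd_Nadd_belowCanopy", "resp_sd_NPadd_belowCanopy")

pattern <- "(.*)_([^_]+)_([^_]+)$"

# The greedy (.*) takes as much as it can, leaving exactly two
# underscore-free fields for groups 2 and 3 at the end of the string.
parts <- data.frame(
  varName        = sub(pattern, "\\1", x),
  treatment      = sub(pattern, "\\2", x),
  canopyPosition = sub(pattern, "\\3", x),
  stringsAsFactors = FALSE
)
parts
```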

Applying tidyr separate only to specific rows

Another approach:

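# values of 'text' whose rows should be split
cols_to_split <- c("here_do")

clean_df <- df %>% 
  filter(text %in% cols_to_split) %>%                  # rows to split
  tidyr::separate(text, into = c("first", "sec"), sep = "_", remove = FALSE) %>% 
  bind_rows(filter(df, !text %in% cols_to_split))      # append the untouched rows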

#  var_a var_b    text first  sec
#1     b     7 here_do  here   do
#2     a    26 foo_bla  <NA> <NA>
#3     c    23  oh_yes  <NA> <NA>
#4     d     2     baa  <NA> <NA>
#5     e    67    land  <NA> <NA>

If the rows that were not split should keep their original text in column 'first', use:

clean_df <- df %>% 
  filter(text %in% cols_to_split) %>% 
  tidyr::separate(text, into = c("first", "sec"), sep = "_", remove = FALSE) %>% 
  bind_rows(filter(df, !text %in% cols_to_split)) %>% 
  mutate(first = ifelse(is.na(first), as.character(text), first))

#  var_a var_b    text   first  sec
#1     b     7 here_do    here   do
#2     a    26 foo_bla foo_bla <NA>
#3     c    23  oh_yes  oh_yes <NA>
#4     d     2     baa     baa <NA>
#5     e    67    land    land <NA>

Specify separator character in separate function from package tidyr

Here is a way to solve the problem.

d %>% separate(var, into = c("newcol1", "newcol2"), sep = "_(?=.*_)")

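Here, the regex _(?=.*_) means: an underscore that is followed, somewhere later in the string, by another underscore. The last underscore never matches, so it stays inside newcol2.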

The result:

# A tibble: 5 x 2
  newcol1 newcol2
  <chr>   <chr>  
1 A       1_a    
2 B       2_b    
3 C       3_c    
4 D       4_d    
5 E       5_e
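
The lookahead can also be checked outside tidyr with base R's strsplit() and perl = TRUE. This is only a sketch; the input vector is an assumption, reconstructed from the result above.

```r
# Hypothetical input values, reconstructed from the result shown above.
x <- c("A_1_a", "B_2_b", "C_3_c", "D_4_d", "E_5_e")

# Split only at an underscore that has another underscore after it.
pieces <- strsplit(x, "_(?=.*_)", perl = TRUE)
pieces[[1]]
```

Note that a value with three or more underscores would be split at every underscore except the last, so this pattern suits values with exactly two.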

Using regex and tidyr in R to split column variable on first instance of match

Set the extra argument to "merge":

library(tidyr)
df %>% separate(date, c("day", "date"), extra = "merge")

#  game       day  date
#1    1    Monday Apr 3
#2    2   Tuesday Apr 4
#3    3 Wednesday Apr 5
#4    4  Thursday Apr 6
#5    5    Friday Apr 7
#6    6  Saturday Apr 8

How to split a dataframe column by the first instance of a character in its values

Another option might be to use tidyr::separate:

separate(x, a, into = c("b", "c"), sep = "_", remove = FALSE, extra = "merge")

Separate column into three columns with grouping

Use the extra argument:

# dummy data
df1 <- data.frame(x = c(
  "some name1",
  "justOneName",
  "some three name",
  "Abdullaeva Mehseti Nuraddin Kyzy"))

library(tidyr)
library(dplyr)

df1 %>% 
  separate(x, c("a1", "a2", "a3"), extra = "merge")
#            a1      a2            a3
# 1        some   name1          <NA>
# 2 justOneName    <NA>          <NA>
# 3        some   three          name
# 4  Abdullaeva Mehseti Nuraddin Kyzy
# Warning message:
#   Too few values at 2 locations: 1, 2

From the manual:

extra

If sep is a character vector, this controls what happens when
there are too many pieces. There are three valid options:

- "warn" (the default): emit a warning and drop extra values.

- "drop": drop any extra values without a warning.

- "merge": only splits at most length(into) times
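
The difference between "drop" and "merge" can be seen side by side in a small sketch. The data frames and column names below are made up for illustration; the last call also shows the related fill argument, which handles rows with too few pieces.

```r
library(tidyr)

# Hypothetical one-row input with more pieces than target columns.
d <- data.frame(x = "a b c d", stringsAsFactors = FALSE)

# "drop": keep the first two pieces, silently discard the rest.
dropped <- separate(d, x, c("p1", "p2"), sep = " ", extra = "drop")

# "merge": stop splitting once two pieces exist; the rest stays in p2.
merged <- separate(d, x, c("p1", "p2"), sep = " ", extra = "merge")

# Too few pieces: fill = "right" pads with NA on the right without a warning.
short <- data.frame(x = "a", stringsAsFactors = FALSE)
filled <- separate(short, x, c("p1", "p2"), sep = " ", fill = "right")

dropped$p2
merged$p2
filled$p2
```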
