How to Split a Column in R

Split data frame string column into multiple columns

Use stringr::str_split_fixed

library(stringr)
str_split_fixed(before$type, "_and_", 2)
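
The `before` data frame is not shown, so here is a minimal sketch with made-up data (the column values and the names `type_1`/`type_2` are assumptions). It uses base R so it runs without extra packages; the `regmatches()` call splits at the first "_and_" only, which is what `n = 2` does in `str_split_fixed`.

```r
# Hypothetical data; the original `before` is not shown in the answer
before <- data.frame(
  attr = c(1, 30, 4, 6),
  type = c("foo_and_bar", "foo_and_bar_2", "foo_and_bar", "foo_and_bar_2"),
  stringsAsFactors = FALSE
)

# Split each string at the first "_and_" only
# (stringr equivalent: str_split_fixed(before$type, "_and_", 2))
# Assumes every value contains the separator at least once
pieces <- regmatches(before$type,
                     regexpr("_and_", before$type, fixed = TRUE),
                     invert = TRUE)

# Stack the two pieces per row into a matrix and attach them as new columns
split_mat <- do.call(rbind, pieces)
after <- before
after$type_1 <- split_mat[, 1]
after$type_2 <- split_mat[, 2]
after
```

With stringr loaded, the matrix returned by `str_split_fixed` can be attached in the same way.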

How to split a column into multiple (non equal) columns in R

We could use cSplit from splitstackshape

library(splitstackshape)
cSplit(DF, "Col1",",")

Output

   Col1_1 Col1_2 Col1_3 Col1_4
1:      a      b      c   <NA>
2:      a      b   <NA>   <NA>
3:      a      b      c      d
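
The input `DF` is not shown; the data frame below is an assumption chosen to be consistent with the output above. As a rough base R sketch of the same idea (not how `cSplit` is implemented), the pieces can be padded with `NA` to a common length before binding them into columns:

```r
# Hypothetical input consistent with the output shown above
DF <- data.frame(Col1 = c("a,b,c", "a,b", "a,b,c,d"), stringsAsFactors = FALSE)

# Split each string on the comma; the resulting vectors have unequal lengths
parts <- strsplit(DF$Col1, ",", fixed = TRUE)

# Pad every vector with NA up to the length of the longest one
n_max <- max(lengths(parts))
padded <- lapply(parts, function(p) {
  length(p) <- n_max
  p
})

# Stack the padded vectors row by row and name the columns like cSplit does
wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(wide) <- paste0("Col1_", seq_len(n_max))
wide
```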

How to split a dataframe column into two columns

read.table(text = df$X1, sep = ':', fill = TRUE, header = FALSE, dec = '/')
   V1    V2
1  NA      
2 1.0  0.82
3 1.1 1.995
4 0.1 1.146
5  NA      
6 1.1 1.995

If you want each column converted to its appropriate data type:

type.convert(read.table(text = df$X1, sep = ':', fill = TRUE, header = FALSE, dec = '/'), as.is = TRUE)
   V1    V2
1  NA    NA
2 1.0 0.820
3 1.1 1.995
4 0.1 1.146
5  NA    NA
6 1.1 1.995
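
Note that `dec = '/'` makes `read.table` treat the slash as a decimal mark, which is why "1/0" is read as 1.0. If the first field should stay as text instead (an assumption about the goal, not something the question states), a sketch that simply omits `dec` could look like this:

```r
# Same data as used in this answer
df <- structure(list(X1 = c(NA, "1/0:0.82", "1/1:1.995", "0/1:1.146", NA,
                 "1/1:1.995")), class = "data.frame", row.names = c(NA, -6L))

# Without dec = '/', the first field is kept as character ("1/0", "1/1", ...)
# and the second field is parsed as numeric, with NA where it is missing
res <- read.table(text = df$X1, sep = ":", fill = TRUE, header = FALSE,
                  stringsAsFactors = FALSE)
str(res)
```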

Data

df <- structure(list(X1 = c(NA, "1/0:0.82", "1/1:1.995", "0/1:1.146", NA,
                 "1/1:1.995")), class = "data.frame", row.names = c(NA, -6L))

Splitting a single column into multiple columns in R

A possible solution, based on tidyverse:

library(tidyverse)

df %>% 
  filter(table != "_________________________________________________" ) %>% 
  mutate(table = str_trim(table)) %>% 
  separate(table, sep = "\\s+(?=\\d+)", 
     into = c("Characteristic", "Urban", "Rural", "Total"), fill = "right") %>% 
  filter(Characteristic != "") %>% 
  slice(-1) 

#> # A tibble: 54 × 4
#>    Characteristic           Urban Rural Total
#>    <chr>                    <chr> <chr> <chr>
#>  1 Electricity              <NA>  <NA>  <NA> 
#>  2 Yes                      99.8  94.4  98.9 
#>  3 No                       0.2   5.6   1.1  
#>  4 Total                    100.0 100.0 100.0
#>  5 Source of drinking water <NA>  <NA>  <NA> 
#>  6 Piped into residence     97.1  81.4  94.4 
#>  7 Public tap               0.0   0.3   0.1  
#>  8 Well in residence        1.1   3.7   1.6  
#>  9 Public well              0.0   0.4   0.1  
#> 10 Spring                   0.0   2.3   0.4  
#> # … with 44 more rows
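
The key piece is the separator `"\\s+(?=\\d+)"`: it matches a run of whitespace only when a digit follows, so the spaces inside a label such as "Piped into residence" are left alone. A small base R sketch on two sample lines (taken from the output above, with spacing assumed) shows the effect:

```r
# Two sample lines; the exact spacing in the real table is an assumption
lines <- c("Piped into residence     97.1  81.4  94.4",
           "Electricity")

# Split on whitespace that is immediately followed by a digit.
# The lookahead (?=\\d) needs perl = TRUE in base R.
pieces <- strsplit(lines, "\\s+(?=\\d)", perl = TRUE)
pieces
```

A label line without numbers yields a single piece, which is why `fill = "right"` is needed in `separate` to pad the remaining columns with `NA`.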

How to split up a column of a dataframe into new columns in R?

With tidyverse, we could start a new group every time "c" appears in the x column, then pivot the data wider. Duplicate column names are generally discouraged, so I created sequential names (c_1, c_2, ...) instead.

library(tidyverse)

results <- df %>% 
  group_by(idx = cumsum(x == "c")) %>% 
  filter(x != "c") %>% 
  mutate(rn = row_number()) %>% 
  pivot_wider(names_from = idx, values_from = x, names_prefix = "c_") %>% 
  select(-rn)

Output

  c_1   c_2   c_3  
  <chr> <chr> <chr>
1 a     b     d    
2 a     b     d    
3 a     b     d    
4 a     b     d
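
To see how the grouping works, here is a minimal base R sketch. The input vector is an assumption reconstructed from the output above (a "c" marker followed by four values, three times over):

```r
# Hypothetical input consistent with the output shown above
x <- c("c", "a", "a", "a", "a",
       "c", "b", "b", "b", "b",
       "c", "d", "d", "d", "d")

# cumsum() over the logical vector increases by one at every "c",
# so each marker and the values after it share one group id
is_marker <- x == "c"
idx <- cumsum(is_marker)
idx

# Drop the markers and spread each group into its own column
groups <- split(x[!is_marker], idx[!is_marker])
wide <- as.data.frame(groups, stringsAsFactors = FALSE)
names(wide) <- paste0("c_", names(groups))
wide
```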

However, if you really want duplicate names, then we could add on set_names:

purrr::set_names(results, "c")

  c     c     c    
  <chr> <chr> <chr>
1 a     b     d    
2 a     b     d    
3 a     b     d    
4 a     b     d

Or in base R, we could create the grouping with cumsum, split the data frame on those groups, and bind the pieces back together with cbind. Finally, we drop the first row, which holds the "c" markers.

names(df) <- "c"
do.call(cbind, split(df, cumsum(df$c == "c")))[-1,]

#  c c c
#2 a b d
#3 a b d
#4 a b d
#5 a b d

How to split column by some rules in R?

One option is to use ifelse to find the rows that do not contain any item in list_of_keep_data and, in those rows only, replace the " - " delimiter with a placeholder (here " ;> "), leaving the "keep it" rows unchanged. Then separate can split on the new delimiter, which moves the trailing text out of product and into the tags column in one step.

library(tidyverse)

df_in %>%
  mutate(product = ifelse(
    !str_detect(product, str_c(list_of_keep_data, collapse = "|")),
    str_replace_all(product, pattern = " - ", " ;> "),
    product
  )) %>%
  separate(product, into = c("product", "tags"),  sep = " ;> ")

Output

               product tags
1:           Product 1 100g
2:           Product 2 150g
3:           Product 3 <NA>
4: Product 4 - keep it <NA>
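
Neither `df_in` nor `list_of_keep_data` is shown, so the values below are assumptions picked to match the output above. The same rule can also be sketched in base R without a placeholder delimiter, by deciding per row whether to split:

```r
# Hypothetical inputs consistent with the output shown above
df_in <- data.frame(
  product = c("Product 1 - 100g", "Product 2 - 150g",
              "Product 3", "Product 4 - keep it"),
  stringsAsFactors = FALSE
)
list_of_keep_data <- c("keep it")

# Rows that mention a keep item must not be split
keep <- grepl(paste(list_of_keep_data, collapse = "|"), df_in$product)
# Only rows that are not kept and actually contain " - " get split
split_row <- !keep & grepl(" - ", df_in$product, fixed = TRUE)

out <- data.frame(
  # Text before the first " - " for split rows, the full text otherwise
  product = ifelse(split_row, sub(" - .*$", "", df_in$product), df_in$product),
  # Text after the first " - " for split rows, NA otherwise
  tags = ifelse(split_row,
                sub("^.*? - ", "", df_in$product, perl = TRUE),
                NA_character_),
  stringsAsFactors = FALSE
)
out
```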

Another option is to filter to the rows that you do want to split, separate those on " - ", then bind the untouched rows back on.

df_in %>% 
  filter(!str_detect(product, str_c(list_of_keep_data, collapse = "|"))) %>% 
  separate(product, into = c("product", "tags"),  sep = " - ") %>% 
  bind_rows(filter(df_in, str_detect(product, str_c(list_of_keep_data, collapse = "|"))))

Or here is another option using data.table:

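library(data.table)
library(stringr)  # for str_detect() and str_c()

# df_in must be a data.table here, e.g. via setDT(df_in)
df_in[, tags := as.character(tags)
      ][!str_detect(product, str_c(list_of_keep_data, collapse = "|")),
        c("product", "tags") := tstrsplit(product, " - ")][]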

Split one column into two, retain original value if there aren't two values

This is a good use case for dplyr::coalesce, which (akin to the SQL function it's named for) returns the first non-NA element from a set of vectors.

library(dplyr)
library(tidyr)
data %>% 
  separate(GeneLocation, c('Start_Position', 'Stop_Position')) %>%
  mutate(Stop_Position = coalesce(Stop_Position, Start_Position))
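
The `data` object is not shown, so the values below and the "-" separator are assumptions. As a base R sketch of what the `coalesce` step does, the missing second piece is filled in from the first:

```r
# Hypothetical values; the real GeneLocation format is not shown
GeneLocation <- c("100-200", "350", "5000-5100")

# Split on the assumed "-" separator
parts <- strsplit(GeneLocation, "-", fixed = TRUE)

# Take the first and second piece; indexing past the end gives NA
Start_Position <- vapply(parts, function(p) p[1], character(1))
Stop_Position  <- vapply(parts, function(p) p[2], character(1))

# Equivalent of coalesce(Stop_Position, Start_Position):
# fall back to the start where the stop is missing
Stop_Position <- ifelse(is.na(Stop_Position), Start_Position, Stop_Position)

data.frame(Start_Position, Stop_Position, stringsAsFactors = FALSE)
```

Both columns are still character at this point; convert them with as.numeric if positions are needed as numbers.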

How to split a column in multiple columns using data.table

Use tstrsplit with keep = 1:3 to keep only the first three pieces of each split:

dt[, c("bin", "position", "ID") := tstrsplit(name, "_", fixed = TRUE, keep = 1:3)]

                                name  bin  position  ID
1:                bin1_position1_ID1 bin1 position1 ID1
2:                bin2_position2_ID2 bin2 position2 ID2
3:                bin3_position3_ID3 bin3 position3 ID3
4:                bin4_position4_ID4 bin4 position4 ID4
5: bin5_position5_ID5_another5_more5 bin5 position5 ID5
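
For comparison, a rough base R sketch of the same result (not how `tstrsplit` works internally), using two of the rows shown above:

```r
# Two of the names from the output above
name <- c("bin1_position1_ID1", "bin5_position5_ID5_another5_more5")

# Split on the literal underscore
parts <- strsplit(name, "_", fixed = TRUE)

# Keep only the first three pieces of each split, one row per input string
first3 <- t(vapply(parts, function(p) p[1:3], character(3)))
colnames(first3) <- c("bin", "position", "ID")

out <- data.frame(name = name, first3, stringsAsFactors = FALSE)
out
```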