R: Split Variable Column into Multiple (Unbalanced) Columns by Comma

How to split a column into multiple (non equal) columns in R

We could use cSplit from splitstackshape

library(splitstackshape)
cSplit(DF, "Col1",",")

-output

cSplit(DF, "Col1",",")
   Col1_1 Col1_2 Col1_3 Col1_4
1:      a      b      c   <NA>
2:      a      b   <NA>   <NA>
3:      a      b      c      d

R: Split Variable Column into multiple (unbalanced) columns by comma

From Ananda's splitstackshape package:

cSplit(df, "Events", sep=",")
#    Name Age Number First      Events_1 Events_2 Events_3 Events_4
#1: Karen  24      8     0  Triathlon/IM Marathon      10k       5k
#2:  Kurt  39      2     0 Half-Marathon      10k       NA       NA
#3: Leah   18      0     1            NA       NA       NA       NA

Or with tidyr:

separate(df, 'Events', paste("Events", 1:4, sep="_"), sep=",", extra="drop")
#   Name Age Number               Events_1 Events_2 Events_3 Events_4 First
#1 Karen  24      8           Triathlon/IM Marathon      10k       5k     0
#2  Kurt  39      2          Half-Marathon      10k     <NA>     <NA>     0
#3 Leah   18      0                     NA     <NA>     <NA>     <NA>     1

With the data.table package:

setDT(df)[,paste0("Events_", 1:4) := tstrsplit(Events, ",")][,-"Events", with=F]
#    Name Age Number First               Events_1 Events_2 Events_3 Events_4
#1: Karen  24      8     0           Triathlon/IM Marathon      10k       5k
#2:  Kurt  39      2     0          Half-Marathon      10k       NA       NA
#3: Leah   18      0     1                     NA       NA       NA       NA

Data

df <- structure(list(Name = structure(1:3, .Label = c("Karen", "Kurt", 
"Leah "), class = "factor"), Age = c(24L, 39L, 18L), Number = c(8L, 
2L, 0L), Events = structure(c(3L, 2L, 1L), .Label = c("               NA", 
"         Half-Marathon,10k", "     Triathlon/IM,Marathon,10k,5k"
), class = "factor"), First = c(0L, 0L, 1L)), .Names = c("Name", 
"Age", "Number", "Events", "First"), class = "data.frame", row.names = c(NA, 
-3L))

Splitting a string column with unequal size into multiple columns using R

This is a good occasion to make use of extra = merge argument of separate:

library(dplyr)
df %>% 
  separate(str, c('A', 'B', 'C'), sep= ";", extra = 'merge')

  no    A     B     C
1  1 M 12  M 13  <NA>
2  2 M 24  <NA>  <NA>
3  3 <NA>  <NA>  <NA>
4  4 C 12  C 50  C 78

Separate a column of a dataframe in undefined number of columns with R/tidyverse

You can first count the number of columns it can take and then use separate.

nmax <- max(stringr::str_count(df$x, "\\.")) + 1
tidyr::separate(df, x, paste0("col", seq_len(nmax)), sep = "\\.", fill = "right")

#  col1 col2 col3
#1    a <NA> <NA>
#2    a    b <NA>
#3    a    b    c
#4    a    b    d
#5    a    d <NA>

R tidyr: use separate function to separate character column with comma-separated text into multiple columns using RegEx

With tidyverse, we can use separate_rows to split up the 'x' column, create a sequence column and use pivot_wider from tidyr

library(dplyr)
library(tidyr)
df %>% 
   filter(!(is.na(x)|x==""))%>% 
   mutate(rn = row_number()) %>% 
   separate_rows(x) %>%
   mutate(i1 = 1) %>% 
   pivot_wider(names_from = x, values_from = i1, , values_fill = list(i1 = 0)) %>%
   select(-rn)
# A tibble: 4 x 3
#    one   two three
#  <dbl> <dbl> <dbl>
#1     1     0     0
#2     1     1     0
#3     0     1     1
#4     1     1     1

In the above code, the rn column was added to have distinct identifier for each rows after we expand the rows with separate_rows, otherwise, it can result in a list output column in pivot_wider when there are duplicate elements. The 'i1' with value 1 is added to be used in the values_from. Another option is to specify values_fn = length

Or we can use table after splitting the 'x' column in base R

table(stack(setNames(strsplit(as.character(df$x), ",\\s+"), seq_len(nrow(df))))[2:1])

Split data frame string column into multiple columns

Use stringr::str_split_fixed

library(stringr)
str_split_fixed(before$type, "_and_", 2)

split a column into multiple columns - tidyr - error

You haven't named the 11 output columns:

> input %>% separate(name,into=letters[1:11], sep="\\.")
  var a b c   d   e    f  g   h    i j k tis     score
1   1 c 1 2 mi1 mi1 dup1 er er2 er33 0    t1  9.382829
2   2 c 2 2 mi1 mi1 dup1 er er2 er33 0    t2 99.382829
3   3 c 3 2 mi1 mi1 dup1 er er2 er33 0    t3 19.382829

Split a data frame column based on a comma

Using dplyr and base R:

library(dplyr)
 final_proj_data %>% 
 mutate(State=unlist(lapply(strsplit(County,", "),function(x) x[2])),
       County=gsub(",.*","",County))
    ID         County Population Year   State
1 1003 Baldwin County     169162 2006 Alabama
2 1015 Calhoun County     112903 2006 Alabama
3 1043 Cullman County      80187 2006 Alabama
4 1049  DeKalb County      68014 2006 Alabama

Original:

With dplyr and tidyr(Just seen that @Ronak Shah commented the same above):

library(dplyr)
library(tidyr)
final_proj_data %>% 
   separate(County,c("County","State"),sep=",")
    ID         County    State Population Year
1 1003 Baldwin County  Alabama     169162 2006
2 1015 Calhoun County  Alabama     112903 2006
3 1043 Cullman County  Alabama      80187 2006
4 1049  DeKalb County  Alabama      68014 2006