Convert Column With Pipe Delimited Data into Dummy Variables


Another way is to use cSplit_e from the splitstackshape package.

This splits column a of the data frame on the pipe, fills the absent combinations with 0, and drops the original column.

library(splitstackshape)
# split column "a" on "|" into one 0/1 column per value and drop the original column
cSplit_e(df, "a", "|", type = "character", fill = 0, drop = TRUE)

#   a_Ben a_Chris a_Greg a_Jim a_Steve
#1     1       1      0     1       0
#2     1       0      1     1       0
#3     1       0      0     1       1
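The following plain-Python sketch shows what cSplit_e(type = "character", fill = 0) produces. Every distinct value becomes a column named <column>_<value>, holding 1 where the row contains that value and the fill value elsewhere. This is not how splitstackshape is implemented. The function name split_to_dummies and the three input strings are hypothetical, chosen only to reproduce the output above.

```python
def split_to_dummies(values, sep, prefix, fill=0):
    """Split each string on sep and return (header, rows) of indicator columns."""
    split_rows = [v.split(sep) for v in values]
    # One column per distinct value across all rows, in sorted order
    levels = sorted({item for row in split_rows for item in row})
    header = [f"{prefix}_{level}" for level in levels]
    rows = []
    for row in split_rows:
        present = set(row)
        # 1 where the row contains the value, `fill` everywhere else
        rows.append([1 if level in present else fill for level in levels])
    return header, rows


# Hypothetical contents of column "a", consistent with the output above
a = ["Ben|Chris|Jim", "Ben|Greg|Jim", "Ben|Jim|Steve"]
header, rows = split_to_dummies(a, "|", "a")
print(header)
for r in rows:
    print(r)
```

Python's sorted() orders strings by code point, so with mixed-case values the column order can differ from R's locale-aware sort.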

How do I create dummy variables in a dataframe from a vector of characters?

Does this work? It splits with separate_rows and spreads with pivot_wider:

library(dplyr)
library(tidyr)
data %>%
  separate_rows(x, sep = ',') %>%   # one row per comma-separated value
  mutate(val = 1) %>%
  pivot_wider(names_from = x, values_from = val, values_fill = list(val = 0)) %>%
  select(1, 2, 3, 5, 4)             # put the value columns in the order shown below
# A tibble: 4 x 5
     id     a     b     c     d
  <dbl> <dbl> <dbl> <dbl> <dbl>
1    10     1     1     0     0
2    20     1     0     0     1
3    30     0     1     1     0
4    40     0     1     0     1
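Here is the same long-then-wide idea in plain Python, assuming x holds comma-separated strings. The values are first split into (id, value) pairs, as separate_rows does. They are then spread into columns, with missing cells filled with 0 as values_fill does. The function name separate_and_pivot and the input strings are hypothetical, picked to match the tibble above.

```python
def separate_and_pivot(ids, strings, sep=","):
    """Rough analogue of separate_rows() + mutate(val = 1) + pivot_wider(values_fill = 0)."""
    # separate_rows: one (id, value) pair per delimited element
    long_rows = [(i, part) for i, s in zip(ids, strings) for part in s.split(sep)]
    # pivot_wider: one column per distinct value, sorted here for a stable order
    columns = sorted({value for _, value in long_rows})
    wide = {i: dict.fromkeys(columns, 0) for i in ids}  # values_fill = 0
    for i, value in long_rows:
        wide[i][value] = 1  # val = 1
    return columns, wide


ids = [10, 20, 30, 40]
x = ["a,b", "a,d", "b,c", "b,d"]  # hypothetical input
columns, wide = separate_and_pivot(ids, x)
print(["id"] + columns)
for i in ids:
    print([i] + list(wide[i].values()))
```

Sorting the value columns removes the need for a manual reorder. pivot_wider creates columns in order of first appearance, which is presumably why the R pipeline ends with select(1, 2, 3, 5, 4).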

Split a column into multiple binary dummy columns

We can use mtabulate from qdapTools after splitting the 'features' column with strsplit.

library(qdapTools)
cbind(sampledf[1],mtabulate(strsplit(as.character(sampledf$features), ':')))
#  vin f1 f2 f3 f4 f5
#1  v1  1  1  1  0  0
#2  v2  0  1  0  1  1
#3  v3  1  0  0  1  1
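mtabulate counts how often each value occurs in each element of a list. A value repeated within one cell therefore yields a count above 1 rather than a plain 1. Below is a minimal Python sketch of that counting step using collections.Counter. The name tabulate_lists and the feature strings are hypothetical, chosen to match the output above.

```python
from collections import Counter


def tabulate_lists(lists):
    """Return (columns, rows): per-list counts of every distinct value."""
    counts = [Counter(items) for items in lists]
    columns = sorted(set().union(*counts))
    # A Counter returns 0 for missing keys, so absent values become 0
    return columns, [[c[col] for col in columns] for c in counts]


vin = ["v1", "v2", "v3"]
features = ["f1:f2:f3", "f2:f4:f5", "f1:f4:f5"]  # hypothetical input
columns, rows = tabulate_lists(f.split(":") for f in features)
print(["vin"] + columns)
for name, row in zip(vin, rows):
    print([name] + row)
```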

Or we can use cSplit_e from the splitstackshape package, then strip the 'features_' prefix from the new column names:

library(splitstackshape)
df1 <- cSplit_e(sampledf, 'features', ':', type= 'character', fill=0, drop=TRUE)
names(df1) <-  sub('.*_', '', names(df1))

Or, using base R, we split as before and name the list elements returned by strsplit with the 'vin' column. We then convert the list to a key/value data.frame with stack, tabulate it with table, transpose it, and cbind the result with the first column of 'sampledf'.

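# as.data.frame.matrix makes sure the transposed table is bound as wide columns
cbind(sampledf[1],
      as.data.frame.matrix(t(table(stack(setNames(strsplit(as.character(sampledf$features), ':'),
                                                  sampledf$vin))))))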
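The base R pipeline can be traced step by step in plain Python. stack turns the named list into (value, name) pairs, and table cross-tabulates them with values as rows and names as columns. t then flips the table so each vin becomes a row. This is a minimal sketch, and the function name stack_table_transpose and the input strings are hypothetical.

```python
def stack_table_transpose(names, lists):
    """Mimic t(table(stack(setNames(lists, names)))) with nested dicts."""
    # stack(): one (value, name) pair per list element
    stacked = [(value, name) for name, items in zip(names, lists) for value in items]
    # table(): counts with values as rows and names as columns
    values = sorted({value for value, _ in stacked})
    table = {value: dict.fromkeys(names, 0) for value in values}
    for value, name in stacked:
        table[value][name] += 1
    # t(): transpose, so each name becomes a row over all values
    return values, {name: [table[value][name] for value in values] for name in names}


vin = ["v1", "v2", "v3"]
features = ["f1:f2:f3", "f2:f4:f5", "f1:f4:f5"]  # hypothetical input
values, wide = stack_table_transpose(vin, [f.split(":") for f in features])
print(["vin"] + values)
for name in vin:
    print([name] + wide[name])
```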

How to split cell contents into several columns to create wide data?

An easier option is mtabulate from qdapTools, after splitting at the commas:

library(qdapTools)
out <- +(mtabulate(strsplit(df$grades, ",\\s+")) > 0)
colnames(out) <- paste0("grade_", colnames(out))
cbind(df[1], out)

Output:

  teacher grade_1 grade_2 grade_3 grade_4 grade_5 grade_K
1    Mary       1       1       1       1       0       1
2  Andrew       1       0       1       1       0       0
3    Rose       1       1       1       1       1       0
4   Julia       0       0       0       1       1       0
5 Richard       1       1       1       1       1       1
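Because mtabulate returns counts, the > 0 comparison followed by unary + turns them into 0/1 indicators. The plain-Python sketch below performs the same split, count, threshold and rename steps, assuming the regular expression ,\s+ used above. The function name presence_columns and the grade strings are hypothetical, chosen to match the output.

```python
import re
from collections import Counter


def presence_columns(strings, pattern=r",\s+", prefix="grade_"):
    """Split on a regex, count values per row, then keep only presence (0/1)."""
    counts = [Counter(re.split(pattern, s)) for s in strings]
    levels = sorted(set().union(*counts))
    header = [prefix + level for level in levels]
    # int(count > 0) plays the role of +(... > 0) in the R code
    rows = [[int(c[level] > 0) for level in levels] for c in counts]
    return header, rows


teacher = ["Mary", "Andrew", "Rose", "Julia", "Richard"]
grades = ["K, 1, 2, 3, 4", "1, 3, 4", "1, 2, 3, 4, 5", "4, 5", "K, 1, 2, 3, 4, 5"]  # hypothetical
header, rows = presence_columns(grades)
print(["teacher"] + header)
for name, row in zip(teacher, rows):
    print([name] + row)
```

In both R and Python, ,\s+ requires at least one space after each comma, so a string such as "1,2" stays unsplit. The pattern ,\s* is the more lenient choice.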

Or use cSplit_e from splitstackshape:

library(splitstackshape)
cSplit_e(df, "grades", sep = ",", type = "character", fill = 0, drop = TRUE)
  teacher grades_1 grades_2 grades_3 grades_4 grades_5 grades_K
1    Mary        1        1        1        1        0        1
2  Andrew        1        0        1        1        0        0
3    Rose        1        1        1        1        1        0
4   Julia        0        0        0        1        1        0
5 Richard        1        1        1        1        1        1

Create Multiple New Columns Based on Pipe-Delimited Column in Pandas

You can use str.get_dummies, which splits on '|' by default, and add_prefix:

df.Parts.str.get_dummies().add_prefix('Part_')

Output:

   Part_12  Part_34  Part_56
0        1        1        1

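Edit (in response to a comment): to count duplicate values, split the string, stack the pieces, one-hot encode them, and sum per original row.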

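import pandas as pd
df = pd.DataFrame({'Parts':['12|34|56|12']}, index=[0])
# groupby(level=0).sum() replaces .sum(level=0), which pandas 2.0 removed
pd.get_dummies(df.Parts.str.split('|', expand=True).stack(), dtype=int).groupby(level=0).sum().add_prefix('Part_')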

Output:

   Part_12  Part_34  Part_56
0        2        1        1
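The two pandas calls differ in presence versus counting. str.get_dummies reports 1 for 12 even though it occurs twice, while the stacked version sums to 2. Below is a pandas-free sketch of both behaviors using collections.Counter. The function name dummies and its count flag are hypothetical and not part of pandas.

```python
from collections import Counter


def dummies(values, sep="|", prefix="Part_", count=False):
    """Indicator columns (count=False) or occurrence counts (count=True) per string."""
    tallies = [Counter(v.split(sep)) for v in values]
    levels = sorted(set().union(*tallies))
    header = [prefix + level for level in levels]
    rows = [[t[level] if count else int(level in t) for level in levels] for t in tallies]
    return header, rows


parts = ["12|34|56|12"]
print(dummies(parts))              # presence, like str.get_dummies()
print(dummies(parts, count=True))  # counts, like the stack/sum version
```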

How to convert one (comma split) column into multiple columns in R?

Using cSplit_e from splitstackshape:

library(splitstackshape)
out <- cSplit_e(data, 'keyword', sep = ',', type = 'character',
                fill = 0, drop = TRUE)
names(out) <- sub('keyword_', '', names(out))

Output:

> out
    person c f g j k n p r u w x y
1 person_1 0 1 0 0 1 0 1 0 0 1 0 0
2 person_2 0 0 0 1 0 0 0 0 0 0 0 1
3 person_3 0 0 0 0 0 0 0 1 0 0 0 1
4 person_4 0 0 1 0 0 0 0 0 0 1 0 0
5 person_5 1 0 0 0 0 1 0 0 1 0 1 0

Data:

data <- structure(list(person = c("person_1", "person_2", "person_3", 
"person_4", "person_5"), keyword = c("k,f,p,w", "y,j", "y,r", 
"g,w", "u,x,c,n")), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5"))

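With the data above, the result can be cross-checked in plain Python. The sketch splits each keyword string on the comma, builds keyword_<value> columns, then strips the prefix as sub('keyword_', '', ...) does. The function name keyword_table is hypothetical, and only the person and keyword values come from the data shown.

```python
import re


def keyword_table(people, keywords, sep=","):
    """Return (names, rows): one 0/1 column per keyword, prefix stripped."""
    split_rows = [k.split(sep) for k in keywords]
    levels = sorted({kw for row in split_rows for kw in row})
    prefixed = ["keyword_" + level for level in levels]  # like cSplit_e's column names
    # like names(out) <- sub('keyword_', '', names(out)): replace the first match only
    names = ["person"] + [re.sub("keyword_", "", name, count=1) for name in prefixed]
    rows = [[p] + [int(level in row) for level in levels] for p, row in zip(people, split_rows)]
    return names, rows


person = ["person_1", "person_2", "person_3", "person_4", "person_5"]
keyword = ["k,f,p,w", "y,j", "y,r", "g,w", "u,x,c,n"]
names, rows = keyword_table(person, keyword)
print(names)
for row in rows:
    print(row)
```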