Convert Column With Pipe Delimited Data into Dummy Variables

One way is to use cSplit_e from the splitstackshape package.

It splits column a of the data frame on "|", creates one indicator column per value, fills absent values with 0 (fill = 0) and drops the original column (drop = TRUE).

library(splitstackshape)
# split column "a" on "|", add one 0/1 column per name, drop the original column
cSplit_e(df, "a", "|", type = "character", fill = 0, drop = TRUE)

#   a_Ben a_Chris a_Greg a_Jim a_Steve
# 1     1       1      0     1       0
# 2     1       0      1     1       0
# 3     1       0      0     1       1
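For readers working outside R, here is a minimal Python sketch of the same idea using only the standard library. It assumes the column is a plain list of strings; the helper name pipe_dummies and the sample values (made up so that they reproduce the output above) are hypothetical, and this is not how cSplit_e is implemented internally.

```python
# Minimal sketch: pipe-delimited strings -> 0/1 dummy columns, similar in spirit
# to cSplit_e(..., type = "character", fill = 0, drop = TRUE).
# pipe_dummies is a hypothetical helper; the sample input is made up.


def pipe_dummies(values, sep="|", prefix="a_"):
    """Return (column_names, rows), where each row is a list of 0/1 flags."""
    # Split every string into the set of labels it contains
    split_rows = [set(v.split(sep)) for v in values]
    # The sorted union of all labels defines the dummy columns
    labels = sorted(set().union(*split_rows))
    columns = [prefix + label for label in labels]
    # 1 if the row contains the label, otherwise fill with 0
    rows = [[1 if label in row else 0 for label in labels] for row in split_rows]
    return columns, rows


if __name__ == "__main__":
    a = ["Ben|Chris|Jim", "Ben|Greg|Jim", "Ben|Jim|Steve"]
    columns, rows = pipe_dummies(a)
    print(columns)
    for row in rows:
        print(row)
```

Columns come out sorted alphabetically, which happens to match the column order in the R output above.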

How do I create dummy variables in a dataframe from a vector of characters?

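Does this work? It splits the comma-separated values into separate rows, flags each one with 1, and pivots to wide format, filling the gaps with 0: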

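library(dplyr)
library(tidyr)
data %>%
  separate_rows(x, sep = ',') %>%   # one row per comma-separated value
  mutate(val = 1) %>%               # indicator for each value present
  pivot_wider(names_from = x, values_from = val, values_fill = list(val = 0)) %>%
  select(1, 2, 3, 5, 4)             # reorder columns to id, a, b, c, d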
# A tibble: 4 x 5
     id     a     b     c     d
  <dbl> <dbl> <dbl> <dbl> <dbl>
1    10     1     1     0     0
2    20     1     0     0     1
3    30     0     1     1     0
4    40     0     1     0     1
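The same long-then-wide route can be sketched in plain Python: first produce one (id, value) pair per delimited item, then pivot so each distinct value becomes a column, filling missing cells with 0. The helpers separate_rows and pivot_wider below are hypothetical stand-ins that only mirror the tidyr verb names, and the sample data is made up to match the tibble above.

```python
# Sketch of the long-then-wide route taken by separate_rows() + pivot_wider().
# Both helpers are hypothetical stand-ins for the tidyr verbs; the data is made up.


def separate_rows(records, sep=","):
    """Yield one (id, value) pair per delimited item (long format)."""
    for rec_id, text in records:
        for item in text.split(sep):
            yield rec_id, item.strip()


def pivot_wider(pairs, fill=0):
    """Turn (id, value) pairs into {id: {value: 1 or fill}} with sorted columns."""
    pairs = list(pairs)
    # Keep ids in first-seen order; sort the value columns alphabetically
    ids = list(dict.fromkeys(rec_id for rec_id, _ in pairs))
    columns = sorted({value for _, value in pairs})
    table = {rec_id: {col: fill for col in columns} for rec_id in ids}
    for rec_id, value in pairs:
        table[rec_id][value] = 1  # the "val = 1" indicator
    return columns, table


if __name__ == "__main__":
    data = [(10, "a,b"), (20, "a,d"), (30, "b,c"), (40, "b,d")]
    columns, table = pivot_wider(separate_rows(data))
    print(["id"] + columns)
    for rec_id, row in table.items():
        print([rec_id] + [row[col] for col in columns])
```

Sorting the value columns avoids a separate reordering step; in the R code, select(1, 2, 3, 5, 4) appears to compensate for the new columns being created in order of first appearance (a, b, d, c).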

Split a column into multiple binary dummy columns

We can use mtabulate from qdapTools after splitting the 'features' column with strsplit:

library(qdapTools)
cbind(sampledf[1],mtabulate(strsplit(as.character(sampledf$features), ':')))
#   vin f1 f2 f3 f4 f5
# 1  v1  1  1  1  0  0
# 2  v2  0  1  0  1  1
# 3  v3  1  0  0  1  1

Or we can use cSplit_e from the splitstackshape package and then strip the 'features_' prefix from the new column names:

library(splitstackshape)
df1 <- cSplit_e(sampledf, 'features', ':', type= 'character', fill=0, drop=TRUE)
names(df1) <- sub('.*_', '', names(df1))

Or, using base R: split as before, name the list elements from strsplit with the 'vin' column, convert the list to a two-column key/value data.frame with stack, cross-tabulate it with table, transpose the result, and cbind it with the first column of 'sampledf'.

cbind(sampledf[1],  
t(table(stack(setNames(strsplit(as.character(sampledf$features), ':'),
sampledf$vin)))))
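Both mtabulate and the table() route count occurrences rather than merely flag them, so a feature listed twice in one row yields 2. A rough Python equivalent uses collections.Counter; it assumes the features arrive as colon-separated strings, and the helper tabulate_rows and the sample strings are hypothetical.

```python
from collections import Counter

# Sketch of the counting behaviour of mtabulate() / table(): split each string
# and count every label per row. tabulate_rows is a hypothetical helper.


def tabulate_rows(values, sep=":"):
    """Return (labels, rows) where rows hold per-label counts."""
    counts = [Counter(v.split(sep)) for v in values]
    labels = sorted(set().union(*counts))
    # A Counter returns 0 for missing keys, which provides the 0 fill
    rows = [[c[label] for label in labels] for c in counts]
    return labels, rows


if __name__ == "__main__":
    features = ["f1:f2:f3", "f2:f4:f5", "f1:f4:f5", "f1:f1:f2"]
    labels, rows = tabulate_rows(features)
    print(labels)
    for row in rows:
        print(row)
```

The last sample row repeats f1 to show that counts, not 0/1 flags, come out of this step; the next answer turns counts into indicators with a > 0 comparison.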

How to split cell contents into several columns to create wide data?

An easier option is mtabulate from qdapTools after splitting at the comma (and the whitespace after it); comparing the counts with > 0 turns them into 0/1 indicators:

library(qdapTools)
# split on a comma plus whitespace, count each grade per row, then convert counts to 0/1
out <- +(mtabulate(strsplit(df$grades, ",\\s+")) > 0)
colnames(out) <- paste0("grade_", colnames(out))
cbind(df[1], out)

Output:

  teacher grade_1 grade_2 grade_3 grade_4 grade_5 grade_K
1    Mary       1       1       1       1       0       1
2  Andrew       1       0       1       1       0       0
3    Rose       1       1       1       1       1       0
4   Julia       0       0       0       1       1       0
5 Richard       1       1       1       1       1       1

Or use cSplit_e from splitstackshape:

library(splitstackshape)
cSplit_e(df, "grades", sep = ",", type = "character", fill = 0, drop = TRUE)
  teacher grades_1 grades_2 grades_3 grades_4 grades_5 grades_K
1    Mary        1        1        1        1        0        1
2  Andrew        1        0        1        1        0        0
3    Rose        1        1        1        1        1        0
4   Julia        0        0        0        1        1        0
5 Richard        1        1        1        1        1        1
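The +(mtabulate(...) > 0) step can be sketched in Python with the re module: split on a comma plus whitespace, count each grade per row, and collapse the counts to 0/1. The helper grade_indicators is hypothetical, and the split pattern is deliberately relaxed to r",\s*" (the R code uses ",\\s+") so that values written without a space after the comma also split.

```python
import re

# Sketch of the mtabulate(...) > 0 step: split on a comma plus any whitespace,
# then reduce each label to a 0/1 indicator. grade_indicators is hypothetical.


def grade_indicators(grades, prefix="grade_"):
    """Return (column_names, rows) of 0/1 indicators for comma-separated grades."""
    # r",\s*" also tolerates "1,2" without a space (the R code uses ",\\s+")
    split_rows = [re.split(r",\s*", g.strip()) for g in grades]
    labels = sorted({label for row in split_rows for label in row})
    # Counting then testing > 0 mirrors +(mtabulate(...) > 0) in the R answer
    rows = [[int(row.count(label) > 0) for label in labels] for row in split_rows]
    return [prefix + label for label in labels], rows


if __name__ == "__main__":
    columns, rows = grade_indicators(["K, 1, 2, 3, 4", "4, 5", "1,1, 3"])
    print(columns)
    for row in rows:
        print(row)
```

Because digits sort before uppercase letters, "K" ends up last, as grade_K does in the R output.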

Create Multiple New Columns Based on Pipe-Delimited Column in Pandas

You can use Series.str.get_dummies, which splits on '|' by default, together with add_prefix:

df.Parts.str.get_dummies().add_prefix('Part_')

Output:

   Part_12  Part_34  Part_56
0        1        1        1

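Edit (following a comment): to count duplicate parts instead of only flagging them, split into one row per part, one-hot encode, and sum back per original row.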

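import pandas as pd
df = pd.DataFrame({'Parts': ['12|34|56|12']}, index=[0])
# .groupby(level=0).sum() replaces .sum(level=0), which newer pandas versions no longer accept
pd.get_dummies(df.Parts.str.split('|', expand=True).stack()).groupby(level=0).sum().add_prefix('Part_')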

Output:

   Part_12  Part_34  Part_56
0        2        1        1

How to convert one (comma split) column into multiple columns in R?

Using cSplit_e

library(splitstackshape)
out <- cSplit_e(data, 'keyword', sep = ',', type = 'character',
                fill = 0, drop = TRUE)
names(out) <- sub('keyword_', '', names(out))

Output:

    person c f g j k n p r u w x y
1 person_1 0 1 0 0 1 0 1 0 0 1 0 0
2 person_2 0 0 0 1 0 0 0 0 0 0 0 1
3 person_3 0 0 0 0 0 0 0 1 0 0 0 1
4 person_4 0 0 1 0 0 0 0 0 0 1 0 0
5 person_5 1 0 0 0 0 1 0 0 1 0 1 0

Input data:

data <- structure(list(person = c("person_1", "person_2", "person_3", 
"person_4", "person_5"), keyword = c("k,f,p,w", "y,j", "y,r",
"g,w", "u,x,c,n")), class = "data.frame", row.names = c("1",
"2", "3", "4", "5"))

