Split Character Column into Several Binary (0/1) Columns

Split character column into several binary (0/1) columns

You can try cSplit_e from my "splitstackshape" package:

library(splitstackshape)
a <- c("a,b,c", "a,b", "a,b,c,d")
cSplit_e(as.data.table(a), "a", ",", type = "character", fill = 0)
#          a a_a a_b a_c a_d
# 1:   a,b,c   1   1   1   0
# 2:     a,b   1   1   0   0
# 3: a,b,c,d   1   1   1   1
cSplit_e(as.data.table(a), "a", ",", type = "character", fill = 0, drop = TRUE)
#    a_a a_b a_c a_d
# 1:   1   1   1   0
# 2:   1   1   0   0
# 3:   1   1   1   1

There's also mtabulate from "qdapTools":

library(qdapTools)
mtabulate(strsplit(a, ","))
#   a b c d
# 1 1 1 1 0
# 2 1 1 0 0
# 3 1 1 1 1

A very direct base R approach is to use table along with stack and strsplit:

table(rev(stack(setNames(strsplit(a, ",", TRUE), seq_along(a)))))
#    values
# ind a b c d
#   1 1 1 1 0
#   2 1 1 0 0
#   3 1 1 1 1

Split string column to create new binary columns

Using mtabulate from the qdapTools package that I maintain:

library(qdapTools)
mtabulate(strsplit(as.character(dat[[1]]), "/"))

##   V1 ca cbr_LBL cni_at.p3x.4 eq2_off eq2_on fe.gr hi.on hi.ov put sent_1 sent_1fe.gr
## 1  1  1       0            0       1      1     1     0     0   1      1           0
## 2  1  1       0            0       1      1     1     1     1   1      1           0
## 3  1  1       0            0       1      1     0     1     1   1      0           1
## 4  1  1       0            1       1      1     1     0     0   1      1           0
## 5  1  1       1            0       1      1     1     0     0   1      1           0

Split a column into multiple binary dummy columns

We can use mtabulate from qdapTools after splitting the 'features' column with strsplit.

library(qdapTools)
cbind(sampledf[1],mtabulate(strsplit(as.character(sampledf$features), ':')))
#  vin f1 f2 f3 f4 f5
#1  v1  1  1  1  0  0
#2  v2  0  1  0  1  1
#3  v3  1  0  0  1  1

Or we can use cSplit_e from the splitstackshape package:

library(splitstackshape)
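# drop = TRUE removes the original 'features' column
df1 <- cSplit_e(sampledf, 'features', ':', type = 'character', fill = 0, drop = TRUE)
# strip the 'features_' prefix from the new column names
names(df1) <- sub('.*_', '', names(df1))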

Or, using base R methods: split as before, name the list elements returned by strsplit after the 'vin' column, convert the list to a key/value 'data.frame' with stack, tabulate it with table, transpose the result, and cbind it with the first column of 'sampledf'.

cbind(sampledf[1],  
 t(table(stack(setNames(strsplit(as.character(sampledf$features), ':'), 
              sampledf$vin)))))

R: split a string of data into multiple columns, sorted by individual variables

We can split each column with strsplit and then use mtabulate to get the frequencies:

library(qdapTools)
do.call(cbind, lapply(df, function(x) mtabulate(strsplit(x, ","))))
#    indication.1 indication.2 indication.3 treatment.1 treatment.2 treatment.3
#1            1            1            0           0           0           1
#2            0            1            0           1           1           0
#3            1            0            1           0           1           1

Separate character string variable into several variables

Perhaps cSplit_e would be an option:

library(splitstackshape)  
library(dplyr)
cSplit_e(df, 'var', sep=";", type = 'character', fill = 0, drop = TRUE)%>%
     mutate(var_NA = +(is.na(df$var)))
#    var_1 var_2 var_3 var_4 var_5 var_NA
#1      1     1     0     0     0      0
#2      0     0     0     0     0      1
#3      1     1     1     1     1      0
#4      0     0     1     0     1      0
#5      1     0     0     0     0      0
#6      1     0     0     1     0      0
#7      0     0     1     0     0      0
#8      0     0     0     0     0      1
#9      0     0     0     1     0      0
#10     1     0     0     0     1      0

Or using base R (this checks for the values 1 to 5 explicitly and does not add the var_NA column):

t(sapply(strsplit(df$var, "[:;]"), function(x) +(1:5 %in% x)))

How to split a dataframe column into multiple columns

I don't know if it can be done more simply (without the for loop), but this does the trick:

for i in range(16):
    dfs['B'+str(i)] = dfs['BINDATA'].str[i]

The str attribute of the Series gives access to some vectorized string methods which act upon each element (see docs: http://pandas.pydata.org/pandas-docs/stable/basics.html#vectorized-string-methods). In this case we just index the string to access the different characters.

This gives me:

In [20]: dfs
Out[20]:
            BINDATA B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11 B12 B13 B14 B15
0  1011111111101101  1  0  1  1  1  1  1  1  1  1   1   0   1   1   0   1
1  1011101101111101  1  0  1  1  1  0  1  1  0  1   1   1   1   1   0   1
2  1111111111110111  1  1  1  1  1  1  1  1  1  1   1   1   0   1   1   1
3  1110011111111111  1  1  1  0  0  1  1  1  1  1   1   1   1   1   1   1
4  1111101111111000  1  1  1  1  1  0  1  1  1  1   1   1   1   0   0   0
5  1101111001110101  1  1  0  1  1  1  1  0  0  1   1   1   0   1   0   1
6  1101111111111110  1  1  0  1  1  1  1  1  1  1   1   1   1   1   1   0

If you want them as ints instead of strings, you can add .astype(int) in the for loop.
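For reference, here is a minimal self-contained sketch of the loop with .astype(int) added. The three sample rows are copied from the output above; the way dfs is constructed here is an assumption, since the original frame's creation is not shown.

```python
import pandas as pd

# Sample rows copied from the output above; BINDATA must hold strings, not ints.
dfs = pd.DataFrame(
    {"BINDATA": ["1011111111101101", "1011101101111101", "1111111111110111"]}
)

for i in range(16):
    # .str[i] picks the i-th character of every string;
    # .astype(int) converts the resulting "0"/"1" strings to integers.
    dfs["B" + str(i)] = dfs["BINDATA"].str[i].astype(int)

print(dfs.dtypes)
```

With integer columns you can sum or filter on the bits directly, which is not possible while they are still strings.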

EDIT: Another way to do it (a one-liner, but you have to change the column names in a second step):

In [34]: splitted = dfs['BINDATA'].apply(lambda x: pd.Series(list(x)))

In [35]: splitted.columns = ['B'+str(x) for x in splitted.columns]

In [36]: dfs.join(splitted)
Out[36]:
            BINDATA B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11 B12 B13 B14 B15
0  1011111111101101  1  0  1  1  1  1  1  1  1  1   1   0   1   1   0   1
1  1011101101111101  1  0  1  1  1  0  1  1  0  1   1   1   1   1   0   1
2  1111111111110111  1  1  1  1  1  1  1  1  1  1   1   1   0   1   1   1
3  1110011111111111  1  1  1  0  0  1  1  1  1  1   1   1   1   1   1   1
4  1111101111111000  1  1  1  1  1  0  1  1  1  1   1   1   1   0   0   0
5  1101111001110101  1  1  0  1  1  1  1  0  0  1   1   1   0   1   0   1
6  1101111111111110  1  1  0  1  1  1  1  1  1  1   1   1   1   1   1   0
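A possible variation on the one-liner, sketched under the same assumptions (a string column named BINDATA; the sample rows are copied from the output above): build the new frame from a list of character lists instead of calling apply with pd.Series on each row, and rename the columns with add_prefix. This is only an illustration, not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the output above.
dfs = pd.DataFrame(
    {"BINDATA": ["1011111111101101", "1011101101111101", "1111111111110111"]}
)

# Turn each string into a list of characters and build one column per position.
splitted = pd.DataFrame([list(s) for s in dfs["BINDATA"]], index=dfs.index)

# The column labels are 0..15; add_prefix renames them to B0..B15.
splitted = splitted.add_prefix("B")

result = dfs.join(splitted)
print(result)
```

The new columns still hold strings here; chain .astype(int) onto splitted if you need integers.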
