Convert Comma Separated String to Numeric Columns

Convert comma separated string to numeric columns

I think you are looking for the strsplit() function:

a = "2000,1450,1800,2200"
strsplit(a, ",")
[[1]]
[1] "2000" "1450" "1800" "2200"

Notice that strsplit returns a list, in this case with only one element. This is because strsplit takes a vector as input and returns one list element per input string. Therefore, you can also pass a long vector of your single-cell strings to the function and get back a list holding the split pieces of each element. In a more relevant example this looks like:

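# Create some example data: 25 rows, each a string of 4 comma-separated values
dat = data.frame(
  reaction_time = apply(matrix(round(runif(100, 1, 2000)), 25, 4),
                        1, paste, collapse = ","),
  stringsAsFactors = FALSE)
# Split every string and stack the pieces into a 25 x 4 character matrix
splitdat = do.call("rbind", strsplit(dat$reaction_time, ","))
# Convert each column to numeric and name the columns
splitdat = data.frame(apply(splitdat, 2, as.numeric))
names(splitdat) = paste("trial", 1:4, sep = "")
head(splitdat)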
  trial1 trial2 trial3 trial4
1    597   1071   1430    997
2    614    322   1242   1140
3   1522   1679     51   1120
4    225   1988   1938   1068
5    621    623   1174     55
6   1918   1828    136   1816

And finally, to calculate the mean per person (one row per person):

apply(splitdat, 1, mean)
[1] 1187.50  361.25  963.75 1017.00  916.25 1409.50  730.00 1310.75 1133.75
[10]  851.25  914.75  881.25  889.00 1014.75  676.75  850.50  805.00 1460.00
[19]  901.00 1443.50  507.25  691.50 1090.00  833.25  669.25
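The same split, convert and average pipeline can be sketched in Python for comparison. This is only an illustration under assumptions: the helper names split_numeric and row_means are made up here, the first input string is the one from the strsplit example above, and the second is a hypothetical value.

```python
def split_numeric(cells, sep=","):
    """Split each delimited string into a list of floats, one list per input string."""
    return [[float(part) for part in cell.split(sep)] for cell in cells]


def row_means(rows):
    """Return the arithmetic mean of every row."""
    return [sum(row) / len(row) for row in rows]


# First string is taken from the strsplit example; the second is hypothetical
cells = ["2000,1450,1800,2200", "100,200,300,400"]
rows = split_numeric(cells)
means = row_means(rows)
print(rows)
print(means)  # [1862.5, 250.0]
```

As with strsplit, the splitting step returns one inner list per input string, so a single string still comes back wrapped in an outer list.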

Read comma separated string to numeric and create dataframe

 # oneLine: a single comma-separated string of p-values (not defined in this snippet)
 library(MADAM)
 lapply(strsplit(oneLine, ","),
        function(x) fisher.method(as.data.frame.list(as.numeric(x))))
 #[[1]]
 #         S num.p p.value p.adj
 #1 89.26501     2       0     0

If you have a vector of such strings:

 Lines <- c("0.0001879, 2.2e-16", "0.001435, 1.2e-14", "0.00014353, 2.5e-13")
 do.call(rbind, lapply(strsplit(Lines, ","),
         function(x) fisher.method(as.data.frame.list(as.numeric(x)))))
 #          S num.p      p.value        p.adj
 #1  89.26501     2 0.000000e+00 0.000000e+00
 #11 77.20092     2 6.661338e-16 6.661338e-16
 #12 75.73256     2 1.443290e-15 1.443290e-15

If you are reading from a file, for example if the contents of file1.csv are

 0.0001879, 2.2e-16
 0.001435, 1.2e-14
 0.00014353, 2.5e-13

then you can use read.csv to read the file and apply fisher.method directly to the two-column data.frame:

 dat <- read.csv("file1.csv", header=FALSE)
 fisher.method(dat)
  #          S num.p      p.value        p.adj
  #1 89.26501     2 0.000000e+00 0.000000e+00
  #2 77.20092     2 6.661338e-16 9.992007e-16
  #3 75.73256     2 1.443290e-15 1.443290e-15

Convert comma separated string to integer in R

Here's an approach using scan:

oneLine <- "IP1: IP2: 0.1,0.5,0.9"
myVector <- strsplit(oneLine, ":")
listofPValues <- myVector[[1]][[3]]
listofPValues
# [1] " 0.1,0.5,0.9"
scan(text = listofPValues, sep = ",")
# Read 3 items
# [1] 0.1 0.5 0.9

And one using strsplit:

as.numeric(unlist(strsplit(listofPValues, ",")))
# [1] 0.1 0.5 0.9
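For comparison, a minimal Python sketch of the same idea: isolate the field that holds the values, then split it on commas and convert. The function name parse_p_values is hypothetical, and unlike the R code (which takes the third field) this sketch assumes the values are always in the last colon-separated field.

```python
def parse_p_values(line, field_sep=":", value_sep=","):
    """Parse the last field_sep-delimited field as a list of floats."""
    last_field = line.split(field_sep)[-1]
    # float() ignores surrounding whitespace, so " 0.1" converts cleanly
    return [float(value) for value in last_field.split(value_sep)]


one_line = "IP1: IP2: 0.1,0.5,0.9"
p_values = parse_p_values(one_line)
print(p_values)  # [0.1, 0.5, 0.9]
```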

How to convert comma separated numbers from a dataframe to numbers and get the avg value

You can simply define a function that unpacks those values and then returns their mean.

def get_mean(x):
    # Split into a list of strings
    parts = x.split(',')
    # Transform into numbers
    y = [float(n) for n in parts]
    return sum(y) / len(y)

# Apply to the desired column
df['col'] = df['col'].apply(get_mean)
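If the column may contain empty cells, missing values or stray spaces, get_mean as written would raise an error. A slightly more defensive variant might look like the sketch below. The name get_mean_safe and the choice to return NaN for cells without numbers are assumptions, not part of the original answer, and the example uses a plain list so that it runs without pandas.

```python
import math


def get_mean_safe(cell, sep=","):
    """Mean of the numbers in a delimited string; NaN if the cell holds none."""
    if not isinstance(cell, str):
        return math.nan
    # Skip empty tokens such as those produced by "" or a trailing comma
    values = [float(tok) for tok in cell.split(sep) if tok.strip()]
    return sum(values) / len(values) if values else math.nan


# Hypothetical column contents, including an empty and a missing cell
column = ["1,2,3", " 4 , 5 ", "", None]
means = [get_mean_safe(cell) for cell in column]
print(means)
```

With a DataFrame, the function could be passed to apply in the same way as get_mean.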

Scala - Convert column having comma separated numbers (currently string) to Array of Double in Dataframe

split + cast should do the job:

import org.apache.spark.sql.functions.{col, split}
import spark.implicits._  // needed for toDF; assumes a SparkSession named spark, as in spark-shell

val df = Seq(("619.619620621622, 123.12412512699")).toDF("MyCol")

val df2 = df.withColumn("myCol", split(col("MyCol"), ",").cast("array<double>"))

df2.printSchema

//root
// |-- myCol: array (nullable = true)
// |    |-- element: double (containsNull = true)

Convert numbers in comma-separated string within a data.table column into a long table form

strsplit and matrix are both fast, but you're not using them in an optimized manner. Here's the approach I'd suggest:

foo_a5 <- function(DT) {
  # unlist the relevant column and use strsplit, but don't make your matrices yet
  a <- strsplit(unlist(DT$c, use.names = FALSE), ",", TRUE)
  # expand all the other columns of the input data.table...
  cbind(DT[rep(seq.int(nrow(DT)), lengths(a)/3), 1:2], 
        # ... and bind it with your newly formed (single) matrix
        matrix(as.integer(unlist(a, use.names=FALSE)),
               ncol = 3, byrow = TRUE, 
               dimnames = list(NULL, c("x", "y", "z"))))
}

foo_a5(DT)
##     id  b  x  y  z
##  1:  1 10 80 96 40
##  2:  1 10 83 86 12
##  3:  2 92 86 18 38
##  4:  2 92 51 17 80
##  5:  2 92 33 38 23
##  6:  2 92 49 71 97
##  7:  2 92 10 13 70
##  8:  3 76 84 39 86
##  9:  4 81 48 99  8
## 10:  5 56 53 92 27
## 11:  5 56  2 39 62

An alternative to @zx8754's answer that uses a similar logic is the following:

foo_zx2 <- function(DT) {
  L <- DT[, list(c = unlist(strsplit(unlist(c, use.names = FALSE), ",", TRUE), 
                            use.names = FALSE)), .(id, b)]
  L[, time := rep(c("x", "y", "z"), length.out = nrow(L))][
    , dcast(.SD, id + b + rowid(time) ~ time, value.var = "c")]
}

This tests faster than @Ronak's approach for me, but still slower than just using base R.
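The reshaping itself (cut each comma-separated string into groups of three and repeat the other columns once per group) can be sketched in plain Python. This is only an illustration of the logic, not a performance claim: the function name to_long is made up, and the input records are hypothetical, reconstructed from the rows for ids 1 and 3 in the output shown earlier.

```python
def to_long(records, names=("x", "y", "z")):
    """Expand (id, b, "v1,v2,...") records into one row per group of len(names) values."""
    width = len(names)
    rows = []
    for rec_id, b, packed in records:
        values = [int(v) for v in packed.split(",")]
        if len(values) % width:
            raise ValueError(
                f"id {rec_id}: {len(values)} values is not a multiple of {width}"
            )
        # Walk the flat list in steps of `width` and label each chunk
        for start in range(0, len(values), width):
            chunk = values[start:start + width]
            rows.append({"id": rec_id, "b": b, **dict(zip(names, chunk))})
    return rows


# Hypothetical input reconstructed from the output rows for ids 1 and 3
records = [(1, 10, "80,96,40,83,86,12"), (3, 76, "84,39,86")]
long_rows = to_long(records)
for row in long_rows:
    print(row)
```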

R: converting comma separated entry to columns with non-characters

In two steps:

1. Use read.table with fill=TRUE to read all lines (you can also use readLines).
2. Treat the lines without a comma separately, splitting them on a space instead.

The code is something like this:

aa <- read.table(text='John, Doe
Rebecca, Homes
Organization LLC',sep=',',fill=TRUE,colClasses='character')

## treat lines without comma
## (space as separator: I assume you don't have compound names)
aa[nchar(aa$V2) == 0,] <- 
      do.call(rbind, strsplit(aa[nchar(aa$V2) == 0,]$V1, ' '))

> aa
            V1     V2
1         John    Doe
2      Rebecca  Homes
3 Organization    LLC

EDIT, a better method: I use a regular expression to replace each separator (a space, or a comma followed by a space) with a plain comma, so that every line has the same separator. Again, I assume that you don't have any compound names.

ff <- readLines(textConnection('John, Doe
Rebecca, Homes
Organization LLC'))
do.call(rbind,
strsplit(gsub('[ ]|, |,[ ]',',',ff),','))
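The same idea, one regular expression that accepts either kind of separator, can be sketched in Python. The pattern and the helper name split_name are assumptions for illustration. Here the line is split directly on the pattern instead of first rewriting it to use commas, and compound names are again assumed not to occur.

```python
import re

# A comma with optional spaces around it, or a run of whitespace on its own
SEPARATOR = re.compile(r"\s*,\s*|\s+")


def split_name(line):
    """Split a line on a comma or, if there is none, on whitespace."""
    return SEPARATOR.split(line.strip())


lines = ["John, Doe", "Rebecca, Homes", "Organization LLC"]
table = [split_name(line) for line in lines]
for row in table:
    print(row)
```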

How to change comma separated values in a single element to multiple columns and assign numeric coding

dplyr, stringi and reshape2 will do all the work you need:

install.packages("dplyr")
install.packages("stringi")
install.packages("reshape2")

library(dplyr)
library(stringi)
library(reshape2)

xx_df <- data.frame(Param1 = c("Private Bus,Private Car,Public Bus",
                               "Private Car,Private Van,Public Bus",
                               "Private Car",
                               "Private Bus,Private Car"),
                    stringsAsFactors = FALSE)

# Split into columns, melt to long form, flag each value with 1, then cast back to wide
cbind(xx_df, stringi::stri_split_fixed(xx_df$Param1, ",", simplify = TRUE)) %>% 
  data.frame(stringsAsFactors = FALSE) %>% 
  reshape2::melt(id.vars = "Param1", na.rm = TRUE) %>% 
  mutate(variable = 1) %>% filter(value != '') %>% 
  reshape2::dcast(Param1 ~ value, value.var = "variable", fill = 0) %>% 
  data.frame()

The result is:

                              Param1 Private.Bus Private.Car Private.Van Public.Bus
1            Private Bus,Private Car           1           1           0          0
2 Private Bus,Private Car,Public Bus           1           1           0          1
3                        Private Car           0           1           0          0
4 Private Car,Private Van,Public Bus           0           1           1          1
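The 0/1 coding in this result can also be produced without any reshaping, as the following Python sketch shows. It is an illustration under assumptions: the function name one_hot is made up, categories are ordered alphabetically, and rows stay in input order, whereas the dcast output above is sorted by Param1.

```python
def one_hot(cells, sep=","):
    """Return (sorted category names, one row of 0/1 flags per input cell)."""
    split_cells = [
        [tok.strip() for tok in cell.split(sep) if tok.strip()] for cell in cells
    ]
    # Every distinct value across all cells becomes one indicator column
    categories = sorted({tok for toks in split_cells for tok in toks})
    rows = [[1 if cat in toks else 0 for cat in categories] for toks in split_cells]
    return categories, rows


cells = [
    "Private Bus,Private Car,Public Bus",
    "Private Car,Private Van,Public Bus",
    "Private Car",
    "Private Bus,Private Car",
]
categories, rows = one_hot(cells)
print(categories)
for cell, row in zip(cells, rows):
    print(row, cell)
```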
