Convert Comma Separated String to Numeric Columns

I think you are looking for the strsplit() function:

a = "2000,1450,1800,2200"
strsplit(a, ",")
# [[1]]
# [1] "2000" "1450" "1800" "2200"

Notice that strsplit returns a list, in this case with only one element. This is because strsplit takes a vector as input and returns one list element per input string. You can therefore pass a whole column of such single-cell strings to the function and get back a list of split vectors. A more relevant example looks like this:

# Create some example data: 25 rows, each holding 4 comma separated numbers
dat = data.frame(reaction_time =
         apply(matrix(round(runif(100, 1, 2000)),
                      25, 4), 1, paste, collapse = ","),
       stringsAsFactors=FALSE)
# Split each string and bind the pieces into a 25 x 4 character matrix
splitdat = do.call("rbind", strsplit(dat$reaction_time, ","))
# Convert every column to numeric
splitdat = data.frame(apply(splitdat, 2, as.numeric))
names(splitdat) = paste("trial", 1:4, sep = "")
head(splitdat)
#   trial1 trial2 trial3 trial4
# 1    597   1071   1430    997
# 2    614    322   1242   1140
# 3   1522   1679     51   1120
# 4    225   1988   1938   1068
# 5    621    623   1174     55
# 6   1918   1828    136   1816

And finally, to calculate the mean per person (that is, per row):

apply(splitdat, 1, mean)
#  [1] 1187.50  361.25  963.75 1017.00  916.25 1409.50  730.00 1310.75 1133.75
# [10]  851.25  914.75  881.25  889.00 1014.75  676.75  850.50  805.00 1460.00
# [19]  901.00 1443.50  507.25  691.50 1090.00  833.25  669.25

Read comma separated string to numeric and create dataframe

library(MADAM)
# oneLine is assumed to hold a single comma separated string,
# for example "0.0001879, 2.2e-16"
lapply(strsplit(oneLine, ","),
       function(x) fisher.method(as.data.frame.list(as.numeric(x))))
#[[1]]
#         S num.p p.value p.adj
#1 89.26501     2       0     0

If you have a vector of such strings:

Lines <- c("0.0001879, 2.2e-16", "0.001435, 1.2e-14", "0.00014353, 2.5e-13")
do.call(rbind, lapply(strsplit(Lines, ","),
        function(x) fisher.method(as.data.frame.list(as.numeric(x)))))
#          S num.p      p.value        p.adj
# 1 89.26501     2 0.000000e+00 0.000000e+00
#11 77.20092     2 6.661338e-16 6.661338e-16
#12 75.73256     2 1.443290e-15 1.443290e-15

If you are reading from a file, for example if the contents of file1.csv are

0.0001879, 2.2e-16
0.001435, 1.2e-14
0.00014353, 2.5e-13

then you can use read.csv to read the file and apply fisher.method directly to the two-column data.frame:

dat <- read.csv("file1.csv", header=FALSE)
fisher.method(dat)
#         S num.p      p.value        p.adj
#1 89.26501     2 0.000000e+00 0.000000e+00
#2 77.20092     2 6.661338e-16 9.992007e-16
#3 75.73256     2 1.443290e-15 1.443290e-15

Convert comma separated string to integer in R

Here's an approach using scan:

oneLine <- "IP1: IP2: 0.1,0.5,0.9"
myVector <- strsplit(oneLine, ":")
listofPValues <- myVector[[1]][[3]]
listofPValues
# [1] " 0.1,0.5,0.9"
scan(text = listofPValues, sep = ",")
# Read 3 items
# [1] 0.1 0.5 0.9

And one using strsplit:

as.numeric(unlist(strsplit(listofPValues, ",")))
# [1] 0.1 0.5 0.9

How to convert comma separated numbers from a dataframe to numbers and get the avg value

You can simply define a function that unpacks those values and returns their mean, then apply it to the column.

def get_mean(x):
    # Split into a list of strings
    splited = x.split(',')
    # Transform into numbers
    y = [float(n) for n in splited]
    return sum(y)/len(y)

# Apply on the desired column
df['col'] = df['col'].apply(get_mean)
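
To try the idea without a DataFrame at hand, here is a minimal standard-library sketch of the same split-and-average step. The name mean_of_csv and the sample strings are made up for illustration. Stripping whitespace, skipping empty fields and returning NaN for an empty cell are assumptions about how messy input should be treated, not part of the answer above.

```python
def mean_of_csv(cell):
    """Return the mean of the comma separated numbers in `cell`."""
    # Split on commas and drop whitespace around each piece
    parts = [p.strip() for p in cell.split(",")]
    # Skip empty pieces (assumption: an empty field means "no value")
    values = [float(p) for p in parts if p]
    if not values:
        # Assumption: an empty cell yields NaN instead of raising an error
        return float("nan")
    return sum(values) / len(values)


# Hypothetical sample cells, including one with spaces and one empty
rows = ["2000,1450,1800,2200", "10, 20, 30", ""]
means = [mean_of_csv(r) for r in rows]
print(means)
```

With pandas, the same function could be passed to df['col'].apply in place of get_mean.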

Scala - Convert column having comma separated numbers (currently string) to Array of Double in Dataframe

split + cast should do the job:

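import org.apache.spark.sql.functions.{col, split}
// toDF on a Seq needs spark.implicits._ (already imported in spark-shell)

val df = Seq(("619.619620621622, 123.12412512699")).toDF("MyCol")

// Split the string on commas, then cast the resulting array of strings to doubles
val df2 = df.withColumn("myCol", split(col("MyCol"), ",").cast("array<double>"))

df2.printSchema

//root
// |-- myCol: array (nullable = true)
// |    |-- element: double (containsNull = true)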

Convert numbers in comma-separated string within a data.table column into a long table form

strsplit and matrix are both fast, but you're not using them in an optimized manner. Here's the approach I'd suggest:

foo_a5 <- function(DT) {
  # unlist the relevant column and use strsplit, but don't make your matrices yet
  a <- strsplit(unlist(DT$c, use.names = FALSE), ",", TRUE)
  # expand all the other columns of the input data.table...
  cbind(DT[rep(seq.int(nrow(DT)), lengths(a)/3), 1:2],
        # ... and bind it with your newly formed (single) matrix
        matrix(as.integer(unlist(a, use.names=FALSE)),
               ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("x", "y", "z"))))
}

foo_a5(DT)
##     id  b  x  y  z
##  1:  1 10 80 96 40
##  2:  1 10 83 86 12
##  3:  2 92 86 18 38
##  4:  2 92 51 17 80
##  5:  2 92 33 38 23
##  6:  2 92 49 71 97
##  7:  2 92 10 13 70
##  8:  3 76 84 39 86
##  9:  4 81 48 99  8
## 10:  5 56 53 92 27
## 11:  5 56  2 39 62

An alternative to @zx8754's answer that uses a similar logic is the following:

foo_zx2 <- function(DT) {
  L <- DT[, list(c = unlist(strsplit(unlist(c, use.names = FALSE), ",", TRUE),
                            use.names = FALSE)), .(id, b)]
  L[, time := rep(c("x", "y", "z"), length.out = nrow(L))][
    , dcast(.SD, id + b + rowid(time) ~ time, value.var = "c")]
}

This tests faster than @Ronak's approach for me, but still slower than just using base R.

R: converting comma separated entry to columns with non-characters

In two steps:

  1. Use read.table with fill=TRUE to read all lines (you can also use readLines).
  2. Treat the lines without a comma, using a space as the separator.

The code is something like this:

aa <- read.table(text='John, Doe
Rebecca, Homes
Organization LLC',sep=',',fill=TRUE,colClasses='character')

## treat lines without a comma
## space as separator: I assume you don't have compound names
aa[nchar(aa$V2) == 0,] <-
  do.call(rbind, strsplit(aa[nchar(aa$V2) == 0,]$V1, ' '))

> aa
            V1    V2
1         John   Doe
2      Rebecca Homes
3 Organization   LLC

EDIT, a better method: I use a regular expression to replace any space, or any comma followed by a space, with a single comma, so that every line has the same separator. I assume that you don't have any compound names.

ff <- readLines(textConnection('John, Doe
Rebecca, Homes
Organization LLC'))
do.call(rbind,
        strsplit(gsub('[ ]|, |,[ ]', ',', ff), ','))

How to change comma separated values in a single element to multiple columns and assign numeric coding

dplyr, stringi and reshape2 will do all the work you need:

install.packages("dplyr")
install.packages("stringi")
install.packages("reshape2")

library(dplyr)
library(stringi)
library(reshape2)

xx_df <- data.frame(Param1 = c("Private Bus,Private Car,Public Bus",
                               "Private Car,Private Van,Public Bus",
                               "Private Car",
                               "Private Bus,Private Car"),
                    stringsAsFactors = FALSE)

# Split each string into columns, melt to long form, then cast to 0/1 indicators
cbind(xx_df, stringi::stri_split_fixed(xx_df$Param1, ",", simplify = TRUE)) %>%
  data.frame(stringsAsFactors = FALSE) %>%
  reshape2::melt(id.vars = "Param1", na.rm = TRUE) %>%
  mutate(variable = 1) %>% filter(value != '') %>%
  reshape2::dcast(Param1~value, value.var = "variable", fill = 0) %>%
  data.frame()

The result is:

                              Param1 Private.Bus Private.Car Private.Van Public.Bus
1            Private Bus,Private Car           1           1           0          0
2 Private Bus,Private Car,Public Bus           1           1           0          1
3                        Private Car           0           1           0          0
4 Private Car,Private Van,Public Bus           0           1           1          1

