Convert Comma Separated String to Integer in R

Here's an approach using scan:

oneLine <- "IP1: IP2: 0.1,0.5,0.9"
myVector <- strsplit(oneLine, ":")
listofPValues <- myVector[[1]][[3]]
listofPValues
# [1] " 0.1,0.5,0.9"
scan(text = listofPValues, sep = ",")
# Read 3 items
# [1] 0.1 0.5 0.9

And one using strsplit:

as.numeric(unlist(strsplit(listofPValues, ",")))
# [1] 0.1 0.5 0.9
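
Since the title asks for integers rather than doubles, here is a minimal sketch of the same split followed by an explicit integer conversion. The helper name to_int_vector and the input string are made up for illustration. Note that as.integer() truncates fractional values and returns NA (with a warning) for tokens that are not numbers.

```r
# Hypothetical helper: split a comma-separated string and return an integer vector.
to_int_vector <- function(s) {
  # fixed = TRUE treats "," as a literal string rather than a regular expression
  parts <- trimws(strsplit(s, ",", fixed = TRUE)[[1]])  # drop stray spaces around each token
  as.integer(parts)                                     # non-numeric tokens become NA with a warning
}

x <- "10, 20,30 , 40"   # made-up input
to_int_vector(x)
# [1] 10 20 30 40
```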

How to read data when some numbers contain commas as thousand separator?

I want to use R rather than pre-processing the data as it makes it easier when the data are revised. Following Shane's suggestion of using gsub, I think this is about as neat as I can do:

x <- read.csv("file.csv",header=TRUE,colClasses="character")
col2cvt <- 15:41
x[,col2cvt] <- lapply(x[,col2cvt],function(x){as.numeric(gsub(",", "", x))})
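
To try the same idea without a file on disk, the sketch below reads made-up CSV content from a string via read.csv(text = ...). The column names and values are assumptions for illustration only. The thousands-separated fields are quoted in the CSV so that their commas are not treated as field separators.

```r
# Made-up CSV content; in practice this would come from a file.
csv_text <- 'id,amount,count
a,"1,234","12,000"
b,"56","1,000,000"'

# Read everything as character so the commas survive the import
x <- read.csv(text = csv_text, header = TRUE, colClasses = "character")

# Columns that hold numbers with thousands separators (hypothetical names)
col2cvt <- c("amount", "count")
x[col2cvt] <- lapply(x[col2cvt], function(col) as.numeric(gsub(",", "", col, fixed = TRUE)))

x$amount   # now the numeric values 1234 and 56
x$count    # now the numeric values 12000 and 1000000
```

Selecting columns by name rather than by position (15:41 in the answer above) is a design choice that keeps the conversion stable if columns are added or reordered when the data are revised.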

Read comma separated string to numeric and create dataframe

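library(MADAM)
# Assumed input, inferred from the output below and from the first element of Lines
oneLine <- "0.0001879, 2.2e-16"
lapply(strsplit(oneLine, ","),
       function(x) fisher.method(as.data.frame.list(as.numeric(x))))
# [[1]]
#          S num.p p.value p.adj
# 1 89.26501     2       0     0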

If you have a vector of such strings:

Lines <- c("0.0001879, 2.2e-16", "0.001435, 1.2e-14", "0.00014353, 2.5e-13")
do.call(rbind, lapply(strsplit(Lines, ","),
                      function(x) fisher.method(as.data.frame.list(as.numeric(x)))))
#           S num.p      p.value        p.adj
# 1  89.26501     2 0.000000e+00 0.000000e+00
# 11 77.20092     2 6.661338e-16 6.661338e-16
# 12 75.73256     2 1.443290e-15 1.443290e-15

If you are reading from a file, for example if the contents of file1.csv are

0.0001879, 2.2e-16
0.001435, 1.2e-14
0.00014353, 2.5e-13

then you can use read.csv to read the file and apply fisher.method directly to the two-column data.frame:

dat <- read.csv("file1.csv", header=FALSE)
fisher.method(dat)
#          S num.p      p.value        p.adj
# 1 89.26501     2 0.000000e+00 0.000000e+00
# 2 77.20092     2 6.661338e-16 9.992007e-16
# 3 75.73256     2 1.443290e-15 1.443290e-15
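
For readers without the MADAM package, the S column can be reproduced in base R. This is a minimal sketch that assumes fisher.method implements the standard Fisher combined probability test, where S = -2 * sum(log(p)) is compared against a chi-squared distribution with 2k degrees of freedom for k p-values. The helper name fisher_combine is hypothetical and not part of MADAM, and no multiple-testing adjustment (the p.adj column) is attempted.

```r
# Hypothetical helper, not part of MADAM: Fisher's combined test, one result per row.
fisher_combine <- function(p) {
  p <- as.matrix(p)
  S <- -2 * rowSums(log(p))   # test statistic
  k <- ncol(p)                # number of p-values combined per row
  data.frame(S = S,
             num.p = k,
             p.value = pchisq(S, df = 2 * k, lower.tail = FALSE))
}

Lines <- c("0.0001879, 2.2e-16", "0.001435, 1.2e-14", "0.00014353, 2.5e-13")
# as.numeric() tolerates the leading space left after splitting on ","
pmat <- do.call(rbind, lapply(strsplit(Lines, ","), as.numeric))
res <- fisher_combine(pmat)
round(res$S, 5)
```

The S values should agree with the output above. The p-values are so close to zero that they may not match digit for digit, since results at this scale depend on how the upper tail is computed.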

Split a comma separated string into defined number of pieces in R

We could split the text on the comma (',') and divide the pieces into groups of 5. Here txt is the comma-separated input string.

temp <- strsplit(txt, ",")[[1]]
split(temp, rep(seq_along(temp), each = 5, length.out = length(temp)))

#$`1`
#[1] "120923" "120417" "120416" "105720" "120925"

#$`2`
#[1] "120790" "120792" "120922" "120928" "120930"

#$`3`
#[1] "120918" "120929" "61065" "120421"

If you want each group as one concatenated string, we can use by:

as.character(by(temp, rep(seq_along(temp), each = 5,
                          length.out = length(temp)), toString))
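
A self-contained variant of the grouping is sketched below. The value of txt is an assumption, rebuilt from the group contents printed above, and the group index is computed with ceiling(seq_along(temp) / 5) rather than rep(), which is simply an alternative way to get the same labels.

```r
# Assumed input, reconstructed from the printed groups above
txt <- "120923,120417,120416,105720,120925,120790,120792,120922,120928,120930,120918,120929,61065,120421"

temp <- strsplit(txt, ",", fixed = TRUE)[[1]]
grp <- ceiling(seq_along(temp) / 5)   # 1 for items 1-5, 2 for items 6-10, 3 for the rest
chunks <- split(temp, grp)            # list of character vectors with 5, 5 and 4 items

# One comma-separated string per group; toString() joins with ", "
joined <- unname(vapply(chunks, toString, character(1)))
joined
```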

Convert comma separated string to numeric columns

I think you are looking for the strsplit() function:

a = "2000,1450,1800,2200"
strsplit(a, ",")
[[1]]
[1] "2000" "1450" "1800" "2200"

Notice that strsplit returns a list, in this case with only one element. This is because strsplit takes a vector as input and returns one list element per input string. You can therefore pass a whole column of such strings to the function and get back a list with one split vector per cell. In a more realistic example this looks like:

# Create some example data
dat = data.frame(reaction_time =
                   apply(matrix(round(runif(100, 1, 2000)),
                                25, 4), 1, paste, collapse = ","),
                 stringsAsFactors=FALSE)
splitdat = do.call("rbind", strsplit(dat$reaction_time, ","))
splitdat = data.frame(apply(splitdat, 2, as.numeric))
names(splitdat) = paste("trial", 1:4, sep = "")
head(splitdat)
  trial1 trial2 trial3 trial4
1    597   1071   1430    997
2    614    322   1242   1140
3   1522   1679     51   1120
4    225   1988   1938   1068
5    621    623   1174     55
6   1918   1828    136   1816

Finally, to calculate the mean per person (that is, per row):

apply(splitdat, 1, mean)
[1] 1187.50 361.25 963.75 1017.00 916.25 1409.50 730.00 1310.75 1133.75
[10] 851.25 914.75 881.25 889.00 1014.75 676.75 850.50 805.00 1460.00
[19] 901.00 1443.50 507.25 691.50 1090.00 833.25 669.25
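
One thing the rbind approach relies on is that every cell holds the same number of values. If the rows differ in length, rbind recycles the shorter ones with only a warning. The sketch below uses made-up, deterministic data to show the same pipeline with an explicit length check, and uses rowMeans() as a shorthand for apply(..., 1, mean).

```r
# Made-up data: three people, four comma-separated reaction times each
dat <- data.frame(reaction_time = c("500,700,900,1100",
                                    "300,300,600,800",
                                    "1000,1200,1400,1600"),
                  stringsAsFactors = FALSE)

parts <- strsplit(dat$reaction_time, ",", fixed = TRUE)

# Stop early if any cell has a different number of values than the others
stopifnot(length(unique(lengths(parts))) == 1)

# Convert each split vector to numeric, then stack the vectors as matrix rows
splitdat <- do.call(rbind, lapply(parts, as.numeric))
colnames(splitdat) <- paste0("trial", seq_len(ncol(splitdat)))

rowMeans(splitdat)
# [1]  800  500 1300
```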

Convert numbers in comma-separated string within a data.table column into a long table form

strsplit and matrix are both fast, but you're not using them in an optimized manner. Here's the approach I'd suggest:

foo_a5 <- function(DT) {
  # unlist the relevant column and use strsplit, but don't make your matrices yet
  a <- strsplit(unlist(DT$c, use.names = FALSE), ",", TRUE)
  # expand all the other columns of the input data.table...
  cbind(DT[rep(seq.int(nrow(DT)), lengths(a)/3), 1:2],
        # ... and bind it with your newly formed (single) matrix
        matrix(as.integer(unlist(a, use.names=FALSE)),
               ncol = 3, byrow = TRUE,
               dimnames = list(NULL, c("x", "y", "z"))))
}

foo_a5(DT)
##     id  b  x  y  z
##  1:  1 10 80 96 40
##  2:  1 10 83 86 12
##  3:  2 92 86 18 38
##  4:  2 92 51 17 80
##  5:  2 92 33 38 23
##  6:  2 92 49 71 97
##  7:  2 92 10 13 70
##  8:  3 76 84 39 86
##  9:  4 81 48 99  8
## 10:  5 56 53 92 27
## 11:  5 56  2 39 62

An alternative to @zx8754's answer that uses a similar logic is the following:

foo_zx2 <- function(DT) {
  L <- DT[, list(c = unlist(strsplit(unlist(c, use.names = FALSE), ",", TRUE),
                            use.names = FALSE)), .(id, b)]
  L[, time := rep(c("x", "y", "z"), length.out = nrow(L))][
    , dcast(.SD, id + b + rowid(time) ~ time, value.var = "c")]
}

This tests faster than @Ronak's approach for me, but still slower than just using base R.
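
The expansion logic of foo_a5 (repeat each input row once per triplet, then bind a three-column matrix) can be tried without data.table. The sketch below is a base R illustration on a plain data.frame, not a replacement for the functions above. The input rows are made up in the same shape as the output shown earlier, and it assumes every cell holds a multiple of three values.

```r
# Made-up input: column c holds comma-separated integers in groups of three
df <- data.frame(id = c(1L, 2L),
                 b  = c(10L, 92L),
                 c  = c("80,96,40,83,86,12", "86,18,38"),
                 stringsAsFactors = FALSE)

a <- strsplit(df$c, ",", fixed = TRUE)
stopifnot(all(lengths(a) %% 3 == 0))   # assumption: complete x/y/z triplets only

# Repeat each row index once per triplet found in that row
idx <- rep(seq_len(nrow(df)), lengths(a) / 3)

# Fill one matrix row per triplet and bind it to the repeated id/b columns
long <- cbind(df[idx, c("id", "b")],
              matrix(as.integer(unlist(a, use.names = FALSE)),
                     ncol = 3, byrow = TRUE,
                     dimnames = list(NULL, c("x", "y", "z"))))
rownames(long) <- NULL
long
```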

Return a number as a comma-delimited string

Unfortunately, a number of Renjin packages are not fully implemented, so until a fix is in place, the code in the question is a decent workaround:

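commas <- function( n ) {
  # format the last three digits, zero-padded
  s <- sprintf( "%03.0f", n %% 1000 )
  n <- n %/% 1000

  # prepend one zero-padded group of three digits per iteration
  while( n > 0 ) {
    # concat() is not a base R function; it is presumably defined elsewhere
    # in the question (paste0() would be the base R equivalent)
    s <- concat( sprintf( "%03.0f", n %% 1000 ), ',', s )
    n <- n %/% 1000
  }

  # strip the leading zeros left over from padding
  gsub( '^0*', '', s )
}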
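
For comparison, here is a minimal base R sketch of the same loop, with paste0() standing in for concat(). The function name commas_base is hypothetical. It also keeps at least one digit when stripping zeros, because stripping every leading zero would turn an input of 0 into an empty string. It assumes a single non-negative whole number.

```r
# Hypothetical base R variant of the loop above
commas_base <- function(n) {
  stopifnot(is.numeric(n), length(n) == 1, n >= 0, n == floor(n))
  s <- sprintf("%03.0f", n %% 1000)   # last three digits, zero-padded
  n <- n %/% 1000

  while (n > 0) {
    s <- paste0(sprintf("%03.0f", n %% 1000), ",", s)  # prepend the next group
    n <- n %/% 1000
  }

  # Remove leading zeros only when another digit follows, so 0 stays "0"
  sub("^0+(?=\\d)", "", s, perl = TRUE)
}

commas_base(1234567)
# [1] "1,234,567"
commas_base(0)
# [1] "0"
```

In GNU R, format(n, big.mark = ",") covers the same case directly, so a hand-written loop like this is mainly useful where that function is unavailable.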

Convert string with comma to integer

How about this? (Note that this snippet is Ruby, not R.)

"1,112".delete(',').to_i

