Convert comma separated string to integer in R
Here's an approach using scan:
oneLine <- "IP1: IP2: 0.1,0.5,0.9"
myVector <- strsplit(oneLine, ":")
listofPValues <- myVector[[1]][[3]]
listofPValues
# [1] " 0.1,0.5,0.9"
scan(text = listofPValues, sep = ",")
# Read 3 items
# [1] 0.1 0.5 0.9
And one using strsplit:
as.numeric(unlist(strsplit(listofPValues, ",")))
# [1] 0.1 0.5 0.9
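If the conversion is needed in several places, the strsplit approach can be wrapped in a small helper. This is only a sketch: the name parse_numbers and its sep argument are made up for illustration, and it assumes every piece is a valid number (anything else becomes NA with a warning). Note that as.integer truncates, so it is applied here only to whole-number input.

```r
# Hypothetical helper: split one string on a separator and convert to numeric.
parse_numbers <- function(s, sep = ",") {
  # fixed = TRUE treats sep literally instead of as a regular expression
  pieces <- trimws(strsplit(s, sep, fixed = TRUE)[[1]])
  as.numeric(pieces)
}

p_values <- parse_numbers(" 0.1,0.5,0.9")
counts   <- as.integer(parse_numbers("10, 20, 30"))
```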
How to read data when some numbers contain commas as thousand separator?
I want to use R rather than pre-processing the data, as that makes it easier when the data are revised. Following Shane's suggestion of using gsub, I think this is about as neat as I can do:
x <- read.csv("file.csv",header=TRUE,colClasses="character")
col2cvt <- 15:41
x[,col2cvt] <- lapply(x[,col2cvt],function(x){as.numeric(gsub(",", "", x))})
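To try the same idea without a file on disk, here is a self-contained sketch that reads from an inline string. The column names and values are invented for illustration; the approach assumes that fields containing commas are quoted in the CSV.

```r
# Made-up sample data; quoted fields carry thousands separators.
csv_text <- 'id,amount,total\n1,"1,234","10,000"\n2,"56","2,345,678"'

# Read everything as character so the commas survive the import
x <- read.csv(text = csv_text, header = TRUE, colClasses = "character")

# Strip the commas from the chosen columns, then convert to numeric
col2cvt <- 2:3
x[, col2cvt] <- lapply(x[, col2cvt], function(v) as.numeric(gsub(",", "", v)))
```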
Read comma separated string to numeric and create dataframe
library(MADAM)
lapply(strsplit(oneLine, ","),
function(x) fisher.method(as.data.frame.list(as.numeric(x))))
#[[1]]
# S num.p p.value p.adj
#1 89.26501 2 0 0
If you have a vector:
Lines <- c("0.0001879, 2.2e-16", "0.001435, 1.2e-14", "0.00014353, 2.5e-13")
do.call(rbind,lapply(strsplit(Lines, ","),
function(x) fisher.method(as.data.frame.list(as.numeric(x)))))
# S num.p p.value p.adj
# 1 89.26501 2 0.000000e+00 0.000000e+00
#11 77.20092 2 6.661338e-16 6.661338e-16
#12 75.73256 2 1.443290e-15 1.443290e-15
If you are reading from a file, for example if the contents of file1.csv are:
0.0001879, 2.2e-16
0.001435, 1.2e-14
0.00014353, 2.5e-13
Then you can use read.csv to read the file and apply fisher.method directly to the two-column data.frame:
dat <- read.csv("file1.csv", header=FALSE)
fisher.method(dat)
# S num.p p.value p.adj
#1 89.26501 2 0.000000e+00 0.000000e+00
#2 77.20092 2 6.661338e-16 9.992007e-16
#3 75.73256 2 1.443290e-15 1.443290e-15
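For readers without the MADAM package, the S column above can be reproduced in base R, since Fisher's method uses S = -2 * sum(log(p)) compared against a chi-squared distribution with 2k degrees of freedom. The function name fisher_combine is hypothetical, and this sketch is not the package's implementation: it has no p.adj column, and its p-values may differ from the printed output in the extreme tail.

```r
# Hypothetical base R sketch of Fisher's method for one set of p-values.
fisher_combine <- function(p) {
  p <- as.numeric(p)           # tolerates leading spaces such as " 2.2e-16"
  S <- -2 * sum(log(p))        # Fisher's combined test statistic
  data.frame(S = S,
             num.p = length(p),
             p.value = pchisq(S, df = 2 * length(p), lower.tail = FALSE))
}

Lines <- c("0.0001879, 2.2e-16", "0.001435, 1.2e-14", "0.00014353, 2.5e-13")
res <- do.call(rbind, lapply(strsplit(Lines, ","), fisher_combine))
```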
Split a comma separated string into defined number of pieces in R
We could split the text on commas (',') and divide the pieces into groups of 5.
temp <- strsplit(txt, ",")[[1]]
split(temp, rep(seq_along(temp), each = 5, length.out = length(temp)))
#$`1`
#[1] "120923" "120417" "120416" "105720" "120925"
#$`2`
#[1] "120790" "120792" "120922" "120928" "120930"
#$`3`
#[1] "120918" "120929" "61065" "120421"
If you want each group as one concatenated string, we can use by:
as.character(by(temp, rep(seq_along(temp), each = 5,
length.out = length(temp)), toString))
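The snippets above assume a txt variable that is not shown. A runnable sketch follows, with txt reconstructed from the printed output (so treat the input as an assumption) and with ceiling(seq_along(temp) / 5) swapped in as an equivalent way to build the group index.

```r
# Input reconstructed from the output shown above (an assumption)
txt <- "120923,120417,120416,105720,120925,120790,120792,120922,120928,120930,120918,120929,61065,120421"

temp <- strsplit(txt, ",")[[1]]

# Elements 1-5 get group 1, elements 6-10 get group 2, and so on
grp <- ceiling(seq_along(temp) / 5)
groups <- split(temp, grp)

# One comma-separated string per group
joined <- vapply(groups, paste, character(1), collapse = ",")
```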
Convert comma separated string to numeric columns
I think you are looking for the strsplit() function:
a = "2000,1450,1800,2200"
strsplit(a, ",")
[[1]]
[1] "2000" "1450" "1800" "2200"
Notice that strsplit returns a list, in this case with only one element. This is because strsplit takes vectors as input. Therefore, you can also pass a long vector of your single-cell strings to the function and get back a list with one split result per element. In a more relevant example this looks like:
# Create some example data
dat = data.frame(reaction_time =
apply(matrix(round(runif(100, 1, 2000)),
25, 4), 1, paste, collapse = ","),
stringsAsFactors=FALSE)
splitdat = do.call("rbind", strsplit(dat$reaction_time, ","))
splitdat = data.frame(apply(splitdat, 2, as.numeric))
names(splitdat) = paste("trial", 1:4, sep = "")
head(splitdat)
trial1 trial2 trial3 trial4
1 597 1071 1430 997
2 614 322 1242 1140
3 1522 1679 51 1120
4 225 1988 1938 1068
5 621 623 1174 55
6 1918 1828 136 1816
and finally, to calculate the mean per person:
apply(splitdat, 1, mean)
[1] 1187.50 361.25 963.75 1017.00 916.25 1409.50 730.00 1310.75 1133.75
[10] 851.25 914.75 881.25 889.00 1014.75 676.75 850.50 805.00 1460.00
[19] 901.00 1443.50 507.25 691.50 1090.00 833.25 669.25
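Because the example above uses random data, its output changes on every run. Here is a small deterministic sketch of the same steps, with made-up reaction times, that also checks every row has the same number of fields (which the rbind approach relies on) and uses rowMeans in place of apply.

```r
# Made-up data: one comma-separated string of four trials per person
dat <- data.frame(reaction_time = c("2000,1450,1800,2200",
                                    "597,1071,1430,997",
                                    "614,322,1242,1140"),
                  stringsAsFactors = FALSE)

parts <- strsplit(dat$reaction_time, ",", fixed = TRUE)

# Binding rows into a matrix only makes sense if all rows have equal length
stopifnot(length(unique(lengths(parts))) == 1)

splitdat <- as.data.frame(matrix(as.numeric(unlist(parts)),
                                 nrow = length(parts), byrow = TRUE))
names(splitdat) <- paste0("trial", seq_len(ncol(splitdat)))

# Mean per person (one value per row)
person_means <- rowMeans(splitdat)
```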
Convert numbers in comma-separated string within a data.table column into a long table form
strsplit and matrix are both fast, but you're not using them in an optimized manner. Here's the approach I'd suggest:
foo_a5 <- function(DT) {
# unlist the relevant column and use strsplit, but don't make your matrices yet
a <- strsplit(unlist(DT$c, use.names = FALSE), ",", TRUE)
# expand all the other columns of the input data.table...
cbind(DT[rep(seq.int(nrow(DT)), lengths(a)/3), 1:2],
# ... and bind it with your newly formed (single) matrix
matrix(as.integer(unlist(a, use.names=FALSE)),
ncol = 3, byrow = TRUE,
dimnames = list(NULL, c("x", "y", "z"))))
}
foo_a5(DT)
## id b x y z
## 1: 1 10 80 96 40
## 2: 1 10 83 86 12
## 3: 2 92 86 18 38
## 4: 2 92 51 17 80
## 5: 2 92 33 38 23
## 6: 2 92 49 71 97
## 7: 2 92 10 13 70
## 8: 3 76 84 39 86
## 9: 4 81 48 99 8
## 10: 5 56 53 92 27
## 11: 5 56 2 39 62
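The DT object is not defined in this excerpt. To show the reshaping logic on its own, here is a sketch on a plain data.frame with made-up input (the name DF and its values are assumptions, loosely modelled on the output above). It assumes every string in column c holds a multiple of three values.

```r
# Made-up input: each string in c holds one or more (x, y, z) triples
DF <- data.frame(id = 1:2,
                 b  = c(10, 92),
                 c  = c("80,96,40,83,86,12", "86,18,38"),
                 stringsAsFactors = FALSE)

a <- strsplit(DF$c, ",", fixed = TRUE)

# One matrix row per triple, filled row by row
m <- matrix(as.integer(unlist(a, use.names = FALSE)),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("x", "y", "z")))

# Repeat each id/b row once per triple it contributed, then bind the triples
long <- cbind(DF[rep(seq_len(nrow(DF)), lengths(a) / 3), c("id", "b")], m)
rownames(long) <- NULL
```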
An alternative to @zx8754's answer that uses a similar logic is the following:
foo_zx2 <- function(DT) {
L <- DT[, list(c = unlist(strsplit(unlist(c, use.names = FALSE), ",", TRUE),
use.names = FALSE)), .(id, b)]
L[, time := rep(c("x", "y", "z"), length.out = nrow(L))][
, dcast(.SD, id + b + rowid(time) ~ time, value.var = "c")]
}
This tests faster than @Ronak's approach for me, but still slower than just using base R.
Return a number as a comma-delimited string
Unfortunately, a number of Renjin packages are not fully implemented, so until a fix is in place, the code in the question is a decent workaround:
commas <- function( n ) {
s <- sprintf( "%03.0f", n %% 1000 )
n <- n %/% 1000
while( n > 0 ) {
s <- concat( sprintf( "%03.0f", n %% 1000 ), ',', s )
n <- n %/% 1000
}
gsub( '^0*', '', s )
}
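For comparison, in standard GNU R the same formatting is usually done with format and its big.mark argument. Whether this works in a given Renjin version is not confirmed here, so treat the sketch as a reference for what the workaround emulates; the wrapper name commas_base is hypothetical.

```r
# Hypothetical wrapper around base R's format()
commas_base <- function(n) {
  # scientific = FALSE avoids output like "1e+06"; trim = TRUE drops padding
  format(n, big.mark = ",", scientific = FALSE, trim = TRUE)
}

formatted <- commas_base(1234567)
small     <- commas_base(999)
```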
Convert string with comma to integer
How about this?
"1,112".delete(',').to_i
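The one-liner above is Ruby rather than R. A rough R equivalent, as a sketch, removes the commas with gsub and then converts with as.integer:

```r
# Remove the thousands separator, then convert the remaining digits
value <- as.integer(gsub(",", "", "1,112", fixed = TRUE))
```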