Using := in data.table with paste()

paste two data.table columns

Arun's comment answered this question:

dt[,new:=paste0(A,B)]
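As a minimal, self-contained sketch of that one-liner: the table below is made-up example data, and only the column names A and B are taken from the answer.

```r
library(data.table)

# Hypothetical example data; only the column names A and B come from the answer
dt <- data.table(A = c("g", "h", "i"), B = c("l", "m", "n"))

# := adds the new column by reference, so no reassignment of dt is needed
dt[, new := paste0(A, B)]
dt$new
```

Because := works by reference, dt itself gains the column; writing dt <- dt[...] is unnecessary.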

Using := in data.table with paste()

## Start with 1st three columns of example data
dt <- exampleTable[,1:3]

## Run for 1st five years
nYears <- 5
for (ii in seq_len(nYears) - 1) {
    y0 <- as.symbol(paste0("popYears", ii))
    y1 <- paste0("popYears", ii + 1)
    dt[, (y1) := eval(y0) * growthRate]
}

## Check that it worked
dt
#     Site growthRate popYears0 popYears1 popYears2 popYears3 popYears4 popYears5
#1: Site 1        1.1        10      11.0     12.10    13.310   14.6410  16.10510
#2: Site 2        1.2        12      14.4     17.28    20.736   24.8832  29.85984
#3: Site 3        1.3        13      16.9     21.97    28.561   37.1293  48.26809

Edit:

Because the possibility of speeding this up using set() keeps coming up in the comments, I'll throw this additional option out there.

nYears <- 5

## Things that only need to be calculated once can be taken out of the loop
r <- dt[["growthRate"]]
yy <- paste0("popYears", seq_len(nYears+1)-1)

## A loop using set() and data.table's nice compact syntax
for (ii in seq_len(nYears)) {
    set(dt, , yy[ii + 1], r * dt[[yy[ii]]])
}

## Check results
dt
#     Site growthRate popYears0 popYears1 popYears2 popYears3 popYears4 popYears5
#1: Site 1        1.1        10      11.0     12.10    13.310   14.6410  16.10510
#2: Site 2        1.2        12      14.4     17.28    20.736   24.8832  29.85984
#3: Site 3        1.3        13      16.9     21.97    28.561   37.1293  48.26809
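The snippets above depend on an exampleTable that is not shown. The following sketch rebuilds a small table from the printed output (the starting values are read off that output, so treat the construction as an assumption) and runs the set() loop end to end.

```r
library(data.table)

# Assumed reconstruction of the example data, taken from the printed output
dt <- data.table(Site = paste("Site", 1:3),
                 growthRate = c(1.1, 1.2, 1.3),
                 popYears0 = c(10, 12, 13))

nYears <- 5
yy <- paste0("popYears", 0:nYears)   # column names built once with paste0()

for (ii in seq_len(nYears)) {
  # Each new column is the previous year's column times the growth rate
  set(dt, j = yy[ii + 1], value = dt[["growthRate"]] * dt[[yy[ii]]])
}

names(dt)
```

Building the column names once outside the loop keeps paste0() out of the hot path; for a handful of columns the difference from the := version is negligible.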

Use paste inside datatables, with a string vector as an input

If you pass a vector to .SDcols, .SD is a data frame (and therefore list) of those columns. You can't directly paste a data frame usefully, which is why the original code fails.

You can, however, use do.call to call a function like paste with the elements of a list as its arguments, e.g.

library(data.table)

# passing parameters directly to `paste` works...
paste(x = c('a', 'b'), y = c(1, 2))
#> [1] "a 1" "b 2"

# ...but passing it a data frame gets weird (working in series instead of parallel)...
paste(data.table(x = c('a', 'b'), y = c(1, 2)))
#> [1] "c(\"a\", \"b\")" "c(1, 2)"

# ...so `do.call` turns the call here into the first version
do.call(paste, data.table(x = c('a', 'b'), y = c(1, 2)))
#> [1] "a 1" "b 2"

In context, then,

data(iris)
setDT(iris)
cols <- c("Species", "Petal.Width")

iris[, pasted := do.call(paste, .SD), .SDcols = cols]

iris[, c(cols, "pasted"), with = FALSE]
#>        Species Petal.Width        pasted
#>   1:    setosa         0.2    setosa 0.2
#>   2:    setosa         0.2    setosa 0.2
#>   3:    setosa         0.2    setosa 0.2
#>   4:    setosa         0.2    setosa 0.2
#>   5:    setosa         0.2    setosa 0.2
#>  ---
#> 146: virginica         2.3 virginica 2.3
#> 147: virginica         1.9 virginica 1.9
#> 148: virginica         2.0   virginica 2
#> 149: virginica         2.3 virginica 2.3
#> 150: virginica         1.8 virginica 1.8

One alternative to .SDcols is the experimental .. notation:

iris[, pasted := do.call(paste, .SD[, ..cols])]

or Ananda's elegant mget, which returns a list of the variables whose names you pass it:

iris[, pasted := do.call(paste, mget(cols))]

All return the same thing.
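If a separator other than a space is wanted, one common idiom is to append sep to the argument list before handing it to do.call. This is a sketch on a copy of iris; the "_" separator and the name dt are arbitrary choices for illustration.

```r
library(data.table)

dt <- as.data.table(iris)   # a copy, so the built-in iris is left untouched
cols <- c("Species", "Petal.Width")

# c(.SD, sep = "_") adds a named sep element to the list of arguments for paste()
dt[, pasted := do.call(paste, c(.SD, sep = "_")), .SDcols = cols]
dt$pasted[1]
```

The same trick works with mget(cols) in place of .SD, since both are lists of columns.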

Paste two character columns with `data.table`

Just use sep as parameter to paste() instead of collapse:

dt[, new := paste(A, B, sep = ".")]
dt
#   L A B new
#1: 1 g l g.l
#2: 2 h m h.m
#3: 3 i n i.n
#4: 4 j o j.o
#5: 5 k p k.p

paste0() doesn't honor the sep parameter (see ?paste0).
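A quick way to see the difference, using two throwaway strings: since paste0() has no sep parameter, a sep = "." argument is simply treated as one more string to concatenate.

```r
# paste() puts sep between the pieces
with_paste <- paste("g", "l", sep = ".")

# paste0() has no sep parameter, so "." is appended as an extra piece
with_paste0 <- paste0("g", "l", sep = ".")

with_paste
with_paste0
```

The first call gives "g.l"; the second gives "gl.", which is rarely what was intended.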

R Using paste in data.table to subset variable number of columns and calculate rowMeans

I believe this solves the problem you were having with paste0:

tmp  <- paste0("TRAVELTIME", dt$minhr, "." , dt$minhr+1, "avg")
tmp1 <- paste0("TRAVELTIME", dt$maxhr, "." , dt$maxhr+1, "avg")
dt1 <- dt[,avg:=rowMeans(.SD[,get(tmp):get(tmp1), with=FALSE]),by=.(dt$id, dt$seqid)]

Someone will probably point out that you don't strictly need the $ in the last line, but due to the nature of the problem you were having I felt this was useful for identifying and solving the problem.
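For comparison, here is a hedged sketch of one way to average a per-row range of columns. The data, the one-row-per-id layout, and the TRAVELTIME<h>.<h+1>avg naming are assumptions made for illustration, not the asker's actual table.

```r
library(data.table)

# Hypothetical data: one row per id, hourly averages in TRAVELTIME<h>.<h+1>avg columns
dt <- data.table(
  id = 1:3,
  minhr = c(0, 1, 0),
  maxhr = c(1, 2, 2),
  TRAVELTIME0.1avg = c(10, 20, 30),
  TRAVELTIME1.2avg = c(12, 22, 32),
  TRAVELTIME2.3avg = c(14, 24, 34)
)

dt[, avg := {
  hrs <- seq(minhr, maxhr)                                # hours covered by this row
  cols <- paste0("TRAVELTIME", hrs, ".", hrs + 1, "avg")  # matching column names
  mean(unlist(mget(cols)))                                # average over those columns
}, by = id]

dt$avg
```

Grouping by id makes minhr and maxhr scalars inside j, so the column names can be built per row; on a large table this per-row approach will be slow, and reshaping to long format may be a better fit.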

Using set() from data.table package to copy and paste values from a data frame to another, within a loop of data frame creation

I'm not quite sure why (I suspect assign()), but it seems that your data frames (A_id, B_id, ...) are not distinct objects: they are just different names pointing to the same object in RAM.

A workaround is to use data.table::copy to make a separate copy of the object in RAM.

for (i in seq_along(letters)) {
    # copy() gives each table its own memory, so set() changes only that table
    assign(paste0(letters[i], "_id"), copy(basedata))
    set(get(paste0(letters[i], "_id")), i = NULL, j = 1L, value = Values_df[, i])
}

NB: It will solve your problem, but as @MichaelChirico said, cluttering your namespace with loads of tiny tables is probably the wrong way to do this.

References: As suggested by @Frank, here is a reference on copy versus reference of data.table objects.
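The difference between a second name and a real copy can be seen in a few lines. This is a minimal sketch with made-up object names (base, alias, indep).

```r
library(data.table)

base <- data.table(x = 1:3)

alias <- base         # plain assignment: a second name for the same object
indep <- copy(base)   # copy(): an independent object

# set() modifies by reference, so the change is visible through both names
set(alias, i = 1L, j = "x", value = 99L)

base$x[1]    # changed along with alias
indep$x[1]   # unaffected
```

Only the table created with copy() keeps its original value.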

paste, by and data.table in r

For completeness' sake, an official answer:

If you use paste(y,collapse=",") instead, it should work.
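A small sketch of what collapse does inside a grouped data.table call; the table and the column names x and y are invented for the example.

```r
library(data.table)

# Hypothetical data: y values to be joined within each x group
dt <- data.table(x = c("a", "a", "b"), y = c("p", "q", "r"))

# collapse joins all y values of a group into a single string
res <- dt[, .(y = paste(y, collapse = ",")), by = x]
res
```

With sep instead of collapse, paste() would return one string per row rather than one per group.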

How to paste in data.table using a vector as column reference?

We could use get

dt[, (var) := paste0('id_', get(var))]

-output

> dt
      id amount
 1: id_a      1
 2: id_b      2
 3: id_c      3
 4: id_d      4
 5: id_e      5
 6: id_f      6
 7: id_g      7
 8: id_h      8
 9: id_i      9
10: id_j     10

Alternatively, the more standard approach goes through .SD (optionally with .SDcols):

dt[, (var) := paste0('id_', .SD[[var]])]
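Put together as a runnable sketch: the table below is reconstructed from the printed output and var is assumed to hold the column name "id". The second variant uses lapply over .SD with .SDcols, which also extends to several columns.

```r
library(data.table)

# Assumed reconstruction of the example data from the printed output
dt  <- data.table(id = letters[1:10], amount = 1:10)
dt2 <- copy(dt)
var <- "id"   # assumed: the column name is stored in a character vector

# Variant 1: look the column up by name with get()
dt[, (var) := paste0("id_", get(var))]

# Variant 2: restrict .SD to the named column(s) via .SDcols
dt2[, (var) := lapply(.SD, function(x) paste0("id_", x)), .SDcols = var]

dt$id
```

Wrapping var in parentheses on the left of := is what makes data.table treat it as a variable holding the name rather than a column literally called var.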

