data.table String Concatenation of .SD Columns for By-Group Values

data table string concatenation of SD columns for by group values

You can concatenate all columns in .SD using lapply.

dt[, lapply(.SD, paste0, collapse=" "), by = ID]
##    ID       a       b
## 1:  1   a b c   A B C
## 2:  2 d e f g D E F G
## 3:  3   h i j   H I J
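
The answer does not show how dt was built. As a minimal sketch, assuming an input reconstructed from the printed result (the data below is hypothetical), the whole step can be reproduced like this:

```r
library(data.table)

# Hypothetical input, reconstructed from the printed result above
dt <- data.table(
  ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3),
  a  = letters[1:10],
  b  = LETTERS[1:10]
)

# paste0(x, collapse = " ") turns each group's column into a single string;
# lapply applies it to every column of .SD (all columns except ID)
res <- dt[, lapply(.SD, paste0, collapse = " "), by = ID]
print(res)
```

Extra arguments after the function name in lapply (here collapse = " ") are passed on to paste0 for every column.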

Using a newline character as the collapse argument instead of " " does work, but the result does not print the way your desired output suggests: the printed table shows the escaped \n instead of a line break.

dt[, lapply(.SD, paste0, collapse="\n"), by = ID]
##    ID          a          b
## 1:  1    a\nb\nc    A\nB\nC
## 2:  2 d\ne\nf\ng D\nE\nF\nG
## 3:  3    h\ni\nj    H\nI\nJ
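
The newline is really there in the string; only the printed representation escapes it. A small sketch, assuming a hypothetical three-row dt, shows the difference between print() and cat():

```r
library(data.table)

# Hypothetical input; only the shape matters for this illustration
dt <- data.table(ID = c(1, 1, 2), a = c("a", "b", "d"), b = c("A", "B", "D"))

res <- dt[, lapply(.SD, paste0, collapse = "\n"), by = ID]

# print() shows the escape sequence, cat() writes an actual line break
print(res$a[1])
cat(res$a[1], "\n")
```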

As pointed out in the comments by @Frank, the question has since been changed to use , as the separator instead of \n. In that case you can simply change the collapse argument to ",". If you also want a space after the comma (", "), the toString solution by @DavidArenburg is preferable.

dt[, lapply(.SD, paste0, collapse=","), by = ID]
dt[, lapply(.SD, toString), by = ID]
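
To make the difference between the two calls concrete, here is a short sketch on a hypothetical two-group table. toString(x) behaves like paste(x, collapse = ", "), so the only difference is the space after the comma:

```r
library(data.table)

# Hypothetical input for illustration
dt <- data.table(ID = c(1, 1, 2), a = c("a", "b", "d"))

comma       <- dt[, lapply(.SD, paste0, collapse = ","), by = ID]
comma_space <- dt[, lapply(.SD, toString), by = ID]

print(comma$a)        # comma only
print(comma_space$a)  # comma followed by a space
```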

R data table group values into vector by group

We need a list column. In data.table, .() is a concise alias for list().

dat[, .(j = .(j)), by = .(gr, day)]

-output

   gr day     j
1:  9   1   3,8
2:  9   2  9,11
3: 10   1 10,28
4: 10   2     5

That is, it can equivalently be written as

dat[, list(j = list(j)), .(gr, day)]
   gr day     j
1:  9   1   3,8
2:  9   2  9,11
3: 10   1 10,28
4: 10   2     5
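
Although the list column prints like a comma-separated string, each cell holds a real vector. A minimal sketch, assuming a dat reconstructed from the output above (the input data is hypothetical), shows how to inspect it:

```r
library(data.table)

# Hypothetical input, reconstructed from the printed result above
dat <- data.table(
  gr  = c(9, 9, 9, 9, 10, 10, 10),
  day = c(1, 1, 2, 2, 1, 1, 2),
  j   = c(3, 8, 9, 11, 10, 28, 5)
)

res <- dat[, .(j = .(j)), by = .(gr, day)]

# Each element of the list column is a numeric vector
first_group <- res$j[[1]]

# lengths() gives the number of values stored per group
res[, n := lengths(j)]
print(res)
```

The inner .() (or list()) is what prevents data.table from spreading the values of j over several rows.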

concatenate names and values across columns in data.table, row by row

We could paste the corresponding names with Map

d[,
  .(x, y, labs = do.call(paste, c(Map(function(u, v)
    paste0(v, ": ", u), .SD, labs_to_get), sep = ", "))),
  .SDcols = labs_to_get
]

-output

       x     y       labs
   <num> <num>     <char>
1:     1     3 x: 1, y: 3
2:     2     4 x: 2, y: 4
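
The Map call is easier to follow when split into its two steps. This is a sketch under the assumption that d and labs_to_get look as below (both are reconstructed from the output and are not shown in the original answer):

```r
library(data.table)

# Hypothetical input, reconstructed from the printed result above
d <- data.table(x = c(1, 2), y = c(3, 4))
labs_to_get <- c("x", "y")

# Step 1: build one "name: value" character vector per selected column
pieces <- Map(
  function(u, v) paste0(v, ": ", u),
  d[, labs_to_get, with = FALSE],  # column values
  labs_to_get                      # matching column names
)

# Step 2: paste those vectors together element-wise, i.e. row by row
labs <- do.call(paste, c(pieces, sep = ", "))
print(labs)
```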

Another option is write.dcf, which writes each row as name: value lines

d[, labs := do.call(paste,
  c(as.list(setdiff(capture.output(write.dcf(.SD)), "")),
    sep = ", ")), 1:nrow(d)]
> d
       x     y       labs
   <num> <num>     <char>
1:     1     3 x: 1, y: 3
2:     2     4 x: 2, y: 4

Or use apply to loop over the rows

d[, labs := apply(.SD, 1, \(x) paste(names(x), x, sep = ": ",
  collapse = ", ")), .SDcols = labs_to_get]

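Or using tidyverse (note that purrr::invoke() is deprecated in recent purrr releases; base R's do.call() can be substituted for it here)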

library(dplyr)
library(purrr)
library(stringr)
d %>%
  mutate(labs = invoke(str_c, c(across(all_of(labs_to_get),
    ~str_c(cur_column(), ": ", .x)), sep = ", ")))
       x     y       labs
   <num> <num>     <char>
1:     1     3 x: 1, y: 3
2:     2     4 x: 2, y: 4

R data.table join with concatenation of rows for a column

Another solution based on data.table:

dt2[dt1[, paste(Name, collapse="; "), by=Id], Name := i.V1, on="Id"]
dt2

   Id Flavor          Name
1:  1  Sweet Apple; Orange
2:  2  Bland        Banana
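
The one-liner relies on the default column name V1 for the collapsed result. As a sketch, assuming dt1 and dt2 look as below (both are hypothetical reconstructions from the output), the same update join can be written in two steps with an explicitly named column:

```r
library(data.table)

# Hypothetical inputs, reconstructed from the printed result above
dt1 <- data.table(Id = c(1, 1, 2), Name = c("Apple", "Orange", "Banana"))
dt2 <- data.table(Id = c(1, 2), Flavor = c("Sweet", "Bland"))

# Step 1: collapse the names per Id into one string
collapsed <- dt1[, .(Name = paste(Name, collapse = "; ")), by = Id]

# Step 2: update join; the i. prefix refers to columns of `collapsed`
dt2[collapsed, Name := i.Name, on = "Id"]
print(dt2)
```

Because := modifies dt2 by reference, no reassignment is needed.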

concatenate values across columns in data.table, row by row

You can use do.call(), using .SDcols to supply the columns.

x[, key_ := do.call(paste, c(.SD, sep = "_")), .SDcols = names(x)]

.SDcols = names(x) supplies all the columns of x. You can supply any vector of names or column numbers there.
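
As an illustration of passing a subset instead of all columns, here is a small sketch on a hypothetical table x; the column names key_ab and key_az are made up for the example:

```r
library(data.table)

# Hypothetical input for illustration
x <- data.table(a = c("p", "q"), b = c(1, 2), z = c("u", "v"))

# Select columns by name
x[, key_ab := do.call(paste, c(.SD, sep = "_")), .SDcols = c("a", "b")]

# Select columns by position (columns 1 and 3 are a and z)
x[, key_az := do.call(paste, c(.SD, sep = "_")), .SDcols = c(1, 3)]
print(x)
```

do.call passes each column of .SD as a separate argument to paste, so the columns are pasted element-wise, which is what makes the result row by row.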

Concatenate varying number of columns by partial match (R)

You can use the Reduce function to paste the selected columns together, picking the columns with grep inside the .SD syntax. Here is an example using the data.table package:

library(stringi); library(data.table)
myTable2[, paste(stri_trans_totitle(whatToMatch), "final", sep = "_") :=
  lapply(whatToMatch, function(wtm) Reduce(function(x, y) paste(x, y, sep = ""),
    .SD[, grep(wtm, names(myTable2)), with = FALSE]))]

myTable2
# herenow before1 before2 before3 after1 after2 after3 Before_final After_final
# 1: 0.3399679 if and where not here blank ifandwhere nothereblank
# 2: 0.8181909 for in by through blank blank forinby throughblankblank
# 3: 0.2237681 and where mine yours ours andwhere mineyoursours
# 4: 0.6161998 and where ha hey hon andwhere haheyhon
# 5: 0.7606252 fifth eighth and where not beet fiftheighthand wherenotbeet
# 6: 0.5525105 and where not fill are andwherenot filler

A benchmark of do.call versus Reduce (do.call is faster here, with a median of about 738 ms against about 1020 ms):

dim(myTable2)
# [1] 1572864 9

reduce <- function() myTable2[, paste(stri_trans_totitle(whatToMatch[1:2]), "final", sep = "_") := lapply(whatToMatch[1:2], function(wtm) Reduce(function(x,y) paste(x, y, sep = ""), .SD[, grep(wtm, names(myTable2)), with = F]))]
docall <- function() myTable2[, paste(stri_trans_totitle(whatToMatch[1:2]), "final", sep = "_") := lapply(whatToMatch[1:2], function(wtm) do.call(paste, c(sep = "", .SD[, grep(wtm, names(myTable2)), with = F])))]

microbenchmark::microbenchmark(docall(), reduce(), times = 10)
# Unit: milliseconds
#      expr      min        lq      mean    median        uq       max neval
#  docall() 707.7818  722.6037  767.8923  737.6272  852.4909  868.8202    10
#  reduce() 999.4925 1009.5146 1026.6200 1020.4637 1046.7073 1067.7479    10
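
The do.call variant can also be written as a plain loop over the patterns, which some readers may find easier to follow. This is only a sketch on a hypothetical four-column table (tab and the _final column names are made up for the example), not the benchmarked code:

```r
library(data.table)

# Hypothetical input: two "before" columns and two "after" columns
tab <- data.table(
  before1 = c("if", "for"),  before2 = c("and", "in"),
  after1  = c("not", "through"), after2 = c("here", "")
)
whatToMatch <- c("before", "after")

for (wtm in whatToMatch) {
  # Columns whose names contain the pattern
  cols    <- grep(wtm, names(tab), value = TRUE)
  new_col <- paste0(wtm, "_final")
  # Paste the matched columns together element-wise, without a separator
  tab[, (new_col) := do.call(paste0, .SD), .SDcols = cols]
}
print(tab)
```

Using grep(..., value = TRUE) with .SDcols avoids indexing .SD by position, so the number of matching columns can vary per pattern.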

Concatenate strings by group with dplyr for multiple columns

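For these purposes, dplyr has the summarise_all, summarise_at, and summarise_if functions (all superseded by across() since dplyr 1.0.0). Using summarise_all:

df %>%
  group_by(Sample) %>%
  # funs() is deprecated since dplyr 0.8.0; a ~ lambda does the same job
  summarise_all(~ paste(na.omit(.), collapse = ","))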
# A tibble: 3 × 5
Sample group Gene1 Gene2 Gene3
<chr> <chr> <chr> <chr> <chr>
1 A 1,2 a,b
2 B 1 c
3 C 1,2,3 a,b,c d,e
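
With dplyr 1.0.0 or later, the same summary can be expressed with across(). The sketch below assumes a small hypothetical df with NA for missing genes; it is not the data from the question:

```r
library(dplyr)

# Hypothetical input for illustration
df <- tibble(
  Sample = c("A", "A", "B"),
  group  = c(1, 2, 1),
  Gene1  = c("a", NA, "c"),
  Gene2  = c(NA, "b", NA)
)

res <- df %>%
  group_by(Sample) %>%
  # Drop NAs, then collapse what remains into one comma-separated string
  summarise(across(everything(), ~ paste(na.omit(.x), collapse = ",")))
print(res)
```

A group whose values are all NA yields an empty string, because pasting a zero-length vector with collapse returns "".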

How can I apply a function using data.table?

You could try this (mpg is the example data set shipped with ggplot2):

as.data.table(mpg)[, paste(unique(manufacturer), collapse = "_"), by = fl]

Or, if your function is more elaborate you could write it separately:

myfun <- function(x) {
  u_x <- unique(x)
  return(paste(u_x, collapse = "_"))
}

res <- as.data.table(mpg)[, myfun(manufacturer), by = fl]
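
Both calls above leave the result in a column with the default name V1. As a sketch that runs without ggplot2, assuming a small hypothetical table with the same two columns, wrapping the call in .() gives the result column a name:

```r
library(data.table)

# Hypothetical stand-in for mpg with only the two columns used here
cars <- data.table(
  manufacturer = c("audi", "audi", "ford", "honda", "ford"),
  fl           = c("p", "p", "r", "r", "r")
)

myfun <- function(x) paste(unique(x), collapse = "_")

# .(name = ...) names the output column instead of the default V1
res <- cars[, .(manufacturers = myfun(manufacturer)), by = fl]
print(res)
```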

