Collapse / concatenate / aggregate multiple columns to a single comma separated string within each group
We can group by 'A' and 'B', and use summarise_at to paste all the non-NA elements together:
library(dplyr)
data %>%
group_by(A, B) %>%
summarise_at(vars(-group_cols()), ~ toString(.[!is.na(.)]))
# A tibble: 2 x 5
# Groups: A [2]
# A B C D E
# <dbl> <dbl> <chr> <chr> <chr>
#1 111 100 1, 2 15, 16, 17 1
#2 222 200 1, 2 18, 19, 20 1
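The `data` object is not shown in the source, so the following is only a sketch with a hypothetical data frame chosen to resemble the output above. It also uses `across()`, the newer idiom available in dplyr 1.0.0 and later, in place of `summarise_at`; inside `summarise()`, `across(everything(), ...)` skips the grouping columns automatically.

```r
library(dplyr)

# Hypothetical sample data; the original 'data' is not shown in the source
data <- data.frame(
  A = c(111, 111, 111, 222, 222, 222),
  B = c(100, 100, 100, 200, 200, 200),
  C = c(1, 2, NA, 1, 2, NA),
  D = c(15, 16, 17, 18, 19, 20),
  E = c(1, NA, NA, 1, NA, NA)
)

# Collapse every non-grouping column, dropping NA values first
res <- data %>%
  group_by(A, B) %>%
  summarise(across(everything(), ~ toString(.x[!is.na(.x)])), .groups = "drop")

print(res)
```

Setting `.groups = "drop"` returns an ungrouped tibble, so no separate `ungroup()` call is needed.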
If we need a custom delimiter, use paste or str_c with the collapse argument:
library(stringr)
data %>%
group_by(A, B) %>%
summarise_at(vars(-group_cols()), ~ str_c(.[!is.na(.)], collapse="_"))
Or, using base R with aggregate:
aggregate(. ~ A + B, data, FUN = function(x)
toString(x[!is.na(x)]), na.action = NULL)
Collapse / concatenate / aggregate a column to a single comma separated string within each group
Here are some options using toString, a function that concatenates a vector of strings, using a comma and a space to separate the components. If you don't want commas, you can use paste() with the collapse argument instead.
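To make the difference concrete, here is a small base R sketch on a hypothetical vector `v` (not part of the original data). It also shows that `sep` alone does not merge the elements of a single vector; only `collapse` does.

```r
# Hypothetical vector, for illustration only
v <- c("a", "b", "c")

comma_sep     <- toString(v)               # one string: "a, b, c"
custom_sep    <- paste(v, collapse = "_")  # one string: "a_b_c"
not_collapsed <- paste(v, sep = "_")       # still three separate strings

print(comma_sep)
print(custom_sep)
print(not_collapsed)
```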
data.table
# alternative using data.table
library(data.table)
as.data.table(data)[, toString(C), by = list(A, B)]
aggregate
This uses no packages:
# alternative using aggregate from the stats package in the core of R
aggregate(C ~., data, toString)
sqldf
And here is an alternative using the SQL function group_concat via the sqldf package:
library(sqldf)
sqldf("select A, B, group_concat(C) C from data group by A, B", method = "raw")
dplyr
A dplyr alternative:
library(dplyr)
data %>%
group_by(A, B) %>%
summarise(test = toString(C)) %>%
ungroup()
plyr
# plyr
library(plyr)
ddply(data, .(A,B), summarize, C = toString(C))
How to merge the rows of a tibble to collapse cells into a single one
We can group by category and collapse word into a single string with str_c:
library(dplyr)
library(stringr)
tab %>%
group_by(category) %>%
summarise(word = str_c(word, collapse ="; "))
-output
# A tibble: 2 x 2
category word
<chr> <chr>
1 CAT1 Lorem; ipsum; dolor; sit; amet
2 CAT2 Consectetur; adipiscing; elit; nam
Concatenate several columns to comma separated strings by group
You can use aggregate with paste for each column and merge the results at the end:
x <- structure(list(SNP = structure(c(1L, 1L, 2L, 3L, 4L, 4L, 5L,
5L), .Label = c("chr1.111642529", "chr1.111801684", "chr1.111925084",
"chr1.11801605", "chr1.151220354"), class = "factor"), hu_mRNA = structure(c(3L,
4L, 2L, 7L, 1L, 8L, 5L, 6L), .Label = c("AK027740", "BC098118",
"NM_002107", "NM_005324", "NM_018913", "NM_018918", "NM_020435",
"NM_032849"), class = "factor"), gene = structure(c(4L, 5L, 1L,
3L, 1L, 2L, 6L, 7L), .Label = c("<NA>", "C13orf33", "GJC2", "H3F3A",
"H3F3B", "PCDHGA10", "PCDHGA5"), class = "factor")), .Names = c("SNP",
"hu_mRNA", "gene"), class = "data.frame", row.names = c(NA, -8L
))
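# collapse (not sep) joins the values within each group into one string
a1 <- aggregate(hu_mRNA ~ SNP, data = x, FUN = paste, collapse = ", ")
a2 <- aggregate(gene ~ SNP, data = x, FUN = paste, collapse = ", ")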
merge(a1,a2)
SNP hu_mRNA gene
1 chr1.111642529 NM_002107, NM_005324 H3F3A, H3F3B
2 chr1.111801684 BC098118 <NA>
3 chr1.111925084 NM_020435 GJC2
4 chr1.11801605 AK027740, NM_032849 <NA>, C13orf33
5 chr1.151220354 NM_018913, NM_018918 PCDHGA10, PCDHGA5
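The two aggregate calls and the merge can probably be folded into one call by using the data-frame interface of aggregate, which applies the function to each selected column in turn. The sketch below uses a small hypothetical data frame, `snp_df`, with character columns rather than the factor data above.

```r
# Hypothetical data, for illustration only
snp_df <- data.frame(
  SNP     = c("s1", "s1", "s2"),
  hu_mRNA = c("NM_1", "NM_2", "BC_1"),
  gene    = c("G1", "G2", "G3"),
  stringsAsFactors = FALSE
)

# Collapse both columns within each SNP in a single aggregate call
collapsed <- aggregate(
  snp_df[c("hu_mRNA", "gene")],
  by  = snp_df["SNP"],
  FUN = function(v) paste(v, collapse = ", ")
)

print(collapsed)
```

This form avoids the formula shortcut `. ~ SNP`, which combines the remaining columns with cbind and would turn factor columns into their integer codes.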
r - Combine the distinct output in 1 row
You can simply use paste for this:
library(dplyr)
df %>%
group_by(Date) %>%
summarize(Module = paste(Module, collapse = ", "))
Note: If your real data has more columns, you might want to use mutate rather than summarize so that they are kept. If you do, make sure to add %>% distinct() afterwards to drop the duplicated rows.
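As a sketch of the mutate variant, assuming a hypothetical data frame in which the extra column (`Site` here) is constant within each Date:

```r
library(dplyr)

# Hypothetical data; 'Site' stands in for an extra column that is constant per Date
df <- data.frame(
  Date   = c("2020-01-01", "2020-01-01", "2020-01-02"),
  Site   = c("X", "X", "Y"),
  Module = c("m1", "m2", "m3")
)

res <- df %>%
  group_by(Date) %>%
  mutate(Module = paste(Module, collapse = ", ")) %>%  # keeps all rows and columns
  ungroup() %>%
  distinct()                                           # drops the now-identical rows

print(res)
```

If an extra column varies within a group, the rows stay distinct and distinct() will not collapse them, so this approach only fits columns that are constant per group.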