Collapse / concatenate / aggregate multiple columns to a single comma separated string within each group
We can group by 'A' and 'B', then use summarise_at to paste together all the non-NA elements of each remaining column:
library(dplyr)
data %>%
  group_by(A, B) %>%
  summarise_at(vars(-group_cols()), ~ toString(.[!is.na(.)]))
# A tibble: 2 x 5
# Groups: A [2]
# A B C D E
# <dbl> <dbl> <chr> <chr> <chr>
#1 111 100 1, 2 15, 16, 17 1
#2 222 200 1, 2 18, 19, 20 1
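The question's data is not shown here, so the following is only a sketch on a hypothetical data frame chosen to reproduce the output above. It also uses across(), which newer dplyr versions (1.0.0 and later) offer in place of summarise_at; treat it as one possible equivalent rather than the answer's own method.

```r
library(dplyr)

# Hypothetical data, chosen so the result matches the output shown above
data <- data.frame(
  A = c(111, 111, 111, 222, 222, 222),
  B = c(100, 100, 100, 200, 200, 200),
  C = c(1, 2, NA, 1, 2, NA),
  D = c(15, 16, 17, 18, 19, 20),
  E = c(1, NA, NA, 1, NA, NA)
)

# across(everything(), ...) skips the grouping columns A and B
result <- data %>%
  group_by(A, B) %>%
  summarise(across(everything(), ~ toString(.x[!is.na(.x)])), .groups = "drop")

result
```

Here .groups = "drop" returns an ungrouped tibble, so no separate ungroup() call is needed.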
If we need a custom delimiter, use paste or str_c with the collapse argument:
library(stringr)
data %>%
  group_by(A, B) %>%
  summarise_at(vars(-group_cols()), ~ str_c(.[!is.na(.)], collapse = "_"))
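The paste variant is mentioned but not shown. A minimal base R sketch on a hypothetical vector x illustrates it, and also why the NA filter matters:

```r
# Hypothetical vector with a missing value
x <- c("a", NA, "b")

# Drop NAs first, then collapse with a custom delimiter
joined <- paste(x[!is.na(x)], collapse = "_")
joined
# [1] "a_b"

# Without the filter, paste() converts NA to the literal string "NA"
unfiltered <- paste(x, collapse = "_")
unfiltered
# [1] "a_NA_b"
```

Inside the grouped pipeline, the same paste(.[!is.na(.)], collapse = "_") expression can stand in for the str_c call.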
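Or, in base R, with aggregate (na.action = NULL keeps the rows containing NA, which the default na.omit would drop before aggregation):
aggregate(. ~ A + B, data, FUN = function(x)
  toString(x[!is.na(x)]), na.action = NULL)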
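As a sketch of how the NA handling plays out, here is the aggregate approach on a hypothetical data frame. It passes na.action = na.pass, an explicit way to keep rows with missing values; that substitution is an assumption of this example, not part of the answer.

```r
# Hypothetical data with NAs scattered across the value columns
data <- data.frame(
  A = c(111, 111, 111, 222, 222, 222),
  B = c(100, 100, 100, 200, 200, 200),
  C = c(1, 2, NA, 1, 2, NA),
  D = c(15, 16, 17, 18, 19, 20),
  E = c(1, NA, NA, 1, NA, NA)
)

# Keep rows with NA (na.pass), then drop the NAs column by column inside FUN
res <- aggregate(. ~ A + B, data,
                 FUN = function(x) toString(x[!is.na(x)]),
                 na.action = na.pass)
res
```

With the default na.action = na.omit, any row containing an NA in any column would be removed before FUN is called, so values such as 16 and 17 in column D would be lost.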
Collapse / concatenate / aggregate a column to a single comma separated string within each group
Here are some options using toString, a function that concatenates the elements of a vector into a single string, using a comma and a space to separate components. If you don't want commas, you can use paste() with the collapse argument instead.
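A quick illustration of the two functions on a hypothetical character vector:

```r
# Hypothetical input vector
x <- c("a", "b", "c")

comma <- toString(x)                 # "a, b, c"
same  <- paste(x, collapse = ", ")   # same result as toString(x)
semi  <- paste(x, collapse = "; ")   # "a; b; c"
glued <- paste(x, collapse = "")     # "abc"
```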
data.table
# alternative using data.table
library(data.table)
as.data.table(data)[, toString(C), by = list(A, B)]
aggregate
This uses no packages:
# alternative using aggregate from the stats package in the core of R
aggregate(C ~., data, toString)
sqldf
And here is an alternative using the SQL function group_concat via the sqldf package:
library(sqldf)
sqldf("select A, B, group_concat(C) C from data group by A, B", method = "raw")
dplyr
A dplyr alternative:
library(dplyr)
data %>%
  group_by(A, B) %>%
  summarise(test = toString(C)) %>%
  ungroup()
plyr
# plyr
library(plyr)
ddply(data, .(A,B), summarize, C = toString(C))
r - Combine the distinct output in 1 row
You can simply use paste for this:
library(dplyr)
df %>%
  group_by(Date) %>%
  summarize(Module = paste(Module, collapse = ", "))
Note: If your real data has more columns that you want to keep, you might want to use mutate rather than summarize. If you do, make sure to call %>% distinct() afterwards to drop the duplicated rows.
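A minimal sketch of that mutate-then-distinct variant, assuming a hypothetical extra column Site that is constant within each Date:

```r
library(dplyr)

# Hypothetical data: Site is an extra column we want to keep
df <- data.frame(
  Date   = c("2020-01-01", "2020-01-01", "2020-01-02"),
  Site   = c("X", "X", "Y"),
  Module = c("M1", "M2", "M3")
)

result <- df %>%
  group_by(Date) %>%
  mutate(Module = paste(Module, collapse = ", ")) %>%  # every row gets the full string
  ungroup() %>%
  distinct()                                           # collapse the repeated rows

result
```

This only reduces to one row per Date when the extra columns do not vary within a group; otherwise distinct() keeps one row per distinct combination.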
Column values in dataframe to comma separated string
Libraries
library(dplyr)
Data
df <- data.frame(
  V1 = c("fish", "fish", "fish", "cats", "cats"),
  V2 = c("flounder", "mackerel", "sole", "tabby", "black")
)
    V1       V2
1 fish flounder
2 fish mackerel
3 fish     sole
4 cats    tabby
5 cats    black
Code
df %>%
group_by(V1) %>%
summarise(V2 = paste(V2,collapse = ","))
Result
# A tibble: 2 x 2
V1 V2
<chr> <chr>
1 cats tabby,black
2 fish flounder,mackerel,sole
How to merge the rows of a tibble to collapse cells into a single one
We can use str_c from stringr with the collapse argument, after grouping by category:
library(dplyr)
library(stringr)
tab %>%
  group_by(category) %>%
  summarise(word = str_c(word, collapse = "; "))
Output:
# A tibble: 2 x 2
category word
<chr> <chr>
1 CAT1 Lorem; ipsum; dolor; sit; amet
2 CAT2 Consectetur; adipiscing; elit; nam
Collapse a dataframe by pasting together levels within a factor
Use summarise instead of mutate:
data %>%
  group_by(proposal_number) %>%
  summarise(crop_weight = paste0(crop_weight, collapse = ","))
Output:
proposal_number crop_weight
<chr> <chr>
1 Expt 1 Winter Wheat 200g,Winter Barley 200g,Spring Beans 500g
2 Expt 2 Winter Wheat 300g,Spring Beans 100g
Create comma-separated list of unique values in dplyr
A possible solution:
library(dplyr)
library(stringr)  # provides str_c
df %>%
group_by(city) %>%
summarise(item = str_c(item, collapse = ", "))
#> # A tibble: 2 x 2
#> city item
#> <chr> <chr>
#> 1 Berlin A, B
#> 2 Frankfurt A, D, E
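The code above collapses whatever values are present; it does not itself remove duplicates. If the input can contain repeated items per city, wrapping the column in unique() is one way to get the distinct values. The sketch below uses hypothetical data, since the question's data is not shown:

```r
library(dplyr)

# Hypothetical data with repeated items within a city
df <- data.frame(
  city = c("Berlin", "Berlin", "Berlin", "Frankfurt", "Frankfurt", "Frankfurt", "Frankfurt"),
  item = c("A", "A", "B", "A", "D", "E", "D")
)

result <- df %>%
  group_by(city) %>%
  summarise(item = toString(unique(item)))  # unique() keeps first-occurrence order

result
```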