Aggregating by unique identifier and concatenating related values into a string
Let's call your data.frame DF
> aggregate(id ~ brand, data = DF, c)
brand id
1 RadioShack 2308
2 Rag & Bone 4466
3 Ragu 1830, 4518
4 Ralph Lauren 1638, 2719, 2720, 2721, 2722
Another alternative using aggregate
is:
result <- aggregate(id ~ brand, data = DF, paste, collapse = ",")
This produces the same result and now id
is not a list
anymore. Thanks to @Frank comment. To see the class
of each column try:
> sapply(result, class)
brand id
"factor" "character"
As mentioned by @DavidArenburg in the comments, another alternative is using the toString
function:
aggregate(id ~ brand, data = DF, toString)
Collapse / concatenate / aggregate a column to a single comma separated string within each group
Here are some options using toString
, a function that concatenates a vector of strings using comma and space to separate components. If you don't want commas, you can use paste()
with the collapse
argument instead.
data.table
# alternative using data.table
library(data.table)
as.data.table(data)[, toString(C), by = list(A, B)]
aggregate This uses no packages:
# alternative using aggregate from the stats package in the core of R
aggregate(C ~., data, toString)
sqldf
And here is an alternative using the SQL function group_concat
using the sqldf package :
library(sqldf)
sqldf("select A, B, group_concat(C) C from data group by A, B", method = "raw")
dplyr A dplyr
alternative:
library(dplyr)
data %>%
group_by(A, B) %>%
summarise(test = toString(C)) %>%
ungroup()
plyr
# plyr
library(plyr)
ddply(data, .(A,B), summarize, C = toString(C))
Group by unique value with data.table
You can use paste0
I think as below, you need to change the date to character so that it doesn't coverted to its numeric counterpart, running below query without converting dates to numeric will give you values like, 12166, 12187. In your query you are also using "c" as to combine the objects, however we should use paste to combine, also in data.table when you use .(id) in by it should give you unique values on by items unless your query have something which is not making things unique for example in this case if you avoid the collapse statement you won't get the unique keys on ID, I hope this is helpful. Thanks:
df_agr <- aggregate(cbind(progress, as.character(dates)) ~ id, data = df, FUN = paste0)
> df_agr
id progress V2
1 3505H62168 no, yes 2003-04-24, 2003-05-15
2 3505H65277 yes 2003-02-11
3 3505H67158 yes, yes 2003-01-14, 2003-02-18
4 3505H67862 yes 2003-03-06
5 3505H6856 yes, yes, yes, yes 2003-07-10, 2003-08-21, 2003-09-04, 2003-10-16
>
Using data.table:
setDT(df)[,.(paste0(progress,collapse=","), paste0(as.character(dates),collapse=",")), by = .(id)]
id V1 V2
1: 3505H6856 yes,yes,yes,yes 2003-07-10,2003-08-21,2003-09-04,2003-10-16
2: 3505H67158 yes,yes 2003-01-14,2003-02-18
3: 3505H67862 yes 2003-03-06
4: 3505H62168 no,yes 2003-04-24,2003-05-15
5: 3505H65277 yes 2003-02-11
OR just pointed out by David Arenberg, much easier way in data.table is, Thanks for valuable comments:
setDT(df)[, lapply(.SD, toString), by = id]
How to collapse dataframe rows if entry is the same?
You can use aggregate
with paste
setting collapse = "; "
aggregate(b ~ a, dataframe, paste, collapse = "; ")
# a b
#1 cat siamese; burmese
#2 dog chow; husky
Is there a concise way to group elements of a column that share a unique element in a different column?
A possible solution, based on tidyverse
:
library(tidyverse)
df <- data.frame(
stringsAsFactors = FALSE,
ID = c(1L, 2L, 3L, 2L, 4L, 1L),
Class = c("A", "B", "A", "A", "B", "C")
)
df %>%
group_by(Class) %>%
summarise(ID = str_c(ID, collapse = ","))
#> # A tibble: 3 × 2
#> Class ID
#> <chr> <chr>
#> 1 A 1,3,2
#> 2 B 2,4
#> 3 C 1
Related Topics
Elegant Way to Check For Missing Packages and Install Them
Removing Duplicate Combinations (Irrespective of Order)
Position Geom_Text on Dodged Barplot
Replacing Character Values With Na in a Data Frame
How to Find Common Elements from Multiple Vectors
How to Delete a Row by Reference in Data.Table
Force R Not to Use Exponential Notation (E.G. E+10)
Add Column Values Based on Other Columns in Data Frame Using for and If
How to Subset Matrix to One Column, Maintain Matrix Data Type, Maintain Row/Column Names
Stratified Random Sampling from Data Frame
Create Stacked Barplot Where Each Stack Is Scaled to Sum to 100%
Convert a List to a Data Frame
Error: C Stack Usage Is Too Close to the Limit
Fitting a Linear Model With Multiple Lhs
Categorize Numeric Variable into Group/ Bins/ Breaks
R: Rjava Package Install Failing