Aggregating by Unique Identifier and Concatenating Related Values into a String

Aggregating by unique identifier and concatenating related values into a string

Let's call your data.frame DF

> aggregate(id ~ brand, data = DF, c)
brand id
1 RadioShack 2308
2 Rag & Bone 4466
3 Ragu 1830, 4518
4 Ralph Lauren 1638, 2719, 2720, 2721, 2722

Another alternative using aggregate is:

result <- aggregate(id ~ brand, data = DF, paste, collapse = ",")

This produces the same result and now id is not a list anymore. Thanks to @Frank comment. To see the class of each column try:

> sapply(result, class)
brand id
"factor" "character"

As mentioned by @DavidArenburg in the comments, another alternative is using the toString function:

aggregate(id ~ brand, data = DF, toString)

Collapse / concatenate / aggregate a column to a single comma separated string within each group

Here are some options using toString, a function that concatenates a vector of strings using comma and space to separate components. If you don't want commas, you can use paste() with the collapse argument instead.

data.table

# alternative using data.table
library(data.table)
as.data.table(data)[, toString(C), by = list(A, B)]

aggregate This uses no packages:

# alternative using aggregate from the stats package in the core of R
aggregate(C ~., data, toString)

sqldf

And here is an alternative using the SQL function group_concat using the sqldf package :

library(sqldf)
sqldf("select A, B, group_concat(C) C from data group by A, B", method = "raw")

dplyr A dplyr alternative:

library(dplyr)
data %>%
group_by(A, B) %>%
summarise(test = toString(C)) %>%
ungroup()

plyr

# plyr
library(plyr)
ddply(data, .(A,B), summarize, C = toString(C))

Group by unique value with data.table

You can use paste0 I think as below, you need to change the date to character so that it doesn't coverted to its numeric counterpart, running below query without converting dates to numeric will give you values like, 12166, 12187. In your query you are also using "c" as to combine the objects, however we should use paste to combine, also in data.table when you use .(id) in by it should give you unique values on by items unless your query have something which is not making things unique for example in this case if you avoid the collapse statement you won't get the unique keys on ID, I hope this is helpful. Thanks:

df_agr <- aggregate(cbind(progress, as.character(dates)) ~ id, data = df, FUN = paste0)

> df_agr
id progress V2
1 3505H62168 no, yes 2003-04-24, 2003-05-15
2 3505H65277 yes 2003-02-11
3 3505H67158 yes, yes 2003-01-14, 2003-02-18
4 3505H67862 yes 2003-03-06
5 3505H6856 yes, yes, yes, yes 2003-07-10, 2003-08-21, 2003-09-04, 2003-10-16
>

Using data.table:

setDT(df)[,.(paste0(progress,collapse=","), paste0(as.character(dates),collapse=",")), by = .(id)]


id V1 V2
1: 3505H6856 yes,yes,yes,yes 2003-07-10,2003-08-21,2003-09-04,2003-10-16
2: 3505H67158 yes,yes 2003-01-14,2003-02-18
3: 3505H67862 yes 2003-03-06
4: 3505H62168 no,yes 2003-04-24,2003-05-15
5: 3505H65277 yes 2003-02-11

OR just pointed out by David Arenberg, much easier way in data.table is, Thanks for valuable comments:

setDT(df)[, lapply(.SD, toString), by = id]

How to collapse dataframe rows if entry is the same?

You can use aggregate with paste setting collapse = "; "

aggregate(b ~ a, dataframe, paste, collapse = "; ")
# a b
#1 cat siamese; burmese
#2 dog chow; husky

Is there a concise way to group elements of a column that share a unique element in a different column?

A possible solution, based on tidyverse:

library(tidyverse)

df <- data.frame(
stringsAsFactors = FALSE,
ID = c(1L, 2L, 3L, 2L, 4L, 1L),
Class = c("A", "B", "A", "A", "B", "C")
)

df %>%
group_by(Class) %>%
summarise(ID = str_c(ID, collapse = ","))

#> # A tibble: 3 × 2
#> Class ID
#> <chr> <chr>
#> 1 A 1,3,2
#> 2 B 2,4
#> 3 C 1


Related Topics



Leave a reply



Submit