Collapse Text by Group in Data Frame

Collapse text by group in data frame

Simply use aggregate :

aggregate(df$text, list(df$group), paste, collapse="")
## Group.1 x
## 1 a a1a2a3
## 2 b b1b2
## 3 c c1c2c3

Or with plyr

library(plyr)
ddply(df, .(group), summarize, text=paste(text, collapse=""))
## group text
## 1 a a1a2a3
## 2 b b1b2
## 3 c c1c2c3

ddply is faster than aggregate if you have a large dataset.

EDIT :
With the suggestion from @SeDur :

aggregate(text ~ group, data = df, FUN = paste, collapse = "")
## group text
## 1 a a1a2a3
## 2 b b1b2
## 3 c c1c2c3

For the same result with earlier method you have to do :

aggregate(x=list(text=df$text), by=list(group=df$group), paste, collapse="")

EDIT2 : With data.table :

library("data.table")
dt <- as.data.table(df)
dt[, list(text = paste(text, collapse="")), by = group]
## group text
## 1: a a1a2a3
## 2: b b1b2
## 3: c c1c2c3

Pyspark - collapse all columns in dataframe by group variable

You can use dtypes to classify, group by string and date type columns, and aggregate numeric columns respectively.

df = df.groupBy(*[t[0] for t in df.dtypes if t[1] in ('string', 'date')]) \
.agg(*[F.sum(t[0]).alias(t[0]) for t in df.dtypes if t[1] not in ('string', 'date')])
df.printSchema()
df.show(truncate=False)

How to collapse dataframe by grouping and concatenating on one column

As @thelatemail pointed out, you need to return a single value, rather than a vector of all of the values. Here's an example of how to do that:

dots = lapply(paste0("col", 1:19), as.symbol)

mydf %>%
group_by_(.dots=dots) %>%
summarise(alltasks = paste(col20, collapse=", "))

The change to the summarise line is how you paste together all the values from col20 into a single string. I used a comma-space separator, but you can of course change that to whatever you prefer. The dots business is just an easier way to combine all the grouping variables without typing out each one.

If the names of mydf aren't so regular that you can use a simple paste function to generate them, other options would be, for example:

dots = lapply(names(mydf)[1:19], as.symbol)

or

dots = lapply(names(mydf)[-grep("col20$", names(mydf))], as.symbol)

Here's a working example of the dplyr code using the built-in mtcars data frame:

dots = lapply(c("am","vs","cyl","carb"), as.symbol)

mtcars %>%
group_by_(.dots=dots) %>%
summarise(all_mpg = paste(mpg, collapse=", "))
      am    vs   cyl  carb                      all_mpg
1 0 0 8 2 18.7, 15.5, 15.2, 19.2
2 0 0 8 3 16.4, 17.3, 15.2
3 0 0 8 4 14.3, 10.4, 10.4, 14.7, 13.3
4 0 1 4 1 21.5
5 0 1 4 2 24.4, 22.8
6 0 1 6 1 21.4, 18.1
7 0 1 6 4 19.2, 17.8
8 1 0 4 2 26
9 1 0 6 4 21, 21
10 1 0 6 6 19.7
11 1 0 8 4 15.8
12 1 0 8 8 15
13 1 1 4 1 22.8, 32.4, 33.9, 27.3
14 1 1 4 2 30.4, 30.4, 21.4

Collapse cells across several rows to a single column in a R data frame

Try this base R approach:

#Code
data <- aggregate(x3~x1+x2,data,function(x) paste0(x,collapse = ','))

Output:

  x1 x2  x3
1 1 A 6,7
2 3 C 8,9
3 5 E 10

Concatenating strings / rows using dplyr, group_by & collapse or summarize, but maintain NA values

We may use str_c from the stringr package.

library(dplyr)
library(stringr)

df %>%
group_by(ID)%>%
summarize(string = str_c(string, collapse = "; "))

# ID string
# <dbl> <chr>
#1 1 " asfdas ; sdf"
#2 2 NA
#3 3 "NA"

collapse text by 2 ID's in a row

Alternative tidyverse answer:

library(tidyverse)
dat %>%
group_by(group) %>%
mutate(text=paste0(lag(text),text)) %>% slice(-1)

Using data.table:

library(data.table)
setDT(dat)
dat[, paste0(shift(text,1), text)[-1], by=group]

# group V1
#1: a a1a2
#2: a a2a3
#3: b b1b2
#4: c c1c2
#5: c c2c3
#6: c c3c4

How to collapse values by group from unpaired elements in a data frame in R

library(dplyr)
my_df %>% group_by(gene, group) %>%
summarise(count = sum(count))

Collapse rows across group and remove duplicates and NAs

We can use group_by with summarise(across(everything(), ...)) to apply a function to every column. That function in our case is written as a formula (the ~-notation), in which the column is called .x.

As you suggested, we can paste (with collapse = ", ") the rows together. I remove the NA values with .x[!is.na(.x)].

df_in %>% 
group_by(group, subgroup) %>%
summarise(across(everything(), ~ paste(unique(.x[!is.na(.x)]), collapse = ", "))) %>%
ungroup()

The only difference with your expected output is that the shape column is now an empty string instead of an NA value:

# A tibble: 1 x 6
group subgroup color shape emotion shade
<dbl> <chr> <chr> <chr> <chr> <chr>
1 1 a red "" happy, sad striped

That can be fixed by for example creating a function that replaces the zero-length list with NA before pasting.

paste_rows <- function(x) {
unique_x <- unique(x[!is.na(x)])
if (length(unique_x) == 0) {
unique_x <- NA
}

paste(unique_x, collapse = ", ")
}

df_in %>%
group_by(group, subgroup) %>%
summarise(across(everything(), paste_rows)) %>%
ungroup()


Related Topics



Leave a reply



Submit