Collapse text by group in data frame
Simply use aggregate
:
aggregate(df$text, list(df$group), paste, collapse="")
## Group.1 x
## 1 a a1a2a3
## 2 b b1b2
## 3 c c1c2c3
Or with plyr
library(plyr)
ddply(df, .(group), summarize, text=paste(text, collapse=""))
## group text
## 1 a a1a2a3
## 2 b b1b2
## 3 c c1c2c3
ddply
is faster than aggregate
if you have a large dataset.
EDIT :
With the suggestion from @SeDur :
aggregate(text ~ group, data = df, FUN = paste, collapse = "")
## group text
## 1 a a1a2a3
## 2 b b1b2
## 3 c c1c2c3
For the same result with earlier method you have to do :
aggregate(x=list(text=df$text), by=list(group=df$group), paste, collapse="")
EDIT2 : With data.table
:
library("data.table")
dt <- as.data.table(df)
dt[, list(text = paste(text, collapse="")), by = group]
## group text
## 1: a a1a2a3
## 2: b b1b2
## 3: c c1c2c3
Pyspark - collapse all columns in dataframe by group variable
You can use dtypes
to classify, group by string
and date
type columns, and aggregate numeric columns respectively.
df = df.groupBy(*[t[0] for t in df.dtypes if t[1] in ('string', 'date')]) \
.agg(*[F.sum(t[0]).alias(t[0]) for t in df.dtypes if t[1] not in ('string', 'date')])
df.printSchema()
df.show(truncate=False)
How to collapse dataframe by grouping and concatenating on one column
As @thelatemail pointed out, you need to return a single value, rather than a vector of all of the values. Here's an example of how to do that:
dots = lapply(paste0("col", 1:19), as.symbol)
mydf %>%
group_by_(.dots=dots) %>%
summarise(alltasks = paste(col20, collapse=", "))
The change to the summarise
line is how you paste together all the values from col20
into a single string. I used a comma-space separator, but you can of course change that to whatever you prefer. The dots
business is just an easier way to combine all the grouping variables without typing out each one.
If the names of mydf
aren't so regular that you can use a simple paste
function to generate them, other options would be, for example:
dots = lapply(names(mydf)[1:19], as.symbol)
or
dots = lapply(names(mydf)[-grep("col20$", names(mydf))], as.symbol)
Here's a working example of the dplyr
code using the built-in mtcars
data frame:
dots = lapply(c("am","vs","cyl","carb"), as.symbol)
mtcars %>%
group_by_(.dots=dots) %>%
summarise(all_mpg = paste(mpg, collapse=", "))
am vs cyl carb all_mpg
1 0 0 8 2 18.7, 15.5, 15.2, 19.2
2 0 0 8 3 16.4, 17.3, 15.2
3 0 0 8 4 14.3, 10.4, 10.4, 14.7, 13.3
4 0 1 4 1 21.5
5 0 1 4 2 24.4, 22.8
6 0 1 6 1 21.4, 18.1
7 0 1 6 4 19.2, 17.8
8 1 0 4 2 26
9 1 0 6 4 21, 21
10 1 0 6 6 19.7
11 1 0 8 4 15.8
12 1 0 8 8 15
13 1 1 4 1 22.8, 32.4, 33.9, 27.3
14 1 1 4 2 30.4, 30.4, 21.4
Collapse cells across several rows to a single column in a R data frame
Try this base R
approach:
#Code
data <- aggregate(x3~x1+x2,data,function(x) paste0(x,collapse = ','))
Output:
x1 x2 x3
1 1 A 6,7
2 3 C 8,9
3 5 E 10
Concatenating strings / rows using dplyr, group_by & collapse or summarize, but maintain NA values
We may use str_c
from the stringr
package.
library(dplyr)
library(stringr)
df %>%
group_by(ID)%>%
summarize(string = str_c(string, collapse = "; "))
# ID string
# <dbl> <chr>
#1 1 " asfdas ; sdf"
#2 2 NA
#3 3 "NA"
collapse text by 2 ID's in a row
Alternative tidyverse
answer:
library(tidyverse)
dat %>%
group_by(group) %>%
mutate(text=paste0(lag(text),text)) %>% slice(-1)
Using data.table
:
library(data.table)
setDT(dat)
dat[, paste0(shift(text,1), text)[-1], by=group]
# group V1
#1: a a1a2
#2: a a2a3
#3: b b1b2
#4: c c1c2
#5: c c2c3
#6: c c3c4
How to collapse values by group from unpaired elements in a data frame in R
library(dplyr)
my_df %>% group_by(gene, group) %>%
summarise(count = sum(count))
Collapse rows across group and remove duplicates and NAs
We can use group_by
with summarise(across(everything(), ...))
to apply a function to every column. That function in our case is written as a formula (the ~
-notation), in which the column is called .x
.
As you suggested, we can paste
(with collapse = ", "
) the rows together. I remove the NA
values with .x[!is.na(.x)]
.
df_in %>%
group_by(group, subgroup) %>%
summarise(across(everything(), ~ paste(unique(.x[!is.na(.x)]), collapse = ", "))) %>%
ungroup()
The only difference with your expected output is that the shape
column is now an empty string instead of an NA
value:
# A tibble: 1 x 6
group subgroup color shape emotion shade
<dbl> <chr> <chr> <chr> <chr> <chr>
1 1 a red "" happy, sad striped
That can be fixed by for example creating a function that replaces the zero-length list with NA
before pasting.
paste_rows <- function(x) {
unique_x <- unique(x[!is.na(x)])
if (length(unique_x) == 0) {
unique_x <- NA
}
paste(unique_x, collapse = ", ")
}
df_in %>%
group_by(group, subgroup) %>%
summarise(across(everything(), paste_rows)) %>%
ungroup()
Related Topics
Find Indices of Duplicated Rows
Storing Ggplot Objects in a List from Within Loop in R
Creating a Boxplot for Each Column in R
Conditionally Replace Values of Subset of Rows With Column Name in R Using Only Tidy
Convert Multiple Columns of Numeric Data to Dates in R
How to Remove the Negative Values from a Data Frame in R
How to Combine Multiple Variable Data to a Single Variable Data
Column Name Changes in R for Loop for Defined Data Frame
Combine Two Data Frames by Rows (Rbind) When They Have Different Sets of Columns
How to Save a Plot as Image on the Disk
Reorder Bars in Geom_Bar Ggplot2 by Value
Emulate Ggplot2 Default Color Palette
How to Use "≪≪-" (Scoping Assignment) in R
How to Read Multiple .Txt Files into R