R Dataframe: aggregating strings within column, across rows, by group
Here are two ways:
base R
aggregate(
text ~ page + passage + person,
data=df,
FUN=paste, collapse=' '
)
dplyr
library(dplyr)
df %>%
group_by(page, passage, person) %>%
summarize(text = paste(text, collapse = ' '))
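To try the base R version without the original data, here is a minimal sketch on a made-up data frame. Only the column names come from the answer above; the values in df are hypothetical.

```r
# Hypothetical sample data; only the column names match the answer above
df <- data.frame(
  page    = c(1, 1, 1, 2),
  passage = c(1, 1, 2, 1),
  person  = c("A", "A", "B", "A"),
  text    = c("hello", "world", "foo", "bar"),
  stringsAsFactors = FALSE
)

# Arguments given after FUN (here collapse) are passed on to paste()
res <- aggregate(text ~ page + passage + person, data = df,
                 FUN = paste, collapse = " ")
print(res)
```

The two rows sharing page 1, passage 1, person "A" are merged into one string; the other rows pass through unchanged.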
How to aggregate character strings by group in R?
An option is to group by 'DocID', fill the columns 'ElementA' and 'ElementB' with adjacent non-NA elements, and get the distinct rows:
library(dplyr)
library(tidyr)
df1 %>%
group_by(DocID) %>%
fill(ElementA, ElementB, .direction = "downup") %>%
ungroup %>%
distinct
-output
# A tibble: 3 x 3
# DocID ElementA ElementB
# <int> <chr> <chr>
#1 1 A1 B1
#2 2 A2 B2
#3 3 A3 B3
data
df1 <- structure(list(DocID = c(1L, 1L, 2L, 2L, 3L, 3L), ElementA = c("A1",
NA, "A2", NA, "A3", NA), ElementB = c(NA, "B1", NA, "B2", NA,
"B3")), class = "data.frame", row.names = c(NA, -6L))
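If tidyr is not available, a similar result can be sketched in base R by taking the first non-NA value per group. This is a swapped-in technique, not the fill() approach above, and first_non_na is a helper name introduced here for illustration. It matches the fill() result only when each group has at most one distinct non-NA value per column, as in this data.

```r
# Same data as in the answer above
df1 <- data.frame(
  DocID    = c(1L, 1L, 2L, 2L, 3L, 3L),
  ElementA = c("A1", NA, "A2", NA, "A3", NA),
  ElementB = c(NA, "B1", NA, "B2", NA, "B3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-missing value of a vector (NA if none)
first_non_na <- function(x) x[!is.na(x)][1]

# The data.frame method of aggregate() passes NA values in the
# aggregated columns on to FUN, so the helper can skip them
res <- aggregate(df1[c("ElementA", "ElementB")],
                 by = df1["DocID"],
                 FUN = first_non_na)
print(res)
```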
How to list row values in a column based on grouping value in R?
I would suggest the following base R approach:
#Data
df <- structure(list(GeneID = c("am1001", "am1001", "am1002", "am1002",
"am1002"), GO = c(190909L, 600510L, 500050L, 432323L, 100209L
)), class = "data.frame", row.names = c(NA, -5L))
The code:
#Aggregation
aggregate(GO~GeneID,data=df,FUN = function(x) paste0(x,collapse = '; '))
The output:
GeneID GO
1 am1001 190909; 600510
2 am1002 500050; 432323; 100209
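If the individual values are needed later rather than one pasted string, a possible alternative is to keep them as a list with split(). The sketch below reuses the data from this answer; the names go_by_gene and go_strings are introduced here for illustration.

```r
df <- data.frame(
  GeneID = c("am1001", "am1001", "am1002", "am1002", "am1002"),
  GO     = c(190909L, 600510L, 500050L, 432323L, 100209L),
  stringsAsFactors = FALSE
)

# Named list with one integer vector per GeneID
go_by_gene <- split(df$GO, df$GeneID)

# Collapsing each element reproduces the string form when needed
go_strings <- vapply(go_by_gene, paste0, character(1), collapse = "; ")
print(go_strings)
```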
Concatenate strings by group with dplyr
You could simply do
data %>%
group_by(foo) %>%
mutate(bars_by_foo = paste0(bar, collapse = ""))
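No helper functions are needed. Note that mutate() keeps every row and repeats the collapsed string within each group; use summarise() instead to get one row per group.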
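The same "one value repeated on every row of the group" behaviour can be sketched in base R with ave(). The data frame below is hypothetical; only the column names foo, bar and bars_by_foo come from the answer.

```r
# Hypothetical sample data using the column names from the answer
data <- data.frame(
  foo = c("x", "x", "y"),
  bar = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# ave() returns a vector as long as its input, so the collapsed
# string is repeated on every row of its group
data$bars_by_foo <- ave(data$bar, data$foo,
                        FUN = function(x) paste0(x, collapse = ""))

# Dropping duplicates gives one row per group
collapsed <- unique(data[c("foo", "bars_by_foo")])
print(collapsed)
```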
Aggregate Data Frame Containing Strings and Numbers
With dplyr, we can apply several aggregations to blocks of columns by group. The 'IDENTIFICATION' values differ within a group; based on the expected output, we can select the first element of that column for each group:
library(dplyr) # >= 1.0.0
df1 %>%
group_by(COUNTY, COMMON_FIELD) %>%
# use across() to summarise more than one column at once
# where(is.numeric) selects the numeric columns, which are then summed
summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)),
IDENTIFICATION = first(IDENTIFICATION))
The OP's original dataset code can be changed to
GAcatalistDupes %>%
group_by(FIPS, CAT_JOIN) %>%
# summarise the numeric columns
summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)),
# get the first value for the specified columns
# (FIPS is a grouping column, so it is kept automatically and cannot be selected in across())
across(c(geography, CONG, SS, SH, Field23), first))
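For comparison, the same idea (sum the numeric columns, keep the first value of a string column) can be sketched in base R with two aggregate() calls joined by merge(). This is not the dplyr method above. The data is made up, and the VALUE column is a hypothetical stand-in for the numeric columns.

```r
# Hypothetical data; VALUE stands in for the numeric columns
df1 <- data.frame(
  COUNTY         = c("A", "A", "B"),
  COMMON_FIELD   = c("k1", "k1", "k2"),
  IDENTIFICATION = c("id1", "id2", "id3"),
  VALUE          = c(1, NA, 5),
  stringsAsFactors = FALSE
)

keys <- df1[c("COUNTY", "COMMON_FIELD")]

# Sum the numeric column, ignoring missing values
sums <- aggregate(df1["VALUE"], by = keys, FUN = sum, na.rm = TRUE)

# Keep the first IDENTIFICATION seen in each group
firsts <- aggregate(df1["IDENTIFICATION"], by = keys,
                    FUN = function(x) x[1])

# Join the two summaries on the grouping columns
res <- merge(sums, firsts, by = c("COUNTY", "COMMON_FIELD"))
print(res)
```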
Collapse text by group in data frame
Simply use aggregate:
aggregate(df$text, list(df$group), paste, collapse="")
## Group.1 x
## 1 a a1a2a3
## 2 b b1b2
## 3 c c1c2c3
Or with plyr
library(plyr)
ddply(df, .(group), summarize, text=paste(text, collapse=""))
## group text
## 1 a a1a2a3
## 2 b b1b2
## 3 c c1c2c3
ddply is faster than aggregate if you have a large dataset.
EDIT:
With the suggestion from @SeDur:
aggregate(text ~ group, data = df, FUN = paste, collapse = "")
## group text
## 1 a a1a2a3
## 2 b b1b2
## 3 c c1c2c3
To get the same column names with the earlier method, name the list elements:
aggregate(x=list(text=df$text), by=list(group=df$group), paste, collapse="")
EDIT2: With data.table:
library("data.table")
dt <- as.data.table(df)
dt[, list(text = paste(text, collapse="")), by = group]
## group text
## 1: a a1a2a3
## 2: b b1b2
## 3: c c1c2c3
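The answer does not show its input. The df below is an assumption, reconstructed from the printed results above so that the base R calls can be run end to end.

```r
# Assumed input, inferred from the printed output in this answer
df <- data.frame(
  group = c("a", "a", "a", "b", "b", "c", "c", "c"),
  text  = c("a1", "a2", "a3", "b1", "b2", "c1", "c2", "c3"),
  stringsAsFactors = FALSE
)

# Formula interface: the column names are carried over to the result
res <- aggregate(text ~ group, data = df, FUN = paste, collapse = "")
print(res)
```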
Collapse / concatenate / aggregate a column to a single comma separated string within each group
Here are some options using toString, a function that concatenates a vector of strings using comma and space to separate components. If you don't want commas, you can use paste() with the collapse argument instead.
data.table
# alternative using data.table
library(data.table)
as.data.table(data)[, toString(C), by = list(A, B)]
aggregate
This uses no packages:
# alternative using aggregate from the stats package in the core of R
aggregate(C ~., data, toString)
sqldf
And here is an alternative using the SQL function group_concat via the sqldf package:
library(sqldf)
sqldf("select A, B, group_concat(C) C from data group by A, B", method = "raw")
dplyr
A dplyr alternative:
library(dplyr)
data %>%
group_by(A, B) %>%
summarise(test = toString(C)) %>%
ungroup()
plyr
# plyr
library(plyr)
ddply(data, .(A,B), summarize, C = toString(C))
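As a small check of the toString/paste relationship described above, the sketch below runs the package-free aggregate option on made-up data. The values in data are hypothetical; only the column names A, B and C come from the answer.

```r
x <- c("a", "b", "c")

# toString(x) gives the same string as paste() with collapse = ", "
same <- identical(toString(x), paste(x, collapse = ", "))
print(same)

# Hypothetical sample data using the column names from the answer
data <- data.frame(
  A = c(1, 1, 2),
  B = c("u", "u", "v"),
  C = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Group by every column except C and collapse C within each group
res <- aggregate(C ~ ., data, toString)
print(res)
```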
Search and combine text from the same columns related to a specific variable in R
You can try this
> aggregate(. ~ ID, unique(D), c)
ID VAR
1 1 A, B
2 2 C, D
3 3 E
4 4 F
5 5 G
or
> aggregate(. ~ ID, unique(D), toString)
ID VAR
1 1 A, B
2 2 C, D
3 3 E
4 4 F
5 5 G
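The two calls print the same, but they may store different things. The sketch below uses a made-up D (the original input is not shown, so its values are an assumption) to compare the column types: with c and groups of unequal size the column should come back as a list, while toString gives plain strings.

```r
# Hypothetical input with one duplicated row; the real D is not shown
D <- data.frame(
  ID  = c(1, 1, 1, 2, 2, 3, 4, 5),
  VAR = c("A", "A", "B", "C", "D", "E", "F", "G"),
  stringsAsFactors = FALSE
)

# unique() drops the duplicated (1, "A") row before grouping
res_c   <- aggregate(. ~ ID, unique(D), c)
res_str <- aggregate(. ~ ID, unique(D), toString)

# List column versus character column
print(class(res_c$VAR))
print(class(res_str$VAR))
```

The toString version is usually easier to write to a CSV file, since each cell holds a single string.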