R Collapse Multiple Rows into 1 Row - Same Columns

Since you mentioned in your comment that you would like a data.table solution, you could use:

library(data.table)
df <- data.table(record_numb,col_a,col_b,col_c)

df[, lapply(.SD, paste0, collapse=""), by=record_numb]
#    record_numb col_a col_b col_c
# 1:           1   123   234   543
# 2:           2   987   765   543

.SD basically says, "take all the variables in my data.table except those in the by argument." In @Frank's answer, he reduces the set of variables using .SDcols. If you want to convert the pasted strings back to numbers, you can still do this in one line by chaining:

df[, lapply(.SD, paste0, collapse=""), by=record_numb][, lapply(.SD, as.integer)]

The second link in the chain casts every column, including record_numb, to integer.
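For anyone who wants to run this end to end, here is a minimal sketch. The sample vectors are an assumption (the original data is not shown in this answer); they are chosen so that each value sits on its own row and every other cell is an empty string.

```r
library(data.table)

# Hypothetical sample data: one real value per column per record,
# spread over several rows, with "" everywhere else.
dt <- data.table(
  record_numb = c(1, 1, 1, 2, 2, 2),
  col_a = c("123", "", "", "987", "", ""),
  col_b = c("", "234", "", "", "765", ""),
  col_c = c("", "", "543", "", "", "543")
)

# Step 1: paste each column together within each record_numb group.
collapsed <- dt[, lapply(.SD, paste0, collapse = ""), by = record_numb]

# Step 2: chain a second call that converts every column to integer.
as_int <- collapsed[, lapply(.SD, as.integer)]

print(collapsed)
print(as_int)
```

Pasting with an empty separator only works cleanly here because the "missing" cells are empty strings; with real NA values, paste0 would insert the text "NA".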

In R, collapse over multiple logical rows of the same ID into 1 row

We can use any instead of paste: within each 'ID' group, any returns TRUE if at least one element of the column is TRUE.

library(data.table)
setDT(df)[, lapply(.SD, any), ID]

-output

#    ID cardiovasc beta_blockers antibiotics
# 1:  a       TRUE         FALSE        TRUE
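A small runnable sketch of the same idea follows. The table `flags` and its values are made up for illustration; the second group contains an NA to show how any behaves with missing values.

```r
library(data.table)

# Hypothetical logical data; group "b" has a missing value in cardiovasc.
flags <- data.table(
  ID            = c("a", "a", "b", "b"),
  cardiovasc    = c(TRUE, FALSE, FALSE, NA),
  beta_blockers = c(FALSE, FALSE, TRUE, FALSE),
  antibiotics   = c(FALSE, TRUE, FALSE, FALSE)
)

# One row per ID: TRUE if any row in the group is TRUE.
res <- flags[, lapply(.SD, any), by = ID]

# With na.rm = TRUE, a group holding only FALSE and NA collapses to FALSE
# instead of NA.
res_narm <- flags[, lapply(.SD, any, na.rm = TRUE), by = ID]

print(res)
print(res_narm)
```

Whether NA should count as FALSE depends on the data, so the na.rm = TRUE variant is a choice rather than a default.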

How to collapse multiple rows with condition into one row using dplyr in r?

Here's another way to achieve the output.

library(tidyverse)

df %>%
  mutate(value = str_extract(Description, "'\\w+'"),
         Description = trimws(str_remove(Description, value))) %>%
  group_by(Description, Category) %>%
  summarise(ID = toString(ID),
            value = sprintf("'%s'", toString(gsub("'", "", value)))) %>%
  unite(Description, value, Description, sep = ' ')

#   Description          Category ID
#   <chr>                <chr>    <chr>
# 1 'foo' is a cat       B        3
# 2 'foo, bar' is a dog  A        1, 2
# 3 'bar' is a fish      C        5
# 4 'foo' is not a cat   B        4

R collapsing multiple rows into one row by grouping multiple columns

Here is one option with dplyr

library(dplyr)
df %>%
  group_by_at(groupColumns) %>%
  summarise_at(vars(dataColumns),
               ~ if (all(is.na(.))) NA_real_ else na.omit(.))
# A tibble: 3 x 6
# Groups:   TreatName, id [3]
#   TreatName id       Method drug1 drug2 drug3
#   <fct>     <fct>    <fct>  <dbl> <dbl> <dbl>
# 1 Dynamic   patient2 IV        NA    NA    56
# 2 Static    patient1 IV        34     7    NA
# 3 Static    patient2 IV        NA    NA     0
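In dplyr 1.0.0 and later, the scoped verbs group_by_at and summarise_at are superseded by across. Below is a sketch of an equivalent call, not the original answer's code. The data frame is hypothetical, built to resemble the output above, and the lambda is a simplified variant that keeps the first non-missing value per group (NA if there is none).

```r
library(dplyr)

# Hypothetical input: each drug value sits on its own row.
df <- data.frame(
  TreatName = c("Static", "Static", "Dynamic", "Static"),
  id        = c("patient1", "patient1", "patient2", "patient2"),
  Method    = c("IV", "IV", "IV", "IV"),
  drug1     = c(34, NA, NA, NA),
  drug2     = c(NA, 7, NA, NA),
  drug3     = c(NA, NA, 56, 0)
)

groupColumns <- c("TreatName", "id", "Method")
dataColumns  <- c("drug1", "drug2", "drug3")

res <- df %>%
  group_by(across(all_of(groupColumns))) %>%
  # .x[!is.na(.x)][1] is the first non-missing value, or NA when all are missing.
  summarise(across(all_of(dataColumns), ~ .x[!is.na(.x)][1]), .groups = "drop")

print(res)
```

This variant silently drops any additional non-missing values in a group, so it only matches the original when each group has at most one value per column.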

R collapse multiple rows into 1 row using specific function to date & character columns

It is not clear why we need to go through Map and get. After grouping by 'id', take the mean of 'date1' and paste the 'charval' values together:

dt2[, .(date1 = mean(date1), charval = toString(charval)), id]
# id date1 charval
#1: 1 2009-01-02 aa, vv, ss
#2: 2 2009-01-05 a, b, c, d

Note: toString(x) is shorthand for paste(x, collapse=', '). To use a different separator, call paste directly:

dt2[, .(date1 = mean(date1), charval = paste(charval, collapse=";")), id]
# id date1 charval
#1: 1 2009-01-02 aa;vv;ss
#2: 2 2009-01-05 a;b;c;d
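To try this without the OP's data, here is a sketch with a stand-in dt2. The values are an assumption, picked to resemble the output shown above.

```r
library(data.table)

# Hypothetical stand-in for dt2: seven consecutive dates, two ids.
dt2 <- data.table(
  id      = c(1, 1, 1, 2, 2, 2, 2),
  date1   = as.Date("2009-01-01") + 0:6,
  charval = c("aa", "vv", "ss", "a", "b", "c", "d")
)

# toString(x) and paste(x, collapse = ", ") build the same string.
same <- identical(toString(dt2$charval), paste(dt2$charval, collapse = ", "))

# Any other separator needs paste() with an explicit collapse.
res <- dt2[, .(date1 = mean(date1), charval = paste(charval, collapse = ";")), by = id]

print(same)
print(res)
```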

The OP's question, however, is about using Map with get to call mean. That seems to trigger the following branch of mean.default:

if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
    warning("argument is not numeric or logical: returning NA")
    return(NA_real_)
}

and returns NA when it finds that 'date1' is of class Date, even though it is stored as numeric. One option is to specify the envir in get.

Another problem is the use of ifelse. It is better to use if/else, because each call tests a single function name, and ifelse would return a result of the same length as that test:

dt2[, Map(function(x, y) if (x != "paste") get(x, envir = parent.frame())(y, na.rm = TRUE)
                         else paste(y, collapse = ':'),
          setNames(c("mean", "paste"), names(.SD)), .SD), by = id]
# id date1 charval
#1: 1 2009-01-02 aa:vv:ss
#2: 2 2009-01-05 a:b:c:d

get is somewhat tricky, but if we specify the correct environment, it works as expected:

get("mean")(dt2$date1)
#[1] "2009-01-04"
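The difference between the two code paths can be seen in base R alone. This is only a sketch of the mechanism described above, using a made-up two-element Date vector `d`: the generic mean dispatches to the Date method, while calling mean.default directly hits the warning branch quoted earlier.

```r
# Hypothetical Date vector for illustration.
d <- as.Date(c("2009-01-01", "2009-01-03"))

# A Date is stored as a double, but is.numeric() reports FALSE for it.
stored_as <- typeof(d)
looks_numeric <- is.numeric(d)

# The generic found by get() dispatches on class and returns a Date.
res_generic <- get("mean")(d)

# Calling the default method directly skips dispatch and returns NA
# (with the "argument is not numeric or logical" warning, suppressed here).
res_default <- suppressWarnings(mean.default(d))

print(stored_as)
print(looks_numeric)
print(res_generic)
print(res_default)
```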

Or, instead of comparing against the "paste" string, we can check the column class: if it is character, paste the values; otherwise return the mean.

dt2[, Map(function(x, y) if (is.character(y)) get(x)(y, collapse = ":")
                         else get(x, envir = parent.frame())(y, na.rm = TRUE),
          setNames(c("mean", "paste"), names(.SD)), .SD), by = id]
# id date1 charval
#1: 1 2009-01-02 aa:vv:ss
#2: 2 2009-01-05 a:b:c:d

Note that the first approach, which calls mean and toString directly, is simpler and avoids these complications.

How to merge multiple rows into a single row for a single column?

As it is a tibble, we can use tidyverse functions (in newer versions of dplyr, we can use across with summarise):

library(dplyr)
library(stringr)
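df %>%
  group_by(Injury) %>%
  # A lambda is used because passing extra arguments through across() is deprecated in dplyr 1.1.0
  summarise(across(everything(), ~ str_c(.x, collapse = "")))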

Or with summarise_at (superseded in current dplyr, but still available):

df %>%
  group_by(Injury) %>%
  summarise_at(vars(-group_cols()), str_c, collapse = "")
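Here is a self-contained sketch of the across version. The columns `Note` and `Code` and all values are invented for illustration; only `Injury` comes from the question.

```r
library(dplyr)
library(stringr)

# Hypothetical data: two rows for "knee", one for "ankle".
df <- data.frame(
  Injury = c("knee", "knee", "ankle"),
  Note   = c("ab", "cd", "ef"),
  Code   = c("1", "2", "3")
)

res <- df %>%
  group_by(Injury) %>%
  # Concatenate every non-grouping column within each Injury group.
  summarise(across(everything(), ~ str_c(.x, collapse = "")))

print(res)
```

One thing to keep in mind: str_c returns NA if any element in the group is NA, whereas paste would insert the text "NA" into the result.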

