Merge Rows in a Dataframe Where the Rows Are Disjoint and Contain NAs

Merge rows in a dataframe where the rows are disjoint and contain NAs

You can use aggregate. Assuming that you want to merge rows with identical values in column name:

aggregate(x=DF[c("v1","v2","v3","v4")], by=list(name=DF$name), min, na.rm = TRUE)
name v1 v2 v3 v4
1 Yemen 4 2 3 5

This is like the SQL SELECT name, min(v1) GROUP BY name. The choice of min is arbitrary: you could also use max or mean, since with na.rm = TRUE each of them returns the non-NA value when given one NA and one non-NA value.
(An SQL-like coalesce() function would be a better fit if one existed in base R.)

However, you should first check that all non-NA values for a given name are identical. For example, run the aggregate with both min and max and compare the results, or run it with range.
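
A minimal sketch of that check, assuming a small made-up DF (the data and the names lo, hi, and conflicts are illustrative, not from the original question). A group whose min and max differ in some column holds more than one distinct non-NA value there:

```r
# Made-up example data: Yemen has two different non-NA values in v1.
DF <- data.frame(
  name = c("Yemen", "Yemen", "Yemen", "Oman", "Oman"),
  v1   = c(4, NA, 5, 1, NA),
  v2   = c(NA, 2, NA, NA, 7),
  stringsAsFactors = FALSE
)
cols <- c("v1", "v2")

# Aggregate once with min and once with max, ignoring NAs.
lo <- aggregate(x = DF[cols], by = list(name = DF$name), FUN = min, na.rm = TRUE)
hi <- aggregate(x = DF[cols], by = list(name = DF$name), FUN = max, na.rm = TRUE)

# TRUE marks a group/column where min and max disagree.
conflicts <- lo[cols] != hi[cols]

# Names of the groups that cannot be merged without losing a value.
conflicted_names <- lo$name[rowSums(conflicts) > 0]
print(conflicted_names)
```

Note that min and max return Inf and -Inf (with a warning) for a group in which a column is entirely NA, so the sketch assumes every group has at least one non-NA value per column.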

Finally, if you have many more variables than just v1-4, you could use DF[,!(names(DF) %in% c("code","name"))] to define the columns.
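
For instance, assuming a made-up DF with code and name as identifier columns (drop = FALSE is an addition here, so that the selection stays a data frame even if only one value column remains):

```r
# Made-up example data with two identifier columns and two value columns.
DF <- data.frame(
  code = c("YE", "YE"),
  name = c("Yemen", "Yemen"),
  v1   = c(4, NA),
  v2   = c(NA, 2),
  stringsAsFactors = FALSE
)

# Logical mask: TRUE for every column that is not an identifier.
value_cols <- !(names(DF) %in% c("code", "name"))

merged <- aggregate(
  x   = DF[, value_cols, drop = FALSE],
  by  = list(name = DF$name),
  FUN = min, na.rm = TRUE
)
print(merged)
```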

combine rows in data frame containing NA to make complete row

I haven't figured out how to put the coalesce_by_column function inside the dplyr pipeline, but this works:

coalesce_by_column <- function(df) {
  return(coalesce(df[1], df[2]))
}

df %>%
  group_by(A) %>%
  summarise_all(coalesce_by_column)

## A B C D E
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2 3 2 5
## 2 2 4 5 3 4

Edit: include @Jon Harmon's solution for more than 2 members of a group

# Supply lists by splicing them into dots:
coalesce_by_column <- function(df) {
  return(dplyr::coalesce(!!! as.list(df)))
}

df %>%
  group_by(A) %>%
  summarise_all(coalesce_by_column)

#> # A tibble: 2 x 5
#> A B C D E
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 3 2 5
#> 2 2 4 5 3 4

r - merge rows in group while replacing NAs

No need to delete the question; it may be helpful to some users. This summarises each group to the first non-NA occurrence in each column.

library(dplyr)

df_start <- data.frame(
id = c("as", "as", "as", "as", "as", "bs", "bs", "bs", "bs", "bs"),
b = c(NA, NA, NA, NA, "A", NA, NA, 6, NA, NA),
c = c(2, NA, NA, NA, NA, 7, NA, NA, NA, NA),
d = c(NA, 4, NA, NA, NA, NA, 8, NA, NA, NA),
e = c(NA, NA, NA, 3, NA, NA, NA, NA, "B", NA),
f = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 10))

df_start %>%
  group_by(id) %>%
  summarise_all(list(~first(na.omit(.))))

Output:

# A tibble: 2 x 6
id b c d e f
<fct> <fct> <dbl> <dbl> <fct> <dbl>
1 as A 2. 4. 3 5.
2 bs 6 7. 8. B 10.

You will, of course, lose data if a group has more than one non-NA value in a column, since only the first one is kept.
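
One way to spot such groups beforehand is to count the distinct non-NA values per group and column. This is a base R sketch on made-up data (df and n_values are illustrative names, not part of the answer):

```r
# Made-up example data: group "as" has two different values in column b.
df <- data.frame(
  id = c("as", "as", "bs", "bs"),
  b  = c("A", "B", NA, "C"),
  c  = c(2, NA, 7, NA),
  stringsAsFactors = FALSE
)

# Number of distinct non-NA values for each id in each column.
n_values <- aggregate(
  x   = df[c("b", "c")],
  by  = list(id = df$id),
  FUN = function(x) length(unique(x[!is.na(x)]))
)

# Groups where taking only the first non-NA value would drop something.
lossy_ids <- n_values$id[n_values$b > 1 | n_values$c > 1]
print(lossy_ids)
```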

Merge two rows in data.frame

An idea via dplyr,

library(dplyr)

df %>%
  group_by(Date, Origin) %>%
  summarise_all(funs(trimws(paste(., collapse = ''))))

# A tibble: 4 x 5
# Groups: Date [?]
Date Origin Checkin Checkout Destination
<chr> <chr> <chr> <chr> <chr>
1 03-07-17 A 08:00 09:00 B
2 03-07-17 B 17:00 18:00 A
3 04-07-17 A 08:00 09:00 B
4 04-07-17 B 17:00 18:00 A

DATA

dput(df)
structure(list(Date = c(" 03-07-17 ", " 03-07-17 ", " 03-07-17 ",
" 03-07-17 ", " 04-07-17 ", " 04-07-17 ", " 04-07-17 ", " 04-07-17 "
), Checkin = c(" 08:00 ", " ", " 17:00 ", " ",
" 08:00 ", " ", " 17:00 ", " "), Origin = c(" A ",
" A ", " B ", " B ", " A ", " A ", " B ",
" B "), Checkout = c(" ", " 09:00 ", " ",
" 18:00 ", " ", " 09:00 ", " ", " 18:00 "
), Destination = c(" ", " B ", " ",
" A ", " ", " B ", " ",
" A ")), .Names = c("Date", "Checkin", "Origin", "Checkout",
"Destination"), row.names = c(NA, -8L), class = "data.frame")

Combine rows by group with differing NAs in each row

Is this what you want? This uses zoo together with dplyr.

library(zoo)
library(dplyr)

df %>%
  group_by(groupid) %>%
  mutate_all(funs(na.locf(., na.rm = FALSE, fromLast = FALSE))) %>%
  filter(row_number() == n())


# A tibble: 1 x 5
# Groups: groupid [1]
groupid col1 col2 col3 col4
<int> <int> <chr> <int> <int>
1 1 0 n 2 2

EDIT1

Without the filter, this returns the whole data frame.

df %>%
  group_by(groupid) %>%
  mutate_all(funs(na.locf(., na.rm = FALSE, fromLast = FALSE)))

# A tibble: 2 x 5
# Groups: groupid [1]
groupid col1 col2 col3 col4
<int> <int> <chr> <int> <int>
1 1 0 n NA 2
2 1 0 n 2 2

The filter here just keeps the last row of each group. na.locf carries the previous non-NA value forward, which means the last row in each group is the one you want.
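
To see why the last row ends up complete, here is a base R sketch of "last observation carried forward" on a single vector. The helper locf() is hypothetical and written only for illustration; it is not zoo's implementation of na.locf:

```r
# Hypothetical helper: carry the last non-NA value forward, keeping leading NAs.
locf <- function(x) {
  idx <- cumsum(!is.na(x))        # how many non-NA values have been seen so far
  c(NA, x[!is.na(x)])[idx + 1]    # slot 1 is NA, used while nothing has been seen
}

x <- c(NA, 1, NA, 2, NA)
filled <- locf(x)
print(filled)

# The final element is the last non-NA value of the column.
last_value <- filled[length(filled)]
print(last_value)
```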

Also, based on @thelatemail's recommendation, you can do the following, which gives the same answer.

df %>% group_by(groupid) %>% summarise_all(funs(.[!is.na(.)][1]))
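
The same "first non-NA value per group" idea can be sketched in base R with aggregate. The data below is made up for illustration, and first_non_na is a hypothetical helper name:

```r
# Made-up example data: two rows of one group with complementary NAs.
df <- data.frame(
  groupid = c(1L, 1L),
  col1    = c(0L, NA),
  col2    = c("n", NA),
  col3    = c(NA, 2L),
  col4    = c(2L, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-NA element, or NA if there is none.
first_non_na <- function(x) x[!is.na(x)][1]

merged <- aggregate(
  x   = df[-1],                       # every column except groupid
  by  = list(groupid = df$groupid),
  FUN = first_non_na
)
print(merged)
```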

EDIT2

Assuming you have conflicting values and want to show them all:

df <- read.table(text="groupid  col1  col2  col3  col4
1 0 n NA 2
1 1 NA 2 2",
header=TRUE,stringsAsFactors=FALSE)
df
groupid col1 col2 col3 col4
1 1 0 n NA 2
2 1 1(#) <NA> 2 2(#)
df %>%
group_by(groupid) %>%
summarise_all(funs(toString(unique(na.omit(.)))))#unique for duplicated like col4
groupid col1 col2 col3 col4
<int> <chr> <chr> <chr> <chr>
1 1 0, 1 n 2 2
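
What the summarising expression does to a single column can be seen on plain vectors. This is a small sketch with made-up values that mirror col1, col4, and col2 above:

```r
# Conflicting values are kept and joined with ", ".
conflict <- toString(unique(na.omit(c(0, 1))))
print(conflict)

# Duplicated values collapse to a single entry because of unique().
duplicate <- toString(unique(na.omit(c(2, 2))))
print(duplicate)

# NAs are dropped by na.omit() before joining.
with_na <- toString(unique(na.omit(c("n", NA))))
print(with_na)
```

Every summarised column becomes character, as the <chr> types in the output show.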

Merge rows in one data.frame

We could use data.table. We convert the 'data.frame' to a 'data.table' (setDT(data)); then, grouped by 'name', we unlist the columns specified in .SDcols and paste the values together.

library(data.table)
setDT(data)[, unlist(.SD), name, .SDcols=v1:v4][V1!='', paste(V1, collapse=', '), name]

Since the expected output is not shown, it could also be:

setDT(data)[, lapply(.SD, function(x) paste(x[x!=''], collapse='')) , name, .SDcols= v1:v4]

Update

Based on the expected output, we convert the 'factor' columns ('v1:v4') to 'character' class, then use the formula method of aggregate and paste the columns grouped by 'name'.

data[3:6] <- lapply(data[3:6], as.character)
aggregate(.~name, data[-1], FUN=function(x) paste(x[x!=''], collapse=', '))
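
The pasting step can be tried on a small made-up data frame. This sketch uses the x/by form of aggregate rather than the formula method shown above, and the data is an assumption for illustration only:

```r
# Made-up example data: empty strings stand in for missing entries.
data <- data.frame(
  name = c("x", "x", "y"),
  v1   = c("a", "", "c"),
  v2   = c("", "b", ""),
  stringsAsFactors = FALSE
)

# Keep only the non-empty strings of each group and join them.
paste_non_empty <- function(x) paste(x[x != ""], collapse = ", ")

merged <- aggregate(
  x   = data[c("v1", "v2")],
  by  = list(name = data$name),
  FUN = paste_non_empty
)
print(merged)
```

A group with no non-empty entry in a column ends up with an empty string there.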

Collapsing rows where some are all NA, others are disjoint with some NAs

Try

library(dplyr)
DF %>% group_by(ID) %>% summarise_each(funs(sum(., na.rm = TRUE)))

Edit: To account for the case in which a column is all NA for a certain ID, we need a sum_NA() function that returns NA if all values are NA

txt <- "ID    Col1    Col2    Col3    Col4
1 NA NA NA NA
1 5 10 NA NA
1 NA NA 15 20
2 NA NA NA NA
2 NA 30 NA NA
2 NA NA 35 40"
DF <- read.table(text = txt, header = TRUE)

# original code
DF %>%
group_by(ID) %>%
summarise_each(funs(sum(., na.rm = TRUE)))

# `summarise_each()` is deprecated.
# Use `summarise_all()`, `summarise_at()` or `summarise_if()` instead.
# To map `funs` over all variables, use `summarise_all()`
# A tibble: 2 x 5
ID Col1 Col2 Col3 Col4
<int> <int> <int> <int> <int>
1 1 5 10 15 20
2 2 0 30 35 40

sum_NA <- function(x) {if (all(is.na(x))) x[NA_integer_] else sum(x, na.rm = TRUE)}

DF %>%
group_by(ID) %>%
summarise_all(funs(sum_NA))

DF %>%
group_by(ID) %>%
summarise_if(is.numeric, funs(sum_NA))

# A tibble: 2 x 5
ID Col1 Col2 Col3 Col4
<int> <int> <int> <int> <int>
1 1 5 10 15 20
2 2 NA 30 35 40

R- combine rows of a data frame to be unique by 3 columns

After I made sure all column classes are numeric (not factors) by defining the classes of the columns while reading the data in, this worked for me:

CompleteCoxObs <- aggregate(
  x = CompleteCoxObs[c("stop",
                       "Value_EVS current weight kg CAL",
                       "Value_EVS hr heart rate NU EE0A",
                       "Value_EVS temp celsius CAL 113C")],
  by = list(VisitIDCode = CompleteCoxObs$VisitIDCode,
            start = CompleteCoxObs$start),
  max, na.rm = FALSE)

How to merge 2 columns within the same dataframe in R

You can use coalesce

library(dplyr)

df %>%
mutate(Var1.2 = coalesce(Var1, Var2))

#> Year Var1 Var2 Var1.2
#> 1 2014 123 123 123
#> 2 2014 NA 155 155
#> 3 2015 541 NA 541
#> 4 2015 432 432 432
#> 5 2016 NA 124 124

Created on 2019-04-11 by the reprex package (v0.2.1.9000)
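
If dplyr is not available, the same result can be sketched in base R with ifelse(). This is an alternative to the answer's coalesce(), not part of it; the values are copied from the output above:

```r
df <- data.frame(
  Year = c(2014, 2014, 2015, 2015, 2016),
  Var1 = c(123, NA, 541, 432, NA),
  Var2 = c(123, 155, NA, 432, 124)
)

# Take Var1 where it is present, otherwise fall back to Var2.
df$Var1.2 <- ifelse(is.na(df$Var1), df$Var2, df$Var1)
print(df)
```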


