Collapsing rows where some are all NA, others are disjoint with some NAs

Try

library(dplyr)
DF %>% group_by(ID) %>% summarise_each(funs(sum(., na.rm = TRUE)))

Edit: sum(., na.rm = TRUE) returns 0 when every value of a column is NA for a given ID (see Col1 for ID 2 in the first output below). To return NA in that case instead, we need a sum_NA() function that returns NA when all values are NA:

txt <- "ID    Col1    Col2    Col3    Col4
1 NA NA NA NA
1 5 10 NA NA
1 NA NA 15 20
2 NA NA NA NA
2 NA 30 NA NA
2 NA NA 35 40"
DF <- read.table(text = txt, header = TRUE)

# original code
DF %>%
group_by(ID) %>%
summarise_each(funs(sum(., na.rm = TRUE)))

# `summarise_each()` is deprecated.
# Use `summarise_all()`, `summarise_at()` or `summarise_if()` instead.
# To map `funs` over all variables, use `summarise_all()`
# A tibble: 2 x 5
ID Col1 Col2 Col3 Col4
<int> <int> <int> <int> <int>
1 1 5 10 15 20
2 2 0 30 35 40

sum_NA <- function(x) {if (all(is.na(x))) x[NA_integer_] else sum(x, na.rm = TRUE)}

# all non-grouping columns (funs() is deprecated, so pass the function directly)
DF %>%
  group_by(ID) %>%
  summarise_all(sum_NA)

# numeric columns only
DF %>%
  group_by(ID) %>%
  summarise_if(is.numeric, sum_NA)

# A tibble: 2 x 5
ID Col1 Col2 Col3 Col4
<int> <int> <int> <int> <int>
1 1 5 10 15 20
2 2 NA 30 35 40
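
To see why the helper is needed, here is a minimal base R sketch on plain vectors (the example vectors are made up for illustration). Indexing with x[NA_integer_] returns a single NA of the same type as x, so an integer column stays integer:

```r
sum_NA <- function(x) {
  if (all(is.na(x))) x[NA_integer_] else sum(x, na.rm = TRUE)
}

all_missing  <- c(NA_integer_, NA_integer_)
some_missing <- c(NA, 5L, NA)

sum(all_missing, na.rm = TRUE)  # 0: the NAs are dropped and an empty sum is 0
sum_NA(all_missing)             # NA (still integer)
sum_NA(some_missing)            # 5
```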

Create all possible combinations of non-NA values for each group ID

Group by 'ID', fill the NAs in the other columns in both directions, ungroup to remove the group attribute, and keep the distinct rows:

library(dplyr)
library(tidyr)
DF %>%
group_by(ID) %>%
fill(everything(), .direction = 'updown') %>%
ungroup %>%
distinct(.keep_all = TRUE)

Or, alternatively:

DF %>% 
group_by(ID) %>%
mutate(across(everything(), ~ replace(., is.na(.),
rep(.[!is.na(.)], length.out = sum(is.na(.))))))

Or, based on the comments:

DF %>%
  group_by(ID) %>%
  mutate(across(where(~ any(is.na(.))), ~ {
    i1 <- is.na(.)   # TRUE where the value is missing
    i2 <- !i1        # TRUE where the value is present
    if (i1[1]) rep(.[i2], each = n() / sum(i2)) else
      rep(.[i2], length.out = n())
  })) %>%
  ungroup %>%
  distinct(.keep_all = TRUE)

Output:

# A tibble: 6 x 5
ID Col1 Col2 Col3 Col4
<int> <int> <int> <int> <int>
1 1 6 10 15 20
2 1 5 10 15 20
3 2 17 25 21 34
4 2 13 25 21 34
5 2 17 25 35 40
6 2 13 25 35 40

Collapse Elements in R with NA

If we don't mind losing the order, then maybe try this:

apply(df1, 2, sort, na.last = TRUE)

To keep the order:

sapply(1:ncol(df1),
function(i){
c(
df1[, i][!is.na(df1[, i])],
df1[, i][ is.na(df1[, i])]
)
})
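
A small illustration of the difference between the two, using a made-up df1 (the data is an assumption, not taken from the question). Iterating over the columns directly with sapply() is equivalent to indexing by column number:

```r
df1 <- data.frame(a = c(NA, 3, 1), b = c(2, NA, NA))

# sort() moves NAs to the end but also reorders the values
sorted <- apply(df1, 2, sort, na.last = TRUE)

# keep the original order of the non-NA values, then append the NAs
shifted <- sapply(df1, function(col) c(col[!is.na(col)], col[is.na(col)]))

sorted[, "a"]   # 1 3 NA
shifted[, "a"]  # 3 1 NA
```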

Merge rows in a dataframe where the rows are disjoint and contain NAs

You can use aggregate. Assuming that you want to merge rows with identical values in column name:

aggregate(x=DF[c("v1","v2","v3","v4")], by=list(name=DF$name), min, na.rm = TRUE)
name v1 v2 v3 v4
1 Yemen 4 2 3 5

This is like the SQL SELECT name, min(v1) GROUP BY name. The choice of min is arbitrary: max or mean would work equally well, because with na.rm = TRUE each of them returns the non-NA value when given one NA and one non-NA value.
(An SQL-like coalesce() function would be a better fit, if base R had one.)

However, you should first check that all non-NA values for a given name are identical. For example, run aggregate with both min and max and compare the results, or run it with range.

Finally, if you have many more variables than just v1-4, you could use DF[,!(names(DF) %in% c("code","name"))] to define the columns.
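
The min/max comparison could look roughly like this. The DF below is made up for illustration (two partial rows for one name, plus a hypothetical code column), so treat it as a sketch rather than the asker's data:

```r
# made-up data: two partial rows for the same name
DF <- data.frame(
  code = c("YE", "YEM"),
  name = c("Yemen", "Yemen"),
  v1 = c(4, NA), v2 = c(NA, 2), v3 = c(3, NA), v4 = c(NA, 5)
)

# every column except the identifiers
value_cols <- !(names(DF) %in% c("code", "name"))

lowest  <- aggregate(DF[value_cols], by = list(name = DF$name), FUN = min, na.rm = TRUE)
highest <- aggregate(DF[value_cols], by = list(name = DF$name), FUN = max, na.rm = TRUE)

# TRUE means no group holds conflicting non-NA values
identical(lowest, highest)
```

Note that if a column is entirely NA within a group, min and max return Inf and -Inf (with a warning) instead of NA, so such columns will show up as a mismatch.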

r - merge rows in group while replacing NAs

No need to delete the question; it may be helpful to some users. This summarises each group to the first non-NA occurrence in each column.

library(dplyr)

df_start <- data.frame(
id = c("as", "as", "as", "as", "as", "bs", "bs", "bs", "bs", "bs"),
b = c(NA, NA, NA, NA, "A", NA, NA, 6, NA, NA),
c = c(2, NA, NA, NA, NA, 7, NA, NA, NA, NA),
d = c(NA, 4, NA, NA, NA, NA, 8, NA, NA, NA),
e = c(NA, NA, NA, 3, NA, NA, NA, NA, "B", NA),
f = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 10))

df_start %>%
group_by(id) %>%
summarise_all(list(~first(na.omit(.))))

Output:

# A tibble: 2 x 6
id b c d e f
<fct> <fct> <dbl> <dbl> <fct> <dbl>
1 as A 2. 4. 3 5.
2 bs 6 7. 8. B 10.

You will, of course, lose data if a column has more than one non-NA value within a group.
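
One way to check for that beforehand is to count the distinct non-NA values per group. This is a base R sketch on a made-up data frame (df_check and its values are assumptions for illustration):

```r
# made-up data: group "as" holds two different non-NA values in column b
df_check <- data.frame(id = c("as", "as", "bs"), b = c("A", "C", NA))

# number of distinct non-NA values per group
n_values <- tapply(df_check$b, df_check$id, function(x) length(unique(na.omit(x))))

n_values                       # as: 2, bs: 0
names(n_values)[n_values > 1]  # "as": taking the first value would drop "C"
```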

Combine rows by group with differing NAs in each row

Is this what you want? This uses zoo together with dplyr.

library(dplyr)
library(zoo)

df %>%
  group_by(groupid) %>%
  mutate_all(funs(na.locf(., na.rm = FALSE, fromLast = FALSE))) %>%
  filter(row_number() == n())


# A tibble: 1 x 5
# Groups: groupid [1]
groupid col1 col2 col3 col4
<int> <int> <chr> <int> <int>
1 1 0 n 2 2

EDIT1

Without the filter, you get the whole data frame back:

df %>%
  group_by(groupid) %>%
  mutate_all(funs(na.locf(., na.rm = FALSE, fromLast = FALSE)))

# A tibble: 2 x 5
# Groups: groupid [1]
groupid col1 col2 col3 col4
<int> <int> <chr> <int> <int>
1 1 0 n NA 2
2 1 0 n 2 2

The filter simply keeps the last row of each group. na.locf carries the previous non-NA value forward, which means the last row in each group is the one you want.

Also, based on @thelatemail's recommendation, you can do the following, which gives the same answer:

df %>% group_by(groupid) %>% summarise_all(funs(.[!is.na(.)][1]))
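
For intuition, here is a base R sketch of what na.locf(x, na.rm = FALSE) does to a single column. locf() is a hypothetical helper written for this illustration, not zoo's implementation. It also shows that taking the last row after filling yields the last non-NA value, while .[!is.na(.)][1] yields the first, so the two approaches only agree when a column has at most one distinct non-NA value per group:

```r
# hypothetical helper: carry the last non-NA value forward, keep leading NAs
locf <- function(x) {
  idx <- cummax(seq_along(x) * (!is.na(x)))  # position of the latest non-NA so far
  idx[idx == 0] <- NA                        # nothing observed yet
  x[idx]
}

x <- c(NA, 2, NA, 5, NA)
filled <- locf(x)

filled                  # NA 2 2 5 5
filled[length(filled)]  # 5, the last non-NA value
x[!is.na(x)][1]         # 2, the first non-NA value
```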

EDIT2

Assuming you have conflicting values and want to show them all:

df <- read.table(text="groupid  col1  col2  col3  col4
1 0 n NA 2
1 1 NA 2 2",
header=TRUE,stringsAsFactors=FALSE)
df
  groupid col1 col2 col3 col4
1       1    0    n   NA    2
2       1    1 <NA>    2    2
df %>%
  group_by(groupid) %>%
  summarise_all(funs(toString(unique(na.omit(.))))) # unique() drops duplicates, as in col4
groupid col1 col2 col3 col4
<int> <chr> <chr> <chr> <chr>
1 1 0, 1 n 2 2
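
The per-column step can be tried on plain vectors. collapse_values() is just a name given here to the expression used above, and the inputs mirror the columns of the example. Note that a column with only NAs collapses to an empty string rather than NA:

```r
# same expression as in the pipeline, wrapped in a named helper for illustration
collapse_values <- function(x) toString(unique(na.omit(x)))

collapse_values(c(0, 1))     # "0, 1" (conflicting values are both kept)
collapse_values(c("n", NA))  # "n"
collapse_values(c(2, 2))     # "2"    (duplicates are merged)
collapse_values(c(NA, NA))   # ""     (nothing left after dropping NAs)
```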

combine rows in data frame containing NA to make complete row

I haven't figured out how to put the coalesce_by_column function inside the dplyr pipeline, but this works:

coalesce_by_column <- function(df) {
return(coalesce(df[1], df[2]))
}

df %>%
group_by(A) %>%
summarise_all(coalesce_by_column)

## A B C D E
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2 3 2 5
## 2 2 4 5 3 4

Edit: include @Jon Harmon's solution for more than 2 members of a group

# Supply lists by splicing them into dots:
coalesce_by_column <- function(df) {
return(dplyr::coalesce(!!! as.list(df)))
}

df %>%
group_by(A) %>%
summarise_all(coalesce_by_column)

#> # A tibble: 2 x 5
#> A B C D E
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 3 2 5
#> 2 2 4 5 3 4
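
If it helps to see what the splicing does, here is a base R sketch of the same idea. coalesce_base() is a hypothetical stand-in written for this example, not how dplyr::coalesce() is implemented. Each element of a group's column becomes its own argument, and the first non-NA one wins:

```r
# hypothetical base R stand-in for dplyr::coalesce()
coalesce_base <- function(...) {
  Reduce(function(a, b) ifelse(is.na(a), b, a), list(...))
}

# element-wise across vectors
coalesce_base(c(1, NA, NA), c(NA, 2, NA), c(9, 9, 9))  # 1 2 9

# one group's column, spliced into separate arguments like !!! as.list(df)
column <- c(NA, 3, NA)
do.call(coalesce_base, as.list(column))                # 3
```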

How to collapse many records into one while removing NA values

Here's an option with dplyr:

library(dplyr)

df %>%
group_by(name) %>%
summarise_each(funs(first(.[!is.na(.)]))) # or summarise_each(funs(first(na.omit(.))))

#Source: local data frame [3 x 3]
#
# name address favteam
#1 Bill 123 Main St Dodgers
#2 Joe 456 North Ave Pirates
#3 Rob 234 Broad St Mets

And with data.table:

library(data.table)
setDT(df)[, lapply(.SD, function(x) x[!is.na(x)][1L]), by = name]
# name address favteam
#1: Bill 123 Main St Dodgers
#2: Rob 234 Broad St Mets
#3: Joe 456 North Ave Pirates

Or

setDT(df)[, lapply(.SD, function(x) head(na.omit(x), 1L)), by = name]

Edit:

You say in your actual data you have varying numbers of non-NA responses per name. In that case, the following approach may be helpful.

Consider this modified sample data (look at last row):

name <- c("Bill", "Rob", "Joe", "Joe", "Joe")
address <- c("123 Main St", "234 Broad St", NA, "456 North Ave", "123 Boulevard")
favteam <- c("Dodgers", "Mets", "Pirates", NA, NA)

df <- data.frame(name = name,
address = address,
favteam = favteam)

df
# name address favteam
#1 Bill 123 Main St Dodgers
#2 Rob 234 Broad St Mets
#3 Joe <NA> Pirates
#4 Joe 456 North Ave <NA>
#5 Joe 123 Boulevard <NA>

Then, you can use this data.table approach to get the non-NA responses that can be varying in number by name:

setDT(df)[, lapply(.SD, function(x) unique(na.omit(x))), by = name]
# name address favteam
#1: Bill 123 Main St Dodgers
#2: Rob 234 Broad St Mets
#3: Joe 456 North Ave Pirates
#4: Joe 123 Boulevard Pirates

A better way to collapse rows with numerical value and NA

The issue arises when a group contains only NAs: max(., na.rm = TRUE) then has nothing left to work with, warns "no non-missing arguments" and returns -Inf. Here are workarounds using dplyr and data.table:

library(dplyr)
library(data.table)

abc %>%
  group_by(ID) %>%
  summarize_all(~ if (length(na.omit(.))) max(., na.rm = TRUE) else NA_real_) %>%
  ungroup()

setDT(abc)
abc[,
    lapply(.SD, function(x) if (length(na.omit(x))) max(x, na.rm = TRUE) else NA_real_),
    by = ID]
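
The same guard also works in base R. A minimal sketch, with a made-up abc in which ID 2 has no non-missing values (max_or_na is a name chosen here for the guard):

```r
max_or_na <- function(x) {
  if (length(na.omit(x))) max(x, na.rm = TRUE) else NA_real_
}

# made-up data: ID 2 has no non-missing values
abc <- data.frame(ID = c(1, 1, 2, 2), val = c(NA, 3, NA, NA))

suppressWarnings(max(c(NA_real_, NA_real_), na.rm = TRUE))  # -Inf (warning suppressed here)
max_or_na(c(NA_real_, NA_real_))                            # NA

result <- aggregate(abc["val"], by = list(ID = abc$ID), FUN = max_or_na)
result$val  # 3 NA
```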

