Combine Rows in Data Frame Containing NA to Make Complete Row

combine rows in data frame containing NA to make complete row

I haven't figured out how to put the coalesce_by_column function inside the dplyr pipeline, but this works:

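library(dplyr)

# Works when each group has at most two rows:
# take the first value and fall back to the second if it is NA
coalesce_by_column <- function(df) {
  return(coalesce(df[1], df[2]))
}

df %>%
  group_by(A) %>%
  summarise_all(coalesce_by_column)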

## A B C D E
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2 3 2 5
## 2 2 4 5 3 4

Edit: include @Jon Harmon's solution for more than 2 members of a group

# Supply lists by splicing them into dots:
coalesce_by_column <- function(df) {
  return(dplyr::coalesce(!!! as.list(df)))
}

df %>%
  group_by(A) %>%
  summarise_all(coalesce_by_column)

#> # A tibble: 2 x 5
#> A B C D E
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 3 2 5
#> 2 2 4 5 3 4
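For reference, here is a self-contained sketch of the same idea on made-up data. The tibble below is an assumption (the original data is not shown in the answer), chosen so that the result matches the output above. do.call() is used here in place of the !!! splice, and across() assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical input: each group in A has its values spread over two rows
df <- tibble(
  A = c(1, 1, 2, 2),
  B = c(2, NA, 4, NA),
  C = c(NA, 3, 5, NA),
  D = c(NA, 2, 3, NA),
  E = c(5, NA, NA, 4)
)

# Pass every element of the column as a separate argument to coalesce(),
# so the first non-NA value in the group wins
coalesce_by_column <- function(x) do.call(coalesce, as.list(x))

result <- df %>%
  group_by(A) %>%
  summarise(across(everything(), coalesce_by_column))

result
```

If a column is entirely NA within a group, this sketch returns NA for that cell rather than failing.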

Merge rows in a dataframe where the rows are disjoint and contain NAs

You can use aggregate. Assuming that you want to merge rows with identical values in column name:

aggregate(x=DF[c("v1","v2","v3","v4")], by=list(name=DF$name), min, na.rm = TRUE)
name v1 v2 v3 v4
1 Yemen 4 2 3 5

This is like the SQL SELECT name, min(v1) GROUP BY name. The choice of min is arbitrary: you could also use max or mean, since with na.rm = TRUE each of them returns the non-NA value when given one NA and one non-NA value.
(An SQL-like coalesce() function would be a better fit if one existed in base R.)

However, you should first check that all non-NA values for a given name are identical. For example, run the aggregate with both min and max and compare the results, or run it with range.

Finally, if you have many more variables than just v1-4, you could use DF[,!(names(DF) %in% c("code","name"))] to define the columns.
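The consistency check described above could be sketched as follows. The DF below is hypothetical (the original data is not shown, and the code value "YE" is invented for illustration); it is built so that min reproduces the output above.

```r
# Hypothetical data: one country whose values are spread over three rows
DF <- data.frame(
  code = c("YE", "YE", "YE"),
  name = c("Yemen", "Yemen", "Yemen"),
  v1 = c(4, NA, NA),
  v2 = c(NA, 2, NA),
  v3 = c(NA, NA, 3),
  v4 = c(5, 5, NA)
)

# Select every column except the identifiers
value_cols <- !(names(DF) %in% c("code", "name"))

lo <- aggregate(DF[value_cols], by = list(name = DF$name), FUN = min, na.rm = TRUE)
hi <- aggregate(DF[value_cols], by = list(name = DF$name), FUN = max, na.rm = TRUE)

# If min and max agree everywhere, no group holds conflicting non-NA values
consistent <- isTRUE(all.equal(lo, hi))
consistent
```

Note that if a group has only NA in some column, min(..., na.rm = TRUE) returns Inf with a warning, so such columns need separate handling.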

Combine rows by group with differing NAs in each row

Is this what you want? This uses zoo together with dplyr:

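library(dplyr)
library(zoo)

df %>%
  group_by(groupid) %>%
  # carry the last non-NA value forward within each group
  mutate_all(~ na.locf(., na.rm = FALSE, fromLast = FALSE)) %>%
  # keep only the last row of each group
  filter(row_number() == n())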


# A tibble: 1 x 5
# Groups: groupid [1]
groupid col1 col2 col3 col4
<int> <int> <chr> <int> <int>
1 1 0 n 2 2

EDIT1

Without the filter, this returns the whole data frame:

df %>%
  group_by(groupid) %>%
  mutate_all(~ na.locf(., na.rm = FALSE, fromLast = FALSE))

# A tibble: 2 x 5
# Groups: groupid [1]
groupid col1 col2 col3 col4
<int> <int> <chr> <int> <int>
1 1 0 n NA 2
2 1 0 n 2 2

The filter just slices the last row of each group. Since na.locf carries the previous non-NA value forward, the last row in each group is the one you want.
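As a minimal illustration of the carry-forward behaviour, here is na.locf applied to a single made-up vector (the values are an assumption, not taken from the question's data):

```r
library(zoo)

x <- c(NA, 0, NA, 2)

# na.rm = FALSE keeps the leading NA, which has no earlier value to copy
filled <- na.locf(x, na.rm = FALSE)
filled
# NA 0 0 2

# The last element is the last non-NA value of the original vector
last_value <- filled[length(filled)]
last_value
# 2
```

When a column holds more than one distinct non-NA value within a group, this approach keeps the last one.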

Alternatively, as @thelatemail recommended, you can take the first non-NA value of each column, which gives the same answer here:

df %>%
  group_by(groupid) %>%
  summarise_all(~ .[!is.na(.)][1])

EDIT2

Assuming you have conflicting values and want to show them all:

df <- read.table(text="groupid  col1  col2  col3  col4
1 0 n NA 2
1 1 NA 2 2",
header=TRUE,stringsAsFactors=FALSE)
df
groupid col1 col2 col3 col4
1 1 0 n NA 2
2 1 1(#)<NA> 2 2(#)
df %>%
  group_by(groupid) %>%
  # unique() collapses duplicated values, as in col4
  summarise_all(~ toString(unique(na.omit(.))))
groupid col1 col2 col3 col4
<int> <chr> <chr> <chr> <chr>
1 1 0, 1 n 2 2

Combining/joining rows within the same dataframe based on grouping R

library(dplyr)

df %>%
  group_by(name, year) %>%
  summarise_all(mean, na.rm = TRUE)

This is a dplyr answer. It works if your data really looks like what you posted.

Output:

  name   year     A     B     C
<fct> <dbl> <dbl> <dbl> <dbl>
1 bar 18 2 4 5
2 foo 19 1 3 2

Combine, merge, coalesce rows by group and replace certain value by another value without pivoting

We can group by 'A' and 'B', summarise across the remaining columns, order the values so that 'u' comes before the other values, and select the first element:

library(dplyr)
df %>%
  group_by(A, B) %>%
  summarise(across(everything(),
                   ~ first(.[order(. != 'u')])), .groups = 'drop')

-output

# A tibble: 5 x 5
A B C D E
<int> <chr> <chr> <chr> <chr>
1 1 a u u t
2 2 b t u u
3 3 c u t u
4 4 d t u u
5 5 e t u u
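To see why ordering by . != 'u' puts 'u' first, here is a small base R sketch on a made-up vector. The comparison yields a logical vector, and FALSE sorts before TRUE, so the positions holding 'u' move to the front.

```r
x <- c("t", "u", "t")

is_not_u <- x != "u"
is_not_u
# TRUE FALSE TRUE

idx <- order(is_not_u)
idx
# 2 1 3

# The first element after reordering is 'u' whenever the group contains one
picked <- x[idx][1]
picked
# "u"
```

NA values compare as NA and are placed last by order(), so they are picked only when the whole group is NA.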

How to merge rows that contain NAs into one in r?

With the strict assumption that the number of non-NA values is the same for each column, one can do

t(apply(data, 2, na.omit))
# [,1] [,2] [,3]
# [1,] 1 2 9

The t(..) is needed only in this example, because each column yields a single value and apply therefore simplifies the result to a vector rather than a matrix. If each column yields more than one non-NA value, you can do without t(..):

data <- rbind(data, data)
data
# [,1] [,2] [,3]
# 1 1 2 NA
# 2 NA NA 9
# 1 1 2 NA
# 2 NA NA 9
apply(data, 2, na.omit)
# [,1] [,2] [,3]
# [1,] 1 2 9
# [2,] 1 2 9

@akrun made a great point: if a column is all NA, this will fail. Here's a slight fix for that:

apply(data, 2, function(z) { out <- na.omit(z); if (!length(out)) NA else out; })
# [,1] [,2] [,3]
# [1,] 1 2 9
# [2,] 1 2 9
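A self-contained sketch of the all-NA case follows. The matrix is made up for illustration: it mirrors the two-row example above with an extra column that contains only NA.

```r
# Hypothetical input: two rows that complement each other,
# plus a fourth column with no values at all
data <- rbind(c(1, 2, NA, NA),
              c(NA, NA, 9, NA))

# Return NA instead of a zero-length vector for an all-NA column,
# so every column still contributes exactly one value
merged <- t(apply(data, 2, function(z) {
  out <- na.omit(z)
  if (!length(out)) NA else out
}))

merged
dim(merged)
```

This still relies on the strict assumption stated earlier: if the other columns yield more than one value each while the all-NA column yields a single NA, the lengths differ and apply returns a list instead of a matrix.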

How to merge multiple rows into a single row for a single column?

As it is a tibble, we can make use of tidyverse functions (in newer versions of dplyr, we can use across with summarise):

library(dplyr)
library(stringr)
df %>%
  group_by(Injury) %>%
  summarise(across(everything(), ~ str_c(.x, collapse = "")))

Or with summarise_at:

df %>%
  group_by(Injury) %>%
  summarise_at(vars(-group_cols()), str_c, collapse = "")

Combining rows with duplicate values and NAs [without using tidyverse]

One tidyverse possibility could be:

library(dplyr)
library(tidyr)

df %>%
  gather(var, val, -ID, na.rm = TRUE) %>%
  group_by(ID, var) %>%
  distinct(val) %>%
  spread(var, val)

ID V1 V2 V3 V4
<chr> <int> <int> <int> <int>
1 04C 6 9 NA 9
2 0F0 NA 5 7 4
3 167 8 10 5 NA
4 2D7 3 3 NA 1

