Combine rows in data frame containing NA to make complete row
I haven't figured out how to put the coalesce_by_column function inside the dplyr pipeline, but this works:
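library(dplyr)

# Despite its name, `df` is a single column within one group.
# This version only handles groups of at most two rows.
coalesce_by_column <- function(df) {
  return(coalesce(df[1], df[2]))
}

df %>%
  group_by(A) %>%
  summarise_all(coalesce_by_column)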
## A B C D E
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2 3 2 5
## 2 2 4 5 3 4
Edit: included @Jon Harmon's solution for groups with more than two members.
# Supply lists by splicing them into dots:
coalesce_by_column <- function(df) {
  return(dplyr::coalesce(!!!as.list(df)))
}

df %>%
  group_by(A) %>%
  summarise_all(coalesce_by_column)
#> # A tibble: 2 x 5
#> A B C D E
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 3 2 5
#> 2 2 4 5 3 4
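For a version that can be run end to end, here is a minimal sketch. The sample data frame is an assumption chosen to be consistent with the output shown above (the original input is not reproduced in this answer), and summarise(across(...)) is swapped in for summarise_all, which newer dplyr releases mark as superseded.

```r
library(dplyr)

# Assumed sample data: each group in A has its values spread over two rows
df <- data.frame(
  A = c(1, 1, 2, 2),
  B = c(NA, 2, NA, 4),
  C = c(3, NA, NA, 5),
  D = c(NA, 2, 3, NA),
  E = c(5, NA, NA, 4)
)

# x is one column of one group; splice its elements into coalesce(),
# which returns the first non-NA element
coalesce_by_column <- function(x) {
  dplyr::coalesce(!!!as.list(x))
}

result <- df %>%
  group_by(A) %>%
  summarise(across(everything(), coalesce_by_column))

print(result)
```

Because every element of the column is spliced in as a separate argument, this works for groups of any size, not just pairs of rows.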
Merge rows in a dataframe where the rows are disjoint and contain NAs
You can use aggregate. Assuming that you want to merge rows with identical values in column name:
aggregate(x=DF[c("v1","v2","v3","v4")], by=list(name=DF$name), min, na.rm = TRUE)
name v1 v2 v3 v4
1 Yemen 4 2 3 5
This is like the SQL SELECT name, min(v1) GROUP BY name. The min function is arbitrary; you could also use max or mean, since with na.rm = TRUE each of them returns the non-NA value when given one NA and one non-NA value.
(An SQL-like coalesce() function would be a better fit if it existed in base R.)
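As a rough stand-in for such a function, one could pass a small "first non-NA" helper to aggregate. This is only a sketch: first_non_na is a hypothetical helper name, and the DF below is assumed sample data shaped to match the output above.

```r
# Assumed sample data: one country whose values are split across two rows
DF <- data.frame(
  name = c("Yemen", "Yemen"),
  v1 = c(4, NA),
  v2 = c(NA, 2),
  v3 = c(NA, 3),
  v4 = c(5, NA)
)

# Hypothetical helper: first non-NA element, or NA if there is none
first_non_na <- function(x) x[!is.na(x)][1]

merged <- aggregate(
  x = DF[c("v1", "v2", "v3", "v4")],
  by = list(name = DF$name),
  FUN = first_non_na
)

print(merged)
```

Unlike min or max, this helper also works for character columns and makes the intent (pick the one available value) explicit.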
However, you should first check that all non-NA values for a given name are identical. For example, run aggregate with both min and max and compare the results, or run it with range.
Finally, if you have many more variables than just v1-4, you could use DF[,!(names(DF) %in% c("code","name"))] to define the columns.
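The min/max comparison and the column selection described above can be sketched together as follows. The DF here, including its code column, is assumed sample data; only the column names come from the answer.

```r
# Assumed sample data with identifier columns "code" and "name"
DF <- data.frame(
  code = c("YE", "YE"),
  name = c("Yemen", "Yemen"),
  v1 = c(4, NA),
  v2 = c(NA, 2),
  v3 = c(NA, 3),
  v4 = c(5, NA)
)

# Select every column that is not an identifier
value_cols <- !(names(DF) %in% c("code", "name"))

lo <- aggregate(DF[value_cols], by = list(name = DF$name), FUN = min, na.rm = TRUE)
hi <- aggregate(DF[value_cols], by = list(name = DF$name), FUN = max, na.rm = TRUE)

# If min and max agree everywhere, the non-NA values within each group are identical
consistent <- isTRUE(all.equal(lo, hi))
print(consistent)
```

If consistent is FALSE, at least one group holds conflicting non-NA values, and the choice of min, max, or mean is no longer arbitrary.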
Combine rows by group with differing NAs in each row
Is this what you want? It uses zoo + dplyr; also check the link here.
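library(dplyr)
library(zoo)

# funs() is deprecated since dplyr 0.8.0, so a formula lambda is used instead
df %>%
  group_by(groupid) %>%
  mutate_all(~ na.locf(., na.rm = FALSE, fromLast = FALSE)) %>%
  filter(row_number() == n())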
# A tibble: 1 x 5
# Groups: groupid [1]
groupid col1 col2 col3 col4
<int> <int> <chr> <int> <int>
1 1 0 n 2 2
EDIT1
Without the filter, this gives back the whole data frame.
df %>%
  group_by(groupid) %>%
  mutate_all(~ na.locf(., na.rm = FALSE, fromLast = FALSE))
# A tibble: 2 x 5
# Groups: groupid [1]
groupid col1 col2 col3 col4
<int> <int> <chr> <int> <int>
1 1 0 n NA 2
2 1 0 n 2 2
filter here just slices the last row of each group: na.locf carries the previous non-NA value forward, which means the last row in each group is the one you want.
Also, based on @thelatemail's recommendation, you can do the following, which gives back the same answer:
df %>% group_by(groupid) %>% summarise_all(~ .[!is.na(.)][1])
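A self-contained sketch of the same first-non-NA idea, written with across() as used in newer dplyr versions. The data frame is assumed sample data reconstructed from the output shown above, and first_non_na is a hypothetical helper name.

```r
library(dplyr)

# Assumed sample data: one group whose values are split across two rows
df <- data.frame(
  groupid = c(1L, 1L),
  col1 = c(0L, 0L),
  col2 = c("n", NA),
  col3 = c(NA, 2L),
  col4 = c(2L, 2L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-NA element, or NA if the whole column is NA
first_non_na <- function(x) x[!is.na(x)][1]

result <- df %>%
  group_by(groupid) %>%
  summarise(across(everything(), first_non_na))

print(result)
```

This variant needs no zoo dependency and collapses each group to a single row in one step.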
EDIT2
Assuming you have conflicting values and you want to show them all:
df <- read.table(text="groupid col1 col2 col3 col4
1 0 n NA 2
1 1 NA 2 2",
header=TRUE,stringsAsFactors=FALSE)
df
groupid col1 col2 col3 col4
1 1 0 n NA 2
2 1 1(#)<NA> 2 2(#)
df %>%
  group_by(groupid) %>%
  summarise_all(~ toString(unique(na.omit(.))))  # unique() collapses duplicates such as col4
groupid col1 col2 col3 col4
<int> <chr> <chr> <chr> <chr>
1 1 0, 1 n 2 2
Combining/joining rows within the same dataframe based on grouping R
library(dplyr)

df %>%
  group_by(name, year) %>%
  summarise_all(mean, na.rm = TRUE)
This is a dplyr answer. It works if your data really looks like what you posted.
Output:
name year A B C
<fct> <dbl> <dbl> <dbl> <dbl>
1 bar 18 2 4 5
2 foo 19 1 3 2
Combine, merge, coalesce rows by group and replace certain value by another value without pivoting
We can group by 'A', 'B', summarise across the columns, order the values so that 'u' will return before other values, and select the first element.
library(dplyr)

df %>%
  group_by(A, B) %>%
  summarise(across(everything(),
                   ~ first(.[order(. != 'u')])), .groups = 'drop')
Output:
# A tibble: 5 x 5
A B C D E
<int> <chr> <chr> <chr> <chr>
1 1 a u u t
2 2 b t u u
3 3 c u t u
4 4 d t u u
5 5 e t u u
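The ordering trick can be checked in isolation with base R. In this sketch pick_u_first is a hypothetical helper name: x != "u" is FALSE for 'u' and TRUE for everything else, and order() sorts FALSE before TRUE, so any 'u' moves to the front.

```r
# Hypothetical helper mirroring first(.[order(. != 'u')]) from the pipeline above
pick_u_first <- function(x) x[order(x != "u")][1]

with_u <- pick_u_first(c("t", "u", "t"))   # a 'u' is present, so it wins
without_u <- pick_u_first(c("t", "t"))     # no 'u', so the first value is kept

print(with_u)
print(without_u)
```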
How to merge rows that contain NAs into one in r?
With the strict assumption that the number of non-NA values is the same for each column, one can do
t(apply(data, 2, na.omit))
# [,1] [,2] [,3]
# [1,] 1 2 9
The need for t(..) is specific to this example, since apply here auto-simplifies to a vector, not to an array. If each column of your data has two or more non-NA values, then you can do without t(..):
data <- rbind(data, data)
data
# [,1] [,2] [,3]
# 1 1 2 NA
# 2 NA NA 9
# 1 1 2 NA
# 2 NA NA 9
apply(data, 2, na.omit)
# [,1] [,2] [,3]
# [1,] 1 2 9
# [2,] 1 2 9
@akrun made a great point: if a column is all NA, this will fail. Here's a slight fix for that:
apply(data, 2, function(z) { out <- na.omit(z); if (!length(out)) NA else out; })
# [,1] [,2] [,3]
# [1,] 1 2 9
# [2,] 1 2 9
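To see the all-NA case in action, here is a small sketch on an assumed matrix whose last column contains only NA. The helper name collapse_column is hypothetical, and as.vector() is added only to strip the attributes that na.omit attaches.

```r
# Assumed sample data: column 4 is entirely NA
data <- matrix(c(1, NA,
                 2, NA,
                 NA, 9,
                 NA, NA), nrow = 2)

# Hypothetical helper: keep the non-NA values, or a single NA if none exist
collapse_column <- function(z) {
  out <- na.omit(z)
  if (length(out) == 0) NA else as.vector(out)
}

# Each column yields exactly one value here, so apply() returns a vector
# and t() turns it into a one-row matrix
merged <- t(apply(data, 2, collapse_column))
print(merged)
```

The strict assumption from above still applies: if some columns yield one value and others yield two, apply() can no longer simplify the result to a vector or matrix.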
How to merge multiple rows into a single row for a single column?
As it is a tibble, we can make use of tidyverse functions (in newer versions of dplyr, we can use across with summarise):
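library(dplyr)
library(stringr)

# A lambda is used because passing extra arguments through across() is deprecated in dplyr 1.1.0
df %>%
  group_by(Injury) %>%
  summarise(across(everything(), ~ str_c(.x, collapse = "")))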
Or with summarise_at:
df %>%
  group_by(Injury) %>%
  summarise_at(vars(-group_cols()), str_c, collapse = "")
Combining rows with duplicate values and NAs [without using tidyverse]
One tidyverse possibility could be:
library(dplyr)
library(tidyr)

df %>%
  gather(var, val, -ID, na.rm = TRUE) %>%
  group_by(ID, var) %>%
  distinct(val) %>%
  spread(var, val)
ID V1 V2 V3 V4
<chr> <int> <int> <int> <int>
1 04C 6 9 NA 9
2 0F0 NA 5 7 4
3 167 8 10 5 NA
4 2D7 3 3 NA 1