R: Remove Groups with Only NAs

Removing groups with all NA in data.table or dplyr in R

Is this what you want?

library(dplyr)

dataHAVE %>%
  group_by(student) %>%
  filter(!all(is.na(score)))

  student  time score
    <dbl> <dbl> <dbl>
1       1     1     7
2       1     2     9
3       1     3     5
4       3     1    NA
5       3     2     3
6       3     3     9
7       5    NA     7
8       5     2    NA
9       5     3     5

A student's rows are kept only if not all (!all) of that student's score values are NA. Individual NA rows inside a kept group are left in place.
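Since the question also mentions data.table, here is a minimal sketch of the same idea there. The table `scores` and its values are made up for illustration; the sketch relies on data.table dropping a group whenever the `j` expression returns NULL.

```r
library(data.table)

# Hypothetical data: student 2 has only NA scores
scores <- data.table(
  student = c(1, 1, 2, 2, 3, 3),
  score   = c(7, NA, NA, NA, NA, 5)
)

# if() without else returns NULL for all-NA groups, so those groups are dropped
kept <- scores[, if (!all(is.na(score))) .SD, by = student]
kept
```

Students 1 and 3 keep all their rows, including the rows where score is NA; only student 2 disappears.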

Remove NAs in each column by group

Here is one possible solution using the data.table package:

library(data.table)

setDT(na_data)[, lapply(.SD, function(x) if (length(y <- na.omit(x))) y else first(x)), by = Year]

# Year Peter Paul John
# 1: 2011 1 1 NA
# 2: 2011 2 2 NA
# 3: 2011 3 3 NA
# 4: 2012 1 3 NA
# 5: 2012 2 2 NA
# 6: 2012 3 1 NA
# 7: 2013 1 1 4
# 8: 2013 2 2 5
# 9: 2013 3 3 6

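dplyr equivalent (in dplyr 1.1.0 and later, summarise() returning more than one row per group is deprecated in favour of reframe()):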

library(dplyr)

na_data |>
  group_by(Year) |>
  summarise(across(everything(), ~ if (length(y <- na.omit(.x))) y else first(.x)))

# # A tibble: 9 x 4
# # Groups: Year [3]
# Year Peter Paul John
# <dbl> <dbl> <dbl> <int>
# 1 2011 1 1 NA
# 2 2011 2 2 NA
# 3 2011 3 3 NA
# 4 2012 1 3 NA
# 5 2012 2 2 NA
# 6 2012 3 1 NA
# 7 2013 1 1 4
# 8 2013 2 2 5
# 9 2013 3 3 6
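To see what the anonymous function in both versions does, it can help to pull it out and run it on plain vectors. The name `keep_non_na` is hypothetical, and `x[1]` stands in for `first(x)` so the sketch needs no packages.

```r
# Return the non-NA values of x, or a single NA if x has no non-NA values
keep_non_na <- function(x) {
  y <- na.omit(x)
  if (length(y)) y else x[1]
}

mixed  <- keep_non_na(c(NA, 1, NA, 2))   # the values 1 and 2 (plus an na.action attribute)
all_na <- keep_non_na(c(NA_real_, NA_real_))  # one NA, so the column still has a value
```

Within a group, columns that collapse to a single NA are recycled to the length of the other columns, which is why John shows NA on every 2011 and 2012 row in the output above.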

Remove groups which do not have non-consecutive NA values in R

How about using difference between the index of NA-values per group?

library(dplyr)
df %>% group_by(group) %>% filter(any(diff(which(is.na(D))) > 1))

## A tibble: 8 x 2
## Groups: group [2]
# group D
# <dbl> <dbl>
#1 2. NA
#2 2. 2.
#3 2. NA
#4 2. NA
#5 4. NA
#6 4. 2.
#7 4. 3.
#8 4. NA

I'm not sure this catches all potential edge cases, but it works for the given example.
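A few of those edge cases can be checked directly on plain vectors. This is only a sketch: `has_gap` is a hypothetical wrapper around the condition used in the filter above.

```r
# TRUE when at least two NAs in x are separated by a non-NA value
has_gap <- function(x) any(diff(which(is.na(x))) > 1)

gap       <- has_gap(c(NA, 2, NA, NA))  # NA positions 1, 3, 4: differences 2 and 1
adjacent  <- has_gap(c(NA, NA, 3))      # NA positions 1, 2: difference 1
single_na <- has_gap(c(1, NA, 3))       # one NA: diff() is empty, any() gives FALSE
no_na     <- has_gap(c(1, 2, 3))        # no NA: which() is empty, any() gives FALSE
```

So groups with zero or one NA are removed along with groups whose NAs are all adjacent, because `any()` of an empty logical vector is FALSE.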

Remove group if any member contains NA in R

We can use filter after grouping by 'category':

library(dplyr)
tbl %>%
  group_by(category) %>%
  filter(!any(is.na(values))) %>%
  ungroup()

Output:

# A tibble: 2 x 2
  category values
  <chr>     <dbl>
1 A             2
2 A             3

Exclude groups with NAs in tidy dataset

Using all() will evaluate the entire group, so you can skip the mutate step.

MWA %>%
  group_by(Dir) %>%
  filter(all(!is.na(time_seg)))

# A tibble: 8 x 5
# Groups: Dir [1]
     VP   Con   Dir   Seg time_seg
  <int> <int> <int> <int>    <int>
1    10     2     2     1      320
2    10     2     2     2     1110
3    10     2     2     3      450
4    10     2     2     4      600
5    10     2     2     5     1680
6    10     2     2     6      730
7    10     2     2     7      850
8    10     2     2     8      840
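The condition `all(!is.na(x))` used here and the `!any(is.na(x))` form from the previous answer select the same groups. A quick base R check on two made-up vectors, with `anyNA()` shown as a shorter spelling:

```r
with_na    <- c(1, NA, 3)  # hypothetical group containing an NA
without_na <- c(1, 2, 3)   # hypothetical complete group

# Both spellings agree for each vector
same_with    <- identical(all(!is.na(with_na)), !any(is.na(with_na)))
same_without <- identical(all(!is.na(without_na)), !any(is.na(without_na)))

# anyNA() is a base R shortcut for any(is.na(x))
keep_with    <- !anyNA(with_na)     # FALSE: the group would be dropped
keep_without <- !anyNA(without_na)  # TRUE: the group would be kept
```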

Remove rows with NA in a group, given the group contains at least one non-NA value

We could use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df). Grouped by 'class', an if/else checks whether the group has any non-NA 'value': if it does, we keep only the non-NA rows of .SD; otherwise we return .SD unchanged, so an all-NA group keeps its rows.

library(data.table)
setDT(df)[, if(any(!is.na(value))) .SD[!is.na(value)] else .SD , by = class]
# class value
#1: orange NA
#2: apple 1
#3: grape 1
#4: berry NA

Or we can flip the test from any() to all() and swap the two branches:

setDT(df)[, if(all(is.na(value))) .SD else .SD[!is.na(value)], by = class]
# class value
#1: orange NA
#2: apple 1
#3: grape 1
#4: berry NA

Or we can get the row index (.I) and then subset the dataset:

indx <- setDT(df)[, if(any(!is.na(value))) .I[!is.na(value)] else .I, by = class]$V1
df[indx]
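For readers working in dplyr, the same rule can be written as a single filter condition. This is a sketch only: the tibble `fruit` is hypothetical, since the original df is not shown here.

```r
library(dplyr)

# Hypothetical data: apple and grape mix NA with real values,
# orange and berry contain nothing but NA
fruit <- tibble(
  class = c("orange", "apple", "apple", "grape", "grape", "berry"),
  value = c(NA, 1, NA, NA, 1, NA)
)

result <- fruit %>%
  group_by(class) %>%
  # keep every row of an all-NA group, otherwise keep only the non-NA rows
  filter(all(is.na(value)) | !is.na(value)) %>%
  ungroup()
```

The length-one result of `all(is.na(value))` is recycled across the group, so an all-NA group passes the filter as a whole while mixed groups lose only their NA rows.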

