Sum non-NA elements only, but if all NA then return NA

Following the suggestions from other users, I will post the answer to my question. The solution was provided by @sandipan in the comments above:

As noted in the question, if you need to sum the values of one column that contains NAs, there are two good approaches:

1) using ifelse:

A[, (ifelse(all(is.na(col2)), col2[NA_integer_], sum(col2, na.rm = TRUE))),
  by = .(col1)]

2) define a function as suggested by @Frank:

suma = function(x) if (all(is.na(x))) x[NA_integer_] else sum(x, na.rm = TRUE)

A[, suma(col2), by = .(col1)]

Note that I used x[NA_integer_], as @Frank pointed out, so that the returned NA has the same type as the column; before that I kept getting errors about the types.
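To see why the helper is needed at all, here is a minimal base-R sketch. The data frame A and its values are made up for illustration (they are not the data from the question), and split/vapply stands in for the data.table grouping used above:

```r
# Helper from the answer: an NA of the input's type if everything is NA,
# otherwise the sum of the non-NA values.
suma <- function(x) if (all(is.na(x))) x[NA_integer_] else sum(x, na.rm = TRUE)

# Hypothetical data: group "b" contains only NAs.
A <- data.frame(col1 = c("a", "a", "b", "b"),
                col2 = c(1L, NA, NA, NA))

# Plain sum with na.rm = TRUE collapses the all-NA group to 0.
plain <- sum(A$col2[A$col1 == "b"], na.rm = TRUE)

# Applying suma per group keeps NA for the all-NA group.
by_group <- vapply(split(A$col2, A$col1), suma, integer(1))

print(plain)
print(by_group)
```

Because x[NA_integer_] keeps the type of x, every group returns an integer here, which is what a grouped data.table call expects.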

Sum/return NA when all values are NA

The function was checking for NA across all of the dataset's columns at once; the check should instead be made for each column separately. Here is an option with across:

library(dplyr)
names(y_true_test) <- grep("species", names(df), value = TRUE)
df %>%
  group_by(group) %>%
  # compare each column with its own true value, looked up by column name
  summarise(across(everything(), ~ if (all(is.na(.x))) NA_real_ else
    sqrt(sum((.x - y_true_test[cur_column()])^2, na.rm = TRUE) / n()) /
      (y_true_test[cur_column()]) * 100), .groups = 'drop')

-output

# A tibble: 1 × 4
group species_1 species_2 species_3
<dbl> <dbl> <dbl> <dbl>
1 1 43.0 28.9 NA

If we want to modify the OP's function

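estimate <- function(df, y_true, narm = TRUE) {
  # columns in which every value is NA
  i1 <- colSums(is.na(df)) == nrow(df)
  # root-mean-square error per column, as a percentage of y_true
  out <- sqrt(colSums((t(t(df) - y_true))^2,
                      na.rm = narm) / nrow(df)) / y_true * 100
  out[i1] <- NA
  out
}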

-testing

> df %>%
+   group_by(group) %>%
+   group_modify(~ as.data.frame.list(estimate(., y_true_test)))
# A tibble: 1 × 4
# Groups: group [1]
group species_1 species_2 species_3
<dbl> <dbl> <dbl> <dbl>
1 1 43.0 28.9 NA

Sum the last n non NA values in each column of a matrix in R

You can use apply with tail to sum the last n non-NA values in each column (here n = 3):

apply(x, 2, function(x) sum(tail(x[!is.na(x)], 3)))
#x1 x2 x3 x4 x5
#15 11 9 6 3
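The same idea on a small matrix, as a sketch. The matrix m and the helper name sum_last_n are hypothetical and not the data behind the output above:

```r
# Hypothetical 5 x 2 matrix with NAs in different positions.
m <- cbind(x1 = c(1, 2, NA, 4, 5),
           x2 = c(NA, 3, 1, NA, NA))

# Drop the NAs in a column, then sum the last n of what remains.
sum_last_n <- function(col, n = 3) sum(utils::tail(col[!is.na(col)], n))

res <- apply(m, 2, sum_last_n)
print(res)
```

Column x2 has only two non-NA values, so tail returns both of them and the sum covers fewer than n entries. A column that is entirely NA would give 0, because the sum of an empty vector is 0.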

Classic case of `sum` returning NA because it doesn't sum NAs

Following Joshua Ulrich's comment, before saying that you have some overflow problem, you should answer these questions:

  1. How many elements are you summing? R can handle a BIG number of entries
  2. How big are the values in your vectors? Again, R can handle quite big numbers
  3. Are you summing integers or floats? If you are summing floating-point numbers, you can't have an integer overflow (floats are not integers)
  4. Do you have NAs in your data? If you sum anything with NAs present, the result will be NA, unless you handle it properly.

That said, some solutions:

  • Use sum(..., na.rm = TRUE) to ignore the NAs in your object (this is the simple solution)
  • Sum only the non-NA entries: sum(yourVector[!is.na(yourVector)]) (the not so simple one)
  • If you are summing a column from a data frame, subset the data frame before summing: sum(subset(yourDataFrame, !is.na(columnToSum))$columnToSum) (this is like using a cannon to kill a mosquito)
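The points above can be checked with a short sketch; the vectors v and big are made-up examples:

```r
v <- c(2, NA, 5)

s_default <- sum(v)                # NA: a single NA makes the whole sum NA
s_narm    <- sum(v, na.rm = TRUE)  # NAs are ignored
s_subset  <- sum(v[!is.na(v)])     # same result by explicit subsetting

# Integer overflow is the other way to get NA from sum: the result does not
# fit in an integer, so R returns NA with a warning.
big   <- c(.Machine$integer.max, 1L)
s_int <- suppressWarnings(sum(big))
s_dbl <- sum(as.numeric(big))      # converting to double avoids the overflow

print(c(s_default, s_narm, s_subset))
print(s_int)
print(s_dbl)
```

If is.na(yourVector) finds nothing and the result is still NA, the integer case is the one to check.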

R: apply statement to take the sum of the number of non-NA values across multiple columns

Just use is.na and rowSums:

z <- rowSums(!is.na(y[,paste("diag", 1:11, sep="")]))
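A small sketch of what this computes, assuming a hypothetical data frame y with only three diag columns instead of eleven:

```r
# Hypothetical data: one row per patient, up to three diagnosis codes.
y <- data.frame(id    = 1:3,
                diag1 = c("a", NA, NA),
                diag2 = c("b", "c", NA),
                diag3 = c(NA, "d", NA))

cols <- paste0("diag", 1:3)

# !is.na() gives a logical matrix; rowSums counts the TRUEs in each row.
z <- rowSums(!is.na(y[, cols]))
print(z)
```

No apply call is needed, because is.na works on the whole block of columns at once.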

Count non-NA values by group

You can use this

mydf %>% group_by(col_1) %>% summarise(non_na_count = sum(!is.na(col_2)))

# A tibble: 2 x 2
col_1 non_na_count
<fctr> <int>
1 A 1
2 B 2
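For comparison, the same count can be sketched in base R with tapply instead of dplyr. The data frame below is invented so that it yields the counts shown above; it is not the asker's data:

```r
# Hypothetical data: group A has one non-NA value, group B has two.
mydf <- data.frame(col_1 = c("A", "A", "B", "B"),
                   col_2 = c(1, NA, 2, 3))

# Summing the logical vector !is.na(col_2) within each group counts non-NA values.
non_na_count <- tapply(!is.na(mydf$col_2), mydf$col_1, sum)
print(non_na_count)
```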

Create all possible combinations of non-NA values for each group ID

Group by 'ID', fill the other columns, ungroup to remove the group attribute, and keep the distinct rows:

library(dplyr)
library(tidyr)
DF %>%
group_by(ID) %>%
fill(everything(), .direction = 'updown') %>%
ungroup %>%
distinct(.keep_all = TRUE)

Or, alternatively:

DF %>% 
group_by(ID) %>%
mutate(across(everything(), ~ replace(., is.na(.),
rep(.[!is.na(.)], length.out = sum(is.na(.))))))

Or based on the comments

DF %>%
  group_by(ID) %>%
  mutate(across(where(~ any(is.na(.))), ~ {
    i1 <- is.na(.)
    i2 <- !i1
    if (i1[1]) rep(.[i2], each = n() / sum(i2)) else
      rep(.[i2], length.out = n())
  })) %>%
  ungroup %>%
  distinct(.keep_all = TRUE)

-output

# A tibble: 6 x 5
ID Col1 Col2 Col3 Col4
<int> <int> <int> <int> <int>
1 1 6 10 15 20
2 1 5 10 15 20
3 2 17 25 21 34
4 2 13 25 21 34
5 2 17 25 35 40
6 2 13 25 35 40

