There is pmin and pmax each taking na.rm, why no psum?

Following @JoshUlrich's comment on the previous question,

psum <- function(..., na.rm = FALSE) {
  rowSums(do.call(cbind, list(...)), na.rm = na.rm)
}

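Edit: Sven Hohenstein suggested a version that returns NA, rather than 0, for rows in which every value is NA:

psum2 <- function(..., na.rm = FALSE) {
  dat <- do.call(cbind, list(...))
  res <- rowSums(dat, na.rm = na.rm)
  # TRUE for rows whose count of non-NA values is zero
  idx_na <- !rowSums(!is.na(dat))
  res[idx_na] <- NA
  res
}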

x = c(1,3,NA,5,NA)
y = c(2,NA,4,1,NA)
z = c(1,2,3,4,NA)

psum(x,y,na.rm=TRUE)
## [1] 3 3 4 6 0
psum2(x,y,na.rm=TRUE)
## [1] 3 3 4 6 NA

n = 1e7
x = sample(c(1:10,NA),n,replace=TRUE)
y = sample(c(1:10,NA),n,replace=TRUE)
z = sample(c(1:10,NA),n,replace=TRUE)

library(rbenchmark)
benchmark(psum(x,y,z,na.rm=TRUE),
          psum2(x,y,z,na.rm=TRUE),
          pmin(x,y,z,na.rm=TRUE),
          pmax(x,y,z,na.rm=TRUE), replications=20)

##                           test replications elapsed relative
## 4  pmax(x, y, z, na.rm = TRUE)           20  26.114    1.019
## 3  pmin(x, y, z, na.rm = TRUE)           20  25.632    1.000
## 2 psum2(x, y, z, na.rm = TRUE)           20 164.476    6.417
## 1  psum(x, y, z, na.rm = TRUE)           20  63.719    2.486

Sven's version (which arguably is the correct one) is quite a bit slower:
in this benchmark it takes about 2.6 times as long as psum and 6.4 times as long as pmin,
although whether that matters obviously depends on the application.
Anyone want to hack up an inline/Rcpp version?
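
Short of an Rcpp version, one base R possibility is to skip the cbind step and accumulate with Reduce. This is only a sketch: psum3 is a hypothetical name, it assumes all inputs are numeric vectors of the same length, and it has not been benchmarked against the timings above.

```r
# Sketch only: psum3 is a hypothetical name, not part of base R or any package.
# Assumes all inputs are numeric vectors of the same length.
psum3 <- function(..., na.rm = FALSE) {
  args <- list(...)
  if (!na.rm) return(Reduce(`+`, args))
  # running sum with NA treated as 0
  total <- Reduce(`+`, lapply(args, function(v) ifelse(is.na(v), 0, v)))
  # number of non-NA values at each position
  n_obs <- Reduce(`+`, lapply(args, function(v) !is.na(v)))
  # positions where every input is NA stay NA, as in psum2
  total[n_obs == 0] <- NA
  total
}

x <- c(1, 3, NA, 5, NA)
y <- c(2, NA, 4, 1, NA)
psum3(x, y, na.rm = TRUE)
## [1]  3  3  4  6 NA
```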

As for why this doesn't exist: don't know, but good luck getting R-core to make additions like this ... I can't offhand think of a sufficiently widespread *misc package into which this could go ...

A follow-up thread by Matthew on r-devel (which seems to confirm this) is here:

r-devel: There is pmin and pmax each taking na.rm, how about psum?

R: pmax() function to ignore NA's?

The issue is with sort: by default it removes NA values, and the alternative, na.last = TRUE, keeps them at the end, which may not be what we need either. One option is to use order instead:

winsor1 <- function(x, probability){

  numWin <- ceiling(length(x)*probability)

  # Replace first lower, then upper
  x1 <- x[order(x)]
  x <- pmax(x, x1[numWin+1])
  x1 <- x1[order(x1)]
  x <- pmin(x, x1[length(x)-numWin], na.rm = TRUE)

  return(x)
}

-testing

x <- 0:10
winsor1(x, probability=0.01)
#[1] 1 1 2 3 4 5 6 7 8 9 9

x[5] <- NA
winsor1(x, probability=0.01)
#[1] 1 1 2 3 NA 5 6 7 8 9 10

or with na.last in sort

winsor1 <- function(x, probability){

  numWin <- ceiling(length(x)*probability)

  # Replace first lower, then upper
  x <- pmax(x, sort(x, na.last = TRUE)[numWin+1])
  x <- pmin(x, sort(x, na.last = TRUE)[length(x)-numWin], na.rm = TRUE)

  return(x)
}
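
Both versions depend on where NA ends up after sorting. A quick check on a made-up three-element vector shows the difference between the defaults:

```r
# Made-up input for illustration
x <- c(3, NA, 1)

sort(x)                  # NA dropped by default
## [1] 1 3
sort(x, na.last = TRUE)  # NA kept, placed last
## [1]  1  3 NA
x[order(x)]              # order() also puts NA last by default
## [1]  1  3 NA
```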

Using pmax/pmin with vector of variable string names in R

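We may use invoke from purrr (similar to do.call in base R) with across, where values is a character vector of column names. Note that invoke is deprecated as of purrr 1.0.0 in favour of exec: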

library(purrr)
library(dplyr)
out <- mtcars %>%
  mutate(maxval = invoke(pmax, c(across(all_of(values)), na.rm = TRUE)))
# or use do.call
#  mutate(maxval = do.call(pmax, c(across(all_of(values)), na.rm = TRUE)))

-output

> head(out)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb maxval
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4  3.900
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4  3.900
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1  3.850
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1  3.215
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2  3.440
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1  3.460

Or we may use exec as well:

out2 <- mtcars %>%
  mutate(maxval = exec(pmax, !!! rlang::syms(values), na.rm = TRUE))

-output

> head(out2)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb maxval
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4  3.900
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4  3.900
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1  3.850
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1  3.215
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2  3.440
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1  3.460
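
The snippets above leave values undefined. Judging by the printed maxval column it was probably c("drat", "wt"), but that is an assumption. With it, the same result can be sketched in base R alone, without purrr or dplyr:

```r
# Assumption: the original `values` is not shown; these two columns
# reproduce the maxval column printed above.
values <- c("drat", "wt")

# mtcars[values] is a list of columns; na.rm = TRUE is appended as a named element
maxval <- do.call(pmax, c(mtcars[values], na.rm = TRUE))
head(maxval)
## [1] 3.900 3.900 3.850 3.215 3.440 3.460
```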

Using na.rm=T in pmin with do.call

Append na.rm = TRUE to the list as a named element, then call pmin through do.call so that na.rm is matched as an argument of pmin:

do.call(pmin, c(mylist, list(na.rm = TRUE)))
#           [,1]       [,2]
#[1,] -1.0830716 -0.1237099
#[2,] -0.5949517 -3.7873790
#[3,] -2.1003236 -1.2565663
#[4,] -0.4500171 -1.0588205
#[5,] -1.0937602 -1.0537657
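
Since mylist itself is not shown here, a small made-up list of vectors can illustrate what the appended element changes:

```r
# Hypothetical data standing in for the original mylist
mylist <- list(c(1, NA, 3), c(2, 5, NA))

do.call(pmin, mylist)                          # NA propagates
## [1]  1 NA NA
do.call(pmin, c(mylist, list(na.rm = TRUE)))   # NA ignored
## [1] 1 5 3
```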

counting the occurrence of NA's in multiple vectors in r

Your question isn't well-phrased, but it looks like you want the result of colSums used with rbind and is.na:

> colSums(is.na(rbind(c1, c2, c3)))
[1] 2 0 1 0 0
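
The vectors c1, c2 and c3 are not shown, so here is a sketch with hypothetical inputs chosen to give the same counts. rbind stacks the vectors as rows, is.na turns the matrix into TRUE/FALSE, and colSums counts the TRUE values at each position:

```r
# Hypothetical inputs chosen so that the counts match the output above
c1 <- c(NA, 1, 2, 3, 4)
c2 <- c(NA, 1, NA, 3, 4)
c3 <- c(1, 2, 3, 4, 5)

na_counts <- colSums(is.na(rbind(c1, c2, c3)))
na_counts
## [1] 2 0 1 0 0
```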

A + B with NA in r

You can always implement your own sum:

mysum <- function(...) {
  plus <- function(x, y) {
    ifelse(is.na(x), 0, x) + ifelse(is.na(y), 0, y)
  }
  Reduce(plus, list(...))
}

A <- c(NA,2,3,4,5)
B <- c(1,2,3,4,NA)

mysum(A, B)
#> [1] 1 4 6 8 5
mysum(A, A)
#> [1] 0 4 6 8 10
mysum(A, B, A, B)
#> [1] 2 8 12 16 10

Created on 2020-03-09 by the reprex package (v0.3.0)
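
For comparison, a shorter sketch of the same idea with rowSums, which also treats NA as 0 when na.rm = TRUE. The name mysum2 is hypothetical, and the inputs are assumed to have equal length:

```r
# Sketch only: mysum2 is a hypothetical name
mysum2 <- function(...) rowSums(cbind(...), na.rm = TRUE)

A <- c(NA, 2, 3, 4, 5)
B <- c(1, 2, 3, 4, NA)

mysum2(A, B)
## [1] 1 4 6 8 5
mysum2(A, B, A, B)
## [1]  2  8 12 16 10
```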

max(x, na.rm = TRUE) returns NA anyway

This does exactly what I expected:

max(e[e != 0 & val_hps < 0])

@giusti and @Frank: thanks!

Return pmin or pmax of data.frame with multiple columns

Use do.call to call pmax to compare all the columns together for each row value, e.g.:

dat <- data.frame(a=1:5,b=rep(3,5))

#  a b
#1 1 3
#2 2 3
#3 3 3
#4 4 3
#5 5 3

do.call(pmax,dat)
#[1] 3 3 3 4 5

When you call pmax on an entire data.frame directly, it only has one argument passed to the function and nothing to compare it to. So, it just returns the supplied argument as it must be the maximum. It works for non-numeric and numeric arguments, even though it may not make much sense:

pmax(7)
#[1] 7

pmax("a")
#[1] "a"

pmax(data.frame(1,2,3))
#  X1 X2 X3
#1  1  2  3

Using do.call(pmax,...) with a data.frame means you pass each column of the data.frame as a list of arguments to pmax:

do.call(pmax,dat) 

is thus equivalent to:

pmax(dat$a, dat$b)
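
The equivalence can be checked directly, and na.rm can be passed along by appending it to the list of columns. A sketch, with a made-up NA added to the example data:

```r
dat <- data.frame(a = 1:5, b = rep(3, 5))

# do.call spreads the columns out as separate arguments to pmax
all.equal(do.call(pmax, dat), pmax(dat$a, dat$b))
## [1] TRUE

# Hypothetical variation: one missing value in column a
dat$a[2] <- NA
do.call(pmax, dat)                      # NA propagates
## [1]  3 NA  3  4  5
do.call(pmax, c(dat, na.rm = TRUE))     # NA ignored
## [1] 3 3 3 4 5
```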

