There is pmin and pmax each taking na.rm, why no psum?
Following @JoshUlrich's comment on the previous question,
psum <- function(..., na.rm = FALSE) {
  rowSums(do.call(cbind, list(...)), na.rm = na.rm)
}
edit: from Sven Hohenstein:
psum2 <- function(..., na.rm = FALSE) {
  dat <- do.call(cbind, list(...))
  res <- rowSums(dat, na.rm = na.rm)
  # rows in which every value is NA should give NA rather than 0
  idx_na <- !rowSums(!is.na(dat))
  res[idx_na] <- NA
  res
}
x = c(1,3,NA,5,NA)
y = c(2,NA,4,1,NA)
z = c(1,2,3,4,NA)
psum(x,y,na.rm=TRUE)
## [1] 3 3 4 6 0
psum2(x,y,na.rm=TRUE)
## [1] 3 3 4 6 NA
n = 1e7
x = sample(c(1:10,NA),n,replace=TRUE)
y = sample(c(1:10,NA),n,replace=TRUE)
z = sample(c(1:10,NA),n,replace=TRUE)
library(rbenchmark)
benchmark(psum(x,y,z,na.rm=TRUE),
          psum2(x,y,z,na.rm=TRUE),
          pmin(x,y,z,na.rm=TRUE),
          pmax(x,y,z,na.rm=TRUE), replications=20)
##                           test replications elapsed relative
## 4  pmax(x, y, z, na.rm = TRUE)           20  26.114    1.019
## 3  pmin(x, y, z, na.rm = TRUE)           20  25.632    1.000
## 2 psum2(x, y, z, na.rm = TRUE)           20 164.476    6.417
## 1  psum(x, y, z, na.rm = TRUE)           20  63.719    2.486
Sven's version (which arguably is the correct one) is quite a bit slower,
although whether it matters obviously depends on the application.
Anyone want to hack up an inline/Rcpp version?
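Short of an Rcpp version, one possible base-R sketch gets the psum2 semantics without building a matrix by accumulating the vectors with Reduce. The name psum3 is hypothetical, the sketch assumes all inputs have the same length, and it has not been benchmarked here.

```r
# psum3 is a hypothetical name; assumes equal-length numeric vectors.
psum3 <- function(..., na.rm = FALSE) {
  args <- list(...)
  if (!na.rm) return(Reduce(`+`, args))
  # TRUE where every input is NA at that position
  all_na <- Reduce(`&`, lapply(args, is.na))
  # treat NA as 0 while summing
  total <- Reduce(`+`, lapply(args, function(v) replace(v, is.na(v), 0)))
  # positions with no data at all stay NA, as in psum2
  total[all_na] <- NA
  total
}

x <- c(1, 3, NA, 5, NA)
y <- c(2, NA, 4, 1, NA)
psum3(x, y, na.rm = TRUE)
## [1]  3  3  4  6 NA
```

Unlike the cbind-based versions, this does not recycle inputs of different lengths in the same way, so it is only a drop-in replacement for equal-length vectors.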
As for why this doesn't exist: don't know, but good luck getting R-core to make additions like this ... I can't offhand think of a sufficiently widespread *misc
package into which this could go ...
A follow-up thread by Matthew on r-devel (which seems to confirm this):
r-devel: There is pmin and pmax each taking na.rm, how about psum?
R: pmax() function to ignore NA's?
The issue is with sort, as it removes the NA by default, or else we have to specify na.last = TRUE, which may also not be the case we need. One option is order:
winsor1 <- function(x, probability){
  numWin <- ceiling(length(x)*probability)
  # Replace first lower, then upper
  x1 <- x[order(x)]
  x <- pmax(x, x1[numWin+1])
  x1 <- x1[order(x1)]
  x <- pmin(x, x1[length(x)-numWin], na.rm = TRUE)
  return(x)
}
-testing
x <- 0:10
winsor1(x, probability=0.01)
#[1] 1 1 2 3 4 5 6 7 8 9 9
x[5] <- NA
winsor1(x, probability=0.01)
#[1] 1 1 2 3 NA 5 6 7 8 9 10
or with na.last in sort:
winsor1 <- function(x, probability){
  numWin <- ceiling(length(x)*probability)
  # Replace first lower, then upper
  x <- pmax(x, sort(x, na.last = TRUE)[numWin+1])
  x <- pmin(x, sort(x, na.last = TRUE)[length(x)-numWin], na.rm = TRUE)
  return(x)
}
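Note that pmin and pmax with na.rm = TRUE return the non-missing bound wherever x itself is NA, and that length(x) counts the missing values too. If the goal is to leave NA positions untouched and compute the cut-offs from the observed values only, a variant along the following lines may be closer. This is a sketch under those assumptions, not the method from the question, and the name winsor_na is hypothetical.

```r
# winsor_na is a hypothetical name; cut-offs use non-missing values only.
winsor_na <- function(x, probability) {
  sorted <- sort(x)                    # sort() drops NA by default
  n_ok <- length(sorted)               # number of non-missing values
  num_win <- ceiling(n_ok * probability)
  lower <- sorted[num_win + 1]
  upper <- sorted[n_ok - num_win]
  # without na.rm, NA positions in x stay NA
  pmin(pmax(x, lower), upper)
}

x <- 0:10
x[5] <- NA
winsor_na(x, probability = 0.01)
## [1]  1  1  2  3 NA  5  6  7  8  9  9
```

The sketch does not guard against a probability so large that the two cut-offs cross or fall outside the sorted vector.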
Using pmax/pmin with vector of variable string names in R
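We may use invoke (similar to do.call in base R) with across, where values is the character vector of column names from the question. Note that invoke is deprecated as of purrr 1.0.0 in favour of exec.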
library(purrr)
library(dplyr)
out <- mtcars %>%
  mutate(maxval = invoke(pmax, c(across(all_of(values)), na.rm = TRUE)))
# or use do.call
# mutate(maxval = do.call(pmax, c(across(all_of(values)), na.rm = TRUE)))
-output
> head(out)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb maxval
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4  3.900
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4  3.900
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1  3.850
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1  3.215
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2  3.440
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1  3.460
Or we may use exec as well:
out2 <- mtcars %>%
  mutate(maxval = exec(pmax, !!! rlang::syms(values), na.rm = TRUE))
-output
> head(out2)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb maxval
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4  3.900
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4  3.900
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1  3.850
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1  3.215
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2  3.440
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1  3.460
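For comparison, the same idea can be sketched in base R without purrr or dplyr. The definition of values is not shown above; the printed maxval column equals the larger of drat and wt in each row, so the sketch assumes values <- c("drat", "wt"). The name out3 is likewise hypothetical.

```r
# ASSUMPTION: the question's `values` vector is not shown; these two
# columns are chosen because they reproduce the printed maxval column.
values <- c("drat", "wt")

out3 <- mtcars
# Build the argument list: one unnamed element per selected column,
# plus the named na.rm argument, then hand it to pmax via do.call.
args <- c(unname(as.list(mtcars[values])), na.rm = TRUE)
out3$maxval <- do.call(pmax, args)

head(out3$maxval)
## [1] 3.900 3.900 3.850 3.215 3.440 3.460
```

Dropping the column names with unname is a defensive choice so that a column name can never be mistaken for a named argument of pmax.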
Using na.rm=T in pmin with do.call
Concatenate na.rm = TRUE as a named list element and then call pmin with do.call, so that the na.rm parameter is matched by name:
do.call(pmin, c(mylist, list(na.rm = TRUE)))
# [,1] [,2]
#[1,] -1.0830716 -0.1237099
#[2,] -0.5949517 -3.7873790
#[3,] -2.1003236 -1.2565663
#[4,] -0.4500171 -1.0588205
#[5,] -1.0937602 -1.0537657
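The contents of mylist are not shown above, so here is a small self-contained sketch of the same call with made-up vectors (mylist_demo is a hypothetical name), together with the explicit call it expands to.

```r
# Hypothetical data standing in for the question's `mylist`
mylist_demo <- list(c(1, NA, 5), c(4, 2, NA), c(NA, 3, 6))

# The named element becomes the na.rm argument of pmin
res <- do.call(pmin, c(mylist_demo, list(na.rm = TRUE)))
res
## [1] 1 2 5

# The same call written out by hand
res_manual <- pmin(mylist_demo[[1]], mylist_demo[[2]], mylist_demo[[3]],
                   na.rm = TRUE)
```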
counting the occurrence of NA's in multiple vectors in r
Your question isn't well-phrased, but it looks like you want the result of colSums used with rbind and is.na:
> colSums(is.na(rbind(c1, c2, c3)))
[1] 2 0 1 0 0
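The vectors c1, c2 and c3 are not shown above. As a sketch with made-up inputs chosen to give the same counts, rbind stacks the vectors as rows, is.na turns the matrix into TRUE/FALSE, and colSums counts the TRUE values at each position.

```r
# Hypothetical inputs; the question's actual vectors are not shown
c1 <- c(NA, 1, NA, 4, 5)
c2 <- c(NA, 2, 3, 4, 5)
c3 <- c(1, 2, 3, 4, 5)

na_counts <- colSums(is.na(rbind(c1, c2, c3)))
na_counts
## [1] 2 0 1 0 0
```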
A + B with NA in r
You can always implement your own sum:
mysum <- function(...) {
  # add two vectors, treating NA as 0
  plus <- function(x, y) {
    ifelse(is.na(x), 0, x) + ifelse(is.na(y), 0, y)
  }
  Reduce(plus, list(...))
}
A <- c(NA,2,3,4,5)
B <- c(1,2,3,4,NA)
mysum(A, B)
#> [1] 1 4 6 8 5
mysum(A, A)
#> [1] 0 4 6 8 10
mysum(A, B, A, B)
#> [1] 2 8 12 16 10
Created on 2020-03-09 by the reprex package (v0.3.0)
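An alternative sketch that gives the same results for these inputs binds the vectors into a matrix and lets rowSums drop the NA values. The name mysum2 is hypothetical.

```r
# mysum2 is a hypothetical name; NA values are skipped by rowSums
mysum2 <- function(...) rowSums(cbind(...), na.rm = TRUE)

A <- c(NA, 2, 3, 4, 5)
B <- c(1, 2, 3, 4, NA)
mysum2(A, B)
## [1] 1 4 6 8 5
mysum2(A, B, A, B)
## [1]  2  8 12 16 10
```

As with mysum, a position where every input is NA comes out as 0 rather than NA.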
max(x, na.rm = TRUE) returns NA anyway
This does exactly what I expected:
max(e[e != 0 & val_hps < 0])
@giusti and @Frank: thanks!
Return pmin or pmax of data.frame with multiple columns
Use do.call to call pmax to compare all the columns together for each row value, e.g.:
dat <- data.frame(a=1:5,b=rep(3,5))
# a b
#1 1 3
#2 2 3
#3 3 3
#4 4 3
#5 5 3
do.call(pmax,dat)
#[1] 3 3 3 4 5
When you call pmax on an entire data.frame directly, it only has one argument passed to the function and nothing to compare it to. So, it just returns the supplied argument as it must be the maximum. It works for non-numeric and numeric arguments, even though it may not make much sense:
pmax(7)
#[1] 7
pmax("a")
#[1] "a"
pmax(data.frame(1,2,3))
# X1 X2 X3
#1 1 2 3
Using do.call(pmax,...) with a data.frame means you pass each column of the data.frame as a list of arguments to pmax:
do.call(pmax,dat)
is thus equivalent to:
pmax(dat$a, dat$b)
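If the columns contain missing values, the same pattern extends by appending na.rm = TRUE to the argument list. A small sketch with made-up data (dat_na is a hypothetical name):

```r
# Hypothetical data frame with missing values
dat_na <- data.frame(a = c(1, NA, 3), b = c(2, 5, NA))

# c() on a data.frame gives a list of its columns, so this passes
# a, b and na.rm = TRUE to pmax / pmin
row_max <- do.call(pmax, c(dat_na, na.rm = TRUE))
row_min <- do.call(pmin, c(dat_na, na.rm = TRUE))

row_max
## [1] 2 5 3
row_min
## [1] 1 5 3
```

This relies on no column being named na.rm, since column names are passed along as argument names.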