How to Set Na.Rm to True Globally

Is it possible to set na.rm to TRUE globally?

It is not possible to change na.rm to TRUE globally. (See Hong Ooi's comment under the question.)

EDIT:

Unfortunately, the answer you don't want is the only one that works
generally. There's no global option for this like there is for
na.action, which only affects modeling functions like lm, glm, etc
(and even there, it isn't guaranteed to work in all cases). – Hong
Ooi Jul 2 '13 at 6:23

How to set na.rm = TRUE within R glue syntax

This is a great question.

One of the difficult things tbl_summary() tries to do is guess how many digits to round statistics to. This example is tricky because you'll often want the mean and the CV to be shown at different levels of precision since they are not on the same scale. In the example below, tbl_summary() guessed that age should be shown rounded to the nearest integer. This is a reasonable assumption for the mean and SD, but we need to tell it to show more digits for CV. To do this, we use the tbl_summary(digits=) argument. There are three statistics being shown for age (mean, sd, and cv), and we'll pass a vector of length three, indicating how many digits to round each statistic to.

library(goeveg)
library(gtsummary)

trial %>%
select(age, marker) %>%
tbl_summary(
statistic = list(all_continuous() ~ " {mean} ± {sd} [{cv}]"),
digits = list(age ~ c(0, 0, 2))
)

Sample Image

Hope this answers your question! Happy Coding!

Saving na.rm=TRUE for each function in dplyr

You should use summarise_at, which lets you compute multiple functions for the supplied columns and set arguments that are shared among them:

df %>% group_by(group) %>% 
summarise_at("value",
funs(mean = mean, sd = sd, min = min),
na.rm = TRUE)

Is there a difference between na.rm = FALSE and na.ram = na.rm?

Many base functions (base as in base R or base to any particular package) accept the argument na.rm=, where the default is often FALSE. (Some functions use useNA= or na.action, depending on different actions, but we'll ignore those.)

Higher-level functions (user-defined and/or other packages) might also define this argument and then pass it on to the other functions. For example:

parent_func <- function(x, ..., na.rm = FALSE) {
# something important
mu <- mean(x, na.rm = na.rm)
sigma <- sd(x, na.rm = na.rm)
(mu - x) / sigma
}

One premise being that if you intend to remove/ignore NA values for one portion of the function, you might use it in other places (or all).

In this case, in the call to mean(x, na.rm = na.rm), the left na.rm is referring to the argument named na.rm in the definition of mean. The right na.rm is referring to the same-named argument of parent_func.

An alternative way to define this parent function (for the sake of differentiating variables) could be:

parent_func <- function(x, ..., NARM = FALSE) {
# something important
mu <- mean(x, na.rm = NARM)
sigma <- sd(x, na.rm = NARM)
(mu - x) / sigma
}

The advantage of using na.rm= instead of this NARM= is likely consistency (though that is not always one of R's strengths across all functions). Many users are likely more intuitively familiar with the na.rm= argument name, purpose, and effect than something else.

Edit:

I'm seeing that it is better practice to do function(x, na.rm = FALSE) {} in general to allow the user to change it and to be consistent with default settings for sum and mean. Is this correct?

I believe so. In general I find that removal of missing data should be an explicit act by the user, not a default by the function. That is, if having missing data indicates a larger problem, then defaulting to na.rm=FALSE will quickly indicate to the user that something is wrong; na.rm=TRUE will mask this problem and suggest valid results when perhaps there should be no NAs at all. This holds true for the "smaller" functions (e.g., mean, sum) and so its logic should be carried outwards to the encapsulating functions.

Adding na.rm as a parameter to a custom function in r

Just add na.rm = TRUE to your calls

Normalize.Final.Score.fun <- function(x) {
5 * (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

Summarizing data with na.rm = TRUE

The issue is you are evaluating

min(NA, na.rm=TRUE)
# Inf

for row 3, which leads to it being

dput(temp$DATE[3])
# structure(Inf, class = "Date")

Add is.finite to your mutate

temp %>% 
mutate(DATE_lgl = is.finite(DATE) | is.na(DATE) # Identify dates that are missing/NA)

# A tibble: 3 x 3
# CHAR DATE DATE_lgl
# <chr> <date> <lgl>
# 1 A 2009-01-01 TRUE
# 2 B 2010-01-01 TRUE
# 3 C NA FALSE

Printing NA is likely a printing limitation of Date class

as.Date(Inf, origin="1970-01-01")
# NA
dput(as.Date(Inf, origin="1970-01-01"))
# structure(Inf, class = "Date")


Related Topics



Leave a reply



Submit