NaN is removed when using na.rm=TRUE
It's a language design decision: is.na treats NaN as missing too:
> is.na(NaN)
[1] TRUE
is.nan differentiates between the two:
> is.nan(NaN)
[1] TRUE
> is.nan(NA)
[1] FALSE
So you may need to call both.
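As a small illustration of the point above, here is a sketch on a made-up vector (the vector `x` and the name `only_na` are just for this example). It shows that na.rm = TRUE drops NaN along with NA, and how combining both tests isolates "true" NA values:

```r
# Made-up vector containing both kinds of missing value
x <- c(1, NA, NaN, 3)

is.na(x)    # FALSE  TRUE  TRUE FALSE  (NaN counts as missing)
is.nan(x)   # FALSE FALSE  TRUE FALSE  (only NaN)

# na.rm = TRUE removes both NA and NaN before averaging
mean(x, na.rm = TRUE)   # 2

# Entries that are NA but not NaN
only_na <- is.na(x) & !is.nan(x)
only_na     # FALSE  TRUE FALSE FALSE
```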
R: Why does mean(NA, na.rm = TRUE) return NaN
It is a pity that ?mean does not say anything about this. My comment only told you that applying mean to an empty "numeric" vector results in NaN, without further reasoning. Rui Barradas's comment tried to explain this but was not accurate, as division by 0 is not always NaN; it can be Inf or -Inf. I once discussed this in "R: element-wise matrix division". However, we are getting close. Although mean(x) is not coded as sum(x) / length(x), this mathematical fact does explain the NaN.
From ?sum:
*NB:* the sum of an empty set is zero, by definition.
So sum(numeric(0)) is 0. As length(numeric(0)) is 0, mean(numeric(0)) is 0 / 0, which is NaN.
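The reasoning above can be checked directly at the console. This is only a sketch of the arithmetic, not of how mean is implemented internally:

```r
x <- numeric(0)

sum(x)               # 0, the sum of an empty set
length(x)            # 0
sum(x) / length(x)   # NaN, since 0 / 0 is undefined
mean(x)              # NaN

# After na.rm = TRUE drops the NA, nothing is left to average
mean(NA, na.rm = TRUE)   # NaN

# Division by zero is not always NaN
1 / 0    # Inf
-1 / 0   # -Inf
```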
mean of c(NA, NA) differs between na.rm = TRUE and na.rm = FALSE why?
mean(c(NA,NA), na.rm = TRUE)
Here the NA values are removed, so no elements are left in the vector. R therefore effectively computes 0/0, which gives NaN.
mean(c(NA,NA), na.rm = FALSE)
Here the NA values are not removed, so mean is applied to the vector c(NA,NA). The result is (NA + NA)/2, which gives NA.
Using rollmean filtering out NA with threshold
1) Define a function which returns NaN if there are more than thresh NA's in its input and returns the mean of the non-NA's otherwise. Then use it with rollapply. Convert the result to a data frame if desired using as.data.frame, but since the data is entirely numeric, leaving it as a matrix may be sufficient.
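library(zoo)  # provides rollapply, rollsum and rollmean
w <- 5
thresh <- w/2
# NaN if more than thresh values in the window are NA, else mean of the rest
Mean <- function(x, thresh) if (sum(is.na(x)) > thresh) NaN else mean(x, na.rm = TRUE)
rollapply(df, w, Mean, thresh = thresh, fill = NA)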
2) Another possibility is to check whether there are more than thresh NA's in each window and, if so, return NaN, and otherwise return the rolling mean. Again use as.data.frame on the result if a data frame is needed. (1) has the advantage over this one that it only calls a roll* function once instead of twice.
w <- 5
thresh <- w/2
ifelse(rollsum(is.na(df), w, fill = NA) > thresh, NaN,
       rollmean(df, w, na.rm = TRUE, fill = NA))
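To make approach (1) concrete, here is a minimal sketch on a made-up numeric vector rather than the asker's df, which is not shown. It assumes the zoo package is installed; the vector `x` and the name `res` are hypothetical.

```r
library(zoo)

# Made-up input with a run of NA's in the middle
x <- c(1, 2, NA, 4, 5, NA, NA, NA, 9, 10)

w <- 5
thresh <- w / 2

# NaN if more than thresh values in the window are NA, else mean of the rest
Mean <- function(x, thresh) if (sum(is.na(x)) > thresh) NaN else mean(x, na.rm = TRUE)

# Centered windows of width w; positions without a full window are filled with NA
res <- rollapply(x, w, Mean, thresh = thresh, fill = NA)
res

# Windows with too many NA's are marked NaN, so they can be told apart
# from the NA padding at the edges
is.nan(res)
```

Using NaN for "too many missing values" and NA for the edge padding keeps the two cases distinguishable with is.nan.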
Error: missing values and NaN's not allowed if 'na.rm' is FALSE
Your problem has nothing to do with the use of lm; it arises inside splines::ns, when generating the B-spline basis for natural cubic splines. Very likely your Month is a character variable, and as.numeric cannot coerce month names to numbers (it returns NA for them).
I just checked your attached figure. The x-axis in the plots confirms what I guessed. You need to use 1:12 for Month, not "JAN", "FEB", etc.
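One possible way to do that conversion is sketched below. It assumes the labels are English three-letter abbreviations; the vector `Month` here is made up, since the original data is not shown, and `month_num` is a hypothetical name.

```r
# Made-up month labels standing in for the asker's Month column
Month <- c("JAN", "FEB", "MAR", "DEC")

# as.numeric cannot parse month names: every element becomes NA
suppressWarnings(as.numeric(Month))

# Match against R's built-in month.abb ("Jan", "Feb", ...) to get 1:12
month_num <- match(toupper(Month), toupper(month.abb))
month_num   # 1 2 3 12
```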
Remove NA values from a vector
Trying ?max, you'll see that it actually has a na.rm = argument, set by default to FALSE. (That's the common default for many other R functions, including sum(), mean(), etc.)
Setting na.rm=TRUE does just what you're asking for:
d <- c(1, 100, NA, 10)
max(d, na.rm=TRUE)
If you do want to remove all of the NAs, use this idiom instead:
d <- d[!is.na(d)]
A final note: Other functions (e.g. table(), lm(), and sort()) have NA-related arguments that use different names (and offer different options). So if NAs cause you problems in a function call, it's worth checking for a built-in solution among the function's arguments. I've found there's usually one already there.
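For the functions mentioned in the final note, the NA-related arguments look roughly like this. The data frame `dat` and its values are made up for the illustration.

```r
d <- c(1, 100, NA, 10)

# sort() drops NA's by default; na.last = TRUE keeps them at the end
sort(d)                    # 1 10 100
sort(d, na.last = TRUE)    # 1 10 100 NA

# table() ignores NA's unless asked to count them
table(d, useNA = "ifany")

# lm() decides what to do with incomplete rows via na.action
dat <- data.frame(x = c(1, 2, 3, 4, NA),
                  y = c(2.1, 3.9, 6.2, 7.8, 10))   # made-up values
fit <- lm(y ~ x, data = dat, na.action = na.omit)
nobs(fit)                  # 4, the incomplete row was dropped
```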
R - How to remove missing values and Nan in Dplyr Summarize function?
Try setting na.rm to TRUE:
library(dplyr)
trafficdata %>%
  group_by(Platform) %>%
  summarise(quantile = scales::percent(c(0.25, 0.5, 0.75)),
            calCTRLPV = quantile(calCTRLPV, c(0.25, 0.5, 0.75), na.rm = TRUE))
Assign 0 to Nan, NA and Inf
I included na.omit at the start, and it removed the NaN and NA results/Not Sized column before the grouping step. Note that na.omit only drops rows containing NA or NaN; Inf values are not affected by it.
dataframe <- na.omit(dataframe)
dataframe %>%
  group_by(Size) %>%
  summarise(Mean_Managers = mean(Total_Managers, na.rm = TRUE),
            Median_Managers = median(Total_Managers, na.rm = TRUE),
            Max_Employeess = max(Total_Employees, na.rm = TRUE))
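If the goal is literally to assign 0 to NaN, NA and Inf, as the question title says, one common idiom relies on is.finite, which is FALSE for all of them. This is a sketch on a made-up vector, not the asker's data:

```r
# Made-up values containing every kind of non-finite entry
x <- c(1, NA, NaN, Inf, -Inf, 5)

# is.finite is FALSE for NA, NaN, Inf and -Inf
is.finite(x)   # TRUE FALSE FALSE FALSE FALSE TRUE

# Replace every non-finite entry with 0
x[!is.finite(x)] <- 0
x              # 1 0 0 0 0 5
```

Whether replacing with 0 is appropriate depends on the analysis, since it changes means and medians computed afterwards.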