Difference between <NA> and NA

What is the difference between <NA> and NA?

When you are dealing with factors, an NA wrapped in angle brackets (<NA>) indicates that the value really is NA, i.e. missing.

When NA appears without brackets, it is not a missing value but a proper factor level whose label is the string "NA".

# Note a 'real' NA and a string with the word "NA"
x <- factor(c("hello", NA, "world", "NA"))

x
[1] hello <NA> world NA
Levels: hello NA world <~~ The string appears as a level, the actual NA does not.

as.numeric(x)
[1] 1 NA 3 2 <~~ The string has a numeric value (here, 2, alphabetically).
             The actual NA's numeric value is just NA.
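
Rather than relying on the printed form, you can ask R directly which entries are missing. The following is a minimal base R sketch reusing the factor x from above. The variable names (missing_flags, has_na_label, string_positions) are made up for illustration.

```r
# A real missing value and a level that merely spells "NA"
x <- factor(c("hello", NA, "world", "NA"))

# is.na() flags only the real missing value (position 2)
missing_flags <- is.na(x)

# The literal "NA" label is an ordinary level; the real NA is not a level
has_na_label <- "NA" %in% levels(x)
n_levels <- nlevels(x)

# Comparing against the string finds the label; which() drops the NA
# that results from comparing the real missing value
string_positions <- which(x == "NA")

print(missing_flags)
print(string_positions)
```

Here is.na() and a comparison with the string "NA" pick out different positions, which is the same distinction the angle brackets make when printing.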

Edit to answer @Arun's question:

R is simply trying to distinguish between a string whose value is the two letters "NA" and an actual missing value, NA.
That is why you see a difference when displaying df versus df$y. Example:

df <- data.frame(x=1:4, y=c("a", NA_character_, "c", "NA"), stringsAsFactors=FALSE)

Note the two different styles of NA:

> df
  x    y
1 1    a
2 2 <NA>
3 3    c
4 4   NA

However, if we look at just df$y, the string is quoted and the missing value is not:

> df$y
[1] "a"  NA   "c"  "NA"

But, if we remove the quotation marks (similar to what we see when printing a data.frame to the console):

print(df$y, quote=FALSE)
[1] a <NA> c NA

And thus, we once again have the distinction of NA via the angled brackets.
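
If the literal "NA" strings in such a column were meant to be missing values (an assumption made here only for illustration), they can be converted explicitly. This sketch uses base R only. The helper names is_literal_na, n_missing_before and n_missing_after are hypothetical.

```r
df <- data.frame(x = 1:4,
                 y = c("a", NA_character_, "c", "NA"),
                 stringsAsFactors = FALSE)

# Before: one real missing value, one literal string "NA"
n_missing_before <- sum(is.na(df$y))

# %in% never returns NA, so the result is safe to use as a logical index
is_literal_na <- df$y %in% "NA"

# Assumption for illustration: literal "NA" strings should count as missing
df$y[is_literal_na] <- NA_character_

n_missing_after <- sum(is.na(df$y))
print(df)
```

After the conversion both entries print as <NA> in the data frame, because both are now real missing values.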

What is the difference between na.omit and is.na?

In the call to equal.count, the object na.omit(algae$mn02) will contain only those values of algae$mn02 that are not NA.

Now, say that you have this code for the plot:

stripplot(season ~ a3|minO2,data=na.omit(algae))

If there are any columns of algae that contain NA in rows where algae$mn02 is not NA, the rows will not line up, and the plot will not be as expected.

Here's an example where this will happen:

algae <- data.frame(a3 = c(NA, 1, 2), mn02 = c(1, 2, NA))
algae
##   a3 mn02
## 1 NA    1
## 2  1    2
## 3  2   NA

Note the difference between the following two expressions:

na.omit(algae)
##   a3 mn02
## 2  1    2

algae[!is.na(algae$mn02), ]
##   a3 mn02
## 1 NA    1
## 2  1    2

The latter will line up with the shingle produced by equal.count(na.omit(algae$mn02)), but the former will not. The first expression has one fewer row because row 1 is an incomplete case (a3 is NA) even though its mn02 is not NA.

Note:

equal.count(na.omit(algae$mn02))
##
## Data:
## [1] 1 2

...

There are two elements here. This expression does not check for NA in columns other than mn02.
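
The mismatch can be checked by counting rows. This short base R sketch uses the same toy algae data frame as above. The names complete_rows, mn02_rows and mn02_values are made up for illustration, and equal.count itself (from the lattice package) is left out so that the example has no dependencies.

```r
algae <- data.frame(a3 = c(NA, 1, 2), mn02 = c(1, 2, NA))

# na.omit() on the whole data frame drops every row with an NA in ANY column
complete_rows <- na.omit(algae)

# Filtering on one column keeps rows whose mn02 is present, whatever a3 holds
mn02_rows <- algae[!is.na(algae$mn02), ]

# na.omit() on the single column drops only the NA in that column
mn02_values <- na.omit(algae$mn02)

print(nrow(complete_rows))
print(nrow(mn02_rows))
print(length(mn02_values))
```

Only the second and third results have the same length, so only those two can be paired row by row in a conditioned plot.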

Difference between NA_real_ and NaN

Well. First off, remember that NA is an R concept that has no equivalent in C. So, by necessity, NA needs to be represented differently in C. The fact that .Internal(inspect()) does not make this distinction doesn’t mean it isn’t made elsewhere. In fact, it so happens that .Internal(inspect()) uses Rprintf to print the value’s internal double floating point representation. And, indeed, R NAs are encoded as an NaN value in a C floating point type.

Secondly, you observe that “their only difference is the memory address.” — So what? At least conceptually, distinct memory addresses are fully sufficient to distinguish NA and NaN, nothing more is required.

But as a matter of fact R distinguishes these values by a different route. This is possible because the IEEE 754 double precision floating point format has multiple different representations of NaN, and R reserves a specific one for NAs:

static double R_ValueOfNA(void)
{
    /* The gcc shipping with Fedora 9 gets this wrong without
     * the volatile declaration. Thanks to Marc Schwartz. */
    volatile ieee_double x;
    x.word[hw] = 0x7ff00000;
    x.word[lw] = 1954;
    return x.value;
}

and:

/* is a value known to be a NaN also an R NA? */
int attribute_hidden R_NaN_is_R_NA(double x)
{
    ieee_double y;
    y.value = x;
    return (y.word[lw] == 1954);
}

int R_IsNA(double x)
{
    return isnan(x) && R_NaN_is_R_NA(x);
}

int R_IsNaN(double x)
{
    return isnan(x) && ! R_NaN_is_R_NA(x);
}

(src/main/arithmetic.c)
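
The same distinction is visible from R itself. The sketch below uses only base R functions (is.na, is.nan, writeBin). The byte inspection at the end assumes a platform on which the NaN payload of NA_real_ is stored unchanged, which is the usual case but is not guaranteed everywhere. The variable names are made up for illustration.

```r
na_value  <- NA_real_
nan_value <- NaN

# is.na() is TRUE for both values; is.nan() is TRUE only for NaN
both_na  <- c(is.na(na_value), is.na(nan_value))
only_nan <- c(is.nan(na_value), is.nan(nan_value))

# Serialize NA_real_ to its 8 bytes, most significant byte first
na_bytes <- writeBin(na_value, raw(), endian = "big")

# Rebuild the low word from the last two bytes (the two before them are zero)
low_word <- as.integer(na_bytes[7]) * 256L + as.integer(na_bytes[8])

print(both_na)
print(only_nan)
print(low_word)
```

On such a platform low_word comes out as 1954, the same constant the C code above writes into and tests in the low word.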

Difference between first non-NA and last non-NA in each row

A vectorized way is to use max.col, which returns the column of the "first" or "last" non-NA value in each row depending on the ties.method parameter.

# Indices of the "x" columns (must be defined before they are used below)
x_cols <- grep("^x", names(df))

# Column number of the first and last non-NA value in each row
first_col <- max.col(!is.na(df[x_cols]), ties.method = "first")
last_col <- max.col(!is.na(df[x_cols]), ties.method = "last")

# Subset the data frame to include only the "x" columns
new_df <- as.data.frame(df[x_cols])

# Subtract the first non-NA value from the last one
df$new_calc <- new_df[cbind(1:nrow(df), last_col)] -
  new_df[cbind(1:nrow(df), first_col)]

Using apply you could do

x_cols <- grep("^x", names(df))

df$new_calc <- apply(df[x_cols], 1, function(x) {
  new_x <- x[!is.na(x)]
  if (length(new_x) > 0)
    new_x[length(new_x)] - new_x[1L]
  else NA
})
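
Both approaches can be tried on a small data frame. The data below is hypothetical (an id column plus three "x" columns with gaps, including one row that is entirely NA) and is not taken from the original question. The names x_mat, by_max_col and by_apply are also made up for this sketch.

```r
# Hypothetical data: an id column plus three "x" columns with gaps
df <- data.frame(id = 1:4,
                 x1 = c(1, NA, 5, NA),
                 x2 = c(NA, 2, 7, NA),
                 x3 = c(4, 6, NA, NA))

x_cols <- grep("^x", names(df))
x_mat <- as.matrix(df[x_cols])

# Vectorized: locate the first and last non-NA column in each row
first_col <- max.col(!is.na(x_mat), ties.method = "first")
last_col  <- max.col(!is.na(x_mat), ties.method = "last")
rows <- seq_len(nrow(x_mat))
by_max_col <- x_mat[cbind(rows, last_col)] - x_mat[cbind(rows, first_col)]

# Row-wise: the same calculation with apply()
by_apply <- apply(x_mat, 1, function(x) {
  new_x <- x[!is.na(x)]
  if (length(new_x) > 0) new_x[length(new_x)] - new_x[1L] else NA
})

print(by_max_col)
print(unname(by_apply))
```

For a row with no non-NA values, max.col still returns column positions, but both picked cells are NA, so the difference is NA, matching the explicit else branch of the apply version.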

Difference between complete.cases and !is.na

For an atomic vector, complete.cases and !is.na give identical results. For more complex objects this is not the case.

E.g., for a data.frame, is.na.data.frame returns a logical matrix of the same dimensions as the input, whereas complete.cases returns one logical value per row.

# a is the vector from the question (judging by the output below: six values, NA in position 4)
test <- data.frame(a, b = 1)

is.na(test)
#          a     b
# [1,] FALSE FALSE
# [2,] FALSE FALSE
# [3,] FALSE FALSE
# [4,]  TRUE FALSE
# [5,] FALSE FALSE
# [6,] FALSE FALSE
complete.cases(test)
# [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
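
The relationship between the two results can be made explicit. In this sketch the values of a are hypothetical, chosen only so that the NA falls in position 4 as in the output above. The names same_for_vector, cell_flags, row_flags and same_after_collapse are made up for illustration.

```r
# Hypothetical values for `a`: six entries with NA in position 4
a <- c(1, 2, 3, NA, 5, 6)
test <- data.frame(a, b = 1)

# Atomic vector: the two approaches agree
same_for_vector <- identical(complete.cases(a), !is.na(a))

# Data frame: is.na() gives a cell-by-cell matrix,
# complete.cases() gives one flag per row
cell_flags <- is.na(test)
row_flags  <- complete.cases(test)

# Collapsing the matrix by row recovers complete.cases()
same_after_collapse <- identical(row_flags, unname(rowSums(cell_flags) == 0))

print(dim(cell_flags))
print(row_flags)
```

So complete.cases on a data frame is equivalent to asking, for each row, whether is.na found no missing cell in that row.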

