R Equality While Ignoring NAs

1 == NA returns a logical NA rather than TRUE or FALSE. If you want to treat NA as FALSE, you can add a second condition:

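set.seed(1)  # outputs below come from R < 3.6.0; R 3.6.0 changed sample()'s default algorithm, so newer versions give a different y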
x <- 1:10
x[4] <- NA
y <- sample(1:10, 10)

x <= y
# [1] TRUE TRUE TRUE NA FALSE TRUE TRUE FALSE TRUE FALSE

x <= y & !is.na(x)
# [1] TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE

You could also add a second processing step that converts all the NA values from your comparison to FALSE.

foo <- x <= y
foo[is.na(foo)] <- FALSE
foo
# [1] TRUE TRUE TRUE FALSE FALSE TRUE TRUE FALSE TRUE FALSE
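Another compact idiom, offered as a sketch rather than as part of the original answer: %in% never returns NA, so (x <= y) %in% TRUE is TRUE only where the comparison is TRUE and FALSE wherever it is FALSE or NA. The vectors below are hypothetical, chosen so the result is easy to check by hand.

```r
# Hypothetical vectors; the NA in x makes x <= y return NA in position 2
x <- c(1, NA, 5, 2)
y <- c(3, 4, 1, 2)

x <= y
# [1]  TRUE    NA FALSE  TRUE

# %in% matches against TRUE without propagating NA, so NA becomes FALSE
res <- (x <= y) %in% TRUE
res
# [1]  TRUE FALSE FALSE  TRUE
```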

Also, for what it's worth, NA == NA returns NA, as does NA != NA.

How to treat NAs like values when comparing elementwise in R

This takes advantage of R's three-valued logic:

TRUE & NA    # NA
FALSE & NA   # FALSE
FALSE | NA   # NA
TRUE | NA    # TRUE

With carefully placed parentheses, the following expression works:

(a != b | (is.na(a) & !is.na(b)) | (is.na(b) & !is.na(a))) & !(is.na(a) & is.na(b))

You could define:

`%!=na%` <- function(e1, e2) {
  (e1 != e2 | (is.na(e1) & !is.na(e2)) | (is.na(e2) & !is.na(e1))) &
    !(is.na(e1) & is.na(e2))
}

and then use:

a %!=na% b
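A quick check of the operator on hypothetical vectors covering the four cases (both present and equal, NA on either side, both NA). The %==na% operator is a hypothetical companion defined here as the negation; it is not part of base R.

```r
`%!=na%` <- function(e1, e2) {
  (e1 != e2 | (is.na(e1) & !is.na(e2)) | (is.na(e2) & !is.na(e1))) &
    !(is.na(e1) & is.na(e2))
}

# Hypothetical companion operator: NA-aware equality (two NAs count as equal)
`%==na%` <- function(e1, e2) !(e1 %!=na% e2)

a <- c(1, NA, 3, NA)
b <- c(1, 2, NA, NA)

a %!=na% b
# [1] FALSE  TRUE  TRUE FALSE
a %==na% b
# [1]  TRUE FALSE FALSE  TRUE
```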

How to properly ignore NA when comparing two dates?

I think you can remove is.na() from your ifelse() statement. ifelse() returns NA wherever the test is NA, so the missing information is kept:

library(dplyr)

df %>%
group_by(id) %>%
mutate(Result = ifelse(Date1 < Date2, "Yes", "No"))
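A small self-contained check of that behaviour, using base R instead of dplyr and hypothetical dates: ifelse() passes NA through wherever the test is NA, so a row with a missing date keeps NA in Result instead of being labelled "No".

```r
# Hypothetical data; Date1 is missing in row 3
df <- data.frame(
  id = 1:3,
  Date1 = as.Date(c("2020-01-01", "2020-03-01", NA)),
  Date2 = as.Date(c("2020-02-01", "2020-02-01", "2020-02-01"))
)

# The comparison is NA in row 3, and ifelse() keeps that NA
df$Result <- ifelse(df$Date1 < df$Date2, "Yes", "No")
df$Result
# [1] "Yes" "No"  NA
```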

R: does not equal (!=), NAs are not included

filter() drops rows where the condition evaluates to NA, so we can add another condition that keeps the NA values:

library(dplyr)

df %>%
filter(a != "B" | is.na(a))

# a
# 1 A
# 2 C
# 3 <NA>
# 4 C
# 5 A
# 6 <NA>
# 7 A

From ?NA

Logical computations treat NA as a missing TRUE/FALSE value...

There's more to the explanation, but you can consult the help file.
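The same idea in base R, as a sketch on hypothetical data: a plain df[df$a != "B", ] would turn each NA test into a row of NAs, whereas the extra | is.na(a) term turns NA into TRUE because NA | TRUE is TRUE.

```r
# Hypothetical single-column data frame with one missing value
df <- data.frame(a = c("A", "B", "C", NA, "B"), stringsAsFactors = FALSE)

df$a != "B"
# [1]  TRUE FALSE  TRUE    NA FALSE

# NA | TRUE is TRUE, so rows where a is NA are kept
keep <- df$a != "B" | is.na(df$a)
df[keep, , drop = FALSE]
```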

Comparing Column Values With NA

To simplify things, let's first redefine the data frame with stringsAsFactors=FALSE:

df <- read.table(header = TRUE, text = "A B
NA TEST
TEST TEST
Abaxasdas Test", stringsAsFactors=FALSE)

You can compare the columns in an NA-safe way using identical():

mapply(identical, df$A, df$B)

To get the output with "YES" and "NO" instead of TRUE and FALSE:

ifelse(mapply(identical, df$A, df$B), "YES", "NO")

Output

> df$Output <- ifelse(mapply(identical, df$A, df$B), "YES", "NO")
> df
A B Output
1 <NA> TEST NO
2 TEST TEST YES
3 Abaxasdas Test NO

An alternative

As joran suggested in a comment, replacing NAs with a value would make the comparison easier. If you don't want to change the values in the data frame (but maybe you should!), you could use a helper function like this:

rna <- function(x) replace(x, is.na(x), "")
ifelse(rna(df$A)==rna(df$B), "YES", "NO")
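One caveat with the rna() helper, shown in a sketch with a hypothetical fourth row: because it maps NA to "", an NA compared with a genuine empty string is reported as a match. An ifelse()-based comparison that treats only two NAs as equal avoids that; the name same is introduced here for illustration.

```r
# Hypothetical data: the question's three rows plus a row with "" vs NA
df <- data.frame(A = c(NA, "TEST", "Abaxasdas", ""),
                 B = c("TEST", "TEST", "Test", NA),
                 stringsAsFactors = FALSE)

rna <- function(x) replace(x, is.na(x), "")
ifelse(rna(df$A) == rna(df$B), "YES", "NO")
# [1] "NO"  "YES" "NO"  "YES"    (row 4: "" vs NA counts as a match)

# NA-aware alternative: if either side is NA, it matches only when both are NA
same <- ifelse(is.na(df$A) | is.na(df$B),
               is.na(df$A) & is.na(df$B),
               df$A == df$B)
ifelse(same, "YES", "NO")
# [1] "NO"  "YES" "NO"  "NO"
```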

Selecting rows where consecutive values change while ignoring NAs

I would recommend replacing NA with another conventional placeholder for missing values, such as -9999. After that you can use your method, which(x[-1] != x[-length(x)]) + 1, or try the rle() function from base R.

# Sample data
x = c(1, 2, NA, 3, 3, NA, 3, NA, NA, 4, 4)

# Replace missing values with -9999
x[is.na(x)] <- -9999

# Calculate position of non-equal consecutive values
# Start of each new run; the last value points past the end of x, so head() drops it
head(cumsum(rle(x)$lengths) + 1, -1)
# [1]  2  3  4  6  7  8 10
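If you would rather not pick a sentinel such as -9999 (which could collide with a real value), here is a sketch that keeps the NAs, counts a switch between NA and a value as a change, and counts two consecutive NAs as no change. For this sample it gives the same positions as the -9999 approach (2, 3, 4, 6, 7, 8, 10); prev, curr and changed are names introduced here.

```r
# Sample data from above, with the NAs left in place
x <- c(1, 2, NA, 3, 3, NA, 3, NA, NA, 4, 4)

prev <- x[-length(x)]
curr <- x[-1]

# A change: both present and different, or exactly one of the pair is NA.
# NA & FALSE is FALSE, so the first term never returns NA.
changed <- (curr != prev & !is.na(curr) & !is.na(prev)) |
  xor(is.na(curr), is.na(prev))

which(changed) + 1
# [1]  2  3  4  6  7  8 10
```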

R: Using rnorm() ignoring NAs

The columns contain NA values, and any arithmetic operation (such as +) involving NA returns NA. One option is to convert the NAs to 0 and then use the result to calculate the mean values. Also, na.rm = TRUE in the OP's code does nothing, because it is an argument of the function mean(), not of the mean parameter of rnorm().

tmp <- replace(airquality, is.na(airquality), 0)
rnorm(n = nrow(tmp), mean = (50 + tmp$Ozone*1.2 +
tmp$Solar.R*0.5 + tmp$Wind*3 + tmp$Temp*0.2), sd = 5)

Output (no seed is set, so your values will differ):

[1] 230.54274 175.63684 191.09352 272.75306 108.56648 147.00571 263.74854 176.54437 138.34712 193.08597  91.91993 244.38106 246.75853 254.24543
[15] 151.91976 278.17564 291.79558 166.25150 291.45414 124.37602 91.79503 285.40843 107.71572 180.18917 133.19906 230.15677 85.62659 135.79484
[29] 290.33483 328.01940 268.59444 233.47771 240.75695 231.80327 182.09191 204.27378 239.15409 197.93795 222.74681 338.83640 305.89129 226.42353
[43] 221.24226 190.21195 269.85521 260.70751 223.62116 319.93789 135.87652 171.82266 174.49938 159.01576 102.26742 128.54807 211.20653 147.96291
[57] 147.09487 118.30181 137.67034 128.21919 155.11839 380.05157 276.10527 247.92156 142.26157 249.46309 308.70804 299.31816 344.92304 341.46871
[71] 276.41410 162.72329 255.91151 224.55619 253.61126 139.57793 284.49149 277.79043 313.16401 273.82248 284.77515 109.29104 219.85529 249.98464
[85] 331.40262 334.25997 155.60046 209.86921 308.24376 283.52677 287.17854 289.04658 173.97213 135.25243 148.04647 180.20284 135.46867 159.87413
[99] 354.59991 317.06474 324.99885 207.30217 176.03234 243.66570 262.85251 251.59681 127.67214 161.26074 174.00305 177.45175 247.33476 246.72993
[113] 262.35549 136.78726 226.31141 259.75610 397.83067 293.34740 156.55135 289.65580 322.11876 301.68719 280.98863 297.88105 276.24682 251.60182
[127] 290.87347 188.69579 200.89883 246.30607 242.00302 232.00063 250.95042 279.32326 268.79634 235.51977 130.42808 170.68919 258.11289 241.38551
[141] 122.56174 239.76333 206.85573 230.98431 120.80214 214.62313 126.46720 135.36000 213.49774 178.97910 216.34848 175.17351 231.16300

Another option is to multiply each column by its weight and then sum with rowSums(), which accepts na.rm = TRUE:

# Same intercept (50) and weights as the first option
mn <- 50 + rowSums(transform(airquality, Ozone = Ozone * 1.2,
Solar.R = Solar.R * 0.5, Wind = Wind * 3,
Temp = Temp * 0.2)[c("Ozone", "Solar.R", "Wind", "Temp")], na.rm = TRUE)
rnorm(n = nrow(airquality), mean = mn, sd = 5)
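To check that zero-replacement and rowSums(na.rm = TRUE) give the same means (up to floating-point rounding), here is a sketch on the built-in airquality data using the intercept 50 and the weights from the first option. The named weight vector w is a helper introduced here, and the matrix product %*% and sweep() are swapped in for the explicit column arithmetic.

```r
# Weights from the first option; w is a helper name introduced here
w <- c(Ozone = 1.2, Solar.R = 0.5, Wind = 3, Temp = 0.2)

# Option 1: replace NA with 0, then add the weighted columns
tmp <- replace(airquality, is.na(airquality), 0)
mn_replace <- 50 + tmp$Ozone * 1.2 + tmp$Solar.R * 0.5 + tmp$Wind * 3 + tmp$Temp * 0.2

# Same idea as a matrix product on the four columns (NAs set to 0 first)
X <- as.matrix(airquality[names(w)])
X[is.na(X)] <- 0
mn_matrix <- 50 + drop(X %*% w)

# Option 2: weight each column with sweep(), then rowSums() skipping NAs
mn_rowsums <- 50 + rowSums(sweep(as.matrix(airquality[names(w)]), 2, w, `*`),
                           na.rm = TRUE)

all.equal(mn_replace, unname(mn_matrix))
all.equal(mn_replace, unname(mn_rowsums))

set.seed(42)
draws <- rnorm(n = nrow(airquality), mean = mn_rowsums, sd = 5)
```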

Combining columns, while ignoring duplicates and NAs

If 'df1' is the output, we can remove the 'NA' that follows a '-' with sub():

library(dplyr)

df1 %>%
mutate(Var3 = sub("-NA", "", Var3))
# A tibble: 8 x 4
# id Var1 Var2 Var3
# <chr> <chr> <chr> <chr>
#1 A A1 A1 A1
#2 B F2 A2 A2-F2
#3 C <NA> A3 A3
#4 D A4-E9 A4 A4-E9
#5 E E5 A5 A5-E5
#6 F <NA> <NA> NA
#7 G B2-R4 A3-B2 A3-B2-R4
#8 H B3-B4 E1-G5 B3-B4-E1-G5

We can also do this slightly differently with the tidyverse: gather() into 'long' format, split the 'value' column with separate_rows(), group by 'id', summarise 'Var3' by pasting together the sorted unique elements of 'value', and left_join() the result with the original dataset 'df':

library(tidyverse)
gather(df, key, value, -id) %>%
separate_rows(value) %>%
group_by(id) %>%
summarise(Var3 = paste(sort(unique(value)), collapse='-')) %>%
mutate(Var3 = replace(Var3, Var3=='', NA)) %>%
left_join(df, .)
# id Var1 Var2 Var3
#1 A A1 A1 A1
#2 B F2 A2 A2-F2
#3 C <NA> A3 A3
#4 D A4-E9 A4 A4-E9
#5 E E5 A5 A5-E5
#6 F <NA> <NA> <NA>
#7 G B2-R4 A3-B2 A3-B2-R4
#8 H B3-B4 E1-G5 B3-B4-E1-G5

NOTE: The %>% pipeline makes even simple code span multiple lines, but if required, we can put all of those statements on a single line and call it a one-liner.


Here is a one-liner

library(data.table)
setDT(df)[, Var3 := paste(sort(unique(unlist(strsplit(unlist(.SD),"-")))), collapse="-"), id]
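For comparison, a base-R sketch without dplyr or data.table. It assumes Var1 and Var2 are character columns whose entries are separated by '-', and rebuilds the example data from the printed output above; combine_row is a hypothetical helper name. Unlike the data.table one-liner, which yields "" for a row with nothing left, it returns NA for that row.

```r
# Example data rebuilt from the printed output above (assumed character columns)
df <- data.frame(
  id   = c("A", "B", "C", "D", "E", "F", "G", "H"),
  Var1 = c("A1", "F2", NA, "A4-E9", "E5", NA, "B2-R4", "B3-B4"),
  Var2 = c("A1", "A2", "A3", "A4", "A5", NA, "A3-B2", "E1-G5"),
  stringsAsFactors = FALSE
)

combine_row <- function(row) {
  # Drop NAs, split on "-", then keep each code once, sorted
  parts <- unlist(strsplit(row[!is.na(row)], "-", fixed = TRUE))
  if (length(parts) == 0) return(NA_character_)
  paste(sort(unique(parts)), collapse = "-")
}

# apply() works row by row on the character matrix of the two columns
df$Var3 <- unname(apply(df[c("Var1", "Var2")], 1, combine_row))
df
```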

Logically compare columns in a data frame that contain NA?

This seems to work. Please let me know what you think!

mapply(identical, df_example$metgoal.ref, df_example$metgoal.tri)

Then you can use it to filter your original data. I added ! because you're interested in rows where these fields are NOT identical.

With dplyr, it might look like this:

library(dplyr)
filter(df_example, !mapply(identical, df_example$metgoal.ref, df_example$metgoal.tri))

or in base R:

df_example[!mapply(identical, df_example$metgoal.ref, df_example$metgoal.tri),]
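One thing to keep in mind with identical(): it also compares storage type, so an integer 1L and a double 1 are not identical. A sketch with hypothetical columns of different types shows the difference; the vectorized expression (equal and both present, or both NA) compares values only. The names ref, tri, by_identical and by_value are introduced here.

```r
# Hypothetical columns: same kind of values, integer in one and double in the other
df_example <- data.frame(metgoal.ref = c(1L, 0L, NA, NA),
                         metgoal.tri = c(1, 1, 0, NA))

# identical() also checks storage type, so every pair here is "not identical"
by_identical <- mapply(identical, df_example$metgoal.ref, df_example$metgoal.tri)
by_identical
# [1] FALSE FALSE FALSE FALSE

# Value-based, NA-aware equality: equal and both present, or both NA
ref <- df_example$metgoal.ref
tri <- df_example$metgoal.tri
by_value <- (ref == tri & !is.na(ref) & !is.na(tri)) | (is.na(ref) & is.na(tri))
by_value
# [1]  TRUE FALSE FALSE  TRUE

# Rows where the values really differ
df_example[!by_value, ]
```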

