How to Extend '==' Behavior to Vectors That Include NAs

How to extend `==` behavior to vectors that include NAs?

Another option (though I am not sure it is better than mapply('%in%', a, b)):

(!is.na(a) & !is.na(b) & a==b) | (is.na(a) & is.na(b))

Following @AnthonyDamico's suggestion, creation of the "mutt" operator:

"%==%" <- function(a, b) (!is.na(a) & !is.na(b) & a==b) | (is.na(a) & is.na(b))

Edit: or, a slightly different and shorter version by @Frank (which is also more efficient):

"%==%" <- function(a, b) (is.na(a) & is.na(b)) | (!is.na(eq <- a==b) & eq)

With the different examples:

a <- c( 1 , 2 , 3 )
b <- c( 1 , 2 , 4 )
a %==% b
# [1]  TRUE  TRUE FALSE

a <- c( 1 , NA , 3 )
b <- c( 1 , NA , 4 )
a %==% b
# [1]  TRUE  TRUE FALSE

a <- c( 1 , NA , 3 )
b <- c( 1 , 2 , 4 )
a %==% b
#[1]  TRUE FALSE FALSE

a <- c( 1 , NA , 3 )
b <- c( 3 , NA , 1 )
a %==% b
#[1] FALSE  TRUE FALSE
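
For contrast, here is a small sketch of what plain `==` returns on the same kind of input, next to the `%==%` operator defined above. The operator is redefined here only so that the snippet runs on its own; the names plain and na_aware are illustrative.

```r
a <- c(1, NA, 3)
b <- c(1, NA, 4)

# Plain `==` propagates NA: comparing anything with NA yields NA.
plain <- a == b
plain
# [1]  TRUE    NA FALSE

# Same definition as the shorter version above.
"%==%" <- function(a, b) (is.na(a) & is.na(b)) | (!is.na(eq <- a == b) & eq)

# Two NAs in the same position now count as equal.
na_aware <- a %==% b
na_aware
# [1]  TRUE  TRUE FALSE
```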

Aggregate across multiple vectors, retain entries that only have NAs for particular vectors

aggregate(.~site + horizon,data=data,FUN=mean, na.action=na.pass)
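
To illustrate what na.action = na.pass changes, here is a minimal sketch on made-up data. The columns sand and clay and their values are hypothetical, and passing na.rm = TRUE on to mean() is an assumption about how the NAs should be treated once they are kept.

```r
# Hypothetical data: one 'clay' value is missing.
data <- data.frame(
  site    = c("A", "A", "B", "B"),
  horizon = c("H1", "H1", "H1", "H1"),
  sand    = c(10, 20, 30, 40),
  clay    = c(1, NA, 3, 5)
)

# Default na.action (na.omit): row 2 is dropped entirely, so the
# 'sand' mean for site A is computed from row 1 only (10).
dropped <- aggregate(. ~ site + horizon, data = data, FUN = mean)

# na.pass keeps row 2; na.rm = TRUE is forwarded to mean(), so each
# column ignores only its own NAs ('sand' mean for site A is 15).
kept <- aggregate(. ~ site + horizon, data = data, FUN = mean,
                  na.rm = TRUE, na.action = na.pass)
```

Without na.rm = TRUE, the kept rows would make mean() return NA for any group that contains a missing value.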

Replacing NAs in R with nearest value

Here is a very fast one. It uses findInterval to find, for each NA in your original data, the two non-NA positions (the nearest one on each side) that should be considered:

f1 <- function(dat) {
  N <- length(dat)
  na.pos <- which(is.na(dat))
  if (length(na.pos) %in% c(0, N)) {
    return(dat)
  }
  non.na.pos <- which(!is.na(dat))
  intervals  <- findInterval(na.pos, non.na.pos,
                             all.inside = TRUE)
  left.pos   <- non.na.pos[pmax(1, intervals)]
  right.pos  <- non.na.pos[pmin(length(non.na.pos), intervals+1)]  # cap at the last non-NA index
  left.dist  <- na.pos - left.pos
  right.dist <- right.pos - na.pos

  dat[na.pos] <- ifelse(left.dist <= right.dist,
                        dat[left.pos], dat[right.pos])
  return(dat)
}

And here I test it:

# sample data, suggested by @JeffAllen
dat <- as.integer(runif(50000, min=0, max=10))
dat[dat==0] <- NA

# computation times
system.time(r0 <- f0(dat))    # your function
# user  system elapsed 
# 5.52    0.00    5.52
system.time(r1 <- f1(dat))    # this function
# user  system elapsed 
# 0.01    0.00    0.03
identical(r0, r1)
# [1] TRUE
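
To see how the findInterval step works, the following sketch traces the same steps on a short, made-up vector. It inlines the core of f1 rather than calling it; the name filled is only for illustration.

```r
dat <- c(NA, 5, NA, NA, 8, NA, NA, NA, 2, NA)

na.pos     <- which(is.na(dat))    # 1 3 4 6 7 8 10
non.na.pos <- which(!is.na(dat))   # 2 5 9

# For each NA, the index (into non.na.pos) of the non-NA position to its
# left; all.inside = TRUE clamps the result to 1..(length(non.na.pos) - 1).
intervals <- findInterval(na.pos, non.na.pos, all.inside = TRUE)
intervals
# [1] 1 1 1 2 2 2 2

left.pos  <- non.na.pos[intervals]       # 2 2 2 5 5 5 5
right.pos <- non.na.pos[intervals + 1]   # 5 5 5 9 9 9 9

# Pick the closer neighbour; ties go to the left.
filled <- dat
filled[na.pos] <- ifelse(na.pos - left.pos <= right.pos - na.pos,
                         dat[left.pos], dat[right.pos])
filled
#  [1] 5 5 5 8 8 8 8 2 2 2
```

Because of the clamping, an NA before the first or after the last non-NA value gets a negative distance on one side, which is what makes the comparison pick the only real neighbour.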

Detect change from previous rows with missing values - speed up for loop - R

Here's an alternative approach, which removes any rows with NAs, performs some calculations and joins back the NA rows in the right place.

library(tidyverse)
library(zoo)

# example data
test <- data.frame(resp = c(9, NA, NA, 11, NA, NA, 6, 16, NA, 12, 0, 0, 0, 0, 0, NA, 0, 11, NA, NA, NA, NA, NA, NA, 14))

# add an id for each row
test = test %>% mutate(id = row_number())

test %>%
  na.omit() %>%                                                               # exclude rows with NAs
  mutate(flag = case_when(resp == lag(resp, default = first(resp)) ~ 0,
                          resp > lag(resp, default = first(resp)) ~ 1,
                          resp < lag(resp, default = first(resp)) ~ -1)) %>%  # check relationship between current and previous value
  mutate(g = cumsum(flag != lag(flag, default = first(flag)))) %>%            # create a grouping based on change in flag column
  group_by(g) %>%                                                             # for each group
  mutate(change = ifelse(flag != 0, flag * row_number(), flag)) %>%           # calculate the change column
  ungroup() %>%                                                               # forget the grouping
  select(id, change) %>%                                                      # keep useful columns
  right_join(test, by="id") %>%                                               # join back to get NA rows in the right place
  select(resp, change)                                                        # keep useful columns

As a result you'll get:

#    resp change
# 1     9      0
# 2    NA     NA
# 3    NA     NA
# 4    11      1
# 5    NA     NA
# 6    NA     NA
# 7     6     -1
# 8    16      1
# 9    NA     NA
# 10   12     -1
# 11    0     -2
# 12    0      0
# 13    0      0
# 14    0      0
# 15    0      0
# 16   NA     NA
# 17    0      0
# 18   11      1
# 19   NA     NA
# 20   NA     NA
# 21   NA     NA
# 22   NA     NA
# 23   NA     NA
# 24   NA     NA
# 25   14      2
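
For comparison, the same logic can be sketched in base R with diff(), rle() and sequence(). This is an alternative formulation, not the pipeline above; the names ok, flag, run.pos and change are chosen for illustration.

```r
resp <- c(9, NA, NA, 11, NA, NA, 6, 16, NA, 12, 0, 0, 0, 0, 0, NA, 0, 11,
          NA, NA, NA, NA, NA, NA, 14)

ok <- !is.na(resp)

# Direction of change versus the previous non-NA value: -1, 0 or 1.
# The first non-NA value has nothing to compare with, so it gets 0.
flag <- c(0, sign(diff(resp[ok])))

# Position within each run of identical flags (1, 2, 3, ...).
run.pos <- sequence(rle(flag)$lengths)

# Put the results back in their original rows; NA rows stay NA.
change <- rep(NA_real_, length(resp))
change[ok] <- flag * run.pos
```

Multiplying by flag leaves runs of zeros at 0 and turns consecutive rises or falls into 1, 2, ... or -1, -2, ..., as in the change column above.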

Convert Vector to Matrix without Recycling

You can't turn recycling off, but you can do some manipulations to the vector before you form the matrix. We can extend the length of the vector based on what the dimensions of the matrix will be. The length<- replacement function will pad the vector with NA up to the desired length.

x <- 1:11
## matrix(x, ncol = 2) warns because 11 is not a multiple of 2;
## wrap it in suppressWarnings() to silence the warning
length(x) <- prod(dim(matrix(x, ncol = 2)))
matrix(x, ncol = 2, byrow = TRUE)
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4
# [3,]    5    6
# [4,]    7    8
# [5,]    9   10
# [6,]   11   NA
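
If you would rather avoid the warning altogether, one possible variation is to compute the padded length arithmetically instead of building a throwaway matrix. This is a sketch; n.col is a name introduced here for the desired number of columns.

```r
x <- 1:11
n.col <- 2

# Round the length up to the next multiple of n.col; the new slots are NA.
length(x) <- n.col * ceiling(length(x) / n.col)

# 12 elements now fill a 6 x 2 matrix exactly, so nothing is recycled.
m <- matrix(x, ncol = n.col, byrow = TRUE)
```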

Why would pmax(dataFrame, int) introduce NAs?

pmax is not designed to be used with data.frame input.

The error is introduced in line 35 of pmax:

mmm[change] <- each[change]

because each is defined to be as long as the length of the input, which for a data.frame is the number of columns. Therefore when it tries to address the 5th element, it gets NA.

each
[1] 6 6 6 6
each[change]
[1]  6  6  6  6 NA

The obvious workaround is to convert to data.frame after using pmax:

data.frame(pmax(matrix(1:16, nrow=4), c(6)))
  X1 X2 X3 X4
1  6  6  9 13
2  6  6 10 14
3  6  7 11 15
4  6  8 12 16

Or convert back and forth as required.
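
When the data already lives in a data.frame, the round trip might look like the sketch below. The second option, applying pmax column by column, is an alternative that avoids the conversion; df, df2 and res1 are illustrative names.

```r
df <- data.frame(matrix(1:16, nrow = 4))

# Option 1: convert to a matrix, apply pmax, convert back.
res1 <- as.data.frame(pmax(as.matrix(df), 6))

# Option 2: apply pmax to each column; assigning into df2[] keeps
# the data.frame structure and column names.
df2 <- df
df2[] <- lapply(df2, pmax, 6)
```

Option 2 also works when the columns have different types, as long as pmax makes sense for each of them.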

`x^(1/3)` behaves differently for negative scalar `x` and vector `x` with negative values

I'm not looking for a workaround e.g. function(x) {sign(x) * (abs(x)) ^ (1/3)}.

I'm interested in an answer that explains what is happening differently to the vector than to the negative value when provided as a numeric scalar.

How does the ^ operator think differently about vectors and scalars?

You seem to believe that c(-0.2, 1)^(1/3) translates to c(-0.2^(1/3), 1^(1/3)). This is incorrect. Operator ^ is actually a function, that is, (a) ^ (b) is the same as "^"(a, b). Therefore, the correct interpretation goes as follows:

   c(-0.2, 1)^(1/3)
=> "^"(c(-0.2, 1), 1/3)
=> c( "^"(-0.2, 1/3), "^"(1, 1/3) )
=> c( (-0.2)^(1/3), (1)^(1/3) )
=> c( NaN, 1 )

Now, why doesn't -0.2^(1/3) give NaN? Because ^ has higher operator precedence than unary minus (and than +, -, * and /). So as written, it is parsed as -(0.2^(1/3)) instead of (-0.2)^(1/3).

The lesson is that, to avoid buggy code, write your code as (a) ^ (b) instead of just a ^ b.
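
The difference is easy to check at the console. A short sketch (the variable x is introduced only for the demonstration):

```r
-0.2^(1/3)        # parsed as -(0.2^(1/3)): a negative real number
(-0.2)^(1/3)      # NaN: fractional power of a negative base

x <- -0.2
x^(1/3)           # NaN: x already holds the negative value

c(-0.2, 1)^(1/3)  # NaN for the first element, 1 for the second
```

So the scalar and the vector are treated the same way; what differs is whether the minus sign is part of the value or a separate operator applied after ^.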

Additional Remark:

I often compare ^ and : when teaching R to my students, because they behave differently with respect to a leading minus sign. But both show the importance of protecting operands with brackets.

(-1):2
#[1] -1  0  1  2

-1:2
#[1] -1  0  1  2

-(1:2)
#[1] -1 -2

2*3:10
#[1]  6  8 10 12 14 16 18 20

(2*3):10
#[1]  6  7  8  9 10

2*(3:10)
#[1]  6  8 10 12 14 16 18 20

See ?Syntax for details of operator precedence.

R Convert NA's only after the first non-zero value

Easy to do using match() and numeric indices:

use match() to find the first occurrence of a non-NA value
use which() to convert the logical vector from is.na() to a numeric index
use that information to find the correct positions in x

Hence:

x <- c(NA,NA,NA,1,2,3,NA,NA,4,5,NA)
isna <- is.na(x)
nonna <- match(FALSE,isna)
id <- which(isna)
x[id[id>nonna]] <- 0

gives:

> x
 [1] NA NA NA  1  2  3  0  0  4  5  0
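
A possible alternative, sketched here with cumsum() instead of match(), flags every position at or after the first non-NA value and replaces only the NAs among them. The name seen is illustrative.

```r
x <- c(NA, NA, NA, 1, 2, 3, NA, NA, 4, 5, NA)

# TRUE from the first non-NA value onwards, FALSE before it.
seen <- cumsum(!is.na(x)) > 0

# Replace only the NAs that come after a non-NA value has been seen.
x[is.na(x) & seen] <- 0
x
#  [1] NA NA NA  1  2  3  0  0  4  5  0
```

If x contains only NAs, seen is FALSE everywhere and x is left unchanged.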

Force a std::vector to free its memory?

Use the swap trick:

#include <vector>

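// Swap t with an empty, default-constructed container; the old storage
// moves into tmp and is released when tmp goes out of scope.
template <typename T>
void FreeAll( T & t ) {
    T tmp;
    t.swap( tmp );
}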

int main() {
    std::vector <int> v;
    v.push_back( 1 );
    FreeAll( v );
}
