Replacing NAs in R with Nearest Value

Replacing NAs in R with nearest value

Here is a very fast one. It uses findInterval to find, for each NA in your original data, the two non-NA positions (one on each side) that should be considered:

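f1 <- function(dat) {
  N <- length(dat)
  na.pos <- which(is.na(dat))
  # nothing to do if there are no NAs, or nothing but NAs
  if (length(na.pos) %in% c(0, N)) {
    return(dat)
  }
  non.na.pos <- which(!is.na(dat))
  # for each NA, index of the interval of non-NA positions it falls into
  intervals  <- findInterval(na.pos, non.na.pos,
                             all.inside = TRUE)
  # the two candidate positions, clamped to valid indices of non.na.pos
  left.pos   <- non.na.pos[pmax(1, intervals)]
  right.pos  <- non.na.pos[pmin(length(non.na.pos), intervals+1)]
  left.dist  <- na.pos - left.pos
  right.dist <- right.pos - na.pos

  # take the closer candidate; ties go to the left
  dat[na.pos] <- ifelse(left.dist <= right.dist,
                        dat[left.pos], dat[right.pos])
  return(dat)
}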

And here I test it:

# sample data, suggested by @JeffAllen
dat <- as.integer(runif(50000, min=0, max=10))
dat[dat==0] <- NA

# computation times
system.time(r0 <- f0(dat))    # your function (f0, from the question; not shown here)
# user  system elapsed 
# 5.52    0.00    5.52
system.time(r1 <- f1(dat))    # this function
# user  system elapsed 
# 0.01    0.00    0.03
identical(r0, r1)
# [1] TRUE
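
To make the findInterval step more concrete, here is a small sketch on a made-up nine-element vector (the vector x and its values are purely illustrative and not part of the benchmark above). It reproduces the intermediate quantities that f1 computes:

```r
# Toy vector (hypothetical data): non-NA values sit at positions 2, 6 and 8
x <- c(NA, 5, NA, NA, NA, 9, NA, 7, NA)

na.pos     <- which(is.na(x))    # 1 3 4 5 7 9
non.na.pos <- which(!is.na(x))   # 2 6 8

# Interval of non-NA positions each NA falls into; all.inside = TRUE clamps
# the index to 1..(length(non.na.pos) - 1), so intervals + 1 is always valid,
# even for NAs before the first or after the last non-NA value
intervals <- findInterval(na.pos, non.na.pos, all.inside = TRUE)
intervals
# [1] 1 1 1 1 2 2

left.pos  <- non.na.pos[intervals]       # 2 2 2 2 6 6
right.pos <- non.na.pos[intervals + 1]   # 6 6 6 6 8 8

# Pick the closer candidate; ties go to the left
filled <- x
filled[na.pos] <- ifelse(na.pos - left.pos <= right.pos - na.pos,
                         x[left.pos], x[right.pos])
filled
# [1] 5 5 5 5 9 9 9 7 7
```

For the leading NA at position 1 the "left" distance is negative, so the comparison still selects the first non-NA value; the trailing NA works the same way in the other direction.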

Replace NA with the nearest value based on another variable, while keeping NA for observations that have no non-missing neighbour

One option is to use case_when from the tidyverse. For each row where x is NA: if the previous row is closer in year and its x is not NA, take x from the previous row; if the next row is closer in year and its x is not NA, take x from the next row. If the previous row's x is NA, fall back to the next row, and if the next row's x is NA, fall back to the previous row. When both neighbours are NA, the result stays NA. Rows where x is not NA are returned unchanged.

library(tidyverse)

dat %>%
  mutate(x = case_when(is.na(x) & !is.na(lag(x)) & year - lag(year) < lead(year) - year ~ lag(x),
                       is.na(x) & !is.na(lead(x)) & year - lag(year) > lead(year) - year ~ lead(x),
                       is.na(x) & is.na(lag(x)) ~ lead(x),
                       is.na(x) & is.na(lead(x)) ~ lag(x),
                       TRUE ~ x))
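
As a quick check of how these rules behave, the sketch below applies the same case_when call to a small made-up data frame (the data frame toy, its years and its values are hypothetical, and only dplyr is loaded rather than the whole tidyverse). Row 3 has NA on both sides and therefore stays NA:

```r
library(dplyr)

# Hypothetical data: x is missing in rows 2 to 4
toy <- data.frame(year = c(2000, 2001, 2003, 2005, 2006),
                  x    = c(1, NA, NA, NA, 4))

res <- toy %>%
  mutate(x = case_when(
    is.na(x) & !is.na(lag(x))  & year - lag(year) < lead(year) - year ~ lag(x),
    is.na(x) & !is.na(lead(x)) & year - lag(year) > lead(year) - year ~ lead(x),
    is.na(x) & is.na(lag(x))  ~ lead(x),
    is.na(x) & is.na(lead(x)) ~ lag(x),
    TRUE ~ x))

res$x
# [1]  1  1 NA  4  4
```

One thing to be aware of: when both neighbours are non-NA and equally distant in year, none of the first four conditions matches, so x stays NA for that row.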

Replacing NA values by nearest value and factor

An option using a rolling join with roll = "nearest" from data.table:

cols <- paste0("V", 1L:6L)
for (x in cols) {
    DT[is.na(get(x)), (x) := 
        DT[!is.na(get(x))][.SD, on=.(factorID, RDate), roll="nearest", get(paste0("x.",x))]]
}

output:

   factorID       Date RDate  V1  V2  V3   V4  V5  V6
1:        1 1989-02-06  6976 318 351 172  570 260 108
2:        1 1989-05-13  7072  77 305 591  835 801 550
3:        1 1989-05-29  7088  77 305 591  835 801 550
4:        1 1989-06-14  7104 252 305 286  835 271  85
5:        2 1989-02-06  6976 236 389 323 2078 908 373
6:        2 1989-05-13  7072  77  62 591 2001 801 550
7:        2 1989-05-29  7088  55  62 410 2001 801 550
8:        2 1989-06-14  7104 351 508 456 1618 780 421

data:

library(data.table)
DT <- fread("factorID   Date         RDate   V1   V2   V3   V4   V5   V6
1          1989-02-06   6976    318  351  172  570  260  108
1          1989-05-13   7072    77   NA   591  NA   801  550
1          1989-05-29   7088    NA   NA   NA   NA   NA   NA
1          1989-06-14   7104    252  305  286  835  271  85
2          1989-02-06   6976    236  389  323  2078 908  373
2          1989-05-13   7072    77   NA   591  NA   801  550
2          1989-05-29   7088    55   62   410  2001 NA   NA
2          1989-06-14   7104    351  508  456  1618 780  421")

Note that for factorID=1 and V2, 1989-06-14 is the nearest date with a non-NA value to both 1989-05-13 and 1989-05-29, and hence 305 is used to fill these NA rows.
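
To isolate what roll = "nearest" does, here is a minimal sketch of a single rolling join, using the V2 values of factorID 1 from the data above. The table names known and query are hypothetical and only serve this illustration:

```r
library(data.table)

# Rows of factorID 1 where V2 is known
known <- data.table(RDate = c(6976L, 7104L), V2 = c(351L, 305L))
# Rows of factorID 1 where V2 is missing
query <- data.table(RDate = c(7072L, 7088L))

# For each query row, pick the known row whose RDate is closest
res <- known[query, on = .(RDate), roll = "nearest"]

res$RDate
# [1] 7072 7088
res$V2
# [1] 305 305
```

Both query dates are closer to 7104 than to 6976, so both pick up 305. The loop above does the same thing per column, with factorID as an additional exact-match join key.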

How to replace NAs with the average of the nearest two values

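We can use na.approx from zoo, which fills interior NAs by linear interpolation (for a single NA between two values this is their average). Any NAs left at the ends are then filled with na.locf0, first forward and then backward:

library(zoo)
data[] <- lapply(data, function(x) {
  x <- na.approx(x, na.rm = FALSE)          # interpolate interior NAs
  na.locf0(na.locf0(x), fromLast = TRUE)    # fill trailing, then leading NAs
})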

Or with tidyverse

library(dplyr)
library(tidyr)
library(zoo)   # for na.approx
data %>%
  mutate(across(everything(), ~ na.approx(.x, na.rm = FALSE))) %>%
  fill(everything(), .direction = 'updown')
#    A B C
# 1 10 2 4
# 2 25 2 4
# 3 21 2 6
# 4 17 2 4
# 5  8 2 3
# 6  3 2 3

data

data <- structure(list(A = c(10L, 25L, NA, 17L, 8L, 3L), 
   B = c(2L, NA, NA, NA, NA, 2L), 
   C = c(NA, 4L, 6L, 4L, 3L, NA)), class = "data.frame", row.names = c(NA, -6L))
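
A short sketch of what na.approx does to interior gaps may help here (the input vectors are made up for illustration). For an isolated NA the result is the mean of its two neighbours; for a longer run of NAs the values are interpolated linearly, so they are not simply the mean of the two nearest values:

```r
library(zoo)

# One NA between two values: the interpolated value is their mean
single_gap <- na.approx(c(10, NA, 20))
single_gap
# [1] 10 15 20

# Two consecutive NAs: values are spaced evenly between the neighbours
double_gap <- na.approx(c(1, NA, NA, 4))
double_gap
# [1] 1 2 3 4
```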

R: replacing NA with value of closest point

Yup.

First, make your data frame with data.frame or things all get coerced to characters:

data<-data.frame(LAT=LAT,LON=LON,COLOR=COLOR)

Split the data frame up - you could probably do this in one go but this makes things a bit more obvious:

query = data[is.na(data$COLOR),]
colours = data[!is.na(data$COLOR),]
library(FNN)
neighs = get.knnx(colours[,c("LAT","LON")],query[,c("LAT","LON")],k=1)

Now insert the replacement colours directly into the data dataframe:

data[is.na(data$COLOR),"COLOR"]=colours$COLOR[neighs$nn.index]
plot(data$LON,data$LAT,col=data$COLOR,pch=19)

Note however that distance is being computed as plain Euclidean (Pythagorean) distance on lat-long, which isn't accurate because the earth isn't flat. You might have to transform your coordinates to something else first.
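
One possible transformation, sketched here under the assumption of a spherical earth, is to convert latitude and longitude to 3-D Cartesian coordinates on a unit sphere. Straight-line distance between such points grows with great-circle distance, so the nearest neighbour is the same under both. The helper to_xyz and the sample points are hypothetical, and the search is done by brute force in base R to keep the example self-contained:

```r
# Convert degrees lat/lon to x, y, z on a unit sphere (hypothetical helper)
to_xyz <- function(lat, lon) {
  lat <- lat * pi / 180
  lon <- lon * pi / 180
  cbind(x = cos(lat) * cos(lon), y = cos(lat) * sin(lon), z = sin(lat))
}

# Made-up points on the equator, either side of the antimeridian
known <- data.frame(LAT = c(0, 0), LON = c(-179, 170))
query <- data.frame(LAT = 0, LON = 179)

k <- to_xyz(known$LAT, known$LON)
q <- to_xyz(query$LAT, query$LON)

# Straight-line (chord) distance from the query to every known point
chord <- sqrt(rowSums((k - q[rep(1, nrow(k)), , drop = FALSE])^2))
nearest_sphere <- which.min(chord)   # point 1: only 2 degrees away

# Naive distance on raw lat-long for comparison
flat <- sqrt((known$LAT - query$LAT)^2 + (known$LON - query$LON)^2)
nearest_flat <- which.min(flat)      # point 2: misled by the wrap at 180
```

The matrices returned by to_xyz could be passed to get.knnx in place of the LAT and LON columns.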

Replacing NAs with latest non-NA value

You probably want to use the na.locf() function from the zoo package to carry the last observation forward to replace your NA values.

Here is the beginning of its usage example from the help page:

library(zoo)

az <- zoo(1:6)

bz <- zoo(c(2,NA,1,4,5,2))

na.locf(bz)
1 2 3 4 5 6 
2 2 1 4 5 2 

na.locf(bz, fromLast = TRUE)
1 2 3 4 5 6 
2 1 1 4 5 2 

cz <- zoo(c(NA,9,3,2,3,2))

na.locf(cz)
2 3 4 5 6 
9 3 2 3 2
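
In that last call the leading NA is dropped, so the result is shorter than the input, which matters when assigning back into a data frame column. A small sketch on a plain vector (made-up values) shows how na.rm = FALSE keeps the length, and how a second pass with fromLast = TRUE fills a leading NA from the next observation:

```r
library(zoo)

v <- c(NA, 9, 3, NA, 2)

dropped <- na.locf(v)               # default na.rm = TRUE drops the leading NA
dropped
# [1] 9 3 3 2

fwd <- na.locf(v, na.rm = FALSE)    # same length as v, leading NA kept
fwd
# [1] NA  9  3  3  2

both <- na.locf(fwd, fromLast = TRUE)   # backward pass fills the leading NA
both
# [1] 9 9 3 3 2
```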

Replacing missing value with mean of 2 nearest rows in R

na.approx in the zoo package does that. If there can be leading or trailing NA values and you want to:

extend the nearest non-NA values, add the rule = 2 argument to na.approx; or
leave those as NA, add the na.rm = FALSE argument to na.approx.

See ?na.approx for further arguments. Other possibilities from the same package include na.spline (fill in with cubic spline fit), na.aggregate (mean of all non-NA values), na.locf (last value carried forward) and na.StructTS (seasonal Kalman filter).
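
The difference between these options is easiest to see on a small vector with NAs at both ends (made-up values, shown here on a plain numeric vector rather than a data.table column):

```r
library(zoo)

v <- c(NA, 1, NA, 3, NA)

trimmed  <- na.approx(v)                 # default: leading/trailing NAs dropped
trimmed
# [1] 1 2 3

extended <- na.approx(v, rule = 2)       # ends take the nearest non-NA value
extended
# [1] 1 1 2 3 3

kept     <- na.approx(v, na.rm = FALSE)  # ends left as NA
kept
# [1] NA  1  2  3 NA
```

Only the last two keep the original length, so one of them is needed when the result is assigned back to a column.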

library(zoo)

A[, list(Value = na.approx(Value))]

Replacing row values with the closest conditional values in R

Here's one way to do it.

rle gives you the run-length encoding of locNumb. In it, replace the negative values with NA, then use the na.locf function from the zoo package to carry the last non-negative value forward (and, for a leading run of NAs, the next one backward). Finally, inverse.rle rebuilds the full-length vector, which we add to the original data.frame df as newlocNumb.

Any additional condition can then be used to copy some of the original values from the locNumb column back into the newlocNumb column.

require(zoo)
pred_trip <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1)
locNumb <- c(-1, -1, -1, -1, -1, 2, 2, 2, 2, 3, 3, 0, 0, 0, 4, 4, 4, 4, -1, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 5, 5, 5, 5)
df <- data.frame(pred_trip, locNumb)

RLE <- rle(df$locNumb)

RLE
## Run Length Encoding
##   lengths: int [1:8] 5 4 2 3 4 5 6 4
##   values : num [1:8] -1 2 3 0 4 -1 0 5

RLE$values[RLE$values < 0] <- NA

while (any(is.na(RLE$values))) {
    RLE$values <- na.locf(na.locf(RLE$values, na.rm = FALSE), fromLast = TRUE, na.rm = FALSE)
}

df$newlocNumb <- inverse.rle(RLE)

df
##    pred_trip locNumb newlocNumb
## 1          0      -1          2
## 2          0      -1          2
## 3          0      -1          2
## 4          0      -1          2
## 5          0      -1          2
## 6          1       2          2
## 7          1       2          2
## 8          1       2          2
## 9          1       2          2
## 10         0       3          3
## 11         0       3          3
## 12         0       0          0
## 13         1       0          0
## 14         1       0          0
## 15         1       4          4
## 16         0       4          4
## 17         0       4          4
## 18         0       4          4
## 19         0      -1          4
## 20         0      -1          4
## 21         0      -1          4
## 22         0      -1          4
## 23         0      -1          4
## 24         0       0          0
## 25         0       0          0
## 26         0       0          0
## 27         1       0          0
## 28         1       0          0
## 29         1       0          0
## 30         1       5          5
## 31         1       5          5
## 32         1       5          5
## 33         1       5          5
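
The same steps on a six-element vector (hypothetical values) make the intermediate results easier to follow. This is only a sketch of the rle, na.locf and inverse.rle round trip, written as two explicit passes instead of the while loop:

```r
library(zoo)

v <- c(-1, -1, 2, 2, -1, 3)   # made-up input; negatives count as "missing"

r <- rle(v)
r$lengths
# [1] 2 2 1 1
r$values
# [1] -1  2 -1  3

r$values[r$values < 0] <- NA                     # NA  2 NA  3
r$values <- na.locf(r$values, na.rm = FALSE)     # NA  2  2  3 (carry forward)
r$values <- na.locf(r$values, fromLast = TRUE)   #  2  2  2  3 (carry backward)

out <- inverse.rle(r)
out
# [1] 2 2 2 2 2 3
```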
