R: Interpolation of NAs by Group

Use data.frame rather than cbind to create your data: cbind returns a matrix, but dplyr needs a data frame. Then group by Individuals and call na.approx inside mutate. Passing na.rm=FALSE keeps leading and trailing NAs in place, so the result has the same length as each group.

df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
            Individuals=c(1,1,1,1,1,1,1,2,2,2),
            Value=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10))

library(dplyr)
library(zoo)

df %>%
  group_by(Individuals) %>%
  mutate(ValueInterp = na.approx(Value, na.rm=FALSE))

   time Individuals Value ValueInterp
1     1           1    NA          NA
2     2           1     2           2
3     3           1     3           3
4     4           1    NA           4
5     5           1     5           5
6     6           1    NA           6
7     7           1     7           7
8     1           2     8           8
9     2           2    NA           9
10    3           2    10          10
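The na.rm=FALSE argument matters because mutate needs a result of the same length as the group. A minimal sketch on a standalone vector (x is made up for illustration and mirrors the first group above) shows the difference:

```r
library(zoo)

# Hypothetical vector with a leading NA, like the first group above
x <- c(NA, 2, 3, NA, 5, NA, 7)

# Default na.rm = TRUE drops the leading NA, so the result is shorter than x
short <- na.approx(x)

# na.rm = FALSE keeps the leading NA in place, so the lengths match
full <- na.approx(x, na.rm = FALSE)

length(short)  # one element fewer than x
length(full)   # same length as x
```

Inside a grouped mutate, the shorter result is what triggers a size error, which is why na.rm=FALSE is used throughout.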

Update: To interpolate multiple columns, we can use mutate_at. Here's an example with two value columns. mutate_at runs na.approx on every column whose name matches "Value". Passing list(interp=na.approx) keeps the original columns and adds the interpolated results as new columns with an _interp suffix:

df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                 Individuals=c(1,1,1,1,1,1,1,2,2,2),
                 Value1=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10),
                 Value2=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10)*2)

df %>%
  group_by(Individuals) %>%
  mutate_at(vars(matches("Value")), list(interp=na.approx), na.rm=FALSE)

    time Individuals Value1 Value2 Value1_interp Value2_interp
   <dbl>       <dbl>  <dbl>  <dbl>         <dbl>         <dbl>
 1     1           1     NA     NA            NA            NA
 2     2           1      2      4             2             4
 3     3           1      3      6             3             6
 4     4           1     NA     NA             4             8
 5     5           1      5     10             5            10
 6     6           1     NA     NA             6            12
 7     7           1      7     14             7            14
 8     1           2      8     16             8            16
 9     2           2     NA     NA             9            18
10     3           2     10     20            10            20

If you don't want to preserve the original, uninterpolated columns, you can do:

df %>%
  group_by(Individuals) %>%
  mutate_at(vars(matches("Value")), na.approx, na.rm=FALSE)
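In dplyr 1.0.0 and later, mutate_at is superseded by across. A rough equivalent of the suffixing version, assuming a recent dplyr is installed, might look like this (the name res is only for illustration):

```r
library(dplyr)
library(zoo)

df <- data.frame(time = c(1,2,3,4,5,6,7,1,2,3),
                 Individuals = c(1,1,1,1,1,1,1,2,2,2),
                 Value1 = c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10),
                 Value2 = c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10) * 2)

res <- df %>%
  group_by(Individuals) %>%
  # .names controls the new column names; "{.col}_interp" mimics list(interp = ...)
  mutate(across(matches("Value"),
                ~ na.approx(.x, na.rm = FALSE),
                .names = "{.col}_interp")) %>%
  ungroup()
```

Dropping the .names argument overwrites the original columns, which corresponds to the last mutate_at call above.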

Replace NAs by Interpolation in groups

This one-liner interpolates the NAs within each group. For NAs at either end of a group, rule = 2 extends the nearest non-NA value to them, i.e. it does linear interpolation and constant extrapolation, which is not exactly what was asked for but may be close enough. Note that this also implies that if a group has only one non-NA value, all of its NAs are set to that value.

library(zoo)
transform(DF, newCol = ave(Value, Group, FUN = function(x) na.approx(x, rule = 2)))

giving:

   Group Value newCol
1    ALB    NA     10
2    ALB    10     10
3    ALB    NA     11
4    ALB    12     12
5    ARE    NA      2
6    ARE    NA      2
7    ARE     2      2
8    ARE    NA      2
9    ARE    NA      2
10   ARG     4      4
11   ARG    NA      5
12   ARG     6      6
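To isolate what rule = 2 contributes, here is a small sketch on a single made-up vector, comparing it with the default rule = 1:

```r
library(zoo)

# Hypothetical values with NAs at both ends and one in the middle
x <- c(NA, 10, NA, 12, NA)

# rule = 1 (the default): only the interior NA is filled, the ends stay NA
inner_only <- na.approx(x, na.rm = FALSE)

# rule = 2: the ends take the nearest non-NA value (constant extrapolation)
extended <- na.approx(x, rule = 2)
```

Here inner_only keeps NA in the first and last positions, while extended has no NAs left.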

Note

The input DF in reproducible form:

DF <- 
  structure(list(Group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 
  2L, 2L, 3L, 3L, 3L), .Label = c("ALB", "ARE", "ARG"), class = "factor"), 
  Value = c(NA, 10L, NA, 12L, NA, NA, 2L, NA, NA, 4L, NA, 6L
  )), class = "data.frame", row.names = c(NA, -12L))

Interpolating NA's by group using dplyr on multiple columns

You can rewrite your code using mutate_at so that all the columns are converted in one go:

library(dplyr)
library(zoo)

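df %>% 
  group_by(iso) %>%
  # interpolate interior NAs, then carry values forward and backward to fill the ends
  # (a formula replaces funs(), which is deprecated in current dplyr)
  mutate_at(vars(starts_with("var")), 
            ~ na.locf(na.locf(na.approx(., na.rm = FALSE, rule = 1), na.rm = FALSE),
                      fromLast = TRUE))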

# # A tibble: 6 x 5
# # Groups: iso [2]
# iso    year  var1   var2  var3
# <chr> <int> <dbl>  <dbl> <dbl>
# 1 XXX    2005   165  29.0   2151
# 2 XXX    2006   160  21.0   2139
# 3 XXX    2007   172  15.0   2890
# 4 XXX    2008   184   9.00  3640
# 5 XXX    2009   184   9.00  3640
# 6 YYY    2005   206 461     8049
#

Data:

df <- read.table(text=
"iso year var1 var2 var3
1 XXX 2005  165   29 2151
2 XXX 2006  160   21 2139
3 XXX 2007   NA   NA   NA
4 XXX 2008  184    9 3640
5 XXX 2009   NA   NA   NA
6 YYY 2005  206  461 8049",
header = TRUE, stringsAsFactors = FALSE)
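The nested call in the answer above can be easier to follow when unpacked step by step. A minimal sketch on one made-up vector (the step names are only for illustration):

```r
library(zoo)

# Hypothetical values with NAs at both ends and one in the middle
x <- c(NA, 160, NA, 184, NA)

# Step 1: linear interpolation fills the interior gap only
step1 <- na.approx(x, na.rm = FALSE)      # NA 160 172 184  NA

# Step 2: last observation carried forward fills the trailing NA
step2 <- na.locf(step1, na.rm = FALSE)    # NA 160 172 184 184

# Step 3: next observation carried backward fills the leading NA
step3 <- na.locf(step2, fromLast = TRUE)  # 160 160 172 184 184
```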

Skip na_interpolation on dplyr group/variable pairs with full NAs in R

Found a solution that only interpolates (no extrapolation) and leaves group/variable pairs with fewer than two non-NA values untouched:

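library(imputeTS)  # provides na_interpolation()
library(dplyr)
library(zoo)

DF <- DF %>% 
  group_by(Country) %>% 
  mutate_at(vars(acnt_class, wages),
            # skip columns with fewer than two known values; otherwise interpolate,
            # then reset to NA the leading/trailing positions that na.approx leaves NA
            ~ if (sum(!is.na(.)) < 2) . else
                replace(na_interpolation(., option = "stine"),
                        is.na(na.approx(., na.rm = FALSE)), NA))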
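The guard on the number of non-NA values can be tried in isolation. The sketch below uses a hypothetical helper, interp_if_possible, and swaps in zoo::na.approx for the Stineman interpolation so that it needs only zoo:

```r
library(zoo)

# Hypothetical helper: interpolate only when at least two values are known,
# otherwise return the input unchanged
interp_if_possible <- function(x) {
  if (sum(!is.na(x)) < 2) x else na.approx(x, na.rm = FALSE)
}

all_na  <- interp_if_possible(c(NA_real_, NA_real_, NA_real_))  # returned unchanged
one_val <- interp_if_possible(c(NA, 5, NA))                     # returned unchanged
filled  <- interp_if_possible(c(1, NA, 3, NA))                  # interior gap filled, trailing NA kept
```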

Fill in groups of NAs by interpolation between known values in R

This function from the zoo package seems to do the job:

zoo::na.fill(mytemp, fill = "extend")

[1] 12.00000 13.00000 13.00000 15.00000 16.00000 15.00000 14.54545
[8] 14.09091 13.63636 13.18182 12.72727 12.27273 11.81818 11.36364
[15] 10.90909 10.45455 10.00000 10.00000  9.00000  9.00000

Edit: this question and its answer deal with a more general situation where the time points aren't equidistant, using zoo::na.approx. One difference is that na.approx does not extend to the leading and trailing NAs, while na.fill does (when fill = "extend").
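The difference between the two functions can be seen on a short made-up vector; this is only a sketch, and the variable names are for illustration:

```r
library(zoo)

# Hypothetical values with NAs at both ends and one in the middle
x <- c(NA, 1, NA, 3, NA)

# na.approx fills the interior gap and leaves the ends as NA
approx_only <- na.approx(x, na.rm = FALSE)

# na.fill with fill = "extend" also fills the ends with the nearest value
filled <- na.fill(x, fill = "extend")
```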

R: Interpolation multiple columns by group using target values column

We can use complete to include new rows where MEAS = MEAS_TARGET and interpolate INT and COL columns with zoo::na.approx.

library(dplyr)
library(tidyr)

df %>%
  group_by(PART, LIMIT) %>%
  complete(MEAS = unique(c(MEAS, MEAS_TARGET))) %>%
  mutate(across(c(INT, COL), zoo::na.approx)) %>%
  fill(MEAS_TARGET) %>%
  ungroup()
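Note that na.approx, called as above, interpolates by row position within each group. If the MEAS values are unevenly spaced, it may be preferable to interpolate along MEAS by passing it as the x argument. Whether that is what the question needs is an assumption; the vectors below are made up to show the difference:

```r
library(zoo)

# Hypothetical, unevenly spaced measurement points and a value to interpolate
meas <- c(0, 1, 10)
int  <- c(0, NA, 10)

# Treats the three rows as equally spaced
by_position <- na.approx(int, na.rm = FALSE)

# Interpolates along meas, so the gap is weighted by its distance to each neighbour
by_meas <- na.approx(int, x = meas, na.rm = FALSE)
```

In the pipeline this would correspond to something like ~ zoo::na.approx(.x, x = MEAS, na.rm = FALSE) inside across.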

Multiple columns with NAs, impute NAs by grouped linear interpolation

We can interpolate by group with na.interp from the forecast package:

library(dplyr)
library(forecast)
df1 %>%
     group_by(state) %>%
     mutate_at(vars(-group_cols()), list(interp= ~ na.interp(.)))

If the columns are not all numeric, use mutate_if(is.numeric, list(interp= ~ na.interp(.))) instead.

Interpolate and extrapolate by group using na.spline() and case_when()

This looks like a bug in na.spline. The na_spline wrapper below works around it by returning NA when a group is entirely NA.

For na.approx, we use na.fill to extend the data into the NAs at the beginning and end. The second argument of na.fill is a 3-vector giving the fill rule for the left end, the interior NAs and the right end. It is recycled, so we can omit the right end.

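library(dplyr)
library(zoo)

# return NA for all-NA groups instead of calling na.spline on them
na_spline <- function(x) if (all(is.na(x))) NA else na.spline(x, na.rm = FALSE)
# interpolate, then extend the nearest value into leading and trailing NAs
na_approx <- function(x) na.fill(na.approx(x, na.rm = FALSE), c("extend", NA))

df %>%
  group_by(a) %>%
  mutate(spline = na_spline(b), approx = na_approx(b)) %>%
  ungroup()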

giving:

# A tibble: 10 x 4
   a          b spline approx
   <chr>  <dbl>  <dbl>  <dbl>
 1 group1     1      1      1
 2 group1     2      2      2
 3 group1    NA      3      3
 4 group1     4      4      4
 5 group2     1      1      1
 6 group2    NA      1      1
 7 group2    NA      1      1
 8 group2    NA      1      1
 9 group3    NA     NA     NA
10 group3    NA     NA     NA

Interpolate NA values when column ends on NA

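Add na.rm = FALSE so that na.approx returns a vector of the same length as the input, which removes the error message. Add rule = 2 to fill the trailing NAs with the last non-NA value.

df %>%
  mutate(Diam_intpl = na.approx(Diam_av, na.rm = FALSE),
         Diam_intpl2 = na.approx(Diam_av, na.rm = FALSE, rule = 2))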

   Diam_av Diam_intpl Diam_intpl2
1    12.30      12.30       12.30
2    13.00      13.00       13.00
3    15.50      15.50       15.50
4       NA      15.14       15.14
5       NA      14.78       14.78
6       NA      14.42       14.42
7       NA      14.06       14.06
8    13.70      13.70       13.70
9       NA      12.77       12.77
10      NA      11.84       11.84
11      NA      10.91       10.91
12    9.98       9.98        9.98
13    4.00       4.00        4.00
14    0.00       0.00        0.00
15    8.76       8.76        8.76
16      NA         NA        8.76
17      NA         NA        8.76
18      NA         NA        8.76
