R: Interpolation of NAs by group

Use data.frame rather than cbind to create your data: cbind returns a matrix, but dplyr needs a data frame. Then group by the grouping variable (Individuals here) and call na.approx inside mutate, so that each group is interpolated separately. Setting na.rm=FALSE keeps leading and trailing NAs in place, so the result has the same length as the input column.

df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                 Individuals=c(1,1,1,1,1,1,1,2,2,2),
                 Value=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10))

library(dplyr)
library(zoo)

df %>%
  group_by(Individuals) %>%
  mutate(ValueInterp = na.approx(Value, na.rm=FALSE))

   time Individuals Value ValueInterp
1     1           1    NA          NA
2     2           1     2           2
3     3           1     3           3
4     4           1    NA           4
5     5           1     5           5
6     6           1    NA           6
7     7           1     7           7
8     1           2     8           8
9     2           2    NA           9
10    3           2    10          10
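
To see why na.rm=FALSE matters here, the following minimal sketch compares the two settings on a made-up vector (x is illustrative, not from the question). With the default na.rm=TRUE, na.approx drops leading and trailing NAs, so the result is shorter than the input and mutate would reject it.

```r
library(zoo)

x <- c(NA, 2, NA, 4, NA)   # illustrative data with NAs at both ends

trimmed <- na.approx(x)                  # default na.rm = TRUE drops the end NAs
kept    <- na.approx(x, na.rm = FALSE)   # end NAs are kept, length is preserved

length(trimmed)   # 3
length(kept)      # 5
kept              # NA  2  3  4 NA
```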

Update: To interpolate multiple columns, we can use mutate_at. Here's an example with two value columns. mutate_at runs na.approx on every column whose name matches "Value". Passing the function as a named list, list(interp=na.approx), keeps the original columns and writes the results to new columns with interp added as a suffix (Value1_interp, Value2_interp):

df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
                 Individuals=c(1,1,1,1,1,1,1,2,2,2),
                 Value1=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10),
                 Value2=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10)*2)

df %>%
  group_by(Individuals) %>%
  mutate_at(vars(matches("Value")), list(interp=na.approx), na.rm=FALSE)

    time Individuals Value1 Value2 Value1_interp Value2_interp
   <dbl>       <dbl>  <dbl>  <dbl>         <dbl>         <dbl>
 1     1           1     NA     NA            NA            NA
 2     2           1      2      4             2             4
 3     3           1      3      6             3             6
 4     4           1     NA     NA             4             8
 5     5           1      5     10             5            10
 6     6           1     NA     NA             6            12
 7     7           1      7     14             7            14
 8     1           2      8     16             8            16
 9     2           2     NA     NA             9            18
10     3           2     10     20            10            20

If you don't want to preserve the original, uninterpolated columns, you can do:

df %>%
  group_by(Individuals) %>%
  mutate_at(vars(matches("Value")), na.approx, na.rm=FALSE)
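
In dplyr 1.0.0 and later, mutate_at is superseded by across. A roughly equivalent sketch, assuming the same two-column df as above, uses a lambda and the .names argument to build the suffixed column names. The name res is only used for illustration.

```r
library(dplyr)
library(zoo)

df <- data.frame(time = c(1,2,3,4,5,6,7,1,2,3),
                 Individuals = c(1,1,1,1,1,1,1,2,2,2),
                 Value1 = c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10),
                 Value2 = c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10) * 2)

res <- df %>%
  group_by(Individuals) %>%
  # interpolate every column whose name matches "Value", per group
  mutate(across(matches("Value"),
                ~ na.approx(.x, na.rm = FALSE),
                .names = "{.col}_interp")) %>%
  ungroup()
```

Dropping the .names argument overwrites the original columns instead of adding new ones.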

Replace NAs by Interpolation in groups

This one-liner interpolates the NAs within each group. For NAs at either end of a group, it extends the nearest non-NA value, i.e. it does linear interpolation in the interior and constant extrapolation at the ends. That is not exactly what was asked for, but it may be close enough. Note that this also implies that if a group has only one non-NA value, all of its NAs are set to that value.

library(zoo)
transform(DF, newCol = ave(Value, Group, FUN = function(x) na.approx(x, rule = 2)))

giving:

   Group Value newCol
1    ALB    NA     10
2    ALB    10     10
3    ALB    NA     11
4    ALB    12     12
5    ARE    NA      2
6    ARE    NA      2
7    ARE     2      2
8    ARE    NA      2
9    ARE    NA      2
10   ARG     4      4
11   ARG    NA      5
12   ARG     6      6
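
As a rough illustration of what rule = 2 and ave contribute, the sketch below rebuilds a small data frame by hand. Its values mirror the ALB and ARG rows above, and the name small is made up for this example.

```r
library(zoo)

small <- data.frame(Group = c("ALB", "ALB", "ALB", "ALB", "ARG", "ARG", "ARG"),
                    Value = c(NA, 10, NA, 12, 4, NA, 6))

# rule = 2 is passed on to approx(): outside the observed range,
# the nearest observed value is used instead of NA
single <- na.approx(c(NA, 10, NA, 12), rule = 2)   # 10 10 11 12

# ave() splits Value by Group, applies FUN to each piece,
# and puts the results back in the original row order
small$newCol <- ave(small$Value, small$Group,
                    FUN = function(x) na.approx(x, rule = 2))
```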

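Note: Input data. The second definition of DF (groups ALB, ARE, ARG) is the one that produces the output above.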

DF <- structure(list(Group = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Value = c(NA, 
10L, NA, 12L, 4L, NA, NA, 7L)), class = "data.frame", row.names = c(NA,
-8L))

DF <-
structure(list(Group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L), .Label = c("ALB", "ARE", "ARG"), class = "factor"),
Value = c(NA, 10L, NA, 12L, NA, NA, 2L, NA, NA, 4L, NA, 6L
)), class = "data.frame", row.names = c(NA, -12L))

Interpolating NA's by group using dplyr on multiple columns

You can rewrite your code with mutate_at so that all the columns are converted in one go:

library(dplyr)
library(zoo)

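# funs() is deprecated since dplyr 0.8.0, so a formula is used instead.
# na.approx fills interior gaps; the two na.locf calls then carry the last
# value forward and the first value backward to fill the ends.
df %>%
  group_by(iso) %>%
  mutate_at(vars(starts_with("var")),
            ~ na.locf(na.locf(na.approx(., na.rm = FALSE, rule = 1), na.rm = FALSE),
                      fromLast = TRUE, na.rm = FALSE))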

# # A tibble: 6 x 5
# # Groups:   iso [2]
#   iso    year  var1   var2  var3
#   <chr> <int> <dbl>  <dbl> <dbl>
# 1 XXX    2005   165  29.0   2151
# 2 XXX    2006   160  21.0   2139
# 3 XXX    2007   172  15.0   2890
# 4 XXX    2008   184   9.00  3640
# 5 XXX    2009   184   9.00  3640
# 6 YYY    2005   206 461     8049

Data:

df <- read.table(text=
"iso year var1 var2 var3
1 XXX 2005 165 29 2151
2 XXX 2006 160 21 2139
3 XXX 2007 NA NA NA
4 XXX 2008 184 9 3640
5 XXX 2009 NA NA NA
6 YYY 2005 206 461 8049",
header = TRUE, stringsAsFactors = FALSE)
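
To make the nested calls easier to follow, here is a minimal sketch that applies the three layers one at a time to a made-up vector (x and the step names are illustrative only):

```r
library(zoo)

x <- c(NA, 160, NA, 184, NA)   # illustrative: leading NA, interior gap, trailing NA

# 1. linear interpolation fills only the interior gap
step1 <- na.approx(x, na.rm = FALSE)                      #  NA 160 172 184  NA
# 2. last observation carried forward fills the trailing NA
step2 <- na.locf(step1, na.rm = FALSE)                    #  NA 160 172 184 184
# 3. next observation carried backward fills the leading NA
step3 <- na.locf(step2, fromLast = TRUE, na.rm = FALSE)   # 160 160 172 184 184
```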

Skip na_interpolation on dplyr group/variable pairs with full NAs in R

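Found a solution that only interpolates (no extrapolation at the ends) and leaves group/variable pairs with fewer than two non-NA values untouched:

library(imputeTS)  # na_interpolation()
library(dplyr)
library(zoo)       # na.approx()

# Leave columns with fewer than two non-NA values untouched; otherwise
# interpolate, then reset to NA whatever na.approx() would leave as NA.
DF <- DF %>%
  group_by(Country) %>%
  mutate_at(vars(acnt_class, wages),
            ~ if (sum(!is.na(.)) < 2) . else
                replace(na_interpolation(., option = "stine"),
                        is.na(na.approx(., na.rm = FALSE)), NA))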
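
The masking idea can be tried on a single vector. The sketch below is an illustration under assumptions, not the answer's exact method: it swaps in zoo::na.spline for imputeTS::na_interpolation so that only zoo is needed, and interp_inner is a hypothetical helper name.

```r
library(zoo)

# Hypothetical helper: fill gaps with `method`, then blank out leading and
# trailing gaps again by masking with the NA pattern that na.approx leaves.
interp_inner <- function(x, method = na.spline) {
  if (sum(!is.na(x)) < 2) return(x)   # too few points: return unchanged
  filled <- method(x, na.rm = FALSE)
  replace(filled, is.na(na.approx(x, na.rm = FALSE)), NA)
}

x   <- c(NA, 1, NA, 4, 9, NA, 25, NA)   # illustrative data
res <- interp_inner(x)                  # ends stay NA, interior gaps are filled

too_few <- interp_inner(c(NA, 5, NA))   # returned unchanged
```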

Fill in groups of NAs by interpolation between known values in R

This function from the zoo package seems to do the job:

zoo::na.fill(mytemp, fill = "extend")

[1] 12.00000 13.00000 13.00000 15.00000 16.00000 15.00000 14.54545
[8] 14.09091 13.63636 13.18182 12.72727 12.27273 11.81818 11.36364
[15] 10.90909 10.45455 10.00000 10.00000 9.00000 9.00000

Edit: a related question and its answer deal with a more general situation where the time points aren't equidistant, using zoo::na.approx. One difference is that na.approx does not fill leading and trailing NAs, while na.fill does (when fill = "extend").
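
A small sketch of that difference on a made-up vector, together with the x argument of na.approx for unevenly spaced time points (x_vals, times and the result names are illustrative):

```r
library(zoo)

x_vals <- c(NA, 12, NA, 15, NA)

approx_res <- na.approx(x_vals, na.rm = FALSE)   #   NA 12.0 13.5 15.0   NA
fill_res   <- na.fill(x_vals, fill = "extend")   # 12.0 12.0 13.5 15.0 15.0

# Unevenly spaced observations: pass the time points through x, so the
# gap at time 1 is interpolated between time 0 and time 3
times      <- c(0, 1, 3)
uneven_res <- na.approx(c(1, NA, 4), x = times)  # 1 2 4
```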

R: Interpolation multiple columns by group using target values column

We can use complete to include new rows where MEAS = MEAS_TARGET and interpolate INT and COL columns with zoo::na.approx.

library(dplyr)
library(tidyr)

df %>%
  group_by(PART, LIMIT) %>%
  complete(MEAS = unique(c(MEAS, MEAS_TARGET))) %>%
  mutate(across(c(INT, COL), zoo::na.approx)) %>%
  fill(MEAS_TARGET) %>%
  ungroup()
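
One thing to be aware of: called without an x argument, na.approx interpolates against the row position, not against MEAS. The sketch below uses a made-up single-group data frame (toy and the result column names are hypothetical) to show the difference when MEAS is unevenly spaced. Passing x = MEAS and na.rm = FALSE is an assumption about what is wanted, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(zoo)

# Hypothetical data: one group, target MEAS = 20 lies between 10 and 40
toy <- data.frame(PART = "A", LIMIT = 10,
                  MEAS = c(0, 10, 40), MEAS_TARGET = 20,
                  INT = c(0, 1, 4))

res <- toy %>%
  group_by(PART, LIMIT) %>%
  complete(MEAS = unique(c(MEAS, MEAS_TARGET))) %>%   # adds the MEAS = 20 row
  arrange(MEAS, .by_group = TRUE) %>%                 # make the row order explicit
  mutate(INT_by_row  = na.approx(INT, na.rm = FALSE),            # uses row position
         INT_by_meas = na.approx(INT, x = MEAS, na.rm = FALSE)   # uses MEAS values
  ) %>%
  fill(MEAS_TARGET) %>%
  ungroup()

# At MEAS = 20: INT_by_row is 2.5 (halfway between rows), INT_by_meas is 2
```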

Multiple columns with NAs, impute NAs by grouped linear interpolation

We can interpolate within each group using na.interp from the forecast package:

library(dplyr)
library(forecast)
df1 %>%
  group_by(state) %>%
  mutate_at(vars(-group_cols()), list(interp= ~ na.interp(.)))

If the columns are not all numeric, use mutate_if(is.numeric, list(interp= ~ na.interp(.)))

Interpolate and extrapolated by group using na.spline() and case_when()

This looks like a bug in na.spline. The na_spline wrapper below works around it by returning NA for a group in which every value is NA.

In the case of na.approx we use na.fill to extend the data to the NAs at the beginning and end. The second argument of na.fill is a 3-vector which gives the replacement rule for the left end, the interior NAs and the right end. The vector is recycled, so we can omit the right end.

na_spline <- function(x) if (all(is.na(x))) NA else na.spline(x, na.rm = FALSE)
na_approx <- function(x) na.fill(na.approx(x, na.rm = FALSE), c("extend", NA))

df %>%
  group_by(a) %>%
  mutate(spline = na_spline(b), approx = na_approx(b)) %>%
  ungroup()

giving:

# A tibble: 10 x 4
   a          b spline approx
   <chr>  <dbl>  <dbl>  <dbl>
 1 group1     1      1      1
 2 group1     2      2      2
 3 group1    NA      3      3
 4 group1     4      4      4
 5 group2     1      1      1
 6 group2    NA      1      1
 7 group2    NA      1      1
 8 group2    NA      1      1
 9 group3    NA     NA     NA
10 group3    NA     NA     NA
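
The two wrappers can also be tried directly on vectors. This is a minimal sketch with made-up inputs, using the same definitions as above:

```r
library(zoo)

na_spline <- function(x) if (all(is.na(x))) NA else na.spline(x, na.rm = FALSE)
na_approx <- function(x) na.fill(na.approx(x, na.rm = FALSE), c("extend", NA))

# na.approx fills the interior gap, then na.fill extends both ends
both_ends <- na_approx(c(NA, 2, NA, 4, NA))   # 2 2 3 4 4

# only one observed value: the ends are extended from it
one_value <- na_approx(c(1, NA, NA, NA))      # 1 1 1 1

# all values missing: the guard returns a single NA, which mutate recycles
all_missing <- na_spline(c(NA, NA))
```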

Interpolate NA values when column ends on NA

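Add na.rm=FALSE so that na.approx keeps leading and trailing NAs and returns a vector as long as the column, which removes the error message. Add rule=2 to fill the trailing NAs with the last non-NA value.

df %>%
  mutate(Diam_intpl = na.approx(Diam_av, na.rm=FALSE),
         Diam_intpl2 = na.approx(Diam_av, na.rm=FALSE, rule=2))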

   Diam_av Diam_intpl Diam_intpl2
1    12.30      12.30       12.30
2    13.00      13.00       13.00
3    15.50      15.50       15.50
4       NA      15.14       15.14
5       NA      14.78       14.78
6       NA      14.42       14.42
7       NA      14.06       14.06
8    13.70      13.70       13.70
9       NA      12.77       12.77
10      NA      11.84       11.84
11      NA      10.91       10.91
12    9.98       9.98        9.98
13    4.00       4.00        4.00
14    0.00       0.00        0.00
15    8.76       8.76        8.76
16      NA         NA        8.76
17      NA         NA        8.76
18      NA         NA        8.76

