R: Interpolation of NAs by group
Use data.frame, rather than cbind, to create your data. cbind returns a matrix, but dplyr needs a data frame. Then group by Individuals with group_by and call na.approx inside mutate, so that each individual is interpolated separately. Setting na.rm=FALSE keeps leading and trailing NAs in place, so the result has the same length as the group.
df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
Individuals=c(1,1,1,1,1,1,1,2,2,2),
Value=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10))
library(dplyr)
library(zoo)
df %>%
group_by(Individuals) %>%
mutate(ValueInterp = na.approx(Value, na.rm=FALSE))
   time Individuals Value ValueInterp
1     1           1    NA          NA
2     2           1     2           2
3     3           1     3           3
4     4           1    NA           4
5     5           1     5           5
6     6           1    NA           6
7     7           1     7           7
8     1           2     8           8
9     2           2    NA           9
10    3           2    10          10
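To see why the choice of constructor matters, here is a minimal base R sketch that compares the two results. The names m and df_small are made up for illustration and are not part of the original answer.

```r
# cbind() on plain vectors builds a numeric matrix, not a data frame
m <- cbind(time = c(1, 2, 3), Value = c(NA, 2, 3))

# data.frame() builds the structure that dplyr verbs expect
df_small <- data.frame(time = c(1, 2, 3), Value = c(NA, 2, 3))

is.matrix(m)             # TRUE
is.data.frame(m)         # FALSE
is.data.frame(df_small)  # TRUE
```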
Update: To interpolate multiple columns, we can use mutate_at. Here's an example with two value columns. mutate_at runs na.approx on every column whose name includes "Value". Passing list(interp=na.approx) tells mutate_at to keep the original columns and store the results in new columns named with interp as a suffix:
df <- data.frame(time=c(1,2,3,4,5,6,7,1,2,3),
Individuals=c(1,1,1,1,1,1,1,2,2,2),
Value1=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10),
Value2=c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10)*2)
df %>%
group_by(Individuals) %>%
mutate_at(vars(matches("Value")), list(interp=na.approx), na.rm=FALSE)
    time Individuals Value1 Value2 Value1_interp Value2_interp
   <dbl>       <dbl>  <dbl>  <dbl>         <dbl>         <dbl>
 1     1           1     NA     NA            NA            NA
 2     2           1      2      4             2             4
 3     3           1      3      6             3             6
 4     4           1     NA     NA             4             8
 5     5           1      5     10             5            10
 6     6           1     NA     NA             6            12
 7     7           1      7     14             7            14
 8     1           2      8     16             8            16
 9     2           2     NA     NA             9            18
10     3           2     10     20            10            20
If you don't want to preserve the original, uninterpolated columns, you can do:
df %>%
group_by(Individuals) %>%
mutate_at(vars(matches("Value")), na.approx, na.rm=FALSE)
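In dplyr 1.0.0 and later, mutate_at is superseded by across. The following is a sketch of the same two operations written with across, assuming the two-column data frame from this answer (repeated so the block runs on its own). The names res_new and res_inplace are illustrative only.

```r
library(dplyr)
library(zoo)

df <- data.frame(time = c(1, 2, 3, 4, 5, 6, 7, 1, 2, 3),
                 Individuals = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
                 Value1 = c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10),
                 Value2 = c(NA, 2, 3, NA, 5, NA, 7, 8, NA, 10) * 2)

# Keep the originals and add Value1_interp / Value2_interp
res_new <- df %>%
  group_by(Individuals) %>%
  mutate(across(matches("Value"),
                ~ na.approx(.x, na.rm = FALSE),
                .names = "{.col}_interp")) %>%
  ungroup()

# Overwrite the original columns instead
res_inplace <- df %>%
  group_by(Individuals) %>%
  mutate(across(matches("Value"), ~ na.approx(.x, na.rm = FALSE))) %>%
  ungroup()
```

The .names argument plays the role that the named list played in mutate_at: it controls how the new columns are named.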
Replace NAs by Interpolation in groups
This one-liner interpolates the NAs within each group. NAs at the ends of a group are set to the nearest non-NA value, i.e. it does linear interpolation with constant extrapolation, which is not exactly what was asked for but may be close enough. Note that this also implies that if a group has only one non-NA value, all of its NAs are set to that value.
library(zoo)
transform(DF, newCol = ave(Value, Group, FUN = function(x) na.approx(x, rule = 2)))
giving:
   Group Value newCol
1    ALB    NA     10
2    ALB    10     10
3    ALB    NA     11
4    ALB    12     12
5    ARE    NA      2
6    ARE    NA      2
7    ARE     2      2
8    ARE    NA      2
9    ARE    NA      2
10   ARG     4      4
11   ARG    NA      5
12   ARG     6      6
Note
DF <- structure(list(Group = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Value = c(NA,
10L, NA, 12L, 4L, NA, NA, 7L)), class = "data.frame", row.names = c(NA,
-8L))
DF <-
structure(list(Group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L), .Label = c("ALB", "ARE", "ARG"), class = "factor"),
Value = c(NA, 10L, NA, 12L, NA, NA, 2L, NA, NA, 4L, NA, 6L
)), class = "data.frame", row.names = c(NA, -12L))
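The effect of rule = 2 is easiest to see on a single group. A small sketch using the ALB values from the example; the variable names are chosen here for illustration:

```r
library(zoo)

x <- c(NA, 10, NA, 12)  # Value column of the ALB group

# Interior NAs only: the leading NA is kept
interp_only <- na.approx(x, na.rm = FALSE)
interp_only  # NA 10 11 12

# rule = 2 is passed on to approx(), which extends the nearest
# non-NA value to points outside the observed range
with_ends <- na.approx(x, rule = 2)
with_ends    # 10 10 11 12
```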
Interpolating NA's by group using dplyr on multiple columns
You can rewrite your code using mutate_at so that all the columns are converted in one go:
library(dplyr)
library(zoo)
df %>%
group_by(iso) %>%
mutate_at(vars(starts_with("var")),
# interpolate interior NAs, then carry values forward and backward to fill the ends
~ na.locf(na.locf(na.approx(., na.rm = FALSE, rule = 1), na.rm = FALSE),
fromLast = TRUE))
# # A tibble: 6 x 5
# # Groups: iso [2]
# iso year var1 var2 var3
# <chr> <int> <dbl> <dbl> <dbl>
# 1 XXX 2005 165 29.0 2151
# 2 XXX 2006 160 21.0 2139
# 3 XXX 2007 172 15.0 2890
# 4 XXX 2008 184 9.00 3640
# 5 XXX 2009 184 9.00 3640
# 6 YYY 2005 206 461 8049
#
Data:
df <- read.table(text=
"iso year var1 var2 var3
1 XXX 2005 165 29 2151
2 XXX 2006 160 21 2139
3 XXX 2007 NA NA NA
4 XXX 2008 184 9 3640
5 XXX 2009 NA NA NA
6 YYY 2005 206 461 8049",
header = TRUE, stringsAsFactors = FALSE)
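The nested call is easier to follow when unrolled into steps. A minimal sketch on a single vector, where fill_all is a hypothetical helper name that does not appear in the original answer:

```r
library(zoo)

# Hypothetical helper: the same three steps as the nested call, one per line
fill_all <- function(x) {
  x <- na.approx(x, na.rm = FALSE)             # linear interpolation of interior NAs
  x <- na.locf(x, na.rm = FALSE)               # carry last value forward (trailing NAs)
  na.locf(x, fromLast = TRUE, na.rm = FALSE)   # carry next value backward (leading NAs)
}

filled <- fill_all(c(NA, 160, NA, 184, NA))
filled  # 160 160 172 184 184
```

Keeping na.rm = FALSE in every step preserves the vector length, which mutate requires within each group.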
Skip na_interpolation on dplyr group/variable pairs with full NAs in R
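Found a solution that only interpolates (leading and trailing NAs stay NA) and leaves any group/variable pair with fewer than two non-NA values untouched:
library(imputeTS)
library(dplyr)
library(zoo)
DF <- DF %>%
  group_by(Country) %>%
  mutate_at(vars(acnt_class, wages),
            ~ if (sum(!is.na(.)) < 2) {
                # too few points to interpolate: return the column unchanged
                .
              } else {
                # Stineman interpolation, then restore NA wherever na.approx leaves NA (the ends)
                replace(na_interpolation(., option = "stine"), is.na(na.approx(., na.rm = FALSE)), NA)
              })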
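The masking step can be tried in isolation. The sketch below uses zoo only, with na.spline standing in for na_interpolation as the filling method (a swap made here so the example does not need imputeTS). The vector and the names edge_na, filled and masked are illustrative.

```r
library(zoo)

x <- c(NA, 1, NA, 3, 4, NA)

# na.approx with na.rm = FALSE leaves leading/trailing NAs in place,
# so its NA positions mark the ends that should not be extrapolated
edge_na <- is.na(na.approx(x, na.rm = FALSE))
edge_na  # TRUE FALSE FALSE FALSE FALSE TRUE

# Stand-in filler that also extrapolates beyond the observed range
filled <- na.spline(x)

# Put NA back at the ends, keeping only the interpolated interior
masked <- replace(filled, edge_na, NA)
```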
Fill in groups of NAs by interpolation between known values in R
This function from the zoo package seems to do the job:
zoo::na.fill(mytemp, fill = "extend")
[1] 12.00000 13.00000 13.00000 15.00000 16.00000 15.00000 14.54545
[8] 14.09091 13.63636 13.18182 12.72727 12.27273 11.81818 11.36364
[15] 10.90909 10.45455 10.00000 10.00000 9.00000 9.00000
Edit: this question and its answer deal with a more general situation where the time points aren't equidistant, using zoo::na.approx. One difference is that na.approx does not fill the leading and trailing NAs, while na.fill does (when fill = "extend").
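A short sketch of that difference on a toy vector (x and the result names are made up for illustration):

```r
library(zoo)

x <- c(NA, 1, NA, 3, NA)

# na.approx fills only the interior gap
approx_only <- na.approx(x, na.rm = FALSE)
approx_only  # NA 1 2 3 NA

# na.fill with "extend" interpolates the interior and repeats the
# nearest non-NA value at both ends
extended <- na.fill(x, fill = "extend")
extended     # 1 1 2 3 3
```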
R: Interpolation multiple columns by group using target values column
We can use complete to add new rows where MEAS = MEAS_TARGET, and then interpolate the INT and COL columns with zoo::na.approx.
library(dplyr)
library(tidyr)
df %>%
group_by(PART, LIMIT) %>%
complete(MEAS = unique(c(MEAS, MEAS_TARGET))) %>%
mutate(across(c(INT, COL), zoo::na.approx)) %>%
fill(MEAS_TARGET) %>%
ungroup
Multiple columns with NAs, impute NAs by grouped linear interpolation
We can interpolate within each group:
library(dplyr)
library(forecast)
df1 %>%
group_by(state) %>%
mutate_at(vars(-group_cols()), list(interp= ~ na.interp(.)))
If the columns are not all numeric, use mutate_if(is.numeric, list(interp= ~ na.interp(.)))
Interpolate and extrapolated by group using na.spline() and case_when()
This looks like a bug in na.spline. The na_spline wrapper below works around it.
In the case of na.approx we use na.fill to extend the data to the NAs at the beginning and end. The second argument of na.fill is a 3-vector which gives the replacement rule for the left end, the interior NAs and the right end. It is recycled, so we can omit the right end.
na_spline <- function(x) if (all(is.na(x))) NA else na.spline(x, na.rm = FALSE)
na_approx <- function(x) na.fill(na.approx(x, na.rm = FALSE), c("extend", NA))
df %>%
group_by(a) %>%
mutate(spline = na_spline(b), approx = na_approx(b)) %>%
ungroup
giving:
# A tibble: 10 x 4
   a          b spline approx
   <chr>  <dbl>  <dbl>  <dbl>
 1 group1     1      1      1
 2 group1     2      2      2
 3 group1    NA      3      3
 4 group1     4      4      4
 5 group2     1      1      1
 6 group2    NA      1      1
 7 group2    NA      1      1
 8 group2    NA      1      1
 9 group3    NA     NA     NA
10 group3    NA     NA     NA
Interpolate NA values when column ends on NA
Add na.rm=FALSE to get rid of the error message: it keeps the trailing NAs, so the result has the same length as the column. Add rule=2 to fill the trailing NAs with the last non-NA value.
df %>%
mutate(Diam_intpl = na.approx(Diam_av, na.rm=FALSE),
Diam_intpl2 = na.approx(Diam_av, na.rm=FALSE, rule=2))
   Diam_av Diam_intpl Diam_intpl2
1    12.30      12.30       12.30
2    13.00      13.00       13.00
3    15.50      15.50       15.50
4       NA      15.14       15.14
5       NA      14.78       14.78
6       NA      14.42       14.42
7       NA      14.06       14.06
8    13.70      13.70       13.70
9       NA      12.77       12.77
10      NA      11.84       11.84
11      NA      10.91       10.91
12    9.98       9.98        9.98
13    4.00       4.00        4.00
14    0.00       0.00        0.00
15    8.76       8.76        8.76
16      NA         NA        8.76
17      NA         NA        8.76
18      NA         NA        8.76
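The length mismatch behind the error can be reproduced on a short vector. A minimal sketch with made-up values:

```r
library(zoo)

x <- c(1, NA, 3, NA)  # ends on NA, like Diam_av

# Default na.rm = TRUE drops the trailing NA, so the result is shorter
# than the input and cannot be assigned as a column in mutate()
len_default <- length(na.approx(x))
len_default  # 3

# na.rm = FALSE keeps the trailing NA and preserves the length
len_kept <- length(na.approx(x, na.rm = FALSE))
len_kept     # 4
```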