How to Interpolate Data in R

How to interpolate data in R

For minute-level data, I recommend the xts package together with na.approx() from zoo. In a nutshell, you create an empty xts object indexed by minute and merge it with your original data, so the missing minutes appear as NA. Then you can use na.approx() to fill those values by linear interpolation.

library(xts)  # also attaches zoo, which provides na.approx()

# Initial data, not by minute
date_time_init <- Sys.time() + c(0, 3, 5, 8) * 60
df1 <- xts(1:4, date_time_init)
df1
#                     [,1]
# 2017-06-02 03:10:20    1
# 2017-06-02 03:13:20    2
# 2017-06-02 03:15:20    3
# 2017-06-02 03:18:20    4

# Create time sequence by minute
date_time_complete <- seq.POSIXt(from = min(date_time_init),
                                 to = max(date_time_init), by = "min")

# Merge initial data with new time sequence (missing minutes become NA)
df2 <- merge(df1, xts(, date_time_complete))
df2
#                     df1
# 2017-06-02 03:10:20   1
# 2017-06-02 03:11:20  NA
# 2017-06-02 03:12:20  NA
# 2017-06-02 03:13:20   2
# 2017-06-02 03:14:20  NA
# 2017-06-02 03:15:20   3
# 2017-06-02 03:16:20  NA
# 2017-06-02 03:17:20  NA
# 2017-06-02 03:18:20   4

# Fill the NA values by linear interpolation
na.approx(df2)
#                          df1
# 2017-06-02 03:10:20 1.000000
# 2017-06-02 03:11:20 1.333333
# 2017-06-02 03:12:20 1.666667
# 2017-06-02 03:13:20 2.000000
# 2017-06-02 03:14:20 2.500000
# 2017-06-02 03:15:20 3.000000
# 2017-06-02 03:16:20 3.333333
# 2017-06-02 03:17:20 3.666667
# 2017-06-02 03:18:20 4.000000
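
Because the example above depends on Sys.time(), its timestamps change on every run. The sketch below is a reproducible variant under two assumptions: a fixed, made-up start time, and base R's approx() applied to the numeric value of the timestamps instead of xts. The names t0, obs_time, obs_value, grid and result are illustrative only.

```r
# Fixed start time (an assumption, chosen only for reproducibility)
t0 <- as.POSIXct("2017-06-02 03:10:20", tz = "UTC")

# Observations at 0, 3, 5 and 8 minutes after the start
obs_time  <- t0 + c(0, 3, 5, 8) * 60
obs_value <- 1:4

# Regular by-minute grid covering the observed range
grid <- seq(min(obs_time), max(obs_time), by = "min")

# approx() works on numbers, so interpolate on seconds since the epoch
interp <- approx(x = as.numeric(obs_time), y = obs_value,
                 xout = as.numeric(grid))$y

result <- data.frame(time = grid, value = interp)
print(result)
```

This gives the same interpolated values as the na.approx() call, without needing an xts object; the xts route is more convenient when the data is already a time series.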

How to interpolate data with R?

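We can use complete() from tidyr to expand the data to every combination of 'ent' and the 'year' range, then use na.approx() to interpolate the missing values in 'pobtot'. Grouping by 'ent' keeps the interpolation from running across entities.

library(dplyr)
library(tidyr)
z %>%
  complete(ent, year = 1995:2008) %>%
  group_by(ent) %>%  # interpolate within each 'ent' only
  mutate(pobtot = zoo::na.approx(pobtot, na.rm = FALSE)) %>%
  ungroup()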
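
As a self-contained illustration of this pattern, the sketch below uses a small made-up data frame z. The real data from the question is not shown, so the values and the shorter 1995:2000 year range are assumptions. It interpolates within each 'ent' so that one entity's values never leak into another's.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: two entities observed only in 1995 and 2000
z <- data.frame(
  ent    = c(1L, 1L, 2L, 2L),
  year   = c(1995L, 2000L, 1995L, 2000L),
  pobtot = c(100, 150, 200, 300)
)

filled <- z %>%
  complete(ent, year = 1995:2000) %>%  # add missing ent/year rows, pobtot = NA
  group_by(ent) %>%                    # interpolate within each entity
  mutate(pobtot = zoo::na.approx(pobtot, na.rm = FALSE)) %>%
  ungroup()

print(filled)
```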

How can I interpolate each row of a data frame in R?

ting = "wavelength       plt1       plt2
404.502 0.01451395 0.01394186
411.006 0.01538372 0.01455814
989.878 0.25398372 0.25955116
996.382 0.25714419 0.25986279"

data <- read.table(text = ting, header = TRUE)

interpolated1 <- apply(data, 2, function(x) approx(x, y=NULL, method = "linear", n = 371)$y)

dim(interpolated1)
# [1] 371 3

head(interpolated1)
# wavelength plt1 plt2
# [1,] 404.5020 0.01451395 0.01394186
# [2,] 404.5547 0.01452100 0.01394686
# [3,] 404.6075 0.01452805 0.01395185
# [4,] 404.6602 0.01453511 0.01395685
# [5,] 404.7129 0.01454216 0.01396185
# [6,] 404.7657 0.01454921 0.01396684
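
Note that the call above interpolates every column against its row index. If the goal is instead to resample plt1 and plt2 on an evenly spaced wavelength grid, one possible sketch passes wavelength as the x argument of approx(). This is an alternative reading of the question, not the answer's method; the 371-point grid is carried over from the example, and the names grid and resampled are illustrative.

```r
ting <- "wavelength plt1 plt2
404.502 0.01451395 0.01394186
411.006 0.01538372 0.01455814
989.878 0.25398372 0.25955116
996.382 0.25714419 0.25986279"
data <- read.table(text = ting, header = TRUE)

# Evenly spaced wavelengths across the measured range
grid <- seq(min(data$wavelength), max(data$wavelength), length.out = 371)

# Interpolate each measurement column against wavelength
resampled <- data.frame(
  wavelength = grid,
  plt1 = approx(data$wavelength, data$plt1, xout = grid)$y,
  plt2 = approx(data$wavelength, data$plt2, xout = grid)$y
)

dim(resampled)
```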

Interpolation of time series data with specific output time

Try this (assuming your time index is POSIXct):

library(zoo)
st <- as.POSIXct("2012-01-21 18:45")
g <- seq(st, end(z), by = "15 min") # grid
na.approx(z, xout = g)

See ?na.approx.zoo for more info.

Note: Since the question did not provide the data in reproducible form we do so here:

Lines <- "Id date Time humid humtemp prtemp press t1
1 2012-01-21 18:41:50 47.7 14.12 13.870 1005.70 -0.05277778
1 2012-01-21 18:46:43 44.5 15.37 15.100 1005.20 0.02861111
1 2012-01-21 18:51:35 43.2 15.88 15.576 1005.10 0.10972222
1 2012-01-21 18:56:28 42.5 16.17 15.833 1004.90 0.19111111
1 2012-01-21 19:01:21 42.2 16.31 15.986 1004.80 0.27250000
1 2012-01-21 19:06:14 41.8 16.47 16.118 1004.60 0.35388889
1 2012-01-21 19:11:07 41.6 16.51 16.177 1004.60 0.43527778"

library(zoo)
z <- read.zoo(text = Lines, header = TRUE, index = 2:3, tz = "")
st <- as.POSIXct("2012-01-21 18:45")
g <- seq(st, end(z), by = "15 min") # grid
na.approx(z, xout = g)

giving:

                    Id    humid  humtemp   prtemp    press            t1
2012-01-21 18:45:00 1 45.62491 14.93058 14.66761 1005.376 -1.501706e-09
2012-01-21 19:00:00 1 42.28294 16.27130 15.94370 1004.828 2.500000e-01

Interpolate to obtain a certain value in R

I believe this is what you want. At the moment there is nothing to interpolate; you need an NA column first. You could append one after the second position.

(d <- as.data.frame(append(d, list(X2=NA), 2)))
# Date X1 X2 X3 X4 X5 X6
# 1 2020-02-10 0.04919382 NA 0.04962555 0.04579872 0.03546890 0.048592
# 2 2020-05-20 0.04909930 NA 0.04957330 0.04587720 0.04741000 0.052167
# 3 2020-08-12 0.04909930 NA 0.04957330 0.04525272 0.03554400 0.045489
# 4 2020-10-18 0.04915135 NA 0.04957330 0.04782200 0.03485484 0.024452

Now you want to apply the approx function row-wise, i.e. with MARGIN=1. In each apply iteration, x is one row of values, including the NA. You pass approx the positions seq(x) as the x-coordinates, the row x itself as the y-coordinates, and seq(x) again as the positions at which to evaluate. From the output you want the y component, and because apply returns the whole (transposed) matrix, you keep just row [2,], which holds the interpolated X2 values.

d$X2 <- apply(d[-1], MARGIN=1, function(x) approx(seq(x), x, seq(x))$y)[2,]
d
# Date X1 X2 X3 X4 X5 X6
# 1 2020-02-10 0.04919382 0.04940968 0.04962555 0.04579872 0.03546890 0.048592
# 2 2020-05-20 0.04909930 0.04933630 0.04957330 0.04587720 0.04741000 0.052167
# 3 2020-08-12 0.04909930 0.04933630 0.04957330 0.04525272 0.03554400 0.045489
# 4 2020-10-18 0.04915135 0.04936232 0.04957330 0.04782200 0.03485484 0.024452
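
To see what a single apply iteration does, here is a minimal sketch on one made-up vector (the values are hypothetical). approx() drops the NA pair and evaluates the linear interpolant at every position, so the gap at position 2 is filled with the midpoint of its neighbours.

```r
x <- c(10, NA, 20, 40)  # hypothetical row with a gap at position 2

# positions as x-coordinates, row values as y-coordinates
filled <- approx(x = seq_along(x), y = x, xout = seq_along(x))$y
filled
# [1] 10 15 20 40
```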

Data:

d <- structure(list(Date = structure(c(18302, 18402, 18486, 18553), class = "Date"), 
X1 = c(0.04919382, 0.0490993, 0.0490993, 0.04915135), X3 = c(0.04962555,
0.0495733, 0.0495733, 0.0495733), X4 = c(0.04579872, 0.0458772,
0.04525272, 0.047822), X5 = c(0.0354689, 0.04741, 0.035544,
0.03485484), X6 = c(0.048592, 0.052167, 0.045489, 0.024452
)), row.names = c(NA, -4L), class = "data.frame")

Linear interpolation of Panel Data in R over multiple columns

I'd write a function to interpolate an arbitrary variable within each country group, then map() it over all the variables and join the results back together.

library(tidyverse)
data("population")

# create some data to interpolate
population_5 <- population %>%
  filter(year %% 5 == 0) %>%
  mutate(female_pop = population / 2,
         male_pop = population / 2)

interpolate_func <- function(variable, data) {
  data %>%
    group_by(country) %>%
    # can't interpolate if only one year
    filter(n() >= 2) %>%
    group_modify(~as_tibble(approx(.x$year, .x[[variable]],
                                   xout = min(.x$year):max(.x$year)))) %>%
    set_names("country", "year", paste0(variable, "_interpolated")) %>%
    ungroup()
}

vars_to_interpolate <- names(select(population_5, -country, -year))

map(vars_to_interpolate, interpolate_func,
    data = population_5) %>%
  reduce(full_join, by = c("country", "year"))

#> # A tibble: 3,395 × 5
#> country year population_interpolated female_pop_interp… male_pop_interp…
#> <chr> <int> <dbl> <dbl> <dbl>
#> 1 Afghanistan 1995 17586073 8793036. 8793036.
#> 2 Afghanistan 1996 18187930. 9093965. 9093965.
#> 3 Afghanistan 1997 18789788. 9394894. 9394894.
#> 4 Afghanistan 1998 19391645. 9695823. 9695823.
#> 5 Afghanistan 1999 19993503. 9996751. 9996751.
#> 6 Afghanistan 2000 20595360 10297680 10297680
#> 7 Afghanistan 2001 21448459 10724230. 10724230.
#> 8 Afghanistan 2002 22301558 11150779 11150779
#> 9 Afghanistan 2003 23154657 11577328. 11577328.
#> 10 Afghanistan 2004 24007756 12003878 12003878
#> # … with 3,385 more rows

Created on 2022-06-01 by the reprex package (v2.0.1)

Interpolation of panel data based on a flag column in r

You can replace c values where flag is TRUE with NA and then interpolate values with na.approx from zoo.

library(dplyr)

df %>% mutate(c = zoo::na.approx(replace(c, flag, NA)))

# year id c flag
#1 2012 1 11 FALSE
#2 2013 1 12 TRUE
#3 2014 1 13 FALSE
#4 2012 2 16 FALSE
#5 2013 2 15 FALSE
#6 2014 2 15 FALSE
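
The call above treats c as one long series. For panel data it may be safer to interpolate within each id, so that values are never borrowed from a neighbouring unit. The sketch below assumes a small made-up df with the same column names, and adds na.rm = FALSE so that a flagged first or last year yields NA instead of a length mismatch.

```r
library(dplyr)

# Hypothetical panel: id 1 has a flagged middle year, id 2 a flagged last year
df <- data.frame(
  year = rep(2012:2014, times = 2),
  id   = rep(1:2, each = 3),
  c    = c(11, 99, 13, 16, 15, 99),
  flag = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
)

out <- df %>%
  group_by(id) %>%  # interpolate within each id only
  mutate(c = zoo::na.approx(replace(c, flag, NA), na.rm = FALSE)) %>%
  ungroup()

print(out)
```

Here the flagged value for id 1 is replaced by 12, while the flagged last year of id 2 stays NA because there is no later observation to interpolate towards.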

Interpolating data only between first and last observation in R

Try using the na.approx function in the "zoo" package.

library(dplyr)
library(zoo)
data %>% group_by(country) %>% mutate(int = na.approx(value, na.rm = FALSE))

Hope this is what you are looking for. With na.rm = FALSE, any NA before the first or after the last observation in a group is left as NA, so the NA in country 1 is kept.
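
A quick way to check this behaviour, using a made-up vector: with na.rm = FALSE, na.approx() fills only the interior gap and leaves the leading and trailing NA in place, whereas the default drops them and returns a shorter vector.

```r
library(zoo)

value <- c(NA, 1, NA, 3, NA)  # hypothetical series with NAs at both ends

kept    <- na.approx(value, na.rm = FALSE)  # same length as the input
dropped <- na.approx(value)                 # default na.rm = TRUE

kept
dropped
```

The same-length result is what makes the na.rm = FALSE form usable inside mutate().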

Linear interpolation in R for columns

You could use na.approx from the zoo package. According to its documentation:

Generic functions for replacing each NA with interpolated values.

Code:

df <- data.frame(rates = c(0.66, NA, NA, NA, 0.77, 0.75, 0.79, NA, NA, 0.79))

library(zoo)
df$new_rates <- na.approx(df$rates)
df
#> rates new_rates
#> 1 0.66 0.6600
#> 2 NA 0.6875
#> 3 NA 0.7150
#> 4 NA 0.7425
#> 5 0.77 0.7700
#> 6 0.75 0.7500
#> 7 0.79 0.7900
#> 8 NA 0.7900
#> 9 NA 0.7900
#> 10 0.79 0.7900

Created on 2022-07-05 by the reprex package (v2.0.1)


