How to Interpolate Data in R

How to interpolate data in R

To deal with minute data, I would recommend the xts package together with the function na.approx from the zoo package. In a nutshell, you build a complete by-minute time index, merge an empty xts object carrying that index with your original data so that the missing minutes show up as NA, and then use na.approx to fill those NAs by linear interpolation.

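library(xts) #attaching xts also attaches zoo, which provides na.approx

#Initial data, not by minute (fixed start time so the printed output is reproducible)
date_time_init <- as.POSIXct("2017-06-02 03:10:20")+c(0,3,5,8)*60
df1 <- xts(c(1:4),date_time_init)
df1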
                    [,1]
2017-06-02 03:10:20    1
2017-06-02 03:13:20    2
2017-06-02 03:15:20    3
2017-06-02 03:18:20    4

#Create time sequence by minute
date_time_complete <- seq.POSIXt(from=min(date_time_init),
                                 to=max(date_time_init),by="min") 

#Merge initial data with new time sequence; minutes without data become NA
df2 <- merge(df1,xts(,date_time_complete))
df2
                    df1
2017-06-02 03:10:20   1
2017-06-02 03:11:20  NA
2017-06-02 03:12:20  NA
2017-06-02 03:13:20   2
2017-06-02 03:14:20  NA
2017-06-02 03:15:20   3
2017-06-02 03:16:20  NA
2017-06-02 03:17:20  NA
2017-06-02 03:18:20   4

na.approx(df2)
                         df1
2017-06-02 03:10:20 1.000000
2017-06-02 03:11:20 1.333333
2017-06-02 03:12:20 1.666667
2017-06-02 03:13:20 2.000000
2017-06-02 03:14:20 2.500000
2017-06-02 03:15:20 3.000000
2017-06-02 03:16:20 3.333333
2017-06-02 03:17:20 3.666667
2017-06-02 03:18:20 4.000000
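
As a cross-check, the same minute-by-minute values can be reproduced with base R's approx, without any packages. This is only a sketch: the fixed start time, the UTC time zone and the variable names are assumptions made so the example is reproducible.

```r
# Base-R cross-check of the merge + na.approx workflow (no packages needed).
# The start time and time zone are assumed values chosen for reproducibility.
start_time  <- as.POSIXct("2017-06-02 03:10:20", tz = "UTC")
obs_time    <- start_time + c(0, 3, 5, 8) * 60   # observations at minutes 0, 3, 5, 8
obs_value   <- 1:4

# Complete one-minute grid between the first and last observation
minute_grid <- seq(min(obs_time), max(obs_time), by = "min")

# approx works on plain numbers, so convert the timestamps to seconds
interp <- approx(x = as.numeric(obs_time), y = obs_value,
                 xout = as.numeric(minute_grid))

result <- data.frame(time = minute_grid, value = interp$y)
print(result)
```

The value column should agree with what na.approx returns on the merged xts object, since both perform linear interpolation over the same time index.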

How to interpolate data with R?

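We can use complete from tidyr to expand the data so that every 'ent' has a row for each 'year' in the range, and then use na.approx to linearly interpolate the missing values in 'pobtot'. Grouping by 'ent' keeps the interpolation from running across entities, and na.rm = FALSE keeps leading and trailing NAs so the column length does not change.

library(dplyr)
library(tidyr)
z %>% 
   complete(ent, year = 1995:2008) %>% 
   group_by(ent) %>% 
   mutate(pobtot = zoo::na.approx(pobtot, na.rm = FALSE)) %>% 
   ungroup()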
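
Since z itself is not shown, here is a small self-contained sketch of the same idea on made-up data. The data frame z_demo, its values and the shorter year range are all hypothetical; only the column names follow the answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: two entities, each with gaps in the yearly series
z_demo <- data.frame(
  ent    = c(1L, 1L, 2L, 2L),
  year   = c(1995L, 1997L, 1995L, 1998L),
  pobtot = c(100, 120, 50, 80)
)

z_filled <- z_demo %>%
  complete(ent, year = 1995:1998) %>%   # one row per ent/year combination
  group_by(ent) %>%                     # interpolate within each entity only
  mutate(pobtot = zoo::na.approx(pobtot, na.rm = FALSE)) %>%
  ungroup()

print(z_filled)
```

Entity 1 has no observation after 1997, so its 1998 value stays NA rather than being extrapolated.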

How can I interpolate each row of a data frame in R?

ting = "wavelength       plt1       plt2
404.502 0.01451395 0.01394186
411.006 0.01538372 0.01455814
989.878 0.25398372 0.25955116
996.382 0.25714419 0.25986279"

data <- read.table(text = ting, header = TRUE)

interpolated1 <- apply(data, 2, function(x) approx(x, y=NULL, method = "linear", n = 371)$y)

dim(interpolated1)
# [1] 371   3

head(interpolated1)
# wavelength       plt1       plt2
# [1,]   404.5020 0.01451395 0.01394186
# [2,]   404.5547 0.01452100 0.01394686
# [3,]   404.6075 0.01452805 0.01395185
# [4,]   404.6602 0.01453511 0.01395685
# [5,]   404.7129 0.01454216 0.01396185
# [6,]   404.7657 0.01454921 0.01396684
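
The call above interpolates every column, including wavelength, against its row index. If the goal is instead to read the measurements off at chosen wavelengths, a possible variant is to pass wavelength as x and request specific xout values. The target wavelengths and the names spectra, target and on_grid below are hypothetical.

```r
ting <- "wavelength       plt1       plt2
404.502 0.01451395 0.01394186
411.006 0.01538372 0.01455814
989.878 0.25398372 0.25955116
996.382 0.25714419 0.25986279"
spectra <- read.table(text = ting, header = TRUE)

# Hypothetical target wavelengths; they must lie inside the measured range,
# because approx returns NA outside it with its default rule = 1
target <- c(405, 410, 990)

# Interpolate each measurement column against wavelength
interp_cols <- lapply(spectra[-1], function(y) {
  approx(x = spectra$wavelength, y = y, xout = target)$y
})

on_grid <- data.frame(wavelength = target, as.data.frame(interp_cols))
print(on_grid)
```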

Interpolation of time series data with specific output time

Try this (assuming your time index is POSIXct):

library(zoo)
st <- as.POSIXct("2012-01-21 18:45")
g <- seq(st, end(z), by = "15 min") # grid
na.approx(z, xout = g)

See ?na.approx.zoo for more info.

Note: Since the question did not provide the data in reproducible form we do so here:

Lines <- "Id date Time humid humtemp prtemp press t1
1   2012-01-21 18:41:50     47.7    14.12   13.870  1005.70     -0.05277778
1   2012-01-21 18:46:43     44.5    15.37   15.100  1005.20     0.02861111
1   2012-01-21 18:51:35     43.2    15.88   15.576  1005.10     0.10972222
1   2012-01-21 18:56:28     42.5    16.17   15.833  1004.90     0.19111111
1   2012-01-21 19:01:21     42.2    16.31   15.986  1004.80     0.27250000
1   2012-01-21 19:06:14     41.8    16.47   16.118  1004.60     0.35388889
1   2012-01-21 19:11:07     41.6    16.51   16.177  1004.60     0.43527778"

library(zoo)
z <- read.zoo(text = Lines, header = TRUE, index = 2:3, tz = "")
st <- as.POSIXct("2012-01-21 18:45")
g <- seq(st, end(z), by = "15 min") # grid
na.approx(z, xout = g)

giving:

                    Id    humid  humtemp   prtemp    press            t1
2012-01-21 18:45:00  1 45.62491 14.93058 14.66761 1005.376 -1.501706e-09
2012-01-21 19:00:00  1 42.28294 16.27130 15.94370 1004.828  2.500000e-01
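
To see where the 18:45:00 row comes from, the humid value can be recomputed by hand from the two observations that bracket it. This is a sketch of the arithmetic only; the variable names are made up and UTC is assumed purely so the time differences are unambiguous.

```r
# Observations on either side of the 18:45:00 grid point (from the data above)
t_before <- as.POSIXct("2012-01-21 18:41:50", tz = "UTC")
t_after  <- as.POSIXct("2012-01-21 18:46:43", tz = "UTC")
t_out    <- as.POSIXct("2012-01-21 18:45:00", tz = "UTC")

humid_before <- 47.7
humid_after  <- 44.5

# Fraction of the interval that has elapsed at t_out (190 s out of 293 s)
w <- as.numeric(difftime(t_out, t_before, units = "secs")) /
     as.numeric(difftime(t_after, t_before, units = "secs"))

# Linear interpolation: start value plus that fraction of the change
humid_out <- humid_before + w * (humid_after - humid_before)
print(humid_out)
```

The result is roughly 45.625, in line with the humid column of the first output row.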

Interpolate to obtain a certain value in R

I believe what you want is this. At the moment there is nothing to interpolate, so you need an NA column first. You can append one after the second position.

(d <- as.data.frame(append(d, list(X2=NA), 2)))
#         Date         X1 X2         X3         X4         X5       X6
# 1 2020-02-10 0.04919382 NA 0.04962555 0.04579872 0.03546890 0.048592
# 2 2020-05-20 0.04909930 NA 0.04957330 0.04587720 0.04741000 0.052167
# 3 2020-08-12 0.04909930 NA 0.04957330 0.04525272 0.03554400 0.045489
# 4 2020-10-18 0.04915135 NA 0.04957330 0.04782200 0.03485484 0.024452

Now apply the approx function row-wise, i.e. with MARGIN=1. For each row x, approx gets the positions seq_along(x) as the x-coordinates, the row itself (including the NA) as the y-coordinates, and the same positions again as the points at which to interpolate. From the output you want the y component. Because apply returns the rows as the columns of a (transposed) matrix, the interpolated X2 values are in row [2,].

d$X2 <- apply(d[-1], MARGIN=1, function(x) approx(seq_along(x), x, seq_along(x))$y)[2,]
d
#         Date         X1         X2         X3         X4         X5       X6
# 1 2020-02-10 0.04919382 0.04940968 0.04962555 0.04579872 0.03546890 0.048592
# 2 2020-05-20 0.04909930 0.04933630 0.04957330 0.04587720 0.04741000 0.052167
# 3 2020-08-12 0.04909930 0.04933630 0.04957330 0.04525272 0.03554400 0.045489
# 4 2020-10-18 0.04915135 0.04936232 0.04957330 0.04782200 0.03485484 0.024452
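
The transposition is easy to miss, so here is a minimal sketch on a two-row matrix built from the first three value columns of the data. The names vals, fill_row, raw and filled are hypothetical helpers, not part of the answer.

```r
# Two rows with a missing middle value (taken from X1, X2, X3 of the data)
vals <- rbind(
  c(0.04919382, NA, 0.04962555),
  c(0.04909930, NA, 0.04957330)
)

# Interpolate one row over its positions 1, 2, 3; approx ignores the NA pair
fill_row <- function(x) approx(seq_along(x), x, xout = seq_along(x))$y

raw    <- apply(vals, 1, fill_row)  # each input row becomes a COLUMN: 3 x 2
filled <- t(raw)                    # transpose back to the original layout: 2 x 3

print(dim(raw))
print(filled)
```

Because the positions are equally spaced, each filled value is simply the midpoint of its two neighbours.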

Data:

d <- structure(list(Date = structure(c(18302, 18402, 18486, 18553), class = "Date"), 
    X1 = c(0.04919382, 0.0490993, 0.0490993, 0.04915135), X3 = c(0.04962555, 
    0.0495733, 0.0495733, 0.0495733), X4 = c(0.04579872, 0.0458772, 
    0.04525272, 0.047822), X5 = c(0.0354689, 0.04741, 0.035544, 
    0.03485484), X6 = c(0.048592, 0.052167, 0.045489, 0.024452
    )), row.names = c(NA, -4L), class = "data.frame")

Linear interpolation of Panel Data in R over multiple columns

I'd write a function that interpolates an arbitrary variable within each country group, then map() it over all the variables and join the results back together.

library(tidyverse)
data("population")

# create some data to interpolate
population_5 <- population %>% 
  filter(year %% 5 == 0) %>% 
  mutate(female_pop = population / 2,
         male_pop = population / 2)

interpolate_func <- function(variable, data) {
  data %>% 
    group_by(country) %>% 
    # can't interpolate if only one year
    filter(n() >= 2) %>% 
    group_modify(~as_tibble(approx(.x$year, .x[[variable]], 
                                   xout = min(.x$year):max(.x$year)))) %>% 
    set_names("country", "year", paste0(variable, "_interpolated")) %>% 
    ungroup()
}

vars_to_interpolate <- names(select(population_5, -country, -year))

map(vars_to_interpolate, interpolate_func, 
    data = population_5) %>% 
  reduce(full_join, by = c("country", "year"))

#> # A tibble: 3,395 × 5
#>    country      year population_interpolated female_pop_interp… male_pop_interp…
#>    <chr>       <int>                   <dbl>              <dbl>            <dbl>
#>  1 Afghanistan  1995               17586073            8793036.         8793036.
#>  2 Afghanistan  1996               18187930.           9093965.         9093965.
#>  3 Afghanistan  1997               18789788.           9394894.         9394894.
#>  4 Afghanistan  1998               19391645.           9695823.         9695823.
#>  5 Afghanistan  1999               19993503.           9996751.         9996751.
#>  6 Afghanistan  2000               20595360           10297680         10297680 
#>  7 Afghanistan  2001               21448459           10724230.        10724230.
#>  8 Afghanistan  2002               22301558           11150779         11150779 
#>  9 Afghanistan  2003               23154657           11577328.        11577328.
#> 10 Afghanistan  2004               24007756           12003878         12003878 
#> # … with 3,385 more rows

Created on 2022-06-01 by the reprex package (v2.0.1)

Interpolation of panel data based on a flag column in r

You can replace c values where flag is TRUE with NA and then interpolate values with na.approx from zoo.

library(dplyr)

df %>% mutate(c = zoo::na.approx(replace(c, flag, NA)))

#  year id  c  flag
#1 2012  1 11 FALSE
#2 2013  1 12  TRUE
#3 2014  1 13 FALSE
#4 2012  2 16 FALSE
#5 2013  2 15 FALSE
#6 2014  2 15 FALSE
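
For panel data it may be safer to interpolate within each id, so that values are never interpolated across units. The sketch below rebuilds a data frame consistent with the output above. The name df_demo and the original value of the flagged cell (99) are made up, since the input data is not shown.

```r
library(dplyr)

# Hypothetical input; 99 stands in for the unknown flagged value
df_demo <- data.frame(
  year = rep(2012:2014, times = 2),
  id   = rep(1:2, each = 3),
  c    = c(11, 99, 13, 16, 15, 15),
  flag = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)
)

res <- df_demo %>%
  group_by(id) %>%
  # blank out flagged values, then interpolate within each id;
  # na.rm = FALSE keeps the column length if an edge value is flagged
  mutate(c = zoo::na.approx(replace(c, flag, NA), na.rm = FALSE)) %>%
  ungroup()

print(res)
```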

Interpolating data only between first and last observation in R

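Try using the na.approx function in the "zoo" package, applied within each country.

library("dplyr")
library("zoo")
data %>% group_by(country) %>% mutate(int = na.approx(value, na.rm=FALSE))

Hope this is what you are looking for. With na.rm=FALSE, NAs before the first or after the last observation are left untouched, so this would keep the NA in country 1 as NA.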
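
The effect of na.rm can be seen on a single made-up vector. The names value, inside_only and dropped are hypothetical.

```r
library(zoo)

# Hypothetical series with NAs before the first and after the last observation
value <- c(NA, 2, NA, 4, NA)

# na.rm = FALSE: only the gap between observations is filled, edges stay NA
inside_only <- na.approx(value, na.rm = FALSE)

# Default (na.rm = TRUE): the leading and trailing NAs are dropped,
# so the result is shorter than the input
dropped <- na.approx(value)

print(inside_only)
print(dropped)
```

The shorter result is why the default cannot be assigned directly inside mutate when a group starts or ends with NA.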

Linear interpolation in R for columns

You could use na.approx from the zoo package. According to its documentation, it is one of the:

Generic functions for replacing each NA with interpolated values.

Code:

df <- data.frame(rates = c(0.66, NA, NA, NA, 0.77, 0.75, 0.79, NA, NA, 0.79))

library(zoo)
df$new_rates <- na.approx(df$rates)
df
#>    rates new_rates
#> 1   0.66    0.6600
#> 2     NA    0.6875
#> 3     NA    0.7150
#> 4     NA    0.7425
#> 5   0.77    0.7700
#> 6   0.75    0.7500
#> 7   0.79    0.7900
#> 8     NA    0.7900
#> 9     NA    0.7900
#> 10  0.79    0.7900

Created on 2022-07-05 by the reprex package (v2.0.1)
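
To interpolate several columns at once, one option is to apply na.approx to every column with lapply. This is a sketch on made-up data; df_demo and its values are hypothetical, and na.rm = FALSE is used so that columns starting or ending with NA keep their length.

```r
library(zoo)

# Hypothetical data frame with gaps in more than one column
df_demo <- data.frame(
  a = c(1, NA, 3, NA),
  b = c(NA, 10, NA, 30)
)

# df_demo[] keeps the data frame structure while replacing every column
df_demo[] <- lapply(df_demo, na.approx, na.rm = FALSE)

print(df_demo)
```

Interior gaps are filled, while the trailing NA in a and the leading NA in b remain.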
