How to interpolate data in R
To deal with minute data, I would recommend using the xts package and the function na.approx from the zoo package. In a nutshell, you create an empty series on a regular one-minute index and merge it with your original data. Then you can use na.approx to approximate the missing values.
library(xts)  # also attaches zoo, which provides na.approx

# Initial data, not spaced by minute
date_time_init <- Sys.time() + c(0, 3, 5, 8) * 60
df1 <- xts(1:4, date_time_init)
df1
#                     [,1]
# 2017-06-02 03:10:20    1
# 2017-06-02 03:13:20    2
# 2017-06-02 03:15:20    3
# 2017-06-02 03:18:20    4
# Create a time sequence by minute
date_time_complete <- seq.POSIXt(from = min(date_time_init),
                                 to = max(date_time_init), by = "min")
# Merge the initial data with the new (empty) time sequence
df2 <- merge(df1, xts(, date_time_complete))
df2
#                     df1
# 2017-06-02 03:10:20   1
# 2017-06-02 03:11:20  NA
# 2017-06-02 03:12:20  NA
# 2017-06-02 03:13:20   2
# 2017-06-02 03:14:20  NA
# 2017-06-02 03:15:20   3
# 2017-06-02 03:16:20  NA
# 2017-06-02 03:17:20  NA
# 2017-06-02 03:18:20   4
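# Interpolate the NAs linearly in time
# (this output is from a separate run, so its timestamps differ from those above)
na.approx(df2)
#                          df1
# 2017-06-02 03:07:24 1.000000
# 2017-06-02 03:08:24 1.333333
# 2017-06-02 03:09:24 1.666667
# 2017-06-02 03:10:24 2.000000
# 2017-06-02 03:11:24 2.500000
# 2017-06-02 03:12:24 3.000000
# 2017-06-02 03:13:24 3.333333
# 2017-06-02 03:14:24 3.666667
# 2017-06-02 03:15:24 4.000000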
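Because the example above relies on Sys.time(), its timestamps change on every run. The following is a minimal reproducible sketch of the same merge-then-interpolate idea, assuming a fixed, made-up start time in UTC; the names start, obs, grid and filled are only illustrative.

```r
library(xts)  # attaches zoo as well, which provides na.approx()

# Fixed, hypothetical start time so that the result is reproducible
start <- as.POSIXct("2017-06-02 03:10:00", tz = "UTC")
obs_times <- start + c(0, 3, 5, 8) * 60
obs <- xts(1:4, order.by = obs_times)

# Regular one-minute grid spanning the observations
grid <- seq(min(obs_times), max(obs_times), by = "min")

# Merging with an empty xts object adds NA rows at the new timestamps;
# na.approx() then fills them by linear interpolation along the time index
filled <- na.approx(merge(obs, xts(, grid)))
as.numeric(filled)
```

The interpolated values should match the ones in the output above, since only the timestamps differ.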
How to interpolate data with R?
We can use complete to expand the data to every combination of 'ent' and the 'year' range, then interpolate the missing values in 'pobtot' with na.approx.
library(dplyr)
library(tidyr)
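z %>%
  complete(ent, year = 1995:2008) %>%
  # group so that values are never interpolated across different 'ent'
  group_by(ent) %>%
  mutate(pobtot = zoo::na.approx(pobtot, na.rm = FALSE)) %>%
  ungroup()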
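The question's data frame z is not shown, so here is a small sketch with invented data to illustrate what complete followed by na.approx does. The values are hypothetical, and the group_by(ent) step is an assumption added here so that interpolation does not run across entity boundaries.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: two entities with gaps in the year sequence
z <- tibble(
  ent    = c("A", "A", "B", "B"),
  year   = c(1995L, 1998L, 1995L, 1997L),
  pobtot = c(100, 130, 200, 220)
)

z_filled <- z %>%
  complete(ent, year = 1995:1998) %>%   # adds the missing ent/year rows as NA
  group_by(ent) %>%                     # interpolate within each entity only
  mutate(pobtot = zoo::na.approx(pobtot, na.rm = FALSE)) %>%
  ungroup()

z_filled
```

Entity "B" has no observation after 1997, so its 1998 value remains NA: with na.rm = FALSE, na.approx fills interior gaps only and does not extrapolate.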
How can I interpolate each row of a data frame in R?
ting = "wavelength plt1 plt2
404.502 0.01451395 0.01394186
411.006 0.01538372 0.01455814
989.878 0.25398372 0.25955116
996.382 0.25714419 0.25986279"
data <- read.table(text = ting, header = TRUE)
interpolated1 <- apply(data, 2, function(x) approx(x, y=NULL, method = "linear", n = 371)$y)
dim(interpolated1)
# [1] 371 3
head(interpolated1)
# wavelength plt1 plt2
# [1,] 404.5020 0.01451395 0.01394186
# [2,] 404.5547 0.01452100 0.01394686
# [3,] 404.6075 0.01452805 0.01395185
# [4,] 404.6602 0.01453511 0.01395685
# [5,] 404.7129 0.01454216 0.01396185
# [6,] 404.7657 0.01454921 0.01396684
Interpolation of time series data with specific output time
Try this (assuming your time index is POSIXct):
library(zoo)
st <- as.POSIXct("2012-01-21 18:45")
g <- seq(st, end(z), by = "15 min") # grid
na.approx(z, xout = g)
See ?na.approx.zoo for more info.
Note: Since the question did not provide the data in reproducible form we do so here:
Lines <- "Id date Time humid humtemp prtemp press t1
1 2012-01-21 18:41:50 47.7 14.12 13.870 1005.70 -0.05277778
1 2012-01-21 18:46:43 44.5 15.37 15.100 1005.20 0.02861111
1 2012-01-21 18:51:35 43.2 15.88 15.576 1005.10 0.10972222
1 2012-01-21 18:56:28 42.5 16.17 15.833 1004.90 0.19111111
1 2012-01-21 19:01:21 42.2 16.31 15.986 1004.80 0.27250000
1 2012-01-21 19:06:14 41.8 16.47 16.118 1004.60 0.35388889
1 2012-01-21 19:11:07 41.6 16.51 16.177 1004.60 0.43527778"
library(zoo)
z <- read.zoo(text = Lines, header = TRUE, index = 2:3, tz = "")
st <- as.POSIXct("2012-01-21 18:45")
g <- seq(st, end(z), by = "15 min") # grid
na.approx(z, xout = g)
giving:
Id humid humtemp prtemp press t1
2012-01-21 18:45:00 1 45.62491 14.93058 14.66761 1005.376 -1.501706e-09
2012-01-21 19:00:00 1 42.28294 16.27130 15.94370 1004.828 2.500000e-01
Interpolate to obtaining certain value in R
I believe what you want is this. At the moment there is nothing to interpolate; you need an NA column first. You could append one after the second position.
(d <- as.data.frame(append(d, list(X2=NA), 2)))
# Date X1 X2 X3 X4 X5 X6
# 1 2020-02-10 0.04919382 NA 0.04962555 0.04579872 0.03546890 0.048592
# 2 2020-05-20 0.04909930 NA 0.04957330 0.04587720 0.04741000 0.052167
# 3 2020-08-12 0.04909930 NA 0.04957330 0.04525272 0.03554400 0.045489
# 4 2020-10-18 0.04915135 NA 0.04957330 0.04782200 0.03485484 0.024452
Now you want to apply the approx function row-wise, i.e. with MARGIN=1. The logic is that you feed it a sequence as long as the values to interpolate, i.e. seq(x) for the x of each apply iteration, as well as the values containing the NAs, which is x itself. From the output you want the y component and, because apply returns the whole (transposed) matrix, just row [2,].
d$X2 <- apply(d[-1], MARGIN=1, function(x) approx(seq(x), x, seq(x))$y)[2,]
d
# Date X1 X2 X3 X4 X5 X6
# 1 2020-02-10 0.04919382 0.04940968 0.04962555 0.04579872 0.03546890 0.048592
# 2 2020-05-20 0.04909930 0.04933630 0.04957330 0.04587720 0.04741000 0.052167
# 3 2020-08-12 0.04909930 0.04933630 0.04957330 0.04525272 0.03554400 0.045489
# 4 2020-10-18 0.04915135 0.04936232 0.04957330 0.04782200 0.03485484 0.024452
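To see what happens inside a single apply iteration, here is a small base-R sketch for one row, using the first four values of row 1 above. seq_along(x) is used in place of seq(x); for a vector of this length both give 1:4.

```r
# One row with a gap at position 2 (values taken from row 1 above)
x <- c(X1 = 0.04919382, X2 = NA, X3 = 0.04962555, X4 = 0.04579872)

# approx() drops the (position, NA) pair and then evaluates the
# piecewise-linear interpolant at every position 1..4
res <- approx(x = seq_along(x), y = x, xout = seq_along(x))

res$y     # position 2 is now the midpoint of X1 and X3
res$y[2]
```

Since position 2 lies exactly halfway between positions 1 and 3, the filled value is the mean of X1 and X3, which agrees with the X2 entry for the first row in the output above.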
Data:
d <- structure(list(Date = structure(c(18302, 18402, 18486, 18553), class = "Date"),
X1 = c(0.04919382, 0.0490993, 0.0490993, 0.04915135), X3 = c(0.04962555,
0.0495733, 0.0495733, 0.0495733), X4 = c(0.04579872, 0.0458772,
0.04525272, 0.047822), X5 = c(0.0354689, 0.04741, 0.035544,
0.03485484), X6 = c(0.048592, 0.052167, 0.045489, 0.024452
)), row.names = c(NA, -4L), class = "data.frame")
Linear interpolation of Panel Data in R over multiple columns
I'd write a function to interpolate an arbitrary variable within each country group, then map() it over all the variables and join the results back together.
library(tidyverse)
data("population")
# create some data to interpolate
population_5 <- population %>%
  filter(year %% 5 == 0) %>%
  mutate(female_pop = population / 2,
         male_pop = population / 2)
interpolate_func <- function(variable, data) {
  data %>%
    group_by(country) %>%
    # can't interpolate if only one year
    filter(n() >= 2) %>%
    group_modify(~as_tibble(approx(.x$year, .x[[variable]],
                                   xout = min(.x$year):max(.x$year)))) %>%
    set_names("country", "year", paste0(variable, "_interpolated")) %>%
    ungroup()
}
vars_to_interpolate <- names(select(population_5, -country, -year))
map(vars_to_interpolate, interpolate_func,
    data = population_5) %>%
  reduce(full_join, by = c("country", "year"))
#> # A tibble: 3,395 × 5
#> country year population_interpolated female_pop_interp… male_pop_interp…
#> <chr> <int> <dbl> <dbl> <dbl>
#> 1 Afghanistan 1995 17586073 8793036. 8793036.
#> 2 Afghanistan 1996 18187930. 9093965. 9093965.
#> 3 Afghanistan 1997 18789788. 9394894. 9394894.
#> 4 Afghanistan 1998 19391645. 9695823. 9695823.
#> 5 Afghanistan 1999 19993503. 9996751. 9996751.
#> 6 Afghanistan 2000 20595360 10297680 10297680
#> 7 Afghanistan 2001 21448459 10724230. 10724230.
#> 8 Afghanistan 2002 22301558 11150779 11150779
#> 9 Afghanistan 2003 23154657 11577328. 11577328.
#> 10 Afghanistan 2004 24007756 12003878 12003878
#> # … with 3,385 more rows
Created on 2022-06-01 by the reprex package (v2.0.1)
Interpolation of panel data based on a flag column in r
You can replace the c values where flag is TRUE with NA, and then interpolate those values with na.approx from zoo.
library(dplyr)
df %>% mutate(c = zoo::na.approx(replace(c, flag, NA)))
# year id c flag
#1 2012 1 11 FALSE
#2 2013 1 12 TRUE
#3 2014 1 13 FALSE
#4 2012 2 16 FALSE
#5 2013 2 15 FALSE
#6 2014 2 15 FALSE
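The input data frame is not shown in the answer, so the following sketch uses a hypothetical df in which the flagged value (99) is a made-up placeholder. It also adds group_by(id) and na.rm = FALSE, both assumptions on my part: grouping keeps the interpolation within each panel unit, and na.rm = FALSE avoids a length mismatch if a flagged value sits at the start or end of a group.

```r
library(dplyr)

# Hypothetical panel data; the flagged value 99 is a placeholder to be replaced
df <- tibble(
  year = rep(2012:2014, times = 2),
  id   = rep(1:2, each = 3),
  c    = c(11, 99, 13, 16, 15, 15),
  flag = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)
)

df_fixed <- df %>%
  group_by(id) %>%   # interpolate within each id only
  mutate(c = zoo::na.approx(replace(c, flag, NA), na.rm = FALSE)) %>%
  ungroup()

df_fixed
```

In this toy data the flagged 2013 value for id 1 is replaced by 12, the midpoint of its neighbours 11 and 13.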
Interpolating data only between first and last observation in R
Try using the na.approx function in the "zoo" package, applied per country (the pipe and the grouping verbs come from dplyr).
library(dplyr)
library(zoo)
data %>% group_by(country) %>% mutate(int = na.approx(value, na.rm = FALSE))
With na.rm = FALSE, leading and trailing NAs are left untouched, so the NA in country 1 stays NA.
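The effect of na.rm can be checked on a plain vector. This is a minimal sketch with made-up values; value, short and kept are illustrative names.

```r
library(zoo)

# Made-up series with a leading NA, an interior gap and a trailing NA
value <- c(NA, 1, NA, 3, NA)

# Default (na.rm = TRUE): leading/trailing NAs are dropped, so the result
# is shorter than the input, which breaks mutate() on a data frame column
short <- na.approx(value)
length(short)

# na.rm = FALSE: same length as the input, only the interior gap is filled
kept <- na.approx(value, na.rm = FALSE)
kept
```

This is why na.rm = FALSE is the safer choice inside mutate(): the result always has the same length as the column it replaces.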
Linear interpolation in R for columns
You could use na.approx from the zoo package. Its documentation describes it as:
"Generic functions for replacing each NA with interpolated values."
Code:
df <- data.frame(rates = c(0.66, NA, NA, NA, 0.77, 0.75, 0.79, NA, NA, 0.79))
library(zoo)
df$new_rates <- na.approx(df$rates)
df
#> rates new_rates
#> 1 0.66 0.6600
#> 2 NA 0.6875
#> 3 NA 0.7150
#> 4 NA 0.7425
#> 5 0.77 0.7700
#> 6 0.75 0.7500
#> 7 0.79 0.7900
#> 8 NA 0.7900
#> 9 NA 0.7900
#> 10 0.79 0.7900
Created on 2022-07-05 by the reprex package (v2.0.1)