Fill Na in a Time Series Only to a Limited Number

Here's another way:

l <- cumsum(! is.na(x))
c(NA, x[! is.na(x)])[replace(l, ave(l, l, FUN=seq_along) > 4, 0) + 1]
# [1]  1  1  1  1  5  5  5  5 NA NA 11 12 12 12 12 NA NA NA 19 20

edit: my previous answer required that x have no duplicates. The current answer does not.
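
For readers who find the indexing one-liner dense, here is a rough plain-Python sketch of the same idea. The function name, the use of None for NA, and the sample data are my own assumptions, not part of the answer. A running count of observed values tells each position which observation it belongs to, and positions too far into their group are blanked out:

```python
from itertools import accumulate

def fill_forward_limited(values, limit):
    """Forward-fill None entries, at most `limit` per gap (hypothetical helper)."""
    # Running count of observed values: each position gets the index of the
    # last observation seen so far (0 means "nothing observed yet").
    group = list(accumulate(0 if v is None else 1 for v in values))
    # Lookup table with a leading None so that group 0 maps to "missing".
    observed = [None] + [v for v in values if v is not None]
    out, prev, pos_in_group = [], None, 0
    for g in group:
        pos_in_group = pos_in_group + 1 if g == prev else 1
        prev = g
        # Position 1 is the observation itself, so allow `limit` copies after it.
        out.append(observed[g] if pos_in_group <= limit + 1 else None)
    return out

# Made-up data: a gap of five followed by a gap of one.
print(fill_forward_limited([1, None, None, None, None, None, 7, None, 9], 3))
```

With limit = 3 this matches the `> 4` threshold in the R code: the observation plus three filled positions.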

benchmarks

library(zoo)             # coredata(), used in agstudy()
library(microbenchmark)

x <- rep(x, length.out=1e4)

plourde <- function(x) {
    l <- cumsum(! is.na(x))
    c(NA, x[! is.na(x)])[replace(l, ave(l, l, FUN=seq_along) > 4, 0) + 1]
}

agstudy <- function(x) {
    unlist(sapply(split(coredata(x),cumsum(!is.na(x))),
           function(sx){
             if(length(sx)>3) 
               sx[2:4] <- rep(sx[1],3)
             else sx <- rep(sx[1],length(sx))
             sx
           }))
}

microbenchmark(plourde(x), agstudy(x))
# Unit: milliseconds
#        expr   min     lq median     uq   max neval
#  plourde(x)  5.30  5.591  6.409  6.774 57.13   100
#  agstudy(x) 16.04 16.249 16.454 17.516 20.64   100

Fill NAs in a time series for specific number of dates only

rollapplyr in zoo can take a vector width. We can set each of its elements to be the number of points from the current point to a month ago (i.e. the current position minus the position a month ago) and then rollapplyr can apply na.locf0 to only the last month.

library(lubridate)
library(xts)

tt <- time(tempY)
w <- seq_along(tt) - findInterval(tt %m-% months(1), tt)
Fill <- function(x) {
  last <- tail(x, 1)
  if (is.na(last)) tail(na.locf0(x), 1) else last
}

rollapplyr(as.zoo(tempY), w, Fill)
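
The same rule (carry the last observation forward only while it is less than a month old) can be sketched in plain Python. This is only an illustration: the function name is made up, timestamps are assumed to be sorted, and a fixed 30-day timedelta stands in for the calendar month that %m-% computes.

```python
from datetime import date, timedelta

def fill_within_window(times, values, window=timedelta(days=30)):
    """Carry the last observation forward only while it is younger than `window`.

    Hypothetical helper; `times` must be sorted ascending and None marks a gap.
    """
    out = []
    last_time, last_value = None, None
    for t, v in zip(times, values):
        if v is not None:
            # A real observation resets the clock.
            last_time, last_value = t, v
            out.append(v)
        elif last_time is not None and t - last_time < window:
            # Still inside the window: reuse the last observation.
            out.append(last_value)
        else:
            # Too old (or nothing seen yet): leave the gap.
            out.append(None)
    return out

times = [date(2020, 1, 1), date(2020, 1, 15), date(2020, 2, 20), date(2020, 2, 25)]
print(fill_within_window(times, [1.0, None, None, 4.0]))
```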

How to fill down values with limit in R?

One potential solution is to use vec_fill_missing() from the vctrs package, which has a max_fill argument that caps how many consecutive missing values are filled:

library(tidyverse)
library(vctrs)

df <- read.table(text = "DATE                      ELE.CN
 2009-06-30 00:00:00 115942928608
 2009-06-28 00:00:00 115942928608
 2009-06-27 00:00:00 115942928608
 2009-06-26 00:00:00 115942928608
 2009-06-24 00:00:00           NA
 2009-06-23 00:00:00           NA
 2009-06-21 00:00:00           NA
 2009-06-20 00:00:00           NA
 2009-06-19 00:00:00           NA
 2009-06-17 00:00:00           NA", header = TRUE)
df
#>                DATE       ELE.CN
#> 2009-06-30 00:00:00 115942928608
#> 2009-06-28 00:00:00 115942928608
#> 2009-06-27 00:00:00 115942928608
#> 2009-06-26 00:00:00 115942928608
#> 2009-06-24 00:00:00           NA
#> 2009-06-23 00:00:00           NA
#> 2009-06-21 00:00:00           NA
#> 2009-06-20 00:00:00           NA
#> 2009-06-19 00:00:00           NA
#> 2009-06-17 00:00:00           NA

df %>%
  mutate(ELE.CN = vec_fill_missing(ELE.CN, max_fill = 3))
#>                DATE       ELE.CN
#> 2009-06-30 00:00:00 115942928608
#> 2009-06-28 00:00:00 115942928608
#> 2009-06-27 00:00:00 115942928608
#> 2009-06-26 00:00:00 115942928608
#> 2009-06-24 00:00:00 115942928608
#> 2009-06-23 00:00:00 115942928608
#> 2009-06-21 00:00:00 115942928608
#> 2009-06-20 00:00:00           NA
#> 2009-06-19 00:00:00           NA
#> 2009-06-17 00:00:00           NA

Created on 2022-07-14 by the reprex package (v2.0.1)

How to fillna limited by date in a groupby

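You can group the dataframe on the columns Security and ID, together with an additional grouper on the column day with its frequency set to 60 days, and then use ffill to forward fill the values within each group. Note that the grouper cuts the dates into fixed 60-day bins, so a value is carried forward only to later rows that fall in the same bin: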

g = pd.Grouper(key='day', freq='60d')
df.assign(**df.groupby(["Security","ID", g]).ffill())

         day TradeID Security     ID BSType    Price   Quantity
0 2020-01-01      01   GOOGLE  ID001      B  105.901  1000000.0
1 2020-01-03      02   GOOGLE  ID001      S  106.969  -300000.0
2 2020-01-04      03    APPLE  ID001      B      NaN        NaN
3 2020-01-05      04   GOOGLE  ID001      B  107.037  7500000.0
4 2020-01-06      05   GOOGLE  ID001      B  107.038   100000.0
5 2020-01-07      06   GOOGLE  ID001      S  107.136  -100000.0
6 2020-01-08      07   GOOGLE  ID001      S  107.136  -100000.0
7 2020-01-08      08   GOOGLE  ID001      S  107.250 -7800000.0
8 2020-06-09      09   GOOGLE  ID001      B      NaN        NaN
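
To see why a row can stay NaN even when an earlier value exists, here is a small plain-Python sketch of forward filling inside fixed-width bins. It is a simplification under my own assumptions: a single Security/ID group, sorted dates, None for NaN, and bins anchored at the earliest date (which is my reading of the grouper's default behaviour). The function name and data are made up.

```python
from datetime import date

def ffill_within_bins(days, values, width_days=60):
    """Forward-fill None, but never across a fixed-width date bin boundary.

    Hypothetical helper; `days` must be sorted ascending.
    """
    first_day = min(days)
    out, last_bin, last_value = [], None, None
    for d, v in zip(days, values):
        # Fixed bins: 0 for days 0..59 after the first date, 1 for 60..119, ...
        b = (d - first_day).days // width_days
        if b != last_bin:
            # New bin: forget whatever was carried so far.
            last_bin, last_value = b, None
        if v is not None:
            last_value = v
        out.append(last_value)
    return out

days = [date(2020, 1, 1), date(2020, 1, 5), date(2020, 2, 28),
        date(2020, 3, 2), date(2020, 3, 5)]
print(ffill_within_bins(days, [10.0, None, None, None, 7.0]))
```

In this made-up example 2020-03-02 is only three days after 2020-02-28, yet it is not filled because it falls into the next bin. If a rolling window after each observation is what you need, a bin-based grouper is not equivalent.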

Backward replacement of NAs in time series only to a limited number of observations

Hacky, but why not just flip your column?

Code

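library(zoo)  # provides na.locf()

# Using your result as basis: reverse the column so a backward fill becomes a forward fill
dt$Value <- rev(dt$Value)
r <- rle(is.na(dt$Value))
dt$backward <- na.locf(dt$Value, fromLast = FALSE, na.rm = FALSE)
# Set positions beyond the third within each run of NAs back to NA
is.na(dt$backward) <- sequence(r$lengths) > 3 & rep(r$values, r$lengths)
# Restore the original order
dt$Value <- rev(dt$Value)
dt$backward <- rev(dt$backward)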

Result

> dt
        Value   forward  backward
 1:        NA        NA        NA
 2:        NA        NA        NA
 3:        NA        NA        NA
 4:        NA        NA        NA
 5:        NA        NA        NA
 6:        NA        NA        NA
 7:        NA        NA 0.1359223
 8:        NA        NA 0.1359223
 9:        NA        NA 0.1359223
10: 0.1359223 0.1359223 0.1359223
11:        NA 0.1359223        NA
12:        NA 0.1359223 0.0000000
13:        NA 0.1359223 0.0000000
14:        NA        NA 0.0000000
15: 0.0000000 0.0000000 0.0000000
16: 0.0000000 0.0000000 0.0000000
17: 0.0000000 0.0000000 0.0000000
18: 0.0000000 0.0000000 0.0000000
19: 0.0000000 0.0000000 0.0000000
20:        NA 0.0000000        NA
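
The flip trick is not specific to R. As a rough plain-Python illustration (both function names and the data are made up, with None standing in for NA), a limited backward fill is just a limited forward fill on the reversed sequence:

```python
def ffill_limited(values, limit):
    """Forward-fill None, at most `limit` positions per gap (hypothetical helper)."""
    out, last, run = [], None, 0
    for v in values:
        if v is not None:
            last, run = v, 0
            out.append(v)
        else:
            run += 1
            # Fill only the first `limit` positions of each gap.
            out.append(last if run <= limit else None)
    return out

def bfill_limited(values, limit):
    """Backward-fill by reversing, forward-filling, and reversing back."""
    return ffill_limited(values[::-1], limit)[::-1]

print(bfill_limited([None, None, None, None, 0.5, None], 3))
```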

How to fill nan values from a specific date range in a python time series?

IIUC, you can use simple indexing:

# if needed, convert to datetime
#df.index = pd.to_datetime(df.index)

df.loc[df.index.month==5, 'min'] = df.loc[df.index.month.isin([4,6]), 'min'].mean()

or, if the 5th month already holds some non-NaN values that should be kept:

mask = df.index.month==5
df.loc[mask, 'min'] = (df.loc[mask, 'min']
                         .fillna(df.loc[df.index.month.isin([4,6]), 'min'].mean())
                       )

output:

                       min
date                      
2013-04-01 12:00:00  16.00
2013-04-02 12:00:00  16.00
2013-05-22 12:00:00  15.88
2013-05-23 12:00:00  15.88
2013-05-24 12:00:00  15.88
2013-05-27 12:00:00  15.88
2013-05-28 12:00:00  15.88
2013-05-29 12:00:00  15.88
2013-05-30 12:00:00  15.88
2013-05-31 12:00:00  15.88
2013-06-03 12:00:00    NaN
2013-06-04 12:00:00    NaN
2013-06-05 12:00:00    NaN
2013-06-06 12:00:00    NaN
2013-06-07 12:00:00    NaN
2013-06-10 12:00:00    NaN
2013-06-11 12:00:00    NaN
2013-06-12 12:00:00    NaN
2013-06-13 12:00:00    NaN
2013-06-14 12:00:00    NaN
2013-06-17 12:00:00    NaN
2013-06-18 12:00:00    NaN
2013-06-19 12:00:00  15.80
2013-06-20 12:00:00  15.80
2013-06-21 12:00:00  15.80

Missing values in Time Series in python

Consider interpolate (available for both Series and DataFrame). This example shows how to fill gaps of any size with a straight line:

import numpy as np
import pandas as pd

df = pd.DataFrame({'date': pd.date_range(start='2013-01-01', periods=10, freq='H'), 'value': range(10)})
df.loc[2:3, 'value'] = np.nan
df.loc[6, 'value'] = np.nan
df
                 date  value
0 2013-01-01 00:00:00    0.0
1 2013-01-01 01:00:00    1.0
2 2013-01-01 02:00:00    NaN
3 2013-01-01 03:00:00    NaN
4 2013-01-01 04:00:00    4.0
5 2013-01-01 05:00:00    5.0
6 2013-01-01 06:00:00    NaN
7 2013-01-01 07:00:00    7.0
8 2013-01-01 08:00:00    8.0
9 2013-01-01 09:00:00    9.0

# assign the result back; calling inplace=True on a selected column may not update df
df['value'] = df['value'].interpolate(method='linear')
df
                 date  value
0 2013-01-01 00:00:00    0.0
1 2013-01-01 01:00:00    1.0
2 2013-01-01 02:00:00    2.0
3 2013-01-01 03:00:00    3.0
4 2013-01-01 04:00:00    4.0
5 2013-01-01 05:00:00    5.0
6 2013-01-01 06:00:00    6.0
7 2013-01-01 07:00:00    7.0
8 2013-01-01 08:00:00    8.0
9 2013-01-01 09:00:00    9.0
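
If it helps to see what "a straight line" means here, this is a minimal plain-Python sketch of linear interpolation over equally spaced points. It is an illustration only, not how pandas implements it; the function name is made up, None stands in for NaN, and leading or trailing gaps are simply left alone.

```python
def interpolate_linear(values):
    """Fill interior runs of None on a straight line between their neighbours.

    Hypothetical helper; points are treated as equally spaced.
    """
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    # Walk over consecutive pairs of known positions and fill what lies between.
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out

print(interpolate_linear([0.0, 1.0, None, None, 4.0, 5.0, None, 7.0]))
```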

Replace NA with previous value with limit on number of consecutive NA

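Here's an option using na.locf and rle:

library(zoo)
# a (the input vector) and n (the maximum number of consecutive NAs to fill) come from the question
r <- rle(is.na(a))
a <- na.locf(a, na.rm = FALSE)  # na.rm = FALSE keeps leading NAs so the length matches r
is.na(a) <- sequence(r$lengths) > n & rep(r$values, r$lengths)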
a
# [1]  1  1  1 NA NA NA  2  2  1  1  1 NA

So here I first computed the run lengths of elements in a (including the NA entries), then replaced all NAs using na.locf, and finally turned back to NA those elements that were originally NA and sat more than n positions into their run.
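
The run-length idea carries over to other languages. Below is a rough plain-Python sketch using itertools.groupby as a stand-in for rle; the function name and sample data are my own, and None plays the role of NA. Each run of missing values gets at most `limit` copies of the preceding value, and the rest of the run stays missing:

```python
from itertools import groupby

def ffill_limited_rle(values, limit):
    """Limited forward fill driven by runs of missing values (hypothetical helper)."""
    out, last = [], None
    # groupby yields consecutive runs, much like rle(is.na(a)) in R.
    for is_missing, run in groupby(values, key=lambda v: v is None):
        run = list(run)
        if is_missing:
            # Fill the first `limit` positions of the run, if there is a value to carry.
            filled = min(len(run), limit) if last is not None else 0
            out.extend([last] * filled + [None] * (len(run) - filled))
        else:
            out.extend(run)
            last = run[-1]
    return out

print(ffill_limited_rle([1, None, None, None, 2, None], 2))
```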
