Last Observation Carried Forward In a data frame?
This already exists:
library(zoo)
na.locf(data.frame(rep("a",4), 1:4,1:4, c(1,NA,NA,NA)))
Last observation carried forward and last observation carried backward in R
This should work :
library(tidyverse)
df <- data.frame(ID=c(1,1,1,1,1,2,2,2,3,3), values=c(NA, NA, 12, 13, NA, 5, NA, NA, NA, 1))
df2 <- data.frame(ID=c(1,1,1,1,1,2,2,2,3,3), values=c(12, 12, 12, 13, 13, 5, 5, 5, 1, 1))
df <- df %>%
group_by(ID) %>%
fill(values, .direction = "downup") %>%
fill(values, .direction = "updown")
Last observation carried forward (LOCF) for all NA in R
You could use fill
from tidyr
.
tidyr::fill(df, everything())
# Value1 Value2 Value3 Group
#1 100 56 82 Group1
#2 150 56 97 Group2
#3 150 66 97 Group3
#4 120 66 97 Group4
Or with zoo::na.locf
df[] <- lapply(df, zoo::na.locf)
data
df <- structure(list(Value1 = c(100L, 150L, NA, 120L), Value2 = c(56L,
NA, 66L, NA), Value3 = c(82L, 97L, NA, NA), Group = structure(1:4,
.Label = c("Group1", "Group2", "Group3", "Group4"), class = "factor")),
class = "data.frame", row.names = c(NA, -4L))
Carrying forward last observation with a limit, by group, using data.table
Note that normally when dealing with longer stretches of NA's one either fills them all or none of them and na.locf already handles that using the maxgap argument which only fills gaps that are no longer than specified. The idea is that interpolation is only reliable over short stretches so you should not interpolate at all over longer ones. Nevertheless, the following shows how to implement the scheme in the question but consider whether you should change strategy and use maxgap instead.
1) Compute na.locf using na.locf0 giving locf and also create a grouping variable for stretches of NA's and non-NAs, g. Then for each run of NA's take the first two elements of the na.locf column and fill the remainder with NA's from value. This does not overwrite df so it can be used in a pipeline without side effects.
library(data.table)
library(zoo)
df[, .(values, locf = na.locf0(values), g = rleid(is.na(values))), by = country][
, .(values = c(head(locf, 2), tail(values, -2))), by = .(country, g)][
, .(country, values)]
giving:
country values
1: USA 2
2: USA 1
3: USA 1
4: USA 1
5: USA NA
6: FR 2
7: FR 1
8: FR 2
9: FR 2
10: FR 2
2) A slightly modified formulation that still uses the same basic idea is the following. It also does not overwrite.
library(data.table)
library(zoo)
# like na.locf0 but only specifies vector, x, and limit to fill, k
na.locf2 <- function(x, k) {
nalocf <- na.locf0(x)
f <- function(ix) c(head(nalocf[ix], k), tail(x[ix], -k))
unlist(tapply(seq_along(x), rleid(is.na(x)), f))
}
df[, .(values = na.locf2(values, 2)), by = country]
giving:
country values
1: USA 2
2: USA 1
3: USA 1
4: USA 1
5: USA NA
6: FR 2
7: FR 1
8: FR 2
9: FR 2
10: FR 2
Last observation carried forward conditional on value and colums
This is the use of accumulate2
ie
df%>%
group_by(ID)%>%
mutate(d = unlist(accumulate2(var,cond[-1],function(z,x,y) if(y) min(z,x,na.rm=TRUE) else max(z,x,na.rm=TRUE))))
# A tibble: 15 x 5
# Groups: ID [3]
ID date cond var d
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1 0 1 1
2 1 2 0 NA 1
3 1 3 0 2 2
4 1 4 1 0 0
5 1 5 0 NA 0
6 2 1 0 NA NA
7 2 2 0 3 3
8 2 3 0 NA 3
9 2 4 1 0 0
10 2 5 0 NA 0
11 3 1 0 2 2
12 3 2 0 NA 2
13 3 3 0 1 2
14 3 4 0 NA 2
15 3 5 0 NA 2
Last observation carried forward in a data frame
Sorry for writing an answer (not enough reputation to just comment)
But what prevents you from doing as @ProcrastinatusMaximus said?
(you can include the zoo
call in your loop)
Would look like this:
for (i in badcols)
{
df[,i] <- zoo::na.locf(df[,i])
}
I am not sure if zoo
is faster than your implementation. You would have to try this out. You could also check spacetime::na.locf
, imputeTS::na.locf
to see which of the existing locf
implementations is the fastest.
Carry last observation forward (na.locf?) - but only when another column value changes
You can use fill
:
df %>%
mutate(sum = rowSums(!is.na(select(., DK0004612884:DK0002025170))),
across(DK0004612884:DK0002025170, ~ case_when(sum > 0 ~ replace_na(.x, 0)))) %>%
fill(DK0004612884:DK0002025170)
output
Date DK0004612884 DK0002025170 ym sum
1 2021-01-29 14.48 0.58 2021-1 2
2 2021-02-01 14.48 0.58 2021-2 0
3 2021-02-02 14.48 0.58 2021-2 0
4 2021-02-03 14.48 0.58 2021-2 0
5 2021-02-04 14.48 0.58 2021-2 0
6 2021-02-05 14.48 0.58 2021-2 0
7 2021-02-08 14.48 0.58 2021-2 0
8 2021-02-09 14.48 0.58 2021-2 0
9 2021-02-10 14.48 0.58 2021-2 0
10 2021-02-11 14.48 0.58 2021-2 0
11 2021-02-12 14.48 0.58 2021-2 0
12 2021-02-15 14.48 0.58 2021-2 0
13 2021-02-16 14.48 0.58 2021-2 0
14 2021-02-17 14.48 0.58 2021-2 0
15 2021-02-18 14.48 0.58 2021-2 0
16 2021-02-19 14.48 0.58 2021-2 0
17 2021-02-22 14.48 0.58 2021-2 0
18 2021-02-23 14.48 0.58 2021-2 0
19 2021-02-24 14.48 0.58 2021-2 0
20 2021-02-25 14.48 0.58 2021-2 0
21 2021-02-26 14.15 0.00 2021-2 1
22 2021-03-01 14.15 0.00 2021-3 0
23 2021-03-02 14.15 0.00 2021-3 0
24 2021-03-03 14.15 0.00 2021-3 0
25 2021-03-04 14.15 0.00 2021-3 0
26 2021-03-05 14.15 0.00 2021-3 0
27 2021-03-08 14.15 0.00 2021-3 0
28 2021-03-09 14.15 0.00 2021-3 0
29 2021-03-10 14.15 0.00 2021-3 0
30 2021-03-11 14.15 0.00 2021-3 0
31 2021-03-12 14.15 0.00 2021-3 0
R Replace Intermittent NA Values With Last Observation Carried Forward (NA.LOCF)
We can use na.locf(..., fromLast = TRUE)
to identify the trailing NA
values and use pmax
on them with the baseline. We'll demonstrate on the examples from your question in a nice all-together format:
# consolidate example data
dd = data.frame(
example = rep(1:3, each = 10),
visit = rep(1:10, 3),
value = c(34,NA,NA,15,16,19,NA,12,23,31,
34,22,18,15,16,19,NA,NA,NA,NA,
34,NA,NA,42,16,19,NA,38,NA,NA),
goal = c(34,34,34,15,16,19,19,12,23,31,
34,22,18,15,16,19,34,34,34,34,
34,34,34,42,16,19,19,38,38,38)
)
library(dplyr)
dd = dd %>% group_by(example) %>%
mutate(to_fill = !is.na(zoo::na.locf(value, fromLast = TRUE, na.rm = FALSE)),
result = if_else(to_fill,
zoo::na.locf(value, na.rm = FALSE),
pmax(first(value), zoo::na.locf(value, na.rm = FALSE))),
)
all(dd$goal == dd$result)
# [1] TRUE
As you can see, the result
matches the goal
column perfectly.
Related Topics
How to Produce Stacked Bars Within Grouped Barchart in R
How to Remove an Element from a List
Alternative to Expand.Grid for Data.Frames
Split Date into Different Columns for Year, Month and Day
Prevent Row Names to Be Written to File When Using Write.Csv
Difference Between Passing Options in Aes() and Outside of It in Ggplot2
Why Is Using Update on a Lm Inside a Grouped Data.Table Losing Its Model Data
Replace Contents of Factor Column in R Dataframe
If/Else Constructs Inside and Outside Functions
Dynamically Creating Tabs with Plots in Shiny Without Re-Creating Existing Tabs
Select Every Other Element from a Vector
R Shiny: Reactivevalues VS Reactive
Using Lists Inside Data.Table Columns
Min for Each Row in a Data Frame
How to Add a Ggplot2 Subtitle with Different Size and Colour
Ggplot2, Facet_Grid, Free Scales