Replace NA With Previous or Next Value, by Group, Using dplyr

Replace NA with previous or next value, by group, using dplyr

library(dplyr) # group_by and %>% come from dplyr
library(tidyr) # fill is part of tidyr

ps1 %>% 
  group_by(userID) %>% 
  #fill(color, age, gender) %>% #default direction down
  fill(color, age, gender, .direction = "downup")

Which gives you:

Source: local data frame [9 x 4]
Groups: userID [3]

  userID  color    age gender
   <dbl> <fctr> <fctr> <fctr>
1     21   blue   3yrs      F
2     21   blue   2yrs      F
3     21    red   2yrs      M
4     22   blue   3yrs      F
5     22   blue   3yrs      F
6     22   blue   3yrs      F
7     23    red   4yrs      F
8     23    red   4yrs      F
9     23   gold   4yrs      F
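
Since the input `ps1` is not shown, here is a minimal sketch on made-up data (the tibble `toy` and its values are assumptions, not the original data). It illustrates how `.direction = "downup"` behaves within each group: an NA takes the previous value in its group, and a leading NA, which has no previous value, takes the next one.

```r
library(dplyr)
library(tidyr)

# Hypothetical input; the original `ps1` is not shown in the answer.
toy <- tibble(
  userID = c(21, 21, 21, 22, 22, 22),
  color  = c(NA, "blue", NA, "red", NA, NA),
  age    = c("3yrs", NA, NA, NA, NA, "4yrs")
)

filled <- toy %>%
  group_by(userID) %>%
  fill(color, age, .direction = "downup") %>%  # down first, then up
  ungroup()

filled
```

Values never cross a group boundary, so user 22 keeps "red" and "4yrs" even though user 21 ends with different values.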

Fill missing values with previous values by row using dplyr

One solution is to use the na.locf function from the zoo package together with pmap for a row-wise operation. na.locf takes the most recent non-NA value and uses it to replace the NA values that follow. As a reminder, c(...) in both solutions captures all values of V1:V4 in each row on every iteration. I excluded the id column in both because it is not involved in the calculation.

library(dplyr)
library(zoo)
library(purrr)

df %>%
  mutate(pmap_df(., ~ na.locf(c(...)[-1])))

  id V1 V2 V3 V4
1 01  1  1  1  1
2 02  2  1  1  1
3 03  3  1  1  1
4 04  4  1  2  2

Or we can use the coalesce function from dplyr. We can replace every NA value in each row with the last non-NA value, as we did earlier with na.locf. However, this solution is a bit verbose:

df %>%
  mutate(pmap_df(., ~ {x <- c(...)[!is.na(c(...))]; 
  coalesce(c(...), x[length(x)])}))

  id V1 V2 V3 V4
1 01  1  1  1  1
2 02  2  1  1  1
3 03  3  1  1  1
4 04  4  1  2  2

Or you could also use this:

library(purrr)

df %>%
  mutate(across(!id, ~ replace(., is.na(.), invoke(coalesce, rev(df[-1])))))

  id V1 V2 V3 V4
1 01  1  1  1  1
2 02  2  1  1  1
3 03  3  1  1  1
4 04  4  1  2  2

The warning message can be ignored. It is produced because there are 6 NA values, while applying coalesce across the columns yields one value per row, so 4 values are offered to fill 6 slots.
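
For comparison, here is a base R sketch of the same row-wise fill that produces no warning. It is an alternative, not one of the answers above, and the data frame `df_rows` is made up to resemble the example. It uses na.locf0 from zoo, which keeps leading NAs so every row keeps its length.

```r
library(zoo)

# Hypothetical data shaped like the example above (id plus V1:V4).
df_rows <- data.frame(
  id = c("01", "02", "03"),
  V1 = c(1, 2, 3),
  V2 = c(NA, 1, NA),
  V3 = c(NA, NA, 1),
  V4 = c(1, NA, NA)
)

# apply() hands each row to na.locf0 as a numeric vector and returns the
# results as columns, so t() turns them back into rows.
df_rows[-1] <- t(apply(df_rows[-1], 1, na.locf0))
df_rows
```

Because apply() converts the selected columns to a matrix first, this sketch assumes the filled columns all share one type.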

Replace missing values (NA) with most recent non-NA by group

These all use na.locf from the zoo package. Also note that na.locf0 (also defined in zoo) is like na.locf except it defaults to na.rm = FALSE and requires a single vector argument. na.locf2 defined in the first solution is also used in some of the others.
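
To make the difference concrete, here is a small sketch on a made-up vector `x`. It shows why na.rm = FALSE matters once the result has to fit back into a column of the same length:

```r
library(zoo)

x <- c(NA, 1, NA, 3, NA)

na.locf(x)                 # leading NA is dropped, so the result is shorter
na.locf(x, na.rm = FALSE)  # leading NA is kept, length is preserved
na.locf0(x)                # same as the previous line for a plain vector
```

A group whose first value is NA would otherwise come back one element short, which grouped operations such as mutate do not accept.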

dplyr

library(dplyr)
library(zoo)

na.locf2 <- function(x) na.locf(x, na.rm = FALSE)
df %>% group_by(houseID) %>% do(na.locf2(.)) %>% ungroup

giving:

Source: local data frame [15 x 3]
Groups: houseID

   houseID year price
1        1 1995    NA
2        1 1996   100
3        1 1997   100
4        1 1998   120
5        1 1999   120
6        2 1995    NA
7        2 1996    NA
8        2 1997    NA
9        2 1998    30
10       2 1999    30
11       3 1995    NA
12       3 1996    44
13       3 1997    44
14       3 1998    44
15       3 1999    44

A variation of this is:

df %>% group_by(houseID) %>% mutate(price = na.locf0(price)) %>% ungroup

Other solutions below give output which is quite similar so we won't repeat it except where the format differs substantially.

Another possibility is to combine the by solution (shown further below) with dplyr:

df %>% by(df$houseID, na.locf2) %>% bind_rows

by

library(zoo)

do.call(rbind, by(df, df$houseID, na.locf2))

ave

library(zoo)

transform(df, price = ave(price, houseID, FUN = na.locf0))

data.table

library(data.table)
library(zoo)

data.table(df)[, na.locf2(.SD), by = houseID]

zoo

This solution uses zoo alone. It returns a wide rather than long result:

library(zoo)

z <- read.zoo(df, index = 2, split = 1, FUN = identity)
na.locf2(z)

giving:

       1  2  3
1995  NA NA NA
1996 100 NA 44
1997 100 NA 44
1998 120 30 44
1999 120 30 44

This solution could be combined with dplyr like this:

library(dplyr)
library(zoo)

df %>% read.zoo(index = 2, split = 1, FUN = identity) %>% na.locf2

input

Here is the input used for the examples above:

df <- structure(list(houseID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
  2L, 3L, 3L, 3L, 3L, 3L), year = c(1995L, 1996L, 1997L, 1998L, 
  1999L, 1995L, 1996L, 1997L, 1998L, 1999L, 1995L, 1996L, 1997L, 
  1998L, 1999L), price = c(NA, 100L, NA, 120L, NA, NA, NA, NA, 
  30L, NA, NA, 44L, NA, NA, NA)), .Names = c("houseID", "year", 
  "price"), class = "data.frame", row.names = c(NA, -15L))

REVISED: Re-arranged and added more solutions. Revised the dplyr/zoo solution to conform to the latest changes in dplyr. Applied fixes and factored out na.locf2 from all solutions.

Replace NAs for a group of values with a non-NA character in group in R

Here is an alternative way using na.locf from zoo package:

library(zoo)
library(dplyr)
df %>% 
  group_by(participant_id) %>% 
  arrange(participant_id, test) %>% 
  mutate(test = zoo::na.locf(test, na.rm=FALSE))

   participant_id test 
   <chr>          <chr>
 1 ps1            test1
 2 ps1            test1
 3 ps1            test1
 4 ps1            test1
 5 ps2            test2
 6 ps2            test2
 7 ps3            test3
 8 ps3            test3
 9 ps3            test3
10 ps3            test3

replace NA value with the group value

Try ave. It applies a function to groups. Have a look at ?ave for details, e.g.:

df$med_card_new <- ave(df$med_card, df$hhold_no, FUN=function(x)unique(x[!is.na(x)]))

#   person_id hhold_no med_card med_card_new
#1          1        1        1            1
#2          2        1        1            1
#3          3        1       NA            1
#4          4        1       NA            1
#5          5        1       NA            1
#6          6        2        0            0
#7          7        2        0            0
#8          8        2        0            0
#9          9        2        0            0

Please note that this only works if not all values in a household are NA and the non-NA values within a household do not differ (e.g. person 1 == 1, person 2 == 0).
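
If those two conditions cannot be guaranteed, a more defensive variant might look like the sketch below. The helper `group_value` and the data frame `df_hh` are hypothetical names introduced for illustration. The helper leaves an all-NA household untouched and stops with an error when a household holds conflicting values.

```r
# Hypothetical helper: returns the single non-NA value of a group,
# leaves an all-NA group as it is, and stops on conflicting values.
group_value <- function(x) {
  vals <- unique(x[!is.na(x)])
  if (length(vals) > 1) stop("conflicting non-NA values within a group")
  if (length(vals) == 0) return(x)  # all NA: nothing to fill from
  rep(vals, length(x))
}

# Made-up data: household 3 has no known value at all.
df_hh <- data.frame(
  hhold_no = c(1, 1, 1, 2, 2, 3),
  med_card = c(1, NA, NA, 0, 0, NA)
)

df_hh$med_card_new <- ave(df_hh$med_card, df_hh$hhold_no, FUN = group_value)
df_hh
```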

Replacing NA value in dataframe by first or last value of other columns within group

An option is to use replace_na (from tidyr) after grouping by 'Group' to replace the NA elements with either the first or last value of the 'Dates' column:

library(dplyr)
library(tidyr)
df1 %>% 
   group_by(Group) %>% 
   mutate(Departure = replace_na(Departure, first(Dates)), 
          Arrival = replace_na(Arrival, last(Dates))) %>% 
   ungroup

NOTE: Here we assume that 'Dates' are already ordered. If not, take the min and max after converting to Date class:

library(lubridate)
df1 %>% 
   mutate(across(-Group, dmy)) %>%
   group_by(Group) %>% 
   mutate(Departure = replace_na(Departure, min(Dates)), 
          Arrival = replace_na(Arrival, max(Dates))) %>% 
   ungroup
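
The input `df1` is not shown, so here is a small sketch of the first variant on made-up data. The tibble `toy_trips` and its values are assumptions. The dates stay as character strings and are assumed to be ordered within each group.

```r
library(dplyr)
library(tidyr)

# Hypothetical input with the same column names as in the answer.
toy_trips <- tibble(
  Group     = c("A", "A", "A", "B", "B"),
  Dates     = c("01/01/2020", "05/01/2020", "09/01/2020",
                "02/02/2020", "07/02/2020"),
  Departure = c("03/01/2020", NA, NA, NA, "04/02/2020"),
  Arrival   = c(NA, "06/01/2020", NA, "08/02/2020", NA)
)

out <- toy_trips %>%
  group_by(Group) %>%
  mutate(Departure = replace_na(Departure, first(Dates)),  # earliest date in group
         Arrival   = replace_na(Arrival, last(Dates))) %>%  # latest date in group
  ungroup()

out
```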

R replace NA with last value for group ID ordered by date

You can use zoo::na.locf along with dplyr verbs. Make sure you group_by customer id first, and ensure your times are ordered correctly.

library(dplyr)

eg_data %>% 
  group_by(custID) %>% 
  arrange(custID, year, quarter) %>% 
  mutate(orderType = zoo::na.locf(orderType, na.rm = FALSE)) # keep leading NAs so the length matches

#> # A tibble: 32 x 4
#> # Groups:   custID [2]
#>    custID year  quarter orderType 
#>    <chr>  <chr> <chr>   <chr>     
#>  1 655321 2018  1       retail    
#>  2 655321 2018  2       retail    
#>  3 655321 2018  3       wholesale 
#>  4 655321 2018  4       wholesale 
#>  5 655321 2019  1       commercial
#>  6 655321 2019  2       retail    
#>  7 655321 2019  3       retail    
#>  8 655321 2019  4       retail    
#>  9 655321 2020  1       retail    
#> 10 655321 2020  2       wholesale 
#> # ... with 22 more rows

Replace value with previous row value

Does this work:

library(dplyr)
library(tidyr)
df %>%
  mutate(DSWP10 = as.numeric(na_if(DSWP10, '.'))) %>%
  fill(DSWP10, .direction = 'up')
# A tibble: 7 x 2
  Date       DSWP10
  <chr>       <dbl>
1 07/01/2015   2.1 
2 06/01/2015   1.99
3 05/01/2015   1.99
4 04/01/2015   1.99
5 03/01/2015   1.98
6 02/01/2015   1.95
7 01/01/2015   1.95

How to replace NA with most recent non-NA by group?

As another base R solution, here is a poor man's na.locf

fill_down <- function(v) {
    if (length(v) > 1) {
        keep <- c(TRUE, !is.na(v[-1]))
        v[keep][cumsum(keep)]
    } else v
}
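
To see why this works, it may help to trace the intermediate values on a small made-up vector `v`. The function is repeated so the snippet runs on its own.

```r
fill_down <- function(v) {
    if (length(v) > 1) {
        keep <- c(TRUE, !is.na(v[-1]))
        v[keep][cumsum(keep)]
    } else v
}

v <- c(NA, 2, NA, NA, 5, NA)

keep <- c(TRUE, !is.na(v[-1]))  # first element is always kept, then the non-NA ones
v[keep]                         # the values that get carried forward
cumsum(keep)                    # for each position, the index of the last kept value
fill_down(v)
```

The first element is kept even when it is NA, so a leading NA stays NA: there is nothing earlier to copy from.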

To fill down by group, the approach is to use tapply() to split and apply to each group, and split<- to combine groups to the original geometry, as

fill_down_by_group <- function(v, grp) {
    ## original 'by hand':
    ##     split(v, grp) <- tapply(v, grp, fill_down)
    ##     v
    ## done by built-in function `ave()`
    ave(v, grp, FUN=fill_down)
}

To process multiple columns, one might

elts <- c("age", "birthplace")
df[elts] <- lapply(df[elts], fill_down_by_group, df$name)
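
As a runnable illustration of the multi-column case, here is a sketch on made-up data. The data frame `people` and its values are assumptions. It calls ave() directly, which is all that fill_down_by_group() does.

```r
fill_down <- function(v) {
    if (length(v) > 1) {
        keep <- c(TRUE, !is.na(v[-1]))
        v[keep][cumsum(keep)]
    } else v
}

# Hypothetical input with the column names used above.
people <- data.frame(
    name = c("ann", "ann", "ann", "bob", "bob"),
    age = c(30, NA, NA, NA, 41),
    birthplace = c("Oslo", NA, NA, "Lima", NA),
    stringsAsFactors = FALSE
)

elts <- c("age", "birthplace")
people[elts] <- lapply(people[elts],
                       function(v) ave(v, people$name, FUN = fill_down))
people
```

Bob's first age stays NA because values are not carried over from Ann's rows.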

Notes

I would be interested in seeing how a dplyr solution handles many columns without hard-coding each. Answering my own question, I guess this is
```
library(dplyr); library(tidyr)
df %>% group_by(name) %>% fill(all_of(elts)) # fill_() is deprecated in current tidyr
```

A more efficient base solution when the groups are already 'grouped' (e.g., identical(grp, sort(grp))) is

fill_down_by_grouped <- function(v, grp) {
    if (length(v) > 1) {
        ## keep the first element of every group and every non-NA element
        keep <- !(duplicated(grp) & is.na(v))
        v[keep][cumsum(keep)]
    } else v
}

For me, fill_down() on a vector with about 10M elements takes ~225 ms; fill_down_by_grouped() takes ~300 ms independent of the number of groups; fill_down_by_group() scales with the number of groups: ~2 s for 10,000 groups and about 36 s for 10M groups.

Replace NA values with a fractional increase in previous values

You can use Reduce in base R :

Reduce(function(x, y) if(is.na(y)) x * 1.1 else y, df$value, accumulate = TRUE)
#[1] 10.0 15.0 20.0 22.0 24.2

If you want a tidyverse solution use accumulate.

library(dplyr)
library(purrr)

df %>% mutate(value = accumulate(value, ~if(is.na(.y)) .x * 1.1 else .y))

#   year value
#  <dbl> <dbl>
#1  2011  10  
#2  2012  15  
#3  2013  20  
#4  2014  22  
#5  2015  24.2

.x and .y in accumulate (or x and y in Reduce) are the accumulated value so far and the next element, respectively. So for the first iteration .x is 10 and .y is 15, for the next .x becomes 15 and .y is 20, and so on through df$value. If the next value (.y) is NA, it is replaced with 1.1 times the previous value (.x); otherwise it is kept as it is.
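
The same logic can be written with a named step function, which makes the roles of the two arguments easier to see. This is a sketch only: the names `step`, `prev`, `nxt` and the vector `value` are introduced here for illustration.

```r
value <- c(10, 15, 20, NA, NA)

# prev is the result of the previous step (already filled),
# nxt is the next raw element of the input.
step <- function(prev, nxt) if (is.na(nxt)) prev * 1.1 else nxt

filled <- Reduce(step, value, accumulate = TRUE)
filled
```

Because `prev` is the already filled result, the second NA is computed from 22 rather than from 20, which is how consecutive NAs compound.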
