Extract Last Non-Missing Value in Row with Data.Table

How to get value of last non-NA column

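You can use max.col() with ties.method = "last" on the logical matrix !is.na(test). For each row it returns the index of the last non-NA column, and indexing test with the (row, column) pairs built by cbind() extracts that value.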

test$val <- test[cbind(1:nrow(test), max.col(!is.na(test), ties.method = 'last'))]
test

#        date a  b  c val
#1 2020-01-01 4 NA NA   4
#2 2020-01-02 3  2 NA   2
#3 2020-01-03 4  1  5   5
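Note that !is.na(test) also covers date, which is never NA. A row whose value columns are all missing would therefore pick up the date instead of NA. Below is a minimal sketch that restricts max.col() to the value columns. It assumes a data.table with hypothetical columns a, b and c:

```r
library(data.table)

# Hypothetical data: a date column plus three value columns;
# row 4 has no non-missing values at all
test <- data.table(
  date = as.Date("2020-01-01") + 0:3,
  a = c(4, 3, 4, NA),
  b = c(NA, 2, 1, NA),
  c = c(NA, NA, 5, NA)
)

vals <- as.matrix(test[, .(a, b, c)])                  # value columns only
idx  <- max.col(!is.na(vals), ties.method = "last")    # last non-NA column per row
test[, val := vals[cbind(seq_len(nrow(vals)), idx)]]   # all-NA rows point at an NA cell
test
```

For the all-NA row every column ties, so idx is the last column, which holds NA.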

Extract last non-missing value in row with data.table

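Here's another way. Walk over the columns from right to left (rev(names(dat))[-1] skips res itself), filling res only in rows where it is still NA: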

dat[, res := NA_character_]
for (v in rev(names(dat))[-1]) dat[is.na(res), res := get(v)]

   X1 X2 X3 X4 X5 res
1:  u NA NA NA NA   u
2:  f  q NA NA NA   q
3:  f  b  w NA NA   w
4:  k  g  h NA NA   h
5:  u  b  r NA NA   r
6:  f  q  w  x  t   t
7:  u  g  h  i  e   e
8:  u  q  r  n  t   t
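A loop-free alternative swaps in data.table's fcoalesce(), which returns, element-wise, the first non-NA value among its arguments. Passing the columns in reverse order therefore yields the last non-NA value per row. The sketch below uses hypothetical data and assumes all columns share the same type, which fcoalesce() requires:

```r
library(data.table)

# Hypothetical character data with trailing NAs
dat <- data.table(
  X1 = c("u", "f", "f"),
  X2 = c(NA, "q", "b"),
  X3 = c(NA, NA, "w")
)

# Reverse the columns so fcoalesce() sees the rightmost one first;
# unname() keeps do.call() from passing them as named arguments
cols <- c("X1", "X2", "X3")
dat[, res := do.call(fcoalesce, unname(rev(as.list(.SD)))), .SDcols = cols]
dat
```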

Benchmarks

Using the same data as @alexis_laz and making (apparently) superficial changes to the functions, I see different results. Just showing them here in case anyone is curious. Alexis' answer (with small modifications) still comes out ahead.

Functions:

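# Recursive version (works on a plain list): fill ans from the last
# column, then recurse on the remaining columns for the rows (wh)
# that are still NA
alex = function(x, ans = rep_len(NA, length(x[[1L]])), wh = seq_len(length(x[[1L]]))){
    if(!length(wh)) return(ans)                      # every row filled
    ans[wh] = as.character(x[[length(x)]])[wh]       # take the last column's values
    Recall(x[-length(x)], ans, wh[is.na(ans[wh])])   # drop it, keep unresolved rows
}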

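alex2 = function(x){
    x[, res := NA_character_]
    wh = x[, .I]                          # rows still without a value
    for (v in (length(x)-1):1){           # columns right to left, skipping res
      if (!length(wh)) break
      set(x, j="res", i=wh, value = x[[v]][wh])
      wh = wh[is.na(x$res[wh])]           # keep only rows that are still NA
    }
    x$res
}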

frank = function(x){
    x[, res := NA_character_]
    for(v in rev(names(x))[-1]) x[is.na(res), res := get(v)]
    return(x$res)       
}

frank2 = function(x){
    x[, res := NA_character_]
    for(v in rev(names(x))[-1]) x[is.na(res), res := .SD, .SDcols=v]
    x$res
}

Example data and benchmark:

DAT1 = as.data.table(lapply(ceiling(seq(0, 1e4, length.out = 1e2)), 
                     function(n) c(rep(NA, n), sample(letters, 3e5 - n, TRUE))))
DAT2 = copy(DAT1)
DAT3 = as.list(copy(DAT1))
DAT4 = copy(DAT1)

library(microbenchmark)
microbenchmark(frank(DAT1), frank2(DAT2), alex(DAT3), alex2(DAT4), times = 30)

Unit: milliseconds
         expr       min        lq      mean    median         uq        max neval
  frank(DAT1) 850.05980 909.28314 985.71700 979.84230 1023.57049 1183.37898    30
 frank2(DAT2)  88.68229  93.40476 118.27959 107.69190  121.60257  346.48264    30
   alex(DAT3)  98.56861 109.36653 131.21195 131.20760  149.99347  183.43918    30
  alex2(DAT4)  26.14104  26.45840  30.79294  26.67951   31.24136   50.66723    30

keep last non missing observation for all variables by group

Using data.table:

library(data.table)

d[, lapply(.SD, function(x) last(na.omit(x))), g]

#   g a b    c
#1: 1 1 2 <NA>
#2: 2 4 4    c
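When a group's column is all NA, na.omit() leaves nothing for last() to return. The <NA> above presumably comes from data.table padding that zero-length result. The sketch below defines a hypothetical helper, last_non_na(), that returns an NA of the column's type explicitly instead. The data is also hypothetical:

```r
library(data.table)

# Hypothetical helper: last non-NA element of x, or an NA of x's type
# when every element is missing (never returns a zero-length result)
last_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x)) x[length(x)] else x[NA_integer_]
}

d <- data.table(
  g = c(1, 1, 2, 2),
  a = c(1, NA, 3, 4),
  b = c(2, NA, NA, 4),
  c = c(NA, NA, "b", "c")
)

res <- d[, lapply(.SD, last_non_na), by = g]
res
```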

Getting the position of the last non-NA value in a row in an R data.table

We can use max.col(); rows with no non-NA values get position 0:

max.col(!is.na(dt[, -1]), ties.method = 'last') * +(rowSums(!is.na(dt[,-1])) > 0)
#[1] 4 2 3 0
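For a row that is entirely NA, every entry of !is.na() is FALSE, so all columns tie and ties.method = "last" returns the last column. The rowSums() factor turns that into 0. The sketch below uses hypothetical data and assumes, as dt[, -1] suggests, that the first column is an id:

```r
library(data.table)

# Hypothetical data: an id column followed by four value columns
dt <- data.table(
  id = 1:4,
  v1 = c(1, 1, 1, NA),
  v2 = c(2, 2, NA, NA),
  v3 = c(3, NA, 3, NA),
  v4 = c(4, NA, NA, NA)
)

present <- !is.na(dt[, -1])                        # logical matrix, id dropped
raw     <- max.col(present, ties.method = "last")  # all-NA row ties everywhere
pos     <- raw * +(rowSums(present) > 0)           # 0 where a row has no values
raw
pos
```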

Extract and collapse non-missing elements by row in the data.table

Using melt():

data[, row := .I
     ][, melt(.SD, id.vars = "row")
        ][order(row, value), paste0(unique(value[!is.na(value)]), collapse = "&&&"), by = row]

    row    V1
 1:   1     1
 2:   2      
 3:   3     1
 4:   4     1
 5:   5 1&&&2
 6:   6 1&&&2
 7:   7     2
 8:   8 1&&&2
 9:   9 1&&&2
10:  10     2

Alternatively, using your original function:

data[, function_non_missing(unlist(.SD)), by = 1:nrow(data)]

    nrow     V1
 1:    1      1
 2:    2       
 3:    3      2
 4:    4 1&&&&2
 5:    5 1&&&&2
 6:    6 1&&&&2
 7:    7      1
 8:    8      2
 9:    9 1&&&&2
10:   10 1&&&&2
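function_non_missing() is the asker's own function and isn't shown here. The sketch below uses a hypothetical stand-in, collapse_non_missing(), that pastes the unique non-NA values of a row together. It applies the function row by row on hypothetical data:

```r
library(data.table)

# Hypothetical stand-in for the asker's function_non_missing()
collapse_non_missing <- function(x, sep = "&&&") {
  paste(unique(x[!is.na(x)]), collapse = sep)   # "" when the row is all NA
}

data <- data.table(
  x = c(1, NA, 1),
  y = c(NA, NA, 2),
  z = c(1, NA, NA)
)

data[, row := .I]   # one group per row; .SD then excludes the row column
out <- data[, .(V1 = collapse_non_missing(unlist(.SD))), by = row]
out
```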

Get value of last non-NA row per column in data.table

If the dataset is a data.table, loop through the Subset of Data (.SD) with lapply(), keep the non-NA elements (x[!is.na(x)]), and extract the last of those with tail().

df1[, lapply(.SD, function(x) tail(x[!is.na(x)],1))]
#   a  b c
#1: 63 57 4

Selecting non `NA` values from duplicate rows with `data.table` -- when having more than one grouping variable

Here are some data.table-based solutions.

setDT(df_id_year_and_type)

method 1

na.omit(df_id_year_and_type, cols="type") drops the rows where type is NA.
unique(df_id_year_and_type[, .(id, year)], fromLast=TRUE) lists every (id, year) group.
Joining the two and keeping the last match per group (mult="last") gives the desired output.

na.omit(df_id_year_and_type, cols="type"
        )[unique(df_id_year_and_type[, .(id, year)], fromLast=TRUE), 
          on=c('id', 'year'), 
          mult="last"]

#       id  year   type
#    <num> <num> <char>
# 1:     1  2002      A
# 2:     2  2008      B
# 3:     3  2010      D
# 4:     3  2013   <NA>
# 5:     4  2020      C
# 6:     5  2009      A
# 7:     6  2010      B
# 8:     6  2012   <NA>
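The NA rows in the output come from the join itself. Groups with no non-NA type have no match in the na.omit() result, and an X[i] join keeps unmatched rows of i, filled with NA, by default. Here is a small sketch on hypothetical data:

```r
library(data.table)

# Hypothetical data: group (1, 2002) has one non-NA type, group (3, 2013) has none
df_id_year_and_type <- data.table(
  id   = c(1, 1, 3, 3),
  year = c(2002, 2002, 2013, 2013),
  type = c("A", NA, NA, NA)
)

non_na <- na.omit(df_id_year_and_type, cols = "type")        # drop NA types
groups <- unique(df_id_year_and_type[, .(id, year)], fromLast = TRUE)
res1   <- non_na[groups, on = c("id", "year"), mult = "last"]
res1   # unmatched group (3, 2013) comes back with type NA
```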

method 2

df_id_year_and_type[df_id_year_and_type[, .I[which.max(cumsum(!is.na(type)))], .(id, year)]$V1,]
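Method 2 works as follows. cumsum(!is.na(type)) counts the non-NA values seen so far. which.max() returns the first position where that count reaches its maximum, which is the last non-NA row. If a group has no non-NA value, the count stays at 0 and which.max() returns 1, so the group's first row is kept. Here is a quick check with a hypothetical helper wrapping that expression:

```r
# Hypothetical helper wrapping the expression used in method 2
pos_last_non_na <- function(x) which.max(cumsum(!is.na(x)))

pos_last_non_na(c("A", NA, "B", NA))  # 3
pos_last_non_na(c(NA, NA))            # 1: no non-NA value, first row kept
```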

method 3

(likely slower, because of the overhead of calling [ on .SD once per group)

df_id_year_and_type[, .SD[which.max(cumsum(!is.na(type)))], .(id, year)]

How to fill NA with last non-missing value from previous columns?

You could use

library(dplyr)
df %>% 
  mutate(V5 = coalesce(V4, V3, V2, V1))

This returns

# A tibble: 7 x 5
     V1    V2    V3    V4    V5
  <dbl> <dbl> <dbl> <dbl> <dbl>
1  1.19  2.45  0.83  0.87  0.87
2  1.13  0.79  0.68  5.43  5.43
3  1.18  1.09  1.04 NA     1.04
4  1.11  1.1   4.24 NA     4.24
5  1.16  1.13 NA    NA     1.13
6  1.18 NA    NA    NA     1.18
7  1.44 NA     9.17 NA     9.17

Or, more generally (from https://github.com/tidyverse/funs/issues/54#issuecomment-892377998):

df %>% 
  mutate(V5 = do.call(coalesce, rev(across(-V5))))

or https://github.com/tidyverse/funs/issues/54#issuecomment-1096449488

df %>% 
  mutate(V5 = coalesce(!!!rev(select(., -V5))))
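If dplyr isn't available, the same right-to-left coalescing can be sketched in base R by folding over the columns with Reduce(). The three rows below are taken from the table above, before V5 is computed:

```r
# Rows 1, 5 and 6 of the table above (V5 not yet computed)
df <- data.frame(
  V1 = c(1.19, 1.16, 1.18),
  V2 = c(2.45, 1.13, NA),
  V3 = c(0.83, NA, NA),
  V4 = c(0.87, NA, NA)
)

# Start from the rightmost column and fill remaining NAs from the
# columns to its left, one column at a time
df$V5 <- Reduce(function(acc, col) ifelse(is.na(acc), col, acc),
                rev(df[c("V1", "V2", "V3", "V4")]))
df
```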

update non-missing values based on most recent date

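A data.table option: within each id, sort employ by year, drop the NAs, and take the last (most recent) remaining value.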

setDT(data)[, employ := last(na.omit(employ[order(year)])), id]

gives

    id year employ
 1:  1 2010    yes
 2:  1 2011    yes
 3:  2 2010    yes
 4:  2 2011    yes
 5:  3 2010     no
 6:  3 2011     no
 7:  4 2010    yes
 8:  4 2011    yes
 9:  5 2010     no
10:  5 2011     no
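Here is a small check of the data.table version on hypothetical data where the rows within an id are not stored in year order:

```r
library(data.table)

# Hypothetical data: id 1 lists 2011 before 2010, id 2 changes over time
data <- data.frame(
  id     = c(1, 1, 2, 2),
  year   = c(2011, 2010, 2010, 2011),
  employ = c(NA, "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

setDT(data)[, employ := last(na.omit(employ[order(year)])), id]
data$employ   # "yes" "yes" "no" "no"
```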

A dplyr way might be

data %>%
  group_by(id) %>%
  mutate(employ = last(na.omit(employ[order(year)])))

which gives

      id  year employ
   <dbl> <dbl> <chr>
 1     1  2010 yes
 2     1  2011 yes
 3     2  2010 yes
 4     2  2011 yes
 5     3  2010 no
 6     3  2011 no
 7     4  2010 yes
 8     4  2011 yes
 9     5  2010 no
10     5  2011 no