How to get value of last non-NA column
You can use max.col with ties.method set to "last" to get the last non-NA value in each row.
test$val <- test[cbind(1:nrow(test), max.col(!is.na(test), ties.method = 'last'))]
test
# date a b c val
#1 2020-01-01 4 NA NA 4
#2 2020-01-02 3 2 NA 2
#3 2020-01-03 4 1 5 5
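Here is a self-contained base-R sketch of the same max.col idea. It uses hypothetical toy data that mirrors the output above. The lookup is restricted to the value columns because indexing a data frame with a two-column matrix goes through as.matrix(). With the Date column left in, as in the line above, the values therefore come back as character.

```r
# Hypothetical toy data mirroring the printed output above
test <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  a = c(4, 3, 4),
  b = c(NA, 2, 1),
  c = c(NA, NA, 5)
)

# Numeric matrix of the value columns only (date left out)
vals <- as.matrix(test[, c("a", "b", "c")])

# For each row, the index of the right-most non-NA column
last_col <- max.col(!is.na(vals), ties.method = "last")

# Pick the (row, column) pairs with matrix indexing
test$val <- vals[cbind(seq_len(nrow(vals)), last_col)]
print(test$val)
```

A row that is entirely NA ties on every column. It therefore picks the last column and yields NA, which is usually what you want.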
Extract last non-missing value in row with data.table
Here's another way:
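dat[, res := NA_character_]  # add an empty result column
# scan columns right to left ([-1] skips res itself), filling only rows still NA
for (v in rev(names(dat))[-1]) dat[is.na(res), res := get(v)]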
X1 X2 X3 X4 X5 res
1: u NA NA NA NA u
2: f q NA NA NA q
3: f b w NA NA w
4: k g h NA NA h
5: u b r NA NA r
6: f q w x t t
7: u g h i e e
8: u q r n t t
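The same right-to-left fill can also be written in base R on a plain data frame. The following is a sketch with hypothetical data. It stops early once every row has a value.

```r
# Hypothetical character data with trailing NAs, like dat above
dat <- data.frame(
  X1 = c("u", "f", "f"),
  X2 = c(NA, "q", "b"),
  X3 = c(NA, NA, "w"),
  stringsAsFactors = FALSE
)

res <- rep(NA_character_, nrow(dat))
# Walk the columns right to left; only rows still missing get filled
for (v in rev(names(dat))) {
  miss <- is.na(res)
  if (!any(miss)) break  # every row already has a value
  res[miss] <- dat[[v]][miss]
}
print(res)
```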
Benchmarks
Using the same data as @alexis_laz and making (apparently) superficial changes to the functions, I see different results. I'm just showing them here in case anyone is curious. Alexis' answer (with small modifications) still comes out ahead.
Functions:
alex = function(x, ans = rep_len(NA, length(x[[1L]])), wh = seq_len(length(x[[1L]]))){
  if(!length(wh)) return(ans)
  ans[wh] = as.character(x[[length(x)]])[wh]
  Recall(x[-length(x)], ans, wh[is.na(ans[wh])])
}
alex2 = function(x){
  x[, res := NA_character_]
  wh = x[, .I]
  for (v in (length(x)-1):1){
    if (!length(wh)) break
    set(x, j="res", i=wh, value = x[[v]][wh])
    wh = wh[is.na(x$res[wh])]
  }
  x$res
}
frank = function(x){
  x[, res := NA_character_]
  for(v in rev(names(x))[-1]) x[is.na(res), res := get(v)]
  return(x$res)
}
frank2 = function(x){
  x[, res := NA_character_]
  for(v in rev(names(x))[-1]) x[is.na(res), res := .SD, .SDcols=v]
  x$res
}
Example data and benchmark:
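library(data.table)
# 100 character columns of 3e5 rows; the leading run of NAs grows from 0 to 1e4 left to right
DAT1 = as.data.table(lapply(ceiling(seq(0, 1e4, length.out = 1e2)),
  function(n) c(rep(NA, n), sample(letters, 3e5 - n, TRUE))))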
DAT2 = copy(DAT1)
DAT3 = as.list(copy(DAT1))
DAT4 = copy(DAT1)
library(microbenchmark)
microbenchmark(frank(DAT1), frank2(DAT2), alex(DAT3), alex2(DAT4), times = 30)
Unit: milliseconds
expr min lq mean median uq max neval
frank(DAT1) 850.05980 909.28314 985.71700 979.84230 1023.57049 1183.37898 30
frank2(DAT2) 88.68229 93.40476 118.27959 107.69190 121.60257 346.48264 30
alex(DAT3) 98.56861 109.36653 131.21195 131.20760 149.99347 183.43918 30
alex2(DAT4) 26.14104 26.45840 30.79294 26.67951 31.24136 50.66723 30
keep last non missing observation for all variables by group
Using data.table:
library(data.table)
d[, lapply(.SD, function(x) last(na.omit(x))), g]
# g a b c
#1: 1 1 2 <NA>
#2: 2 4 4 c
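For comparison, here is a base-R sketch of "last non-missing value per variable by group" that uses split() instead of data.table. The data and the helper last_non_na are hypothetical. The helper returns a typed NA (via x[NA_integer_]) when a group has no non-missing value.

```r
# Hypothetical data with a grouping column g
d <- data.frame(
  g = c(1, 1, 2, 2),
  a = c(1, NA, 3, 4),
  b = c(NA, 2, 4, NA),
  c = c(NA, NA, "c", NA),
  stringsAsFactors = FALSE
)

# Last non-NA element, or an NA of the same type when there is none
last_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x)) x[length(x)] else x[NA_integer_]
}

# One row per group, applying last_non_na to every non-grouping column
res <- do.call(rbind, lapply(split(d, d$g), function(sub) {
  data.frame(g = sub$g[1], lapply(sub[-1], last_non_na),
             stringsAsFactors = FALSE)
}))
rownames(res) <- NULL
print(res)
```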
Getting the position of the last non-NA value in a row in an R data.table
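We can use max.col and multiply by an indicator, so that a row with no non-NA values gets 0 instead of the last column index. Positions are counted from the second column of dt: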
max.col(!is.na(dt[, -1]), ties.method = 'last') * +(rowSums(!is.na(dt[,-1])) > 0)
#[1] 4 2 3 0
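The same expression works on a plain matrix. Here is a small sketch with a hypothetical matrix whose last row is entirely NA, which shows why the indicator is needed.

```r
# Hypothetical numeric matrix; the last row has no non-NA values
m <- rbind(
  c(1, 2, 3, 4),
  c(5, 6, NA, NA),
  c(7, NA, 8, NA),
  c(NA, NA, NA, NA)
)

not_na <- !is.na(m)
# Right-most TRUE per row; an all-FALSE row would report the last column,
# so multiply by an "any non-NA" indicator to turn that into 0
pos <- max.col(not_na, ties.method = "last") * (rowSums(not_na) > 0)
print(pos)
```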
Extract and collapse non-missing elements by row in the data.table
Using melt():
data[, row := .I
][, melt(.SD, id.vars = "row")
][order(row, value), paste0(unique(value[!is.na(value)]), collapse = "&&&"), by = row]
row V1
1: 1 1
2: 2
3: 3 1
4: 4 1
5: 5 1&&&2
6: 6 1&&&2
7: 7 2
8: 8 1&&&2
9: 9 1&&&2
10: 10 2
Alternatively, using your original function:
data[, function_non_missing(unlist(.SD)), by = 1:nrow(data)]
nrow V1
1: 1 1
2: 2
3: 3 2
4: 4 1&&&&2
5: 5 1&&&&2
6: 6 1&&&&2
7: 7 1
8: 8 2
9: 9 1&&&&2
10: 10 1&&&&2
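If you don't need data.table, apply() can do the row-wise collapse directly. The following is a sketch that assumes all-numeric columns. apply() first converts the data frame to a matrix, so mixed column types would be turned into formatted strings. The data and the helper collapse_row are hypothetical. The sort mirrors the order(row, value) step of the melt() version.

```r
# Hypothetical all-numeric data
data <- data.frame(x = c(1, NA, 2), y = c(NA, NA, 1), z = c(1, NA, 2))

collapse_row <- function(r) {
  vals <- sort(unique(r[!is.na(r)]))  # distinct non-NA values, ascending
  paste0(vals, collapse = "&&&")       # "" when the row is all NA
}
collapsed <- unname(apply(data, 1, collapse_row))
print(collapsed)
```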
Get value of last non-NA row per column in data.table
If the dataset is a data.table, loop through the Subset of Data.table (.SD), subset the non-NA elements (x[!is.na(x)]) and extract the last of them with tail.
df1[, lapply(.SD, function(x) tail(x[!is.na(x)],1))]
# a b c
#1: 63 57 4
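On a plain data frame, the same per-column logic works with sapply(). Here is a sketch with hypothetical data:

```r
# Hypothetical data frame with trailing NAs in some columns
df1 <- data.frame(a = c(10, 63, NA), b = c(57, NA, NA), c = c(1, 2, 4))

# For each column: drop NAs, keep the last remaining value
last_vals <- sapply(df1, function(x) tail(x[!is.na(x)], 1))
print(last_vals)
```

Note that a column that is entirely NA gives a zero-length result, and sapply() would then return a list instead of a named vector.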
Selecting non `NA` values from duplicate rows with `data.table` -- when having more than one grouping variable
Here are some data.table-based solutions.
setDT(df_id_year_and_type)
method 1
na.omit(df_id_year_and_type, cols="type") drops the rows whose type is NA, and unique(df_id_year_and_type[, .(id, year)], fromLast=TRUE) finds all the (id, year) groups. Joining the two and keeping the last match (mult="last") gives the desired output. Groups with no non-NA type get a row with NA.
na.omit(df_id_year_and_type, cols="type"
)[unique(df_id_year_and_type[, .(id, year)], fromLast=TRUE),
on=c('id', 'year'),
mult="last"]
# id year type
# <num> <num> <char>
# 1: 1 2002 A
# 2: 2 2008 B
# 3: 3 2010 D
# 4: 3 2013 <NA>
# 5: 4 2020 C
# 6: 5 2009 A
# 7: 6 2010 B
# 8: 6 2012 <NA>
method 2
df_id_year_and_type[df_id_year_and_type[, .I[which.max(cumsum(!is.na(type)))], .(id, year)]$V1,]
method 3 (likely slower because of the overhead of calling .SD[...] once per group)
df_id_year_and_type[, .SD[which.max(cumsum(!is.na(type)))], .(id, year)]
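The trick in method 2 is that cumsum(!is.na(type)) first reaches its maximum at the last non-NA row of a group. If the whole group is NA, the sum stays 0 and which.max() falls back to the group's first row. Here is a base-R sketch of that idea with hypothetical data, using split() over interaction() as a stand-in for data.table's grouping.

```r
# Hypothetical duplicated (id, year) rows; some type values missing
df <- data.frame(
  id   = c(1, 1, 2, 2, 3, 3),
  year = c(2002, 2002, 2008, 2008, 2013, 2013),
  type = c("A", NA, NA, "B", NA, NA),
  stringsAsFactors = FALSE
)

# Row indices split by (id, year) group
groups <- split(seq_len(nrow(df)), interaction(df$id, df$year, drop = TRUE))

# One row per group: the last non-NA type, or the first row if all NA
keep <- vapply(groups, function(i) i[which.max(cumsum(!is.na(df$type[i])))],
               integer(1))
out <- df[keep, ]
print(out)
```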
How to fill NA with last non-missing value from previous columns?
You could use
library(dplyr)
df %>%
mutate(V5 = coalesce(V4, V3, V2, V1))
This returns
# A tibble: 7 x 5
V1 V2 V3 V4 V5
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1.19 2.45 0.83 0.87 0.87
2 1.13 0.79 0.68 5.43 5.43
3 1.18 1.09 1.04 NA 1.04
4 1.11 1.1 4.24 NA 4.24
5 1.16 1.13 NA NA 1.13
6 1.18 NA NA NA 1.18
7 1.44 NA 9.17 NA 9.17
Or, more generally, from https://github.com/tidyverse/funs/issues/54#issuecomment-892377998
df %>%
mutate(V5 = do.call(coalesce, rev(across(-V5))))
or https://github.com/tidyverse/funs/issues/54#issuecomment-1096449488
df %>%
mutate(V5 = coalesce(!!!rev(select(., -V5))))
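Without dplyr, the same right-to-left coalesce can be written as a fold with Reduce(). Here is a sketch on a hypothetical subset of the rows shown above. It starts from V4 and fills any remaining NAs from V3, then V2, then V1.

```r
# Hypothetical subset of the V1..V4 data shown above
df <- data.frame(
  V1 = c(1.19, 1.18, 1.44),
  V2 = c(2.45, 1.09, NA),
  V3 = c(0.83, 1.04, 9.17),
  V4 = c(0.87, NA, NA)
)

# Fold right to left: keep the accumulator unless it is NA
df$V5 <- Reduce(function(acc, col) ifelse(is.na(acc), col, acc),
                rev(as.list(df[c("V1", "V2", "V3", "V4")])))
print(df)
```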
update non-missing values based on most recent date
A data.table option:
setDT(data)[, employ := last(na.omit(employ[order(year)])), id]
gives
id year employ
1: 1 2010 yes
2: 1 2011 yes
3: 2 2010 yes
4: 2 2011 yes
5: 3 2010 no
6: 3 2011 no
7: 4 2010 yes
8: 4 2011 yes
9: 5 2010 no
10: 5 2011 no
A dplyr way might be:
data %>%
group_by(id) %>%
mutate(employ = last(na.omit(employ[order(year)])))
which gives
id year employ
<dbl> <dbl> <chr>
1 1 2010 yes
2 1 2011 yes
3 2 2010 yes
4 2 2011 yes
5 3 2010 no
6 3 2011 no
7 4 2010 yes
8 4 2011 yes
9 5 2010 no
10 5 2011 no
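For a base-R version, you can order each id's values by year, drop the NAs and broadcast the last one back to the group. The following is a sketch with hypothetical data. most_recent is a helper introduced here, not part of either package, and it returns NA for an id that never has a value.

```r
# Hypothetical panel: employ is missing in some years
data <- data.frame(
  id     = c(1, 1, 2, 2, 3, 3),
  year   = c(2010, 2011, 2010, 2011, 2010, 2011),
  employ = c("yes", NA, "yes", "no", NA, NA),
  stringsAsFactors = FALSE
)

# Latest non-NA value by year; NA if there is none
most_recent <- function(v, yr) {
  v <- v[order(yr)]
  v <- v[!is.na(v)]
  if (length(v)) v[length(v)] else NA_character_
}

# Overwrite every row of each id with that id's most recent value
for (i in split(seq_len(nrow(data)), data$id)) {
  data$employ[i] <- most_recent(data$employ[i], data$year[i])
}
print(data)
```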