dplyr::first() to choose first non NA value
Use na.omit, compare:
first(c(NA, 11, 22))
# [1] NA
first(na.omit(c(NA, 11, 22)))
# [1] 11
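If dplyr is not loaded, the same idea can be sketched in base R. The helper name first_non_na below is hypothetical and not part of any package. It drops the NA values by logical indexing and then takes element 1, so an all-NA input still yields NA.

```r
# Hypothetical helper (not from dplyr): first non-NA element of a vector.
first_non_na <- function(x) {
  x[!is.na(x)][1]  # indexing element 1 of an empty vector gives NA
}

first_non_na(c(NA, 11, 22))  # first non-missing value
first_non_na(c(NA, NA))      # nothing to pick, so NA comes back
```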
Using example data:
d %>%
mutate(
value = case_when(
group == 2 & year ==2000 ~ NA_integer_,
group == 3 & year ==2002 ~ NA_integer_,
TRUE ~ value))%>%
group_by(group) %>%
mutate(
first = dplyr::first(na.omit(value)),
last = dplyr::last(na.omit(value)))
# # A tibble: 9 x 5
# # Groups: group [3]
# group year value first last
# <int> <dbl> <int> <int> <int>
# 1 1 2000 3 3 4
# 2 1 2001 8 3 4
# 3 1 2002 4 3 4
# 4 2 2000 NA 9 1
# 5 2 2001 9 9 1
# 6 2 2002 1 9 1
# 7 3 2000 5 5 9
# 8 3 2001 9 5 9
# 9 3 2002 NA 5 9
How to select only first non NA value of each group in R?
A dplyr alternative, assuming that by "first" you simply mean the first row, in the order given, by group.
Note that (Id, VISIT) in your example data gives 2 groups for Baseline.
library(dplyr)
mydata %>%
group_by(Id, VISIT) %>%
filter(!is.na(Score)) %>%
slice(1) %>%
ungroup()
Result:
# A tibble: 5 x 3
Id VISIT Score
<dbl> <chr> <dbl>
1 1 Baseline 2
2 1 Screeing 1
3 1 Week 9 78
4 2 Baseline 5
5 2 Week 2 3
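For comparison, a rough base R sketch of the same filter-then-take-first logic. The data frame below is made up for illustration and is not the asker's data. Unlike group_by() followed by slice(), this version keeps rows in their original order rather than sorting them by group.

```r
# Made-up data, only the column names follow the example above.
mydata <- data.frame(
  Id    = c(1, 1, 1, 2, 2),
  VISIT = c("Baseline", "Baseline", "Week 9", "Baseline", "Baseline"),
  Score = c(NA, 2, 78, NA, 5),
  stringsAsFactors = FALSE
)

# Step 1: drop rows with a missing Score.
non_missing <- mydata[!is.na(mydata$Score), ]

# Step 2: keep the first remaining row of each (Id, VISIT) pair.
res <- non_missing[!duplicated(non_missing[c("Id", "VISIT")]), ]
res
```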
Select first non-NA value by row
tidyverse
library(dplyr)
mutate(df, E = coalesce(A, B, C, D))
# id A B C D E
# 1 1 6 3 4 4 6
# 2 2 NA 2 3 3 2
# 3 3 NA NA 5 1 5
# 4 4 4 NA NA 2 4
coalesce effectively returns, for each row, the first non-NA value among the vectors it is given, taken in argument order. It is the equivalent of SQL's COALESCE.
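To make the element-wise behaviour concrete, here is a minimal base R imitation. The function coalesce_sketch is a hypothetical stand-in written for illustration; it is not how dplyr implements coalesce and it skips the type checks the real function performs.

```r
# Hypothetical stand-in for dplyr::coalesce(), for illustration only.
coalesce_sketch <- function(...) {
  # Walk the vectors left to right, filling positions that are still NA.
  Reduce(function(acc, nxt) ifelse(is.na(acc), nxt, acc), list(...))
}

# Columns A to D from the table above.
A <- c(6, NA, NA, 4)
B <- c(3, 2, NA, NA)
C <- c(4, 3, 5, NA)
D <- c(4, 3, 1, 2)

E <- coalesce_sketch(A, B, C, D)
E  # same values as column E above
```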
base R
df$E <- apply(df[,-1], 1, function(z) na.omit(z)[1])
df
# id A B C D E
# 1 1 6 3 4 4 6
# 2 2 NA 2 3 3 2
# 3 3 NA NA 5 1 5
# 4 4 4 NA NA 2 4
na.omit removes all of the NA values, and [1] makes sure we always return just the first of them. The advantage of [1] over (say) head(., 1) is that head will return a zero-length vector if there are no non-NA elements, whereas .[1] will always return at least an NA (indicating to you that it was the only option).
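A quick check of that difference on an all-NA row (the vector z is made up for illustration):

```r
z <- c(NA, NA, NA)

by_index <- na.omit(z)[1]        # a single NA
by_head  <- head(na.omit(z), 1)  # nothing at all

length(by_index)  # one element, so apply() can still build a plain vector
length(by_head)   # zero elements
```

With head(), rows that have no non-NA value would produce results of a different length from the other rows, so apply() could no longer simplify its result to a vector.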
Select first non-NA value using R
We can use first on the non-NA elements after grouping:
library(dplyr)
df <- df %>%
group_by(ID) %>%
mutate(value = first(test[complete.cases(test)]))
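A base R sketch of the same grouped operation, using ave() instead of group_by() and mutate(). The data frame is made up; only the column names ID and test follow the answer above.

```r
# Made-up data for illustration.
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2),
  test = c(NA, 5, 7, NA, NA)
)

# For each ID, take the first non-NA value of test and repeat it
# across every row of that group (NA if the group has none).
df$value <- ave(df$test, df$ID, FUN = function(v) v[!is.na(v)][1])
df
```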
Get the first non-NA element in a row
One dplyr option could be:
df %>%
mutate_all(~ replace(., . == "-1", NA_integer_)) %>%
transmute(tracc = coalesce(!!!.))
tracc
1 1
2 1
3 1
4 1
5 1
6 1
7 1
8 1
9 3
10 1
An option since dplyr 1.0.0 could be:
df %>%
transmute(tracc = Reduce(coalesce, across(everything(), ~ replace(., . == "-1", NA_integer_))))
Fill NAs with either last or next non NA value in R
Here is an answer that would match your expected output exactly: it will impute to the nearest non-missing value, either upward or downward.
Here is the code, using a spiced up version of your example:
library(tidyverse)
df = structure(list(id = c("E1", "E2", "E2", "E2", "E2", "E3", "E3", "E3", "E4", "E4", "E4", "E4", "E4", "E4", "E4", "E4", "E5", "E5"),
year = c(2000L, 2000L, 2001L, 2003L, 2005L, 1999L, 2001L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2018L, 2019L, 2002L, 2003L),
pop = c(NA, NA, NA, 120L, 125L, 115L, 300L, NA, 10L, NA, NA, NA, NA, 9L, NA, 8L, 12L, 80L),
pop_exp = c(NA, 120L, 120L, 120L, 125L, 115L, 300L, 300L, 10L, 10L, 10L, 9L, 9L, 9L, 9L, 8L, 12L, 80L)),
class = "data.frame", row.names = c(NA, -18L))
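fill_nearest = function(x){
# positions of the non-missing values
keys=which(!is.na(x))
if(length(keys)==0) return(NA)
# seq_along() rather than seq.int(): seq.int(x) expands to 1:x when x has length 1
b = map_dbl(seq_along(x), ~keys[which.min(abs(.x-keys))])
x[b]
}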
df %>%
group_by(id) %>%
arrange(id, year) %>%
mutate(pop_imputated = fill_nearest(pop)) %>%
ungroup()
#> # A tibble: 18 x 5
#> id year pop pop_exp pop_imputated
#> <chr> <int> <int> <int> <int>
#> 1 E1 2000 NA NA NA
#> 2 E2 2000 NA 120 120
#> 3 E2 2001 NA 120 120
#> 4 E2 2003 120 120 120
#> 5 E2 2005 125 125 125
#> 6 E3 1999 115 115 115
#> 7 E3 2001 300 300 300
#> 8 E3 2003 NA 300 300
#> 9 E4 2004 10 10 10
#> 10 E4 2005 NA 10 10
#> 11 E4 2006 NA 10 10
#> 12 E4 2007 NA 9 9
#> 13 E4 2008 NA 9 9
#> 14 E4 2009 9 9 9
#> 15 E4 2018 NA 9 9
#> 16 E4 2019 8 8 8
#> 17 E5 2002 12 12 12
#> 18 E5 2003 80 80 80
Created on 2021-05-13 by the reprex package (v2.0.0)
As I had to use a purrr loop, it might get a bit slow on a huge dataset though.
EDIT: I suggested adding this option to tidyr::fill(): https://github.com/tidyverse/tidyr/issues/1119. The issue also contains a tweaked version of this function that uses the year column as the reference to calculate the "distance" between the values. For instance, you would rather have row 15 as 8 than as 9 because the year is much closer.
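The following is a minimal base R sketch of that idea, measuring distance on a reference column instead of on row position. It is not the version posted in the linked issue; the function name fill_nearest_by and its ref argument are hypothetical. Ties go to the earlier value because which.min() returns the first minimum.

```r
# Hypothetical helper: fill NAs in x with the value whose `ref`
# (for example the year) is closest to the ref of the missing row.
fill_nearest_by <- function(x, ref) {
  keys <- which(!is.na(x))
  if (length(keys) == 0) return(x)  # nothing to fill from
  nearest <- vapply(
    seq_along(x),
    function(i) keys[which.min(abs(ref[i] - ref[keys]))],
    integer(1)
  )
  x[nearest]
}

# Group E4 from the example above.
year <- c(2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2018L, 2019L)
pop  <- c(10L, NA, NA, NA, NA, 9L, NA, 8L)

filled <- fill_nearest_by(pop, year)
filled  # the 2018 row now takes the 2019 value
```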
Find the index position of the first non-NA value in an R vector?
Use a combination of is.na and which to find the non-NA index locations.
NonNAindex <- which(!is.na(z))
firstNonNA <- min(NonNAindex)
# set the next 3 observations to NA
is.na(z) <- seq(firstNonNA, length.out=3)
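A small worked run on a made-up vector z. This sketch takes the first index with [1] instead of min(), since min() on an empty index vector warns and returns Inf when z is entirely NA, whereas [1] gives NA.

```r
z <- c(NA, NA, 4, 7, 1, 9)  # made-up example vector

first_idx <- which(!is.na(z))[1]  # position of the first non-NA value
first_idx

# Blank out three observations starting at that position.
is.na(z) <- seq(first_idx, length.out = 3)
z
```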
How to get the first and last non-Inf, non-NaN, non-NA, non-0 value from the variable?
If all your values are positive, you can use df$data > 0 as a condition and then you only have to handle infinite values, i.e.
i1 <- which(df$data > 0 & !is.infinite(df$data))
df$data[i1[1]]
#[1] 100
df$data[i1[length(i1)]]
#[1] 430
In case you also have negative values, you can switch the condition from greater than to not equal (courtesy of @markus):
i1 <- which(df$data != 0 & !is.infinite(df$data))
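As a variation on the condition above, is.finite() can replace the !is.infinite() check, because it is FALSE for NA, NaN, Inf and -Inf alike. The vector below is made up for illustration and is not the asker's data.

```r
x <- c(0, NA, NaN, -100, 250, Inf, 430, 0, -Inf)  # made-up data

# Keep positions that are finite (not NA, NaN or +/-Inf) and non-zero.
ok <- which(is.finite(x) & x != 0)

first_val <- x[ok[1]]
last_val  <- x[ok[length(ok)]]
c(first_val, last_val)
```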
R dplyr replace missing column data with first non-missing value
Here's another approach, using rowwise() in combination with across().
- We are using rowwise because it helps in using a row as a single vector through cur_data()
- across(everything(), ~) helps us in mutating all columns at once
- max.col(cur_data() != 'dropped', ties.method = 'last') will retrieve last column index where the value != 'dropped'
- we store its column name in a temp variable say x
- lastly we use if()..else from base R to mutate only those columns where value is dropped
Hope the answer is clear enough
library(tidyverse)
otu_table %>% rowwise() %>%
mutate(across(everything(), ~ {x<- names(cur_data())[max.col(cur_data() != 'dropped', ties.method = 'last')];
if (. == 'dropped') paste0('unidentified ', get(x)) else . }))
#> # A tibble: 21 x 4
#> # Rowwise:
#> domain class order species
#> <chr> <chr> <chr> <chr>
#> 1 Eukaryota unidentified Eukaryo~ unidentified Eukaryo~ unidentified Eukaryota
#> 2 Eukaryota unidentified Eukaryo~ unidentified Eukaryo~ unidentified Eukaryota
#> 3 Eukaryota unidentified Eukaryo~ unidentified Eukaryo~ unidentified Eukaryota
#> 4 Eukaryota unidentified Eukaryo~ unidentified Eukaryo~ unidentified Eukaryota
#> 5 Eukaryota unidentified Eukaryo~ unidentified Eukaryo~ unidentified Eukaryota
#> 6 Eukaryota unidentified Eukaryo~ unidentified Eukaryo~ unidentified Eukaryota
#> 7 Eukaryota Hexanauplia Calanoida unidentified Calanoida
#> 8 Eukaryota unidentified Eukaryo~ unidentified Eukaryo~ unidentified Eukaryota
#> 9 Eukaryota Dinophyceae Syndiniales unidentified Syndinial~
#> 10 Animals Polychaeta Terebellida unidentified Terebelli~
#> # ... with 11 more rows
Created on 2021-06-19 by the reprex package (v2.0.0)
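The max.col() step can also be tried on its own in base R, without rowwise(). The sketch below uses a made-up two-row table with the same column names; it is a simplified illustration of the logic and not a drop-in replacement for the pipeline above.

```r
# Made-up miniature of the taxonomy table.
tax <- data.frame(
  domain  = c("Eukaryota", "Eukaryota"),
  class   = c("dropped", "Hexanauplia"),
  order   = c("dropped", "Calanoida"),
  species = c("dropped", "dropped"),
  stringsAsFactors = FALSE
)

m <- as.matrix(tax)
# For each row, index of the last column that is not 'dropped'.
last_known <- max.col(m != "dropped", ties.method = "last")
# Value found in that column, one per row.
label <- m[cbind(seq_len(nrow(m)), last_known)]

# Replace every 'dropped' cell with "unidentified <last known rank>".
out <- tax
for (col in names(out)) {
  dropped <- out[[col]] == "dropped"
  out[[col]][dropped] <- paste0("unidentified ", label[dropped])
}
out
```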