Replace NAs in one variable with values from another variable
One way is to use ifelse:
DF <- transform(DF, VAR3 = ifelse(!is.na(VAR1), VAR1, VAR2))
where transform was used to avoid typing DF$ over and over, but you may prefer:
DF$VAR3 <- ifelse(!is.na(DF$VAR1), DF$VAR1, DF$VAR2)
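Here is the transform() version on a small, hypothetical data frame, so you can see the result. Where both VAR1 and VAR2 are NA, VAR3 is NA as well:

```r
# Hypothetical toy data: VAR1 has gaps, VAR2 supplies fallback values
DF <- data.frame(VAR1 = c(1, NA, 3, NA),
                 VAR2 = c(10, 20, 30, NA))

# Keep VAR1 where it is present, otherwise fall back to VAR2
DF <- transform(DF, VAR3 = ifelse(!is.na(VAR1), VAR1, VAR2))
DF$VAR3
# [1]  1 20  3 NA
```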
How to replace NAs of a variable with values from another dataframe
Here's a quick solution using data.table's binary join. It updates only the gender column, taking the value of sex for each matching ID, and leaves all the other columns untouched:
library(data.table)
setkey(setDT(df1), ID)
df1[df2, gender := i.sex][]
# ID gender
# 1: 1 2
# 2: 2 2
# 3: 3 1
# 4: 4 2
# 5: 5 2
# 6: 6 2
# 7: 7 2
# 8: 8 2
# 9: 9 2
# 10: 10 2
# 11: 11 2
# 12: 12 2
# 13: 13 1
# 14: 14 1
# 15: 15 2
# 16: 16 2
# 17: 17 2
# 18: 18 2
# 19: 19 2
# 20: 20 2
# 21: 21 1
# 22: 22 2
# 23: 23 2
# 24: 24 2
# 25: 25 2
# 26: 26 2
# 27: 27 2
# 28: 28 2
# 29: 29 2
# 30: 30 2
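Note that gender := i.sex overwrites gender for every ID found in df2, not only the missing ones. If you only want to fill the NAs, the update join can keep existing values through ifelse(). The following is a minimal sketch on hypothetical toy data, using on = instead of setkey():

```r
library(data.table)

# Hypothetical toy data: gender has gaps, df2 holds a reference sex column
df1 <- data.table(ID = 1:4, gender = c(1, NA, 2, NA))
df2 <- data.table(ID = c(2L, 3L, 4L), sex = c(2, 1, 1))

# Update join on ID: for matched rows, take sex only where gender is NA
df1[df2, on = "ID", gender := ifelse(is.na(gender), i.sex, gender)]
df1$gender
# [1] 1 2 2 1
```

ID 3 keeps its existing gender of 2. With plain gender := i.sex it would have been overwritten with 1.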
How to only replace NA with specific values based on a condition in another variable
I changed the condition to use %in% and added a catch-all clause (TRUE ~ Udd) that acts as the else branch, so rows whose Udd is not NA keep their value.
library(dplyr)
d %>%
  mutate(Udd = case_when(is.na(Udd) & Edu < 8 ~ 1,
                         is.na(Udd) & Edu %in% 8:11 ~ 2,
                         is.na(Udd) & Edu > 11 ~ 3,
                         TRUE ~ Udd))  # catch-all: keep existing Udd values
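Without dplyr, you can write the same recoding in base R with findInterval(). This sketch assumes Edu holds whole numbers, which is the case where the two versions agree. For non-integer values they can differ. The data below are hypothetical:

```r
# Hypothetical toy data: Edu is a whole-number education score
d <- data.frame(Edu = c(5L, 9L, 12L, 3L, 10L),
                Udd = c(NA, NA, NA, 7, 2))

miss <- is.na(d$Udd)
# findInterval() returns 0 for Edu < 8, 1 for 8 <= Edu < 12 and 2 for Edu >= 12;
# adding 1 gives the codes 1, 2, 3. Non-missing Udd values are left alone.
d$Udd[miss] <- findInterval(d$Edu[miss], c(8, 12)) + 1
d$Udd
# [1] 1 2 3 7 2
```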
Creating a function to replace NAs from one data frame with values from another
Functions behave a little differently: assignments inside a function modify a local copy, not the data frame in the calling environment. Rather than trying to change the data frame from inside the function, return the modified data frame and pass the column name as a string.
impute <- function(x) {
df_raw[[x]] <- ifelse(is.na(df_raw[[x]]), miceoutput[[x]][inds],df_raw[[x]])
df_raw
}
df_raw <- impute("PIDS_14")
df_raw
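The impute() above still reads miceoutput and inds from the global environment. The sketch below passes every input as an argument instead. It assumes (hypothetically) that the imputed data frame has the same rows, in the same order, as the raw one, so no inds subset is needed. Reduce() then applies the function to several columns in turn:

```r
# Fill the NAs in column `col` of `df` with the matching entries of `imputed`.
# Assumes (hypothetically) `imputed` has the same rows, in the same order, as `df`.
impute <- function(df, imputed, col) {
  miss <- is.na(df[[col]])
  df[[col]][miss] <- imputed[[col]][miss]
  df
}

# Hypothetical toy data standing in for df_raw and the imputed values
df_raw  <- data.frame(PIDS_14 = c(3, NA, 5, NA), PIDS_15 = c(NA, 2, 2, 4))
imputed <- data.frame(PIDS_14 = c(3, 4, 5, 1),   PIDS_15 = c(1, 2, 2, 4))

# Fold impute() over several column names, starting from df_raw
df_raw <- Reduce(function(d, col) impute(d, imputed, col),
                 c("PIDS_14", "PIDS_15"), df_raw)
df_raw
```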
Replace NA with the nearest value based on another variable, while keeping NA for observations that have no non-missing neighbour
One option is to use case_when from dplyr (loaded with the tidyverse). For each missing x: if the previous row is closer in year and is not NA, return x from that row; if the next row is closer and is not NA, return x from the row below. If the row above is NA, return the row below instead, and if the row below is NA, return the row above. If a row is not NA, just return x. (If both neighbours are non-missing and equally close, no condition matches and the value stays NA.)
library(tidyverse)
dat %>%
mutate(x = case_when(is.na(x) & !is.na(lag(x)) & year - lag(year) < lead(year) - year ~ lag(x),
is.na(x) & !is.na(lead(x)) & year - lag(year) > lead(year) - year ~ lead(x),
is.na(x) & is.na(lag(x)) ~ lead(x),
is.na(x) & is.na(lead(x)) ~ lag(x),
TRUE ~ x))
Output
year x
1 2000 1
2 2001 2
3 2002 3
4 2003 3
5 2005 5
6 2006 5
7 2007 NA
8 2008 9
9 2009 9
10 2010 10
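The same immediate-neighbour rule can be sketched in base R without dplyr. The function name and toy data below are hypothetical. One deliberate difference: when both neighbours are non-missing and equally close, this sketch takes the row above, whereas the case_when() version leaves NA.

```r
# Fill each NA in x from the row directly above or below, whichever is closer
# in year; the gap stays NA when both neighbours are also NA.
fill_from_neighbour <- function(year, x) {
  n        <- length(x)
  prev     <- c(NA, x[-n])             # x in the row above
  nxt      <- c(x[-1], NA)             # x in the row below
  gap_prev <- year - c(NA, year[-n])   # years back to the row above
  gap_next <- c(year[-1], NA) - year   # years forward to the row below
  # Use the row above when it is non-missing and at least as close as the row below
  use_prev <- is.na(x) & !is.na(prev) & (is.na(nxt) | gap_prev <= gap_next)
  # Otherwise use the row below, if it is non-missing
  use_next <- is.na(x) & !is.na(nxt) & !use_prev
  x[use_prev] <- prev[use_prev]
  x[use_next] <- nxt[use_next]
  x
}

# Hypothetical toy data, sorted by year
dat <- data.frame(year = c(2000, 2002, 2003, 2004, 2005, 2006, 2007),
                  x    = c(1,    NA,   3,    NA,   NA,   NA,   7))
dat$x <- with(dat, fill_from_neighbour(year, x))
dat$x
# [1]  1  3  3  3 NA  7  7
```

As in the case_when() version, neighbours are read from the original x, so a filled value is never used to fill the next gap. That is why the 2005 row, whose neighbours are both missing, stays NA.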
Conditionally replace NA with value from other rows
Your mutate() won't work because you never assigned the result to a variable. It should look something like group_by(temporal, spatial) %>% mutate(value = unique(value[!is.na(value)])). That said, this is not the approach I would take. Below, I created a lookup table of the distinct non-NA values for each temporal/spatial pair and then joined it onto the original dataset. The valuedis column should hold the values you want.
temporal <- c("Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday","Monday", "Monday", "Tuesday", "Tuesday","Wednesday", "Wednesday", "Thursday", "Thursday", "Friday", "Friday")
spatial <- c("North", "South","North", "South","North", "South","North", "South","North", "South", "North", "South","North", "South","North", "South","North", "South","North", "South")
value <- c(NA,2,3,4,5,6,7,NA,9,10,1,NA,3,4,5,6,7,8,9,NA)
df <- data.frame(temporal, spatial, value)  # keeps value numeric; cbind() would coerce every column to character
library(dplyr)
dfdis <- df %>%
filter(!is.na(value)) %>%
distinct(temporal,spatial,value) %>%
rename(valuedis = value)
df2 <- left_join(df,dfdis, by = c("temporal","spatial"))
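The corrected mutate() needs a group_by() so that each temporal/spatial pair looks up its own non-missing value. Below is a minimal sketch on the same data. It uses first() rather than unique() so that a group with no non-missing value yields NA instead of an error. It also assumes each group has at most one distinct non-missing value:

```r
library(dplyr)

days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
df <- data.frame(
  temporal = rep(rep(days, each = 2), times = 2),
  spatial  = rep(c("North", "South"), times = 10),
  value    = c(NA, 2, 3, 4, 5, 6, 7, NA, 9, 10, 1, NA, 3, 4, 5, 6, 7, 8, 9, NA)
)

df_filled <- df %>%
  group_by(temporal, spatial) %>%
  # take the first non-missing value within each temporal/spatial group
  mutate(value = first(value[!is.na(value)])) %>%
  ungroup()

df_filled$value
```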
Replace a value NA with the value from another column in R
Perhaps the easiest answer to read and understand in R is to use ifelse. Borrowing Richard's data frame, we could do:
df <- structure(list(A = c(56L, NA, NA, 67L, NA),
B = c(75L, 45L, 77L, 41L, 65L),
Year = c(1921L, 1921L, 1922L, 1923L, 1923L)),.Names = c("A",
"B", "Year"), class = "data.frame", row.names = c(NA, -5L))
df$A <- ifelse(is.na(df$A), df$B, df$A)
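An alternative that avoids ifelse() is to assign only into the missing positions. This also keeps the column's attributes. ifelse() takes its attributes from the test vector, so a Date or factor column would come back as plain numbers. Using the same df:

```r
df <- structure(list(A = c(56L, NA, NA, 67L, NA),
                     B = c(75L, 45L, 77L, 41L, 65L),
                     Year = c(1921L, 1921L, 1922L, 1923L, 1923L)),
                class = "data.frame", row.names = c(NA, -5L))

# Replace only the NA entries of A with the value of B in the same row
miss <- is.na(df$A)
df$A[miss] <- df$B[miss]
df$A
# [1] 56 45 77 67 65
```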
Replace NA by value of another variable
You can use sapply in base R:
mydat[,c("X5","X6")] <- with(mydat, sapply(mydat[8:9],function(x) ifelse(is.na(X6),X5,X6)))
This gives the desired result:
ItemRelation DocumentNum CalendarYear X1 X2 X3 X4 X5 X6
1 158200 1715 2018 0 0 0 NA 107 107
2 158204 1715 2018 0 0 0 NA 105 105
Explanation:
ifelse checks whether the X6 value in each row is NA. If it is, it takes the value of X5 from that row; otherwise it keeps X6.
sapply loops over the two columns in mydat[8:9] (X5 and X6). The anonymous function ignores its argument x, so both columns receive the same ifelse result, which is why X5 and X6 end up identical.
with evaluates the expression "within" mydat, so you can refer to its columns (X5, X6) without using $ or [].
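Because the anonymous function ignores x, the sapply() call is equivalent to computing one filled vector and writing it into both columns. Here is a sketch with a hypothetical two-row mydat (the original input was not shown) that checks both forms agree:

```r
# Hypothetical input: X5 and X6 each have a gap
mydat <- data.frame(ItemRelation = c(158200, 158204), DocumentNum = 1715,
                    CalendarYear = 2018, X1 = 0, X2 = 0, X3 = 0, X4 = NA,
                    X5 = c(107, NA), X6 = c(NA, 105))

# The sapply() form from the answer: both columns receive the same vector
via_sapply <- with(mydat, sapply(mydat[8:9], function(x) ifelse(is.na(X6), X5, X6)))

# Direct form: X6 where present, otherwise X5, written into both columns
filled <- with(mydat, ifelse(is.na(X6), X5, X6))
mydat[c("X5", "X6")] <- list(filled, filled)

stopifnot(identical(unname(via_sapply[, "X5"]), filled))
mydat
```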
Replacing NAs in a column with the values of other column
You can use coalesce:
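library(dplyr)
# Example vectors reconstructed from the printed output below
Letters <- LETTERS[1:5]
Char <- c("a", "b", NA, "d", NA)
df1 <- data.frame(Letters, Char, stringsAsFactors = FALSE)
df1 %>%
  mutate(Char1 = coalesce(Char, Letters))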
Letters Char Char1
1 A a a
2 B b b
3 C <NA> C
4 D d d
5 E <NA> E
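coalesce() also accepts more than two vectors and returns, position by position, the first non-missing value. You can sketch the idea in base R with Reduce(). coalesce_base() is a hypothetical helper name, and unlike dplyr's version it does no type checking or recycling. All inputs are therefore assumed to have the same length and a compatible type:

```r
# Hypothetical base-R sketch of coalesce(): for each position, keep the first
# non-NA value found among the arguments, scanning them left to right.
# Assumes all inputs have the same length and a compatible type.
coalesce_base <- function(...) {
  Reduce(function(acc, nxt) {
    miss <- is.na(acc)
    acc[miss] <- nxt[miss]
    acc
  }, list(...))
}

coalesce_base(c("a", "b", NA, "d", NA), LETTERS[1:5])
# [1] "a" "b" "C" "d" "E"
coalesce_base(c(NA, 2, NA), c(NA, NA, 3), c(1, 9, 9))
# [1] 1 2 3
```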
Replace missing values (NA) in one data set with values from another where columns match
I would do this:
library(data.table)
setDT(DF1); setDT(DF2)
DF1[DF2, x := ifelse(is.na(x), i.x, x), on=c("y","z")]
which gives
x y z
1: 153 a 1
2: 163 b 1
3: 184 d 1
4: 123 a 2
5: 145 e 2
6: 176 c 2
7: 124 b 1
8: 199 a 2
Comments. This approach isn't so great, since it merges the whole of DF1, while we only need to merge the subset where is.na(x). The improved version (thanks, @Arun) is:
DF1[is.na(x), x := DF2[.SD, x, on=c("y", "z")]]
This way is analogous to @RHertel's answer.
From @Jakob's comment: "Does this work for more than one x variable? If I want to fill up entire datasets with several columns?"
You can enumerate the desired columns:
DF1[DF2, `:=`(
x = ifelse(is.na(x), i.x, x),
w = ifelse(is.na(w), i.w, w)
), on=c("y","z")]
The expression could probably be constructed using lapply and substitute, but if the set of columns is fixed, it might be cleanest just to write it out as above.
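A plain loop over the columns also works, with no need to build the j expression with lapply() and substitute(). The sketch below swaps in base R's match() on a combined key in place of a data.table join. fill_from_lookup() and the toy data are hypothetical, and the lookup table is assumed to have at most one row per key combination, since match() returns the first hit:

```r
# Fill NAs in `cols` of `target` from the row of `lookup` with the same key values
fill_from_lookup <- function(target, lookup, keys, cols) {
  # Collapse the key columns into one string per row (e.g. "a\r1")
  key_target <- do.call(paste, c(target[keys], sep = "\r"))
  key_lookup <- do.call(paste, c(lookup[keys], sep = "\r"))
  idx <- match(key_target, key_lookup)   # NA where the key is absent from lookup
  for (col in cols) {
    miss <- is.na(target[[col]])
    target[[col]][miss] <- lookup[[col]][idx[miss]]
  }
  target
}

# Hypothetical toy data
DF1 <- data.frame(x = c(153, NA, 184, NA), w = c(NA, 2, NA, 4),
                  y = c("a", "b", "d", "a"), z = c(1, 1, 1, 2))
DF2 <- data.frame(x = c(163, 123, 999), w = c(10, 20, 30),
                  y = c("b", "a", "d"), z = c(1, 2, 1))

res <- fill_from_lookup(DF1, DF2, keys = c("y", "z"), cols = c("x", "w"))
res
```

The first row keeps w = NA because the key (a, 1) has no match in DF2.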