When trying to replace values, missing values are not allowed in subscripted assignments of data frames
You can use ifelse, like so:
pe94.person$foo <- ifelse(!is.na(pe94.person$H01) & pe94.person$H01 == 12, 0, pe94.person$H03)
Check that foo meets your criteria, then go ahead and assign it to pe94.person$H03 directly. I find it safer to assign the result to a new variable and use that in subsequent analysis.
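As a rough sketch of what the guard does, here is the same ifelse call on a tiny made-up data frame. The values in pe94.person below are hypothetical stand-ins, not the asker's data.

```r
# Hypothetical stand-in for pe94.person (values invented for illustration)
pe94.person <- data.frame(H01 = c(12, NA, 3, 12), H03 = c(5, 6, NA, 8))

# A plain comparison is NA wherever H01 is missing
raw_test <- pe94.person$H01 == 12

# FALSE & NA is FALSE, so the !is.na() guard removes the NA from the test
guarded_test <- !is.na(pe94.person$H01) & pe94.person$H01 == 12

# Rows where the test is FALSE simply keep their H03 value
pe94.person$foo <- ifelse(guarded_test, 0, pe94.person$H03)
print(pe94.person)
```

Row 2 (missing H01) keeps its H03 value, and row 3 keeps its missing H03 because the test is FALSE there.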
Replace specific value in R
Judging by your error message, either your dat$i_huvisfatin_v00 column doesn't contain the value 16527.98, or the column already has NA in it.
dat$i_huvisfatin_v00 == 16527.98 returns a logical vector, which cannot be used as an index if it contains NA. Wrapping it in which() for the row index seems to solve the problem:
dat[which(dat$i_huvisfatin_v00 == 16527.98), "i_huvisfatin_v00"] <- NA
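To see why which() helps, the sketch below runs the same steps on a small invented column. The dat here is a hypothetical example, not the original data.

```r
# Hypothetical data: one target value and one existing NA
dat <- data.frame(i_huvisfatin_v00 = c(10.5, 16527.98, NA, 42))

# The comparison propagates the NA into the logical vector
idx <- dat$i_huvisfatin_v00 == 16527.98

# which() returns only the positions that are TRUE, dropping NA and FALSE
rows <- which(idx)

# An integer index without NA is safe for assignment
dat[rows, "i_huvisfatin_v00"] <- NA
print(dat)
```

Only row 2 is changed; the row that was already NA is never selected.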
NAs are not allowed in subscripted assignments
You're trying to assign to these three rows:
> match(dat$code, age.and.sex$code)
[1] 1 2 NA
because the third value of dat$code has no counterpart in age.and.sex$code (the two vectors are not the same length), so the third result is NA.
I'm not sure what you actually mean to be matching, but you might just try subsetting to the first two observations, or na.omit, etc.
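One way to do that subsetting in base R is to drop the unmatched positions before assigning. This is only a sketch: the two data frames are reconstructed guesses based on the output shown in this answer, and the names idx and found are made up here.

```r
# Assumed shapes of the two tables (reconstructed, not the asker's data)
dat <- data.frame(code = c("A11", "B22", "C33"), age = c(NA, NA, 12))
age.and.sex <- data.frame(code = c("A11", "B22"), age = c(15, 10))

# Position of each dat$code in age.and.sex$code; NA where there is no match
idx <- match(dat$code, age.and.sex$code)

# Keep only the rows of dat that actually have a match
found <- !is.na(idx)

# Both sides of the assignment are now free of NA indices
dat$age[found] <- age.and.sex$age[idx[found]]
print(dat)
```

The unmatched row (C33) is left untouched.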
But a better way to join data from two tables is to use a join.
library(data.table)
dat <- data.table(dat)
setkey(dat,code)
age.and.sex <- data.table(age.and.sex)
setkey(age.and.sex,code)
dat[age.and.sex]
> dat[age.and.sex]
code age sex more i.age i.sex
1: A11 NA m 7 15 m
2: B22 NA f 4 10 f
Note how the columns of the inner table get appended to those of the outer table.
More... Per @joran's suggestion...you can use this technique to fill in missing observations:
joined <- dat[age.and.sex]
joined[is.na(age),age:=i.age] #only replace the value missing from left table
joined[,c("i.age","i.sex"):=NULL]
joined
> joined
code age sex more
1: A11 15 m 7
2: B22 10 f 4
Update to address your comment...just reverse the join. There are some cleverer ways to do this less manually, but this should be simple to follow:
joined <- age.and.sex[dat]
joined[is.na(age),age:=i.age]
joined[is.na(sex),sex:=i.sex]
joined[,c("i.age","i.sex"):=NULL]
> joined
code age sex more
1: A11 15 m 7
2: B22 10 f 4
3: C33 12 m 9
If this technique is to your liking, you should definitely read ?data.table and the related vignette to learn more about joins.
How to fulfill missing cells of a data frame in R?
If you want to modify all cells that are not 20, including other valid values for age, I would do the following:
# Creating a data frame with another valid age
df = data.frame( name= c("Tommy", "John", "Dan","Bob"), age = c(20, NA, NA,12) )
# Substitute values different than 20 for 15
df[df$age!=20 | is.na(df$age),"age"] <- 15
name age
1 Tommy 20
2 John 15
3 Dan 15
4 Bob 15
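An alternative sketch, swapping in %in% for the explicit is.na() test: %in% never returns NA, so missing ages count as "not 20" automatically. The variable name keep is made up for this example.

```r
df <- data.frame(name = c("Tommy", "John", "Dan", "Bob"),
                 age = c(20, NA, NA, 12))

# %in% yields FALSE (not NA) for missing values
keep <- df$age %in% 20

# Everything that is not exactly 20, NA included, becomes 15
df$age[!keep] <- 15
print(df)
```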
NAs are not allowed in subscripted assignments
Your logic also needs to exclude NAs from the subset. See the following example. Note that the subset vectors are stored before x is modified.
x <- c(1,3,5,7,NA,2,4,6)
subset1 <- x>=5 & !is.na(x)
subset2 <- x<5 & !is.na(x)
x[subset1] <- which(subset1)
x[subset2] <- 10*which(subset2)
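For comparison, this sketch first attempts the unguarded assignment inside try() to show the failure, then repeats the guarded version and prints the result. The names unguarded and failed are introduced only for this illustration.

```r
x <- c(1, 3, 5, 7, NA, 2, 4, 6)

# Without the guard, the logical index contains NA and the assignment fails
unguarded <- try(x[x >= 5] <- which(x >= 5), silent = TRUE)
failed <- inherits(unguarded, "try-error")

# Guarded subsets, both computed before x is changed
subset1 <- x >= 5 & !is.na(x)
subset2 <- x < 5 & !is.na(x)

x[subset1] <- which(subset1)        # large values become their position
x[subset2] <- 10 * which(subset2)   # small values become 10 * position
print(x)
```

If subset2 were computed after the first assignment, the freshly written positions 3 and 4 would also count as "less than 5", which is why both subsets are stored up front.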
Fill in missing values in column with a different dataframe
One option is a join with data.table
library(data.table)
setDT(df1)[df2, Date := i.Date, on = .(Alphabet)]
df1
# Alphabet Date Colour
#1: ABC 2018-09-10 green
#2: DEF 2017-06-11 red
#3: GHI 2016-05-12 blue
#4: JKL 2017-06-07 yellow
#5: MNO 2018-08-03 orange
#6: PQR 2019-10-07 brown
Update
Using the new 'df2a' dataset
i1 <- is.na(df1$Date)|df1$Date %in% "Unknown"
setDT(df1)[df2a[df2a$Alphabet %in% df1$Alphabet[i1],],
Date := i.Date, on = .(Alphabet)]
df1
# Alphabet Date Colour
#1: ABC 2018-09-10 green
#2: DEF 2017-06-11 red
#3: GHI 2016-05-12 blue
#4: JKL 2017-06-07 yellow
#5: MNO 2018-08-03 orange
#6: PQR 2019-10-07 brown
Or using match from base R:
i1 <- match(df2$Alphabet, df1$Alphabet)
df1$Date[i1] <- df2$Date
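If the lookup table has rows that do not appear in df1 (as df2a does), match returns NA for them and the assignment above fails. A hedged sketch of one way around that, using shortened made-up versions of df1 and df2a:

```r
# Shortened, illustrative versions of the data (not the full data sets)
df1 <- data.frame(Alphabet = c("ABC", "JKL", "MNO", "PQR"),
                  Date = c("2018-09-10", NA, NA, "Unknown"),
                  stringsAsFactors = FALSE)
df2a <- data.frame(Alphabet = c("JKL", "MNO", "PQR", "STU"),
                   Date = c("2017-06-07", "2018-08-03", "2019-10-07", "2019-11-08"),
                   stringsAsFactors = FALSE)

# Position in df1 of each lookup row; NA for "STU", which df1 lacks
i1 <- match(df2a$Alphabet, df1$Alphabet)

# Drop the unmatched lookup rows on both sides of the assignment
ok <- !is.na(i1)
df1$Date[i1[ok]] <- df2a$Date[ok]
print(df1)
```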
data
df1 <- structure(list(Alphabet = c("ABC", "DEF", "GHI", "JKL", "MNO",
"PQR"), Date = c("2018-09-10", "2017-06-11", "2016-05-12", NA,
NA, "Unknown"), Colour = c("green", "red", "blue", "yellow",
"orange", "brown")), class = "data.frame", row.names = c(NA,
-6L))
df2 <- structure(list(Alphabet = c("JKL", "MNO", "PQR"), Date = c("2017-06-07",
"2018-08-03", "2019-10-07")), class = "data.frame", row.names = c(NA,
-3L))
df2a <- structure(list(Alphabet = c("JKL", "MNO", "PQR", "STU", "VWX"
), Date = c("2017-06-07", "2018-08-03", "2019-10-07", "2019-11-08",
"2019-12-08")), class = "data.frame", row.names = c(NA, -5L))
Error when replacing a value in a data frame with NAs
The first problem is specifying the variable correctly, that is, by its name and not by a value (probably just a typo in your question): "y" and not "yes".
Then another problem arises when you use ==, because R has to work out what to do with the NA in the third row:
x=="NS"
[1] TRUE TRUE NA
Hmm, should it be kept or not? It is neither TRUE nor FALSE, so R just gives an error, as it cannot "decide".
Whereas with %in% (which is actually match(x, table, nomatch = 0) > 0), we get:
x %in% "NS"
[1] TRUE TRUE FALSE
There you go: NA doesn't match the value "NS", so match returns 0 or, as a logical, FALSE: we shouldn't keep it.
Thus, to get what you want:
z[z$x %in% "NS", "y"] <- "a"
z
# x y
#1 NS a
#2 NS a
#3 <NA> b
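Putting the two behaviours side by side, here is a small self-contained sketch. The data frame z is a guess at the asker's data based on the output above, and failed is a name introduced only for this example.

```r
# Assumed reconstruction of the data (not taken from the question itself)
z <- data.frame(x = c("NS", "NS", NA), y = c("b", "b", "b"),
                stringsAsFactors = FALSE)

# The == version leaves an NA in the row index, so the assignment errors
failed <- inherits(try(z[z$x == "NS", "y"] <- "a", silent = TRUE), "try-error")

# The %in% version has no NA in the index, so it goes through
z[z$x %in% "NS", "y"] <- "a"
print(z)
```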