R - Replace specific value contents with NA
Since you're already using tidyverse functions, you can easily use na_if from dplyr within your pipes.
For example, I have a dataset where 999 is used to fill in a non-answer:
library(dplyr)  # provides tibble(), mutate(), across() and na_if()
df <- tibble(
  alpha = c("a", "b", "c", "d", "e"),
  val1 = c(1, 999, 3, 8, 999),
  val2 = c(2, 8, 999, 1, 2))
If I wanted to change val1 so that 999 becomes NA, I could do:
df %>%
mutate(val1 = na_if(val1, 999))
In your case, it sounds like you want to replace a value across multiple variables, so using across for multiple columns would be more appropriate:
df %>%
  mutate(across(c(val1, val2), ~ na_if(.x, 999))) # or val1:val2
This replaces all instances of 999 in both val1 and val2 with NA, and the result now looks like this:
# A tibble: 5 x 3
  alpha  val1  val2
  <chr> <dbl> <dbl>
1 a         1     2
2 b        NA     8
3 c         3    NA
4 d         8     1
5 e        NA     2
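For comparison, the same multi-column replacement can be sketched in base R without dplyr, using a logical-matrix index restricted to the chosen columns. This is an alternative technique, not part of the answer above; the data frame simply mirrors the example data.

```r
df <- data.frame(
  alpha = c("a", "b", "c", "d", "e"),
  val1  = c(1, 999, 3, 8, 999),
  val2  = c(2, 8, 999, 1, 2),
  stringsAsFactors = FALSE
)

cols <- c("val1", "val2")

# df[cols] == 999 is a logical matrix covering only val1 and val2,
# so the alpha column is never touched
df[cols][df[cols] == 999] <- NA
print(df)
```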
How to replace certain values in specific rows and columns with NA in R?
Since your data structure is two-dimensional, you can first find the indices of the rows containing a specific value and then use those indices to change the column you want.
which(DF$Fruits == "Pineapple")
[1] 3
DF$Weight[which(DF$Fruits == "Pineapple")] <- NA
Be aware that which returns a vector, so if several rows have the fruit "Pineapple", the command above returns all of their indices, and the assignment sets Weight to NA in every one of those rows.
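A minimal self-contained sketch of the same idea. The question's DF is not shown, so the Fruits and Weight values below are hypothetical; a second "Pineapple" row is included on purpose to show that which() returns every match.

```r
# Hypothetical data: invented Fruits/Weight values for illustration only
DF <- data.frame(
  Fruits = c("Apple", "Banana", "Pineapple", "Mango", "Pineapple"),
  Weight = c(150, 120, 900, 200, 950),
  stringsAsFactors = FALSE
)

# which() returns every matching row index, not just the first one
idx <- which(DF$Fruits == "Pineapple")
print(idx)  # [1] 3 5

# Set Weight to NA only in those rows; other columns are left unchanged
DF$Weight[idx] <- NA
print(DF)
```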
Replacing character values with NA in a data frame
This replaces every cell equal to "foo", in every column, with NA:
df[df == "foo"] <- NA
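To make the one-liner concrete, here is a small sketch on a hypothetical data frame with two character columns and one numeric column. The comparison df == "foo" produces a logical matrix with the same shape as df, and indexing with that matrix replaces all matching cells at once.

```r
# Hypothetical data frame mixing character and numeric columns
df <- data.frame(
  a = c("foo", "bar", "baz"),
  b = c("x", "foo", "y"),
  n = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Logical matrix, one column per data-frame column; the numeric column
# is compared as character, so it can never equal "foo"
mask <- df == "foo"
print(mask)

# Replace every matching cell in one assignment
df[mask] <- NA
print(df)
```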
Replace an NA value with the value from another column in R
Perhaps the easiest answer to read and understand in R is ifelse. Borrowing Richard's data frame, we could do:
df <- structure(list(A = c(56L, NA, NA, 67L, NA),
B = c(75L, 45L, 77L, 41L, 65L),
Year = c(1921L, 1921L, 1922L, 1923L, 1923L)),.Names = c("A",
"B", "Year"), class = "data.frame", row.names = c(NA, -5L))
df$A <- ifelse(is.na(df$A), df$B, df$A)
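A quick check on the same data, comparing the ifelse result with an equivalent logical-index assignment that overwrites only the missing positions of A. The second form is an alternative technique, not part of the answer above; the data frame is rebuilt with data.frame() rather than structure(), but holds the same values.

```r
df <- data.frame(
  A    = c(56L, NA, NA, 67L, NA),
  B    = c(75L, 45L, 77L, 41L, 65L),
  Year = c(1921L, 1921L, 1922L, 1923L, 1923L)
)

# Approach from the answer: element-wise choice between B and A
via_ifelse <- ifelse(is.na(df$A), df$B, df$A)

# Alternative: fill only the positions where A is missing
missing_A <- is.na(df$A)
df$A[missing_A] <- df$B[missing_A]

print(df$A)  # [1] 56 45 77 67 65
```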
Replacing specific values with NA using na_if
na_if works on vectors, not on a data.frame, so your first attempt using mutate was the most correct. Furthermore, it only replaces values that are exactly equal to the value you supply.
However, your very large values are only displayed to 15 significant digits, and I suspect the stored values carry more precision than what is shown. Therefore no value exactly matches your condition (y). This is a common problem when comparing floating-point (real) values for exact equality.
Also note what you are actually comparing. Which of these two values is larger?
9.969210e+36
9.96920996838687e+36
A quick fix is to compare against a threshold instead of testing for exact equality:
df %>% mutate(
  IMD = ifelse(IMD > 9e36, NA, IMD),
  CRU = ifelse(CRU > 9e36, NA, CRU)
)
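A small base-R sketch of why exact matching fails while a threshold works. The vector is hypothetical; its second element imitates the large placeholder from the question, stored with more digits than the rounded value used in the comparison.

```r
# Hypothetical vector with one very large placeholder value
x <- c(1.5, 9.96920996838687e+36, 3.2)

# Exact equality against the rounded, printed value matches nothing
exact_hits <- sum(x == 9.969210e+36)
print(exact_hits)  # [1] 0

# A threshold comparison is insensitive to the hidden digits
x[x > 9e36] <- NA
print(x)
```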
Or create a helper function:
# Set every element of x that is larger than y to NA
na_when_larger <- function(x, y) {
  x[x > y] <- NA
  x
}
df %>% mutate(across(c(IMD, CRU), ~ na_when_larger(.x, 9.96e+36)))
(Try typing na_if into the console without parentheses to see its source code.)
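If dplyr is not available, the same helper can be applied to several columns with base R's lapply. The IMD and CRU values below are hypothetical, chosen only so that one cell per column exceeds the threshold.

```r
na_when_larger <- function(x, y) {
  x[x > y] <- NA
  x
}

# Hypothetical data: IMD/CRU columns each holding one large fill value
df <- data.frame(
  IMD = c(10.2, 9.96920996838687e+36, 11.4),
  CRU = c(9.96920996838687e+36, 8.7, 9.1)
)

# Apply the helper to each selected column and write the results back
cols <- c("IMD", "CRU")
df[cols] <- lapply(df[cols], na_when_larger, y = 9.96e+36)
print(df)
```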
Replace a value in a data frame based on a conditional (`if`) statement
It is easier to convert nm to character and then make the change:
junk$nm <- as.character(junk$nm)
junk$nm[junk$nm == "B"] <- "b"
EDIT: If you do need to keep nm as a factor, add this at the end:
junk$nm <- as.factor(junk$nm)
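A self-contained sketch of the round trip. The junk data frame here is a hypothetical stand-in, since the question's data is not shown. Note that as.factor() rebuilds the levels from the current values, so "B" disappears as a level and "b" appears.

```r
# Hypothetical stand-in for the question's junk data frame
junk <- data.frame(
  x  = 1:5,
  nm = factor(c("A", "B", "C", "B", "A"))
)

# Work on a character copy so any new value can be assigned
junk$nm <- as.character(junk$nm)
junk$nm[junk$nm == "B"] <- "b"

# Convert back to a factor; levels are recomputed from the values
junk$nm <- as.factor(junk$nm)
print(levels(junk$nm))
```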
Replacing values from a column using a condition in R
# reassign depth values under 10 to zero
df$depth[df$depth<10] <- 0
For columns that are factors, you can only assign values that are existing factor levels. To assign a value that isn't currently a level, create the additional level first:
levels(df$species) <- c(levels(df$species), "unknown")
df$species[df$depth<10] <- "unknown"
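A runnable sketch of both steps on hypothetical data (the depth and species values are invented). The condition is stored once before depth is modified; in the original order it still selects the same rows, because the zeroed depths remain below 10, but saving the mask makes that independence explicit.

```r
# Hypothetical data: depth is numeric, species is a factor
df <- data.frame(
  depth   = c(5, 12, 8, 30),
  species = factor(c("cod", "hake", "cod", "hake"))
)

# Remember which rows are shallow before changing depth
shallow <- df$depth < 10

# Reassign depth values under 10 to zero
df$depth[shallow] <- 0

# Add the new level first; without it, the assignment produces NA
# and an "invalid factor level" warning
levels(df$species) <- c(levels(df$species), "unknown")
df$species[shallow] <- "unknown"
print(df)
```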
Override bad/wrong values in a main table with NA or null values listed on another lookup table in R
Here’s a solution using dplyr::semi_join() and dplyr::anti_join() to split your data frame based on whether the id and date keys match your lookup table. I then assign NAs in just the subset with matching keys, then row-bind the subsets back together. Note that this solution doesn’t preserve the original row order.
library(dplyr)
table_ok_vals <- table %>%
anti_join(lookup, by = c("session_id", "datetime"))
table_replaced_vals <- table %>%
semi_join(lookup, by = c("session_id", "datetime")) %>%
mutate(CaloriesDaily = NA_real_)
table <- bind_rows(table_ok_vals, table_replaced_vals)
table
Output:
   session_id   datetime CaloriesDaily
1  1233815059 2016-05-01          5555
2  8583815123 2016-05-03          4444
3  8512315059 2016-05-04          2432
4  1503960366 2016-05-20             0
5  1583815059 2016-05-19          2343
6  8586545059 2016-05-20          1111
7  1290855005 2016-05-11          5425
8  1253242879 2016-04-25          1234
9  1111111111 2016-05-09          6542
10 8583815059 2016-05-12            NA
11 6290855005 2016-05-10            NA
12 8253242879 2016-04-30            NA
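If keeping the original row order matters, a different technique (not the join-based one above) is to build a composite key per row and flag matches with %in%. This base-R sketch uses a small hypothetical main table and lookup table; the CaloriesDaily values 3000 and 2800 are invented, and the main table is called main rather than table to avoid masking base::table().

```r
# Hypothetical main table and lookup table (keys stored as character)
main <- data.frame(
  session_id    = c("1233815059", "8583815059", "1503960366", "6290855005"),
  datetime      = c("2016-05-01", "2016-05-12", "2016-05-20", "2016-05-10"),
  CaloriesDaily = c(5555, 3000, 0, 2800),
  stringsAsFactors = FALSE
)
lookup <- data.frame(
  session_id = c("8583815059", "6290855005"),
  datetime   = c("2016-05-12", "2016-05-10"),
  stringsAsFactors = FALSE
)

# One composite key per row; "|" is assumed not to occur in the keys
make_key <- function(d) paste(d$session_id, d$datetime, sep = "|")

# Flag rows whose (session_id, datetime) pair appears in the lookup table
bad <- make_key(main) %in% make_key(lookup)

# Overwrite only the flagged rows; row order is unchanged
main$CaloriesDaily[bad] <- NA
print(main)
```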
Replace contents of factor column in R dataframe
I suspect the problem arises when you try to replace values with a new value that is not one of the factor's existing levels:
levels(iris$Species)
# [1] "setosa" "versicolor" "virginica"
Your example does not reproduce the problem; this works fine:
iris$Species[iris$Species == 'virginica'] <- 'setosa'
This is more likely what creates the problem you were seeing with your own data:
iris$Species[iris$Species == 'virginica'] <- 'new.species'
# Warning message:
# In `[<-.factor`(`*tmp*`, iris$Species == "virginica", value = c(1L, :
# invalid factor level, NAs generated
It works if you first add the new level to the factor:
levels(iris$Species) <- c(levels(iris$Species), "new.species")
iris$Species[iris$Species == 'virginica'] <- 'new.species'
If you want to rename a level everywhere (for example, replace "species A" with "species B"), you'd be better off changing the level itself:
levels(iris$Species)[match("oldspecies",levels(iris$Species))] <- "newspecies"
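A concrete instance of the level-renaming approach on a fresh copy of the built-in iris data. Renaming the level changes all 50 "virginica" values at once, creates no extra level, and produces no NAs.

```r
# Fresh copy of the Species factor from the datasets package
sp <- datasets::iris$Species

# Rename the level in place instead of assigning new values
levels(sp)[match("virginica", levels(sp))] <- "new.species"

print(levels(sp))
print(table(sp))
```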
How do I replace a cell's contents when it matches a specific string?
An easier way may be to keep the literal string "NA" out of the data in the first place. For example, you can call
library(readxl)
readxl::read_excel(path, na = "NA")
and it will convert every "NA" string to a missing value (NA). read_delim, read_csv and related readr functions have a similar na argument.
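Base R's readers offer the same idea through the na.strings argument of read.csv. The sketch below uses hypothetical inline CSV text and also treats 999 as missing, so the sentinel never reaches the data frame as a number.

```r
# Hypothetical CSV content, read from a string instead of a file
csv_text <- "id,score,comment
1,10,ok
2,NA,missing
3,999,sentinel"

# Every token listed in na.strings becomes NA while reading
df <- read.csv(text = csv_text, na.strings = c("NA", "999"))
print(df)
```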