R - Replace Specific Value Contents with NA

Since you're already using tidyverse functions, you can easily use na_if from dplyr within your pipes.

For example, I have a dataset where 999 is used to fill in a non-answer:

library(dplyr)

df <- tibble(
  alpha = c("a", "b", "c", "d", "e"),
  val1 = c(1, 999, 3, 8, 999),
  val2 = c(2, 8, 999, 1, 2)
)

If I wanted to change val1 so 999 is NA, I could do:

df %>%
  mutate(val1 = na_if(val1, 999))

In your case, it sounds like you want to replace a value across multiple variables, so using across for multiple columns would be more appropriate:

df %>%
  mutate(across(c(val1, val2), ~ na_if(.x, 999))) # or val1:val2

This replaces all instances of 999 in both val1 and val2 with NA; the result looks like this:

# A tibble: 5 x 3
  alpha  val1  val2
  <chr> <dbl> <dbl>
1 a        1.    2.
2 b       NA     8.
3 c        3.   NA
4 d        8.    1.
5 e       NA     2.
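
For comparison, the same replacement can be sketched in base R without dplyr. This is only an illustration on the example data above; the cols vector is a name introduced here and is not part of the original answer.

```r
# Same example data as above, built as a plain data frame
df <- data.frame(
  alpha = c("a", "b", "c", "d", "e"),
  val1 = c(1, 999, 3, 8, 999),
  val2 = c(2, 8, 999, 1, 2)
)

# Columns in which 999 marks a non-answer (hypothetical helper vector)
cols <- c("val1", "val2")

# Replace 999 with NA in each selected column; other columns are untouched
df[cols] <- lapply(df[cols], function(x) replace(x, x == 999, NA))

df
```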

How to replace certain values in a specific rows and columns with NA in R?

Since your data structure is two-dimensional, you can first find the indices of the rows containing a specific value and then use them to index the column you want to change.

which(DF$Fruits == "Pineapple")
# [1] 3
DF$Weight[which(DF$Fruits == "Pineapple")] <- NA

Be aware that which() returns a vector, so if several rows contain "Pineapple", the previous command returns all of their indices and sets Weight to NA in every one of them.
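
A small sketch of the multiple-match case, using made-up data (the values in DF below are hypothetical and only mirror the column names used above):

```r
# Hypothetical data with two "Pineapple" rows
DF <- data.frame(
  Fruits = c("Apple", "Pineapple", "Banana", "Pineapple"),
  Weight = c(150, 900, 120, 1100)
)

# which() returns every matching row index, not just the first
idx <- which(DF$Fruits == "Pineapple")
idx
# [1] 2 4

# Both matching rows get NA in Weight
DF$Weight[idx] <- NA
DF
```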

Replacing character values with NA in a data frame

This:

df[df == "foo"] <- NA
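
To see what that one-liner does, here is a minimal sketch on a made-up data frame (the column names and values are assumptions for illustration). The comparison produces a logical matrix, and every TRUE cell is set to NA, whichever column it is in.

```r
# Hypothetical data frame with "foo" scattered across character columns
df <- data.frame(
  x = c("foo", "bar", "baz"),
  y = c("qux", "foo", "foo"),
  n = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# df == "foo" is a logical matrix with one cell per data frame cell
df[df == "foo"] <- NA

df
```

The numeric column n contains no match, so it is left as it was.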

Replace a value NA with the value from another column in R

Perhaps the easiest to read/understand answer in R lexicon is to use ifelse. So borrowing Richard's dataframe we could do:

df <- structure(list(A = c(56L, NA, NA, 67L, NA),
                     B = c(75L, 45L, 77L, 41L, 65L),
                     Year = c(1921L, 1921L, 1922L, 1923L, 1923L)),
                .Names = c("A", "B", "Year"),
                class = "data.frame", row.names = c(NA, -5L))
df$A <- ifelse(is.na(df$A), df$B, df$A)
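
An alternative sketch uses logical indexing instead of ifelse. It is not the answer's method, just another way to get the same result; one reason to consider it is that ifelse drops attributes, which matters for columns such as dates or factors. The name missing_a is introduced here for illustration.

```r
# Same data as above, written with data.frame()
df <- data.frame(
  A = c(56L, NA, NA, 67L, NA),
  B = c(75L, 45L, 77L, 41L, 65L),
  Year = c(1921L, 1921L, 1922L, 1923L, 1923L)
)

# Rows where A is missing
missing_a <- is.na(df$A)

# Fill only those rows with the value from B on the same row
df$A[missing_a] <- df$B[missing_a]

df$A
# [1] 56 45 77 67 65
```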

Replacing specific values with NA using na_if

na_if works on vectors, not data frames, so your first attempt using mutate is the most correct one. It also replaces only values that match exactly.
However, your very large values are displayed with only 15 digits; I suspect there are many more. Therefore, no value matches your condition (y) exactly. This is a common problem when testing real (floating-point) values for exact equality.

Also note the two values you are trying to compare. Which is larger?

9.969210e+36
9.96920996838687e+36

You can do it quickly by:

df %>% mutate(
  IMD = ifelse(IMD > 9e36, NA, IMD),
  CRU = ifelse(CRU > 9e36, NA, CRU)
)

Or create a function:

na_when_larger <- function(x, y) {
x[x > y] <- NA
x
}

df %>% mutate_at(vars(IMD, CRU), na_when_larger, 9.96e+36)

(Try typing na_if into the console without parentheses to see how it is implemented.)
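
The exact-match problem described above can be checked directly. The sketch below uses the two values quoted in the answer; the vector x, the name fill, and the relative tolerance of 1e-6 are assumptions chosen for illustration, not something taken from the original data.

```r
# The fill value as typed, and a vector holding the value as displayed
fill <- 9.969210e+36
x <- c(1.5, 9.96920996838687e+36, 3.2)

# Exact comparison finds nothing, which is why na_if(x, fill) changes nothing
x == fill
# [1] FALSE FALSE FALSE

# Compare within a relative tolerance instead (1e-6 is an arbitrary choice)
is_fill <- abs(x - fill) <= 1e-6 * abs(fill)
x[is_fill] <- NA

x
# [1] 1.5  NA 3.2
```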

Replace a value in a data frame based on a conditional (`if`) statement

Easier to convert nm to characters and then make the change:

junk$nm <- as.character(junk$nm)
junk$nm[junk$nm == "B"] <- "b"

EDIT: If you do need to keep nm as a factor, add this at the end:

junk$nm <- as.factor(junk$nm)

Replacing values from a column using a condition in R

# reassign depth values under 10 to zero
df$depth[df$depth<10] <- 0

For columns that are factors, you can only assign values that are existing factor levels. If you want to assign a value that isn't currently a level, you need to create the additional level first:

levels(df$species) <- c(levels(df$species), "unknown") 
df$species[df$depth<10] <- "unknown"
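
Put together on a small made-up data frame (the depth and species values are hypothetical), the two steps might look like this. Computing the condition once, before depth is overwritten, keeps both assignments tied to the same rows.

```r
# Hypothetical data: depth is numeric, species is a factor
df <- data.frame(
  depth = c(5, 12, 8, 30),
  species = factor(c("cod", "haddock", "cod", "pollock"))
)

# Evaluate the condition once so both updates hit the same rows
shallow <- df$depth < 10

# Add the new level before assigning it, otherwise NAs are generated
levels(df$species) <- c(levels(df$species), "unknown")
df$species[shallow] <- "unknown"

# Reassign depth values under 10 to zero
df$depth[shallow] <- 0

df
```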

Override bad/wrong values in a main table with NA or null values listed on another lookup table in R

Here’s a solution using dplyr::semi_join() and dplyr::anti_join() to split your dataframe based on whether the id and date keys match your lookup table. I then assign NAs in just the subset with matching keys, then row-bind the subsets back together. Note that this solution doesn’t preserve the original row order.

library(dplyr)

table_ok_vals <- table %>%
  anti_join(lookup, by = c("session_id", "datetime"))

table_replaced_vals <- table %>%
  semi_join(lookup, by = c("session_id", "datetime")) %>%
  mutate(CaloriesDaily = NA_real_)

table <- bind_rows(table_ok_vals, table_replaced_vals)

table

Output:

   session_id   datetime CaloriesDaily
1  1233815059 2016-05-01          5555
2  8583815123 2016-05-03          4444
3  8512315059 2016-05-04          2432
4  1503960366 2016-05-20             0
5  1583815059 2016-05-19          2343
6  8586545059 2016-05-20          1111
7  1290855005 2016-05-11          5425
8  1253242879 2016-04-25          1234
9  1111111111 2016-05-09          6542
10 8583815059 2016-05-12            NA
11 6290855005 2016-05-10            NA
12 8253242879 2016-04-30            NA
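
If the original row order matters, one possible alternative is to flag the matching rows in place rather than splitting and re-binding. The sketch below is base R, not the dplyr approach above; the data, the name table_df, and the key() helper are all made up for illustration.

```r
# Hypothetical main table and lookup table sharing two key columns
table_df <- data.frame(
  session_id = c(1, 2, 3, 4),
  datetime = as.Date(c("2016-05-01", "2016-05-03", "2016-05-04", "2016-05-20")),
  CaloriesDaily = c(5555, 4444, 2432, 0)
)
lookup <- data.frame(
  session_id = c(2, 4),
  datetime = as.Date(c("2016-05-03", "2016-05-20"))
)

# Build one composite key per row from both key columns (hypothetical helper)
key <- function(d) paste(d$session_id, d$datetime, sep = "|")

# Rows of the main table whose key also appears in the lookup table
bad <- key(table_df) %in% key(lookup)

# Overwrite only those rows; row order is unchanged
table_df$CaloriesDaily[bad] <- NA_real_

table_df
```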

Replace contents of factor column in R dataframe

I bet the problem is when you are trying to replace values with a new one, one that is not currently part of the existing factor's levels:

levels(iris$Species)
# [1] "setosa" "versicolor" "virginica"

Your example was bad; this works:

iris$Species[iris$Species == 'virginica'] <- 'setosa'

This is what more likely creates the problem you were seeing with your own data:

iris$Species[iris$Species == 'virginica'] <- 'new.species'
# Warning message:
# In `[<-.factor`(`*tmp*`, iris$Species == "virginica", value = c(1L, :
# invalid factor level, NAs generated

It will work if you first increase your factor levels:

levels(iris$Species) <- c(levels(iris$Species), "new.species")
iris$Species[iris$Species == 'virginica'] <- 'new.species'

If you want to replace "species A" with "species B" you'd be better off with

levels(iris$Species)[match("oldspecies",levels(iris$Species))] <- "newspecies"
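
As a concrete sketch of that last line, here it is applied to a copy of the Species column with a level that actually exists. The copy named species is introduced here so the built-in iris data is not modified.

```r
# Work on a copy of the factor
species <- iris$Species

# Rename the level itself; every row with that level changes at once
levels(species)[match("virginica", levels(species))] <- "new.species"

levels(species)
# [1] "setosa"      "versicolor"  "new.species"

table(species)
```

Because only the level label changes, no NAs are generated and no extra level has to be added first.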

How do I replace a cell's contents when it matches a specific string?

An easier way may be to not let "NA" into the data in the first place. E.g., you can call

library(readxl)
readxl::read_excel(path, na = "NA")

and it will convert every "NA" string to a real NA. read_delim, read_csv, and related functions have a similar na argument.
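
The same idea exists in base R, where read.csv takes an na.strings argument. The sketch below is a base R counterpart rather than the readxl call above, and it reads from an inline string (csv_text is made up) so it runs without a file; treating "999" as missing is likewise an assumption for illustration.

```r
# Hypothetical CSV content, supplied inline instead of from a file
csv_text <- "id,score\n1,10\n2,NA\n3,999\n"

# Treat both "NA" and "999" as missing while reading
dat <- read.csv(text = csv_text, na.strings = c("NA", "999"))

dat
```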


