Replacing Character Values With NA in a Data Frame

Replacing character values with NA in a data frame

Compare the whole data frame against the value and assign NA through the resulting logical matrix:

df[ df == "foo" ] <- NA
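A minimal self-contained sketch of this idiom, using a small hypothetical data frame (the column names a and b and all values are made up for illustration). `df == "foo"` yields a logical matrix with the same shape as df, and assigning through it sets every matching cell to NA:

```r
# Hypothetical example data; stringsAsFactors = FALSE keeps the columns
# as character even on R versions older than 4.0.0
df <- data.frame(a = c("foo", "bar", "baz"),
                 b = c("qux", "foo", "foo"),
                 stringsAsFactors = FALSE)

# df == "foo" is a 3 x 2 logical matrix; each TRUE cell is set to NA
df[df == "foo"] <- NA

print(df)
```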

Replacing specific values with NA in a dataframe

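The issue is caused by the Date column: comparing a Date with '*' makes R try to convert '*' to a date, which fails. With dplyr you can convert every column to character, replace the '*' values, and then convert time back to a Date: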

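library(dplyr)

new <- df %>%
  # turn every column, including the Date, into character so '*' can be compared
  mutate(across(everything(), as.character)) %>%
  # magrittr supplies df as the first argument: replace(df, df == '*', NA)
  replace(. == '*', NA) %>%
  # restore the Date class of the time column
  mutate(time = as.Date(time))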

Output:

        time    x    y
1 2021-01-08    1 <NA>
2 2021-01-09    2    2
3 2021-01-10    3    3
4 2021-01-11 <NA>    4
5 2021-01-12 <NA>    5

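The base R way is to leave the Date column (column 1) out of the comparison:

# Base R: compare and replace only in columns 2 onwards
df[-1][df[-1] == '*'] <- NA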

Output:

        time    x    y
1 2021-01-08    1 <NA>
2 2021-01-09    2    2
3 2021-01-10    3    3
4 2021-01-11 <NA>    4
5 2021-01-12 <NA>    5
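A variant of the base R approach that selects the columns by name instead of by position, so it keeps working if the Date column is not first. The data below is hypothetical, shaped like the output above, and `cols` is an illustrative name:

```r
# Hypothetical data: a Date column plus character columns that use "*"
# as a missing-value marker
df <- data.frame(time = as.Date("2021-01-08") + 0:4,
                 x = c("1", "2", "3", "*", "*"),
                 y = c("*", "2", "3", "4", "5"),
                 stringsAsFactors = FALSE)

# Columns to clean, chosen by name; the Date column is never compared
cols <- c("x", "y")
df[cols][df[cols] == "*"] <- NA

str(df)
```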

R - Replace specific value contents with NA

Since you're already using tidyverse functions, you can easily use na_if from dplyr within your pipes.

For example, I have a dataset where 999 is used to fill in a non-answer:

library(dplyr)

df <- tibble(
  alpha = c("a", "b", "c", "d", "e"),
  val1 = c(1, 999, 3, 8, 999),
  val2 = c(2, 8, 999, 1, 2))

If I wanted to change val1 so 999 is NA, I could do:

df %>% 
mutate(val1 = na_if(val1, 999))

In your case, it sounds like you want to replace a value across multiple variables, so using across for multiple columns would be more appropriate:

df %>%
  mutate(across(c(val1, val2), ~ na_if(.x, 999))) # or val1:val2

This replaces all instances of 999 in both val1 and val2 with NA, so the data now looks like this:

# A tibble: 5 x 3
  alpha  val1  val2
  <chr> <dbl> <dbl>
1 a         1     2
2 b        NA     8
3 c         3    NA
4 d         8     1
5 e        NA     2
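For comparison, a base R sketch that gives the same result on the example data above, without dplyr. It loops over the chosen columns with lapply() and uses replace(); `cols` is an illustrative name:

```r
# Same example data as above, as a base R data frame
df <- data.frame(alpha = c("a", "b", "c", "d", "e"),
                 val1 = c(1, 999, 3, 8, 999),
                 val2 = c(2, 8, 999, 1, 2))

cols <- c("val1", "val2")
# lapply() visits each selected column; replace() swaps 999 for NA
df[cols] <- lapply(df[cols], function(v) replace(v, v == 999, NA))

print(df)
```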

Replacing N/A in a data-frame with a value of choice

In fact, your problems are not caused by the replacement failing, but by the fact that zacko is a factor.

Regarding your first attempt: despite the warning, it works correctly and replaces the NAs with "REPLACEMENT" (but see the explanation about factors below!). The newer syntax is a little different: to use list instead of funs, you need a tilde formula like this:

exampleDF %>% mutate_if(is.character, list(~ ifelse(is.na(.), "REPLACEMENT", .)))
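In current dplyr, mutate_if() is superseded by across(where(is.character), ...). The same idea also works in plain base R; this is a sketch on a hypothetical stand-in for exampleDF (the question's data was not shared, so the columns zacko and n here are made up):

```r
# Hypothetical stand-in for exampleDF
exampleDF <- data.frame(zacko = c("x", NA, "y"),
                        n = c(1, NA, 3),
                        stringsAsFactors = FALSE)

# Find the character columns, then rewrite only those
is_chr <- vapply(exampleDF, is.character, logical(1))
exampleDF[is_chr] <- lapply(exampleDF[is_chr], function(v) {
  ifelse(is.na(v), "REPLACEMENT", v)
})

# The numeric column n is left untouched, so its NA remains
print(exampleDF)
```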

The other one also works... or rather, would work if zacko were a character vector. Apparently (I can't know for sure, because you chose not to use dput to share your example data) exampleDF$zacko is a factor. If you try to assign a value to a factor and that value is not one of its levels, you get this warning:

> x <- factor(c("a", "b", "c"))
> x[1] <- "REPLACEMENT"
Warning message:
In `[<-.factor`(`*tmp*`, 1, value = "REPLACEMENT") :
invalid factor level, NA generated
> x
[1] <NA> b c
Levels: a b c

So you did replace it, but because zacko was a factor and REPLACEMENT was not one of its levels, the value was turned into NA again. Try this:

exampleDF$zacko <- as.character(exampleDF$zacko)

Your code should now work fine. Alternatively, if you want to keep it as a factor, add "FRUSTRATION" to the levels of zacko:

levels(exampleDF$zacko) <- c(levels(exampleDF$zacko), "FRUSTRATION")
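The level-extension approach can be checked on a tiny factor. This sketch uses a hypothetical vector x and the label "REPLACEMENT" from the warning example above; once the label is a valid level, the assignment no longer produces NA:

```r
x <- factor(c("a", "b", NA))

# Add the new label to the levels first, so assigning it is valid
levels(x) <- c(levels(x), "REPLACEMENT")
x[is.na(x)] <- "REPLACEMENT"

print(x)
```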

Note also that in R versions before 4.0.0, data.frame() turns character vectors into factors by default:

> foo <- data.frame(zacko=letters[1:5])
> foo$zacko
[1] a b c d e
Levels: a b c d e

This behavior is annoying and dangerous. You don't want that! That is why many R users set the following in their profiles:

options(stringsAsFactors=FALSE)

Tibbles and data.tables do not behave like that:

> foo <- tibble(zacko=letters[1:5])
> foo$zacko
[1] "a" "b" "c" "d" "e"
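Instead of relying on a global option, stringsAsFactors can also be passed per call; a minimal sketch that behaves the same on R versions before and after 4.0.0:

```r
# Explicit argument: character columns stay character on any R version
foo <- data.frame(zacko = letters[1:5], stringsAsFactors = FALSE)
str(foo)
```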

Finally, in this simple case I would probably just use good old base R:

exampleDF$zacko[ is.na(exampleDF$zacko) ] <- "REPLACEMENT"

Replace a character ? with NA in R

As suggested by Allen Cameron, you can use as.numeric(), which turns any value that cannot be parsed as a number into NA. I will simply show you how to apply it to several columns at once (since you said it was a large database).

Example data

# A tibble: 5 × 3
     id values values_2
  <int> <chr>  <chr>
1     1 78     50
2     2 �      �
3     3 64     �
4     4 23     20
5     5 F      Random

df %>%
mutate(across(2:3, ~ as.numeric(.x)))

# A tibble: 5 × 3
     id values values_2
  <int>  <dbl>    <dbl>
1     1     78       50
2     2     NA       NA
3     3     64       NA
4     4     23       20
5     5     NA       NA
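A base R sketch of the same conversion, on hypothetical data shaped like the example above ("\uFFFD" is the Unicode replacement character shown as �). as.numeric() warns "NAs introduced by coercion" when it meets non-numeric strings; here that warning is deliberately silenced because the NAs are the goal:

```r
# Hypothetical data shaped like the example above
df <- data.frame(id = 1:5,
                 values = c("78", "\uFFFD", "64", "23", "F"),
                 values_2 = c("50", "\uFFFD", "\uFFFD", "20", "Random"),
                 stringsAsFactors = FALSE)

# Convert columns 2 and 3; anything non-numeric becomes NA
df[2:3] <- lapply(df[2:3], function(v) suppressWarnings(as.numeric(v)))

str(df)
```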

Row-wise mean() calculation, without the irrelevant id column:

df %>% 
mutate(across(2:3, ~ as.numeric(.x))) %>%
rowwise() %>%
mutate(mean = mean(c_across(2:3), na.rm = TRUE))

# A tibble: 5 × 4
# Rowwise:
     id values values_2  mean
  <int>  <dbl>    <dbl> <dbl>
1     1     78       50  64
2     2     NA       NA NaN
3     3     64       NA  64
4     4     23       20  21.5
5     5     NA       NA NaN
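For a plain mean across a fixed set of columns, base R's rowMeans() avoids rowwise() entirely. A sketch on the numeric version of the example data; as in the output above, rows where every value is NA give NaN:

```r
# Numeric version of the example data (after the as.numeric() step)
df <- data.frame(id = 1:5,
                 values = c(78, NA, 64, 23, NA),
                 values_2 = c(50, NA, NA, 20, NA))

# Vectorised over all rows at once; NA values are dropped per row,
# and an all-NA row yields 0/0, i.e. NaN
df$mean <- rowMeans(df[2:3], na.rm = TRUE)

print(df)
```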

Replace whole value by NA if specific character is found

na_if() won't take more than one value in y. Instead, we can build a logical vector with %in% and use replace() to set the matching values to NA. For multiple columns, use across():

library(dplyr)
data <- data %>%
mutate(across(c(name, eye_color),
~ replace(., . %in% c("Luke Skywalker", "unknown"), NA)))
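The same exact-match replacement in base R, sketched on a small hypothetical data frame (the rows are made up; only the column names name and eye_color come from the answer). `targets` and `cols` are illustrative names:

```r
# Hypothetical data with the column names used above
data <- data.frame(name = c("Luke Skywalker", "C-3PO", "Darth Vader"),
                   eye_color = c("blue", "unknown", "yellow"),
                   stringsAsFactors = FALSE)

targets <- c("Luke Skywalker", "unknown")
cols <- c("name", "eye_color")
# %in% checks each value against several targets at once
data[cols] <- lapply(data[cols], function(v) replace(v, v %in% targets, NA))

print(data)
```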

For partial matches, use a regular expression in str_detect() or grepl() (both are case-sensitive by default):

library(stringr)
data <- data %>%
mutate(across(c(name, eye_color),
~ replace(., str_detect(., "sky|unkn"), NA)))
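A base R sketch of the partial-match version with grepl(), on the same kind of hypothetical data. Because matching is case-sensitive by default, "sky" would not match "Skywalker"; this sketch passes ignore.case = TRUE so that it does:

```r
# Hypothetical data with the column names used above
data <- data.frame(name = c("Luke Skywalker", "C-3PO", "Darth Vader"),
                   eye_color = c("blue", "unknown", "yellow"),
                   stringsAsFactors = FALSE)

cols <- c("name", "eye_color")
# grepl() is TRUE wherever the regex matches anywhere in the string;
# ignore.case = TRUE lets "sky" match "Skywalker"
data[cols] <- lapply(data[cols], function(v) {
  replace(v, grepl("sky|unkn", v, ignore.case = TRUE), NA)
})

print(data)
```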

Find and replace values by NA for all columns in DataFrame

We can use dplyr to replace the 'NULL' values in all the columns and then convert the type of each column with type.convert(). Currently, all the columns are of class factor (assuming that Age and Tenure should be numeric/integer):

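library(dplyr)
res <- df %>%
  # funs() is deprecated; a ~ lambda inside across() does the same job.
  # as.is = FALSE keeps text columns as factors, as in the output below
  mutate(across(everything(),
                ~ type.convert(as.character(replace(.x, .x == 'NULL', NA)), as.is = FALSE)))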
str(res)
#'data.frame': 7 obs. of 3 variables:
#$ Age : int 90 56 51 NA 67 NA 51
#$ Sex : Factor w/ 3 levels "Female","male",..: 3 1 NA 2 NA 1 3
#$ Tenure: int 2 NA 3 4 3 3 4
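A base R sketch of the same clean-up, on hypothetical data (the values are made up; only the 'NULL' placeholder and the factor columns mirror the question). Here as.is = TRUE is chosen so that text columns come back as character rather than factor:

```r
# Hypothetical data: every column is a factor and "NULL" marks missing entries
df <- data.frame(Age = c("90", "NULL", "51"),
                 Sex = c("male", "Female", "NULL"),
                 stringsAsFactors = TRUE)

df[] <- lapply(df, function(v) {
  v <- as.character(v)           # drop the factor encoding first
  v[v == "NULL"] <- NA           # mark the placeholder as missing
  type.convert(v, as.is = TRUE)  # digits -> integer, text stays character
})

str(df)
```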

How do I change the NA string to actual NA for all columns in my Data?

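dplyr::na_if() should do the trick. It works on vectors, so apply it to every column with across():

library(dplyr)

df <- tibble(x = c('A', 'NA', 'C'),
             y = c('D', 'E', 'NA'),
             z = c('NA', 'NA', 'I'))

# older dplyr releases also accepted the data frame directly: na_if(df, 'NA')
df %>% mutate(across(everything(), ~ na_if(.x, 'NA')))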
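In base R, the replacement function `is.na<-` does the same thing in one line: its default method assigns NA to whatever cells the logical value selects. A sketch on the same example values, built as a plain data frame:

```r
df <- data.frame(x = c("A", "NA", "C"),
                 y = c("D", "E", "NA"),
                 z = c("NA", "NA", "I"),
                 stringsAsFactors = FALSE)

# df == "NA" is a logical matrix; is.na<- sets those cells to real NA
is.na(df) <- df == "NA"

print(df)
```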

Conditionally replace values with NA in R

This happens because the column is a factor. Convert it to character first and it should work:

library(dplyr)
have %>%
mutate(gender = as.character(gender),
gender = replace(gender, gender == "I Do Not Wish to Disclose", NA))

The values in gender change because the factor gets coerced to the integer codes it stores internally:

as.integer(factor(c("Male", "Female", "Male")))
#[1] 2 1 2
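A small base R sketch contrasting the stored integer codes with the character route. The vector is hypothetical and reuses the "I Do Not Wish to Disclose" label from the answer; levels are sorted alphabetically, so Female is 1, "I Do Not Wish to Disclose" is 2 and Male is 3:

```r
gender <- factor(c("Male", "I Do Not Wish to Disclose", "Female"))

# as.integer() exposes the codes the factor stores internally
codes <- as.integer(gender)

# Converting to character first keeps the labels; replace() then sets NA
gender_chr <- replace(as.character(gender),
                      gender == "I Do Not Wish to Disclose", NA)

print(codes)
print(gender_chr)
```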

