Replacing character values with NA in a data frame
Use logical indexing:
df[ df == "foo" ] <- NA
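As a minimal sketch of how that one-liner behaves, here it is on a small made-up data frame (the column names and values are hypothetical, not from the question):

```r
# Hypothetical example data; stringsAsFactors = FALSE keeps the columns as character
df <- data.frame(
  a = c("foo", "bar", "baz"),
  b = c("qux", "foo", "foo"),
  stringsAsFactors = FALSE
)

# df == "foo" is a logical matrix; it selects every matching cell for assignment
df[df == "foo"] <- NA

# Count the NAs per column
colSums(is.na(df))
```

The same indexing works for any single value, as long as the columns can be compared with it.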
Replacing specific values with NA in a dataframe
Try this. The problem is caused by the date variable. Using dplyr you can do:
library(dplyr)
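# Convert every column to character so the comparison with '*' does not trip over the date column
new <- df %>% mutate(across(everything(), ~ as.character(.))) %>%
  replace(. == '*', NA) %>%
  # restore the date column afterwards
  mutate(time = as.Date(time))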
Output:
time x y
1 2021-01-08 1 <NA>
2 2021-01-09 2 2
3 2021-01-10 3 3
4 2021-01-11 <NA> 4
5 2021-01-12 <NA> 5
The base R way:
#Base R
df[-1][df[-1]=='*']<-NA
Output:
time x y
1 2021-01-08 1 <NA>
2 2021-01-09 2 2
3 2021-01-10 3 3
4 2021-01-11 <NA> 4
5 2021-01-12 <NA> 5
R - Replace specific value contents with NA
Since you're already using tidyverse functions, you can easily use na_if from dplyr within your pipes.
For example, I have a dataset where 999 is used to fill in a non-answer:
library(dplyr)
df <- tibble(
  alpha = c("a", "b", "c", "d", "e"),
  val1 = c(1, 999, 3, 8, 999),
  val2 = c(2, 8, 999, 1, 2))
If I wanted to change val1 so 999 is NA, I could do:
df %>%
  mutate(val1 = na_if(val1, 999))
In your case, it sounds like you want to replace a value across multiple variables, so using across for multiple columns would be more appropriate:
df %>%
  mutate(across(c(val1, val2), ~ na_if(.x, 999))) # or val1:val2
This replaces all instances of 999 in both val1 and val2 with NA, and the result now looks like this:
# A tibble: 5 x 3
  alpha  val1  val2
  <chr> <dbl> <dbl>
1 a        1.    2.
2 b       NA     8.
3 c        3.   NA
4 d        8.    1.
5 e       NA     2.
Replacing N/A in a data-frame with a value of choice
In fact, your problems are not due to a non-working replacement, but to the fact that zacko is a factor.
Regarding your first attempt: despite the warning, the attempt works correctly and replaces the NAs with "REPLACEMENT" (but see the explanation about factors below!). The new syntax is a little different: to use list instead of funs, you have to use a tilde like this:
exampleDF %>% mutate_if(is.character, list(~ ifelse(is.na(.), "REPLACEMENT", .)))
The other one also works... or rather, would work, if zacko were a character vector. Apparently (I don't know it for sure, because you chose not to use dput to give us your example data) exampleDF$zacko is a factor. If you try to assign a value to a factor and that value is not one of its levels, you get this warning:
> x <- factor(c("a", "b", "c"))
> x[1] <- "REPLACEMENT"
Warning message:
In `[<-.factor`(`*tmp*`, 1, value = "REPLACEMENT") :
invalid factor level, NA generated
> x
[1] <NA> b c
Levels: a b c
So you did replace it, but since it was a factor, and REPLACEMENT was not one of the levels, it has been replaced again by NA. Try this:
exampleDF$zacko <- as.character(exampleDF$zacko)
Your code should now work fine. Alternatively, if you want to keep it as a factor, add "REPLACEMENT" to the levels of zacko:
levels(exampleDF$zacko) <- c(levels(exampleDF$zacko), "REPLACEMENT")
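A minimal sketch of the factor route on a toy vector (hypothetical data, since the original exampleDF was not posted): the replacement label has to be registered as a level before it can be assigned.

```r
# Toy factor with a missing value (hypothetical data)
x <- factor(c("a", NA, "c"))

# Register the new label as a level before assigning it
levels(x) <- c(levels(x), "REPLACEMENT")
x[is.na(x)] <- "REPLACEMENT"

x
```

After this, x is still a factor and no longer contains NA.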
Note also that in R versions before 4.0.0, data.frame turns character vectors into factors by default:
> foo <- data.frame(zacko=letters[1:5])
> foo$zacko
[1] a b c d e
Levels: a b c d e
This is a very annoying and dangerous behavior. You don't want that! That is why many users of R set the following in their profiles:
options(stringsAsFactors=FALSE)
A tibble or data table does not behave like that:
> foo <- tibble(zacko=letters[1:5])
> foo$zacko
[1] "a" "b" "c" "d" "e"
Finally, in this simple case I would probably just use good old base R:
exampleDF$zacko[ is.na(exampleDF$zacko) ] <- "REPLACEMENT"
Replace a character ? with NA in R
As suggested by Allen Cameron, you can use as.numeric. I will simply show you how to apply that to the columns (since you said it was a large database).
Example data
# A tibble: 5 × 3
     id values values_2
  <int> <chr>  <chr>
1     1 78     50
2     2 ?      ?
3     3 64     ?
4     4 23     20
5     5 F      Random
df %>%
mutate(across(2:3, ~ as.numeric(.x)))
# A tibble: 5 × 3
id values values_2
<int> <dbl> <dbl>
1 1 78 50
2 2 NA NA
3 3 64 NA
4 4 23 20
5 5 NA NA
Rowwise mean() calculations, without the irrelevant id column
df %>%
mutate(across(2:3, ~ as.numeric(.x))) %>%
rowwise() %>%
mutate(mean = mean(c_across(2:3), na.rm = TRUE))
# A tibble: 5 × 4
# Rowwise:
id values values_2 mean
<int> <dbl> <dbl> <dbl>
1 1 78 50 64
2 2 NA NA NaN
3 3 64 NA 64
4 4 23 20 21.5
5 5 NA NA NaN
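Note that as.numeric() turns every string it cannot parse into NA (with an "NAs introduced by coercion" warning), so entries like F and Random vanish along with the ? placeholders. If that matters, a minimal base R sketch like the following could flag them first; the vector below is a hypothetical stand-in for one column:

```r
# Hypothetical character column mixing numbers, "?" placeholders and stray text
values <- c("78", "?", "64", "23", "F")

# Mark only the "?" placeholders as missing and leave everything else untouched
values[values == "?"] <- NA

# Entries that are neither missing nor numeric would be silently lost by as.numeric()
not_numeric <- !is.na(values) & is.na(suppressWarnings(as.numeric(values)))
values[not_numeric]
```

Here only "F" is flagged, which leaves room to decide what to do with it before converting the column.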
Replace whole value by NA if specific character is found
The na_if wouldn't take more than one element in y. We can create a logical vector in replace to replace the values with NA. For multiple columns, use across
library(dplyr)
data <- data %>%
mutate(across(c(name, eye_color),
~ replace(., . %in% c("Luke Skywalker", "unknown"), NA)))
For partial match, use a regex in str_detect or grepl
library(stringr)
data <- data %>%
mutate(across(c(name, eye_color),
~ replace(., str_detect(., "sky|unkn"), NA)))
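Regex matching is case-sensitive by default, so a lowercase pattern such as "sky" does not match "Skywalker". A minimal sketch of the grepl variant with case-insensitive matching, on a hypothetical vector standing in for one column:

```r
# Hypothetical character vector standing in for one column
name <- c("Luke Skywalker", "C-3PO", "unknown", "Leia Organa")

# grepl() returns TRUE where the pattern matches; ignore.case = TRUE also catches "Sky"
hit <- grepl("sky|unkn", name, ignore.case = TRUE)
name_clean <- replace(name, hit, NA)

name_clean
```

Inside across, the same call could be written as ~ replace(., grepl("sky|unkn", ., ignore.case = TRUE), NA).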
Find and replace values by NA for all columns in DataFrame
We can use dplyr to replace the 'NULL' values in all the columns and then convert the type of the columns with type.convert. Currently, all the columns are factor class (assuming that 'Age/Tenure' should be numeric/integer class)
library(dplyr)
res <- df %>%
  mutate(across(everything(),
                ~ type.convert(as.character(replace(.x, .x == 'NULL', NA)), as.is = FALSE)))
str(res)
#'data.frame': 7 obs. of 3 variables:
#$ Age : int 90 56 51 NA 67 NA 51
#$ Sex : Factor w/ 3 levels "Female","male",..: 3 1 NA 2 NA 1 3
#$ Tenure: int 2 NA 3 4 3 3 4
How do I change the NA string to actual NA for all columns in my Data?
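dplyr::na_if() should do the trick. It works on vectors, so apply it to every column with across():
library(dplyr)
df <- tibble(x = c('A', 'NA', 'C'),
             y = c('D', 'E', 'NA'),
             z = c('NA', 'NA', 'I'))
df %>%
  mutate(across(everything(), ~ na_if(.x, 'NA')))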
Conditionally replace values with NA in R
This is a case where the column is a factor. Convert it to character and it should work
library(dplyr)
have %>%
mutate(gender = as.character(gender),
gender = replace(gender, gender == "I Do Not Wish to Disclose", NA))
The values in gender change because the factor gets coerced to its underlying integer storage values:
as.integer(factor(c("Male", "Female", "Male")))
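To make the coercion concrete, here is a minimal sketch on the same toy vector, comparing the integer codes with the labels:

```r
gender <- factor(c("Male", "Female", "Male"))

# Levels are sorted alphabetically, so "Female" is code 1 and "Male" is code 2
as.integer(gender)
# [1] 2 1 2

# Converting to character first keeps the labels
as.character(gender)
# [1] "Male"   "Female" "Male"
```

This is why converting gender to character before calling replace keeps the original labels intact.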