Replace All NA with FALSE in Selected Columns in R

Replace all NA with FALSE in selected columns in R

If you want to do the replacement for a subset of variables, you can still use the is.na(*) indexing-and-assignment trick, as follows:

df[c("x1", "x2")][is.na(df[c("x1", "x2")])] <- FALSE

IMO using temporary variables makes the logic easier to follow:

vars.to.replace <- c("x1", "x2")
df2 <- df[vars.to.replace]
df2[is.na(df2)] <- FALSE
df[vars.to.replace] <- df2
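
As a quick sanity check, here is a self-contained sketch with a made-up data frame (x1, x2, x3 are logical columns invented for illustration); only the selected columns are touched, and the NA in x3 stays as-is:

df <- data.frame(x1 = c(TRUE, NA, FALSE),
                 x2 = c(NA, NA, TRUE),
                 x3 = c(NA, TRUE, FALSE))

vars.to.replace <- c("x1", "x2")
df2 <- df[vars.to.replace]
df2[is.na(df2)] <- FALSE
df[vars.to.replace] <- df2

df
#      x1    x2    x3
# 1  TRUE FALSE    NA
# 2 FALSE FALSE  TRUE
# 3 FALSE  TRUE FALSE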

How to replace NA values in a table for selected columns

You can do:

x[, 1:2][is.na(x[, 1:2])] <- 0

or better (IMHO), use the variable names:

x[c("a", "b")][is.na(x[c("a", "b")])] <- 0

In both cases, 1:2 or c("a", "b") can be replaced by a pre-defined vector.
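
For instance, with a pre-defined vector of column names (cols is just an arbitrary name used here for illustration):

cols <- c("a", "b")
x[cols][is.na(x[cols])] <- 0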

Replace only some NA values for selected rows and for only a column in R

df$type[!df$Asked & is.na(df$type)] <- "Replies" gets you to your desired table:

> type <-
+ c(NA, rep("Question",3), NA, NA, rep("Answer",4), rep(NA, 3), rep("Answer",2),
+ NA, "Question", NA, rep("Answer",2), NA,NA)
> Asked <- c(
+ T, rep(F, 9), T, rep(F, 4), T, rep(F, 4), T,F
+ )
> df <- data.frame(title = 1:22, comments = 1:22, type, Asked)
> df$type[!df$Asked & is.na(df$type)] <- "Replies"
> df
   title comments     type Asked
1      1        1     <NA>  TRUE
2      2        2 Question FALSE
3      3        3 Question FALSE
4      4        4 Question FALSE
5      5        5  Replies FALSE
6      6        6  Replies FALSE
7      7        7   Answer FALSE
8      8        8   Answer FALSE
9      9        9   Answer FALSE
10    10       10   Answer FALSE
11    11       11     <NA>  TRUE
12    12       12  Replies FALSE
13    13       13  Replies FALSE
14    14       14   Answer FALSE
15    15       15   Answer FALSE
16    16       16     <NA>  TRUE
17    17       17 Question FALSE
18    18       18  Replies FALSE
19    19       19   Answer FALSE
20    20       20   Answer FALSE
21    21       21     <NA>  TRUE
22    22       22  Replies FALSE
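
For completeness, a rough dplyr equivalent of the same conditional replacement (a sketch, assuming type is a character column, as it is with the R >= 4.0 stringsAsFactors default):

library(dplyr)

df <- df %>%
  mutate(type = if_else(!Asked & is.na(type), "Replies", type))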

R Replace NA for all Columns Except *

You can use mutate_at:

library(dplyr)

Remove them by name

df %>% mutate_at(vars(-c(Date, thatCol)), ~replace(., is.na(.), 0))

Remove them by position

df %>% mutate_at(-c(1,4), ~replace(., is.na(.), 0))

Select them by name

df %>% mutate_at(vars(col1, thisCol, col999), ~replace(., is.na(.), 0))

Select them by position

df %>% mutate_at(c(2, 3, 5), ~replace(., is.na(.), 0))

If you want to use replace_na

df %>% mutate_at(vars(-c(Date, thatCol)), tidyr::replace_na, 0)

Note that mutate_at is superseded by across() as of dplyr 1.0.0.
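
A rough across() equivalent of the first call above (same hypothetical Date and thatCol columns) could look like this under dplyr >= 1.0.0:

df %>% mutate(across(-c(Date, thatCol), ~ replace(.x, is.na(.x), 0)))

# or, with tidyr:
df %>% mutate(across(-c(Date, thatCol), ~ tidyr::replace_na(.x, 0)))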

How to process NA as FALSE in R

For me, the most useful approach is dplyr's case_when function, explicitly stating how the NA cases you mention should be handled.

Replicating your example (notice that I'm explicitly setting the NAs here; your NAs were the result of R not being able to handle a character string ("NA") within a numeric vector):

col1 = as.numeric(c(10, 2, 15, 2, NA_real_, 15))
col2 = as.numeric(c(15, 15, 2, 2, 15, NA_real_))
test <- data.frame(col1, col2)

For both the mutate function and the case_when function I'm loading dplyr. If you're not familiar with case_when, it's like an ifelse with multiple conditionals. Each conditional is followed by a tilde ("~"); what comes after the tilde is what gets assigned when the conditional is met. To set "everything else" to some value x, you write TRUE ~ "x", since that evaluates as true for all the cases that have not been matched by the previous conditionals.

This should do what you want:

library(dplyr)

test <- mutate(.data = test,
               G5 = case_when(col1 > 5 & col2 > 5 ~ "Yes",  # Original
                              (is.na(col1) & col2 > 5) | (col1 > 5 & is.na(col2)) ~ "Yes",
                              TRUE ~ "No"))  # Everything else gets the value "No"

test
#>   col1 col2  G5
#> 1   10   15 Yes
#> 2    2   15  No
#> 3   15    2  No
#> 4    2    2  No
#> 5   NA   15 Yes
#> 6   15   NA Yes
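
As an aside: if, unlike the desired output above, you really do want NA comparisons to count as FALSE, a few vectorised idioms (a sketch on an arbitrary logical vector) are:

cond <- c(TRUE, NA, FALSE)

!is.na(cond) & cond              # TRUE FALSE FALSE
cond %in% TRUE                   # same result; %in% never returns NA
tidyr::replace_na(cond, FALSE)   # explicit replacement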

Replace NA with interpolated value for specific column fields in R

This behaviour is documented in ?na.approx:

An object of similar structure as object with NAs replaced by interpolation. For na.approx only the internal NAs are replaced and leading or trailing NAs are omitted if na.rm = TRUE or not replaced if na.rm = FALSE.

By default, na.approx uses na.rm = TRUE:

na.approx(object, x = index(object), xout, ..., na.rm = TRUE, maxgap = Inf, along)


Thus, we can change the code to

my_data[, 42] <- na.approx(my_data[, 42], na.rm = FALSE)

In a large dataset it is possible to have leading/trailing NAs, and with the OP's code the output vector ends up with fewer elements than the column being replaced (because na.rm = TRUE drops them), which triggers the length-mismatch error in the replacement.
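
A minimal sketch of the difference, using zoo and a made-up vector with leading and trailing NAs:

library(zoo)

x <- c(NA, 1, NA, 3, NA)
na.approx(x)                 # default na.rm = TRUE: 1 2 3 (shorter, hence the error on assignment)
na.approx(x, na.rm = FALSE)  # NA 1 2 3 NA (same length, safe to assign back)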

Replacing values of selected columns based on another dataframe with different size

Data:

dfa <- read.table(text="Accession Column1 Column2 Column3 Root ID
2000_1 0 0.2 14 2000 1
2000_2 0.01 0.2 17 2000 2
2001_1 0.012 0.22 11 2001 1
2001_2 0.011 0.231 17 2001 2", header = T)

Libraries and Functions:

library(tidyverse)

cv <- function(x) 100 * (sd(x) / mean(x))

Solution:

If we cut to the chase and consider the end result, basically you want to replace the values in Column1:Column3 with NA if CV is greater than 30. Otherwise, you want to preserve the original values. The code below does that.

dfa %>%
  group_by(Root) %>%
  mutate_at(vars(Column1:Column3),
            list(~ if (cv(.) > 30) NA else .))

Result:

#> # A tibble: 4 x 6
#>   Accession Column1 Column2 Column3  Root    ID
#>   <fct>       <dbl>   <dbl>   <dbl> <int> <int>
#> 1 2000_1     NA       0.2        14  2000     1
#> 2 2000_2     NA       0.2        17  2000     2
#> 3 2001_1      0.012   0.22       NA  2001     1
#> 4 2001_2      0.011   0.231      NA  2001     2
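
Since mutate_at is superseded in newer dplyr versions, a sketch of the same idea with across() (dplyr >= 1.0.0) could be:

dfa %>%
  group_by(Root) %>%
  mutate(across(Column1:Column3,
                ~ if (cv(.x) > 30) NA_real_ else .x)) %>%
  ungroup()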

More complicated approaches:

If we want to follow your train of thought, we'll end up with more complicated code, which is illustrated below:

dfa %>%
  select_if(function(col) is.numeric(col) & all(col != .$ID)) %>%
  group_by(Root) %>%
  summarise_each(list(cv)) %>%
  mutate_at(vars(Column1:Column3),
            list(~ ifelse(. > 30, NA, 0))) %>%
  left_join(dfa[, c("Root", "ID")], ., by = "Root") %>%
  bind_rows(dfa, .) %>%
  group_by(Root, ID) %>%
  summarise_each(list(~ if (is.numeric(.)) sum(., na.rm = FALSE) else first(.))) %>%
  ungroup %>%
  select(-ID, -Root, everything())

Explanation:

  1. Selecting numeric columns except ID.
  2. Grouping by Root.
  3. Calculating the CV for all the columns.
  4. Replacing CV values greater than 30 with NA and the rest with 0. I plan to sum these with the original values, since OP wants to preserve the NAs (i.e. CV greater than 30) from this CV matrix but keep the other values in the original dataset unchanged. Summing with 0 keeps the latter unchanged, while the NAs (with na.rm = FALSE) propagate and overwrite the values.
  5. Adding the ID column back by joining, so the CV matrix has the same number of rows as the original dataset; it will also be used for grouping later.
  6. Binding the datasets by rows.
  7. Grouping by Root and ID.
  8. Summarizing the numeric columns (i.e. Column1, Column2, etc.) by summing the values from the original data frame and the modified CV matrix, and keeping the first value of the other columns (since the original data frame came first in bind_rows, this preserves the original values).
  9. Ungrouping to avoid future conflicts.
  10. Rearranging columns in the order OP presented.

Another solution would be very similar to the one above, but instead of joining to get the ID column and expand the CV matrix, one can preserve them from the beginning by summarizing them as list columns and unnesting them later.

dfa %>%
  mutate(ID = as.factor(ID)) %>%
  group_by(Root) %>%
  summarise_each(list(~ if (is.numeric(.)) cv(.) else list(.))) %>%
  mutate_at(vars(Column1:Column3),
            list(~ ifelse(. > 30, NA, 0))) %>%
  unnest(cols = c(Accession, ID)) %>%
  mutate(ID = as.integer(ID)) %>%
  bind_rows(dfa, .) %>%
  group_by(Root, ID) %>%
  summarise_each(list(~ if (is.numeric(.)) sum(., na.rm = FALSE) else first(.))) %>%
  ungroup %>%
  select(-ID, -Root, everything())

