Replace all NA with FALSE in selected columns in R
If you want to do the replacement for a subset of variables, you can still use the [is.na(*)] <-
indexing trick, as follows:
df[c("x1", "x2")][is.na(df[c("x1", "x2")])] <- FALSE
IMO using temporary variables makes the logic easier to follow:
vars.to.replace <- c("x1", "x2")
df2 <- df[vars.to.replace]
df2[is.na(df2)] <- FALSE
df[vars.to.replace] <- df2
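A minimal, self-contained sketch of the same idea. The data frame and its columns x1, x2, x3 are made up for illustration; the lapply() variant is just an alternative spelling, not part of the original answer:

```r
# Toy data (hypothetical): three logical columns containing NAs
df <- data.frame(
  x1 = c(TRUE, NA, FALSE),
  x2 = c(NA, NA, TRUE),
  x3 = c(NA, TRUE, NA)
)
vars.to.replace <- c("x1", "x2")

# Variant 1: index the selected columns with a logical matrix
res1 <- df
res1[vars.to.replace][is.na(res1[vars.to.replace])] <- FALSE

# Variant 2: replace column by column with lapply()
res2 <- df
res2[vars.to.replace] <- lapply(res2[vars.to.replace],
                                function(v) replace(v, is.na(v), FALSE))

# x1 and x2 no longer contain NA; x3 is left untouched
print(res1)
```

Both variants only touch the columns named in vars.to.replace, so NAs elsewhere in the data frame survive.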
How to replace NA values in a table for selected columns
You can do:
x[, 1:2][is.na(x[, 1:2])] <- 0
or better (IMHO), use the variable names:
x[c("a", "b")][is.na(x[c("a", "b")])] <- 0
In both cases, 1:2 or c("a", "b") can be replaced by a pre-defined vector.
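One way to use a pre-defined vector is to wrap the replacement in a small helper. The function name replace_na_in and the toy data below are hypothetical, chosen only to illustrate that positions and names behave the same way:

```r
# Hypothetical helper: replace NA by `value` in the columns given by `cols`
# (`cols` may hold positions or names)
replace_na_in <- function(x, cols, value = 0) {
  sub <- x[cols]
  sub[is.na(sub)] <- value
  x[cols] <- sub
  x
}

# Toy data (made up for this sketch)
x <- data.frame(a = c(1, NA), b = c(NA, 2), c = c(NA, 3))

by_pos  <- replace_na_in(x, 1:2)          # columns chosen by position
by_name <- replace_na_in(x, c("a", "b"))  # columns chosen by name
print(by_name)
```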
Replace only some NA values for selected rows and for only a column in R
df$type[!df$Asked & is.na(df$type)] <- "Replies"
gets you to your desired table:
> type <-
+ c(NA, rep("Question",3), NA, NA, rep("Answer",4), rep(NA, 3), rep("Answer",2),
+ NA, "Question", NA, rep("Answer",2), NA,NA)
> Asked <- c(
+ T, rep(F, 9), T, rep(F, 4), T, rep(F, 4), T,F
+ )
> df <- data.frame(title = 1:22, comments = 1:22, type, Asked)
> df$type[!df$Asked & is.na(df$type)] <- "Replies"
> df
   title comments     type Asked
1      1        1     <NA>  TRUE
2      2        2 Question FALSE
3      3        3 Question FALSE
4      4        4 Question FALSE
5      5        5  Replies FALSE
6      6        6  Replies FALSE
7      7        7   Answer FALSE
8      8        8   Answer FALSE
9      9        9   Answer FALSE
10    10       10   Answer FALSE
11    11       11     <NA>  TRUE
12    12       12  Replies FALSE
13    13       13  Replies FALSE
14    14       14   Answer FALSE
15    15       15   Answer FALSE
16    16       16     <NA>  TRUE
17    17       17 Question FALSE
18    18       18  Replies FALSE
19    19       19   Answer FALSE
20    20       20   Answer FALSE
21    21       21     <NA>  TRUE
22    22       22  Replies FALSE
R Replace NA for all Columns Except *
You can use mutate_at:
library(dplyr)
Remove them by Name
df %>% mutate_at(vars(-c(Date, thatCol)), ~replace(., is.na(.), 0))
Remove them by position
df %>% mutate_at(-c(1,4), ~replace(., is.na(.), 0))
Select them by name
df %>% mutate_at(vars(col1, thisCol, col999), ~replace(., is.na(.), 0))
Select them by position
df %>% mutate_at(c(2, 3, 5), ~replace(., is.na(.), 0))
If you want to use replace_na:
df %>% mutate_at(vars(-c(Date, thatCol)), tidyr::replace_na, 0)
Note that mutate_at has been superseded by across as of dplyr 1.0.0.
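For reference, a rough across() equivalent of the "by name" call, assuming dplyr >= 1.0.0. The data frame below is made up for this sketch; only the column names Date and thatCol are borrowed from the calls above:

```r
library(dplyr)

# Toy data (hypothetical values)
df <- data.frame(
  Date    = as.Date("2020-01-01") + 0:2,
  col1    = c(1, NA, 3),
  thatCol = c(NA, "a", "b"),
  col2    = c(NA, 5, NA)
)

# Replace NA with 0 in every column except Date and thatCol
out <- df %>%
  mutate(across(-c(Date, thatCol), ~ replace(.x, is.na(.x), 0)))

print(out)
```

The excluded columns keep their NAs, so thatCol still has a missing value in its first row.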
How to process NA as False in R
I think the most useful approach is to use dplyr's case_when function and explicitly state how the NA cases you mention should be handled.
Replicating your example (notice that I'm explicitly setting the NAs here; your NAs were the result of R not being able to handle a character string ("NA") within a numeric vector):
col1 = as.numeric(c(10, 2, 15, 2, NA_real_, 15))
col2 = as.numeric(c(15, 15, 2, 2, 15, NA_real_))
test <- data.frame(col1, col2)
I'm loading dplyr for both the mutate function and the case_when function. If you're not familiar with case_when, it's like an ifelse with multiple conditions. Each condition is followed by a tilde ("~"), and what comes after the tilde is the value assigned when that condition is met. To set "everything else" to some value x, you write TRUE ~ "x", since TRUE matches every case not caught by the previous conditions.
This should do what you want:
library(dplyr)
test <- mutate(.data = test,
G5 = case_when(col1 > 5 & col2 > 5 ~ "Yes", #Original
(is.na(col1) & col2 > 5) | (col1 > 5 & is.na(col2)) ~ "Yes",
TRUE ~ "No")) # Everything else gets the value "No"
test
#> col1 col2 G5
#> 1 10 15 Yes
#> 2 2 15 No
#> 3 15 2 No
#> 4 2 2 No
#> 5 NA 15 Yes
#> 6 15 NA Yes
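The explicit is.na() branches are needed because comparisons with NA return NA rather than FALSE, and case_when treats an NA condition as not matched. If, as the question title suggests, an NA comparison should instead simply count as FALSE, a base R sketch could look like this (the vectors are made up for illustration):

```r
# Toy vectors (hypothetical): the second and third elements of col1 are missing
col1 <- c(10, NA, NA)
col2 <- c(15, 15, 2)

# Plain comparison: NA & TRUE is NA, while NA & FALSE is FALSE
plain <- col1 > 5 & col2 > 5

# Treat every NA result as FALSE
as_false <- !is.na(plain) & plain

G5 <- ifelse(as_false, "Yes", "No")
print(data.frame(col1, col2, plain, as_false, G5))
```

This differs from the case_when call above, which deliberately assigns "Yes" when one column is NA and the other is greater than 5.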
Replace NA with interpolated value for specific column fields in r
This is specified in ?na.approx:
An object of similar structure as object with NAs replaced by interpolation. For na.approx only the internal NAs are replaced and leading or trailing NAs are omitted if na.rm = TRUE or not replaced if na.rm = FALSE.
By default, na.approx uses na.rm = TRUE:
na.approx(object, x = index(object), xout, ..., na.rm = TRUE, maxgap = Inf, along)
Thus, we can change the code to
my_data[, 42] <- na.approx(my_data[, 42], na.rm = FALSE)
In a large dataset, it is possible to have leading/trailing NAs. With the OP's code, the default na.rm = TRUE then drops them, so the output vector has fewer elements than the column, which triggers the length-difference error in the replacement.
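A small sketch of the length difference, assuming the zoo package is installed. The vector and the one-column data frame are made up for illustration:

```r
library(zoo)

# Toy vector (hypothetical) with a leading, an internal and a trailing NA
x <- c(NA, 1, NA, 3, NA)

dropped <- na.approx(x)                 # default na.rm = TRUE: outer NAs are dropped
kept    <- na.approx(x, na.rm = FALSE)  # outer NAs are kept, internal NA is interpolated

length(dropped)  # shorter than length(x)
length(kept)     # same as length(x)

# Only the na.rm = FALSE result can be assigned back to a column of the same length
my_data <- data.frame(v = x)
my_data[, 1] <- na.approx(my_data[, 1], na.rm = FALSE)
print(my_data)
```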
Replacing values of selected columns based on another dataframe with different size
Data:
dfa <- read.table(text="Accession Column1 Column2 Column3 Root ID
2000_1 0 0.2 14 2000 1
2000_2 0.01 0.2 17 2000 2
2001_1 0.012 0.22 11 2001 1
2001_2 0.011 0.231 17 2001 2", header = T)
Libraries and Functions:
library(tidyverse)
cv <- function(x) 100 * (sd(x) / mean(x))
Solution:
If we cut to the chase and consider the end result, basically you want to replace the values in Column1:Column3 with NA if CV is greater than 30. Otherwise, you want to preserve the original values. The code below does that.
dfa %>%
group_by(Root) %>%
mutate_at(vars(Column1:Column3),
list(~ if(cv(.) > 30) NA else .))
Result:
#> # A tibble: 4 x 6
#>   Accession Column1 Column2 Column3  Root    ID
#>   <fct>       <dbl>   <dbl>   <dbl> <int> <int>
#> 1 2000_1     NA       0.2        14  2000     1
#> 2 2000_2     NA       0.2        17  2000     2
#> 3 2001_1      0.012   0.22       NA  2001     1
#> 4 2001_2      0.011   0.231      NA  2001     2
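The same group-wise replacement can be sketched with across(), assuming dplyr >= 1.0.0. The data is rebuilt with data.frame() so the block is self-contained, and every branch returns a double vector so the groups combine without type surprises (that conversion is a choice made for this sketch, not part of the original answer):

```r
library(dplyr)

# Same values as dfa above, rebuilt so the sketch runs on its own
dfa <- data.frame(
  Accession = c("2000_1", "2000_2", "2001_1", "2001_2"),
  Column1   = c(0, 0.01, 0.012, 0.011),
  Column2   = c(0.2, 0.2, 0.22, 0.231),
  Column3   = c(14, 17, 11, 17),
  Root      = c(2000L, 2000L, 2001L, 2001L),
  ID        = c(1L, 2L, 1L, 2L)
)

# Coefficient of variation in percent
cv <- function(x) 100 * (sd(x) / mean(x))

# Within each Root, blank out a column when its CV exceeds 30
out <- dfa %>%
  group_by(Root) %>%
  mutate(across(Column1:Column3,
                ~ if (cv(.x) > 30) rep(NA_real_, length(.x)) else as.numeric(.x))) %>%
  ungroup()

print(out)
```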
More complicated approaches:
If we want to follow your train of thought, we end up with more complicated code, illustrated below:
dfa %>%
select_if(function(col) is.numeric(col) & all(col != .$ID)) %>%
group_by(Root) %>%
summarise_each(list(cv)) %>%
mutate_at(vars(Column1:Column3),
list(~ ifelse(. > 30, NA, 0))) %>%
left_join(dfa[,c("Root", "ID")], . , by = "Root") %>%
bind_rows(dfa, .) %>%
group_by(Root, ID) %>%
summarise_each(list(~ if(is.numeric(.)) sum(., na.rm = FALSE) else first(.))) %>%
ungroup %>%
select(-ID, -Root, everything())
Explanation:
- Selecting numeric columns except ID.
- Grouping by Root.
- Calculating CV for all the columns.
- Replacing CV values greater than 30 with NA and the rest with 0. I am planning to sum these with the original values, as it seems that OP is interested in preserving the NAs (i.e. CV greater than 30) from this CV matrix but keeping the other values unchanged in the original dataset. Summing with 0 keeps the latter unchanged, while the NAs (with na.rm = F) will affect the values.
- Adding the ID column back by joining, to make the CV matrix the same size (row-wise) as the original dataset. It will also be used for grouping later.
- Binding the datasets by rows.
- Grouping by Root and ID.
- Summarizing numeric columns (i.e. Column1, Column2, etc.) by summing the values from the original dataframe and the modified CV matrix, and keeping the first value from the other columns (since the original dataframe was first in bind_rows, that means preserving the original values).
- Ungrouping to avoid future conflicts.
- Rearranging columns in the order that OP presented.
Another solution is very similar to the one above, but instead of joining to get the ID column back and expand the CV matrix, one can preserve the non-numeric columns from the beginning by summarizing them as list columns and unnesting them later.
dfa %>%
mutate(ID = as.factor(ID)) %>%
group_by(Root) %>%
summarise_each(list(~ if(is.numeric(.)) cv(.) else list(.))) %>%
mutate_at(vars(Column1:Column3),
list(~ ifelse(. > 30, NA, 0))) %>%
unnest(cols = c(Accession, ID)) %>%
mutate(ID = as.integer(ID)) %>%
bind_rows(dfa, .) %>%
group_by(Root, ID) %>%
summarise_each(list(~ if(is.numeric(.)) sum(., na.rm = FALSE) else first(.))) %>%
ungroup %>%
select(-ID, -Root, everything())