Replace NA with Zero in dplyr without using list()
What version of dplyr are you using? It might be an old one. The replace_na function now seems to be in tidyr. This works:
library(tidyr)
df <- tibble::tibble(x = c(1, 2, NA), y = c("a", NA, "b"), z = list(1:5, NULL, 10:20))
df %>% replace_na(list(x = 0, y = "unknown")) %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of 3 variables:
# $ x: num 1 2 0
# $ y: chr "a" "unknown" "b"
# $ z:List of 3
# ..$ : int 1 2 3 4 5
# ..$ : NULL
# ..$ : int 10 11 12 13 14 15 16 17 18 19 ...
We can see the NA values have been replaced and the columns x and y are still atomic vectors. Tested with tidyr_0.7.2.
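For a single column, replace_na also accepts a plain vector, so the list() can be dropped entirely. The following is a minimal sketch, assuming dplyr and tidyr are both installed; the column names mirror the example above and the object name out is only illustrative.

```r
library(dplyr)
library(tidyr)

df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"))

# replace_na() on a vector takes a single replacement value, no list() needed
out <- df %>%
  mutate(x = replace_na(x, 0),
         y = replace_na(y, "unknown"))
out
```

The replacement value should have the same type as the column (a number for x, a string for y).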
How do I replace NA values with zeros in an R dataframe?
See my comment on @gsk3's answer. A simple example:
> m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
> d <- as.data.frame(m)
> d
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 4 3 NA 3 7 6 6 10 6 5
2 9 8 9 5 10 NA 2 1 7 2
3 1 1 6 3 6 NA 1 4 1 6
4 NA 4 NA 7 10 2 NA 4 1 8
5 1 2 4 NA 2 6 2 6 7 4
6 NA 3 NA NA 10 2 1 10 8 4
7 4 4 9 10 9 8 9 4 10 NA
8 5 8 3 2 1 4 5 9 4 7
9 3 9 10 1 9 9 10 5 3 3
10 4 2 2 5 NA 9 7 2 5 5
> d[is.na(d)] <- 0
> d
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 4 3 0 3 7 6 6 10 6 5
2 9 8 9 5 10 0 2 1 7 2
3 1 1 6 3 6 0 1 4 1 6
4 0 4 0 7 10 2 0 4 1 8
5 1 2 4 0 2 6 2 6 7 4
6 0 3 0 0 10 2 1 10 8 4
7 4 4 9 10 9 8 9 4 10 0
8 5 8 3 2 1 4 5 9 4 7
9 3 9 10 1 9 9 10 5 3 3
10 4 2 2 5 0 9 7 2 5 5
There's no need to apply apply. =)
EDIT
You should also take a look at the norm package. It has a lot of nice features for missing data analysis. =)
R dplyr - replace NA with 0 if
You can use across:
library(dplyr)
dtf %>% mutate(across(where(is.numeric), ~replace(., is.na(.), 0)))
#mutate_if for dplyr < 1.0.0
#dtf %>% mutate_if(is.numeric, ~replace(., is.na(.), 0))
You can also use replace_na from tidyr:
# extra arguments passed through across() are deprecated in dplyr >= 1.1.0, so use a lambda
dtf %>% mutate(across(where(is.numeric), ~tidyr::replace_na(., 0)))
# id amt xamt camt date pamt
#1 1 1 1 1 2020-01-01 1
#2 2 4 4 4 <NA> 4
#3 3 0 0 0 2020-01-01 0
#4 4 123 123 123 <NA> 123
As suggested by @Darren Tsai, we can also use coalesce.
dtf %>% mutate(across(where(is.numeric), ~coalesce(., 0)))
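Since dtf is not defined in the answer, here is a self-contained sketch of the across approach. The tibble below is a hypothetical stand-in for dtf (its columns are made up for illustration), and the code assumes dplyr >= 1.0.0.

```r
library(dplyr)

# Hypothetical stand-in for dtf: two numeric columns and one character column
dtf <- tibble(id = c(1, 2, 3), amt = c(1, NA, 3), note = c("a", NA, "c"))

# Only the columns selected by where(is.numeric) are modified
res <- dtf %>%
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), 0)))

res$amt   # the NA in amt has been replaced by 0
res$note  # the character column is untouched and keeps its NA
```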
replace NA's with 0, and Non NA's with a different value
You are not using NA properly here: you are treating it like a character string in x=="NA". With NA values, standard practice is to use is.na(), not x==NA. Try:
my_df$b3 <- ifelse(is.na(my_df$b2), 0, 1)
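To see why the comparison fails, it helps to look at what each expression returns. This is a small base R sketch; the vector x and the name flag are only illustrative.

```r
x <- c(5, NA, 7)

# Comparing with NA yields NA for every element, never TRUE or FALSE
cmp <- x == NA
cmp

# is.na() is the test that actually identifies missing values
is.na(x)

# 0 where the value is missing, 1 where it is present
flag <- ifelse(is.na(x), 0, 1)
flag
```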
Replace NAs in R with zero, if column 1 is equal to zero
This should work:
library(dplyr)
data %>%
  as.data.frame() %>%
  mutate(across(c(Q2,Q3), ~case_when(Q1 == 0 ~ 0, TRUE ~ .)))
# Q1 Q2 Q3
# 1 0 0 0
# 2 0 0 0
# 3 1 2 2
# 4 2 1 1
# 5 0 0 0
# 6 4 NA 4
Your code failed because ifelse() wants a vector as input and provides a vector as output. You could also use ifelse() instead of case_when() if you wanted.
library(dplyr)
data %>%
  as.data.frame() %>%
  mutate(across(c(Q2,Q3), ~ifelse(Q1 == 0, 0, .)))
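The input data is not shown in the answer, so the sketch below uses a hypothetical data frame chosen to be consistent with the printed output (the original values of Q2 and Q3 are an assumption). It illustrates that NA is only overwritten in rows where Q1 is 0.

```r
library(dplyr)

# Hypothetical input, chosen to be consistent with the output shown above
data <- data.frame(
  Q1 = c(0, 0, 1, 2, 0, 4),
  Q2 = c(NA, NA, 2, 1, NA, NA),
  Q3 = c(NA, NA, 2, 1, NA, 4)
)

# Where Q1 is 0, set Q2 and Q3 to 0; otherwise keep the existing value
fixed <- data %>%
  mutate(across(c(Q2, Q3), ~ case_when(Q1 == 0 ~ 0, TRUE ~ .x)))
fixed
```

In the last row Q1 is 4, so the NA in Q2 is left as it is.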
replace NA with 0 using starts_with()
How about using mutate_at with if_else (or case_when)? This works if you want to replace all NA in the columns of interest with 0.
mutate_at(tbl1, vars( starts_with("num_") ),
funs( if_else( is.na(.), 0, .) ) )
# A tibble: 3 x 4
id num_a num_b col_c
<dbl> <dbl> <dbl> <chr>
1 1 1 0 d
2 2 0 99 e
3 3 4 100 <NA>
Note that starts_with and other select helpers return an integer vector giving the position of the matched variables. I always have to keep this in mind when trying to use them outside the situations I normally use them in.
In newer versions of dplyr, use list() with a tilde instead of funs():
list( ~if_else( is.na(.), 0, .) )
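With dplyr >= 1.0.0 the same replacement can also be written with across, which supersedes mutate_at. This is a sketch under that assumption; tbl1 is reconstructed here from the printed result above, so its original NA positions are a guess.

```r
library(dplyr)

# Hypothetical tbl1, reconstructed from the printed result above
tbl1 <- tibble(
  id    = c(1, 2, 3),
  num_a = c(1, NA, 4),
  num_b = c(NA, 99, 100),
  col_c = c("d", "e", NA)
)

# Only columns whose names start with "num_" are touched
cleaned <- tbl1 %>%
  mutate(across(starts_with("num_"), ~ if_else(is.na(.x), 0, .x)))
cleaned
```

Note that if_else is strict about types, so the replacement 0 must match the type of the selected columns (double here).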
dplyr: replace NAs with zeros after group_by, while keeping original NAs in R
We could create a condition with if/else to check for a single non-NA observation in the group: if there is exactly one, return 0 for it (keeping any NA as NA), otherwise do the calculation.
library(dplyr)
df %>%
  group_by(age, year) %>%
  mutate(var1 = if(n() == 1 && !is.na(var1) || sum(!is.na(var1)) == 1) 0 * var1
    else ((var1-mean(var1, na.rm=TRUE))/(1*(sd(var1, na.rm=TRUE))))) %>%
  ungroup()
-output
# A tibble: 8 x 4
id age year var1
<int> <chr> <int> <dbl>
1 4 KL 2007 0
2 1 KL 2008 -0.707
3 2 KL 2008 0.707
4 4 AG 2008 NA
5 3 AG 2008 0
6 3 SU 2009 NA
7 4 SU 2009 NA
8 4 LL 2011 NA
data
df <- structure(list(id = c(4L, 1L, 2L, 4L, 3L, 3L, 4L, 4L), age = c("KL",
"KL", "KL", "AG", "AG", "SU", "SU", "LL"), year = c(2007L, 2008L,
2008L, 2008L, 2008L, 2009L, 2009L, 2011L), var1 = c(15L, 10L,
20L, NA, 5L, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-8L))
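The per-group logic can also be moved into a small function, which may be easier to read and test. The helper scale_or_zero below is hypothetical (it is not part of the original answer) and relies on the observation that a group with one row that is not NA is just a special case of "exactly one non-NA value".

```r
library(dplyr)

df <- data.frame(
  id   = c(4L, 1L, 2L, 4L, 3L, 3L, 4L, 4L),
  age  = c("KL", "KL", "KL", "AG", "AG", "SU", "SU", "LL"),
  year = c(2007L, 2008L, 2008L, 2008L, 2008L, 2009L, 2009L, 2011L),
  var1 = c(15L, 10L, 20L, NA, 5L, NA, NA, NA)
)

# Hypothetical helper: standardise x, unless the group has exactly one
# non-NA value, in which case that value becomes 0 and NAs stay NA
scale_or_zero <- function(x) {
  if (sum(!is.na(x)) == 1) {
    0 * x
  } else {
    (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
  }
}

scaled <- df %>%
  group_by(age, year) %>%
  mutate(var1 = scale_or_zero(var1)) %>%
  ungroup()
scaled
```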
Set NA to 0 in R
You can just use the output of is.na to replace directly with subsetting:
bothbeams.data[is.na(bothbeams.data)] <- 0
Or with a reproducible example:
dfr <- data.frame(x=c(1:3,NA),y=c(NA,4:6))
dfr[is.na(dfr)] <- 0
dfr
x y
1 1 0
2 2 4
3 3 5
4 0 6
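However, be careful using this method on a data frame containing factors that also have missing values (since R 4.0.0, data.frame() only turns strings into factors when stringsAsFactors = TRUE):
> d <- data.frame(x = c(NA,2,3),y = c("a",NA,"c"), stringsAsFactors = TRUE)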
> d[is.na(d)] <- 0
Warning message:
In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
invalid factor level, NA generated
It "works":
> d
x y
1 0 a
2 2 <NA>
3 3 c
...but you likely will want to specifically alter only the numeric columns in this case, rather than the whole data frame. See, e.g., the answer below using dplyr::mutate_if.
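Restricting the replacement to numeric columns can also be done in base R. The following is a minimal sketch; num_cols is an illustrative name, and stringsAsFactors = TRUE is set explicitly so that y is a factor regardless of the R version.

```r
# stringsAsFactors = TRUE makes y a factor on any R version
d <- data.frame(x = c(NA, 2, 3), y = c("a", NA, "c"), stringsAsFactors = TRUE)

# Logical vector marking which columns are numeric
num_cols <- vapply(d, is.numeric, logical(1))

# Replace NA only inside the numeric columns; the factor column is left alone
d[num_cols][is.na(d[num_cols])] <- 0
d
```

This avoids the invalid factor level warning because the factor column is never assigned to.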
How to replace 0 or missing value with NA in R
You could just use replace without any additional function / package:
data <- replace(data, data == 0, NA)
This assumes that data is your data frame.
Otherwise you can simply insert the column name, e.g. if your data frame is df and the column name is data:
df$data <- replace(df$data, df$data == 0, NA)
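As a small worked example of the column case, the sketch below compares base replace with dplyr's na_if, which does the same job for a single value. The data frame is made up for illustration, and using na_if is an alternative not mentioned in the answer.

```r
library(dplyr)

# Illustrative column containing a zero and an existing NA
df <- data.frame(data = c(3, 0, NA, 7))

# Base R: positions where the comparison is TRUE become NA
base_version <- replace(df$data, df$data == 0, NA)

# dplyr alternative: na_if() turns every value equal to 0 into NA
dplyr_version <- na_if(df$data, 0)

base_version
dplyr_version
```

Both versions leave the existing NA in place and convert only the zero.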