Replace NA with Zero in dplyr Without Using list()

What version of dplyr are you using? It might be an old one. The replace_na() function now lives in tidyr. This works:

library(tidyr)
df <- tibble::tibble(x = c(1, 2, NA), y = c("a", NA, "b"), z = list(1:5, NULL, 10:20))
df %>% replace_na(list(x = 0, y = "unknown")) %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of 3 variables:
# $ x: num 1 2 0
# $ y: chr "a" "unknown" "b"
# $ z:List of 3
# ..$ : int 1 2 3 4 5
# ..$ : NULL
# ..$ : int 10 11 12 13 14 15 16 17 18 19 ...

We can see the NA values have been replaced and the columns x and y are still atomic vectors. Tested with tidyr_0.7.2.
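
If the goal is to avoid list() altogether, replace_na() also accepts a single vector, in which case the replacement is one value rather than a named list. A minimal sketch, assuming dplyr is loaded alongside tidyr (the data frame mirrors the one above without the list column):

```r
library(dplyr)
library(tidyr)

df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"))

# On a vector, replace_na() takes a single replacement value, so no list() is needed
df2 <- df %>%
  mutate(x = replace_na(x, 0),
         y = replace_na(y, "unknown"))
```

The replacement has to be compatible with the column type, so a numeric column gets a number and a character column gets a string.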

How do I replace NA values with zeros in an R dataframe?

See my comment on @gsk3's answer. A simple example:

> m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
> d <- as.data.frame(m)
> d
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   4  3 NA  3  7  6  6 10  6   5
2   9  8  9  5 10 NA  2  1  7   2
3   1  1  6  3  6 NA  1  4  1   6
4  NA  4 NA  7 10  2 NA  4  1   8
5   1  2  4 NA  2  6  2  6  7   4
6  NA  3 NA NA 10  2  1 10  8   4
7   4  4  9 10  9  8  9  4 10  NA
8   5  8  3  2  1  4  5  9  4   7
9   3  9 10  1  9  9 10  5  3   3
10  4  2  2  5 NA  9  7  2  5   5

> d[is.na(d)] <- 0

> d
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   4  3  0  3  7  6  6 10  6   5
2   9  8  9  5 10  0  2  1  7   2
3   1  1  6  3  6  0  1  4  1   6
4   0  4  0  7 10  2  0  4  1   8
5   1  2  4  0  2  6  2  6  7   4
6   0  3  0  0 10  2  1 10  8   4
7   4  4  9 10  9  8  9  4 10   0
8   5  8  3  2  1  4  5  9  4   7
9   3  9 10  1  9  9 10  5  3   3
10  4  2  2  5  0  9  7  2  5   5

There's no need to use apply(). =)

EDIT

You should also take a look at the norm package. It has a lot of nice features for missing data analysis. =)

R dplyr - replace NA with 0 if

You can use across():

library(dplyr)
dtf %>% mutate(across(where(is.numeric), ~replace(., is.na(.), 0)))
#mutate_if for dplyr < 1.0.0
#dtf %>% mutate_if(is.numeric, ~replace(., is.na(.), 0))

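You can also use replace_na() from tidyr:

# passing extra arguments through across(), as in across(..., tidyr::replace_na, 0),
# is deprecated since dplyr 1.1.0, so use a lambda instead
dtf %>% mutate(across(where(is.numeric), ~ tidyr::replace_na(.x, 0)))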

#  id amt xamt camt       date pamt
#1  1   1    1    1 2020-01-01    1
#2  2   4    4    4       <NA>    4
#3  3   0    0    0 2020-01-01    0
#4  4 123  123  123       <NA>  123

As suggested by @Darren Tsai, we can also use coalesce():

dtf %>% mutate(across(where(is.numeric), ~ coalesce(.x, 0)))
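
To compare the three variants side by side, here is a small sketch. The dtf below is hypothetical (the original data is not shown in the answer) and only loosely modelled on the printed output; the columns are doubles so that all three approaches return the same type.

```r
library(dplyr)

# Hypothetical stand-in for dtf: two numeric columns and one Date column
dtf <- tibble(
  id   = c(1, 2, 3, 4),
  amt  = c(1, 4, NA, 123),
  date = as.Date(c("2020-01-01", NA, "2020-01-01", NA))
)

res_replace    <- dtf %>% mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), 0)))
res_replace_na <- dtf %>% mutate(across(where(is.numeric), ~ tidyr::replace_na(.x, 0)))
res_coalesce   <- dtf %>% mutate(across(where(is.numeric), ~ coalesce(.x, 0)))
```

In each case only the numeric columns are touched, so the missing dates stay NA.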

replace NA's with 0, and Non NA's with a different value

You are not testing for NA properly here: x == "NA" compares against the character string "NA", not against a missing value. Comparing with NA directly (x == NA) does not work either, since the result is always NA. The standard practice is to use is.na(). Try:

my_df$b3 <- ifelse(is.na(my_df$b2), 0, 1)
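
A quick illustration of why the comparison fails, using a made-up vector x rather than the asker's data:

```r
x <- c(1, NA, 3)

cmp_na     <- x == NA     # every element is NA: a comparison with NA is never TRUE
cmp_string <- x == "NA"   # looks for the string "NA"; the missing value yields NA, not TRUE
is_missing <- is.na(x)    # the reliable test for missing values

# same pattern as above: 0 where the value is missing, 1 otherwise
b3 <- ifelse(is.na(x), 0, 1)
```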

Replace NAs in R with zero, if column 1 is equal to zero

This should work:

library(dplyr)
data %>%
  as.data.frame() %>%
  mutate(across(c(Q2, Q3), ~ case_when(Q1 == 0 ~ 0, TRUE ~ .)))
#   Q1 Q2 Q3
# 1  0  0  0
# 2  0  0  0
# 3  1  2  2
# 4  2  1  1
# 5  0  0  0
# 6  4 NA  4

Your code failed because ifelse() wants a vector as input and provides a vector as output. You could also use ifelse() instead of case_when() if you wanted.

library(dplyr)
data %>%
  as.data.frame() %>%
  mutate(across(c(Q2, Q3), ~ ifelse(Q1 == 0, 0, .)))
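
To try both versions, a small stand-in for data helps. The values below are an assumption (the asker's data is not shown), chosen so that the result matches the output printed above:

```r
library(dplyr)

# Hypothetical input: Q2 and Q3 are missing wherever Q1 is 0, plus one unrelated NA in Q2
data <- data.frame(
  Q1 = c(0, 0, 1, 2, 0, 4),
  Q2 = c(NA, NA, 2, 1, NA, NA),
  Q3 = c(NA, NA, 2, 1, NA, 4)
)

res_case   <- data %>% mutate(across(c(Q2, Q3), ~ case_when(Q1 == 0 ~ 0, TRUE ~ .x)))
res_ifelse <- data %>% mutate(across(c(Q2, Q3), ~ ifelse(Q1 == 0, 0, .x)))
```

The NA in row 6 of Q2 is left alone because Q1 is not 0 there.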

replace NA with 0 using starts_with()

How about using mutate_at with if_else (or case_when)? This works if you want to replace all NA in the columns of interest with 0.

mutate_at(tbl1, vars(starts_with("num_")),
          funs(if_else(is.na(.), 0, .)))

# A tibble: 3 x 4
     id num_a num_b col_c
  <dbl> <dbl> <dbl> <chr>
1     1     1     0 d
2     2     0    99 e
3     3     4   100 <NA>

Note that starts_with() and the other select helpers return an integer vector giving the positions of the matched variables. I always have to keep this in mind when using them outside the situations I normally use them in.

In newer versions of dplyr, use list() with a tilde instead of funs():

list( ~if_else( is.na(.), 0, .) )
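
With dplyr 1.0.0 or later, the same replacement can be sketched with across() instead of mutate_at(). The tbl1 below is reconstructed from the printed result, so treat its input values as an assumption:

```r
library(dplyr)

# Input reconstructed from the output shown above (an assumption)
tbl1 <- tibble(
  id    = c(1, 2, 3),
  num_a = c(1, NA, 4),
  num_b = c(NA, 99, 100),
  col_c = c("d", "e", NA)
)

# Replace NA with 0 only in columns whose names start with "num_"
res <- tbl1 %>%
  mutate(across(starts_with("num_"), ~ if_else(is.na(.x), 0, .x)))
```

The NA in col_c is untouched, since that column is not selected by starts_with("num_").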

dplyr: replace NAs with zeros after group_by, while keeping original NAs in R

We could use an if/else condition to check whether a group has exactly one non-NA observation. If it does, return 0 for that observation (any NA in the group stays NA); otherwise do the calculation:

library(dplyr)
df %>%
  group_by(age, year) %>%
  # 0 * var1 gives 0 for the single non-NA value and keeps NA as NA
  mutate(var1 = if (n() == 1 && !is.na(var1) || sum(!is.na(var1)) == 1) 0 * var1
                else (var1 - mean(var1, na.rm = TRUE)) / sd(var1, na.rm = TRUE)) %>%
  ungroup()

-output

# A tibble: 8 x 4
     id age    year   var1
  <int> <chr> <int>  <dbl>
1     4 KL     2007  0
2     1 KL     2008 -0.707
3     2 KL     2008  0.707
4     4 AG     2008 NA
5     3 AG     2008  0
6     3 SU     2009 NA
7     4 SU     2009 NA
8     4 LL     2011 NA

data

df <- structure(list(id = c(4L, 1L, 2L, 4L, 3L, 3L, 4L, 4L), age = c("KL", 
"KL", "KL", "AG", "AG", "SU", "SU", "LL"), year = c(2007L, 2008L,
2008L, 2008L, 2008L, 2009L, 2009L, 2011L), var1 = c(15L, 10L,
20L, NA, 5L, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-8L))

Set NA to 0 in R

You can just use the output of is.na to replace directly with subsetting:

bothbeams.data[is.na(bothbeams.data)] <- 0

Or with a reproducible example:

dfr <- data.frame(x=c(1:3,NA),y=c(NA,4:6))
dfr[is.na(dfr)] <- 0
dfr
  x y
1 1 0
2 2 4
3 3 5
4 0 6

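However, be careful using this method on a data frame containing factors that also have missing values (since R 4.0.0, data.frame() no longer converts strings to factors by default, so stringsAsFactors = TRUE is set explicitly here):

> d <- data.frame(x = c(NA,2,3), y = c("a",NA,"c"), stringsAsFactors = TRUE)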
> d[is.na(d)] <- 0
Warning message:
In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
invalid factor level, NA generated

It "works":

> d
  x    y
1 0    a
2 2 <NA>
3 3    c

...but you will likely want to alter only the numeric columns in this case, rather than the whole data frame, e.g. with dplyr::mutate_if() or mutate(across(where(is.numeric), ...)).
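
If you prefer to stay in base R, one way to restrict the replacement to numeric columns is sketched below. The helper name num_cols is made up for this example:

```r
d <- data.frame(x = c(NA, 2, 3), y = c("a", NA, "c"), stringsAsFactors = TRUE)

# Logical vector marking which columns are numeric
num_cols <- vapply(d, is.numeric, logical(1))

# Replace NA with 0 in the numeric columns only; the factor column is left alone
d[num_cols][is.na(d[num_cols])] <- 0
```

This avoids the invalid factor level warning, because the factor column y is never assigned to.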

How to replace 0 or missing value with NA in R

You could just use replace(), without any additional function or package:

data <- replace(data, data == 0, NA)

This assumes that data is your data frame.

To change a single column instead, refer to it by name, e.g. if your data frame is df and the column is named data:

df$data <- replace(df$data, df$data == 0, NA)
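
A tiny check of what replace() does on a single vector, using made-up values:

```r
x <- c(0, 3, 0, 7)

# replace(x, index, values) returns a copy of x with the indexed elements swapped out
y <- replace(x, x == 0, NA)
```

Note that replace() does not modify x in place, which is why the result has to be assigned back, as in the lines above.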