Set NA to 0 in R
You can just use the output of is.na to replace directly with subsetting:
bothbeams.data[is.na(bothbeams.data)] <- 0
Or with a reproducible example:
dfr <- data.frame(x=c(1:3,NA),y=c(NA,4:6))
dfr[is.na(dfr)] <- 0
dfr
x y
1 1 0
2 2 4
3 3 5
4 0 6
However, be careful using this method on a data frame containing factors that also have missing values:
> d <- data.frame(x = c(NA,2,3),y = c("a",NA,"c"), stringsAsFactors = TRUE)
> d[is.na(d)] <- 0
Warning message:
In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
invalid factor level, NA generated
It "works":
> d
x y
1 0 a
2 2 <NA>
3 3 c
...but you will likely want to alter only the numeric columns in this case, rather than the whole data frame. See, e.g., the answer below using dplyr::mutate_if.
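A minimal base R sketch of restricting the replacement to numeric columns, so the factor column is left alone. The data frame mirrors the example above; the helper name num_cols is only illustrative.

```r
# Toy data: one numeric column and one factor column, both with an NA
d <- data.frame(x = c(NA, 2, 3), y = c("a", NA, "c"), stringsAsFactors = TRUE)

# Logical vector marking which columns are numeric
num_cols <- vapply(d, is.numeric, logical(1))

# Replace NA with 0 in the numeric columns only
d[num_cols] <- lapply(d[num_cols], function(col) replace(col, is.na(col), 0))

d
```

The NA in the factor column y stays as it is, and no warning is raised because the factor is never assigned to.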
How do I replace NA values with zeros in an R dataframe?
See my comment on @gsk3's answer. A simple example:
> m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
> d <- as.data.frame(m)
> d
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 4 3 NA 3 7 6 6 10 6 5
2 9 8 9 5 10 NA 2 1 7 2
3 1 1 6 3 6 NA 1 4 1 6
4 NA 4 NA 7 10 2 NA 4 1 8
5 1 2 4 NA 2 6 2 6 7 4
6 NA 3 NA NA 10 2 1 10 8 4
7 4 4 9 10 9 8 9 4 10 NA
8 5 8 3 2 1 4 5 9 4 7
9 3 9 10 1 9 9 10 5 3 3
10 4 2 2 5 NA 9 7 2 5 5
> d[is.na(d)] <- 0
> d
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 4 3 0 3 7 6 6 10 6 5
2 9 8 9 5 10 0 2 1 7 2
3 1 1 6 3 6 0 1 4 1 6
4 0 4 0 7 10 2 0 4 1 8
5 1 2 4 0 2 6 2 6 7 4
6 0 3 0 0 10 2 1 10 8 4
7 4 4 9 10 9 8 9 4 10 0
8 5 8 3 2 1 4 5 9 4 7
9 3 9 10 1 9 9 10 5 3 3
10 4 2 2 5 0 9 7 2 5 5
There's no need to apply apply. =)
EDIT
You should also take a look at the norm package. It has a lot of nice features for missing data analysis. =)
Replace NA with 0 in a data frame column
Since nobody so far felt fit to point out why what you're trying doesn't work:
1. NA == NA doesn't return TRUE, it returns NA (since comparing to undefined values should yield an undefined result).
2. You're trying to call apply on an atomic vector. You can't use apply to loop over the elements in a column.
3. Your subscripts are off - you're trying to give two indices into a$x, which is just the column (an atomic vector).
I'd fix up 3. to get to a$x[is.na(a$x)] <- 0
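The first point is easy to check at the console. A short sketch, using a made-up data frame a with a single column x:

```r
# Comparing with NA never yields TRUE; the result is itself NA
NA == NA
# [1] NA

# Hypothetical data frame standing in for the asker's `a`
a <- data.frame(x = c(1, NA, 3))

a$x == NA     # all NA, so useless as a subsetting index
# [1] NA NA NA

is.na(a$x)    # a proper logical index
# [1] FALSE  TRUE FALSE

# Single-index subsetting on the column vector
a$x[is.na(a$x)] <- 0
a$x
```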
Replace all 0 values to NA
Replacing all zeroes with NA:
df[df == 0] <- NA
Explanation
1. NULL is not what you should want to replace zeroes with. As it says in ?'NULL', "NULL represents the null object in R", which is unique and, I guess, can be seen as the most uninformative and empty object.[1] Then it becomes not so surprising that
data.frame(x = c(1, NULL, 2))
# x
# 1 1
# 2 2
That is, R does not reserve any space for this null object.[2] Meanwhile, looking at ?'NA' we see that "NA is a logical constant of length 1 which contains a missing value indicator. NA can be coerced to any other vector type except raw." Importantly, NA is of length 1, so R reserves some space for it. E.g.,
data.frame(x = c(1, NA, 2))
# x
# 1 1
# 2 NA
# 3 2
Also, the data frame structure requires all the columns to have the same number of elements so that there can be no "holes" (i.e., NULL values).
Now you could replace zeroes by NULL in a data frame in the sense of completely removing all the rows containing at least one zero. When using, e.g., var, cov, or cor, that is actually equivalent to first replacing zeroes with NA and setting the value of use to "complete.obs". Typically, however, this is unsatisfactory as it leads to extra information loss.
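The equivalence mentioned above can be checked on a small example. This is only a sketch on made-up numbers; the names df_na and df_drop are illustrative.

```r
# Made-up data; rows 2 and 3 each contain a zero
df <- data.frame(x = c(1, 0, 3, 4, 5, 6),
                 y = c(2, 4, 0, 8, 11, 9),
                 z = c(5, 3, 2, 7, 1, 4))

# Route 1: turn zeroes into NA, then let cor() drop incomplete rows
df_na <- df
df_na[df_na == 0] <- NA
cor_na <- cor(df_na, use = "complete.obs")

# Route 2: remove every row that contains at least one zero
df_drop <- df[rowSums(df == 0) == 0, ]
cor_drop <- cor(df_drop)

# Both routes use the same four complete rows
all.equal(cor_na, cor_drop)
```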
2. Instead of running some sort of loop, the solution uses the vectorized comparison df == 0. It returns (try it) a logical matrix of the same size as df, with the entries TRUE and FALSE. Further, we are also allowed to pass this matrix to the subsetting operator [...] (see ?'['). Lastly, while the result of df[df == 0] is perfectly intuitive, it may seem strange that df[df == 0] <- NA gives the desired effect. The assignment operator <- is indeed not always so smart and does not work in this way with some other objects, but it does so with data frames; see ?'<-'.
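To see the intermediate objects described above, here is a small sketch on a made-up data frame (the name mask is only illustrative):

```r
# Made-up data with three zeroes
df <- data.frame(a = c(0, 1, 2), b = c(3, 0, 0))

# The comparison is applied element-wise and yields a logical matrix
mask <- df == 0
is.matrix(mask)   # TRUE
dim(mask)         # same dimensions as df

# Assigning through the logical matrix replaces exactly the TRUE cells
df[mask] <- NA
df
```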
[1] The empty set in set theory feels somehow related.
[2] Another similarity with set theory: the empty set is a subset of every set, but we do not reserve any space for it.
Set 0 to NA in R
Is this what you need?
df <- data.frame(A=c(0, 3, "bla"), B=c("A", 0, "X"), C=c("x","B", 4)) #some fake data
df[df == 0] <- NA
Replace NA with Zero in dplyr without using list()
What version of dplyr are you using? It might be an old one. The replace_na function now seems to be in tidyr. This works:
library(tidyr)
df <- tibble::tibble(x = c(1, 2, NA), y = c("a", NA, "b"), z = list(1:5, NULL, 10:20))
df %>% replace_na(list(x = 0, y = "unknown")) %>% str()
# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of 3 variables:
# $ x: num 1 2 0
# $ y: chr "a" "unknown" "b"
# $ z:List of 3
# ..$ : int 1 2 3 4 5
# ..$ : NULL
# ..$ : int 10 11 12 13 14 15 16 17 18 19 ...
We can see the NA values have been replaced and the columns x and y are still atomic vectors. Tested with tidyr_0.7.2.
replacing all NA with a 0 in data.table in R
We can specify the columns of interest in .SDcols (their names are stored in 'nm1'), loop over .SD (the Subset of Data.table) and replace the NA values with 0 (using replace_na from tidyr):
library(data.table)
library(tidyr)
nm1 <- paste0("claim", 9:12, "month")
setDT(claimsMonthly)[, (nm1) := lapply(.SD, replace_na, 0), .SDcols = nm1]
Or, as @jangorecki mentioned in the comments, nafill from data.table would be better:
setDT(claimsMonthly)[, (nm1) := lapply(.SD, nafill, fill = 0), .SDcols = nm1]
Or, using a loop with set, assign 0 to the columns of interest wherever they are NA, by specifying i (the row index) and j (the column index or name):
for (j in nm1) {
  set(claimsMonthly, i = which(is.na(claimsMonthly[[j]])), j = j, value = 0)
}
Or with setnafill:
setnafill(claimsMonthly, cols = nm1, fill = 0)
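Since the original claimsMonthly data is not shown, here is a self-contained sketch of the setnafill variant on a made-up table. The column values and the id column are assumptions for illustration only.

```r
library(data.table)

# Hypothetical stand-in for claimsMonthly; the values are invented
claimsMonthly <- data.table(
  id           = 1:3,
  claim9month  = c(10, NA, 30),
  claim10month = c(NA, 5, NA),
  claim11month = c(1, 2, 3),
  claim12month = c(NA, NA, 7)
)

nm1 <- paste0("claim", 9:12, "month")

# Fills NA with 0 in the listed columns, modifying the table by reference
setnafill(claimsMonthly, cols = nm1, fill = 0)

claimsMonthly
```

Because setnafill works by reference, there is no need to assign the result back to claimsMonthly.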
R dplyr - replace NA with 0 if
You can use across:
library(dplyr)
dtf %>% mutate(across(where(is.numeric), ~replace(., is.na(.), 0)))
#mutate_if for dplyr < 1.0.0
#dtf %>% mutate_if(is.numeric, ~replace(., is.na(.), 0))
You can also use replace_na from tidyr:
dtf %>% mutate(across(where(is.numeric), ~ tidyr::replace_na(.x, 0)))
# id amt xamt camt date pamt
#1 1 1 1 1 2020-01-01 1
#2 2 4 4 4 <NA> 4
#3 3 0 0 0 2020-01-01 0
#4 4 123 123 123 <NA> 123
As suggested by @Darren Tsai, we can also use coalesce.
dtf %>% mutate(across(where(is.numeric), ~ coalesce(.x, 0)))
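The definition of dtf is not shown in the answer, so the following is a sketch on a made-up data frame with one numeric column and one Date column, to illustrate that only numeric columns are touched. It assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data; not the original dtf
dtf <- data.frame(
  id   = 1:4,
  amt  = c(1, 4, NA, 123),
  date = as.Date(c("2020-01-01", NA, "2020-01-01", NA))
)

# where(is.numeric) selects id and amt but skips the Date column
out <- dtf %>%
  mutate(across(where(is.numeric), ~ replace(.x, is.na(.x), 0)))

out
```

The missing dates remain NA because is.numeric returns FALSE for Date columns.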