How to Replace NA Values With Zeros in an R Dataframe

How do I replace NA values with zeros in an R dataframe?

See my comment on @gsk3's answer. A simple example:

> m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
> d <- as.data.frame(m)
> d
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 4 3 NA 3 7 6 6 10 6 5
2 9 8 9 5 10 NA 2 1 7 2
3 1 1 6 3 6 NA 1 4 1 6
4 NA 4 NA 7 10 2 NA 4 1 8
5 1 2 4 NA 2 6 2 6 7 4
6 NA 3 NA NA 10 2 1 10 8 4
7 4 4 9 10 9 8 9 4 10 NA
8 5 8 3 2 1 4 5 9 4 7
9 3 9 10 1 9 9 10 5 3 3
10 4 2 2 5 NA 9 7 2 5 5

> d[is.na(d)] <- 0

> d
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 4 3 0 3 7 6 6 10 6 5
2 9 8 9 5 10 0 2 1 7 2
3 1 1 6 3 6 0 1 4 1 6
4 0 4 0 7 10 2 0 4 1 8
5 1 2 4 0 2 6 2 6 7 4
6 0 3 0 0 10 2 1 10 8 4
7 4 4 9 10 9 8 9 4 10 0
8 5 8 3 2 1 4 5 9 4 7
9 3 9 10 1 9 9 10 5 3 3
10 4 2 2 5 0 9 7 2 5 5

There's no need to apply apply. =)
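
To see why the one-liner works, it can help to look at what is.na returns for a data frame. The following is a minimal sketch with made-up values; the data frame `df` and the name `mask` are hypothetical and not part of the example above.

```r
# Hypothetical three-row data frame with a few missing values
df <- data.frame(a = c(1, NA, 3), b = c(NA, 5, 6))

# is.na() on a data frame returns a logical matrix of the same shape
mask <- is.na(df)
print(mask)

# Indexing with that matrix selects exactly the NA cells, so a single
# assignment replaces them all, with no loop or apply() needed
df[mask] <- 0
print(df)
```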

EDIT

You should also take a look at the norm package. It has a lot of nice features for missing data analysis. =)

Replace NA with 0 in a data frame column

Since nobody so far felt fit to point out why what you're trying doesn't work:

  1. NA == NA doesn't return TRUE, it returns NA (since comparing to undefined values should yield an undefined result).
  2. You're trying to call apply on an atomic vector. You can't use apply to loop over the elements in a column.
  3. Your subscripts are off - you're trying to give two indices into a$x, which is just the column (an atomic vector).

I'd fix up 3. to get to a$x[is.na(a$x)] <- 0
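
The three points above can be checked in a few lines. This is only a sketch: the data frame `a` below is a hypothetical stand-in for the asker's data, which is not shown here.

```r
# Comparing with NA yields NA, not TRUE
NA == NA              # NA

# Hypothetical data frame standing in for the asker's `a`
a <- data.frame(x = c(1, NA, 3))

# Because of that, == cannot be used to locate missing values
a$x == NA             # NA NA NA

# is.na() returns a proper logical vector instead
is.na(a$x)            # FALSE TRUE FALSE

# a$x is an atomic vector, so it takes a single index
a$x[is.na(a$x)] <- 0
a$x                   # 1 0 3
```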

Replacing NAs with 0s in an R dataframe

dataset <- matrix(sample(c(NA, 1:5), 25, replace = TRUE), 5)
dataset
[,1] [,2] [,3] [,4] [,5]
[1,] 2 3 5 5 4
[2,] 2 4 3 2 4
[3,] 2 NA NA NA 2
[4,] 2 3 NA 5 5
[5,] 2 3 2 2 3
data <- as.data.frame(dataset)
data[is.na(data)] <- 0

How do I replace NA values with zeros in R?

For practically any data structure X containing numerics, use

X[is.na(X)] <- 0

Your question seems slightly discombobulated though - you have indicated that you mean <NA> not NA, without explaining what type <NA> is.

If it is the string "<NA>" you mean, then

X[X=="<NA>"] <- "0"

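If X might be of a different type, check for that too:

X[is.character(X) & X=="<NA>"] <- "0"

The same kind of guard is arguably more useful in the numeric case:

X[is.numeric(X) & is.na(X)] <- 0

Note that is.character(X) and is.numeric(X) return a single TRUE or FALSE for the whole object, so these guards work for vectors and matrices. A data frame as a whole is neither character nor numeric, so for a data frame with mixed column types the check has to be made column by column.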
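
For a data frame with mixed column types, one possible approach is to test each column separately. The sketch below assumes a hypothetical data frame `X` with one numeric and one character column; the helper names `num_cols` and `chr_cols` are made up for illustration.

```r
# Hypothetical data frame with a numeric and a character column
X <- data.frame(n = c(1, NA, 3),
                s = c("a", "<NA>", NA),
                stringsAsFactors = FALSE)

# Per-column type checks: one TRUE/FALSE per column
num_cols <- vapply(X, is.numeric, logical(1))
chr_cols <- vapply(X, is.character, logical(1))

# Real NAs in the numeric columns become 0
X[num_cols][is.na(X[num_cols])] <- 0

# The literal string "<NA>" in character columns becomes "0";
# which() drops the NA that results from comparing a real NA with "<NA>"
for (col in names(X)[chr_cols]) {
  X[[col]][which(X[[col]] == "<NA>")] <- "0"
}

print(X)
```

The real NA in the character column is left alone here, since only the string "<NA>" was targeted.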

This is a very common idiom for dealing with missing data in R, although you should also look at the argument na.rm = TRUE, which many functions such as mean, sum, etc. will accept.
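
As a quick illustration of the difference, here is a small sketch with a made-up vector `x`. Dropping the missing values with na.rm = TRUE and replacing them with zero are not equivalent, which is worth keeping in mind before zero-filling.

```r
# Hypothetical numeric vector with one missing value
x <- c(1, NA, 3)

mean(x)                  # NA
mean(x, na.rm = TRUE)    # 2
sum(x, na.rm = TRUE)     # 4

# Replacing NA with 0 keeps the element in the calculation,
# so the mean is pulled towards zero
x0 <- x
x0[is.na(x0)] <- 0
mean(x0)                 # 1.333333
```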

This strategy will fail for a factor, because you cannot add new factor levels by assigning to the value of a factor. I haven't used read.spss, but looking at the documentation, I suggest you add the use.value.labels = FALSE argument to your call, to avoid creating factors in the first place.
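
If a column has to stay a factor, one workaround is to add the new level before assigning to it. This is a sketch on a made-up factor `f`, not the asker's data:

```r
# Hypothetical factor with one missing value
f <- factor(c("a", NA, "c"))

# "0" is not an existing level, so register it first
levels(f) <- c(levels(f), "0")

# Now the assignment succeeds without a warning
f[is.na(f)] <- "0"
print(f)
```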

In your specific case, your entire data frame is of the same type (factor). This means it's safe to convert to a character matrix

> class(mydata[[1]])
[1] "factor"
> mydataM <- as.matrix(mydata)
> mode(mydataM)
[1] "character"

Now you can replace the NA values (missing factor entries print as <NA>, but after the conversion they are ordinary NA values):

mydataM[is.na(mydataM)] <- "0"

In the more general case where you have unwanted factor columns mixed in with other types, you need to do something a little more complex.

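# Convert factor columns to character, leave all other columns unchanged
myDataM <- as.data.frame(lapply(mydata,
function(x) if (is.factor(x)) as.character(x) else x),
stringsAsFactors = FALSE)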

Set NA to 0 in R

You can just use the output of is.na to replace directly with subsetting:

bothbeams.data[is.na(bothbeams.data)] <- 0

Or with a reproducible example:

dfr <- data.frame(x=c(1:3,NA),y=c(NA,4:6))
dfr[is.na(dfr)] <- 0
dfr
x y
1 1 0
2 2 4
3 3 5
4 0 6

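However, be careful using this method on a data frame containing factors that also have missing values (since R 4.0.0, strings are no longer converted to factors by default, hence the explicit stringsAsFactors = TRUE):

> d <- data.frame(x = c(NA,2,3),y = c("a",NA,"c"), stringsAsFactors = TRUE)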
> d[is.na(d)] <- 0
Warning message:
In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
invalid factor level, NA generated

It "works":

> d
x y
1 0 a
2 2 <NA>
3 3 c

...but you likely will want to specifically alter only the numeric columns in this case, rather than the whole data frame, for example with dplyr::mutate_if.
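
For readers who prefer to stay in base R, the following sketch restricts the replacement to numeric columns. It swaps in lapply for dplyr::mutate_if, so it is an alternative to the approach named above rather than a reproduction of it.

```r
# Same shape of data as above: a numeric column and a factor column
d <- data.frame(x = c(NA, 2, 3), y = c("a", NA, "c"),
                stringsAsFactors = TRUE)

# d[] <- ... keeps d a data frame while replacing its columns one by one
d[] <- lapply(d, function(col) {
  # Only numeric columns are zero-filled; everything else is returned as is
  if (is.numeric(col)) replace(col, is.na(col), 0) else col
})

print(d)
```

The factor column keeps its missing value and no warning is raised.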

Replace NAs with zeros using ifelse

There is a logic issue in the code:

is.na(df$violence) == "ignore"

This compares the logical vector returned by is.na with the string "ignore", which is never TRUE. If the goal is as stated in the OP's post (the category ignore should be coded 1, while all other categories, including NAs, should be coded 0), use

df$new_variable <- +(df$violence %in% "ignore")

Here, we check for values that are "ignore" with %in%, which returns a logical vector: TRUE only for "ignore" and FALSE for everything else, including NA (== would return NA for NA values). The unary + then converts the result to binary (TRUE -> 1 and FALSE -> 0).
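
A small sketch makes the difference between == and %in% visible. The data frame below is made up, and the category "report" is a hypothetical second level, since the OP's data is not shown.

```r
# Hypothetical data: one "ignore", one other category, one missing value
df <- data.frame(violence = c("ignore", "report", NA),
                 stringsAsFactors = FALSE)

# == propagates the missing value
df$violence == "ignore"       # TRUE FALSE NA

# %in% never returns NA
df$violence %in% "ignore"     # TRUE FALSE FALSE

# Unary + turns the logical vector into 1/0
df$new_variable <- +(df$violence %in% "ignore")
df$new_variable               # 1 0 0
```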

Replace NA values by - in R

In base R, we may do

output[is.na(output)] <- "-"

Output:

> output
date ABC CDE FGH SUM
1 2021-06-30 4 1 6 11
2 2021-07-02 1 - - 1
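
One side effect worth knowing: a column that receives "-" is coerced to character. The sketch below rebuilds the data from the printed output; it assumes the dashes stood for NA values in CDE and FGH and that date is stored as a plain string, neither of which is confirmed by the original post.

```r
# Reconstructed input (assumption: NAs where the output shows "-")
output <- data.frame(date = c("2021-06-30", "2021-07-02"),
                     ABC  = c(4, 1),
                     CDE  = c(1, NA),
                     FGH  = c(6, NA),
                     SUM  = c(11, 1),
                     stringsAsFactors = FALSE)

output[is.na(output)] <- "-"

# Columns that contained NA are now character; the others are untouched
sapply(output, class)
```

If the numbers are still needed for calculations, it may be safer to keep the NA values and only substitute "-" when printing or exporting.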

