How do I replace NA values with zeros in an R dataframe?
See my comment on @gsk3's answer. A simple example:
> m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
> d <- as.data.frame(m)
> d
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   4  3 NA  3  7  6  6 10  6   5
2   9  8  9  5 10 NA  2  1  7   2
3   1  1  6  3  6 NA  1  4  1   6
4  NA  4 NA  7 10  2 NA  4  1   8
5   1  2  4 NA  2  6  2  6  7   4
6  NA  3 NA NA 10  2  1 10  8   4
7   4  4  9 10  9  8  9  4 10  NA
8   5  8  3  2  1  4  5  9  4   7
9   3  9 10  1  9  9 10  5  3   3
10  4  2  2  5 NA  9  7  2  5   5
> d[is.na(d)] <- 0
> d
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   4  3  0  3  7  6  6 10  6   5
2   9  8  9  5 10  0  2  1  7   2
3   1  1  6  3  6  0  1  4  1   6
4   0  4  0  7 10  2  0  4  1   8
5   1  2  4  0  2  6  2  6  7   4
6   0  3  0  0 10  2  1 10  8   4
7   4  4  9 10  9  8  9  4 10   0
8   5  8  3  2  1  4  5  9  4   7
9   3  9 10  1  9  9 10  5  3   3
10  4  2  2  5  0  9  7  2  5   5
There's no need to apply apply. =)
EDIT
You should also take a look at the norm package. It has a lot of nice features for missing data analysis. =)
Replace NA with 0 in a data frame column
Since nobody so far felt fit to point out why what you're trying doesn't work:
1. NA == NA doesn't return TRUE, it returns NA (since comparing to undefined values should yield an undefined result).
2. You're trying to call apply on an atomic vector. You can't use apply to loop over the elements in a column.
3. Your subscripts are off - you're trying to give two indices into a$x, which is just the column (an atomic vector).
I'd fix up 3. to get to
a$x[is.na(a$x)] <- 0
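The first and third points are easy to check at the console. A minimal sketch, assuming a small made-up data frame a with a numeric column x (the values are illustrative only, not the asker's data):

```r
# Comparing with == never detects NA: the result is itself NA.
NA == NA            # NA, not TRUE
is.na(NA)           # TRUE

# Hypothetical data frame standing in for the asker's `a`.
a <- data.frame(x = c(1, NA, 3, NA))

# a$x is an atomic vector, so it takes a single index.
a$x[is.na(a$x)] <- 0
a$x                 # 1 0 3 0
```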
replacing NA's with 0's in R dataframe
dataset <- matrix(sample(c(NA, 1:5), 25, replace = TRUE), 5)
data <- as.data.frame(dataset)
dataset
     [,1] [,2] [,3] [,4] [,5]
[1,]    2    3    5    5    4
[2,]    2    4    3    2    4
[3,]    2   NA   NA   NA    2
[4,]    2    3   NA    5    5
[5,]    2    3    2    2    3
data[is.na(data)] <- 0
How do I replace NA values with zeros in R?
For practically any data structure X containing numerics, use
X[is.na(X)] <- 0
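As a quick illustration of "practically any data structure", here is the same assignment applied to a vector, a matrix, and an all-numeric data frame. The object names and values are made up for the example:

```r
# A numeric vector.
v <- c(1, NA, 3)
v[is.na(v)] <- 0

# A numeric matrix: is.na() returns a logical matrix of the same shape.
m <- matrix(c(NA, 2, 3, NA), nrow = 2)
m[is.na(m)] <- 0

# An all-numeric data frame.
df <- data.frame(a = c(NA, 1), b = c(2, NA))
df[is.na(df)] <- 0
```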
Your question seems slightly discombobulated though - you have indicated that you mean <NA>, not NA, without explaining what type <NA> is.
If it is the string "<NA>" you mean, then
X[X=="<NA>"] <- "0"
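If you have mixed data types in your data frame, restrict the replacement to the columns of the matching type. Note that is.character(X) and is.numeric(X) test the object as a whole and return a single FALSE for a data frame, so check each column instead:
chr <- sapply(X, is.character)
X[chr][X[chr] == "<NA>"] <- "0"
which is more useful in the numeric case:
num <- sapply(X, is.numeric)
X[num][is.na(X[num])] <- 0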
This is a very common idiom for dealing with missing data in R, although you should also look at the parameter na.rm = TRUE, which many functions such as mean, sum, &c. will accept.
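The two approaches are not interchangeable: na.rm = TRUE drops the missing values, while replacing them with zero keeps them in the calculation as zeros. A small sketch with a made-up vector:

```r
x <- c(2, NA, 4)

mean(x)                 # NA, because one value is missing
mean(x, na.rm = TRUE)   # 3, computed from 2 and 4 only
sum(x, na.rm = TRUE)    # 6

# Replacing NA with 0 keeps three observations, so the mean changes.
x0 <- x
x0[is.na(x0)] <- 0
mean(x0)                # 2
```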
This strategy will fail for a factor, because you cannot add new factor levels by assigning to the value of a factor. I haven't used read.spss, but looking at the documentation, I suggest you add the use.value.labels = FALSE argument to your call, to avoid creating factors in the first place.
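If the factor has to stay a factor, one option is to register the new level before assigning to it. This is a sketch with a made-up factor f, not something taken from the read.spss documentation:

```r
f <- factor(c("a", NA, "c"))

# Assigning a value that is not an existing level yields NA plus a warning,
# so add the level "0" first.
levels(f) <- c(levels(f), "0")
f[is.na(f)] <- "0"

as.character(f)   # "a" "0" "c"
```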
In your specific case, your entire data frame is of the same type (factor). This means it's safe to convert to a character matrix:
> class(mydata[[1]])
"factor"
> mydataM <- as.matrix(mydata)
> mode(mydataM)
"character"
Now you can replace the NA values. After the conversion they are real NA values (they only print as <NA>), so test for them with is.na:
mydataM[is.na(mydataM)] <- "0"
In the more general case where you have unwanted factor columns mixed in with other types, you need to do something a little more complex.
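# convert every factor column to character and leave the other columns unchanged
myDataM <- as.data.frame(lapply(mydata,
  function(x) if (is.factor(x)) as.character(x) else x),
  stringsAsFactors = FALSE)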
Set NA to 0 in R
You can just use the output of is.na to replace directly with subsetting:
bothbeams.data[is.na(bothbeams.data)] <- 0
Or with a reproducible example:
dfr <- data.frame(x=c(1:3,NA),y=c(NA,4:6))
dfr[is.na(dfr)] <- 0
dfr
  x y
1 1 0
2 2 4
3 3 5
4 0 6
However, be careful using this method on a data frame containing factors that also have missing values:
> # stringsAsFactors = TRUE makes y a factor (the default before R 4.0.0)
> d <- data.frame(x = c(NA,2,3),y = c("a",NA,"c"), stringsAsFactors = TRUE)
> d[is.na(d)] <- 0
Warning message:
In `[<-.factor`(`*tmp*`, thisvar, value = 0) :
invalid factor level, NA generated
It "works":
> d
  x    y
1 0    a
2 2 <NA>
3 3    c
...but you likely will want to specifically alter only the numeric columns in this case, rather than the whole data frame. See, e.g., the answer below using dplyr::mutate_if.
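For reference, altering only the numeric columns can also be sketched in base R. This is an alternative to dplyr::mutate_if, not that function itself, and it assumes the factor column should keep its missing values:

```r
d <- data.frame(x = c(NA, 2, 3), y = c("a", NA, "c"), stringsAsFactors = TRUE)

# Logical vector marking the numeric columns.
num <- vapply(d, is.numeric, logical(1))

# Replace NA only inside those columns; the factor column is left alone.
d[num] <- lapply(d[num], function(col) {
  col[is.na(col)] <- 0
  col
})
```

No warning is raised here, because the factor column y is never assigned to.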
Replace NAs with zeros using ifelse
There is a syntax issue in the code: is.na(df$violence) == "ignore" compares the logical vector returned by is.na with "ignore". Instead, if the description is as stated in the OP's post (the category ignore should be coded 1, but other categories including NAs should be coded 0), use
df$new_variable <- +(df$violence %in% "ignore")
Here, we check for values that are "ignore" with %in%, which returns a logical vector - TRUE only for "ignore" and FALSE for all others including NA (== returns NA for NA values) - and then convert to binary with + (TRUE -> 1 and FALSE -> 0).
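The difference between == and %in% shows up as soon as the column contains an NA. A short sketch, using made-up values in place of the OP's df$violence:

```r
# Hypothetical data standing in for the OP's data frame.
df <- data.frame(violence = c("ignore", "report", NA, "ignore"))

df$violence == "ignore"      # TRUE FALSE NA TRUE
df$violence %in% "ignore"    # TRUE FALSE FALSE TRUE

# Unary + turns the logical vector into 1/0.
df$new_variable <- +(df$violence %in% "ignore")
df$new_variable              # 1 0 0 1
```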
Replace NA values by - in R
In base R, we may do
output[is.na(output)] <- "-"
-output
> output
        date ABC CDE FGH SUM
1 2021-06-30   4   1   6  11
2 2021-07-02   1   -   -   1
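One side effect worth knowing: "-" is a string, so any numeric column that receives it becomes a character column and can no longer be used in arithmetic without converting it back. A minimal sketch with a made-up two-column data frame (not the original output object):

```r
# Hypothetical data loosely modelled on the printed output above.
output <- data.frame(ABC = c(4, 1), CDE = c(1, NA))

class(output$CDE)    # "numeric"
output[is.na(output)] <- "-"
class(output$CDE)    # "character"
```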