Replace all 0 values with NA
Replacing all zeroes with NA:
df[df == 0] <- NA
Explanation
1. NULL is not what you should want to replace zeroes with. As ?'NULL' says, "NULL represents the null object in R", which is unique and, I guess, can be seen as the most uninformative and empty object. [1] It is then not so surprising that
data.frame(x = c(1, NULL, 2))
#   x
# 1 1
# 2 2
That is, R does not reserve any space for this null object. [2] Meanwhile, ?'NA' says:
NA is a logical constant of length 1 which contains a missing value indicator. NA can be coerced to any other vector type except raw.
Importantly, NA is of length 1, so R reserves some space for it. E.g.,
data.frame(x = c(1, NA, 2))
#    x
# 1  1
# 2 NA
# 3  2
Also, the data frame structure requires all columns to have the same number of elements, so there can be no "holes" (i.e., NULL values).
Now, you could replace zeroes by NULL in a data frame in the sense of completely removing every row that contains at least one zero. When using, e.g., var, cov, or cor, that is actually equivalent to first replacing zeroes with NA and setting use = "complete.obs". Typically, however, this is unsatisfactory as it leads to extra information loss.
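To make the equivalence concrete, here is a small sketch with made-up data (the data frame and the variable names are assumptions for illustration only). It computes the correlation once after turning zeroes into NA with use = "complete.obs", and once after dropping every row that contains a zero.

```r
# Made-up data: rows 2 and 5 each contain a zero
df <- data.frame(x = c(1, 0, 3, 4, 5, 6),
                 y = c(2, 4, 1, 8, 0, 7))

# Variant 1: zeroes become NA, incomplete rows are skipped by cor()
with_na <- df
with_na[with_na == 0] <- NA
cor_na <- cor(with_na, use = "complete.obs")

# Variant 2: rows containing at least one zero are removed up front
rows_kept <- df[rowSums(df == 0) == 0, ]
cor_dropped <- cor(rows_kept)

print(all.equal(cor_na, cor_dropped))
```

Both variants use only the four rows without zeroes, which is the information loss mentioned above.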
2. Instead of running some sort of loop, the solution relies on vectorization: df == 0 returns (try it) a logical matrix of the same size as df, with entries TRUE and FALSE. We are also allowed to pass this matrix to the subsetting operator [...] (see ?'['). Lastly, while the result of df[df == 0] is perfectly intuitive, it may seem strange that df[df == 0] <- NA gives the desired effect. Assignment into a subset does not work this way for every kind of object, but it does for data frames, which have their own replacement method; see ?'[<-.data.frame'.
[1] The empty set in set theory feels somehow related.
[2] Another similarity with set theory: the empty set is a subset of every set, but we do not reserve any space for it.
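The two steps described in point 2 can be looked at separately. The following is only a sketch on a tiny made-up data frame (the column names a and b are assumptions): it first stores the logical matrix, then uses it for the assignment.

```r
# Made-up example data with three zeroes
df <- data.frame(a = c(0, 1, 2), b = c(3, 0, 0))

# Step 1: the comparison is vectorized and yields a logical matrix
mask <- df == 0
print(mask)

# Step 2: the matrix selects the cells to overwrite
df[mask] <- NA
print(df)
```

Keeping the mask in its own variable is not required; it just makes it easier to inspect which cells are about to be changed.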
How to replace 0 or missing value with NA in R
You could just use replace, without any additional function or package:
data <- replace(data, data == 0, NA)
This assumes that data is your data frame.
Otherwise, you can simply insert the column name, e.g. if your data frame is df and the column name is data:
df$data <- replace(df$data, df$data == 0, NA)
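As a quick check of both variants, here is a sketch with made-up values (the columns a, b, and other are assumptions, not part of the question). Note that replace returns a modified copy, so the result has to be assigned somewhere.

```r
# Whole data frame: every zero becomes NA in the returned copy
data <- data.frame(a = c(0, 1, 2), b = c(3, 0, 5))
cleaned <- replace(data, data == 0, NA)
print(cleaned)

# Single column: only df$data is touched, df$other keeps its zero
df <- data.frame(data = c(0, 7, 0), other = c(0, 1, 2))
df$data <- replace(df$data, df$data == 0, NA)
print(df)
```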
Set 0 to NA in R
Is this what you need?
df <- data.frame(A=c(0, 3, "bla"), B=c("A", 0, "X"), C=c("x","B", 4)) #some fake data
df[df == 0] <- NA
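One thing worth knowing about this fake data: because each column mixes numbers and text, the columns are character vectors, so the comparison with 0 is done on strings. The sketch below (stringsAsFactors = FALSE is set explicitly so it behaves the same on older R versions) shows that "0" matches while a differently spelled zero such as "0.0" would not.

```r
df <- data.frame(A = c(0, 3, "bla"), B = c("A", 0, "X"), C = c("x", "B", 4),
                 stringsAsFactors = FALSE)
print(sapply(df, class))   # every column is character

# 0 is converted to the string "0" before comparing
zero_matches <- "0" == 0
other_spelling_matches <- "0.0" == 0

df[df == 0] <- NA
print(df)
```

If the zeros might be written as "0.0" or similar, converting the relevant columns to numeric first is probably the safer route.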
How do I replace NA values with zeros in an R dataframe?
See my comment on @gsk3's answer. A simple example:
> m <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
> d <- as.data.frame(m)
> d
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   4  3 NA  3  7  6  6 10  6   5
2   9  8  9  5 10 NA  2  1  7   2
3   1  1  6  3  6 NA  1  4  1   6
4  NA  4 NA  7 10  2 NA  4  1   8
5   1  2  4 NA  2  6  2  6  7   4
6  NA  3 NA NA 10  2  1 10  8   4
7   4  4  9 10  9  8  9  4 10  NA
8   5  8  3  2  1  4  5  9  4   7
9   3  9 10  1  9  9 10  5  3   3
10  4  2  2  5 NA  9  7  2  5   5
> d[is.na(d)] <- 0
> d
   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1   4  3  0  3  7  6  6 10  6   5
2   9  8  9  5 10  0  2  1  7   2
3   1  1  6  3  6  0  1  4  1   6
4   0  4  0  7 10  2  0  4  1   8
5   1  2  4  0  2  6  2  6  7   4
6   0  3  0  0 10  2  1 10  8   4
7   4  4  9 10  9  8  9  4 10   0
8   5  8  3  2  1  4  5  9  4   7
9   3  9 10  1  9  9 10  5  3   3
10  4  2  2  5  0  9  7  2  5   5
There's no need to use apply. =)
EDIT
You should also take a look at the norm package. It has a lot of nice features for missing data analysis. =)
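If only some columns should be filled, the same is.na idea can be limited to them. This is a sketch with made-up data; the column selection cols is an assumption for illustration.

```r
d <- data.frame(V1 = c(4, NA, 1), V2 = c(NA, 8, NA), V3 = c(NA, 2, 3))

# Fill NA with 0 in V1 and V2 only; V3 keeps its missing value
cols <- c("V1", "V2")
d[cols][is.na(d[cols])] <- 0
print(d)
```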
R - replacing the zero values in *just one row* of a matrix with NA
A[1, A[1,] == 0] <- NA
A
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
# [1,]   NA   NA    1    1   NA   NA   NA    1   NA
# [2,]    0    0    0    0    1    0    0    0    0
# [3,]    1    0    0    0    0    0    0    0    0
# [4,]    1    0    0    0    0    0    0    0    1
# [5,]    0    1    0    0    0    1    0    0    0
# [6,]    0    0    0    0    1    0    0    0    0
# [7,]    0    0    0    0    0    0    0    0    0
# [8,]    1    0    0    0    0    0    0    0    0
# [9,]    0    0    0    1    0    0    0    0    0
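For a version that can be run from scratch, here is a sketch on a small made-up 3 x 3 matrix (the values are assumptions). The inner expression A[1, ] == 0 picks the columns, and the outer index restricts the assignment to row 1.

```r
A <- matrix(c(0, 2, 0,
              0, 5, 6,
              7, 0, 9), nrow = 3, byrow = TRUE)

# Only zeros in the first row are replaced
A[1, A[1, ] == 0] <- NA
print(A)
```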
Python Pandas replace multiple columns zero to Nan
I think you need replace with a dict:
import numpy as np
cols = ["Weight","Height","BootSize","SuitSize","Type"]
# Replace both the string '0' and the number 0 with NaN in the selected columns
df2[cols] = df2[cols].replace({'0':np.nan, 0:np.nan})
Replace all zero columns with NA
With tidyverse, we can use if/else:
library(tidyverse)
df %>%
group_by(ID) %>%
mutate_all(list(~ if(all(.==0)) NA_integer_ else .))
#  ID       A1    B1
#  <fct> <dbl> <dbl>
#1 A        NA     1
#2 A        NA     0
#3 A        NA     1
#4 A        NA     0
#5 B         1    NA
#6 B         0    NA
#7 B         1    NA
Or without any if/else
df %>%
group_by(ID) %>%
mutate_all(~ NA^all(!.) * .)
Or using data.table
library(data.table)
setDT(df)[, lapply(.SD, function(x) replace(x, all(x == 0), NA)), ID]
Or using base R
by(df[-1], df$ID, FUN = function(x) x * (NA^ !colSums(!!x))[col(x)])
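The versions without if/else lean on a small arithmetic fact: NA^0 is 1, while NA^1 is NA. Multiplying a column by NA^all(x == 0) therefore either leaves it alone or blanks it out. Below is a sketch of that idea using base R's ave for the grouping; the data and the helper name blank_if_all_zero are made up for illustration.

```r
na_to_the_false <- NA^FALSE   # FALSE counts as 0, so this is 1
na_to_the_true  <- NA^TRUE    # TRUE counts as 1, so this is NA

df <- data.frame(ID = c("A", "A", "A", "B", "B"),
                 A1 = c(0, 0, 0, 1, 0),
                 B1 = c(1, 0, 1, 0, 0))

# Hypothetical helper: returns x unchanged, or all NA if x is all zeros
blank_if_all_zero <- function(x) x * NA^all(x == 0)

# ave() applies the helper within each ID group and keeps the row order
df$A1 <- ave(df$A1, df$ID, FUN = blank_if_all_zero)
df$B1 <- ave(df$B1, df$ID, FUN = blank_if_all_zero)
print(df)
```

The trick is compact but fairly cryptic, so the explicit if/else form may be easier to maintain.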
Replacing zeroes with NA for values preceding non-zero
There are three issues. First, writing:
df <- cbind(stock1,stock2,stock3,stock4)
doesn't create a data frame. It creates a matrix. This is an issue when you try to use lapply, which will operate over the columns of a data frame but over the elements of a matrix. Instead, you should write:
df <- data.frame(stock1,stock2,stock3,stock4)
Second, the function you're using in lapply needs to return the modified vector. Otherwise, the return value will be something unexpected (in this case, the assignment will return a single NA, and the result will be a data frame with one row of NAs instead of the data frame you want).
Third, you need to take care with 1:n when n can be zero (i.e., when the first stock quote is non-zero), because 1:0 gives the sequence c(1,0) instead of an empty sequence. (This is arguably one of R's stupidest features.)
Therefore, the following will give you what you want:
stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(stock1,stock2,stock3,stock4)
as.data.frame(lapply(df, function(x) {
  n <- min(which(x != 0)) - 1
  if (n > 0)
    x[1:n] <- NA
  x
}))
The output is as expected:
  stock1 stock2 stock3 stock4
1   0.01     NA     NA     NA
2  -0.02     NA     NA  -0.02
3   0.01   0.02   0.02   0.01
4   0.05   0.04   0.00   0.00
5   0.04  -0.03  -0.01   0.00
6  -0.02   0.02   0.03  -0.02
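To see the first issue for yourself, the following sketch compares what lapply iterates over in each case. The shortened vectors are made up, and identity is used only so that the number of results can be counted.

```r
stock1 <- c(0.01, -0.02)
stock2 <- c(0, 0.02)

m  <- cbind(stock1, stock2)        # a 2 x 2 numeric matrix
df <- data.frame(stock1, stock2)   # a data frame with 2 columns

n_matrix <- length(lapply(m, identity))    # 4: one result per matrix element
n_frame  <- length(lapply(df, identity))   # 2: one result per column
print(c(matrix = n_matrix, data_frame = n_frame))
```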
Update: As @Daniel_Fischer notes, there's a clever trick to avoid the 1:0 problem. You can instead write:
as.data.frame(lapply(df, function(x) {
  n <- min(which(x != 0)) - 1
  x[0:n] <- NA # use 0:n instead of 1:n
  x
}))
This takes advantage of the fact that R ignores zeros in this type of indexing operation, so:
x[0:0] <- NA # same as x[0] <- NA and does nothing
x[0:1] <- NA # same as x[1] <- NA
x[0:2] <- NA # same as x[1:2] <- NA, etc.
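Another way around the 1:0 problem, offered here as an alternative sketch rather than as part of the answer above, is seq_len(n), which is empty when n is 0. The helper name na_leading_zeros is made up, and the function assumes the vector contains at least one non-zero value.

```r
countdown <- 1:0        # counts down: 1, 0
empty <- seq_len(0)     # an empty integer vector

# Hypothetical helper: set the zeros before the first non-zero value to NA.
# Assumes x has at least one non-zero value.
na_leading_zeros <- function(x) {
  n <- min(which(x != 0)) - 1
  x[seq_len(n)] <- NA   # does nothing when n is 0
  x
}

stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
r1 <- na_leading_zeros(stock1)
r3 <- na_leading_zeros(stock3)
print(r1)
print(r3)
```

It could be dropped into the lapply call in place of the anonymous function, e.g. as.data.frame(lapply(df, na_leading_zeros)).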