R replacing zeros in dataframe with next non zero value
Here's one way with na.locf from zoo. Although this method does change some values to NA in the process, the code is nice and painless.
library(zoo)
na.locf(with(df, ifelse(b == 0, NA_real_, b)), fromLast = TRUE)
# [1] 1 -1 -1 -1 -1 -1 -1 1 1 1
An alternative to this, and one that might be faster than ifelse on long vectors, is
na.locf(with(df, { is.na(b) <- b == 0; b }), fromLast = TRUE)
# [1] 1 -1 -1 -1 -1 -1 -1 1 1 1
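To make the backward fill explicit, here is a minimal base R sketch that does not use zoo. The helper name fill_zeros_from_next and the sample vector are made up for illustration, and it is not how na.locf is implemented. One difference to be aware of: this sketch leaves NA where no later non-zero value exists, whereas na.locf with its default na.rm = TRUE drops such values.

```r
# Hypothetical helper (not part of zoo): replace each zero with the next
# non-zero value, using base R only.
fill_zeros_from_next <- function(b) {
  nz <- which(b != 0)  # positions of the non-zero values
  # For position i, count the non-zero positions strictly before i, then
  # add 1 to get the index (within nz) of the first non-zero at or after i.
  k <- findInterval(seq_along(b) - 1, nz) + 1
  b[nz[k]]             # NA where no later non-zero value exists
}

b <- c(0, 1, 0, 0, -1, 0)  # made-up data
fill_zeros_from_next(b)
# [1]  1  1 -1 -1 -1 NA
```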
Replace 0 values in dataframe with nearest non-0 value
One approach is to first define a function that replaces each zero with the previous non-zero value:
zero_replacer <- function(x) {
  # Find the positions of the zero values
  is_zero <- which(x == 0)
  # Label each stretch of consecutive zeros with its own number
  zero_stretches <- cumsum(!dplyr::lag(is_zero, default = is_zero[1] - 1) + 1 == is_zero) + 1
  # Replace each stretch with the value just before it
  for (i in unique(zero_stretches)) {
    idx_stretch <- zero_stretches == i
    idx_zero <- is_zero[idx_stretch]
    idx_val <- idx_zero[1] - 1
    # Leading zeros have no previous value, so leave them unchanged
    if (idx_val >= 1) x[idx_zero] <- x[idx_val]
  }
  x
}
Then you can use this function in mutate like this:
library(dplyr)  # provides %>%

tibble::tibble(
  # Generate some dummy data
  a = c(1:5, 0, 0, 0, 1:4, rep(0, times = 10))
) %>%
  dplyr::mutate(
    # Replace each zero with the previous non-zero value
    a = zero_replacer(a)
  )
Output:
# A tibble: 22 x 1
a
<dbl>
1 1
2 2
3 3
4 4
5 5
6 5
7 5
8 5
9 1
10 2
11 3
12 4
13 4
14 4
15 4
16 4
17 4
18 4
19 4
20 4
21 4
22 4
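For comparison, the same forward fill can be sketched without a loop in base R. This is an alternative technique, not the answer's method, and the name fill_zeros_with_prev is made up for illustration. It indexes the non-zero values by a running count of how many have been seen so far, and leaves leading zeros untouched.

```r
# Hypothetical vectorised alternative: replace each zero with the
# previous non-zero value, leaving leading zeros as they are.
fill_zeros_with_prev <- function(x) {
  is_nonzero <- x != 0
  k <- cumsum(is_nonzero)  # number of non-zero values seen so far
  has_prev <- k > 0        # FALSE only for leading zeros
  out <- x
  out[has_prev] <- x[is_nonzero][k[has_prev]]
  out
}

fill_zeros_with_prev(c(0, 0, 3, 0, 5, 0))  # made-up data
# [1] 0 0 3 3 5 5
```

Applied to the dummy column a from the example, it should give the same values as the tibble output shown.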
Replacing zeroes with NA for values preceding non-zero
There are three issues. First, writing:
df <- cbind(stock1,stock2,stock3,stock4)
doesn't create a data frame. It creates a matrix. This is an issue when you try to use lapply, which will operate over the columns of a data frame but over the elements of a matrix. Instead, you should write:
df <- data.frame(stock1,stock2,stock3,stock4)
Second, the function you're using in lapply needs to return the modified vector. Otherwise, the return value will be something unexpected (in this case, the assignment returns a single NA for each column, so lapply returns a list of single NAs and you end up with a one-row data frame of NAs instead of the data frame you want).
Third, you need to take care with 1:n when n can be zero (i.e., when the first stock quote is non-zero) because 1:0 gives the sequence c(1,0) instead of an empty sequence. (This is arguably one of R's stupidest features.)
Therefore, the following will give you what you want:
stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(stock1,stock2,stock3,stock4)
as.data.frame(lapply(df, function(x) {
  n <- min(which(x != 0)) - 1
  if (n > 0)
    x[1:n] <- NA
  x
}))
The output is as expected:
stock1 stock2 stock3 stock4
1 0.01 NA NA NA
2 -0.02 NA NA -0.02
3 0.01 0.02 0.02 0.01
4 0.05 0.04 0.00 0.00
5 0.04 -0.03 -0.01 0.00
6 -0.02 0.02 0.03 -0.02
Update: As @Daniel_Fischer notes, there's a clever trick to avoid the 1:0 problem. You can instead write:
as.data.frame(lapply(df, function(x) {
  n <- min(which(x != 0)) - 1
  x[0:n] <- NA # use 0:n instead of 1:n
  x
}))
This takes advantage of the fact that R ignores zeros in this type of indexing operation, so:
x[0:0] <- NA # same as x[0] <- NA and does nothing
x[0:1] <- NA # same as x[1] <- NA
x[0:2] <- NA # same as x[1:2] <- NA, etc.
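Another way to sidestep the 1:0 problem, not mentioned in the answer, is base R's seq_len(), which returns an empty sequence when its argument is zero. A small sketch with made-up values:

```r
n <- 0
1:n         # counts down instead of being empty
# [1] 1 0
seq_len(n)  # an empty sequence
# integer(0)

x <- c(0.01, -0.02, 0.01)  # made-up data
x[seq_len(n)] <- NA        # does nothing when n is 0
x
# [1]  0.01 -0.02  0.01
```

With seq_len(n) in place of 1:n, the if (n > 0) guard in the first version would no longer be needed for this case.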
Replace 0s with next non-zero value in column R
output=input[!!input][cumsum(!!input)+1]
#[1] 1 1 1 1 2 2 2 2 2 2 2 1 1 2 2 2 NA
We take advantage of how R coerces numbers to logicals. as.logical(0:2) returns FALSE TRUE TRUE: zeros become FALSE and all other numbers are considered TRUE. Putting a double negation (!!) in front of input coerces it to logical in the same way. I could have written as.logical(input), it's just a trick to save a few keystrokes. So we use that logical index to subset the non-zero values with input[!!input]. Adding one to the cumulative sum of the logical index, cumsum(!!input)+1, gives for each position the index of the next non-zero value, and that index steps up at each change point. It helps to run each part separately.
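Running the parts one at a time on a small vector shows how they fit together. The input below is made up for illustration and is not the data from the question.

```r
input <- c(0, 0, 3, 0, 5, 0)  # made-up data

keep <- !!input               # same as as.logical(input)
keep
# [1] FALSE FALSE  TRUE FALSE  TRUE FALSE

nonzero <- input[keep]        # the non-zero values, in order
nonzero
# [1] 3 5

idx <- cumsum(keep) + 1       # index of the next non-zero value
idx
# [1] 1 1 2 2 3 3

nonzero[idx]
# [1]  3  3  5  5 NA NA
```

Note that a non-zero entry is itself replaced by the non-zero value that follows it (position 3 holds 3 but becomes 5), and positions with no later non-zero value become NA.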
Replace 0 values by using previous value in R
You can use the dplyr and tidyr packages.
library(dplyr)
library(tidyr)
df <- data.frame(var = c(1,2,3,0,7,8,0,0,9,8,9,0,0,0,4,7,2,4))
df <- df %>%
  dplyr::mutate(var = ifelse(var == 0, NA, var)) %>%
  tidyr::fill(var, .direction = "down")
df
var
1 1
2 2
3 3
4 3
5 7
6 8
7 8
8 8
9 9
10 8
11 9
12 9
13 9
14 9
15 4
16 7
17 2
18 4
Fill NA row with previous value iteratively in R
Replace 0's with previous non-zero value per ID (lag)
Here's a tidyverse approach:
library(tidyverse)
df %>%
  group_by(ID) %>%
  mutate(x = replace(Var1, cumsum(Var1 != 0) > 0 & Var1 == 0, NA)) %>%
  fill(x)
# # A tibble: 11 x 4
# # Groups: ID [2]
# ID Var1 res x
# <dbl> <dbl> <dbl> <dbl>
# 1 1. 0. 0. 0.
# 2 1. 10. 10. 10.
# 3 1. 30. 30. 30.
# 4 1. 0. 30. 30.
# 5 1. 0. 30. 30.
# 6 1. 50. 50. 50.
# 7 1. 80. 80. 80.
# 8 2. 0. 0. 0.
# 9 2. 0. 0. 0.
# 10 2. 57. 57. 57.
# 11 2. 0. 57. 57.
In the mutate step, we replace 0's with NA except for those at the beginning of each ID-run, because those have no earlier value within the group that fill could carry forward.
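The condition inside replace can be checked on its own in base R. The vector below is made up and stands in for the Var1 values of a single ID.

```r
Var1 <- c(0, 10, 30, 0, 0, 50)  # made-up values for one ID

# TRUE from the first non-zero value onwards
seen_nonzero <- cumsum(Var1 != 0) > 0
seen_nonzero
# [1] FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

# Only zeros that come after a non-zero value are set to NA
to_blank <- seen_nonzero & Var1 == 0
replace(Var1, to_blank, NA)
# [1]  0 10 30 NA NA 50
```

The leading zero survives because seen_nonzero is still FALSE at that position.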
If you have multiple columns to adjust, you can use:
df %>%
  group_by(ID) %>%
  # across() (dplyr >= 1.0.0) replaces the deprecated mutate_at()/funs() idiom
  mutate(across(starts_with("Var"),
                ~ replace(.x, cumsum(.x != 0) > 0 & .x == 0, NA))) %>%
  fill(starts_with("Var"))
where df could be:
df <- data.frame(ID = c(1,1,1,1,1,1,1,2,2,2,2),
Var1 = c(0,10, 30, 0, 0,50,80,0, 0, 57, 0),
Var2 = c(4,0, 30, 0, 0,50,0,16, 0, 57, 0))
How to replace zero values in between some non-zero values in pandas?
Replace 0 with missing values, then forward fill and back fill the missing values using the limit parameter. Keep the filled value only in rows where both the forward-filled and the back-filled values are non-missing:
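import numpy as np
# Treat zeros as missing so they can be filled
s = df[0].replace(0, np.nan)
# Fill at most one missing value in each direction
ff = s.ffill(limit=1)
bf = s.bfill(limit=1)
# Keep the forward-filled value only where both fills succeeded
df['new'] = np.where(ff.notna() & bf.notna(), ff, 0).astype(int)
print(df)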
0 new
0 5 5
1 5 5
2 0 5
3 5 5
4 5 5
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 1 1
11 1 1
12 0 1
13 1 1
14 1 1
15 0 1
16 3 3
17 3 3
18 3 3
19 0 0
20 0 0
21 0 0
22 0 0