R Replacing Zeros in Dataframe with Next Non Zero Value

R replacing zeros in dataframe with next non zero value

Here's one way, using na.locf from the zoo package. It temporarily turns the zeros into NA and then, with fromLast = TRUE, fills each NA with the next non-NA value. Although it takes a detour through NA, the code is nice and painless.

library(zoo)
na.locf(with(df, ifelse(b == 0, NA_real_, b)), fromLast = TRUE)
# [1] 1 -1 -1 -1 -1 -1 -1 1 1 1

An alternative to this, and one that might be faster than ifelse on long vectors, is

na.locf(with(df, { is.na(b) <- b == 0; b }), fromLast = TRUE)
# [1] 1 -1 -1 -1 -1 -1 -1 1 1 1
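If you'd rather avoid the zoo dependency, the same "next non-zero value" lookup can be sketched in base R with findInterval(). This is a minimal sketch, not the answer's method: next_nonzero() is a hypothetical helper name, and the example vector is made up because the question's df is not shown here. One difference: na.locf() removes the NAs it cannot fill by default (its na.rm argument defaults to TRUE), whereas this sketch returns NA for trailing zeros so the result keeps the input's length.

```r
# Hypothetical helper: replace each zero with the next non-zero value
next_nonzero <- function(x) {
  nz <- which(x != 0)  # positions of the non-zero values, in increasing order
  # findInterval(i - 1, nz) counts the non-zeros strictly before position i,
  # so adding 1 indexes the first non-zero at or after position i
  idx <- nz[findInterval(seq_along(x) - 1, nz) + 1]
  x[idx]               # trailing zeros have no next non-zero and become NA
}

b <- c(0, 2, 0, 0, -1, 3, 0)  # made-up example vector
next_nonzero(b)
# [1]  2  2 -1 -1 -1  3 NA
```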

Replace 0 values in dataframe with nearest non-0 value

You could do it like this. First, define a function that does the trick:

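zero_replacer <- function(x) {
  # Find the positions of zero values
  is_zero <- which(x == 0)
  # Number the stretches of consecutive zeros: a new stretch starts
  # wherever a zero position does not directly follow the previous one
  zero_stretches <- cumsum(!dplyr::lag(is_zero, default = is_zero[1] - 1) + 1 == is_zero) + 1
  # Replace each stretch with the value just before it
  for (i in unique(zero_stretches)) {
    idx_stretch <- zero_stretches == i
    idx_zero <- is_zero[idx_stretch]
    idx_val <- idx_zero[1] - 1
    # A stretch at the very start has no previous value, so leave it as is
    if (idx_val >= 1) x[idx_zero] <- x[idx_val]
  }
  x
}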

Then you can use this function in mutate like this:

library(dplyr)  # provides %>% and mutate()

tibble::tibble( # Generate some dummy data
  a = c(1:5, 0, 0, 0, 1:4, rep(0, times = 10))
) %>%
  dplyr::mutate(
    # Replace each zero with the previous non-zero value
    a = zero_replacer(a)
  )

Output:

# A tibble: 22 x 1
a
<dbl>
1 1
2 2
3 3
4 4
5 5
6 5
7 5
8 5
9 1
10 2
11 3
12 4
13 4
14 4
15 4
16 4
17 4
18 4
19 4
20 4
21 4
22 4
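The loop above can also be replaced by a vectorized base R sketch built on cumsum(). prev_nonzero() is a hypothetical name, and as an assumption about the desired behaviour, leading zeros (which have no previous non-zero value) are left as 0.

```r
# Hypothetical helper: replace each zero with the previous non-zero value
prev_nonzero <- function(x) {
  nz <- x != 0
  idx <- cumsum(nz)   # non-zeros seen up to and including each position
  keep <- idx > 0     # FALSE only for leading zeros
  out <- x
  # x[nz] lists the non-zero values; idx picks the latest one seen so far
  out[keep] <- x[nz][idx[keep]]
  out
}

a <- c(1:5, 0, 0, 0, 1:4, rep(0, times = 10))
prev_nonzero(a)
```

It should be usable in the same mutate() call in place of zero_replacer().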

Replacing zeroes with NA for values preceding non-zero

There are three issues. First, writing:

df <- cbind(stock1,stock2,stock3,stock4)

doesn't create a data frame. It creates a matrix. This is an issue when you try to use lapply, which will operate over the columns of a data frame but over the elements of a matrix. Instead, you should write:

df <- data.frame(stock1,stock2,stock3,stock4)
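A quick way to see the matrix-versus-data-frame difference is to count how many times lapply calls its function. The toy objects m and d below are made up for illustration.

```r
m <- cbind(a = c(1, 2), b = c(3, 4))       # a matrix, like cbind(stock1, ...)
d <- data.frame(a = c(1, 2), b = c(3, 4))  # a data frame with the same values

length(lapply(m, identity))  # 4: lapply visits each matrix cell
length(lapply(d, identity))  # 2: lapply visits each column
```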

Second, the function you're using in lapply needs to return the modified vector. Otherwise, the return value will be something unexpected (in this case, the assignment will return a single NA, and the lapply will return a data frame of one row of NAs instead of the data frame you want).
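The return-value problem can be reproduced with two small functions. bad() and good() are hypothetical names; the point is that the value of a function is its last expression, and an assignment evaluates (invisibly) to its right-hand side.

```r
x <- c(0, 0, 0.02)

# The last expression is the assignment, so bad() returns the single NA
bad <- function(x) { x[1:2] <- NA }
# Returning x explicitly gives back the modified vector
good <- function(x) { x[1:2] <- NA; x }

bad(x)   # a single NA (invisible unless printed explicitly)
good(x)  # NA NA 0.02
```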

Third, you need to take care with 1:n when n can be zero (i.e., when the first stock quote is non-zero) because 1:0 gives the sequence c(1,0) instead of an empty sequence. (This is arguably one of R's stupidest features.)
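A standard base R alternative to guarding with if (n > 0) is seq_len(n), which returns an empty sequence when n is 0. This sketch only illustrates the difference; the vector x is made up.

```r
n <- 0  # e.g. the first value is already non-zero

1:n          # c(1, 0): counts down, so it is not empty
seq_len(n)   # integer(0): empty, as intended

x <- c(0.01, -0.02, 0.01)
y <- x
y[seq_len(n)] <- NA   # assigns nothing when n is 0
identical(x, y)       # TRUE
```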

Therefore, the following will give you what you want:

stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(stock1,stock2,stock3,stock4)

as.data.frame(lapply(df, function(x) {
  n <- min(which(x != 0)) - 1
  if (n > 0)
    x[1:n] <- NA
  x
}))

The output is as expected:

  stock1 stock2 stock3 stock4
1 0.01 NA NA NA
2 -0.02 NA NA -0.02
3 0.01 0.02 0.02 0.01
4 0.05 0.04 0.00 0.00
5 0.04 -0.03 -0.01 0.00
6 -0.02 0.02 0.03 -0.02

Update: As @Daniel_Fischer notes, there's a clever trick to avoid the 1:0 problem. You can instead write:

as.data.frame(lapply(df, function(x) {
  n <- min(which(x != 0)) - 1
  x[0:n] <- NA  # use 0:n instead of 1:n
  x
}))

This takes advantage of the fact that R ignores zeros in this type of indexing operation, so:

x[0:0] <- NA    # same as x[0] <- NA and does nothing
x[0:1] <- NA # same as x[1] <- NA
x[0:2] <- NA # same as x[1:2] <- NA, etc.
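A vectorized variant (a sketch, not from the original answer) sidesteps n entirely: cumsum(x != 0) stays at 0 until the first non-zero value, so those positions are exactly the leading zeros. Unlike min(which(x != 0)), it also copes with an all-zero column (min() of an empty vector returns Inf with a warning) by turning the whole column into NA. leading_zeros_to_na() is a hypothetical name.

```r
# Hypothetical helper: set the zeros before the first non-zero value to NA
leading_zeros_to_na <- function(x) {
  # cumsum(x != 0) is 0 exactly at the positions before the first non-zero
  x[cumsum(x != 0) == 0] <- NA
  x
}

stock1 <- c(0.01, -0.02, 0.01, 0.05, 0.04, -0.02)
stock2 <- c(0, 0, 0.02, 0.04, -0.03, 0.02)
stock3 <- c(0, 0, 0.02, 0, -0.01, 0.03)
stock4 <- c(0, -0.02, 0.01, 0, 0, -0.02)
df <- data.frame(stock1, stock2, stock3, stock4)

res <- as.data.frame(lapply(df, leading_zeros_to_na))
res
leading_zeros_to_na(c(0, 0, 0))  # NA NA NA
```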

Replace 0s with next non-zero value in column R

output=input[!!input][cumsum(!!input)+1]
#[1] 1 1 1 1 2 2 2 2 2 2 2 1 1 2 2 2 NA

We take advantage of how R coerces numbers to logicals: as.logical(0:2) returns FALSE TRUE TRUE, so zeros become FALSE and all other numbers become TRUE. Applying the negation ! twice (!!input) coerces input to logical and then undoes the negation, so it is the same as as.logical(input); it's just a trick to save a few keystrokes.

We use that logical index to pick out the non-zero values with input[!!input]. cumsum(!!input) counts how many non-zero values have been seen up to and including each position, so cumsum(!!input)+1 is, for each position, the index within input[!!input] of the first non-zero value after that position; positions after the last non-zero value get NA. It helps to run each part separately.
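Running the pieces one at a time on a small, made-up vector shows what each step does. Because cumsum(!!input) already counts a non-zero value at its own position, + 1 sends every position, non-zero ones included, to the first non-zero value after it. If non-zero values should stay unchanged, one possible tweak (an assumption about the desired output, not part of the original answer) is to add !input instead of 1, which adds 1 only at the zero positions.

```r
input <- c(0, 3, 0, 0, 5, 0)     # made-up example vector
nz <- !!input                    # same as as.logical(input)
vals <- input[nz]                # the non-zero values: 3 5
pos <- cumsum(nz)                # non-zeros seen so far: 0 1 1 1 2 2
original_style <- vals[pos + 1]  # 3 5 5 5 NA NA
keep_nonzero <- vals[pos + !nz]  # 3 3 5 5 5 NA
```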

Replace 0 values by using previous value in R

You can use the dplyr and tidyr packages.

library(dplyr)
library(tidyr)

df <- data.frame(var = c(1,2,3,0,7,8,0,0,9,8,9,0,0,0,4,7,2,4))
df <- df %>%
dplyr::mutate(var = ifelse(var == 0, NA, var)) %>%
tidyr::fill(var, .direction = c("down"))
df
var
1 1
2 2
3 3
4 3
5 7
6 8
7 8
8 8
9 9
10 8
11 9
12 9
13 9
14 9
15 4
16 7
17 2
18 4
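Without dplyr and tidyr, the same previous-value fill can be sketched in base R with Reduce(accumulate = TRUE), which carries the last non-zero value forward one element at a time. This is an alternative technique, not the answer's; as an assumption, a leading zero (which has no previous value) is simply kept as 0.

```r
var <- c(1, 2, 3, 0, 7, 8, 0, 0, 9, 8, 9, 0, 0, 0, 4, 7, 2, 4)

# Keep the running value when the current element is 0, otherwise take the element
filled <- Reduce(function(prev, cur) if (cur == 0) prev else cur,
                 var, accumulate = TRUE)
filled
```

Because Reduce() calls an R function once per element, it is likely slower than a vectorized fill on long vectors.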


Replace 0's with previous non-zero value per ID (lag)

Here's a tidyverse approach:

library(tidyverse)
df %>%
group_by(ID) %>%
mutate(x = replace(Var1, cumsum(Var1 !=0) > 0 & Var1 == 0, NA)) %>%
fill(x)
# # A tibble: 11 x 4
# # Groups: ID [2]
# ID Var1 res x
# <dbl> <dbl> <dbl> <dbl>
# 1 1. 0. 0. 0.
# 2 1. 10. 10. 10.
# 3 1. 30. 30. 30.
# 4 1. 0. 30. 30.
# 5 1. 0. 30. 30.
# 6 1. 50. 50. 50.
# 7 1. 80. 80. 80.
# 8 2. 0. 0. 0.
# 9 2. 0. 0. 0.
# 10 2. 57. 57. 57.
# 11 2. 0. 57. 57.

In the mutate step, we replace 0's with NA, except for the zeros at the beginning of each ID group: those have no earlier value in the group to fill them with, so they are left as 0.


If you have multiple columns to adjust, you can use:

df %>%
  group_by(ID) %>%
  # across() supersedes mutate_at(); funs() is deprecated in current dplyr
  mutate(across(starts_with("Var"),
                ~ replace(.x, cumsum(.x != 0) > 0 & .x == 0, NA))) %>%
  fill(starts_with("Var"))

where df could be:

df <- data.frame(ID = c(1,1,1,1,1,1,1,2,2,2,2),
Var1 = c(0,10, 30, 0, 0,50,80,0, 0, 57, 0),
Var2 = c(4,0, 30, 0, 0,50,0,16, 0, 57, 0))
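A base R counterpart of the grouped version can be sketched with ave(), which applies a function within each ID group and returns a vector aligned with the original rows. fill_prev_nonzero() is a hypothetical helper; like the tidyverse code, it leaves zeros at the start of each group untouched.

```r
df <- data.frame(ID = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
                 Var1 = c(0, 10, 30, 0, 0, 50, 80, 0, 0, 57, 0),
                 Var2 = c(4, 0, 30, 0, 0, 50, 0, 16, 0, 57, 0))

# Hypothetical helper: replace zeros with the previous non-zero value,
# leaving leading zeros (no previous value) as they are
fill_prev_nonzero <- function(v) {
  idx <- cumsum(v != 0)  # non-zeros seen up to each position
  out <- v
  out[idx > 0] <- v[v != 0][idx[idx > 0]]
  out
}

cols <- c("Var1", "Var2")
# ave() runs fill_prev_nonzero() separately within each ID
df[cols] <- lapply(df[cols], function(v) ave(v, df$ID, FUN = fill_prev_nonzero))
df
```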

How to replace zero values in between some non-zero values in pandas?

Replace 0 with missing values, then forward fill and back fill with the limit parameter, and keep the forward-filled value only in rows where both the forward-filled and the back-filled values are not missing:

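import numpy as np

s = df[0].replace(0, np.nan)   # treat zeros as missing
ff = s.ffill(limit=1)          # forward fill at most one step
bf = s.bfill(limit=1)          # back fill at most one step

# keep the forward-filled value only where both fills succeeded
df['new'] = np.where(ff.notna() & bf.notna(), ff, 0).astype(int)
print(df)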
0 new
0 5 5
1 5 5
2 0 5
3 5 5
4 5 5
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
10 1 1
11 1 1
12 0 1
13 1 1
14 1 1
15 0 1
16 3 3
17 3 3
18 3 3
19 0 0
20 0 0
21 0 0
22 0 0
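For readers working in R like the rest of this page, the same rule can be sketched with shifted copies of the vector. With limit=1, both fills succeed at a zero only when it sits alone between two non-zero values, so such a zero takes the previous value and runs of two or more zeros stay 0. fill_isolated_zeros() is a hypothetical name, and the input is assumed to contain no NA values.

```r
# Hypothetical helper: fill zeros that sit alone between two non-zero values
fill_isolated_zeros <- function(x) {
  n <- length(x)
  prev <- c(NA, x[-n])  # value just before each position
  nxt  <- c(x[-1], NA)  # value just after each position
  isolated <- x == 0 & !is.na(prev) & prev != 0 & !is.na(nxt) & nxt != 0
  x[isolated] <- prev[isolated]
  x
}

x <- c(5, 5, 0, 5, 5, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 3, 3, 3, 0, 0, 0, 0)
fill_isolated_zeros(x)
```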

