Replace Missing Value with Previous Value

Replace missing values in pandas with the previous value if not NaN

I found a way to achieve this using the pd.merge_asof() function. If it doesn't find the key value to merge on, it gives you the previous one. Sorting on the key is crucial, though.

It works just like Excel's LOOKUP (not VLOOKUP or HLOOKUP, i.e. without the V or the H).

Thanks everyone!
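The answer above gives no code, so here is a minimal sketch of the idea (the frames and column names are made up for illustration):

```python
import pandas as pd

# Both frames must be sorted on the merge key.
left = pd.DataFrame({"key": [1, 5, 10], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [1, 2, 3, 6, 7], "right_val": [10, 20, 30, 60, 70]})

# For each row of `left`, merge_asof takes the last row of `right`
# whose key is <= the left key -- i.e. the "previous" match,
# like Excel's LOOKUP.
merged = pd.merge_asof(left, right, on="key")
print(merged["right_val"].tolist())  # [10, 30, 70]
```

Key 5 has no exact match in `right`, so it picks up the value for key 3, the closest previous key.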

Replace missing values if previous and next values are consistent

  1. Pivot longer to make use of tidyr::fill().
  2. Use fill() to create fill_down and fill_up columns, which will include the previous and next non-missing values, respectively.
  3. If previous non-missing == next non-missing, use that value; otherwise keep value as is. (This will also keep non-missing values as is, because in this case previous non-missing will always == next non-missing.)
  4. Pivot back to original format.
library(tidyverse)

df_filled <- df %>%
  pivot_longer(!ID) %>%
  mutate(
    fill_down = value,
    fill_up = value
  ) %>%
  group_by(ID) %>%
  fill(fill_down) %>%
  fill(fill_up, .direction = "up") %>%
  mutate(value = if_else(fill_down == fill_up, fill_down, value)) %>%
  ungroup() %>%
  pivot_wider(id_cols = ID)

df_filled
# # A tibble: 5 x 6
#      ID Var1  Var2  Var3  Var4  Var5
#   <dbl> <chr> <chr> <chr> <chr> <chr>
# 1     1 A     A     A     A     A
# 2     2 B     C     NA    NA    B
# 3     3 A     A     A     A     A
# 4     4 A     B     B     B     B
# 5     5 C     NA    B     B     B

Replace missing values with previous values in Julia Data Frame

This is the way to do it using Impute.jl:

julia> using Impute, DataFrames

julia> df = DataFrame(dt1=[0.2, missing, missing, 1, missing, 5, 6],
                      dt2=[0.3, missing, missing, 3, missing, 5, 6])
7×2 DataFrame
 Row │ dt1        dt2
     │ Float64?   Float64?
─────┼──────────────────────
   1 │       0.2        0.3
   2 │ missing    missing
   3 │ missing    missing
   4 │       1.0        3.0
   5 │ missing    missing
   6 │       5.0        5.0
   7 │       6.0        6.0

julia> transform(df, names(df) .=> Impute.locf, renamecols=false)
7×2 DataFrame
 Row │ dt1       dt2
     │ Float64?  Float64?
─────┼────────────────────
   1 │      0.2       0.3
   2 │      0.2       0.3
   3 │      0.2       0.3
   4 │      1.0       3.0
   5 │      1.0       3.0
   6 │      5.0       5.0
   7 │      6.0       6.0

Replace value with previous row value

Does this work:

library(dplyr)
library(tidyr)
df %>%
  mutate(DSWP10 = as.numeric(na_if(DSWP10, '.'))) %>%
  fill(DSWP10, .direction = 'up')
# A tibble: 7 x 2
  Date       DSWP10
  <chr>       <dbl>
1 07/01/2015   2.1
2 06/01/2015   1.99
3 05/01/2015   1.99
4 04/01/2015   1.99
5 03/01/2015   1.98
6 02/01/2015   1.95
7 01/01/2015   1.95

How to replace NaNs by preceding or next values in pandas DataFrame?

You could use the fillna method on the DataFrame and specify the method as ffill (forward fill):

>>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
>>> df.fillna(method='ffill')
     0    1    2
0  1.0  2.0  3.0
1  4.0  2.0  3.0
2  4.0  2.0  9.0

This method...

propagate[s] last valid observation forward to next valid

To go the opposite way, there's also a bfill method.

This method doesn't modify the DataFrame in place; you'll need to rebind the returned DataFrame to a variable, or else specify inplace=True:

df.fillna(method='ffill', inplace=True)
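Note that in recent pandas (2.1+), fillna(method=...) is deprecated; the dedicated ffill/bfill methods do the same thing:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# Forward fill: propagate the last valid observation downward.
filled = df.ffill()

# Backward fill: propagate the next valid observation upward.
back_filled = df.bfill()
```

Forward filling leaves a leading NaN untouched (there is nothing above it to propagate), just as backward filling leaves a trailing NaN untouched.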

Replace timeseries missing values with previous years value

The fix is to use pd.isna(x['traffic_volume']) instead of x['traffic_volume'] == np.NaN for the if condition in the lambda. The original line ran without error but never imputed anything because NaN compares unequal to every value, including itself, so the == np.NaN test is always False.
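A quick demonstration of why an equality check against NaN silently does nothing, and what to use instead:

```python
import numpy as np
import pandas as pd

# NaN never compares equal to anything, not even itself,
# so `x == np.nan` is always False.
print(np.nan == np.nan)  # False

# pd.isna is the correct missing-value test.
print(pd.isna(np.nan))   # True
print(pd.isna(pd.NA))    # True -- also covers pandas' own missing marker
```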

Handling missing values in time series replacing with previous values

The following code works perfectly

df1 <- df %>%
  complete(Timestamp = seq(min(Timestamp), max(Timestamp), by = "sec")) %>%
  fill(everything()) %>%
  mutate(ID = row_number())

It completes the missing timestamps and fills each gap with the last value observed before the gap began.

Fill missing values with previous values by row using dplyr

One solution is the na.locf function from the zoo package combined with purrr::pmap in a row-wise operation. na.locf takes the most recent non-NA value and replaces every subsequent NA with it. As a reminder, c(...) in both solutions captures all values of V1:V4 in each row on every iteration; the id column is excluded in both, as it is not involved in the calculations.

library(zoo)
library(purrr)

df %>%
  mutate(pmap_df(., ~ na.locf(c(...)[-1])))

  id V1 V2 V3 V4
1 01  1  1  1  1
2 02  2  1  1  1
3 03  3  1  1  1
4 04  4  1  2  2

Or we can use the coalesce function from dplyr to replace every NA value in each row with the last non-NA value, as we did earlier with na.locf. However, this solution is a bit verbose:

df %>%
  mutate(pmap_df(., ~ {x <- c(...)[!is.na(c(...))];
                       coalesce(c(...), x[length(x)])}))

  id V1 V2 V3 V4
1 01  1  1  1  1
2 02  2  1  1  1
3 03  3  1  1  1
4 04  4  1  2  2

Or you could also use this:

library(purrr)

df %>%
  mutate(across(!id, ~ replace(., is.na(.), invoke(coalesce, rev(df[-1])))))

  id V1 V2 V3 V4
1 01  1  1  1  1
2 02  2  1  1  1
3 03  3  1  1  1
4 04  4  1  2  2

The warning message can be ignored. It is produced because there are 6 NA values to fill, but applying dplyr::coalesce across the reversed columns yields one value per row (4 elements in total), so those 4 elements are recycled to fill the 6 slots.


