Replace missing values in Pandas with previous value if not NaN
I found a way to achieve this using the pd.merge_asof() function. If it can't find the key value to merge on, it gives you the previous one. Sorting is crucial, though.
It works just like Excel's LOOKUP (not VLOOKUP or HLOOKUP; LOOKUP without the V or the H).
thanks everyone!
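A minimal sketch of that approach with made-up data, relying on merge_asof's default backward matching:

```python
import pandas as pd

# Hypothetical example: match each trade to the most recent quote at or
# before its timestamp. merge_asof requires both frames sorted on the key.
quotes = pd.DataFrame({"time": [1, 3, 6], "price": [10.0, 11.0, 12.0]})
trades = pd.DataFrame({"time": [2, 5, 7]})

# For each left key, take the last right row whose key is <= it.
merged = pd.merge_asof(trades, quotes, on="time")
print(merged)
```

By default merge_asof matches backward (the nearest key less than or equal to each left key), which is exactly the "give me the previous one" behavior described above.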
Replace missing values if previous and next values are consistent
- Pivot longer to make use of tidyr::fill().
- Use fill() to create fill_down and fill_up columns, which hold the previous and next non-missing values, respectively.
- If the previous non-missing value equals the next one, use that value; otherwise keep the value as is. (This also leaves non-missing values unchanged, because for those the previous and next non-missing values are always equal.)
- Pivot back to the original format.
library(tidyverse)
df_filled <- df %>%
pivot_longer(!ID) %>%
mutate(
fill_down = value,
fill_up = value
) %>%
group_by(ID) %>%
fill(fill_down) %>%
fill(fill_up, .direction = "up") %>%
mutate(value = if_else(fill_down == fill_up, fill_down, value)) %>%
ungroup() %>%
pivot_wider(id_cols = ID)
df_filled
# # A tibble: 5 x 6
# ID Var1 Var2 Var3 Var4 Var5
# <dbl> <chr> <chr> <chr> <chr> <chr>
# 1 1 A A A A A
# 2 2 B C NA NA B
# 3 3 A A A A A
# 4 4 A B B B B
# 5 5 C NA B B B
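The same "fill only where the previous and next non-missing values agree" rule can be sketched in pandas. This is an illustrative toy frame, not the original data; the fill runs along each row to mirror the Var1..Var5 layout:

```python
import pandas as pd
import numpy as np

# Toy frame: a NaN is filled only when the nearest non-missing
# values on both sides of it agree.
df = pd.DataFrame({"Var1": ["A", "B"], "Var2": [np.nan, "C"],
                   "Var3": ["A", np.nan], "Var4": [np.nan, np.nan],
                   "Var5": ["A", "B"]})

down = df.T.ffill().T   # previous non-missing value, left to right
up = df.T.bfill().T     # next non-missing value, right to left

# Replace a cell with its fill-down value only where it is NaN
# and the fill-down and fill-up values agree; keep it otherwise.
filled = df.where(~(df.isna() & (down == up)), down)
print(filled)
```

Row 1 here fills completely to "A"; in row 2 the middle NaNs stay missing because the value before them ("C") disagrees with the value after them ("B"), matching the tidyverse result above.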
Replace missing values with previous values in Julia Data Frame
This is the way to do it using Impute.jl:
julia> using Impute, DataFrames
julia> df = DataFrame(dt1=[0.2, missing, missing, 1, missing, 5, 6],
dt2=[0.3, missing, missing, 3, missing, 5, 6])
7×2 DataFrame
Row │ dt1 dt2
│ Float64? Float64?
─────┼──────────────────────
1 │ 0.2 0.3
2 │ missing missing
3 │ missing missing
4 │ 1.0 3.0
5 │ missing missing
6 │ 5.0 5.0
7 │ 6.0 6.0
julia> transform(df, names(df) .=> Impute.locf, renamecols=false)
7×2 DataFrame
Row │ dt1 dt2
│ Float64? Float64?
─────┼────────────────────
1 │ 0.2 0.3
2 │ 0.2 0.3
3 │ 0.2 0.3
4 │ 1.0 3.0
5 │ 1.0 3.0
6 │ 5.0 5.0
7 │ 6.0 6.0
Replace value with previous row value
Does this work?
library(dplyr)
library(tidyr)
df %>% mutate(DSWP10 = as.numeric(na_if(DSWP10, '.'))) %>% fill(DSWP10, .direction = 'up')
# A tibble: 7 x 2
Date DSWP10
<chr> <dbl>
1 07/01/2015 2.1
2 06/01/2015 1.99
3 05/01/2015 1.99
4 04/01/2015 1.99
5 03/01/2015 1.98
6 02/01/2015 1.95
7 01/01/2015 1.95
How to replace NaNs by preceding or next values in pandas DataFrame?
You could use the fillna method on the DataFrame and specify the method as ffill (forward fill):
>>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
>>> df.fillna(method='ffill')
0 1 2
0 1 2 3
1 4 2 3
2 4 2 9
This method...
propagate[s] last valid observation forward to next valid
To go the opposite way, there's also a bfill method.
This method doesn't modify the DataFrame in place; you'll need to rebind the returned DataFrame to a variable, or else specify inplace=True:
df.fillna(method='ffill', inplace=True)
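As a side note, in recent pandas releases the method= argument of fillna is deprecated; df.ffill() and df.bfill() are the direct equivalents:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# Modern spelling of fillna(method='ffill') / fillna(method='bfill'):
forward = df.ffill()    # carry the last valid observation forward
backward = df.bfill()   # carry the next valid observation backward
print(forward)
print(backward)
```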
Replace time series missing values with previous year's value
The fix is to use pd.isna(x['traffic']) instead of x['traffic_volume'] == np.NaN for the if condition in the lambda. The initial line ran but didn't impute because NaN compares unequal to everything, including itself, so the equality test was always False.
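The reason the equality condition silently fails: under IEEE 754, NaN is not equal to anything, not even itself, so x == np.nan evaluates to False for every row. A quick demonstration:

```python
import numpy as np
import pandas as pd

# NaN never compares equal, so an equality test can't detect it.
print(np.nan == np.nan)   # False
print(pd.isna(np.nan))    # True

s = pd.Series([1.0, np.nan, 3.0])
print((s == np.nan).any())  # False: the comparison matches nothing
print(s.isna().tolist())    # [False, True, False]
```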
Handling missing values in time series replacing with previous values
The following code works perfectly
df1<- df %>%
complete(Timestamp = seq(min(Timestamp), max(Timestamp), by = "sec")) %>%
fill(everything()) %>%
mutate(ID = row_number())
It completes the per-second timestamp sequence and fills each gap with the last value observed before the gap.
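A comparable pattern in pandas (with illustrative data) reindexes the frame to a complete per-second range, then forward-fills, much like complete() followed by fill():

```python
import pandas as pd

# Hypothetical one-second series with gaps between observations.
ts = pd.DataFrame(
    {"value": [10, 20, 50]},
    index=pd.to_datetime(["2021-01-01 00:00:00",
                          "2021-01-01 00:00:01",
                          "2021-01-01 00:00:04"]),
)

# Build the full per-second index (analogue of tidyr::complete),
# then carry the last observation forward (analogue of fill()).
full = pd.date_range(ts.index.min(), ts.index.max(), freq="s")
filled = ts.reindex(full).ffill()
print(filled)
```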
Fill missing values with previous values by row using dplyr
One solution could be using the na.locf function from package zoo combined with the purrr::pmap function in a row-wise operation. na.locf takes the most recent non-NA value and replaces all the following NA values with it. As a reminder, c(...) in both solutions captures all values of V1:V4 in each row on every iteration. I excluded the id column in both, as it is not involved in our calculations.
library(zoo)
library(purrr)
df %>%
mutate(pmap_df(., ~ na.locf(c(...)[-1])))
id V1 V2 V3 V4
1 01 1 1 1 1
2 02 2 1 1 1
3 03 3 1 1 1
4 04 4 1 2 2
Or we can use the coalesce function from dplyr. We replace every NA value in each row with the last non-NA value, as we did earlier with na.locf. However, this solution is a bit verbose:
df %>%
mutate(pmap_df(., ~ {x <- c(...)[!is.na(c(...))];
coalesce(c(...), x[length(x)])}))
id V1 V2 V3 V4
1 01 1 1 1 1
2 02 2 1 1 1
3 03 3 1 1 1
4 04 4 1 2 2
Or you could also use this:
library(purrr)
df %>%
mutate(across(!id, ~ replace(., is.na(.), invoke(coalesce, rev(df[-1])))))
id V1 V2 V3 V4
1 01 1 1 1 1
2 02 2 1 1 1
3 03 3 1 1 1
4 04 4 1 2 2
The warning message can be ignored. It is produced because there are 6 NA values to fill, but invoke(coalesce, rev(df[-1])) returns a single value per row, so 4 elements are recycled to fill the 6 slots.
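For comparison, a row-wise last-observation-carried-forward fill is a one-liner in pandas, since ffill accepts axis=1 (illustrative numeric data, not the original df):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"V1": [1.0, 2.0, 3.0, 4.0],
                   "V2": [1.0, np.nan, np.nan, 1.0],
                   "V3": [1.0, 1.0, np.nan, 2.0],
                   "V4": [np.nan, np.nan, 1.0, np.nan]})

# Fill along each row: every NaN takes the nearest non-missing value
# to its left, the pandas analogue of row-wise zoo::na.locf.
filled = df.ffill(axis=1)
print(filled)
```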