Add missing value in column with value from row above
The tidyr package has the fill() function, which does the trick.
library(tidyr)  # provides fill() and re-exports %>%

df1 <- data.frame(var1 = c("a", NA, NA, "b", NA), stringsAsFactors = FALSE)
df1 %>% fill(var1)
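For readers working in pandas rather than R, the same "carry the last value down" idea can be sketched with Series.ffill. This is only a rough counterpart to tidyr's fill(), using the same toy column as above:

```python
import pandas as pd

# Same toy data as the R example: gaps after "a" and after "b".
df1 = pd.DataFrame({"var1": ["a", None, None, "b", None]})

# ffill() propagates the last non-missing value downwards.
df1["var1"] = df1["var1"].ffill()
print(df1)
```

Each missing entry takes the value of the nearest non-missing row above it.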
How to impute missing values with mean from row above and below in Python?
You can replace "-" with NaN and use interpolate, which by default fills missing values linearly. If there is only one missing value between two known values, this is the same as taking the mean of the values directly above and below it:
import numpy as np

df = df.replace('-', np.nan)
df = df.interpolate()
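A self-contained sketch of the idea follows. The column name val and its values are made up for illustration. It also swaps in pd.to_numeric(..., errors="coerce") to turn the "-" placeholders into NaN, since a column that once held strings usually needs to be numeric before interpolate can work on it:

```python
import pandas as pd

# Hypothetical column in which "-" marks a missing reading.
df = pd.DataFrame({"val": [1, "-", 3, 5, "-", "-", 11]})

# Coerce to numbers; anything unparsable (here "-") becomes NaN.
df["val"] = pd.to_numeric(df["val"], errors="coerce")

# Linear interpolation (the default) between the neighbouring known values.
df["val"] = df["val"].interpolate()
print(df)
```

The single gap between 1 and 3 becomes their mean, while the two-row gap between 5 and 11 is filled in equal steps.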
Pandas Filling Missing Values Down Based on Row Above
You can use np.where by looking at where the forward-fill is equal to one, filling 1 where it's True, and falling back to the value of 'col2' when it's False:
df['col2'] = np.where(df['col2'].ffill() == 1, 1, df['col2'])
The resulting output:
col1 col2
0 1 NaN
1 3 1.0
2 3 1.0
3 1 1.0
4 2 1.0
5 3 1.0
6 2 1.0
7 2 2.0
8 1 NaN
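The original input frame is not shown, so the sketch below assumes one that is consistent with the output above: a 1 followed by gaps, then a 2 followed by a gap. It is meant only to show why the trailing NaN after the 2 is left alone:

```python
import numpy as np
import pandas as pd

# Assumed input (not from the original question), chosen to match the output shown.
df = pd.DataFrame({
    "col1": [1, 3, 3, 1, 2, 3, 2, 2, 1],
    "col2": [np.nan, 1, np.nan, np.nan, np.nan, np.nan, np.nan, 2, np.nan],
})

# Where the forward-filled value is 1, write 1; otherwise keep the original value.
df["col2"] = np.where(df["col2"].ffill() == 1, 1, df["col2"])
print(df)
```

Only gaps that follow a 1 are filled; the gap after the 2 stays NaN because its forward-filled value is 2, not 1.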
Fill missing values with previous values by row using dplyr
One solution is to use the na.locf function from the zoo package, combined with purrr::pmap in a row-wise operation. na.locf takes the most recent non-NA value and replaces all the following NA values with it. As a reminder, c(...) in both solutions captures all the values of V1:V4 in each row in every iteration. However, I excluded the id column in both, as it is not involved in our calculations.
library(dplyr)
library(zoo)
library(purrr)

df %>%
  mutate(pmap_df(., ~ na.locf(c(...)[-1])))
id V1 V2 V3 V4
1 01 1 1 1 1
2 02 2 1 1 1
3 03 3 1 1 1
4 04 4 1 2 2
Or we can use the coalesce function from dplyr. We can replace every NA value in each row with the last non-NA value, something we did earlier with na.locf. However, this solution is a bit verbose:
df %>%
  mutate(pmap_df(., ~ {x <- c(...)[!is.na(c(...))];
                       coalesce(c(...), x[length(x)])}))
id V1 V2 V3 V4
1 01 1 1 1 1
2 02 2 1 1 1
3 03 3 1 1 1
4 04 4 1 2 2
Or you could also use this:
library(purrr)

df %>%
  mutate(across(!id, ~ replace(., is.na(.), invoke(coalesce, rev(df[-1])))))
id V1 V2 V3 V4
1 01 1 1 1 1
2 02 2 1 1 1
3 03 3 1 1 1
4 04 4 1 2 2
The warning message can be ignored. It is produced because we have 6 NA values, but applying dplyr::coalesce on every vector gives 1 element each, which results in 4 elements to fill 6 slots.
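For comparison, a row-wise forward fill can be sketched in pandas with ffill(axis=1). The input frame is not shown in the answer, so the data below is an assumption chosen to contain 6 missing values and to reproduce the output above:

```python
import numpy as np
import pandas as pd

# Assumed input with 6 missing values (hypothetical, not from the original post).
df = pd.DataFrame({
    "id": ["01", "02", "03", "04"],
    "V1": [1, 2, 3, 4],
    "V2": [1, 1, 1, 1],
    "V3": [np.nan, np.nan, np.nan, 2],
    "V4": [np.nan, np.nan, 1, np.nan],
})

value_cols = ["V1", "V2", "V3", "V4"]  # id is left out of the fill

# axis=1 fills along each row, left to right, with the last non-missing value.
df[value_cols] = df[value_cols].ffill(axis=1)
print(df)
```

Restricting the fill to value_cols plays the same role as dropping id with [-1] in the R code.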
How to copy missing column values from previous row in pandas
Try using ffill. For example:
import pandas as pd

df = pd.read_csv("pivot-products.csv")
df["product_id"] = df["product_id"].ffill()
Shift specific rows to correct missing values in a Pandas Dataframe
@ti7's suggestion is spot on: split the dataframe into individual frames, merge them, and then use fillna:
sensor1 = df.filter(like='1')
sensor2 = df.filter(like='2')

(sensor1.merge(sensor2,
               how='outer',
               left_on='TimeStamp1',
               right_on='TimeStamp2',
               sort=True)
        .fillna({"TimeStamp2": df.TimeStamp1})
        .dropna(subset=['TimeStamp1'])
)
TimeStamp1 Sensor1 TimeStamp2 Sensor2
0 08:00 100.0 08:00 60.0
1 08:05 102.0 08:05 NaN
2 08:10 105.0 08:10 40.0
3 08:15 101.0 08:15 50.0
4 08:20 103.0 08:20 NaN
5 08:25 104.0 08:25 31.0
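The input frame is not shown in the answer, so here is a self-contained sketch with assumed data in which the second sensor skipped two readings and its rows slid upwards. It is a variation on the code above rather than a copy: it uses a left join (which keeps the same rows as the outer join followed by dropna on TimeStamp1) and fills TimeStamp2 from the merged TimeStamp1 column instead of relying on index alignment:

```python
import numpy as np
import pandas as pd

times = ["08:00", "08:05", "08:10", "08:15", "08:20", "08:25"]

# Assumed input: sensor 2 has no readings at 08:05 and 08:20, so its rows are shifted up.
df = pd.DataFrame({
    "TimeStamp1": times,
    "Sensor1": [100.0, 102.0, 105.0, 101.0, 103.0, 104.0],
    "TimeStamp2": ["08:00", "08:10", "08:15", "08:25", np.nan, np.nan],
    "Sensor2": [60.0, 40.0, 50.0, 31.0, np.nan, np.nan],
})

sensor1 = df.filter(like="1")
sensor2 = df.filter(like="2").dropna(how="all")  # drop the empty padding rows

# Line each sensor-2 reading up with the matching sensor-1 timestamp.
merged = sensor1.merge(sensor2, how="left",
                       left_on="TimeStamp1", right_on="TimeStamp2")

# Where sensor 2 had no reading, reuse the sensor-1 timestamp.
merged["TimeStamp2"] = merged["TimeStamp2"].fillna(merged["TimeStamp1"])
print(merged)
```

With this data the result has the same shape as the table above: Sensor2 is NaN at 08:05 and 08:20, and every other reading sits on its own timestamp.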
Fill missing value by averaging previous row value
rolling + mean + shift
You will need to modify the logic below to decide how the mean of NaN and another value should be interpreted, for the case where one of the previous two values is null.
df = df.fillna(df.rolling(2).mean().shift())
print(df)
A B C D
0 NaN 2.0 NaN 0.0
1 3.0 4.0 NaN 1.0
2 NaN 3.0 NaN 5.0
3 NaN 3.0 NaN 3.0
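A small sketch of that caveat, on a made-up single column: with the default settings a window containing a NaN has a NaN mean, so a gap that follows another gap stays empty. Passing min_periods=1 is one possible modification (an assumption about what you may want, not part of the original answer) that averages whatever non-missing values the window holds:

```python
import numpy as np
import pandas as pd

# Hypothetical column with two consecutive gaps at the end.
df = pd.DataFrame({"A": [1.0, 3.0, np.nan, np.nan]})

# Default: the mean needs both previous values, so the last gap is not filled.
strict = df.fillna(df.rolling(2).mean().shift())

# Variation: accept a window with at least one non-missing value.
lenient = df.fillna(df.rolling(2, min_periods=1).mean().shift())

print(strict)
print(lenient)
```

Note that in both cases the rolling mean is computed from the original data, not from values filled in earlier rows.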
Python - fill NA by value from previous rows based on identifier column
I believe you need GroupBy.ffill with DataFrame.reindex to keep the same column order as the original DataFrame:
df = df.groupby('Cat1').ffill().reindex(df.columns, axis=1)
print (df)
Day Date Cat1 Cat2
0 1 31/12/17 cat mouse
1 2 01/09/18 cat mouse
2 3 27/05/18 dog elephant
3 4 01/09/18 cat mouse
4 5 01/09/18 cat mouse
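As far as I know, recent pandas versions no longer include the grouping column in the result of GroupBy.ffill, in which case reindexing would leave Cat1 empty. A sketch that avoids depending on that behaviour assigns only the filled columns back. The input below is assumed, since the question's data is not shown, and is chosen to reproduce the output above:

```python
import pandas as pd

# Assumed input (hypothetical): gaps in Date and Cat2, Cat1 always present.
df = pd.DataFrame({
    "Day": [1, 2, 3, 4, 5],
    "Date": ["31/12/17", "01/09/18", "27/05/18", None, None],
    "Cat1": ["cat", "cat", "dog", "cat", "cat"],
    "Cat2": ["mouse", None, "elephant", None, None],
})

fill_cols = ["Date", "Cat2"]  # columns to fill within each Cat1 group

# Forward-fill inside each group and write the result back in place,
# leaving Day and Cat1 (and the column order) untouched.
df[fill_cols] = df.groupby("Cat1")[fill_cols].ffill()
print(df)
```

Row 3 takes its Date from row 1, the previous "cat" row, rather than from the "dog" row directly above it.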
Fill in missing pandas data with previous non-missing value, grouped by key
You could perform a groupby/forward-fill operation on each group:
import numpy as np
import pandas as pd
df = pd.DataFrame({'id': [1,1,2,2,1,2,1,1], 'x':[10,20,100,200,np.nan,np.nan,300,np.nan]})
df['x'] = df.groupby(['id'])['x'].ffill()
print(df)
yields
id x
0 1 10.0
1 1 20.0
2 2 100.0
3 2 200.0
4 1 20.0
5 2 200.0
6 1 300.0
7 1 300.0