Add Missing Value in Column with Value from Row Above

The tidyr package has a fill() function which does the trick.

library(dplyr)
library(tidyr)

df1 <- data.frame(var1 = c("a", NA, NA, "b", NA), stringsAsFactors = FALSE)
df1 %>% fill(var1)

How to impute missing values with the mean of the rows above and below in Python?

You can replace "-" with NaN and use interpolate(), which fills missing values linearly by default. When only a single value is missing, this is equivalent to taking the mean of the values directly above and below it:

import numpy as np

df = df.replace('-', np.nan)
df = df.interpolate()
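As a self-contained sketch (the column name and values here are invented for illustration):

```python
import numpy as np
import pandas as pd

# A column where '-' marks a missing reading
df = pd.DataFrame({'temp': [10.0, '-', 14.0, '-', '-', 20.0]})

df = df.replace('-', np.nan)
df['temp'] = pd.to_numeric(df['temp'])  # ensure a numeric dtype before interpolating
df = df.interpolate()

# The single gap at index 1 becomes the mean of 10 and 14 -> 12;
# the two-wide gap at indices 3..4 is filled linearly -> 16, 18
print(df['temp'].tolist())  # [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
```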

Pandas Filling Missing Values Down Based on Row Above

You can use np.where: where the forward-fill of 'col2' equals one, fill in 1; where it doesn't, fall back to the original value of 'col2':

df['col2'] = np.where(df['col2'].ffill() == 1, 1, df['col2'])

The resulting output:

   col1  col2
0     1   NaN
1     3   1.0
2     3   1.0
3     1   1.0
4     2   1.0
5     3   1.0
6     2   1.0
7     2   2.0
8     1   NaN
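A self-contained reconstruction of what this answer assumes (the input values for 'col2' are inferred from the output above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': [1, 3, 3, 1, 2, 3, 2, 2, 1],
    'col2': [np.nan, 1, np.nan, np.nan, np.nan, np.nan, np.nan, 2, np.nan],
})

# Where the forward-fill equals 1, write 1; otherwise keep col2 as-is.
# The leading NaN stays NaN, and the NaN after the 2 is not overwritten.
df['col2'] = np.where(df['col2'].ffill() == 1, 1, df['col2'])
```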

Fill missing values with previous values by row using dplyr

One solution is the na.locf function from the zoo package, combined with purrr::pmap in a row-wise operation. na.locf carries the most recent non-NA value forward, replacing all subsequent NA values with it. As a reminder, c(...) in both solutions captures all values of V1:V4 in each row on every iteration. The id column is excluded in both, as it is not involved in the calculations.

library(dplyr)
library(zoo)
library(purrr)

df %>%
  mutate(pmap_df(., ~ na.locf(c(...)[-1])))

  id V1 V2 V3 V4
1 01  1  1  1  1
2 02  2  1  1  1
3 03  3  1  1  1
4 04  4  1  2  2

Or we can use the coalesce function from dplyr, replacing every NA value in each row with the last non-NA value, as we did earlier with na.locf. This solution is a bit more verbose, however:

df %>%
  mutate(pmap_df(., ~ {x <- c(...)[!is.na(c(...))];
                       coalesce(c(...), x[length(x)])}))

  id V1 V2 V3 V4
1 01  1  1  1  1
2 02  2  1  1  1
3 03  3  1  1  1
4 04  4  1  2  2

Or you could also use this:

library(dplyr)
library(purrr)

df %>%
  mutate(across(!id, ~ replace(., is.na(.), invoke(coalesce, rev(df[-1])))))

  id V1 V2 V3 V4
1 01  1  1  1  1
2 02  2  1  1  1
3 03  3  1  1  1
4 04  4  1  2  2

The warning message can be ignored. It is produced because there are 6 NA values to fill, but applying dplyr::coalesce across the reversed columns yields one element per row, so 4 replacement values are recycled into 6 slots.
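For comparison, the same row-wise fill is a one-liner in pandas with ffill(axis=1); the input values below are a plausible reconstruction of the frame used above, inferred from its output:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['01', '02', '03', '04'],
    'V1': [1.0, 2.0, 3.0, 4.0],
    'V2': [1.0, 1.0, 1.0, 1.0],
    'V3': [np.nan, np.nan, np.nan, 2.0],
    'V4': [np.nan, np.nan, np.nan, np.nan],
})

# Carry values left-to-right across each row, leaving the id column alone
cols = ['V1', 'V2', 'V3', 'V4']
df[cols] = df[cols].ffill(axis=1)
```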

How to copy missing column values from previous row in pandas

Try using ffill. For example:

df = pd.read_csv("pivot-products.csv")
df["product_id"] = df["product_id"].ffill()
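If you don't have the CSV at hand, here is a minimal self-contained sketch of the same idea (the column values are invented):

```python
import pandas as pd

# product_id is only recorded on the first row of each product block
df = pd.DataFrame({
    'product_id': ['A100', None, None, 'B200', None],
    'qty': [5, 3, 7, 2, 8],
})

# ffill() carries the last non-missing value downward
df['product_id'] = df['product_id'].ffill()
print(df['product_id'].tolist())  # ['A100', 'A100', 'A100', 'B200', 'B200']
```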

Shift specific rows to correct missing values in a Pandas Dataframe

@ti7's suggestion is spot on: split the dataframe into individual frames, then merge and fillna:

sensor1 = df.filter(like='1')
sensor2 = df.filter(like='2')
(sensor1.merge(sensor2,
               how='outer',
               left_on='TimeStamp1',
               right_on='TimeStamp2',
               sort=True)
        .fillna({"TimeStamp2": df.TimeStamp1})
        .dropna(subset=['TimeStamp1'])
)
  TimeStamp1  Sensor1 TimeStamp2  Sensor2
0      08:00    100.0      08:00     60.0
1      08:05    102.0      08:05      NaN
2      08:10    105.0      08:10     40.0
3      08:15    101.0      08:15     50.0
4      08:20    103.0      08:20      NaN
5      08:25    104.0      08:25     31.0
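A self-contained reconstruction, assuming the original frame holds both sensors side by side with the shorter sensor's columns padded with NaN (that shape is an assumption; only the output above is given):

```python
import numpy as np
import pandas as pd

# Sensor2 missed the 08:05 and 08:20 readings, so its columns are NaN-padded
df = pd.DataFrame({
    'TimeStamp1': ['08:00', '08:05', '08:10', '08:15', '08:20', '08:25'],
    'Sensor1':    [100.0, 102.0, 105.0, 101.0, 103.0, 104.0],
    'TimeStamp2': ['08:00', '08:10', '08:15', '08:25', np.nan, np.nan],
    'Sensor2':    [60.0, 40.0, 50.0, 31.0, np.nan, np.nan],
})

sensor1 = df.filter(like='1')
sensor2 = df.filter(like='2')
out = (sensor1.merge(sensor2,
                     how='outer',          # keep timestamps present on either side
                     left_on='TimeStamp1',
                     right_on='TimeStamp2',
                     sort=True)            # sorted keys; NaN keys land last
              .fillna({'TimeStamp2': df.TimeStamp1})   # borrow missing timestamps
              .dropna(subset=['TimeStamp1'])           # drop the NaN-key leftovers
)
```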

Fill missing value by averaging previous row value

rolling + mean + shift

You will need to modify the logic below to decide how to interpret the mean of NaN and another value, i.e. when one of the previous two values is null.

df = df.fillna(df.rolling(2).mean().shift())

print(df)

     A    B   C    D
0  NaN  2.0 NaN  0.0
1  3.0  4.0 NaN  1.0
2  NaN  3.0 NaN  5.0
3  NaN  3.0 NaN  3.0
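A self-contained reconstruction (this input, inferred from the output above, reproduces it when run):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [np.nan, 3.0, np.nan, np.nan],
    'B': [2.0, 4.0, np.nan, 3.0],
    'C': [np.nan, np.nan, np.nan, np.nan],
    'D': [0.0, 1.0, 5.0, np.nan],
})

# For each cell, the candidate fill value is the mean of the two rows above it;
# if either of those two values is NaN, the rolling mean is NaN and no fill happens
df = df.fillna(df.rolling(2).mean().shift())
print(df)
```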

Python - fill NA by value from previous rows based on identifier column

I believe you need GroupBy.ffill, with DataFrame.reindex to restore the same column order as the original DataFrame:

df = df.groupby('Cat1').ffill().reindex(df.columns, axis=1)
print(df)
   Day      Date Cat1      Cat2
0    1  31/12/17  cat     mouse
1    2  01/09/18  cat     mouse
2    3  27/05/18  dog  elephant
3    4  01/09/18  cat     mouse
4    5  01/09/18  cat     mouse
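A self-contained sketch with a plausible input (the NaN positions in Date and Cat2 are assumed; the per-column form of GroupBy.ffill is used here so the grouping column is never touched, which avoids version differences in how group keys are returned):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Day':  [1, 2, 3, 4, 5],
    'Date': ['31/12/17', '01/09/18', '27/05/18', np.nan, np.nan],
    'Cat1': ['cat', 'cat', 'dog', 'cat', 'cat'],
    'Cat2': ['mouse', 'mouse', 'elephant', np.nan, np.nan],
})

# Forward-fill each column within its Cat1 group; rows of other groups
# never leak into the fill
for col in ['Date', 'Cat2']:
    df[col] = df.groupby('Cat1')[col].ffill()
```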

Fill in missing pandas data with previous non-missing value, grouped by key

You could perform a groupby/forward-fill operation on each group:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1,1,2,2,1,2,1,1], 'x':[10,20,100,200,np.nan,np.nan,300,np.nan]})
df['x'] = df.groupby(['id'])['x'].ffill()
print(df)

yields

   id      x
0   1   10.0
1   1   20.0
2   2  100.0
3   2  200.0
4   1   20.0
5   2  200.0
6   1  300.0
7   1  300.0
