Forward Fill Specific Columns in Pandas Dataframe

tl;dr:

cols = ['X', 'Y']
df.loc[:,cols] = df.loc[:,cols].ffill()

Here is a self-contained example:

>>> import pandas as pd
>>> import numpy as np
>>>
>>> ## create dataframe
... ts1 = [0, 1, np.nan, np.nan, np.nan, np.nan]
>>> ts2 = [0, 2, np.nan, 3, np.nan, np.nan]
>>> d = {'X': ts1, 'Y': ts2, 'Z': ts2}
>>> df = pd.DataFrame(data=d)
>>> print(df.head())
X Y Z
0 0 0 0
1 1 2 2
2 NaN NaN NaN
3 NaN 3 3
4 NaN NaN NaN
>>>
>>> ## apply forward fill
... cols = ['X', 'Y']
>>> df.loc[:,cols] = df.loc[:,cols].ffill()
>>> print(df.head())
X Y Z
0 0 0 0
1 1 2 2
2 1 2 NaN
3 1 3 3
4 1 3 NaN

Forward fill on specific column for specific rows

df = df.replace('na', np.nan)
df['num2'] = df.groupby('Color')['num2'].ffill()

Output:

>>> df
Color num1 num2
0 red 1 2
1 red 1 2
2 blue 2 NaN
3 blue 2 3
4 yellow 1 4
5 yellow 1 4
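
The input frame is not shown above. As a minimal sketch, assuming a hypothetical input in which missing entries are stored as the string 'na', the following reproduces that output. It swaps df.replace('na', np.nan) for pd.to_numeric(..., errors='coerce') so that num2 ends up with a numeric dtype; that substitution is an assumption of this sketch, not part of the original answer.

```python
import pandas as pd

# Hypothetical input: 'na' strings mark the missing entries.
df = pd.DataFrame({
    'Color': ['red', 'red', 'blue', 'blue', 'yellow', 'yellow'],
    'num1': [1, 1, 2, 2, 1, 1],
    'num2': [2, 'na', 'na', 3, 4, 'na'],
})

# Turn 'na' into NaN and make the column numeric.
df['num2'] = pd.to_numeric(df['num2'], errors='coerce')

# Forward fill within each Color group only; values never cross groups.
df['num2'] = df.groupby('Color')['num2'].ffill()
print(df)
```

Row 2 stays NaN because it is the first row of the blue group, so there is no earlier value in that group to carry forward.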

Forward fill blocks of above values pandas

You can label consecutive runs of missing and non-missing values, build a per-column counter within each run, and then forward fill the missing values within groups defined by that counter:

df = pd.DataFrame([[1, 2, 3], [4, None, 8], [None, 5, 9], [None,None,10],
[0, 2, None], [5, None, None], [None, 5, None], [None,None,None]])

print (df)
0 1 2
0 1.0 2.0 3.0
1 4.0 NaN 8.0
2 NaN 5.0 9.0
3 NaN NaN 10.0
4 0.0 2.0 NaN
5 5.0 NaN NaN
6 NaN 5.0 NaN
7 NaN NaN NaN


m = df.isna()
g = m.ne(m.shift()).cumsum()
for c in df.columns:
    df[c] = df.groupby(g.groupby(c).cumcount())[c].ffill()

print (df)
0 1 2
0 1.0 2.0 3.0
1 4.0 2.0 8.0
2 1.0 5.0 9.0
3 4.0 5.0 10.0
4 0.0 2.0 3.0
5 5.0 2.0 8.0
6 0.0 5.0 9.0
7 5.0 5.0 10.0

EDIT: A new solution that repeats the non-missing values across the missing values that follow them, per group, where each group starts at the first non-missing value after a gap. numpy.tile is used to generate the repeated sequences:

df = pd.DataFrame([[1, 2, 3], [4, None, 8], [None, 5, 9], [7,None,10],
[0, 2, None], [5, None, None], [None, 6, None], [None,8,None]
, [None,None,None], [None,None,None]])
print (df)
0 1 2
0 1.0 2.0 3.0
1 4.0 NaN 8.0
2 NaN 5.0 9.0
3 7.0 NaN 10.0
4 0.0 2.0 NaN
5 5.0 NaN NaN
6 NaN 6.0 NaN
7 NaN 8.0 NaN
8 NaN NaN NaN
9 NaN NaN NaN


g = (df.notna() & df.shift().isna()).cumsum()

def f(x):
    non_miss = x.dropna()
    return np.tile(non_miss, int(len(x) // len(non_miss) + 2))[:len(x)]

df = df.apply(lambda x: x.groupby(g[x.name]).transform(f))
print (df)
0 1 2
0 1.0 2.0 3.0
1 4.0 2.0 8.0
2 1.0 5.0 9.0
3 7.0 5.0 10.0
4 0.0 2.0 3.0
5 5.0 2.0 8.0
6 7.0 6.0 9.0
7 0.0 8.0 10.0
8 5.0 6.0 3.0
9 7.0 8.0 8.0

Forward fill only certain value

mask = (df.ffill() == 0) should be enough for your use case.

First, df.ffill propagates the last valid observation forward, so NaN rows that come after a 0 are filled with 0s and NaN rows that come after a 1 are filled with 1s. Comparing the result to 0 then selects only the rows that hold (or would be filled with) 0, and that comparison is the mask used to get the final df.

Example (with a 0 and a few NaNs added to the end of your df):

>>> s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan, np.nan, 0, np.nan, np.nan, np.nan]
>>> df = pd.DataFrame(s, columns=["s"])
>>> df
s
0 NaN
1 0.0
2 NaN
3 NaN
4 1.0
5 NaN
6 NaN
7 0.0
8 NaN
9 1.0
10 NaN
11 NaN
12 0.0
13 NaN
14 NaN
15 NaN
>>>
>>>
>>> df[df.ffill() == 0] = 0
>>> df
s
0 NaN
1 0.0
2 0.0
3 0.0
4 1.0
5 NaN
6 NaN
7 0.0
8 0.0
9 1.0
10 NaN
11 NaN
12 0.0
13 0.0
14 0.0
15 0.0

Pandas forward fill, but only between equal values

If I understand correctly, you want to fill only the NaNs that sit between two equal values, that is, the NaNs where backfill and forward fill give the same value. That can be done like this:

ff = df.aux.ffill()
bf = df.aux.bfill()
df.aux = ff[ff == bf]
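
Since no sample data is given for this one, here is a small sketch on made-up values. The column name aux follows the snippet above; the data is hypothetical. It uses Series.where, which should be equivalent to the boolean indexing above but makes the "otherwise NaN" part explicit.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column name 'aux' comes from the snippet above.
df = pd.DataFrame({'aux': [1, np.nan, 1, np.nan, 2, np.nan, np.nan, 2, np.nan, 3]})

ff = df['aux'].ffill()   # last valid value seen so far
bf = df['aux'].bfill()   # next valid value ahead

# Keep the filled value only where both directions agree; elsewhere NaN.
df['aux'] = ff.where(ff == bf)
print(df['aux'].tolist())
```

The gaps between 1 and 1 and between 2 and 2 get filled, while the gaps between 1 and 2 and between 2 and 3 stay NaN.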

How to forward propagate/fill a specific value in a Pandas DataFrame Column/Series?

You can still use ffill, but first you have to mask the False values so that only True is propagated:

s.mask(~s).ffill(limit=2).fillna(s)


0     True
1     True
2     True
3    False
4    False
5     True
6     True
7     True
8    False
Name: 0, dtype: bool
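
The input series is not shown, so the following is a sketch with a hypothetical input chosen to give the output above. It does the same masking through a float column, which is an assumption on my part to sidestep object-dtype handling that differs between pandas versions.

```python
import pandas as pd

# Hypothetical boolean input, chosen to match the output shown above.
s = pd.Series([True, False, False, False, False, True, False, False, False])

f = s.astype(float)                  # True -> 1.0, False -> 0.0
filled = f.where(s).ffill(limit=2)   # keep only the 1.0s, carry each forward at most 2 rows
result = filled.fillna(0.0).astype(bool)
print(result.tolist())
```

Each True is carried forward by at most two rows because of limit=2; anything further away falls back to False.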

Pandas dataframe fillna() only some columns in place

You can select your desired columns and do it by assignment:

df[['a', 'b']] = df[['a','b']].fillna(value=0)

The resulting output is as expected:

     a    b    c
0  1.0  4.0  NaN
1  2.0  5.0  NaN
2  3.0  0.0  7.0
3  0.0  6.0  8.0
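
An alternative sketch, assuming the input below (inferred from the output, not taken from the original question): fillna also accepts a dict that maps column names to fill values, which avoids selecting the columns twice.

```python
import numpy as np
import pandas as pd

# Hypothetical input, inferred from the output shown above.
df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, np.nan],
    'b': [4.0, 5.0, np.nan, 6.0],
    'c': [np.nan, np.nan, 7.0, 8.0],
})

# A dict passed to fillna maps column name -> fill value;
# columns that are not listed (here 'c') are left untouched.
df = df.fillna({'a': 0, 'b': 0})
print(df)
```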

Pandas dataframe column forward fill from first non-zero value

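Use the .values attribute, so that the result of the groupby is assigned back by position rather than aligned on its index: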

df['c']=df.groupby('ID',as_index = False)['c'].apply(lambda x: x.replace(to_replace=0, method='ffill')).values

Now if you print df you will get your desired output:

   ID   b  c
0   1   0  0
1   1   5  1
2   1   8  1
3   2   4  0
4   2   8  1
5   2  81  1
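
Note that the method argument of replace is deprecated in recent pandas releases. A possible alternative, sketched on a hypothetical input inferred from the output above, is to mask the zeros and use a grouped ffill instead. This is a different technique from the answer above, not a drop-in copy of it.

```python
import pandas as pd

# Hypothetical input, inferred from the output shown above.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2],
    'b': [0, 5, 8, 4, 8, 81],
    'c': [0, 1, 0, 0, 1, 0],
})

# Treat zeros as missing, forward fill inside each ID group,
# then restore the leading zeros that had nothing to fill from.
c = df['c'].mask(df['c'].eq(0))
df['c'] = c.groupby(df['ID']).ffill().fillna(0).astype(int)
print(df)
```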

Forward fill on custom value in pandas dataframe

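You can combine df.mask, df.isin and df.replace. Only the cells that contain '*' are replaced with the forward-filled values, so genuine NaNs elsewhere are left alone: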

df.mask(df.isin(['*']),df.replace('*',np.nan).ffill())

a b
0 1.0 10
1 2.0 10
2 3.0 10
3 4.0 10
4 NaN 50
5 6.0 60
6 7.0 70

