Forward fill specific columns in pandas dataframe
tl;dr:
cols = ['X', 'Y']
df.loc[:,cols] = df.loc[:,cols].ffill()
Here is a self-contained example:
>>> import pandas as pd
>>> import numpy as np
>>>
>>> ## create dataframe
... ts1 = [0, 1, np.nan, np.nan, np.nan, np.nan]
>>> ts2 = [0, 2, np.nan, 3, np.nan, np.nan]
>>> d = {'X': ts1, 'Y': ts2, 'Z': ts2}
>>> df = pd.DataFrame(data=d)
>>> print(df.head())
X Y Z
0 0 0 0
1 1 2 2
2 NaN NaN NaN
3 NaN 3 3
4 NaN NaN NaN
>>>
>>> ## apply forward fill
... cols = ['X', 'Y']
>>> df.loc[:,cols] = df.loc[:,cols].ffill()
>>> print(df.head())
X Y Z
0 0 0 0
1 1 2 2
2 1 2 NaN
3 1 3 3
4 1 3 NaN
Forward fill on specific column for specific rows
df = df.replace('na', np.nan)
df['num2'] = df.groupby('Color')['num2'].ffill()
Output:
>>> df
Color num1 num2
0 red 1 2
1 red 1 2
2 blue 2 NaN
3 blue 2 3
4 yellow 1 4
5 yellow 1 4
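The input frame is not shown above. Here is a minimal sketch, assuming a frame whose num2 column uses the string 'na' as a missing marker. The data is made up to match the output, and pd.to_numeric is swapped in for replace so the column ends up numeric:

```python
import pandas as pd

# Assumed input: 'na' marks a missing value in num2.
df = pd.DataFrame({
    "Color": ["red", "red", "blue", "blue", "yellow", "yellow"],
    "num1": [1, 1, 2, 2, 1, 1],
    "num2": [2, "na", "na", 3, 4, "na"],
})

# Turn the 'na' strings into real NaN values (non-numeric entries become NaN).
df["num2"] = pd.to_numeric(df["num2"], errors="coerce")

# Forward fill within each Color group only, so values never leak across groups.
df["num2"] = df.groupby("Color")["num2"].ffill()
print(df)
```

The first blue row stays NaN because there is no earlier value inside its own group to carry forward.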
Forward fill blocks of above values pandas
You can label each consecutive run of missing and non-missing values, number the rows inside each run with a counter per column, and then forward fill the missing values within groups of rows that share the same counter:
df = pd.DataFrame([[1, 2, 3], [4, None, 8], [None, 5, 9], [None,None,10],
[0, 2, None], [5, None, None], [None, 5, None], [None,None,None]])
print (df)
0 1 2
0 1.0 2.0 3.0
1 4.0 NaN 8.0
2 NaN 5.0 9.0
3 NaN NaN 10.0
4 0.0 2.0 NaN
5 5.0 NaN NaN
6 NaN 5.0 NaN
7 NaN NaN NaN
m = df.isna()
g = m.ne(m.shift()).cumsum()
for c in df.columns:
    df[c] = df.groupby(g.groupby(c).cumcount())[c].ffill()
print (df)
0 1 2
0 1.0 2.0 3.0
1 4.0 2.0 8.0
2 1.0 5.0 9.0
3 4.0 5.0 10.0
4 0.0 2.0 3.0
5 5.0 2.0 8.0
6 0.0 5.0 9.0
7 5.0 5.0 10.0
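To see why this works, it may help to follow the intermediate steps on a single column. This is only a sketch: the names run_id and pos are illustrative, and the data is column 0 of the frame above.

```python
import numpy as np
import pandas as pd

# Column 0 of the example frame.
s = pd.Series([1, 4, np.nan, np.nan, 0, 5, np.nan, np.nan])

m = s.isna()
# A new run starts whenever the missing/non-missing state changes.
run_id = m.ne(m.shift()).cumsum()      # 1, 1, 2, 2, 3, 3, 4, 4
# Position of each row inside its run.
pos = s.groupby(run_id).cumcount()     # 0, 1, 0, 1, 0, 1, 0, 1
# Rows with the same position form a group, so the k-th missing row
# receives the k-th value of the preceding block.
filled = s.groupby(pos).ffill()
print(filled.tolist())
```

Note that a missing run longer than the block before it reaches further back, to an older row at the same position.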
EDIT: A new solution that repeats the non-missing values across the missing values that follow them, per group, where each group starts at the first non-missing value after a gap. numpy.tile is used here to generate the repeated sequences:
df = pd.DataFrame([[1, 2, 3], [4, None, 8], [None, 5, 9], [7,None,10],
[0, 2, None], [5, None, None], [None, 6, None], [None,8,None]
, [None,None,None], [None,None,None]])
print (df)
0 1 2
0 1.0 2.0 3.0
1 4.0 NaN 8.0
2 NaN 5.0 9.0
3 7.0 NaN 10.0
4 0.0 2.0 NaN
5 5.0 NaN NaN
6 NaN 6.0 NaN
7 NaN 8.0 NaN
8 NaN NaN NaN
9 NaN NaN NaN
g = (df.notna() & df.shift().isna()).cumsum()
def f(x):
    non_miss = x.dropna()
    return np.tile(non_miss, int(len(x) // len(non_miss) + 2))[:len(x)]
df = df.apply(lambda x: x.groupby(g[x.name]).transform(f))
print (df)
0 1 2
0 1.0 2.0 3.0
1 4.0 2.0 8.0
2 1.0 5.0 9.0
3 7.0 5.0 10.0
4 0.0 2.0 3.0
5 5.0 2.0 8.0
6 7.0 6.0 9.0
7 0.0 8.0 10.0
8 5.0 6.0 3.0
9 7.0 8.0 8.0
Forward fill only certain value
mask = (df.ffill() == 0)
should suffice for your use case.
First, df.ffill propagates the last valid observation forward, so rows following a 0 are filled with 0s and rows following a 1 are filled with 1s. Comparing the result to 0 selects only the rows filled with 0s, and that mask can then be used to get your final df.
Example (I added a 0 and a few NaNs to the end of your df):
>>> s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan, np.nan, 0, np.nan, np.nan, np.nan]
>>> df = pd.DataFrame(s, columns=["s"])
>>> df
s
0 NaN
1 0.0
2 NaN
3 NaN
4 1.0
5 NaN
6 NaN
7 0.0
8 NaN
9 1.0
10 NaN
11 NaN
12 0.0
13 NaN
14 NaN
15 NaN
>>>
>>>
>>> df[df.ffill() == 0] = 0
>>> df
s
0 NaN
1 0.0
2 0.0
3 0.0
4 1.0
5 NaN
6 NaN
7 0.0
8 0.0
9 1.0
10 NaN
11 NaN
12 0.0
13 0.0
14 0.0
15 0.0
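The same idea can be wrapped in a small helper for an arbitrary value. This is a sketch: ffill_only is a hypothetical name, not a pandas function.

```python
import numpy as np
import pandas as pd

def ffill_only(s, value):
    """Forward fill NaNs in s, but only where the carried value equals `value`."""
    # s.ffill() shows what a plain forward fill would put in each row;
    # rows where that equals `value` are overwritten with `value`.
    return s.mask(s.ffill().eq(value), value)

s = pd.Series([np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1])
out = ffill_only(s, 0)
print(out.tolist())
```

NaNs that follow a 1 are left untouched, as is the leading NaN, which has nothing before it to carry forward.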
Pandas forward fill, but only between equal values
If I understand correctly, what you want can be done like this. You want to fill the NaNs where backfill and forward fill give the same value.
ff = df.aux.ffill()
bf = df.aux.bfill()
df.aux = ff[ff == bf]
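A runnable sketch of the same idea, assuming a made-up aux column. It uses where instead of boolean indexing, which gives the same result here while keeping the index explicit:

```python
import numpy as np
import pandas as pd

# Assumed data: gaps between equal values (1..1, 2..2) and between
# different values (1..2, 2..3).
df = pd.DataFrame({"aux": [1, np.nan, 1, np.nan, 2, np.nan, np.nan, 2, np.nan, 3]})

ff = df["aux"].ffill()
bf = df["aux"].bfill()

# Keep the filled value only where both directions agree; elsewhere stay NaN.
df["aux"] = ff.where(ff == bf)
print(df["aux"].tolist())
```

Rows 3 and 8 remain NaN because the values on either side of those gaps differ.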
How to forward propagate/fill a specific value in a Pandas DataFrame Column/Series?
You can still use ffill, but first you have to mask the False values:
s.mask(~s).ffill(limit=2).fillna(s)
0 True
1 True
2 True
3 False
4 False
5 True
6 True
7 True
8 False
Name: 0, dtype: bool
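The input series is not shown above. Here is a sketch with an assumed boolean series chosen to reproduce that output. The final astype(bool) is an addition, since the masked series may be held as object dtype depending on the pandas version:

```python
import pandas as pd

# Assumed input: each True should be propagated to at most the next two rows.
s = pd.Series([True, False, False, False, False, True, False, False, False])

out = (
    s.mask(~s)          # False -> NaN, so only True values remain to be propagated
     .ffill(limit=2)    # carry each True forward over at most 2 rows
     .fillna(s)         # rows still NaN get their original (False) value back
     .astype(bool)
)
print(out.tolist())
```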
Pandas dataframe fillna() only some columns in place
You can select your desired columns and do it by assignment:
df[['a', 'b']] = df[['a','b']].fillna(value=0)
The resulting output is as expected:
a b c
0 1.0 4.0 NaN
1 2.0 5.0 NaN
2 3.0 0.0 7.0
3 0.0 6.0 8.0
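For reference, a self-contained sketch with an assumed input frame that reproduces the output above:

```python
import numpy as np
import pandas as pd

# Assumed input: NaNs in all three columns, but only a and b should be filled.
df = pd.DataFrame({
    "a": [1, 2, 3, np.nan],
    "b": [4, 5, np.nan, 6],
    "c": [np.nan, np.nan, 7, 8],
})

# Fill only the selected columns and assign the result back.
df[["a", "b"]] = df[["a", "b"]].fillna(value=0)
print(df)
```

Column c keeps its NaNs because it was never part of the selection.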
Pandas dataframe column forward fill from first non-zero value
Use the .values attribute:
df['c']=df.groupby('ID',as_index = False)['c'].apply(lambda x: x.replace(to_replace=0, method='ffill')).values
Now if you print df, you will get your desired output:
ID b c
0 1 0 0
1 1 5 1
2 1 8 1
3 2 4 0
4 2 8 1
5 2 81 1
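Recent pandas releases deprecate the method argument of replace, so the line above may warn or fail depending on your version. Here is one possible alternative, a sketch that assumes an input frame consistent with the output shown: mask the zeros, forward fill per ID, then restore the leading zeros.

```python
import pandas as pd

# Assumed input, chosen to match the output above.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "b": [0, 5, 8, 4, 8, 81],
    "c": [0, 1, 0, 0, 1, 0],
})

# Treat zeros as missing so that ffill can overwrite them.
c = df["c"].mask(df["c"].eq(0))

# Forward fill inside each ID; zeros before the first non-zero value
# have nothing to inherit, so they are set back to 0.
df["c"] = c.groupby(df["ID"]).ffill().fillna(0).astype(int)
print(df)
```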
Forward fill on custom value in pandas dataframe
You can use df.mask with df.isin and df.replace:
df.mask(df.isin(['*']),df.replace('*',np.nan).ffill())
a b
0 1.0 10
1 2.0 10
2 3.0 10
3 4.0 10
4 NaN 50
5 6.0 60
6 7.0 70
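A self-contained sketch, assuming an input frame in which '*' is the custom placeholder and column a contains a genuine NaN that should be left alone:

```python
import numpy as np
import pandas as pd

# Assumed input, chosen to match the output above.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, np.nan, 6, 7],
    "b": [10, "*", "*", "*", 50, 60, 70],
})

# Candidate values: '*' becomes NaN, then everything is forward filled.
filled = df.replace("*", np.nan).ffill()

# Take the candidate only where the original cell was '*'.
out = df.mask(df.isin(["*"]), filled)
print(out)
```

Because the mask only selects cells equal to '*', the real NaN in column a is not filled.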