How to Replace NaNs by Preceding or Next Values in Pandas DataFrame

How to replace NaNs by preceding or next values in pandas DataFrame?

You could use the fillna method on the DataFrame and specify the method as ffill (forward fill):

>>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
>>> df.fillna(method='ffill')
   0  1  2
0  1  2  3
1  4  2  3
2  4  2  9

This method "propagate[s] last valid observation forward to next valid".

To go the opposite way, there's also a bfill method.

This method doesn't modify the DataFrame in place; you'll need to rebind the returned DataFrame to a variable or else specify inplace=True:

df.fillna(method='ffill', inplace=True)
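
For reference, here is a minimal runnable sketch of both fill directions. It assumes a reasonably recent pandas, where the dedicated ffill() and bfill() methods exist and fillna(method=...) is deprecated (as of pandas 2.1). The names forward and backward are illustrative only.

```python
import numpy as np
import pandas as pd

# Same shape of data as above, with the gaps written as np.nan
df = pd.DataFrame([[1, 2, 3], [4, np.nan, np.nan], [np.nan, np.nan, 9]])

# Forward fill: each NaN takes the last valid value above it in its column
forward = df.ffill()

# Backward fill: each NaN takes the next valid value below it in its column
backward = df.bfill()

print(forward)
print(backward)
```

Note that a backward fill leaves trailing NaNs untouched when no later valid value exists in the column, just as a forward fill leaves leading NaNs untouched.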

Fill NaN values in dataframe with previous values in column

This can be done easily with the ffill method of pandas fillna.

To illustrate, consider the following sample DataFrame:

df = pd.DataFrame()

df['Vals'] = [1, 2, 3, np.nan, np.nan, 6, 7, np.nan, 8]

   Vals
0   1.0
1   2.0
2   3.0
3   NaN
4   NaN
5   6.0
6   7.0
7   NaN
8   8.0

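To fill the missing values, forward-fill the column and assign the result back (calling fillna with inplace=True on a column selection may not update the original DataFrame in newer pandas versions):

df['Vals'] = df['Vals'].ffill()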

   Vals
0   1.0
1   2.0
2   3.0
3   3.0
4   3.0
5   6.0
6   7.0
7   7.0
8   8.0
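
If long gaps should not be bridged entirely, the limit parameter of ffill caps how many consecutive NaNs are filled. The sketch below is not part of the original answer; it reuses the sample column above, and the name capped is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Vals": [1, 2, 3, np.nan, np.nan, 6, 7, np.nan, 8]})

# Fill at most one consecutive NaN after each valid value;
# the second NaN in the two-row gap (index 4) is left as NaN
capped = df["Vals"].ffill(limit=1)

print(capped)
```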

How to replace NaNs by average of preceding and succeeding values in pandas DataFrame?

Try DataFrame.interpolate(). Example from the pandas docs:

In [65]: df = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
   ....:                    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]})
   ....:

In [66]: df
Out[66]:
     A      B
0  1.0   0.25
1  2.1    NaN
2  NaN    NaN
3  4.7   4.00
4  5.6  12.20
5  6.8  14.40

In [67]: df.interpolate()
Out[67]:
     A      B
0  1.0   0.25
1  2.1   1.50
2  3.4   2.75
3  4.7   4.00
4  5.6  12.20
5  6.8  14.40
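
Note that linear interpolation equals the average of the two neighbours only when the gap is a single NaN (column A above); across a longer gap it spaces the values evenly (column B). If the plain average of the nearest valid values on either side is wanted for every NaN in the gap, one possible sketch, not taken from the original answer, is to average a forward fill and a backward fill. The name neighbour_mean is illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.25, np.nan, np.nan, 4.0, 12.2, 14.4])

# Linear interpolation: evenly spaced values across the gap
linear = s.interpolate()

# Mean of the nearest valid value before and after each position;
# positions that were already valid keep their value, since (x + x) / 2 == x
neighbour_mean = (s.ffill() + s.bfill()) / 2

print(linear)
print(neighbour_mean)
```

With this data, interpolate() fills the gap with 1.5 and 2.75, while the fill-average approach puts 2.125 in both positions.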

pandas DataFrame: replace NaN values with previous value based on a key column

pd.concat with groupby and assign

pd.concat([
    g.ffill().assign(d=lambda d: d.b.shift(), e=lambda d: d.d.cumsum())
    for _, g in df.groupby('key_value')
])

  key_value     a  b  c    d    e
0  value_01   1.0  1  x  NaN  NaN
1  value_01   1.0  2  x  1.0  1.0
2  value_01   1.0  3  x  2.0  3.0
3  value_02   7.0  4  y  NaN  NaN
4  value_02   7.0  5  y  4.0  4.0
5  value_02   7.0  6  y  5.0  9.0
6  value_03  19.0  7  z  NaN  NaN

groupby and apply

def h(g):
    return g.ffill().assign(
        d=lambda d: d.b.shift(), e=lambda d: d.d.cumsum())

df.groupby('key_value', as_index=False, group_keys=False).apply(h)
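
The input frame is not shown in the answer, so the following self-contained sketch uses a hypothetical df chosen to be consistent with the output above. It also swaps in a different technique from the two shown: group-wise ffill, shift and cumsum called directly on the groupby object, which avoids concat and apply.

```python
import numpy as np
import pandas as pd

# Hypothetical input, reconstructed to be consistent with the printed result
df = pd.DataFrame({
    "key_value": ["value_01"] * 3 + ["value_02"] * 3 + ["value_03"],
    "a": [1.0, np.nan, np.nan, 7.0, np.nan, np.nan, 19.0],
    "b": [1, 2, 3, 4, 5, 6, 7],
    "c": ["x", None, None, "y", None, None, "z"],
})

out = df.copy()

# Forward-fill a and c within each key, never across key boundaries
out[["a", "c"]] = df.groupby("key_value")[["a", "c"]].ffill()

# d: previous value of b within the same key (NaN on each key's first row)
out["d"] = out.groupby("key_value")["b"].shift()

# e: running total of d within each key (NaN rows stay NaN)
out["e"] = out.groupby("key_value")["d"].cumsum()

print(out)
```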

Fill in NaNs with previous values of a specific column in Python

A couple of ways:

In [3166]: df.apply(lambda x: x.fillna(df.close.shift())).ffill()
Out[3166]:
                          open       high        low      close
Timestamp
2014-01-07 13:18:00  874.67040  892.06753  874.67040  892.06753
2014-01-07 13:19:00  892.06753  892.06753  892.06753  892.06753
2014-01-07 13:20:00  892.06753  892.06753  892.06753  892.06753
2014-01-07 13:21:00  883.23085  883.23085  874.48165  874.48165
2014-01-07 13:22:00  874.48165  874.48165  874.48165  874.48165

In [3167]: df.fillna({c: df.close.shift() for c in df}).ffill()
Out[3167]:
                          open       high        low      close
Timestamp
2014-01-07 13:18:00  874.67040  892.06753  874.67040  892.06753
2014-01-07 13:19:00  892.06753  892.06753  892.06753  892.06753
2014-01-07 13:20:00  892.06753  892.06753  892.06753  892.06753
2014-01-07 13:21:00  883.23085  883.23085  874.48165  874.48165
2014-01-07 13:22:00  874.48165  874.48165  874.48165  874.48165
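
The original input is not shown, so here is a small self-contained sketch of the first approach on made-up data (the prices and the names prev_close and filled are hypothetical). Each NaN is first replaced by the previous row's close; rows whose previous close was itself missing are then handled by the trailing ffill().

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2014-01-07 13:18", periods=4, freq="min", name="Timestamp")
nan = np.nan

# Hypothetical bars: the two middle rows are entirely missing
df = pd.DataFrame({
    "open":  [10.0, nan, nan, 12.0],
    "high":  [11.0, nan, nan, 12.5],
    "low":   [9.0,  nan, nan, 11.0],
    "close": [10.5, nan, nan, 11.5],
}, index=idx)

# Close of the previous row, aligned on the same index
prev_close = df["close"].shift()

# Step 1: fill each column's NaNs with the previous close where one exists.
# Step 2: forward-fill whatever is still missing.
filled = df.apply(lambda col: col.fillna(prev_close)).ffill()

print(filled)
```

In this sketch, both missing rows end up with 10.5 in every column, the last known close.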

pandas: replace NaN with the last non-NaN value in column

You can do this using the fillna() method on the dataframe. The method='ffill' tells it to fill forward with the last valid value.

df.fillna(method='ffill')
