How to replace NaNs by preceding or next values in pandas DataFrame?
You could use the fillna method on the DataFrame and specify the method as ffill (forward fill):
>>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
>>> df.fillna(method='ffill')
     0    1    2
0  1.0  2.0  3.0
1  4.0  2.0  3.0
2  4.0  2.0  9.0
This method "propagate[s] last valid observation forward to next valid".
To go the opposite way, there's also a bfill method.
This method doesn't modify the DataFrame in place; you'll need to rebind the returned DataFrame to a variable or else specify inplace=True:
df.fillna(method='ffill', inplace=True)
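For comparison, here is a minimal sketch of the same forward and backward fills using the dedicated DataFrame.ffill() and DataFrame.bfill() methods. As far as I know, recent pandas releases deprecate the method= argument of fillna in favor of these, so they are probably the safer spelling on a current install. The variable names forward and backward are just illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# Forward fill: each NaN takes the last valid value above it in its column.
forward = df.ffill()

# Backward fill: each NaN takes the next valid value below it.
# A NaN with no valid value below it stays NaN.
backward = df.bfill()

print(forward)
print(backward)
```

Both calls return a new DataFrame and leave df untouched, so rebind the result if you want to keep it.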
Fill NaN values in dataframe with previous values in column
It can be done easily with the ffill method in pandas fillna.
To illustrate how it works, consider the following sample dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame()
df['Vals'] = [1, 2, 3, np.nan, np.nan, 6, 7, np.nan, 8]
   Vals
0   1.0
1   2.0
2   3.0
3   NaN
4   NaN
5   6.0
6   7.0
7   NaN
8   8.0
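To fill the missing values, assign the forward-filled column back (calling fillna with inplace=True on a column selection may not update df in recent pandas versions):
df['Vals'] = df['Vals'].fillna(method='ffill')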
   Vals
0   1.0
1   2.0
2   3.0
3   3.0
4   3.0
5   6.0
6   7.0
7   7.0
8   8.0
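A small sketch of the same column fill written with Series.ffill(), plus the limit parameter in case you only want to carry a value forward a bounded number of rows. The names filled and limited are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Vals': [1, 2, 3, np.nan, np.nan, 6, 7, np.nan, 8]})

# Forward fill the whole column; every NaN takes the last valid value.
filled = df['Vals'].ffill()

# limit=1 fills at most one consecutive NaN after each valid value,
# so the second NaN in the run at rows 3-4 is left as NaN.
limited = df['Vals'].ffill(limit=1)

print(filled.tolist())
print(limited.tolist())
```

To keep the result in the frame, assign it back, for example df['Vals'] = filled.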
How to replace NaNs by average of preceding and succeeding values in pandas DataFrame?
Try DataFrame.interpolate(). Example from the pandas docs:
In [65]: df = pd.DataFrame({'A': [1, 2.1, np.nan, 4.7, 5.6, 6.8],
   ....:                    'B': [.25, np.nan, np.nan, 4, 12.2, 14.4]})
   ....:
In [66]: df
Out[66]:
     A      B
0  1.0   0.25
1  2.1    NaN
2  NaN    NaN
3  4.7   4.00
4  5.6  12.20
5  6.8  14.40
In [67]: df.interpolate()
Out[67]:
     A      B
0  1.0   0.25
1  2.1   1.50
2  3.4   2.75
3  4.7   4.00
4  5.6  12.20
5  6.8  14.40
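Note that linear interpolation equals the average of the two neighbours only when a single value is missing. For longer gaps it produces evenly spaced steps instead. The sketch below contrasts the two on a made-up Series; if you literally want the mean of the preceding and succeeding valid values, combining ffill() and bfill() is one way to get it.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 9.0])

# Default method is 'linear': gaps are filled with evenly spaced values.
interpolated = s.interpolate()

# Mean of the last valid value before and the first valid value after each NaN.
# Where s is already valid, ffill and bfill agree, so the value is unchanged.
neighbour_mean = (s.ffill() + s.bfill()) / 2

print(interpolated.tolist())
print(neighbour_mean.tolist())
```

In the single gap at index 1 both approaches give 2.0; in the two-row gap, interpolate() gives 5.0 and 7.0 while the neighbour mean gives 6.0 for both rows.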
pandas DataFrame: replace NaN values with the previous value based on a key column
pd.concat with groupby and assign:
pd.concat([
    g.ffill().assign(d=lambda d: d.b.shift(), e=lambda d: d.d.cumsum())
    for _, g in df.groupby('key_value')
])
  key_value     a  b  c    d    e
0  value_01   1.0  1  x  NaN  NaN
1  value_01   1.0  2  x  1.0  1.0
2  value_01   1.0  3  x  2.0  3.0
3  value_02   7.0  4  y  NaN  NaN
4  value_02   7.0  5  y  4.0  4.0
5  value_02   7.0  6  y  5.0  9.0
6  value_03  19.0  7  z  NaN  NaN
groupby and apply:
def h(g):
    return g.ffill().assign(
        d=lambda d: d.b.shift(), e=lambda d: d.d.cumsum())
df.groupby('key_value', as_index=False, group_keys=False).apply(h)
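If only the per-key forward fill is needed (without the extra d and e columns), a shorter route is to call ffill() directly on the grouped column. The sketch below uses made-up data and hypothetical column names a_filled and a_naive to show why grouping matters: a plain ffill() lets values leak from one key into the next.

```python
import numpy as np
import pandas as pd

# Hypothetical data: some values of 'a' are missing within each key.
df = pd.DataFrame({
    'key_value': ['value_01', 'value_01', 'value_02', 'value_02', 'value_03'],
    'a': [1.0, np.nan, np.nan, 7.0, np.nan],
})

# Forward fill within each group; a NaN at the start of a group stays NaN.
df['a_filled'] = df.groupby('key_value')['a'].ffill()

# Plain forward fill, for contrast: value_01's 1.0 leaks into value_02.
df['a_naive'] = df['a'].ffill()

print(df)
```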
Fill in NaNs with previous values of a specific column in Python
A couple of ways:
In [3166]: df.apply(lambda x: x.fillna(df.close.shift())).ffill()
Out[3166]:
                          open       high        low      close
Timestamp
2014-01-07 13:18:00  874.67040  892.06753  874.67040  892.06753
2014-01-07 13:19:00  892.06753  892.06753  892.06753  892.06753
2014-01-07 13:20:00  892.06753  892.06753  892.06753  892.06753
2014-01-07 13:21:00  883.23085  883.23085  874.48165  874.48165
2014-01-07 13:22:00  874.48165  874.48165  874.48165  874.48165
In [3167]: df.fillna({c: df.close.shift() for c in df}).ffill()
Out[3167]:
                          open       high        low      close
Timestamp
2014-01-07 13:18:00  874.67040  892.06753  874.67040  892.06753
2014-01-07 13:19:00  892.06753  892.06753  892.06753  892.06753
2014-01-07 13:20:00  892.06753  892.06753  892.06753  892.06753
2014-01-07 13:21:00  883.23085  883.23085  874.48165  874.48165
2014-01-07 13:22:00  874.48165  874.48165  874.48165  874.48165
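To see what each step contributes, here is a self-contained sketch on made-up OHLC data (the prices and the names prev_close, step1 and result are illustrative). The fillna with the shifted close only repairs a row whose previous row has a valid close; the trailing ffill() then handles runs of consecutive empty rows.

```python
import numpy as np
import pandas as pd

# Hypothetical minute bars; 13:19 and 13:20 have no data.
df = pd.DataFrame(
    {
        'open': [10.0, np.nan, np.nan, 12.0],
        'high': [11.0, np.nan, np.nan, 13.0],
        'low': [9.0, np.nan, np.nan, 11.5],
        'close': [10.5, np.nan, np.nan, 12.5],
    },
    index=pd.date_range('2014-01-07 13:18', periods=4, freq='min'),
)
df.index.name = 'Timestamp'

# Previous row's close, aligned on the same index.
prev_close = df['close'].shift()

# Step 1: fill every column's NaNs with the previous close.
# Row 13:20 stays NaN because its previous close is itself NaN.
step1 = df.fillna({c: prev_close for c in df})

# Step 2: forward fill the rows that step 1 could not reach.
result = step1.ffill()

print(result)
```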
pandas: replace NaN with the last non-NaN value in column
You can do this using the fillna() method on the dataframe. The method='ffill' argument tells it to fill forward with the last valid value.
df.fillna(method='ffill')