Pandas Interpolate Within a Groupby

Pandas interpolate within a groupby

>>> df.groupby('filename').apply(lambda group: group.interpolate(method='index'))
    filename  val1  val2
t
1  file1.csv     5    10
2  file1.csv    10    15
3  file1.csv    15    20
6  file2.csv   NaN   NaN
7  file2.csv    10    20
8  file2.csv    12    15
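
For reference, a minimal sketch of an input frame consistent with that output (the index, column names and values are read off the snippet; the original data is assumed):

import numpy as np
import pandas as pd

# assumed reconstruction of the example frame, indexed by 't'
df = pd.DataFrame(
    {'filename': ['file1.csv'] * 3 + ['file2.csv'] * 3,
     'val1': [5, np.nan, 15, np.nan, 10, 12],
     'val2': [10, np.nan, 20, np.nan, 20, 15]},
    index=pd.Index([1, 2, 3, 6, 7, 8], name='t'))

# interpolate each file separately, using the index values as the x-axis;
# on newer pandas, pass group_keys=False to groupby to keep the flat index shown above
df.groupby('filename').apply(lambda group: group.interpolate(method='index'))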

How to interpolate missing values with groupby?

And based on what you need, pass method='spline':

df.groupby('state')['population'].apply(
    lambda x: x.interpolate(method="spline", order=1, limit_direction="both"))
0    100.0
1    150.0
2    200.0
3    250.0
4     50.0
5    125.0
6    200.0
7    275.0
Name: population, dtype: float64
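
A minimal sketch of data consistent with that output (the state labels and NaN positions are assumed); note that method="spline" requires SciPy:

import numpy as np
import pandas as pd

# assumed input: two states, each with gaps in the middle and at an edge
df = pd.DataFrame({
    'state': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'population': [100, np.nan, 200, np.nan, np.nan, 125, np.nan, 275],
})

# an order-1 spline is piecewise linear; limit_direction="both" also fills the
# leading/trailing NaNs by extrapolation
df.groupby('state')['population'].apply(
    lambda x: x.interpolate(method="spline", order=1, limit_direction="both"))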

Pandas dataframe groupby id and interpolate values

I believe you need DataFrame.groupby with DataFrame.resample and Resampler.interpolate:

# build a DatetimeIndex from the year column
df.index = pd.to_datetime(df['year'], format='%Y').rename('datetimes')

# resample each id to month start ('MS') and interpolate between the yearly values
df = (df.groupby('id')['value']
        .apply(lambda x: x.resample('MS').interpolate())
        .reset_index())
print(df)
      id  datetimes     value
0      1 2020-01-01  0.090000
1      1 2020-02-01  0.090083
2      1 2020-03-01  0.090167
3      1 2020-04-01  0.090250
4      1 2020-05-01  0.090333
..   ...        ...       ...
477    2 2039-09-01  0.109667
478    2 2039-10-01  0.109750
479    2 2039-11-01  0.109833
480    2 2039-12-01  0.109917
481    2 2040-01-01  0.110000

[482 rows x 3 columns]
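
The question's input frame is not shown; a sketch of annual data that reproduces output of this shape (the ids, years and the 0.09-0.11 ramp are assumptions read off the printed rows):

import numpy as np
import pandas as pd

# assumed input: one value per year and id, 2020 through 2040,
# rising linearly from 0.09 to 0.11
df = pd.DataFrame({
    'id': [1] * 21 + [2] * 21,
    'year': [str(y) for y in range(2020, 2041)] * 2,
    'value': np.tile(np.linspace(0.09, 0.11, 21), 2),
})

Running the snippet above on this frame yields 241 monthly rows per id, i.e. the 482 rows printed.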

Interpolate annual data for each group separately

Thanks to Henry Ecker, who answered my question in a comment on my previous post.

df['observations'] = (
    df['observations']
    .mask(df['observations'].eq(0))                      # replace 0 with NaN
    .groupby(df['station'])                              # group by station
    .transform(pd.Series.interpolate, method='linear')   # interpolate
)

He also suggested this post for more information.
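
A small, self-contained illustration of that chain (the station names and zero placements are made up):

import pandas as pd

# hypothetical data: zeros mark missing observations
df = pd.DataFrame({
    'station': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2'],
    'observations': [4.0, 0.0, 8.0, 10.0, 0.0, 30.0],
})

df['observations'] = (
    df['observations']
    .mask(df['observations'].eq(0))                      # 0 -> NaN
    .groupby(df['station'])                              # group by station
    .transform(pd.Series.interpolate, method='linear')   # interpolate per group
)
print(df)   # S1 gets 6.0 for its gap, S2 gets 20.0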

Groupby and interpolate in Pandas

I think your function should be chained onto the groupby object like this:

df = (df.set_index('year week')
        .groupby('Account Id')[cols_to_interpolate]
        .resample('D')
        .ffill()
        .interpolate() / 7)

The solution from the comments is different: interpolate is applied to each group separately:

df.groupby('Account Id').apply(interpolator)
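
The interpolator function itself is not shown in the question; a hypothetical per-group version that mirrors the chained approach above (reusing cols_to_interpolate from the question) might look like:

def interpolator(group):
    # hypothetical sketch: resample one account's rows to daily frequency,
    # forward-fill, interpolate, and divide by 7 as in the snippet above
    return (group.set_index('year week')[cols_to_interpolate]
                 .resample('D')
                 .ffill()
                 .interpolate() / 7)

df.groupby('Account Id').apply(interpolator)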

