Pandas interpolate within a groupby
>>> df.groupby('filename').apply(lambda group: group.interpolate(method='index'))
    filename  val1  val2
t
1  file1.csv     5    10
2  file1.csv    10    15
3  file1.csv    15    20
6  file2.csv   NaN   NaN
7  file2.csv    10    20
8  file2.csv    12    15
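As a minimal, self-contained sketch of the same idea: the frame below is made-up data (the column `val1` and the index name `t` mirror the output above, the values are assumptions). It uses `transform` on a single column instead of `apply` on the whole frame, which is a substitution on my part to keep the result aligned with the original index. With `method='index'`, gaps are filled in proportion to the distance between index values, and each file is handled on its own.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two files, each with one missing value,
# indexed by an unevenly spaced time column "t".
df = pd.DataFrame(
    {
        "filename": ["file1.csv"] * 3 + ["file2.csv"] * 3,
        "val1": [5.0, np.nan, 15.0, 10.0, np.nan, 16.0],
    },
    index=pd.Index([1, 2, 5, 6, 7, 9], name="t"),
)

# Interpolate each file separately, weighting by the index values.
filled = df.groupby("filename")["val1"].transform(
    lambda s: s.interpolate(method="index")
)
print(filled)
```

In this sketch the gap at `t=2` sits one quarter of the way between `t=1` and `t=5`, so it is filled with 7.5 rather than the midpoint 10.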
How to interpolate missing values with groupby?
Depending on what you need, you can pass the spline method:
df.groupby('state')['population'].apply(lambda x: x.interpolate(method="spline", order=1, limit_direction="both"))
0 100.0
1 150.0
2 200.0
3 250.0
4 50.0
5 125.0
6 200.0
7 275.0
Name: population, dtype: float64
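Note that `method="spline"` needs SciPy to be installed. The sketch below swaps in `method="linear"` so it runs with pandas alone; the `state`/`population` values are invented for illustration. The behaviour at the edges differs from the spline call above: linear interpolation with `limit_direction="both"` does not extrapolate, it repeats the nearest valid value of the same group.

```python
import numpy as np
import pandas as pd

# Hypothetical data: group A is missing its first value,
# group B is missing its last value.
df = pd.DataFrame(
    {
        "state": ["A"] * 4 + ["B"] * 4,
        "population": [np.nan, 150.0, np.nan, 250.0,
                       50.0, np.nan, 200.0, np.nan],
    }
)

# Fill gaps per state; "both" also fills leading and trailing NaNs.
filled = df.groupby("state")["population"].transform(
    lambda s: s.interpolate(method="linear", limit_direction="both")
)
print(filled)
```

If you need the edge values to follow the trend instead of being repeated, the spline variant shown earlier is the one to use.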
Pandas dataframe groupby id and interpolate values
I believe you need DataFrame.groupby with DataFrame.resample and Resampler.interpolate:
# resample needs a DatetimeIndex, so build one from the year column
df.index = pd.to_datetime(df['year'], format='%Y').rename('datetimes')
df = (df.groupby('id')['value']
        .apply(lambda x: x.resample('MS').interpolate())
        .reset_index())
print(df)
     id  datetimes     value
0     1 2020-01-01  0.090000
1     1 2020-02-01  0.090083
2     1 2020-03-01  0.090167
3     1 2020-04-01  0.090250
4     1 2020-05-01  0.090333
..   ..        ...       ...
477   2 2039-09-01  0.109667
478   2 2039-10-01  0.109750
479   2 2039-11-01  0.109833
480   2 2039-12-01  0.109917
481   2 2040-01-01  0.110000

[482 rows x 3 columns]
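Since the original input frame is not shown, here is a small runnable sketch of the same pattern on assumed data: two ids with one value per year, upsampled to month starts and filled linearly within each id. The column names follow the answer above; the numbers are made up.

```python
import pandas as pd

# Hypothetical input: one value per id and year.
df = pd.DataFrame(
    {
        "id": [1, 1, 2, 2],
        "year": [2020, 2021, 2020, 2021],
        "value": [0.0, 12.0, 100.0, 106.0],
    }
)

# resample() requires a DatetimeIndex.
df.index = pd.to_datetime(df["year"], format="%Y").rename("datetimes")

# Upsample each id to monthly frequency, then interpolate the new gaps.
out = (
    df.groupby("id")["value"]
      .apply(lambda x: x.resample("MS").interpolate())
      .reset_index()
)
print(out)
```

Each id goes from 2 yearly rows to 13 monthly rows (January 2020 through January 2021), and values never blend across ids because the resampling happens inside each group.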
Interpolate annual data for each group separately
Thanks to Henry Ecker, who answered my question in a comment on my previous post.
df['observations'] = (
    df['observations']
    .mask(df['observations'].eq(0))  # Replace 0 with NaN
    .groupby(df['station'])  # Group by station
    .transform(pd.Series.interpolate, method='linear')  # Interpolate within each station
)
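To see why the grouping matters, here is a small sketch on invented data (the station names and readings are assumptions). Zeros are treated as missing, and station B starts with a zero: with the groupby it stays NaN, because there is no earlier reading from the same station to interpolate from.

```python
import pandas as pd

# Hypothetical readings; 0 marks a missing observation.
df = pd.DataFrame(
    {
        "station": ["A", "A", "A", "B", "B", "B", "B"],
        "observations": [1.0, 0.0, 3.0, 0.0, 10.0, 0.0, 30.0],
    }
)

df["observations"] = (
    df["observations"]
    .mask(df["observations"].eq(0))      # 0 -> NaN
    .groupby(df["station"])              # keep stations separate
    .transform(pd.Series.interpolate, method="linear")
)
print(df)
```

Without the groupby, the first reading of station B would have been filled from the last reading of station A, which is usually not what you want.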
He also suggested this post for more information: Groupby and interpolate in Pandas
I think your function should be chained with the groupby object, like:
df = (df.set_index('year week')
        .groupby('Account Id')[cols_to_interpolate]
        .resample('D')
        .ffill()
        .interpolate() / 7)
The solution from the comments is different: the interpolation is applied to each group separately:
df.groupby('Account Id').apply(interpolator)
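The body of `interpolator` is not shown in the thread, so the version below is purely a hypothetical stand-in: it upsamples one account's rows to daily frequency and fills the gaps linearly. The `date` and `spend` columns are also assumptions. Selecting the value columns before `apply` keeps the grouping column out of the frame passed to the function.

```python
import pandas as pd

def interpolator(group):
    # Hypothetical helper: upsample one account's rows to daily
    # frequency and linearly fill the newly created gaps.
    return group.resample("D").interpolate()

# Made-up data with a DatetimeIndex, as resample() requires.
df = pd.DataFrame(
    {
        "Account Id": [1, 1, 2, 2],
        "date": pd.to_datetime(
            ["2021-01-01", "2021-01-04", "2021-01-01", "2021-01-03"]
        ),
        "spend": [0.0, 3.0, 10.0, 20.0],
    }
).set_index("date")

# Run the helper once per account.
out = df.groupby("Account Id")[["spend"]].apply(interpolator)
print(out)
```

The result is indexed by account and date, with one row per day between each account's first and last observation.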