Pandas fill in missing date within each group with information in the previous row
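First, make sure the dt column has datetime dtype:
x['dt'] = pd.to_datetime(x['dt'])
Then drop duplicate (dt, sub_id) pairs, reindex each group to daily frequency, and forward-fill the gaps:
cols = ['dt', 'sub_id']
# downcast='infer' restores integer dtypes after the fill; the keyword is
# deprecated in pandas 2.2+, where you can drop it and cast with astype instead
pd.concat([
    d.asfreq('D').ffill(downcast='infer')
    for _, d in x.drop_duplicates(cols, keep='last')
                 .set_index('dt').groupby('sub_id')
]).reset_index()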
dt amount sub_id
0 2016-01-01 10 1
1 2016-01-02 10 1
2 2016-01-03 30 1
3 2016-01-04 40 1
4 2016-01-01 80 2
5 2016-01-02 80 2
6 2016-01-03 80 2
7 2016-01-04 82 2
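For reference, the same steps can be run end to end on a small frame. The input below is made up for illustration (the question's original data is not reproduced on this page), and the final astype call is just one way to get integer columns back without relying on the downcast argument:

```python
import pandas as pd

# Hypothetical input: one day is missing for sub_id 1 and two for sub_id 2.
x = pd.DataFrame({
    'dt': ['2016-01-01', '2016-01-03', '2016-01-04', '2016-01-01', '2016-01-04'],
    'amount': [10, 30, 40, 80, 82],
    'sub_id': [1, 1, 1, 2, 2],
})
x['dt'] = pd.to_datetime(x['dt'])

cols = ['dt', 'sub_id']
filled = pd.concat([
    d.asfreq('D').ffill()  # insert the missing days, then copy the previous row down
    for _, d in x.drop_duplicates(cols, keep='last')
                 .set_index('dt').groupby('sub_id')
]).reset_index()

# asfreq introduces NaN, which turns the integer columns into floats; cast them back.
filled = filled.astype({'amount': 'int64', 'sub_id': 'int64'})
print(filled)
```

Because each group is reindexed separately, a group is only filled between its own first and last date.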
Fill in missing date values and populate second column based on previous row
First make sure the Date column has datetime dtype; then you can use resample:
# resample
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
new_df = df.set_index('Date').resample('D').ffill().reset_index()
Output:
Date Rate
0 2019-01-01 1.12
1 2019-01-02 1.13
2 2019-01-03 1.12
3 2019-01-04 1.12
4 2019-01-05 1.12
5 2019-01-06 1.11
6 2019-01-07 1.13
7 2019-01-08 1.14
8 2019-01-09 1.13
9 2019-01-10 1.11
10 2019-01-11 1.11
11 2019-01-12 1.12
12 2019-01-13 1.13
13 2019-01-14 1.14
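A self-contained sketch of the same idea follows. The three input rows are hypothetical (the question's data is not shown here); they are written day-first to illustrate why dayfirst=True matters for ambiguous strings:

```python
import pandas as pd

# The same string is read differently depending on dayfirst.
month_first = pd.to_datetime('02/01/2019')               # 1 February 2019
day_first = pd.to_datetime('02/01/2019', dayfirst=True)  # 2 January 2019

# Hypothetical day-first input with 3 and 4 January missing.
df = pd.DataFrame({
    'Date': ['01/01/2019', '02/01/2019', '05/01/2019'],
    'Rate': [1.12, 1.13, 1.11],
})
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

# Upsample to daily frequency and carry the last known rate forward.
new_df = df.set_index('Date').resample('D').ffill().reset_index()
print(new_df)
```

The missing days take the rate of 2 January, the last row before the gap.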
Pandas filling missing dates and values within group
Initial Dataframe:
dt user val
0 2016-01-01 a 1
1 2016-01-02 a 33
2 2016-01-05 b 2
3 2016-01-06 b 1
First, convert the dates to datetime:
x['dt'] = pd.to_datetime(x['dt'])
Then, generate the dates and unique users:
dates = x.set_index('dt').resample('D').asfreq().index
>> DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
'2016-01-05', '2016-01-06'],
dtype='datetime64[ns]', name='dt', freq='D')
users = x['user'].unique()
>> array(['a', 'b'], dtype=object)
This will allow you to create a MultiIndex:
idx = pd.MultiIndex.from_product((dates, users), names=['dt', 'user'])
>> MultiIndex(levels=[[2016-01-01 00:00:00, 2016-01-02 00:00:00, 2016-01-03 00:00:00, 2016-01-04 00:00:00, 2016-01-05 00:00:00, 2016-01-06 00:00:00], ['a', 'b']],
labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]],
names=['dt', 'user'])
You can use that to reindex your DataFrame:
x.set_index(['dt', 'user']).reindex(idx, fill_value=0).reset_index()
Out:
dt user val
0 2016-01-01 a 1
1 2016-01-01 b 0
2 2016-01-02 a 33
3 2016-01-02 b 0
4 2016-01-03 a 0
5 2016-01-03 b 0
6 2016-01-04 a 0
7 2016-01-04 b 0
8 2016-01-05 a 0
9 2016-01-05 b 2
10 2016-01-06 a 0
11 2016-01-06 b 1
which can then be sorted by user and date:
x.set_index(['dt', 'user']).reindex(idx, fill_value=0).reset_index().sort_values(by=['user', 'dt'])
Out:
dt user val
0 2016-01-01 a 1
2 2016-01-02 a 33
4 2016-01-03 a 0
6 2016-01-04 a 0
8 2016-01-05 a 0
10 2016-01-06 a 0
1 2016-01-01 b 0
3 2016-01-02 b 0
5 2016-01-03 b 0
7 2016-01-04 b 0
9 2016-01-05 b 2
11 2016-01-06 b 1
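Note that fill_value=0 inserts zeros rather than earlier values. If the goal is instead to carry each user's previous value forward, a sketch along the following lines may work on the same frame. The fillna(0) for dates before a user's first record is an assumption about the desired result, not part of the answer above:

```python
import pandas as pd

x = pd.DataFrame({
    'dt': pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-05', '2016-01-06']),
    'user': ['a', 'a', 'b', 'b'],
    'val': [1, 33, 2, 1],
})

dates = pd.date_range(x['dt'].min(), x['dt'].max(), freq='D')
idx = pd.MultiIndex.from_product((dates, x['user'].unique()), names=['dt', 'user'])

out = (x.set_index(['dt', 'user'])
        .reindex(idx)  # missing (dt, user) pairs become NaN
        .reset_index()
        .sort_values(['user', 'dt'], ignore_index=True))

# Forward-fill within each user. Rows before a user's first record have no
# previous value, so they are set to 0 here (an assumption).
out['val'] = out.groupby('user')['val'].ffill().fillna(0).astype('int64')
print(out)
```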
Pandas - Filling missing dates within groups with different time ranges
Create a DatetimeIndex first, so that you can use groupby with a custom lambda function and Series.asfreq:
x['dt'] = pd.to_datetime(x['dt'])
x = (x.set_index('dt')
      .groupby('user')['val']
      .apply(lambda s: s.asfreq('MS', fill_value=0))
      .reset_index())
print(x)
user dt val
0 a 2015-01-01 1
1 a 2015-02-01 33
2 a 2015-03-01 0
3 a 2015-04-01 0
4 a 2015-05-01 4
5 a 2015-06-01 0
6 a 2015-07-01 2
7 a 2015-08-01 66
8 b 2016-01-01 2
9 b 2016-02-01 1
10 b 2016-03-01 0
11 b 2016-04-01 0
12 b 2016-05-01 5
13 b 2016-06-01 0
14 b 2016-07-01 0
15 b 2016-08-01 0
16 b 2016-09-01 1
17 c 2017-01-01 5
18 c 2017-02-01 0
19 c 2017-03-01 7
20 c 2017-04-01 0
21 c 2017-05-01 0
22 c 2017-06-01 0
23 c 2017-07-01 0
24 c 2017-08-01 5
Or use Series.reindex with the minimum and maximum datetimes of each group:
x = (x.set_index('dt')
      .groupby('user')['val']
      .apply(lambda s: s.reindex(pd.date_range(s.index.min(),
                                               s.index.max(), freq='MS'),
                                 fill_value=0))
      .rename_axis(('user', 'dt'))
      .reset_index())
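To see that the two variants agree, they can be compared on a small frame. The data below is hypothetical (the question's input is not reproduced here) and gives each user a different date range:

```python
import pandas as pd

# Hypothetical input: user 'a' spans Jan-Mar 2015, user 'b' spans Feb-May 2016.
x = pd.DataFrame({
    'dt': pd.to_datetime(['2015-01-01', '2015-03-01', '2016-02-01', '2016-05-01']),
    'user': ['a', 'a', 'b', 'b'],
    'val': [1, 4, 2, 5],
})

by_user = x.set_index('dt').groupby('user')['val']

# Variant 1: asfreq builds the month-start range from each group's own index.
via_asfreq = by_user.apply(lambda s: s.asfreq('MS', fill_value=0))

# Variant 2: build the same range explicitly and reindex onto it.
via_reindex = by_user.apply(
    lambda s: s.reindex(pd.date_range(s.index.min(), s.index.max(), freq='MS'),
                        fill_value=0)
)

print(via_asfreq)
print(via_reindex.rename_axis(('user', 'dt')))
```

Both results contain only the months between each user's first and last record, so no rows are created outside a user's own range.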
Fill in missing pandas data with previous non-missing value, grouped by key
You could perform a groupby/forward-fill operation on each group:
import numpy as np
import pandas as pd
df = pd.DataFrame({'id': [1,1,2,2,1,2,1,1], 'x':[10,20,100,200,np.nan,np.nan,300,np.nan]})
df['x'] = df.groupby(['id'])['x'].ffill()
print(df)
yields
id x
0 1 10.0
1 1 20.0
2 2 100.0
3 2 200.0
4 1 20.0
5 2 200.0
6 1 300.0
7 1 300.0
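Two properties of the grouped forward fill are worth checking on a toy frame: values never leak from one group into another, and the optional limit argument caps how many consecutive gaps are filled. The data and the x_filled column name below are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 2, 2],
    'x': [10, np.nan, np.nan, 20, np.nan, 5],
})

# Fill at most one consecutive missing value per group.
df['x_filled'] = df.groupby('id')['x'].ffill(limit=1)
print(df)
```

The second gap for id 1 stays missing because of limit=1, and the first row of id 2 stays missing because that group has no earlier value to copy.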
Fill missing date records by duplicating the previous date record in pandas
Or you can simply use reindex:
idx = pd.date_range(start='2015-02-20', end='2015-10-23', freq='D')
df = df.set_index(df.Date, drop=True)
(df.reindex(idx).ffill().sort_index(ascending=False)
   .drop(columns='Date').reset_index()
   .rename(columns={'index': 'Date'}))
Out[304]:
Date Value
0 2015-10-23 75%
1 2015-10-22 50%
2 2015-10-21 50%
3 2015-10-20 50%
4 2015-10-19 50%
5 2015-10-18 50%
6 2015-10-17 50%
7 2015-10-16 50%
8 2015-10-15 50%
9 2015-10-14 50%
10 2015-10-13 50%
11 2015-10-12 50%
12 2015-10-11 50%
13 2015-10-10 50%
14 2015-10-09 50%
15 2015-10-08 50%
16 2015-10-07 50%
17 2015-10-06 50%
18 2015-10-05 50%
19 2015-10-04 50%
20 2015-10-03 50%
21 2015-10-02 50%
22 2015-10-01 50%
23 2015-09-30 50%
24 2015-09-29 50%
25 2015-09-28 50%
26 2015-09-27 50%
27 2015-09-26 50%
28 2015-09-25 50%
29 2015-09-24 50%
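A compact, runnable variant of the same reindex-and-ffill idea is sketched below. The two input rows are hypothetical, and the date range is taken from the data itself instead of being hard-coded:

```python
import pandas as pd

# Hypothetical input, newest record first as in the output above.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2015-10-23', '2015-10-20']),
    'Value': ['75%', '50%'],
})

idx = pd.date_range(df['Date'].min(), df['Date'].max(), freq='D')

out = (df.set_index('Date')
         .sort_index()                 # ffill needs ascending dates
         .reindex(idx)                 # add the missing days as NaN rows
         .ffill()                      # copy the previous day's value down
         .sort_index(ascending=False)  # back to newest-first order
         .rename_axis('Date')
         .reset_index())
print(out)
```

Setting the index with set_index('Date') moves the column into the index, so there is no leftover Date column to drop afterwards.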