Pandas Fill in Missing Date Within Each Group With Information in the Previous Row

Pandas fill in missing date within each group with information in the previous row

Getting the date right of course:

x.dt = pd.to_datetime(x.dt)

Then this:

cols = ['dt', 'sub_id']

pd.concat([
d.asfreq('D').ffill(downcast='infer')
for _, d in x.drop_duplicates(cols, keep='last')
.set_index('dt').groupby('sub_id')
]).reset_index()

dt amount sub_id
0 2016-01-01 10 1
1 2016-01-02 10 1
2 2016-01-03 30 1
3 2016-01-04 40 1
4 2016-01-01 80 2
5 2016-01-02 80 2
6 2016-01-03 80 2
7 2016-01-04 82 2

Fill in missing date values and populate second column based on previous row

First you need to make sure your date is datetime type, and you can use resample:

# resample
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

new_df = df.set_index('Date').resample('D').ffill().reset_index()

Output:

         Date  Rate
0 2019-01-01 1.12
1 2019-01-02 1.13
2 2019-01-03 1.12
3 2019-01-04 1.12
4 2019-01-05 1.12
5 2019-01-06 1.11
6 2019-01-07 1.13
7 2019-01-08 1.14
8 2019-01-09 1.13
9 2019-01-10 1.11
10 2019-01-11 1.11
11 2019-01-12 1.12
12 2019-01-13 1.13
13 2019-01-14 1.14

Pandas filling missing dates and values within group

Initial Dataframe:

            dt  user    val
0 2016-01-01 a 1
1 2016-01-02 a 33
2 2016-01-05 b 2
3 2016-01-06 b 1

First, convert the dates to datetime:

x['dt'] = pd.to_datetime(x['dt'])

Then, generate the dates and unique users:

dates = x.set_index('dt').resample('D').asfreq().index

>> DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
'2016-01-05', '2016-01-06'],
dtype='datetime64[ns]', name='dt', freq='D')

users = x['user'].unique()

>> array(['a', 'b'], dtype=object)

This will allow you to create a MultiIndex:

idx = pd.MultiIndex.from_product((dates, users), names=['dt', 'user'])

>> MultiIndex(levels=[[2016-01-01 00:00:00, 2016-01-02 00:00:00, 2016-01-03 00:00:00, 2016-01-04 00:00:00, 2016-01-05 00:00:00, 2016-01-06 00:00:00], ['a', 'b']],
labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5], [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]],
names=['dt', 'user'])

You can use that to reindex your DataFrame:

x.set_index(['dt', 'user']).reindex(idx, fill_value=0).reset_index()
Out:
dt user val
0 2016-01-01 a 1
1 2016-01-01 b 0
2 2016-01-02 a 33
3 2016-01-02 b 0
4 2016-01-03 a 0
5 2016-01-03 b 0
6 2016-01-04 a 0
7 2016-01-04 b 0
8 2016-01-05 a 0
9 2016-01-05 b 2
10 2016-01-06 a 0
11 2016-01-06 b 1

which then can be sorted by users:

x.set_index(['dt', 'user']).reindex(idx, fill_value=0).reset_index().sort_values(by='user')
Out:
dt user val
0 2016-01-01 a 1
2 2016-01-02 a 33
4 2016-01-03 a 0
6 2016-01-04 a 0
8 2016-01-05 a 0
10 2016-01-06 a 0
1 2016-01-01 b 0
3 2016-01-02 b 0
5 2016-01-03 b 0
7 2016-01-04 b 0
9 2016-01-05 b 2
11 2016-01-06 b 1

Pandas - Filling missing dates within groups with different time ranges

Create DatetimeIndex, so possible use groupby with custom lambda function and Series.asfreq:

x['dt'] = pd.to_datetime(x['dt'])
x = (x.set_index('dt')
.groupby('user')['val']
.apply(lambda x: x.asfreq('MS', fill_value=0))
.reset_index())
print (x)
user dt val
0 a 2015-01-01 1
1 a 2015-02-01 33
2 a 2015-03-01 0
3 a 2015-04-01 0
4 a 2015-05-01 4
5 a 2015-06-01 0
6 a 2015-07-01 2
7 a 2015-08-01 66
8 b 2016-01-01 2
9 b 2016-02-01 1
10 b 2016-03-01 0
11 b 2016-04-01 0
12 b 2016-05-01 5
13 b 2016-06-01 0
14 b 2016-07-01 0
15 b 2016-08-01 0
16 b 2016-09-01 1
17 c 2017-01-01 5
18 c 2017-02-01 0
19 c 2017-03-01 7
20 c 2017-04-01 0
21 c 2017-05-01 0
22 c 2017-06-01 0
23 c 2017-07-01 0
24 c 2017-08-01 5

Or use Series.reindex with min and max datetimes per groups:

x = (x.set_index('dt')
.groupby('user')['val']
.apply(lambda x: x.reindex(pd.date_range(x.index.min(),
x.index.max(), freq='MS'), fill_value=0))
.rename_axis(('user','dt'))
.reset_index())

Fill in missing pandas data with previous non-missing value, grouped by key

You could perform a groupby/forward-fill operation on each group:

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1,1,2,2,1,2,1,1], 'x':[10,20,100,200,np.nan,np.nan,300,np.nan]})
df['x'] = df.groupby(['id'])['x'].ffill()
print(df)

yields

   id      x
0 1 10.0
1 1 20.0
2 2 100.0
3 2 200.0
4 1 20.0
5 2 200.0
6 1 300.0
7 1 300.0

Fill missing date record by duplication former date record in pandas

Or you can simply using reindex

idx=pd.date_range(start='2015-02-20',end='2015-10-23', freq='D')
df=df.set_index(df.Date,drop=True)
df.reindex(idx).ffill().sort_index(ascending=False).drop('Date',1).reset_index().\
rename(columns={'index':'Date'})


Out[304]:
Date Value
0 2015-10-23 75%
1 2015-10-22 50%
2 2015-10-21 50%
3 2015-10-20 50%
4 2015-10-19 50%
5 2015-10-18 50%
6 2015-10-17 50%
7 2015-10-16 50%
8 2015-10-15 50%
9 2015-10-14 50%
10 2015-10-13 50%
11 2015-10-12 50%
12 2015-10-11 50%
13 2015-10-10 50%
14 2015-10-09 50%
15 2015-10-08 50%
16 2015-10-07 50%
17 2015-10-06 50%
18 2015-10-05 50%
19 2015-10-04 50%
20 2015-10-03 50%
21 2015-10-02 50%
22 2015-10-01 50%
23 2015-09-30 50%
24 2015-09-29 50%
25 2015-09-28 50%
26 2015-09-27 50%
27 2015-09-26 50%
28 2015-09-25 50%
29 2015-09-24 50%


Related Topics



Leave a reply



Submit