Add missing dates to pandas dataframe
You could use Series.reindex:
import pandas as pd
idx = pd.date_range('09-01-2013', '09-30-2013')
s = pd.Series({'09-02-2013': 2,
'09-03-2013': 10,
'09-06-2013': 5,
'09-07-2013': 1})
s.index = pd.DatetimeIndex(s.index)
s = s.reindex(idx, fill_value=0)
print(s)
yields
2013-09-01 0
2013-09-02 2
2013-09-03 10
2013-09-04 0
2013-09-05 0
2013-09-06 5
2013-09-07 1
2013-09-08 0
...
Fill missing dates in a pandas DataFrame
You could create a date range, then use set_index + reindex on the "Fecha" column to add the missing months. After that, fillna + reset_index produces the desired outcome:
df['Fecha'] = pd.to_datetime(df['Fecha'])
df = (df.set_index('Fecha')
.reindex(pd.date_range('2020-01-01', '2021-12-01', freq='MS'))
.rename_axis(['Fecha'])
.fillna(0)
.reset_index())
Output:
Fecha unidades
0 2020-01-01 2.0
1 2020-02-01 0.0
2 2020-03-01 0.0
3 2020-04-01 0.0
4 2020-05-01 0.0
5 2020-06-01 0.0
6 2020-07-01 0.0
7 2020-08-01 0.0
8 2020-09-01 4.0
9 2020-10-01 11.0
10 2020-11-01 4.0
11 2020-12-01 2.0
12 2021-01-01 0.0
13 2021-02-01 0.0
14 2021-03-01 9.0
15 2021-04-01 2.0
16 2021-05-01 1.0
17 2021-06-01 0.0
18 2021-07-01 1.0
19 2021-08-01 0.0
20 2021-09-01 0.0
21 2021-10-01 0.0
22 2021-11-01 0.0
23 2021-12-01 0.0
Add missing dates to datetime column in Pandas using last value
Try this:
# If your date format is dayfirst, then use the following code
df['date (dd/mm/yyyy)'] = pd.to_datetime(df['date (dd/mm/yyyy)'], dayfirst=True)
out = df.set_index('date (dd/mm/yyyy)').asfreq('D', method='ffill').reset_index()
print(out)
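The answer above does not show its input, so here is a minimal, self-contained sketch of the same asfreq approach. The sample rows and the "value" column are hypothetical; only the date column name comes from the answer.

```python
import pandas as pd

# Hypothetical sample data; only the date column name comes from the answer above.
df = pd.DataFrame({
    "date (dd/mm/yyyy)": ["01/01/2022", "04/01/2022", "06/01/2022"],
    "value": [10, 20, 30],
})

# Parse day-first strings, so "04/01/2022" is 4 January, not 1 April.
df["date (dd/mm/yyyy)"] = pd.to_datetime(df["date (dd/mm/yyyy)"], dayfirst=True)

# asfreq("D") adds one row per missing calendar day between the first and
# last date; method="ffill" copies the last known row into each new one.
out = df.set_index("date (dd/mm/yyyy)").asfreq("D", method="ffill").reset_index()
print(out)
```

With this input, the gaps on 2, 3 and 5 January are filled with the value from the preceding day.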
Fill in missing dates for a pandas dataframe with multiple series
Group by Item and Category, then generate a time series from the min to the max date:
result = (
df.groupby(["Item", "Category"])["Date"]
.apply(lambda s: pd.date_range(s.min(), s.max()))
.explode()
.reset_index()
)
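The snippet above only builds the complete date grid per group. One possible way to bring the original values back is a left merge, sketched below. The sample data and the "Value" column are hypothetical, and the merge step is an assumption rather than part of the original answer.

```python
import pandas as pd

# Hypothetical sample data using the column names from the answer above.
df = pd.DataFrame({
    "Item": ["A", "A", "B"],
    "Category": ["x", "x", "y"],
    "Date": pd.to_datetime(["2022-01-01", "2022-01-03", "2022-01-02"]),
    "Value": [1, 3, 5],
})

# One row per (Item, Category, day) between each group's min and max date.
result = (
    df.groupby(["Item", "Category"])["Date"]
    .apply(lambda s: pd.date_range(s.min(), s.max()))
    .explode()
    .reset_index()
)

# explode() leaves an object column, so convert back to datetime before merging.
result["Date"] = pd.to_datetime(result["Date"])

# Left merge keeps every generated date; dates absent from df get NaN in "Value".
filled = result.merge(df, on=["Item", "Category", "Date"], how="left")
print(filled)
```

From here, fillna or a grouped ffill can replace the NaN values, depending on what the missing rows should contain.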
Including the missing dates in the date column of pandas dataframe for a specific timespan
Set Date as index and reindex it with df_date:
df_date = pd.date_range(start='1/1/2019', end='11/1/2020', freq='MS')
df = df.set_index('Date').reindex(df_date)
Output:
>>> df
Value
2019-01-01 NaN
2019-02-01 NaN
2019-03-01 NaN
2019-04-01 NaN
2019-05-01 NaN
2019-06-01 NaN
2019-07-01 NaN
2019-08-01 NaN
2019-09-01 NaN
2019-10-01 46486868.0
2019-11-01 36092742.0
2019-12-01 32839185.0
2020-01-01 NaN
2020-02-01 NaN
2020-03-01 NaN
2020-04-01 NaN
2020-05-01 NaN
2020-06-01 NaN
2020-07-01 NaN
2020-08-01 NaN
2020-09-01 NaN
2020-10-01 NaN
2020-11-01 NaN
Dataframe: Add new rows for missing dates
You can use .reindex + .ffill():
min_date = df.index.min()
max_date = df.index.max()
date_list = pd.date_range(min_date, max_date, freq="D")
df = df.reindex(date_list).ffill()
print(df)
Prints:
S&P500 Europe Japan
2002-12-23 0.247683 0.245252 0.203916
2002-12-24 0.241855 0.237858 0.200971
2002-12-25 0.241855 0.237858 0.200971
2002-12-26 0.237095 0.230614 0.197621
2002-12-27 0.241104 0.250323 0.191855
Or use the method= parameter of reindex:
df = df.reindex(date_list, method="ffill")
Add missing dates to pandas dataframe with zeros as values
In your first approach, you are reindexing a DatetimeIndex with a PeriodIndex (created by period_range). Using date_range instead of period_range works:
idx = pd.date_range(date_period, date_now)
df.index = pd.DatetimeIndex(df.date)
df.reindex(idx, fill_value=0)
# date quantity
#2022-08-13 0 0
#2022-08-14 0 0
#2022-08-15 0 0
#2022-08-16 0 0
#2022-08-17 2022-08-17 1
#2022-08-18 2022-08-18 2
#2022-08-19 2022-08-19 3
#2022-08-20 0 0
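Note that in the output above the date column itself is filled with 0 for the new rows, because it is still a regular column. A small variation, sketched here with hypothetical sample data and fixed start and end dates standing in for date_period and date_now, moves the column into the index with set_index so that fill_value only affects quantity.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame in the answer above.
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-08-17", "2022-08-18", "2022-08-19"]),
    "quantity": [1, 2, 3],
})

# Fixed bounds used here as stand-ins for date_period and date_now.
idx = pd.date_range("2022-08-13", "2022-08-20")

# set_index moves "date" out of the columns, so fill_value=0 only
# touches "quantity"; rename_axis restores the column name on reset_index.
out = (
    df.set_index("date")
    .reindex(idx, fill_value=0)
    .rename_axis("date")
    .reset_index()
)
print(out)
```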
How to add missing dates in pandas
Use DataFrame.reindex, which also works if you need custom start and end datetimes:
df = df.reindex(pd.date_range(start, end, freq ='D'))
Or use DataFrame.asfreq to add the missing datetimes between the existing data:
df = df.asfreq('d')
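To illustrate the difference between the two calls, here is a short sketch with hypothetical data. It assumes the frame already has a DatetimeIndex, as both snippets above do.

```python
import pandas as pd

# Hypothetical frame with a DatetimeIndex and a gap on 2022-01-03.
df = pd.DataFrame(
    {"value": [1.0, 2.0]},
    index=pd.to_datetime(["2022-01-02", "2022-01-04"]),
)

# asfreq only fills between the first and last existing timestamps.
between = df.asfreq("D")

# reindex can extend past the existing data to any start and end.
start, end = "2022-01-01", "2022-01-05"
full = df.reindex(pd.date_range(start, end, freq="D"))

print(between)
print(full)
```

Here asfreq yields three rows (2 to 4 January), while reindex yields five rows (1 to 5 January). New rows hold NaN in both cases.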
Filling missing dates on a DataFrame across different groups
Let's try it with pivot + date_range + reindex + stack:
# pivot requires keyword arguments in pandas 2.0 and later
tmp = df.pivot(index='date', columns='customer', values='attended')
tmp.index = pd.to_datetime(tmp.index)
out = (tmp.reindex(pd.date_range(tmp.index[0], tmp.index[-1]))
       .fillna(False)
       .stack()
       .reset_index()
       .rename(columns={0: 'attended'}))
Output:
level_0 customer attended
0 2022-01-01 John True
1 2022-01-01 Mark False
2 2022-01-02 John True
3 2022-01-02 Mark False
4 2022-01-03 John False
5 2022-01-03 Mark False
6 2022-01-04 John True
7 2022-01-04 Mark False
8 2022-01-05 John False
9 2022-01-05 Mark True
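As an alternative to the pivot approach, the same grid can be built by reindexing against the full product of dates and customers. This is a different technique from the one in the answer, sketched with hypothetical sample data chosen to match the output shown above.

```python
import pandas as pd

# Hypothetical sample data consistent with the output shown above.
df = pd.DataFrame({
    "date": ["2022-01-01", "2022-01-02", "2022-01-04", "2022-01-05"],
    "customer": ["John", "John", "John", "Mark"],
    "attended": [True, True, True, True],
})
df["date"] = pd.to_datetime(df["date"])

# Every (date, customer) pair between the first and last date.
full_index = pd.MultiIndex.from_product(
    [pd.date_range(df["date"].min(), df["date"].max()), df["customer"].unique()],
    names=["date", "customer"],
)

# Pairs missing from df are added with attended=False.
out = (
    df.set_index(["date", "customer"])
    .reindex(full_index, fill_value=False)
    .reset_index()
)
print(out)
```

Because the index levels are named, the date column comes back as "date" instead of "level_0". This requires each (date, customer) pair to appear at most once in the input.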
pandas fill missing dates in time series
You need to use period_range rather than date_range:
In [11]: idx = pd.period_range(min(df.date), max(df.date))
...: results.reindex(idx, fill_value=0)
...:
Out[11]:
f1 f2 f3 f4
2000-01-01 2.049157 1.962635 2.756154 2.224751
2000-01-02 2.675899 2.587217 1.540823 1.606150
2000-01-03 0.000000 0.000000 0.000000 0.000000
2000-01-04 0.000000 0.000000 0.000000 0.000000
2000-01-05 0.000000 0.000000 0.000000 0.000000
2000-01-06 0.000000 0.000000 0.000000 0.000000
2000-01-07 0.000000 0.000000 0.000000 0.000000
2000-01-08 0.000000 0.000000 0.000000 0.000000
2000-01-09 0.000000 0.000000 0.000000 0.000000
2000-01-10 0.000000 0.000000 0.000000 0.000000
2000-01-11 0.000000 0.000000 0.000000 0.000000
2000-01-12 0.000000 0.000000 0.000000 0.000000
2000-01-13 0.000000 0.000000 0.000000 0.000000
2000-01-14 0.000000 0.000000 0.000000 0.000000
2000-01-15 0.000000 0.000000 0.000000 0.000000
2000-01-16 0.000000 0.000000 0.000000 0.000000
2000-01-17 0.000000 0.000000 0.000000 0.000000
2000-01-18 0.000000 0.000000 0.000000 0.000000
2000-01-19 0.000000 0.000000 0.000000 0.000000
2000-01-20 0.000000 0.000000 0.000000 0.000000
2000-01-21 0.000000 0.000000 0.000000 0.000000
2000-01-22 0.000000 0.000000 0.000000 0.000000
2000-01-23 0.000000 0.000000 0.000000 0.000000
2000-01-24 0.000000 0.000000 0.000000 0.000000
2000-01-25 0.000000 0.000000 0.000000 0.000000
2000-01-26 0.000000 0.000000 0.000000 0.000000
2000-01-27 0.000000 0.000000 0.000000 0.000000
2000-01-28 0.000000 0.000000 0.000000 0.000000
2000-01-29 0.000000 0.000000 0.000000 0.000000
2000-01-30 0.000000 0.000000 0.000000 0.000000
2000-01-31 0.000000 0.000000 0.000000 0.000000
2000-02-01 0.000000 0.000000 0.000000 0.000000
2000-02-02 0.000000 0.000000 0.000000 0.000000
2000-02-03 0.000000 0.000000 0.000000 0.000000
2000-02-04 1.856158 2.892620 2.986166 2.793448
This is because your groupby uses PeriodIndex, rather than datetime:
df.groupby(pd.PeriodIndex(data=df.date, freq='D'))
You could have instead used a pd.Grouper:
df.groupby(pd.Grouper(key="date", freq='D'))
which would have given a DatetimeIndex.
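The difference in index type can be checked directly. The following is a minimal sketch with hypothetical data; the column names date and f1 follow the example above.

```python
import pandas as pd

# Hypothetical data; column names follow the example above.
df = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-01", "2000-01-01", "2000-01-04"]),
    "f1": [1.0, 2.0, 3.0],
})

# Grouping by a PeriodIndex gives a result indexed by periods,
# with one row per day that actually occurs in the data.
by_period = df.groupby(pd.PeriodIndex(data=df["date"], freq="D"))["f1"].mean()

# Grouping with pd.Grouper gives a DatetimeIndex with one bin per day
# between the first and last date; empty days have NaN as their mean.
by_grouper = df.groupby(pd.Grouper(key="date", freq="D"))["f1"].mean()

print(type(by_period.index).__name__, len(by_period))
print(type(by_grouper.index).__name__, len(by_grouper))
```

The index passed to reindex has to be of the same kind as the result's index, which is why period_range fits the first result and date_range fits the second.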