How to Add Only Missing Dates in Dataframe

Fill missing dates in a pandas DataFrame

You could create a date range and use the "Fecha" column with set_index + reindex to add the missing months; fillna + reset_index then produces the desired output:

df['Fecha'] = pd.to_datetime(df['Fecha'])
df = (df.set_index('Fecha')
        .reindex(pd.date_range('2020-01-01', '2021-12-01', freq='MS'))
        .rename_axis('Fecha')
        .fillna(0)
        .reset_index())

Output:

         Fecha  unidades
0   2020-01-01       2.0
1   2020-02-01       0.0
2   2020-03-01       0.0
3   2020-04-01       0.0
4   2020-05-01       0.0
5   2020-06-01       0.0
6   2020-07-01       0.0
7   2020-08-01       0.0
8   2020-09-01       4.0
9   2020-10-01      11.0
10  2020-11-01       4.0
11  2020-12-01       2.0
12  2021-01-01       0.0
13  2021-02-01       0.0
14  2021-03-01       9.0
15  2021-04-01       2.0
16  2021-05-01       1.0
17  2021-06-01       0.0
18  2021-07-01       1.0
19  2021-08-01       0.0
20  2021-09-01       0.0
21  2021-10-01       0.0
22  2021-11-01       0.0
23  2021-12-01       0.0
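As a self-contained illustration, the same chain can be run end to end with a made-up, smaller input frame covering only 2020 (column names as above):

```python
import pandas as pd

# Hypothetical input: monthly sales with gaps
df = pd.DataFrame({'Fecha': ['2020-01-01', '2020-09-01', '2020-10-01'],
                   'unidades': [2, 4, 11]})
df['Fecha'] = pd.to_datetime(df['Fecha'])

# Reindex against a complete month-start range, filling gaps with 0
df = (df.set_index('Fecha')
        .reindex(pd.date_range('2020-01-01', '2020-12-01', freq='MS'))
        .rename_axis('Fecha')
        .fillna(0)
        .reset_index())
print(df)
```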

Dataframe: Add new rows for missing dates

You can use .reindex + .ffill():

min_date = df.index.min()
max_date = df.index.max()
date_list = pd.date_range(min_date, max_date, freq="D")

df = df.reindex(date_list).ffill()
print(df)

Prints:

              S&P500    Europe     Japan
2002-12-23  0.247683  0.245252  0.203916
2002-12-24  0.241855  0.237858  0.200971
2002-12-25  0.241855  0.237858  0.200971
2002-12-26  0.237095  0.230614  0.197621
2002-12-27  0.241104  0.250323  0.191855

Or pass the method= parameter directly to reindex:

df = df.reindex(date_list, method="ffill")
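A minimal runnable sketch of the method= variant, assuming a hypothetical two-row price series with a weekend gap:

```python
import pandas as pd

# Hypothetical price series: Friday and Monday, weekend missing
df = pd.DataFrame({'price': [1.0, 2.0]},
                  index=pd.to_datetime(['2023-01-06', '2023-01-09']))

date_list = pd.date_range(df.index.min(), df.index.max(), freq='D')

# method='ffill' fills the new rows during the reindex itself
out = df.reindex(date_list, method='ffill')
print(out)
```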

Fill in missing dates for a pandas dataframe with multiple series

Group by Item and Category, then generate a time series from the min to the max date:

result = (
    df.groupby(["Item", "Category"])["Date"]
      .apply(lambda s: pd.date_range(s.min(), s.max()))
      .explode()
      .reset_index()
)
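A self-contained sketch with a hypothetical three-row frame; the (x, a) group spans three days, so two of its dates are filled in:

```python
import pandas as pd

# Hypothetical observations per Item/Category with gaps in Date
df = pd.DataFrame({
    'Item': ['x', 'x', 'y'],
    'Category': ['a', 'a', 'b'],
    'Date': pd.to_datetime(['2021-01-01', '2021-01-03', '2021-01-01']),
})

# One full daily range per (Item, Category) group, exploded to rows
result = (
    df.groupby(['Item', 'Category'])['Date']
      .apply(lambda s: pd.date_range(s.min(), s.max()))
      .explode()
      .reset_index()
)
print(result)
```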

Including the missing dates in the date column of pandas dataframe for a specific timespan

Set Date as index and reindex it with df_date:

df_date = pd.date_range(start='1/1/2019', end='11/1/2020', freq='MS')
df = df.set_index('Date').reindex(df_date)

Output:

>>> df
                 Value
2019-01-01         NaN
2019-02-01         NaN
2019-03-01         NaN
2019-04-01         NaN
2019-05-01         NaN
2019-06-01         NaN
2019-07-01         NaN
2019-08-01         NaN
2019-09-01         NaN
2019-10-01  46486868.0
2019-11-01  36092742.0
2019-12-01  32839185.0
2020-01-01         NaN
2020-02-01         NaN
2020-03-01         NaN
2020-04-01         NaN
2020-05-01         NaN
2020-06-01         NaN
2020-07-01         NaN
2020-08-01         NaN
2020-09-01         NaN
2020-10-01         NaN
2020-11-01         NaN
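If you want zeros instead of NaN for the missing months, pass fill_value=0 to reindex directly (or chain .fillna(0)); a sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical monthly values for just two of the months in the span
df = pd.DataFrame({'Date': pd.to_datetime(['2019-10-01', '2019-11-01']),
                   'Value': [46486868, 36092742]})

df_date = pd.date_range(start='1/1/2019', end='11/1/2020', freq='MS')

# fill_value=0 turns the would-be NaN rows into zeros in one step
df_filled = df.set_index('Date').reindex(df_date, fill_value=0)
print(df_filled)
```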

Add missing dates to pandas dataframe

You could use Series.reindex:

import pandas as pd

idx = pd.date_range('09-01-2013', '09-30-2013')

s = pd.Series({'09-02-2013': 2,
               '09-03-2013': 10,
               '09-06-2013': 5,
               '09-07-2013': 1})
s.index = pd.DatetimeIndex(s.index)

s = s.reindex(idx, fill_value=0)
print(s)

yields

2013-09-01     0
2013-09-02     2
2013-09-03    10
2013-09-04     0
2013-09-05     0
2013-09-06     5
2013-09-07     1
2013-09-08     0
...

How to add only missing Dates in Dataframe

Here's a correction of your approach, in base R.

Replace max(t1$Date) by Sys.Date() in your real application:

t2 <- merge(data.frame(Date = as.Date(min(t1$Date):max(t1$Date), origin = "1970-01-01")),
            t1, by = "Date", all = TRUE)
t2[is.na(t2)] <- 0

#         Date Val1 Val2
# 1 2018-04-01  125 0.05
# 2 2018-04-02    0 0.00
# 3 2018-04-03  458 2.99
# 4 2018-04-04    0 0.00
# 5 2018-04-05  354 1.25

data

t1 <- read.table(text = "Date        Val1     Val2
'2018-04-01' 125 0.05
'2018-04-03' 458 2.99
'2018-04-05' 354 1.25", header = TRUE, stringsAsFactors = FALSE)
t1$Date <- as.Date(t1$Date)

Adding Missing Dates with 0 in Quantity in Python

To get all combinations of DATE, ITEMNUMBER, and LOCATION you can try:

import itertools

df2 = df.set_index(["DATE", "ITEMNUMBER", "LOCATION"])
df2 = df2.reindex(itertools.product(df['DATE'].unique(),
                                    df['ITEMNUMBER'].unique(),
                                    df['LOCATION'].unique())
                  ).fillna(0).reset_index()
df2

example input:

        DATE  ITEMNUMBER LOCATION  QUANTITY
0 2021-07-28           1        A         0
1 2021-07-28           2        B         1
2 2021-07-28           1        B         2
3 2021-07-29           1        A         3
4 2021-07-30           2        A         4

output:

         DATE  ITEMNUMBER LOCATION  QUANTITY
0  2021-07-28           1        A       0.0
1  2021-07-28           1        B       2.0
2  2021-07-28           2        A       0.0
3  2021-07-28           2        B       1.0
4  2021-07-29           1        A       3.0
5  2021-07-29           1        B       0.0
6  2021-07-29           2        A       0.0
7  2021-07-29           2        B       0.0
8  2021-07-30           1        A       0.0
9  2021-07-30           1        B       0.0
10 2021-07-30           2        A       4.0
11 2021-07-30           2        B       0.0
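The same pipeline, runnable end to end with the example input above (the product is wrapped in list() here only as a precaution across pandas versions):

```python
import itertools
import pandas as pd

# The example input from above
df = pd.DataFrame({
    'DATE': pd.to_datetime(['2021-07-28', '2021-07-28', '2021-07-28',
                            '2021-07-29', '2021-07-30']),
    'ITEMNUMBER': [1, 2, 1, 1, 2],
    'LOCATION': ['A', 'B', 'B', 'A', 'A'],
    'QUANTITY': [0, 1, 2, 3, 4],
})

# Reindex against every (DATE, ITEMNUMBER, LOCATION) combination
df2 = df.set_index(['DATE', 'ITEMNUMBER', 'LOCATION'])
df2 = df2.reindex(list(itertools.product(df['DATE'].unique(),
                                         df['ITEMNUMBER'].unique(),
                                         df['LOCATION'].unique()))
                  ).fillna(0).reset_index()
print(df2)
```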

Adding missing dates to dataframe using reindex replaces data

You can reindex with a MultiIndex.from_product built from the dates range and the time values:

df.date = pd.to_datetime(df.date)
dates = pd.date_range(start=df.date.min(), end=df.date.max())
print (dates)
DatetimeIndex(['2016-06-06', '2016-06-07', '2016-06-08', '2016-06-09',
               '2016-06-10', '2016-06-11', '2016-06-12', '2016-06-13',
               '2016-06-14', '2016-06-15',
               ...
               '2016-10-16', '2016-10-17', '2016-10-18', '2016-10-19',
               '2016-10-20', '2016-10-21', '2016-10-22', '2016-10-23',
               '2016-10-24', '2016-10-25'],
              dtype='datetime64[ns]', length=142, freq='D')

mux = pd.MultiIndex.from_product([dates,['morning','evening']])
#print (mux)

df.set_index(['date','time'], inplace=True)

print (df.reindex(mux, fill_value=0))
                         _updated_at       Name  hour  day   data1   data2
2016-06-06 morning                 0          0     0    0  0.0000  0.0000
           evening  06/06/2016 13:27  game_name    13    6  0.0000  0.0000
2016-06-07 morning                 0          0     0    0  0.0000  0.0000
           evening                 0          0     0    0  0.0000  0.0000
2016-06-08 morning                 0          0     0    0  0.0000  0.0000
           evening                 0          0     0    0  0.0000  0.0000
2016-06-09 morning                 0          0     0    0  0.0000  0.0000
           evening                 0          0     0    0  0.0000  0.0000
2016-06-10 morning                 0          0     0    0  0.0000  0.0000
           evening                 0          0     0    0  0.0000  0.0000
2016-06-11 morning                 0          0     0    0  0.0000  0.0000
           evening                 0          0     0    0  0.0000  0.0000
2016-06-12 morning                 0          0     0    0  0.0000  0.0000
           evening                 0          0     0    0  0.0000  0.0000
2016-06-13 morning                 0          0     0    0  0.0000  0.0000
...

Finally, you can group by the first level of the MultiIndex (the dates) and use DataFrameGroupBy.diff. Each date then gets one row of NaN, which can be removed with dropna:

print (df.reindex(mux, fill_value=0).groupby(level=0)[['data1', 'data2']].diff(-1).dropna())
                     data1   data2
2016-06-06 morning  0.0000  0.0000
2016-06-07 morning  0.0000  0.0000
2016-06-08 morning  0.0000  0.0000
2016-06-09 morning  0.0000  0.0000
2016-06-10 morning  0.0000  0.0000
2016-06-11 morning  0.0000  0.0000
2016-06-12 morning  0.0000  0.0000
2016-06-13 morning  0.0000  0.0000
2016-06-14 morning  0.0000  0.0000
2016-06-15 morning  0.0000  0.0000
2016-06-16 morning  0.0000  0.0000
2016-06-17 morning  0.0000  0.0000
2016-06-18 morning  0.0000  0.0000
2016-06-19 morning  0.0000  0.0000
2016-06-20 morning  0.0000  0.0000
2016-06-21 morning  0.0000  0.0000
...
...
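The per-date diff step can be sketched in isolation with a hypothetical two-day frame:

```python
import pandas as pd

# Hypothetical (date, time) frame: two dates, morning and evening rows
idx = pd.MultiIndex.from_product(
    [pd.to_datetime(['2016-06-06', '2016-06-07']), ['morning', 'evening']]
)
df = pd.DataFrame({'data1': [5.0, 2.0, 0.0, 0.0],
                   'data2': [7.0, 3.0, 0.0, 0.0]}, index=idx)

# Per-date morning - evening; the evening rows become NaN and are dropped
diff = df.groupby(level=0)[['data1', 'data2']].diff(-1).dropna()
print(diff)
```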

You can also select the first and second row of each group and subtract (.ix has been removed from pandas; use positional .iloc):

print (df.reindex(mux, fill_value=0)
         .groupby(level=0)
         .apply(lambda x: x.iloc[0][['data1', 'data2']] - x.iloc[1][['data1', 'data2']]))

               data1     data2
2016-06-06    0.0000    0.0000
2016-06-07    0.0000    0.0000
2016-06-08    0.0000    0.0000
2016-06-09    0.0000    0.0000
2016-06-10    0.0000    0.0000
2016-06-11    0.0000    0.0000
2016-06-12    0.0000    0.0000
2016-06-13    0.0000    0.0000
2016-06-14    0.0000    0.0000
2016-06-15    0.0000    0.0000
2016-06-16    0.0000    0.0000
2016-06-17    0.0000    0.0000
2016-06-18    0.0000    0.0000
2016-06-19    0.0000    0.0000
2016-06-20    0.0000    0.0000
2016-06-21    0.0000    0.0000
2016-06-22    0.0000    0.0000
2016-06-23    0.0000    0.0000
2016-06-24    0.0000    0.0000
2016-06-25    0.0000    0.0000
2016-06-26    0.0000    0.0000
2016-06-27    0.0000    0.0000
2016-06-28    0.0000    0.0000
2016-06-29    0.0000    0.0000
2016-06-30    0.0000    0.0000
2016-07-01    0.0000    0.0000
2016-07-02    0.0000    0.0000
2016-07-03    0.0000    0.0000
2016-07-04    0.0000    0.0000
2016-07-05    0.0000    0.0000
...              ...       ...
2016-09-26    0.0000    0.0000
2016-09-27    0.0000    0.0000
2016-09-28    0.0000    0.0000
2016-09-29    0.0000    0.0000
2016-09-30    0.0000    0.0000
2016-10-01    0.0000    0.0000
2016-10-02    0.0000    0.0000
2016-10-03    0.0000    0.0000
2016-10-04    0.0000    0.0000
2016-10-05    0.0000    0.0000
2016-10-06    0.0000    0.0000
2016-10-07    0.0000    0.0000
2016-10-08    0.0000    0.0000
2016-10-09    0.0000    0.0000
2016-10-10    0.0000    0.0000
2016-10-11    0.0000    0.0000
2016-10-12    0.0000    0.0000
2016-10-13    0.0000    0.0000
2016-10-14    0.0000    0.0000
2016-10-15    0.0000    0.0000
2016-10-16    0.0000    0.0000
2016-10-17    0.0000    0.0000
2016-10-18    0.0000    0.0000
2016-10-19    0.0000    0.0000
2016-10-20    0.0000    0.0000
2016-10-21    0.0000    0.0000
2016-10-22    0.0000    0.0000
2016-10-23    0.0000    0.0000
2016-10-24  313.5954  364.4107
2016-10-25  362.4682  431.5803

[142 rows x 2 columns]
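A stripped-down, runnable sketch of the from_product + reindex idea, using a hypothetical two-row frame over a three-day span:

```python
import pandas as pd

# Hypothetical frame: only two (date, time) rows exist over a 3-day span
df = pd.DataFrame({
    'date': pd.to_datetime(['2016-06-06', '2016-06-08']),
    'time': ['evening', 'morning'],
    'data1': [10.0, 20.0],
})

dates = pd.date_range(df['date'].min(), df['date'].max())
mux = pd.MultiIndex.from_product([dates, ['morning', 'evening']])

# Every date now has both a morning and an evening row
out = df.set_index(['date', 'time']).reindex(mux, fill_value=0)
print(out)
```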

Pandas fill missing dates and values simultaneously for each group

Let's try:

  1. Get the minimum date per group using groupby.min
  2. Add a column max to those per-group minimums, holding the overall maximum of Dt (Series.max)
  3. Create an individual date_range per group from its min to the max
  4. Series.explode the ranges into rows, giving a DataFrame that represents the new index
  5. Build a MultiIndex.from_frame to reindex the DataFrame with
  6. reindex with midx and set fill_value=0

# Get min per group
dates = mydf.groupby('Id')['Dt'].min().to_frame(name='min')
# Get max from Frame
dates['max'] = mydf['Dt'].max()

# Create MultiIndex with separate Date ranges per Group
midx = pd.MultiIndex.from_frame(
    dates.apply(
        lambda x: pd.date_range(x['min'], x['max'], freq='MS'), axis=1
    ).explode().reset_index(name='Dt')[['Dt', 'Id']]
)

# Reindex
mydf = (
    mydf.set_index(['Dt', 'Id'])
        .reindex(midx, fill_value=0)
        .reset_index()
)

mydf:

           Dt Id  Sales
0  2020-10-01  A     47
1  2020-11-01  A     67
2  2020-12-01  A     46
3  2021-01-01  A      0
4  2021-02-01  A      0
5  2021-03-01  A      0
6  2021-04-01  A      0
7  2021-05-01  A      0
8  2021-06-01  A      0
9  2021-03-01  B      2
10 2021-04-01  B     42
11 2021-05-01  B     20
12 2021-06-01  B      4

DataFrame:

import pandas as pd

mydf = pd.DataFrame({
    'Dt': ['2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01', '2020-10-01',
           '2020-11-01', '2020-12-01'],
    'Id': ['B', 'B', 'B', 'B', 'A', 'A', 'A'],
    'Sales': [2, 42, 20, 4, 47, 67, 46]
})
mydf['Dt'] = pd.to_datetime(mydf['Dt'])
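Putting the frame and the steps together into one runnable snippet:

```python
import pandas as pd

mydf = pd.DataFrame({
    'Dt': ['2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01',
           '2020-10-01', '2020-11-01', '2020-12-01'],
    'Id': ['B', 'B', 'B', 'B', 'A', 'A', 'A'],
    'Sales': [2, 42, 20, 4, 47, 67, 46],
})
mydf['Dt'] = pd.to_datetime(mydf['Dt'])

# Min date per group, shared max date across the whole frame
dates = mydf.groupby('Id')['Dt'].min().to_frame(name='min')
dates['max'] = mydf['Dt'].max()

# One month-start range per group, exploded into a (Dt, Id) MultiIndex
midx = pd.MultiIndex.from_frame(
    dates.apply(lambda x: pd.date_range(x['min'], x['max'], freq='MS'), axis=1)
         .explode()
         .reset_index(name='Dt')[['Dt', 'Id']]
)

mydf = (mydf.set_index(['Dt', 'Id'])
            .reindex(midx, fill_value=0)
            .reset_index())
print(mydf)
```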

Add Missing Dates for Groups in Pandas, using min/max Dates of the Group

This will do the trick (note that keep is keyword-only in current pandas):

df.drop_duplicates(
    ['Date', 'Company'], keep='last'
).groupby('Company').apply(
    lambda x: x.set_index('Date').asfreq('M', fill_value=0)
).drop('Company', axis=1).reset_index()

