Fill Missing Dates by Group

Fill missing dates by group

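`tidyr::complete()` turns implicitly missing rows into explicit ones. Pass id and date as the columns (...) to expand, and every id/date combination absent from dat is added with NA in the remaining columns.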

library(tidyverse)

complete(dat, id, date)


# A tibble: 16 x 3
      id date       value
   <dbl> <date>     <dbl>
 1  1.00 2017-01-01  30.0
 2  1.00 2017-02-01  30.0
 3  1.00 2017-03-01  NA  
 4  1.00 2017-04-01  25.0
 5  2.00 2017-01-01  NA  
 6  2.00 2017-02-01  25.0
 7  2.00 2017-03-01  NA  
 8  2.00 2017-04-01  NA  
 9  3.00 2017-01-01  25.0
10  3.00 2017-02-01  25.0
11  3.00 2017-03-01  25.0
12  3.00 2017-04-01  NA  
13  4.00 2017-01-01  20.0
14  4.00 2017-02-01  20.0
15  4.00 2017-03-01  NA  
16  4.00 2017-04-01  20.0
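Outside R, the expansion that `complete(dat, id, date)` performs can be sketched in plain Python. This is a minimal illustration under stated assumptions, not tidyr's implementation: rows are dicts, dates are ISO-8601 strings (so sorting them is chronological), and the name `complete` is simply borrowed for readability. Missing combinations get None where tidyr puts NA.

```python
from itertools import product


def complete(rows, *keys):
    """Return one row for every combination of the distinct values of `keys`.

    Combinations absent from `rows` are added with None in all other
    columns, mimicking the NA that tidyr::complete() inserts.
    """
    if not rows:
        return []
    # Distinct, sorted values of each key column
    levels = [sorted({row[k] for row in rows}) for k in keys]
    other_cols = [c for c in rows[0] if c not in keys]
    # Look up existing rows by their key tuple
    seen = {tuple(row[k] for k in keys): row for row in rows}
    out = []
    for combo in product(*levels):
        row = seen.get(combo)
        if row is None:
            # Missing combination: keys filled in, everything else None
            row = {**dict(zip(keys, combo)), **dict.fromkeys(other_cols)}
        out.append(row)
    return out


# Hypothetical sample data
dat = [
    {"id": 1, "date": "2017-01-01", "value": 30},
    {"id": 1, "date": "2017-02-01", "value": 30},
    {"id": 2, "date": "2017-02-01", "value": 25},
]
for row in complete(dat, "id", "date"):
    print(row)
```

Here id 2 has no 2017-01-01 row, so one is added with value None.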

Pandas fill missing dates and values simultaneously for each group

Let's try:

Get the minimum date per group using groupby.min.
Add a column called max to the aggregated mins, holding the overall maximum of Dt across the whole frame (Series.max), so every group runs to the same end date.
Create an individual date_range per group from its min and max values.
Use Series.explode to turn the ranges into rows, giving a DataFrame that represents the new index.
Build a MultiIndex with MultiIndex.from_frame to reindex the DataFrame with.
reindex with midx and set fill_value=0.

# Get Min Per Group
dates = mydf.groupby('Id')['Dt'].min().to_frame(name='min')
# Get max from Frame
dates['max'] = mydf['Dt'].max()

# Create MultiIndex with separate Date ranges per Group
midx = pd.MultiIndex.from_frame(
    dates.apply(
        lambda x: pd.date_range(x['min'], x['max'], freq='MS'), axis=1
    ).explode().reset_index(name='Dt')[['Dt', 'Id']]
)

# Reindex
mydf = (
    mydf.set_index(['Dt', 'Id'])
        .reindex(midx, fill_value=0)
        .reset_index()
)

mydf:

           Dt Id  Sales
0  2020-10-01  A     47
1  2020-11-01  A     67
2  2020-12-01  A     46
3  2021-01-01  A      0
4  2021-02-01  A      0
5  2021-03-01  A      0
6  2021-04-01  A      0
7  2021-05-01  A      0
8  2021-06-01  A      0
9  2021-03-01  B      2
10 2021-04-01  B     42
11 2021-05-01  B     20
12 2021-06-01  B      4

DataFrame:

import pandas as pd

mydf = pd.DataFrame({
    'Dt': ['2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01', '2020-10-01',
           '2020-11-01', '2020-12-01'],
    'Id': ['B', 'B', 'B', 'B', 'A', 'A', 'A'],
    'Sales': [2, 42, 20, 4, 47, 67, 46]
})
mydf['Dt'] = pd.to_datetime(mydf['Dt'])
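A dependency-free sketch of the same steps (per-group minimum, global maximum, monthly range, fill with 0) may help if you want to check the logic without pandas. It assumes every date is already the first of a month, as with freq='MS' above, and stores rows as (Dt, Id, Sales) tuples; the function names are illustrative only.

```python
from datetime import date


def month_starts(start, end):
    """Yield the first day of each month from start's month through end."""
    y, m = start.year, start.month
    while date(y, m, 1) <= end:
        yield date(y, m, 1)
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)


def fill_monthly(rows, fill=0):
    """rows: (Dt, Id, Sales) tuples, Dt being a month-start date.

    Each Id runs from its own earliest month to the latest month in the
    whole data set; months without a row get `fill`.
    """
    global_max = max(dt for dt, _, _ in rows)
    group_min = {}
    for dt, gid, _ in rows:
        group_min[gid] = min(dt, group_min.get(gid, dt))
    sales = {(dt, gid): s for dt, gid, s in rows}
    return [
        (dt, gid, sales.get((dt, gid), fill))
        for gid in sorted(group_min)
        for dt in month_starts(group_min[gid], global_max)
    ]


rows = [
    (date(2021, 3, 1), "B", 2), (date(2021, 4, 1), "B", 42),
    (date(2021, 5, 1), "B", 20), (date(2021, 6, 1), "B", 4),
    (date(2020, 10, 1), "A", 47), (date(2020, 11, 1), "A", 67),
    (date(2020, 12, 1), "A", 46),
]
for r in fill_monthly(rows):
    print(r)
```

With this sample data it yields the same 13 rows as the pandas output above.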

Fill missing dates in 2 levels of groups in pandas

Use GroupBy.apply with a lambda function calling Series.asfreq:

df['date'] = pd.to_datetime(df['date'])


df = (df.set_index('date')
        .groupby(['country','county'])['sales']
        .apply(lambda x: x.asfreq('d', fill_value=0))
        .reset_index()
        [['date','country','county','sales']])
print (df)
        date country county  sales
0 2021-01-01       a      c      1
1 2021-01-02       a      c      2
2 2021-01-01       a      d      1
3 2021-01-02       a      d      0
4 2021-01-03       a      d     45
5 2021-01-01       b      e      2
6 2021-01-02       b      e    341
7 2021-01-05       b      f     14
8 2021-01-06       b      f      0
9 2021-01-07       b      f     25
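For comparison, here is a plain-Python sketch of what asfreq('d', fill_value=0) does inside each (country, county) group: every group is filled daily between its own first and last date. It assumes (date, country, county, sales) tuples with one row per day within a group (the dict would silently keep the last value for a repeated date). The sample rows are the non-zero rows of the output above.

```python
from collections import defaultdict
from datetime import date, timedelta


def fill_daily_by_group(rows, fill=0):
    """rows: (date, country, county, sales) tuples.

    Each (country, county) group is filled day by day between its own
    first and last date; days without a row get `fill`.
    """
    groups = defaultdict(dict)
    for day, country, county, sales in rows:
        groups[(country, county)][day] = sales  # assumes one row per day
    out = []
    for key in sorted(groups):
        series = groups[key]
        day, last = min(series), max(series)
        while day <= last:
            out.append((day, *key, series.get(day, fill)))
            day += timedelta(days=1)
    return out


rows = [
    (date(2021, 1, 1), "a", "c", 1), (date(2021, 1, 2), "a", "c", 2),
    (date(2021, 1, 1), "a", "d", 1), (date(2021, 1, 3), "a", "d", 45),
    (date(2021, 1, 1), "b", "e", 2), (date(2021, 1, 2), "b", "e", 341),
    (date(2021, 1, 5), "b", "f", 14), (date(2021, 1, 7), "b", "f", 25),
]
for r in fill_daily_by_group(rows):
    print(r)
```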

Fill missing dates in group and convert data to weekly

The code in the question runs as-is with the latest version of pandas, so update your pandas version and try again.

(It's good code, by the way!)

Filling missing dates on a DataFrame across different groups

Let's try it with pivot + date_range + reindex + stack:

# pivot takes keyword arguments only in pandas 2.x
tmp = df.pivot(index='date', columns='customer', values='attended')
tmp.index = pd.to_datetime(tmp.index)
out = (tmp.reindex(pd.date_range(tmp.index[0], tmp.index[-1]))
          .fillna(False)
          .stack()
          .reset_index()
          .rename(columns={0: 'attended'}))

Output:

     level_0 customer  attended
0 2022-01-01     John      True
1 2022-01-01     Mark     False
2 2022-01-02     John      True
3 2022-01-02     Mark     False
4 2022-01-03     John     False
5 2022-01-03     Mark     False
6 2022-01-04     John      True
7 2022-01-04     Mark     False
8 2022-01-05     John     False
9 2022-01-05     Mark      True
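The pivot/reindex/stack combination builds a full date-by-customer grid over the overall date range. A plain-Python sketch of that grid, assuming (date, customer, attended) tuples and False as the fill value, looks like this; the sample input is hypothetical, chosen only to be consistent with the output above.

```python
from datetime import date, timedelta


def fill_all_customers(rows, fill=False):
    """rows: (date, customer, attended) tuples.

    Returns every date between the overall first and last date crossed
    with every customer, like pivot + reindex + stack.
    """
    observed = {(day, cust): att for day, cust, att in rows}
    customers = sorted({cust for _, cust, _ in rows})
    day = min(d for d, _, _ in rows)
    last = max(d for d, _, _ in rows)
    out = []
    while day <= last:
        for cust in customers:
            out.append((day, cust, observed.get((day, cust), fill)))
        day += timedelta(days=1)
    return out


# Hypothetical input consistent with the output above
rows = [
    (date(2022, 1, 1), "John", True), (date(2022, 1, 2), "John", True),
    (date(2022, 1, 4), "John", True), (date(2022, 1, 5), "Mark", True),
]
for r in fill_all_customers(rows):
    print(r)
```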

Fill in missing dates with NAs by group in R - with NA at end of date range as well

I think you're close with your second attempt. If you want to enforce the limits of the expansion yourself, you can pass them directly to the complete() call. It wasn't clear which limits you were after, but the code below may get you there. I used two date ranges because you seemed to want to cover two periods; adjust them if I misunderstood. The ranges can also be built programmatically if you have those dates stored somewhere. I also converted your Date column to an actual date with as.Date() during import.

library(tidyverse)

table <- "ID    Date    dist.km\n 1 1     2007-10-15     15147\n 2 1     2007-10-16     15156\n 3 1     2007-10-17     15173\n 4 1     2007-10-18     15185\n 5 1     2007-10-19     15194\n 6 1     2007-10-25     15202\n 7 1     2007-10-26     15216\n 8 1     2007-10-27     15240\n 9 1     2007-10-28     15270\n10 1     2007-10-29     15290\n11 2     2008-10-15     15147\n12 2     2008-10-16     15156\n13 2     2008-10-17     15173\n14 2     2008-10-18     15185\n15 2     2008-10-19     15194\n16 2     2008-10-20     15202\n17 2     2008-10-21     15216\n18 2     2008-10-29     15240\n19 2     2008-10-30     15270\n20 2     2008-10-31     15290"

#Create a dataframe with the above table
df <- read.table(text=table, header = TRUE) %>% 
  mutate(Date = as.Date(Date))

# expand by feeding the limits of the date ranges to cover
newdat2 <- df %>%
  group_by(ID) %>%
  complete(Date = c(
    seq.Date(
      from = as.Date("2007-10-15"),
      to = as.Date("2008-02-15"),
      by = "day"
    ),
    seq.Date(
      from = as.Date("2008-10-15"),
      to = as.Date("2009-02-15"),
      by = "day"
    )
  ))

newdat2

#> # A tibble: 496 x 3
#> # Groups:   ID [2]
#>       ID Date       dist.km
#>    <int> <date>       <int>
#>  1     1 2007-10-15   15147
#>  2     1 2007-10-16   15156
#>  3     1 2007-10-17   15173
#>  4     1 2007-10-18   15185
#>  5     1 2007-10-19   15194
#>  6     1 2007-10-20      NA
#>  7     1 2007-10-21      NA
#>  8     1 2007-10-22      NA
#>  9     1 2007-10-23      NA
#> 10     1 2007-10-24      NA
#> # ... with 486 more rows

Created on 2021-03-15 by the reprex package (v1.0.0)
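The same expansion over fixed date ranges can be sketched in Python, assuming (ID, date, dist_km) tuples. As I understand it, complete() ends with a full join, so observed rows falling outside the supplied ranges are kept as well; the sketch does the same and returns everything sorted by ID and date, with None in place of NA. The function names are illustrative.

```python
from datetime import date, timedelta


def day_range(start, end):
    """Every day from start to end inclusive, like seq.Date(by = "day")."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def complete_with_ranges(rows, ranges):
    """rows: (ID, date, dist_km) tuples; ranges: list of (start, end) dates.

    Every ID is expanded over the union of the ranges, with None for days
    that have no observation. Observed rows outside the ranges are kept.
    """
    wanted = sorted({d for start, end in ranges for d in day_range(start, end)})
    wanted_set = set(wanted)
    observed = {(gid, day): km for gid, day, km in rows}
    ids = sorted({gid for gid, _, _ in rows})
    out = [(gid, day, observed.get((gid, day))) for gid in ids for day in wanted]
    out += [(gid, day, km) for gid, day, km in rows if day not in wanted_set]
    return sorted(out, key=lambda r: (r[0], r[1]))


ranges = [(date(2007, 10, 15), date(2008, 2, 15)),
          (date(2008, 10, 15), date(2009, 2, 15))]
rows = [(1, date(2007, 10, 15), 15147), (1, date(2007, 10, 19), 15194),
        (2, date(2008, 10, 15), 15147)]
print(len(complete_with_ranges(rows, ranges)))
```

Each ID gets 124 + 124 = 248 days from these two ranges, so two IDs give the 496 rows reported by R.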

Filling missing dates within group with duplicate date pandas python

>>> df.set_index("day") \
      .groupby("ID")["val"] \
      .resample("D") \
      .first() \
      .fillna(0) \
      .reset_index()

   ID        day    val
0  AA 2020-01-26  100.0
1  AA 2020-01-27    0.0
2  AA 2020-01-28  200.0
3  BB 2020-01-26  100.0
4  BB 2020-01-27  100.0
5  BB 2020-01-28    0.0
6  BB 2020-01-29   40.0

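Note: first() may look useless, but it is needed. Resampler.fillna() only accepts a method (such as "ffill"), not a fill value like DataFrame.fillna(). Calling first() aggregates each day (keeping the first value when a date is duplicated) and returns a regular Series, on which fillna(0) works.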
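To make the role of first() concrete, here is a plain-Python sketch of the resample("D").first().fillna(0) chain: the first value seen for each (ID, day) wins, and empty days between an ID's first and last day become 0.0. It assumes (ID, day, val) tuples in their original order and no missing values in val (pandas' first() would skip NaN). The duplicate row in the sample is hypothetical.

```python
from datetime import date, timedelta


def resample_first(rows, fill=0.0):
    """rows: (ID, day, val) tuples in their original order.

    Keeps the first value for each (ID, day), like .resample("D").first(),
    then fills the empty days between an ID's first and last day.
    """
    firsts = {}
    for gid, day, val in rows:
        firsts.setdefault((gid, day), val)  # first occurrence wins
    out = []
    for gid in sorted({g for g, _ in firsts}):
        days = [d for g, d in firsts if g == gid]
        day, last = min(days), max(days)
        while day <= last:
            out.append((gid, day, firsts.get((gid, day), fill)))
            day += timedelta(days=1)
    return out


# Hypothetical input: the second AA row on 2020-01-26 is a duplicate date
rows = [
    ("AA", date(2020, 1, 26), 100.0), ("AA", date(2020, 1, 26), 999.0),
    ("AA", date(2020, 1, 28), 200.0),
    ("BB", date(2020, 1, 26), 100.0), ("BB", date(2020, 1, 27), 100.0),
    ("BB", date(2020, 1, 29), 40.0),
]
for r in resample_first(rows):
    print(r)
```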

Expanding and filling the dataframe for missing dates by each group

I would set Date as the index, group by ID, and reindex each group over a daily range running from the first day of the month of its oldest date to its most recent date:

import pandas as pd

df = pd.DataFrame({"ID": [1, 1, 1, 2, 2, 2],
                   "Date": ["29.12.2020", "05.01.2021", "15.02.2021", "11.04.2021", "27.05.2021", "29.05.2021"],
                   "Amount": [6, 5, 7, 9, 8, 7]})
df["Date"] = pd.to_datetime(df["Date"], format="%d.%m.%Y")
df = df.set_index("Date")

frames = []
for id_val, obs_period in df.groupby("ID"):
    # daily range from the first day of the earliest month to the latest date
    date_range = pd.date_range(obs_period.index.min().replace(day=1), obs_period.index.max())
    obs_period = obs_period.reindex(date_range)  # new dates get NaN
    obs_period["ID"] = id_val
    if pd.isna(obs_period.at[obs_period.index[0], "Amount"]):
        obs_period.at[obs_period.index[0], "Amount"] = 0  # adding 0 at the beginning of the period if undefined
    obs_period = obs_period.ffill()  # filling Amount with last value
    frames.append(obs_period)

new_df = pd.concat(frames)  # concatenate once instead of inside the loop
print(new_df)

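By the way, you should specify the date format when converting df["Date"], as done above with format="%d.%m.%Y"; otherwise pandas may read an ambiguous date such as 05.01.2021 month-first, as May 1.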

Output:

            ID  Amount
2020-12-01   1     0.0
2020-12-02   1     0.0
2020-12-03   1     0.0
2020-12-04   1     0.0
2020-12-05   1     0.0
...         ..     ...
2021-05-25   2     9.0
2021-05-26   2     9.0
2021-05-27   2     8.0
2021-05-28   2     8.0
2021-05-29   2     7.0

[136 rows x 2 columns]
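The per-ID loop above can be sketched without pandas to make the fill rule explicit: each ID is expanded daily from the first day of the month of its earliest date to its latest date, the first day is seeded with 0 when it has no value, and everything else is forward-filled. It assumes (ID, date, amount) tuples with datetime.date values; `expand_ffill` is an illustrative name.

```python
from datetime import date, timedelta


def expand_ffill(rows):
    """rows: (ID, date, amount) tuples.

    Each ID runs daily from the first day of the month of its earliest date
    to its latest date. The first day is 0 unless observed; every other
    unobserved day repeats the previous value (forward fill).
    """
    by_id = {}
    for gid, day, amount in rows:
        by_id.setdefault(gid, {})[day] = amount
    out = []
    for gid in sorted(by_id):
        series = by_id[gid]
        day, last = min(series).replace(day=1), max(series)
        current = 0  # seed used when the month start has no value
        while day <= last:
            current = series.get(day, current)  # forward fill
            out.append((gid, day, current))
            day += timedelta(days=1)
    return out


rows = [
    (1, date(2020, 12, 29), 6), (1, date(2021, 1, 5), 5), (1, date(2021, 2, 15), 7),
    (2, date(2021, 4, 11), 9), (2, date(2021, 5, 27), 8), (2, date(2021, 5, 29), 7),
]
result = expand_ffill(rows)
print(len(result), result[0], result[-1])
```

On the sample data this produces the same 136 rows as the output above.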

Fill in missing dates across multiple partitions (Snowflake)

WITH fake_data AS (
    SELECT * FROM VALUES
        ('A','USD','2020-01-01'::date,3)
        ,('A','USD','2020-01-03'::date,4)
        ,('A','USD','2020-01-04'::date,2)
        ,('A','CAD','2021-01-04'::date,5)
        ,('A','CAD','2021-01-06'::date,6)
        ,('A','CAD','2020-01-07'::date,1)
        ,('B','USD','2019-01-01'::date,3)
        ,('B','USD','2019-01-03'::date,4)
        ,('B','USD','2019-01-04'::date,5)
        ,('B','CAD','2017-01-04'::date,3)
        ,('B','CAD','2017-01-06'::date,2)
        ,('B','CAD','2017-01-07'::date,2)
    d(Name,Currency,Date,Amount)
), partition_ranges AS (
    SELECT name,
        currency, 
        min(date) as min_date, 
        max(date) as max_date,
        datediff('days', min_date, max_date) as span
    FROM fake_data
    GROUP BY 1,2
), huge_range as (
    SELECT ROW_NUMBER() OVER(order by true)-1 as rn
    FROM table(generator(ROWCOUNT => 10000000))
), in_fill as (
    SELECT pr.name,
        pr.currency,
        dateadd('day', hr.rn, pr.min_date) as date
    FROM partition_ranges as pr
    JOIN huge_range as hr ON pr.span >= hr.rn
)
SELECT 
    i.name, 
    i.currency, 
    i.date,
    nvl(d.amount, 0) as amount
from in_fill as i
left join fake_data as d on d.name = i.name and d.currency = i.currency and d.date = i.date
order by 1,2,3;


Output (truncated). Because the sample row ('A','CAD','2020-01-07'::date,1) is a year earlier than the other A/CAD rows, the A/CAD partition spans 2020-01-07 to 2021-01-06, so the result opens with a long run of zero-amount rows:

NAME	CURRENCY	DATE	AMOUNT
A	CAD	2020-01-07	1
A	CAD	2020-01-08	0
A	CAD	2020-01-09	0
A	CAD	2020-01-10	0
...
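The query's strategy (per-partition min/max dates, a generated sequence of day offsets joined on span >= rn, then a left join with nvl(..., 0)) can be mirrored in plain Python to check row counts. This sketch assumes one row per (name, currency, date); with duplicates, the SQL left join would return several rows where the dict keeps only the last.

```python
from datetime import date, timedelta


def fill_partitions(rows):
    """rows: (name, currency, date, amount) tuples, one per date.

    Mirrors the query: per (name, currency) take min/max date and the span
    in days, generate offsets 0..span, and left join the original amounts,
    using 0 where nothing matches (nvl(d.amount, 0)).
    """
    ranges = {}
    for name, cur, day, _ in rows:
        lo, hi = ranges.get((name, cur), (day, day))
        ranges[(name, cur)] = (min(lo, day), max(hi, day))
    amounts = {(name, cur, day): amt for name, cur, day, amt in rows}
    out = []
    for (name, cur), (lo, hi) in ranges.items():
        span = (hi - lo).days  # datediff('days', min_date, max_date)
        for rn in range(span + 1):  # generator rows joined on span >= rn
            day = lo + timedelta(days=rn)  # dateadd('day', rn, min_date)
            out.append((name, cur, day, amounts.get((name, cur, day), 0)))
    return sorted(out)  # order by 1, 2, 3


fake_data = [
    ("A", "USD", date(2020, 1, 1), 3), ("A", "USD", date(2020, 1, 3), 4),
    ("A", "USD", date(2020, 1, 4), 2), ("A", "CAD", date(2021, 1, 4), 5),
    ("A", "CAD", date(2021, 1, 6), 6), ("A", "CAD", date(2020, 1, 7), 1),
    ("B", "USD", date(2019, 1, 1), 3), ("B", "USD", date(2019, 1, 3), 4),
    ("B", "USD", date(2019, 1, 4), 5), ("B", "CAD", date(2017, 1, 4), 3),
    ("B", "CAD", date(2017, 1, 6), 2), ("B", "CAD", date(2017, 1, 7), 2),
]
result = fill_partitions(fake_data)
print(len(result), result[:2])
```

With fake_data as written, the A/CAD partition alone produces 366 rows, which is why the SQL output is so long.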
