Insert Missing Time Rows into a Dataframe

Python pandas: insert rows for missing dates, time series in groupby dataframe

Use a custom function that calls DataFrame.asfreq inside GroupBy.apply, then rebuild the Index column with GroupBy.cumcount:

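import pandas as pd

df['date'] = pd.to_datetime(df['date'])

df = (df.set_index('date')
        .groupby('Serial_no')
        .apply(lambda x: x.asfreq('MS'))
        # errors='ignore': newer pandas versions may not pass the grouping column to apply
        .drop(columns='Serial_no', errors='ignore'))
df = df.reset_index()
df["Index"] = df.groupby("Serial_no").cumcount() + 1
print(df)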
    Serial_no       date  Index     x    y
0           1 2014-01-01      1   2.0  3.0
1           1 2014-02-01      2   NaN  NaN
2           1 2014-03-01      3   3.0  3.0
3           1 2014-04-01      4   6.0  2.0
4           2 2011-03-01      1   5.1  1.3
5           2 2011-04-01      2   5.8  0.6
6           2 2011-05-01      3   6.5 -0.1
7           2 2011-06-01      4   NaN  NaN
8           2 2011-07-01      5   3.0  5.0
9           3 2019-10-01      1   7.9 -1.5
10          3 2019-11-01      2   8.6 -2.2
11          3 2019-12-01      3   NaN  NaN
12          3 2020-01-01      4  10.0 -3.6
13          3 2020-02-01      5  10.7 -4.3
14          3 2020-03-01      6   4.0  3.0

Alternative solution with DataFrame.reindex:

df['date'] = pd.to_datetime(df['date'])

f = lambda x: x.reindex(pd.date_range(x.index.min(), x.index.max(), freq='MS', name='date'))
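df = (df.set_index('date')
        .groupby('Serial_no')
        .apply(f)
        # errors='ignore': newer pandas versions may not pass the grouping column to apply
        .drop(columns='Serial_no', errors='ignore'))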
df = df.reset_index()
df["Index"] = df.groupby("Serial_no").cumcount() + 1
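
As a self-contained sketch of the same idea, here is a variant that loops over the groups instead of calling GroupBy.apply. The sample values and the names pieces, filled and out are made up for illustration and are not taken from the question:

```python
import pandas as pd

# Hypothetical sample data: serial 1 lacks 2014-02, serial 2 lacks 2011-04
df = pd.DataFrame({
    "Serial_no": [1, 1, 1, 2, 2],
    "date": ["2014-01-01", "2014-03-01", "2014-04-01", "2011-03-01", "2011-05-01"],
    "x": [2.0, 3.0, 6.0, 5.1, 6.5],
})
df["date"] = pd.to_datetime(df["date"])

pieces = []
for serial, group in df.groupby("Serial_no"):
    # asfreq('MS') inserts a NaN row for every missing month start
    # between the group's first and last date
    filled = group.set_index("date").drop(columns="Serial_no").asfreq("MS")
    filled.insert(0, "Serial_no", serial)  # put the key back on the new rows too
    pieces.append(filled.reset_index())

out = pd.concat(pieces, ignore_index=True)
out["Index"] = out.groupby("Serial_no").cumcount() + 1
print(out)
```

The explicit loop is more verbose than apply, but it does not depend on whether apply hands the grouping column to the function.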

Fill missing dates in a pandas DataFrame

Create a date range covering the full period, set "Fecha" as the index with set_index, and reindex against that range to add the missing months. Then fillna(0) and reset_index give the desired outcome:

df['Fecha'] = pd.to_datetime(df['Fecha'])
df = (df.set_index('Fecha')
      .reindex(pd.date_range('2020-01-01', '2021-12-01', freq='MS'))
      .rename_axis(['Fecha'])
      .fillna(0)
      .reset_index())

Output:

        Fecha  unidades
0  2020-01-01       2.0
1  2020-02-01       0.0
2  2020-03-01       0.0
3  2020-04-01       0.0
4  2020-05-01       0.0
5  2020-06-01       0.0
6  2020-07-01       0.0
7  2020-08-01       0.0
8  2020-09-01       4.0
9  2020-10-01      11.0
10 2020-11-01       4.0
11 2020-12-01       2.0
12 2021-01-01       0.0
13 2021-02-01       0.0
14 2021-03-01       9.0
15 2021-04-01       2.0
16 2021-05-01       1.0
17 2021-06-01       0.0
18 2021-07-01       1.0
19 2021-08-01       0.0
20 2021-09-01       0.0
21 2021-10-01       0.0
22 2021-11-01       0.0
23 2021-12-01       0.0
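
If you would rather not hard-code the start and end dates, the range can be derived from the data. A minimal sketch with made-up sample values; passing fill_value=0 to reindex is a swapped-in alternative to the separate fillna(0) step, and full_range and out are illustrative names:

```python
import pandas as pd

# Hypothetical sample data with gaps between January and September 2020
df = pd.DataFrame({
    "Fecha": ["2020-01-01", "2020-09-01", "2020-10-01"],
    "unidades": [2, 4, 11],
})
df["Fecha"] = pd.to_datetime(df["Fecha"])

# Month-start range from the first to the last date present in the data
full_range = pd.date_range(df["Fecha"].min(), df["Fecha"].max(),
                           freq="MS", name="Fecha")

# fill_value=0 fills the new rows directly, so the column stays integer
out = df.set_index("Fecha").reindex(full_range, fill_value=0).reset_index()
print(out)
```

Naming the date range "Fecha" means reset_index restores the column name without a separate rename_axis call.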

Insert missing rows in a DataFrame and fill with previous row values for other columns

An alternative, using an outer join:

t = pd.date_range(df.DateTime.min(), df.DateTime.max(), freq="5s", name="DateTime")
pd.merge(pd.DataFrame(t), df, how="outer").ffill()

Output:

             DateTime    Price
0 2022-03-04 09:15:00  34526.0
1 2022-03-04 09:15:05  34487.0
2 2022-03-04 09:15:10  34470.0
3 2022-03-04 09:15:15  34470.0
4 2022-03-04 09:15:20  34466.0
5 2022-03-04 09:15:25  34466.0
6 2022-03-04 09:15:30  34466.0
7 2022-03-04 09:15:35  34466.0
8 2022-03-04 09:15:40  34466.0
9 2022-03-04 09:15:45  34448.0
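
For a version you can run end to end, here is a sketch with made-up prices. It uses a left join onto the regular grid rather than the outer join above, which assumes every original timestamp already falls on the 5-second grid; grid and out are illustrative names:

```python
import pandas as pd

# Hypothetical sample data: the 09:15:10 and 09:15:15 ticks are missing
df = pd.DataFrame({
    "DateTime": pd.to_datetime(["2022-03-04 09:15:00",
                                "2022-03-04 09:15:05",
                                "2022-03-04 09:15:20"]),
    "Price": [34526.0, 34487.0, 34466.0],
})

# Regular 5-second grid spanning the observed period
grid = pd.date_range(df["DateTime"].min(), df["DateTime"].max(),
                     freq="5s", name="DateTime")

# Left join keeps one row per grid point; ffill copies the last known price forward
out = grid.to_frame(index=False).merge(df, on="DateTime", how="left").ffill()
print(out)
```

If some timestamps can fall off the grid, the outer join keeps them, while the left join silently drops them.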

How to add missing rows of time series data to pandas DataFrames in Python

If you need to add 0 for the missing datetimes of each product separately, use a custom function in GroupBy.apply that calls DataFrame.reindex over the range from the minimal to the maximal datetime:

df = pd.read_csv("test.txt", sep="\t", parse_dates=['date'])

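# Reindex each product over its own daily range, filling new rows with 0
f = lambda x: x.reindex(pd.date_range(x.index.min(),
                                      x.index.max(), name='date'), fill_value=0)
df = (df.set_index('date')
        .groupby('product')
        .apply(f)
        # errors='ignore': newer pandas versions may not pass the grouping column to apply
        .drop(columns='product', errors='ignore')
        .reset_index())
print(df)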
   product       date  price  amount
0        A 2019-11-17     10      20
1        A 2019-11-18      0       0
2        A 2019-11-19     15      20
3        A 2019-11-20      0       0
4        A 2019-11-21      0       0
5        A 2019-11-22      0       0
6        A 2019-11-23      0       0
7        A 2019-11-24     20      30
8        C 2019-12-01     40      50
9        C 2019-12-02      0       0
10       C 2019-12-03      0       0
11       C 2019-12-04      0       0
12       C 2019-12-05     45      35

Add missing timestamp row to a dataframe

Assuming your df looks like this:

              datetime  value
0  2020-12-01T08:00:00  145.9
1  2020-12-01T10:00:00  100.0
2  2020-12-01T16:00:00   99.3
3  2020-12-01T18:00:00   91.0

make sure the datetime column has a datetime dtype:

df['datetime'] = pd.to_datetime(df['datetime'])

so that you can now resample to 2-hourly frequency:

df.resample('2h', on='datetime').mean()  # '2h' = two-hour bins; older examples spell this alias '2H'

                     value
datetime                  
2020-12-01 08:00:00  145.9
2020-12-01 10:00:00  100.0
2020-12-01 12:00:00    NaN
2020-12-01 14:00:00    NaN
2020-12-01 16:00:00   99.3
2020-12-01 18:00:00   91.0

Note that you don't need to set the on= keyword if your df already has a datetime index. The df resulting from resampling will have a datetime index.

Also note that I'm using .mean() as the aggregation function, meaning that if you have multiple values within a two-hour interval, you'll get their mean.
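
To see both effects at once (empty bins become NaN, crowded bins are averaged), here is a small sketch with made-up values:

```python
import pandas as pd

# Hypothetical sample data: two readings fall into the 08:00 bin,
# and nothing falls into the 10:00 or 12:00 bins
df = pd.DataFrame({
    "datetime": ["2020-12-01T08:00:00", "2020-12-01T09:00:00", "2020-12-01T14:00:00"],
    "value": [100.0, 200.0, 50.0],
})
df["datetime"] = pd.to_datetime(df["datetime"])

# Two-hour bins; the mean of an empty bin is NaN
out = df.resample("2h", on="datetime").mean()
print(out)
```

The 08:00 row holds 150.0, the mean of the two readings in that bin, and the 10:00 and 12:00 rows are NaN.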

Insert rows for missing dates/times

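In R, I think the easiest approach is to convert the timestamp column to a date-time class first, as already described, then convert the data to a zoo object and merge it with an empty zoo series that covers the full time range: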

library(zoo)

df$timestamp <- as.POSIXct(df$timestamp, format = "%m/%d/%y %H:%M")

df1.zoo <- zoo(df[, -1], df[, 1])  # use the timestamp column as the index

df2 <- merge(df1.zoo, zoo(, seq(start(df1.zoo), end(df1.zoo), by = "min")), all = TRUE)

start() and end() take the first and last timestamps from df1.zoo (the original data), and by sets the step, here "min" for one-minute spacing as in your example. all=TRUE keeps every timestamp from both series, so the values at the missing timestamps become NA.

Add missing dates to pandas dataframe

You could use Series.reindex:

import pandas as pd

idx = pd.date_range('09-01-2013', '09-30-2013')

s = pd.Series({'09-02-2013': 2,
               '09-03-2013': 10,
               '09-06-2013': 5,
               '09-07-2013': 1})
s.index = pd.DatetimeIndex(s.index)

s = s.reindex(idx, fill_value=0)
print(s)

yields

2013-09-01     0
2013-09-02     2
2013-09-03    10
2013-09-04     0
2013-09-05     0
2013-09-06     5
2013-09-07     1
2013-09-08     0
...

Add missing rows in pandas DataFrame

Here's one way using groupby.apply where we use date_range to add the missing times. Then merge it back to df and fill in the missing values of the other columns:

df['time'] = pd.to_datetime(df['time'])
out = df.merge(df.groupby('id')['time'].apply(lambda x: pd.date_range(x.iat[0], x.iat[-1], freq='S')).explode(), how='right')
out['id'] = out['id'].ffill().astype(int)
out['reward'] = out['reward'].fillna(0)

Output:

    id  reward                time
0    1    0.10 2022-04-23 10:00:00
1    1    0.00 2022-04-23 10:00:01
2    1    0.00 2022-04-23 10:00:02
3    1    0.00 2022-04-23 10:00:03
4    1    0.00 2022-04-23 10:00:04
5    1    0.15 2022-04-23 10:00:05
6    1    0.00 2022-04-23 10:00:06
7    1    0.05 2022-04-23 10:00:07
8    2    0.25 2022-04-23 12:00:00
9    2    0.00 2022-04-23 12:00:01
10   2    0.00 2022-04-23 12:00:02
11   2    0.40 2022-04-23 12:00:03
12   3    0.45 2022-04-23 15:00:00
