Pandas: Merge Data Frames on Datetime Index

To merge on the indexes, pass left_index=True and right_index=True to merge:

merge=pd.merge(df,d, how='inner', left_index=True, right_index=True)

Sample (first value of index in d was changed for matching):

print(df)
           catcode_amt type feccandid_amt  amount
date                                             
1915-12-31       A5000  24K     H6TX08100    1000
1916-12-31       T6100  24K     H8CA52052     500
1954-12-31       H3100  24K     S8AK00090    1000
1985-12-31       J7120  24E     H8OH18088      36
1997-12-31       z9600  24K     S6ND00058    2000

print(d)
           catcode_disp disposition            feccandid_disp  bills
date                                                                
1997-12-31        A0000     support                 S4HI00011    1.0
2007-12-31        A1000      oppose  S4IA00020', 'P20000741 1    NaN
2007-12-31        A1000     support                 S8MT00010    1.0
2007-12-31        A1500     support                 S6WI00061    2.0
2007-12-31        A1600     support  S4IA00020', 'P20000741 3    NaN

merge=pd.merge(df,d, how='inner', left_index=True, right_index=True)
print(merge)
           catcode_amt type feccandid_amt  amount catcode_disp disposition  \
date                                                                         
1997-12-31       z9600  24K     S6ND00058    2000        A0000     support   

           feccandid_disp  bills  
date                              
1997-12-31      S4HI00011    1.0
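
For a version you can run directly, here is a minimal sketch of the same index-on-index merge. The frames `left` and `right` and their values are made up for illustration; `DataFrame.join` is shown as an equivalent shortcut because it aligns on the index by default.

```python
import pandas as pd

# Hypothetical sample frames, each with a DatetimeIndex named "date"
left = pd.DataFrame(
    {"amount": [1000, 500, 2000]},
    index=pd.to_datetime(["1915-12-31", "1916-12-31", "1997-12-31"]).rename("date"),
)
right = pd.DataFrame(
    {"disposition": ["support", "oppose"]},
    index=pd.to_datetime(["1997-12-31", "2007-12-31"]).rename("date"),
)

# Inner merge on both indexes: only dates present in both frames survive
merged = pd.merge(left, right, how="inner", left_index=True, right_index=True)

# DataFrame.join aligns on the index by default, so this gives the same rows
joined = left.join(right, how="inner")

print(merged)
```

Only 1997-12-31 appears in both indexes, so the result has a single row.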

Or you can use concat:

print(pd.concat([df,d], join='inner', axis=1))
           catcode_amt type feccandid_amt  amount catcode_disp disposition  \
date                                                                         
1997-12-31       z9600  24K     S6ND00058    2000        A0000     support   

           feccandid_disp  bills  
date                              
1997-12-31      S4HI00011    1.0

EDIT: EdChum is right:

If I add duplicates to the index of df (the last 2 values), the merge returns every combination of the matching rows:

print(df)
           catcode_amt type feccandid_amt  amount
date                                             
1915-12-31       A5000  24K     H6TX08100    1000
1916-12-31       T6100  24K     H8CA52052     500
1954-12-31       H3100  24K     S8AK00090    1000
2007-12-31       J7120  24E     H8OH18088      36
2007-12-31       z9600  24K     S6ND00058    2000

print(d)
           catcode_disp disposition            feccandid_disp  bills
date                                                                
1997-12-31        A0000     support                 S4HI00011    1.0
2007-12-31        A1000      oppose  S4IA00020', 'P20000741 1    NaN
2007-12-31        A1000     support                 S8MT00010    1.0
2007-12-31        A1500     support                 S6WI00061    2.0
2007-12-31        A1600     support  S4IA00020', 'P20000741 3    NaN

merge=pd.merge(df,d, how='inner', left_index=True, right_index=True)

print(merge)
           catcode_amt type feccandid_amt  amount catcode_disp disposition  \
date                                                                         
2007-12-31       J7120  24E     H8OH18088      36        A1000      oppose   
2007-12-31       J7120  24E     H8OH18088      36        A1000     support   
2007-12-31       J7120  24E     H8OH18088      36        A1500     support   
2007-12-31       J7120  24E     H8OH18088      36        A1600     support   
2007-12-31       z9600  24K     S6ND00058    2000        A1000      oppose   
2007-12-31       z9600  24K     S6ND00058    2000        A1000     support   
2007-12-31       z9600  24K     S6ND00058    2000        A1500     support   
2007-12-31       z9600  24K     S6ND00058    2000        A1600     support   

                      feccandid_disp  bills  
date                                         
2007-12-31  S4IA00020', 'P20000741 1    NaN  
2007-12-31                 S8MT00010    1.0  
2007-12-31                 S6WI00061    2.0  
2007-12-31  S4IA00020', 'P20000741 3    NaN  
2007-12-31  S4IA00020', 'P20000741 1    NaN  
2007-12-31                 S8MT00010    1.0  
2007-12-31                 S6WI00061    2.0  
2007-12-31  S4IA00020', 'P20000741 3    NaN
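
The row multiplication above can be reproduced on a small scale. The sketch below uses made-up frames (`left`, `right`) that share a duplicated date, and shows the `validate` argument of `pd.merge` as one possible guard if duplicated keys are not expected.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical frames: the same date appears 2 times on the left, 4 on the right
left = pd.DataFrame(
    {"amount": [36, 2000]},
    index=pd.to_datetime(["2007-12-31", "2007-12-31"]),
)
right = pd.DataFrame(
    {"disposition": ["oppose", "support", "support", "support"]},
    index=pd.to_datetime(["2007-12-31"] * 4),
)

merged = pd.merge(left, right, how="inner", left_index=True, right_index=True)
print(len(merged))  # 2 left rows x 4 right rows = 8

# validate="one_to_one" makes merge raise instead of silently multiplying rows
try:
    pd.merge(left, right, left_index=True, right_index=True, validate="one_to_one")
    raised = False
except MergeError:
    raised = True
print(raised)
```

Whether to deduplicate first or accept the product depends on what the duplicated dates mean in your data.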

Merging two DataFrames on DatetimeIndex

After some more experimenting, it turned out that I needed to specify the columns I want to merge on. An outer join is also required here.

yfdata = pd.merge(coarse, fine, how='outer', on=coarse.columns.to_list(), left_index=True, right_index=True)
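
As far as I know, recent pandas versions may reject `on` combined with `left_index`/`right_index` with a MergeError. If that happens, one possible workaround is to move the index into a column and let merge use all shared columns as keys. This is only a sketch: the frames, the index name `Date` and the column `Close` are assumptions, not the original data.

```python
import pandas as pd

# Hypothetical coarse (daily) and fine (intraday) frames with the same columns
coarse = pd.DataFrame(
    {"Close": [1.0, 2.0]},
    index=pd.to_datetime(["2021-01-01 00:00", "2021-01-02 00:00"]).rename("Date"),
)
fine = pd.DataFrame(
    {"Close": [2.0, 2.5]},
    index=pd.to_datetime(["2021-01-02 00:00", "2021-01-02 12:00"]).rename("Date"),
)

# With no `on`, merge uses every column the frames share ("Date" and "Close"),
# so identical rows are combined and all other rows are kept by the outer join
yfdata = (
    pd.merge(coarse.reset_index(), fine.reset_index(), how="outer")
    .set_index("Date")
    .sort_index()
)
print(yfdata)
```

The row for 2021-01-02 00:00 exists in both frames with the same value, so it appears once in the result.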

Pandas merge on `datetime` or `datetime` in `datetimeIndex`

So here's the option with merging:

Assume you have two DataFrames:

import pandas as pd
df1 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'], 
                    'data': ['A', 'B', 'C']})
df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02', '2015-01-01 to 2015-01-02', '2015-01-02 to 2015-01-03'], 
                    'data': ['E', 'F', 'G']})

Now do some cleaning to extract all of the dates you need and make sure they are datetime:

df1['date'] = pd.to_datetime(df1.date)

df2[['start', 'end']] = df2['date'].str.split(' to ', expand=True)
df2['start'] = pd.to_datetime(df2.start)
df2['end'] = pd.to_datetime(df2.end)
# No need for this anymore
df2 = df2.drop(columns='date')

Now merge it all together. This is a cross join, so the result has len(df1) * len(df2) rows (99x10K rows for the data in the question).

df = df1.assign(dummy=1).merge(df2.assign(dummy=1), on='dummy').drop(columns='dummy')

And subset to the dates that fall in between the ranges:

df[(df.date >= df.start) & (df.date <= df.end)]
#        date data_x data_y      start        end
#0 2015-01-01      A      E 2015-01-01 2015-01-02
#1 2015-01-01      A      F 2015-01-01 2015-01-02
#3 2015-01-02      B      E 2015-01-01 2015-01-02
#4 2015-01-02      B      F 2015-01-01 2015-01-02
#5 2015-01-02      B      G 2015-01-02 2015-01-03
#8 2015-01-03      C      G 2015-01-02 2015-01-03
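
On pandas 1.2 or later, the dummy-key trick can be replaced with `how='cross'`. The sketch below rebuilds the same sample data with the ranges already split into `start` and `end`; the names `crossed` and `in_range` are only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    "data": ["A", "B", "C"],
})
df2 = pd.DataFrame({
    "start": pd.to_datetime(["2015-01-01", "2015-01-01", "2015-01-02"]),
    "end": pd.to_datetime(["2015-01-02", "2015-01-02", "2015-01-03"]),
    "data": ["E", "F", "G"],
})

# Cross join: every row of df1 paired with every row of df2 (3 x 3 = 9 rows)
crossed = df1.merge(df2, how="cross")

# Keep only the pairs where the date falls inside the range (bounds inclusive)
in_range = crossed[crossed["date"].between(crossed["start"], crossed["end"])]
print(in_range)
```

A cross join grows quickly with the input sizes, so this approach suits small or medium frames.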

If, for instance, some entries in df2 were a single date, .str.split would return None for the second date. In that case, use .loc to set the end equal to the start:

df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02', '2015-01-01 to 2015-01-02', '2015-01-02 to 2015-01-03',
                             '2015-01-03'], 
                    'data': ['E', 'F', 'G', 'H']})

df2[['start', 'end']] = df2['date'].str.split(' to ', expand=True)
df2.loc[df2.end.isnull(), 'end'] = df2.loc[df2.end.isnull(), 'start']
#  data      start        end
#0    E 2015-01-01 2015-01-02
#1    F 2015-01-01 2015-01-02
#2    G 2015-01-02 2015-01-03
#3    H 2015-01-03 2015-01-03

Now the rest follows unchanged

Pandas - how to merge dataframes on datetime column of different format?

Use merge_asof after sorting both DataFrames by their datetimes:

#if necessary
df1['Time Stamp'] = pd.to_datetime(df1['Time Stamp'])
df2['Time Stamp'] = pd.to_datetime(df2['Time Stamp'])
df1 = df1.sort_values('Time Stamp')
df2 = df2.sort_values('Time Stamp')

df = pd.merge_asof(df1, df2, on='Time Stamp')
print (df)
           Time Stamp  HP_1H_mean  Coolant1_1H_mean  Extreme_1H_mean  \
0 2019-07-26 07:00:00  410.637966        414.607081              0.0   
1 2019-07-26 08:00:00  403.521735        424.787366              0.0   
2 2019-07-26 09:00:00  403.143925        425.739639              0.0   
3 2019-07-26 10:00:00  410.542895        426.210538              0.0   
4 2019-07-27 00:00:00    0.000000          0.000000              0.0   
5 2019-07-27 01:00:00    0.000000          0.000000              0.0   
6 2019-07-27 02:00:00    0.000000          0.000000              0.0   
7 2019-07-27 03:00:00    0.000000          0.000000              0.0   

   Qty Compl  
0        150  
1        150  
2        150  
3        150  
4         20  
5         20  
6         20  
7         20
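
Here is a small self-contained sketch of the same idea. The frames `hourly` and `daily` and their values are invented; the point is that with the default `direction='backward'`, each left row takes the most recent right row whose timestamp is not later than its own.

```python
import pandas as pd

# Hypothetical hourly readings and daily quantities
hourly = pd.DataFrame({
    "Time Stamp": pd.to_datetime(
        ["2019-07-26 07:00", "2019-07-26 08:00", "2019-07-27 00:00"]
    ),
    "HP_1H_mean": [410.6, 403.5, 0.0],
})
daily = pd.DataFrame({
    "Time Stamp": pd.to_datetime(["2019-07-26 00:00", "2019-07-27 00:00"]),
    "Qty Compl": [150, 20],
})

# Both frames must be sorted by the key before calling merge_asof
hourly = hourly.sort_values("Time Stamp")
daily = daily.sort_values("Time Stamp")

out = pd.merge_asof(hourly, daily, on="Time Stamp")
print(out)
```

The two readings on 2019-07-26 pick up the quantity for that day, and the reading at 2019-07-27 00:00 matches the second daily row exactly.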

Merge two dataframes on closest matching datetime index

The logic here uses merge_asof. Because merge_asof can match the same row of the second DataFrame to several rows of the first, we add an extra key (plus a copy of the datetime) to the second DataFrame, then keep only the closest match for each key and drop the duplicates:

masterdf.index=pd.to_datetime(masterdf.index)
masterdf=masterdf.sort_index().reset_index()
slavedf.index=pd.to_datetime(slavedf.index)
slavedf=slavedf.sort_index().reset_index()
slavedf['datetime2']=slavedf['datetime']
slavedf['key']=slavedf.index
newdf=pd.merge_asof(masterdf,slavedf,on='datetime',tolerance=pd.Timedelta('60s'),direction='nearest')
newdf['diff']=(newdf.datetime-newdf.datetime2).abs()
newdf=newdf.sort_values('diff').drop_duplicates('key')
newdf
Out[35]: 
             datetime         AA  BB           datetime2     diff
2 2019-10-01 07:53:54  77.425134  60 2019-10-01 07:53:54 00:00:00
1 2019-10-01 07:53:01  77.491655  50 2019-10-01 07:53:00 00:00:01
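
The same steps in a runnable form, as a rough sketch with invented frames `master` and `other`. The `dropna` call is my own addition, there to discard rows that found no match within the tolerance; the original code does not include it.

```python
import pandas as pd

master = pd.DataFrame({
    "datetime": pd.to_datetime(
        ["2019-10-01 07:52:30", "2019-10-01 07:53:01", "2019-10-01 07:53:54"]
    ),
    "AA": [1.0, 2.0, 3.0],
})
other = pd.DataFrame({
    "datetime": pd.to_datetime(["2019-10-01 07:53:00", "2019-10-01 07:53:54"]),
    "BB": [50, 60],
})
# Keep a copy of the right-hand timestamp and a row key to deduplicate on
other = other.assign(datetime2=other["datetime"], key=other.index)

near = pd.merge_asof(
    master, other, on="datetime",
    tolerance=pd.Timedelta("60s"), direction="nearest",
)
near["diff"] = (near["datetime"] - near["datetime2"]).abs()

# For each right-hand row (key), keep only the left row closest in time
best = near.dropna(subset=["key"]).sort_values("diff").drop_duplicates("key")
print(best)
```

Here the first two rows of `master` both match the 07:53:00 row of `other`, and only the closer one (one second away) is kept.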

Merge two dataframes with different Date Time Indexes

You can create a temporary merge key in df1 by normalising its index (which sets every timestamp to midnight). You can then merge df1 with the other dataframe df2 on this key:

df1.assign(key=df1.index.normalize())\
   .merge(df2, left_on='key', right_index=True, how='left').drop(columns='key')

                     A  B  C
2019-08-26 13:00:00  a  1  Y
2019-08-26 13:30:00  b  2  Y
2019-08-26 14:00:00  c  3  Y
2019-08-26 14:30:00  d  4  Y
2019-08-26 15:00:00  e  5  Y
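
To try this out, here is a minimal sketch with made-up data: `df1` has intraday timestamps and `df2` has one row per day. The column values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical intraday frame and daily frame
df1 = pd.DataFrame(
    {"A": ["a", "b", "c"], "B": [1, 2, 3]},
    index=pd.to_datetime(
        ["2019-08-26 13:00", "2019-08-26 13:30", "2019-08-27 09:00"]
    ),
)
df2 = pd.DataFrame(
    {"C": ["Y", "N"]},
    index=pd.to_datetime(["2019-08-26 00:00", "2019-08-27 00:00"]),
)

out = (
    df1.assign(key=df1.index.normalize())   # midnight of each timestamp
       .merge(df2, left_on="key", right_index=True, how="left")
       .drop(columns="key")                 # the helper column is no longer needed
)
print(out)
```

Each intraday row receives the value of `C` for its calendar day.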

Combine multiple dfs based on datetime index

Use merge:

>>> pd.merge(df1, df2, on='Date', how='outer')
         Date  x   y    z     n
0  2021-07-01  1   2  NaN   NaN
1  2021-07-02  2   4  NaN   NaN
2  2021-07-06  3   6  5.0  10.0
3  2021-07-07  4   8  6.0  12.0
4  2021-07-08  5  10  7.0  14.0
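
With more than two frames, the same merge can be applied repeatedly. The sketch below uses `functools.reduce` for that; the three frames and their values are invented, and `Date` is assumed to be a regular column (use `reset_index()` first if it is the index).

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({"Date": pd.to_datetime(["2021-07-01", "2021-07-06"]), "x": [1, 3]})
df2 = pd.DataFrame({"Date": pd.to_datetime(["2021-07-06", "2021-07-07"]), "z": [5.0, 6.0]})
df3 = pd.DataFrame({"Date": pd.to_datetime(["2021-07-07", "2021-07-08"]), "n": [12.0, 14.0]})

# Outer-merge the frames pairwise, left to right, on the shared "Date" column
combined = reduce(
    lambda left, right: pd.merge(left, right, on="Date", how="outer"),
    [df1, df2, df3],
)
combined = combined.sort_values("Date").reset_index(drop=True)
print(combined)
```

Dates missing from a given frame show up as NaN in that frame's columns.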

How to combine two different length dataframes with datetime index

By default, concat concatenates along rows (axis=0). Specify axis=1 to concatenate along columns instead, which aligns the frames on their index:

pd.concat([a, b], axis=1)

              A    B
01-01-1990  1.0  4.0
01-01-1991  2.0  NaN
01-01-1993  3.0  7.0
01-01-1992  NaN  6.0
01-01-1994  NaN  8.0
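
A runnable sketch of the same call, using invented frames `a` and `b` with a real DatetimeIndex. It also shows `join='inner'`, which keeps only the dates present in both frames.

```python
import pandas as pd

a = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["1990-01-01", "1991-01-01", "1993-01-01"]),
)
b = pd.DataFrame(
    {"B": [4.0, 6.0, 7.0, 8.0]},
    index=pd.to_datetime(["1990-01-01", "1992-01-01", "1993-01-01", "1994-01-01"]),
)

# Default join='outer': union of both indexes, NaN where a frame has no row
outer = pd.concat([a, b], axis=1).sort_index()

# join='inner': only the dates found in both frames
inner = pd.concat([a, b], axis=1, join="inner").sort_index()

print(outer)
print(inner)
```

Sorting the index afterwards is optional and is done here only to make the output order predictable.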
