Pandas: Merge Data Frames on Datetime Index

To merge on the indexes, pass left_index=True and right_index=True to merge:

merge = pd.merge(df, d, how='inner', left_index=True, right_index=True)

Sample (first value of index in d was changed for matching):

print(df)
catcode_amt type feccandid_amt amount
date
1915-12-31 A5000 24K H6TX08100 1000
1916-12-31 T6100 24K H8CA52052 500
1954-12-31 H3100 24K S8AK00090 1000
1985-12-31 J7120 24E H8OH18088 36
1997-12-31 z9600 24K S6ND00058 2000

print(d)
catcode_disp disposition feccandid_disp bills
date
1997-12-31 A0000 support S4HI00011 1.0
2007-12-31 A1000 oppose S4IA00020', 'P20000741 1 NaN
2007-12-31 A1000 support S8MT00010 1.0
2007-12-31 A1500 support S6WI00061 2.0
2007-12-31 A1600 support S4IA00020', 'P20000741 3 NaN

merge = pd.merge(df, d, how='inner', left_index=True, right_index=True)
print(merge)
catcode_amt type feccandid_amt amount catcode_disp disposition \
date
1997-12-31 z9600 24K S6ND00058 2000 A0000 support

feccandid_disp bills
date
1997-12-31 S4HI00011 1.0
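
As a self-contained sketch of the same index-on-index merge, the snippet below builds two small frames with made-up values (only the column names echo the sample above). It also shows DataFrame.join, which aligns on the index by default and should give the same result here.

```python
import pandas as pd

# Hypothetical miniature versions of df and d; the values are illustrative only.
df = pd.DataFrame(
    {"amount": [1000, 500, 2000]},
    index=pd.DatetimeIndex(["1915-12-31", "1916-12-31", "1997-12-31"], name="date"),
)
d = pd.DataFrame(
    {"disposition": ["support", "oppose"]},
    index=pd.DatetimeIndex(["1997-12-31", "2007-12-31"], name="date"),
)

# Inner join on both indexes: only dates present in both frames survive.
merged = pd.merge(df, d, how="inner", left_index=True, right_index=True)

# DataFrame.join aligns on the index by default, so this is equivalent here.
joined = df.join(d, how="inner")

print(merged)
```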

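Or you can use concat. Note that concat with axis=1 aligns on index labels and generally expects them to be unique; with repeated dates, as in d above, recent pandas versions may raise an error, in which case merge is the safer choice: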

print(pd.concat([df, d], join='inner', axis=1))
catcode_amt type feccandid_amt amount catcode_disp disposition \
date
1997-12-31 z9600 24K S6ND00058 2000 A0000 support

feccandid_disp bills
date
1997-12-31 S4HI00011 1.0

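EDIT: EdChum is right: when the index contains duplicate values, the merge returns every combination of the matching rows.

To show this, I added duplicates to DataFrame df (the last 2 values in the index):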

print(df)
catcode_amt type feccandid_amt amount
date
1915-12-31 A5000 24K H6TX08100 1000
1916-12-31 T6100 24K H8CA52052 500
1954-12-31 H3100 24K S8AK00090 1000
2007-12-31 J7120 24E H8OH18088 36
2007-12-31 z9600 24K S6ND00058 2000

print(d)
catcode_disp disposition feccandid_disp bills
date
1997-12-31 A0000 support S4HI00011 1.0
2007-12-31 A1000 oppose S4IA00020', 'P20000741 1 NaN
2007-12-31 A1000 support S8MT00010 1.0
2007-12-31 A1500 support S6WI00061 2.0
2007-12-31 A1600 support S4IA00020', 'P20000741 3 NaN

merge = pd.merge(df, d, how='inner', left_index=True, right_index=True)
print(merge)
catcode_amt type feccandid_amt amount catcode_disp disposition \
date
2007-12-31 J7120 24E H8OH18088 36 A1000 oppose
2007-12-31 J7120 24E H8OH18088 36 A1000 support
2007-12-31 J7120 24E H8OH18088 36 A1500 support
2007-12-31 J7120 24E H8OH18088 36 A1600 support
2007-12-31 z9600 24K S6ND00058 2000 A1000 oppose
2007-12-31 z9600 24K S6ND00058 2000 A1000 support
2007-12-31 z9600 24K S6ND00058 2000 A1500 support
2007-12-31 z9600 24K S6ND00058 2000 A1600 support

feccandid_disp bills
date
2007-12-31 S4IA00020', 'P20000741 1 NaN
2007-12-31 S8MT00010 1.0
2007-12-31 S6WI00061 2.0
2007-12-31 S4IA00020', 'P20000741 3 NaN
2007-12-31 S4IA00020', 'P20000741 1 NaN
2007-12-31 S8MT00010 1.0
2007-12-31 S6WI00061 2.0
2007-12-31 S4IA00020', 'P20000741 3 NaN
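
The row multiplication above can be checked on a tiny example. The sketch below uses made-up values, plus the validate argument of pd.merge, which raises an error when the keys are not unique on the sides you name. Whether that check is wanted depends on the data, so treat it as one option rather than the method of the answer above.

```python
import pandas as pd

# Illustrative frames: 2 rows in df and 4 rows in d share the same date.
df = pd.DataFrame(
    {"amount": [36, 2000]},
    index=pd.DatetimeIndex(["2007-12-31"] * 2, name="date"),
)
d = pd.DataFrame(
    {"disposition": ["oppose", "support", "support", "support"]},
    index=pd.DatetimeIndex(["2007-12-31"] * 4, name="date"),
)

# Every df row pairs with every d row for the shared date: 2 * 4 = 8 rows.
merged = pd.merge(df, d, how="inner", left_index=True, right_index=True)
print(len(merged))

# validate="one_to_one" makes pandas refuse the merge if keys repeat.
# MergeError subclasses ValueError, so catching ValueError is sufficient.
try:
    pd.merge(df, d, how="inner", left_index=True, right_index=True,
             validate="one_to_one")
    keys_are_unique = True
except ValueError:
    keys_are_unique = False
print(keys_are_unique)
```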

Merging two DataFrames on DatetimeIndex

After some more trial and error, it turned out that I needed to specify the columns I want to merge on. An outer join is also required here.

yfdata = pd.merge(coarse, fine, how='outer', on=coarse.columns.to_list(), left_index=True, right_index=True)
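
Depending on the pandas version, passing on together with left_index/right_index may be rejected with a MergeError. If that happens, one possible workaround (an assumption, not the original poster's code) is to move the index into a column first, so that the date and the shared columns are all ordinary merge keys. The frames below are made-up stand-ins for coarse and fine, and the index name Date and column name Close are hypothetical.

```python
import pandas as pd

# Made-up stand-ins for `coarse` and `fine`; they share one row (2021-01-04).
coarse = pd.DataFrame(
    {"Close": [1.0, 2.0]},
    index=pd.DatetimeIndex(["2021-01-01", "2021-01-04"], name="Date"),
)
fine = pd.DataFrame(
    {"Close": [2.0, 3.0]},
    index=pd.DatetimeIndex(["2021-01-04", "2021-01-05"], name="Date"),
)

# With no `on`, merge uses every column the two frames have in common,
# which after reset_index() is the date plus all shared data columns.
yfdata = (
    pd.merge(coarse.reset_index(), fine.reset_index(), how="outer")
    .set_index("Date")
    .sort_index()
)
print(yfdata)
```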

Pandas merge on `datetime` or `datetime` in `datetimeIndex`

So here's the option with merging:

Assume you have two DataFrames:

import pandas as pd
df1 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
'data': ['A', 'B', 'C']})
df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02', '2015-01-01 to 2015-01-02', '2015-01-02 to 2015-01-03'],
'data': ['E', 'F', 'G']})

Now do some cleaning to extract all of the dates you need and make sure they are datetimes:

df1['date'] = pd.to_datetime(df1.date)

df2[['start', 'end']] = df2['date'].str.split(' to ', expand=True)
df2['start'] = pd.to_datetime(df2.start)
df2['end'] = pd.to_datetime(df2.end)
# No need for this anymore
df2 = df2.drop(columns='date')

Now merge it all together. This is a cross join, so you get every pairing of rows: 99x10K rows for your data, 9 for the sample frames above.

df = df1.assign(dummy=1).merge(df2.assign(dummy=1), on='dummy').drop(columns='dummy')

And subset to the dates that fall in between the ranges:

df[(df.date >= df.start) & (df.date <= df.end)]
# date data_x data_y start end
#0 2015-01-01 A E 2015-01-01 2015-01-02
#1 2015-01-01 A F 2015-01-01 2015-01-02
#3 2015-01-02 B E 2015-01-01 2015-01-02
#4 2015-01-02 B F 2015-01-01 2015-01-02
#5 2015-01-02 B G 2015-01-02 2015-01-03
#8 2015-01-03 C G 2015-01-02 2015-01-03
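
In pandas 1.2 and later, merge also accepts how='cross', which builds the same Cartesian product without the dummy column. A minimal sketch using the sample values from this answer, with start and end already parsed; the names crossed and in_range are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    "data": ["A", "B", "C"],
})
df2 = pd.DataFrame({
    "start": pd.to_datetime(["2015-01-01", "2015-01-01", "2015-01-02"]),
    "end": pd.to_datetime(["2015-01-02", "2015-01-02", "2015-01-03"]),
    "data": ["E", "F", "G"],
})

# Cross join: every row of df1 paired with every row of df2 (3 * 3 = 9 rows).
crossed = df1.merge(df2, how="cross")

# Keep only the pairs whose date lies inside [start, end], inclusive.
in_range = crossed[crossed["date"].between(crossed["start"], crossed["end"])]
print(in_range)
```

Either way the intermediate frame grows as len(df1) * len(df2), so this approach is best kept for inputs where that product fits comfortably in memory.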

If, for instance, some entries in df2 were a single date rather than a range, .str.split leaves None for the second date. In that case, just use .loc to fill it in from the first date:

df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02', '2015-01-01 to 2015-01-02', '2015-01-02 to 2015-01-03',
'2015-01-03'],
'data': ['E', 'F', 'G', 'H']})

df2[['start', 'end']] = df2['date'].str.split(' to ', expand=True)
df2.loc[df2.end.isnull(), 'end'] = df2.loc[df2.end.isnull(), 'start']
# data start end
#0 E 2015-01-01 2015-01-02
#1 F 2015-01-01 2015-01-02
#2 G 2015-01-02 2015-01-03
#3 H 2015-01-03 2015-01-03

The rest (converting start and end with pd.to_datetime, dropping date, merging, and filtering) then follows unchanged.
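
As an alternative spelling of the single-date fix, fillna can copy the start date into the missing end date. This is a sketch on a shortened, made-up version of df2, and the intermediate name parts is hypothetical:

```python
import pandas as pd

df2 = pd.DataFrame({
    "date": ["2015-01-01 to 2015-01-02", "2015-01-03"],
    "data": ["E", "H"],
})

# Split into two columns (0 and 1); a single date leaves column 1 missing.
parts = df2["date"].str.split(" to ", expand=True)

df2["start"] = pd.to_datetime(parts[0])
# Missing end dates become NaT, then fall back to the start date.
df2["end"] = pd.to_datetime(parts[1]).fillna(df2["start"])
df2 = df2.drop(columns="date")
print(df2)
```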

Pandas - how to merge dataframes on datetime column of different format?

Use merge_asof after sorting both DataFrames by their datetime column:

#if necessary
df1['Time Stamp'] = pd.to_datetime(df1['Time Stamp'])
df2['Time Stamp'] = pd.to_datetime(df2['Time Stamp'])
df1 = df1.sort_values('Time Stamp')
df2 = df2.sort_values('Time Stamp')

df = pd.merge_asof(df1, df2, on='Time Stamp')
print (df)
Time Stamp HP_1H_mean Coolant1_1H_mean Extreme_1H_mean \
0 2019-07-26 07:00:00 410.637966 414.607081 0.0
1 2019-07-26 08:00:00 403.521735 424.787366 0.0
2 2019-07-26 09:00:00 403.143925 425.739639 0.0
3 2019-07-26 10:00:00 410.542895 426.210538 0.0
4 2019-07-27 00:00:00 0.000000 0.000000 0.0
5 2019-07-27 01:00:00 0.000000 0.000000 0.0
6 2019-07-27 02:00:00 0.000000 0.000000 0.0
7 2019-07-27 03:00:00 0.000000 0.000000 0.0

Qty Compl
0 150
1 150
2 150
3 150
4 20
5 20
6 20
7 20
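
For a version that runs on its own, here is a small sketch with made-up numbers shaped like the output above. By default merge_asof searches backward, so each row of df1 picks up the most recent df2 row whose timestamp is not later than its own.

```python
import pandas as pd

# Illustrative data: hourly readings in df1, one quantity per day in df2.
df1 = pd.DataFrame({
    "Time Stamp": pd.to_datetime([
        "2019-07-26 07:00", "2019-07-26 08:00",
        "2019-07-27 00:00", "2019-07-27 01:00",
    ]),
    "HP_1H_mean": [410.6, 403.5, 0.0, 0.0],
})
df2 = pd.DataFrame({
    "Time Stamp": pd.to_datetime(["2019-07-26", "2019-07-27"]),
    "Qty Compl": [150, 20],
})

# Both inputs must be sorted by the key before calling merge_asof.
df = pd.merge_asof(
    df1.sort_values("Time Stamp"),
    df2.sort_values("Time Stamp"),
    on="Time Stamp",
)
print(df)
```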

Merge two dataframes on closest matching datetime index

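The logic here uses merge_asof, with one adjustment: merge_asof can match the same row of the second DataFrame to several rows of the first. To handle that, we add an extra key and a copy of the datetime to the second DataFrame, then use them to drop the duplicate matches and keep only the closest one.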

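# Make both indexes datetime, sort them, and move them into the 'datetime' column
masterdf.index = pd.to_datetime(masterdf.index)
masterdf = masterdf.sort_index().reset_index()
slavedf.index = pd.to_datetime(slavedf.index)
slavedf = slavedf.sort_index().reset_index()
# Keep a copy of the right-hand timestamp and a row id for de-duplication
slavedf['datetime2'] = slavedf['datetime']
slavedf['key'] = slavedf.index
# Match each masterdf row to the nearest slavedf row within 60 seconds
newdf = pd.merge_asof(masterdf, slavedf, on='datetime', tolerance=pd.Timedelta('60s'), direction='nearest')
# For each slavedf row, keep only the masterdf row that is closest in time
newdf['diff'] = (newdf.datetime - newdf.datetime2).abs()
newdf = newdf.sort_values('diff').drop_duplicates('key')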
newdf
Out[35]:
datetime AA BB datetime2 diff
2 2019-10-01 07:53:54 77.425134 60 2019-10-01 07:53:54 00:00:00
1 2019-10-01 07:53:01 77.491655 50 2019-10-01 07:53:00 00:00:01
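
The same steps can be tried on a few rows. The sketch below uses made-up values and the hypothetical names left_df and right_df. Two left rows are nearest to the same right row, and the de-duplication step keeps only the closer of the two.

```python
import pandas as pd

# Illustrative data; both frames are already sorted by 'datetime'.
left_df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2019-10-01 07:53:01", "2019-10-01 07:53:30", "2019-10-01 07:53:54",
    ]),
    "AA": [77.49, 77.45, 77.43],
})
right_df = pd.DataFrame({
    "datetime": pd.to_datetime(["2019-10-01 07:53:00", "2019-10-01 07:53:54"]),
    "BB": [50, 60],
})
right_df["datetime2"] = right_df["datetime"]  # copy that survives the merge
right_df["key"] = right_df.index              # row id of the right-hand frame

# Nearest right row within 60 seconds for every left row.
matched = pd.merge_asof(
    left_df, right_df, on="datetime",
    tolerance=pd.Timedelta("60s"), direction="nearest",
)

# 07:53:30 and 07:53:54 both match right row 1; keep the smaller time gap.
matched["diff"] = (matched["datetime"] - matched["datetime2"]).abs()
closest = (
    matched.sort_values("diff")
    .drop_duplicates("key")
    .sort_values("datetime")
)
print(closest)
```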

Merge two dataframes with different Date Time Indexes

You can create a temporary merge key in df1 by normalising its index (setting the time part to midnight). You can then merge df1 with the other dataframe, df2, on this key:

df1.assign(key=df1.index.normalize())\
.merge(df2, left_on='key', right_index=True, how='left').drop(columns='key')


                     A  B  C
2019-08-26 13:00:00 a 1 Y
2019-08-26 13:30:00 b 2 Y
2019-08-26 14:00:00 c 3 Y
2019-08-26 14:30:00 d 4 Y
2019-08-26 15:00:00 e 5 Y
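
A runnable sketch of the normalise-then-merge idea, with made-up frames: df1 has intraday timestamps and df2 has one row per day. The column names A, B and C follow the output above, and the values are illustrative.

```python
import pandas as pd

# Intraday rows in df1, one row per calendar day in df2 (illustrative values).
df1 = pd.DataFrame(
    {"A": ["a", "b", "c"], "B": [1, 2, 3]},
    index=pd.to_datetime([
        "2019-08-26 13:00", "2019-08-26 13:30", "2019-08-27 09:00",
    ]),
)
df2 = pd.DataFrame(
    {"C": ["Y", "N"]},
    index=pd.to_datetime(["2019-08-26", "2019-08-27"]),
)

# normalize() resets each timestamp to midnight, giving a per-day key
# that lines up with the daily index of df2.
out = (
    df1.assign(key=df1.index.normalize())
    .merge(df2, left_on="key", right_index=True, how="left")
    .drop(columns="key")
)
print(out)
```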

Combine multiple dfs based on datetime index

Use merge:

>>> pd.merge(df1, df2, on='Date', how='outer')
Date x y z n
0 2021-07-01 1 2 NaN NaN
1 2021-07-02 2 4 NaN NaN
2 2021-07-06 3 6 5.0 10.0
3 2021-07-07 4 8 6.0 12.0
4 2021-07-08 5 10 7.0 14.0
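
The merge call above takes two frames. For more than two, one common pattern is to fold the list with functools.reduce. This is a sketch with made-up frames df1, df2 and df3, not taken from the original answer, and it assumes each frame has a Date column.

```python
from functools import reduce

import pandas as pd

# Made-up frames that each carry a 'Date' column plus one value column.
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["2021-07-01", "2021-07-02", "2021-07-06"]),
    "x": [1, 2, 3],
})
df2 = pd.DataFrame({
    "Date": pd.to_datetime(["2021-07-06", "2021-07-07"]),
    "z": [5, 6],
})
df3 = pd.DataFrame({
    "Date": pd.to_datetime(["2021-07-07", "2021-07-08"]),
    "w": [9, 8],
})

# Outer-merge the frames pairwise, left to right, on the shared 'Date' column.
combined = reduce(
    lambda left, right: pd.merge(left, right, on="Date", how="outer"),
    [df1, df2, df3],
)
combined = combined.sort_values("Date").reset_index(drop=True)
print(combined)
```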

How to combine two different length dataframes with datetime index

By default, concat concatenates along rows (axis=0). You can specify axis=1 so that it concatenates along columns instead, aligning the rows on the index:

pd.concat([a, b], axis=1)

A B
01-01-1990 1.0 4.0
01-01-1991 2.0 NaN
01-01-1993 3.0 7.0
01-01-1992 NaN 6.0
01-01-1994 NaN 8.0
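
A self-contained sketch of this, with made-up frames a and b of different lengths. Dates missing from one frame come out as NaN in that frame's column. The sort_index() call is an addition here, to make the row order predictable.

```python
import pandas as pd

# Two frames of different length with partly overlapping datetime indexes.
a = pd.DataFrame(
    {"A": [1, 2, 3]},
    index=pd.to_datetime(["1990-01-01", "1991-01-01", "1993-01-01"]),
)
b = pd.DataFrame(
    {"B": [4, 6, 7, 8]},
    index=pd.to_datetime(["1990-01-01", "1992-01-01", "1993-01-01", "1994-01-01"]),
)

# axis=1 places the frames side by side and takes the union of the indexes.
combined = pd.concat([a, b], axis=1).sort_index()
print(combined)
```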

