Pandas: Merge data frames on datetime index
If you need to merge by indexes, add the parameters left_index=True and right_index=True to the merge function:
merge=pd.merge(df,d, how='inner', left_index=True, right_index=True)
Sample (the first index value in d was changed so that it matches):
print(df)
catcode_amt type feccandid_amt amount
date
1915-12-31 A5000 24K H6TX08100 1000
1916-12-31 T6100 24K H8CA52052 500
1954-12-31 H3100 24K S8AK00090 1000
1985-12-31 J7120 24E H8OH18088 36
1997-12-31 z9600 24K S6ND00058 2000
print(d)
catcode_disp disposition feccandid_disp bills
date
1997-12-31 A0000 support S4HI00011 1.0
2007-12-31 A1000 oppose S4IA00020', 'P20000741 1 NaN
2007-12-31 A1000 support S8MT00010 1.0
2007-12-31 A1500 support S6WI00061 2.0
2007-12-31 A1600 support S4IA00020', 'P20000741 3 NaN
merge=pd.merge(df,d, how='inner', left_index=True, right_index=True)
print(merge)
catcode_amt type feccandid_amt amount catcode_disp disposition \
date
1997-12-31 z9600 24K S6ND00058 2000 A0000 support
feccandid_disp bills
date
1997-12-31 S4HI00011 1.0
Or you can use concat:
print(pd.concat([df,d], join='inner', axis=1))
date
1997-12-31 z9600 24K S6ND00058 2000 A0000 support
feccandid_disp bills
date
1997-12-31 S4HI00011 1.0
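For a version that can be pasted and run as-is, here is a minimal sketch of the same two approaches. The frames are cut down to one column each and the values are only illustrative; DataFrame.join is included as a third spelling of the same index-on-index join.

```python
import pandas as pd

# Cut-down stand-ins for df and d (illustrative values only)
df = pd.DataFrame(
    {"amount": [1000, 500, 2000]},
    index=pd.to_datetime(["1915-12-31", "1916-12-31", "1997-12-31"]).rename("date"),
)
d = pd.DataFrame(
    {"bills": [1.0, 2.0]},
    index=pd.to_datetime(["1997-12-31", "2007-12-31"]).rename("date"),
)

# Inner join on the index: only dates present in both frames survive
merged = pd.merge(df, d, how="inner", left_index=True, right_index=True)

# Column-wise concat with join='inner' keeps the same intersection of dates
concatenated = pd.concat([df, d], join="inner", axis=1)

# DataFrame.join joins on the index by default
joined = df.join(d, how="inner")

print(merged)
```

All three results should contain the single shared date, 1997-12-31.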
EDIT: EdChum is right:
I added duplicates to DataFrame df (the last 2 index values), and each duplicated date is now matched with every row of d that has the same date:
print(df)
catcode_amt type feccandid_amt amount
date
1915-12-31 A5000 24K H6TX08100 1000
1916-12-31 T6100 24K H8CA52052 500
1954-12-31 H3100 24K S8AK00090 1000
2007-12-31 J7120 24E H8OH18088 36
2007-12-31 z9600 24K S6ND00058 2000
print(d)
catcode_disp disposition feccandid_disp bills
date
1997-12-31 A0000 support S4HI00011 1.0
2007-12-31 A1000 oppose S4IA00020', 'P20000741 1 NaN
2007-12-31 A1000 support S8MT00010 1.0
2007-12-31 A1500 support S6WI00061 2.0
2007-12-31 A1600 support S4IA00020', 'P20000741 3 NaN
merge=pd.merge(df,d, how='inner', left_index=True, right_index=True)
print(merge)
catcode_amt type feccandid_amt amount catcode_disp disposition \
date
2007-12-31 J7120 24E H8OH18088 36 A1000 oppose
2007-12-31 J7120 24E H8OH18088 36 A1000 support
2007-12-31 J7120 24E H8OH18088 36 A1500 support
2007-12-31 J7120 24E H8OH18088 36 A1600 support
2007-12-31 z9600 24K S6ND00058 2000 A1000 oppose
2007-12-31 z9600 24K S6ND00058 2000 A1000 support
2007-12-31 z9600 24K S6ND00058 2000 A1500 support
2007-12-31 z9600 24K S6ND00058 2000 A1600 support
feccandid_disp bills
date
2007-12-31 S4IA00020', 'P20000741 1 NaN
2007-12-31 S8MT00010 1.0
2007-12-31 S6WI00061 2.0
2007-12-31 S4IA00020', 'P20000741 3 NaN
2007-12-31 S4IA00020', 'P20000741 1 NaN
2007-12-31 S8MT00010 1.0
2007-12-31 S6WI00061 2.0
2007-12-31 S4IA00020', 'P20000741 3 NaN
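If that row multiplication is not what you want, it may help to check for duplicate index values before merging. The sketch below uses cut-down frames with illustrative values; it checks Index.is_unique and also uses the validate argument of merge, which raises a MergeError when the keys are not one-to-one.

```python
import pandas as pd

# Cut-down frames: 2007-12-31 appears twice on the left and four times on the right
idx_left = pd.to_datetime(["1954-12-31", "2007-12-31", "2007-12-31"])
idx_right = pd.to_datetime(
    ["1997-12-31", "2007-12-31", "2007-12-31", "2007-12-31", "2007-12-31"]
)
df = pd.DataFrame({"amount": [1000, 36, 2000]}, index=idx_left)
d = pd.DataFrame({"bills": [1.0, 1.0, 1.0, 2.0, 3.0]}, index=idx_right)

# Every left row for a date is paired with every right row for that date
merged = pd.merge(df, d, how="inner", left_index=True, right_index=True)
n_rows = len(merged)  # 2 left rows x 4 right rows

# Cheap check before merging
left_unique = df.index.is_unique

# Let merge itself enforce the expectation
try:
    pd.merge(df, d, left_index=True, right_index=True, validate="one_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True

print(n_rows, left_unique, raised)
```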
Merging two DataFrames on DatetimeIndex
After some more trial and error, it turned out that I needed to specify the columns I want to merge on. An outer join is also required here.
yfdata = pd.merge(coarse, fine, how='outer', on=coarse.columns.to_list(), left_index=True, right_index=True)
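If your pandas version refuses to combine on with left_index/right_index, one possible alternative is to move the index into a column first and let merge match on all shared columns. This is only a sketch: the column name Close, the index name Date and the values are made up for illustration, and it assumes both frames have the same columns.

```python
import pandas as pd

# Hypothetical stand-ins for coarse and fine: same columns, overlapping dates
coarse = pd.DataFrame(
    {"Close": [10.0, 11.0]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02"]).rename("Date"),
)
fine = pd.DataFrame(
    {"Close": [11.0, 12.0]},
    index=pd.to_datetime(["2021-01-02", "2021-01-03"]).rename("Date"),
)

# With no `on` given, merge uses every column the two frames share
# (here Date and Close), so identical rows collapse into one.
combined = (
    pd.merge(coarse.reset_index(), fine.reset_index(), how="outer")
    .set_index("Date")
    .sort_index()
)
print(combined)
```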
Pandas merge on `datetime` or `datetime` in `datetimeIndex`
So here's the option with merging:
Assume you have two DataFrames:
import pandas as pd
df1 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
'data': ['A', 'B', 'C']})
df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02', '2015-01-01 to 2015-01-02', '2015-01-02 to 2015-01-03'],
'data': ['E', 'F', 'G']})
Now do some cleaning to extract all of the dates you need and make sure they are datetime:
df1['date'] = pd.to_datetime(df1.date)
df2[['start', 'end']] = df2['date'].str.split(' to ', expand=True)
df2['start'] = pd.to_datetime(df2.start)
df2['end'] = pd.to_datetime(df2.end)
# No need for this anymore
df2 = df2.drop(columns='date')
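Now merge it all together with a cross join on a dummy key. You'll get one row for every pair of rows (99 x 10K rows in the case discussed here; 3 x 3 = 9 for the sample above).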
df = df1.assign(dummy=1).merge(df2.assign(dummy=1), on='dummy').drop(columns='dummy')
And subset to the dates that fall in between the ranges:
df[(df.date >= df.start) & (df.date <= df.end)]
# date data_x data_y start end
#0 2015-01-01 A E 2015-01-01 2015-01-02
#1 2015-01-01 A F 2015-01-01 2015-01-02
#3 2015-01-02 B E 2015-01-01 2015-01-02
#4 2015-01-02 B F 2015-01-01 2015-01-02
#5 2015-01-02 B G 2015-01-02 2015-01-03
#8 2015-01-03 C G 2015-01-02 2015-01-03
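On pandas 1.2 or later, the dummy-key trick can be replaced by how='cross'. A minimal sketch of the same steps, assuming the start and end columns have already been parsed as above:

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-01-02", "2015-01-03"]),
    "data": ["A", "B", "C"],
})
df2 = pd.DataFrame({
    "start": pd.to_datetime(["2015-01-01", "2015-01-01", "2015-01-02"]),
    "end": pd.to_datetime(["2015-01-02", "2015-01-02", "2015-01-03"]),
    "data": ["E", "F", "G"],
})

# Cross join: every row of df1 paired with every row of df2
pairs = df1.merge(df2, how="cross")

# Keep only the pairs whose date falls inside [start, end]
in_range = pairs[(pairs["date"] >= pairs["start"]) & (pairs["date"] <= pairs["end"])]
print(in_range)
```

The cross join still materialises every pair, so memory use grows with the product of the two lengths.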
If, for instance, some dates in df2 were a single date, then since we're using .str.split we will get None for the second date. Then just use .loc to set it appropriately.
df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02', '2015-01-01 to 2015-01-02', '2015-01-02 to 2015-01-03',
'2015-01-03'],
'data': ['E', 'F', 'G', 'H']})
df2[['start', 'end']] = df2['date'].str.split(' to ', expand=True)
df2.loc[df2.end.isnull(), 'end'] = df2.loc[df2.end.isnull(), 'start']
# data start end
#0 E 2015-01-01 2015-01-02
#1 F 2015-01-01 2015-01-02
#2 G 2015-01-02 2015-01-03
#3 H 2015-01-03 2015-01-03
Now the rest follows unchanged.
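As a possible alternative to the .loc assignment, fillna can copy the start date into the missing end date. A small sketch on a shortened df2 (two rows only, for illustration):

```python
import pandas as pd

df2 = pd.DataFrame({
    "date": ["2015-01-01 to 2015-01-02", "2015-01-03"],
    "data": ["E", "H"],
})

# A single date has no ' to ' separator, so its second part is missing
df2[["start", "end"]] = df2["date"].str.split(" to ", expand=True)

# Fall back to the start date wherever the end date is missing
df2["end"] = df2["end"].fillna(df2["start"])

df2["start"] = pd.to_datetime(df2["start"])
df2["end"] = pd.to_datetime(df2["end"])
df2 = df2.drop(columns="date")
print(df2)
```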
Pandas - how to merge dataframes on datetime column of different format?
Use merge_asof, with both DataFrames sorted by their datetime column:
#if necessary
df1['Time Stamp'] = pd.to_datetime(df1['Time Stamp'])
df2['Time Stamp'] = pd.to_datetime(df2['Time Stamp'])
df1 = df1.sort_values('Time Stamp')
df2 = df2.sort_values('Time Stamp')
df = pd.merge_asof(df1, df2, on='Time Stamp')
print (df)
Time Stamp HP_1H_mean Coolant1_1H_mean Extreme_1H_mean \
0 2019-07-26 07:00:00 410.637966 414.607081 0.0
1 2019-07-26 08:00:00 403.521735 424.787366 0.0
2 2019-07-26 09:00:00 403.143925 425.739639 0.0
3 2019-07-26 10:00:00 410.542895 426.210538 0.0
4 2019-07-27 00:00:00 0.000000 0.000000 0.0
5 2019-07-27 01:00:00 0.000000 0.000000 0.0
6 2019-07-27 02:00:00 0.000000 0.000000 0.0
7 2019-07-27 03:00:00 0.000000 0.000000 0.0
Qty Compl
0 150
1 150
2 150
3 150
4 20
5 20
6 20
7 20
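To try this without the original data, here is a minimal sketch with made-up frames: hourly rows on the left and one row per day on the right. The column name HP and all values are illustrative. By default merge_asof looks backward, so each hourly row picks up the most recent daily row at or before its timestamp.

```python
import pandas as pd

# Hourly measurements (illustrative values)
df1 = pd.DataFrame({
    "Time Stamp": pd.to_datetime(
        ["2019-07-26 07:00", "2019-07-26 08:00", "2019-07-27 00:00"]
    ),
    "HP": [410.6, 403.5, 0.0],
})
# One row per day
df2 = pd.DataFrame({
    "Time Stamp": pd.to_datetime(["2019-07-26", "2019-07-27"]),
    "Qty Compl": [150, 20],
})

# Both frames must be sorted by the key before calling merge_asof
df = pd.merge_asof(
    df1.sort_values("Time Stamp"),
    df2.sort_values("Time Stamp"),
    on="Time Stamp",
)
print(df)
```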
Merge two dataframes on closest matching datetime index
The logic here uses merge_asof, but it needs an adjustment: merge_asof can match the same row of the second DataFrame multiple times, so we add an extra key, plus a copy of the datetime column, and use them to drop the duplicate matches afterwards, keeping the closest one.
masterdf.index=pd.to_datetime(masterdf.index)
masterdf=masterdf.sort_index().reset_index()
slavedf.index=pd.to_datetime(slavedf.index)
slavedf=slavedf.sort_index().reset_index()
slavedf['datetime2']=slavedf['datetime']
slavedf['key']=slavedf.index
newdf=pd.merge_asof(masterdf,slavedf,on='datetime',tolerance=pd.Timedelta('60s'),direction='nearest')
newdf['diff']=(newdf.datetime-newdf.datetime2).abs()
newdf=newdf.sort_values('diff').drop_duplicates('key')
newdf
Out[35]:
datetime AA BB datetime2 diff
2 2019-10-01 07:53:54 77.425134 60 2019-10-01 07:53:54 00:00:00
1 2019-10-01 07:53:01 77.491655 50 2019-10-01 07:53:00 00:00:01
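The same steps as a self-contained sketch. The input values are made up, and a third row is added to masterdf so that one row of slavedf is matched twice and the de-duplication has something to do.

```python
import pandas as pd

masterdf = pd.DataFrame(
    {"AA": [77.49, 77.42, 77.40]},
    index=pd.to_datetime(
        ["2019-10-01 07:53:01", "2019-10-01 07:53:54", "2019-10-01 07:53:56"]
    ),
)
masterdf.index.name = "datetime"
slavedf = pd.DataFrame(
    {"BB": [50, 60]},
    index=pd.to_datetime(["2019-10-01 07:53:00", "2019-10-01 07:53:54"]),
)
slavedf.index.name = "datetime"

left = masterdf.sort_index().reset_index()
right = slavedf.sort_index().reset_index()
right["datetime2"] = right["datetime"]  # keep the matched timestamp
right["key"] = right.index              # identifies each right-hand row

newdf = pd.merge_asof(
    left, right, on="datetime",
    tolerance=pd.Timedelta("60s"), direction="nearest",
)
# For each right-hand row keep only the closest left-hand match
newdf["diff"] = (newdf["datetime"] - newdf["datetime2"]).abs()
newdf = newdf.sort_values("diff").drop_duplicates("key")
print(newdf)
```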
Merge two dataframes with different Date Time Indexes
You can create a temporary merge key in df1 by normalising the index of df1. Then you should be able to merge df1 with the other dataframe df2 based on this merge key:
df1.assign(key=df1.index.normalize())\
.merge(df2, left_on='key', right_index=True, how='left').drop(columns='key')
A B C
2019-08-26 13:00:00 a 1 Y
2019-08-26 13:30:00 b 2 Y
2019-08-26 14:00:00 c 3 Y
2019-08-26 14:30:00 d 4 Y
2019-08-26 15:00:00 e 5 Y
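A runnable sketch of the same idea, with small made-up frames: df1 has intraday timestamps and df2 has one row per calendar day.

```python
import pandas as pd

# Intraday rows (illustrative values)
df1 = pd.DataFrame(
    {"A": ["a", "b", "c"], "B": [1, 2, 3]},
    index=pd.to_datetime(
        ["2019-08-26 13:00", "2019-08-26 13:30", "2019-08-27 14:00"]
    ),
)
# One row per day
df2 = pd.DataFrame(
    {"C": ["Y", "N"]},
    index=pd.to_datetime(["2019-08-26", "2019-08-27"]),
)

# normalize() sets the time part to midnight, giving a per-day key
out = (
    df1.assign(key=df1.index.normalize())
    .merge(df2, left_on="key", right_index=True, how="left")
    .drop(columns="key")
)
print(out)
```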
Combine multiple dfs based on datetime index
Use merge:
>>> pd.merge(df1, df2, on='Date', how='outer')
Date x y z n
0 2021-07-01 1 2 NaN NaN
1 2021-07-02 2 4 NaN NaN
2 2021-07-06 3 6 5.0 10.0
3 2021-07-07 4 8 6.0 12.0
4 2021-07-08 5 10 7.0 14.0
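With more than two frames, the same merge can be applied repeatedly, for example with functools.reduce. A sketch with three small made-up frames (df3 and its column w are hypothetical, added only for illustration):

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({"Date": pd.to_datetime(["2021-07-01", "2021-07-06"]), "x": [1, 3]})
df2 = pd.DataFrame({"Date": pd.to_datetime(["2021-07-06", "2021-07-07"]), "z": [5.0, 6.0]})
df3 = pd.DataFrame({"Date": pd.to_datetime(["2021-07-07", "2021-07-08"]), "w": [0.1, 0.2]})

# Fold the list of frames into one, outer-merging on Date at each step
combined = reduce(
    lambda left, right: pd.merge(left, right, on="Date", how="outer"),
    [df1, df2, df3],
)
print(combined)
```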
How to combine two different length dataframes with datetime index
By default, concat concatenates along rows (axis=0). You can specify axis=1 so that it concatenates along columns (and joins on the index):
pd.concat([a, b], axis=1)
A B
01-01-1990 1.0 4.0
01-01-1991 2.0 NaN
01-01-1993 3.0 7.0
01-01-1992 NaN 6.0
01-01-1994 NaN 8.0
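A self-contained sketch of the same call, assuming a and b are Series with a DatetimeIndex (the values mirror the output above). Sorting the index afterwards is optional and only keeps the dates in order.

```python
import pandas as pd

a = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["1990-01-01", "1991-01-01", "1993-01-01"]),
    name="A",
)
b = pd.Series(
    [4.0, 6.0, 7.0, 8.0],
    index=pd.to_datetime(["1990-01-01", "1992-01-01", "1993-01-01", "1994-01-01"]),
    name="B",
)

# Outer join on the index: dates missing from one side become NaN
combined = pd.concat([a, b], axis=1).sort_index()
print(combined)
```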