How to Keep Index When Using Pandas Merge

How to keep index when using pandas merge

In [5]: a.reset_index().merge(b, how="left").set_index('index')
Out[5]:
       col1  to_merge_on  col2
index
a         1            1     1
b         2            3     2
c         3            4   NaN

Note that a left merge can return more rows than a contains when a key in a matches several rows in b. In that case the restored index will hold repeated labels, and you may need to drop the duplicates.
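
A minimal sketch of that situation, using made-up frames (the values in a and b below are assumptions for illustration, not the data behind the output above). It shows the index label b repeating after the merge and one possible way to keep only the first match:

```python
import pandas as pd

# Hypothetical data: key 3 appears twice in b, so row "b" of a matches two rows.
a = pd.DataFrame({"col1": [1, 2, 3], "to_merge_on": [1, 3, 4]},
                 index=["a", "b", "c"])
b = pd.DataFrame({"to_merge_on": [1, 3, 3], "col2": [10, 20, 30]})

# reset_index() turns the unnamed index into a column called "index",
# which set_index("index") restores after the merge.
merged = a.reset_index().merge(b, how="left").set_index("index")

# Keep only the first match for each original index label.
deduped = merged[~merged.index.duplicated(keep="first")]
```

If duplicate matches should be treated as an error rather than dropped, passing validate="many_to_one" to merge raises a MergeError when the right-hand keys are not unique.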

Keep index of First dataframe when doing inner merge on columns

Call reset_index() on ClientFileDf so that its index survives the merge as a column, then restore it with set_index():

df2 = pd.merge(ClientFileDf.reset_index(), df_CPCodeDF, how='inner',
               left_on=['CPCode', 'CPPAN'],
               right_on=['HEDGE_CP_CODE', 'HEDGE_PAN_NO']).set_index('index')
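
The same pattern on small, invented frames may make the result easier to see. The data, the limit column, and the orig_index name are assumptions for illustration. Naming the index with rename_axis first avoids relying on the default column name "index", which reset_index() only uses when the index has no name:

```python
import pandas as pd

# Hypothetical stand-ins for ClientFileDf and df_CPCodeDF.
client = pd.DataFrame({"CPCode": ["A", "B", "C"],
                       "CPPAN": ["p1", "p2", "p3"]},
                      index=[100, 101, 102])
hedge = pd.DataFrame({"HEDGE_CP_CODE": ["C", "A"],
                      "HEDGE_PAN_NO": ["p3", "p1"],
                      "limit": [7, 9]})

df2 = pd.merge(client.rename_axis("orig_index").reset_index(), hedge,
               how="inner",
               left_on=["CPCode", "CPPAN"],
               right_on=["HEDGE_CP_CODE", "HEDGE_PAN_NO"]).set_index("orig_index")
```

Only the rows of client that found a match (labels 100 and 102 here) remain, and they keep their original labels.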

Setting the index after merging with pandas?

Here's what happens:

The output index is the intersection of the index/column merge keys [0, 1].
Missing keys are replaced with NaN.
The NaNs cause the index dtype to be upcast to float.

To set the index, just assign to it:

s2 = pd.merge(s, df, how='left', left_index=True, right_on='id')
s2.index = s.index

    score  id value
10      5  10   NaN
11      6  11     a
12      7  12   NaN
13      8  13     b
14      9  14   NaN

You can also merge on s (just because I dislike calling pd.merge directly):

(s.to_frame()
  .merge(df, how='left', left_index=True, right_on='id')
  .set_axis(s.index, axis=0))  # the inplace argument was removed in pandas 2.0

    score  id value
10      5  10   NaN
11      6  11     a
12      7  12   NaN
13      8  13     b
14      9  14   NaN
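
For reference, a self-contained sketch of the same idea. The contents of s and df are reconstructed from the printed output and should be read as assumptions. Overwriting the index is only safe when the merge returns exactly one row per row of s, in the same order, so the sketch checks the length first:

```python
import pandas as pd

# Assumed inputs, inferred from the output shown above.
s = pd.Series([5, 6, 7, 8, 9], index=[10, 11, 12, 13, 14], name="score")
df = pd.DataFrame({"id": [11, 13], "value": ["a", "b"]})

merged = pd.merge(s, df, how="left", left_index=True, right_on="id")

# Guard: duplicate ids in df would add rows and misalign the labels.
if len(merged) != len(s):
    raise ValueError("merge changed the number of rows; cannot reuse s.index")

s2 = merged.set_axis(s.index, axis=0)
```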

Pandas merge and retain the index

Provisional solution:

In [255]: a = a.reset_index()

In [256]: a
Out[256]: 
   id1 id2  col1  to_merge_on
0    1   a     1            2
1    1   b     3            4
2    2   a     1            2
3    2   b     3            4

In [271]: c = pd.merge(a, b, how="left")

In [272]: c
Out[272]: 
   id1 id2  col1  to_merge_on  col2
0    1   a     1            2   NaN
1    2   a     1            2   NaN
2    2   b     3            3     2
3    1   b     3            4   NaN

In [273]: c = c.set_index(['id1','id2'])

In [274]: c
Out[274]: 
         col1  to_merge_on  col2
id1 id2                         
1   a       1            2   NaN
2   a       1            2   NaN
    b       3            3     2
1   b       3            4   NaN
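
The three steps can also be chained. The sketch below uses invented frames (b in particular is an assumption) and reads the level names from a.index.names rather than hard-coding them:

```python
import pandas as pd

# Hypothetical frame with a two-level index named id1/id2.
a = pd.DataFrame(
    {"col1": [1, 3, 1, 3], "to_merge_on": [2, 4, 2, 4]},
    index=pd.MultiIndex.from_product([[1, 2], ["a", "b"]],
                                     names=["id1", "id2"]),
)
b = pd.DataFrame({"to_merge_on": [4], "col2": [2]})

index_names = list(a.index.names)  # ["id1", "id2"]
c = a.reset_index().merge(b, how="left").set_index(index_names)
```

This assumes every index level has a name. Unnamed levels come back from reset_index() as level_0, level_1, and so on, and would need to be handled separately.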

merge two DataFrame with two columns and keep the same order with original indexes in the result

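When constructing the merged dataframe, carry the index values of each dataframe along as columns. Calling reset_index() on both inputs produces the index_x and index_y columns used below:

merged_df = pd.merge(df1.reset_index(), df2.reset_index(), how="outer", on=['key1', 'key2'])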

Use combine_first to combine index_x and index_y:

merged_df['combined_index'] = merged_df.index_x.combine_first(merged_df.index_y)

Sort by combined_index and index_x, drop the helper columns, and reset the index:

output = merged_df.sort_values(
    ['combined_index', 'index_x']
).drop(
    ['index_x', 'index_y', 'combined_index'], axis=1
).reset_index(drop=True)

This results in the following output:

  key1 key2  Value1  Value2
0    K   a5   apple     NaN
1    K   a9     NaN   apple
2    K   a4   guava     NaN
3   A1   a7    kiwi    kiwi
4   A3   a9     NaN   grape
5   A2   a9   grape     NaN
6   B1   b2  banana  banana
7   C2   c7     NaN   guava
8   B9   b8   peach     NaN
9   C3   c1   berry  orange
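
Here is the whole recipe end to end on a reduced, made-up dataset (three rows per frame, not the original data). It assumes both inputs have a default unnamed index, so that reset_index() yields a column called index on each side and the merge suffixes them as index_x and index_y:

```python
import pandas as pd

# Hypothetical inputs with default RangeIndex.
df1 = pd.DataFrame({"key1": ["K", "A1", "B1"],
                    "key2": ["a5", "a7", "b2"],
                    "Value1": ["apple", "kiwi", "banana"]})
df2 = pd.DataFrame({"key1": ["K", "A1", "C2"],
                    "key2": ["a9", "a7", "c7"],
                    "Value2": ["apple", "kiwi", "guava"]})

merged_df = pd.merge(df1.reset_index(), df2.reset_index(),
                     how="outer", on=["key1", "key2"])

# Position of each row in whichever frame it came from (left takes priority).
merged_df["combined_index"] = merged_df["index_x"].combine_first(merged_df["index_y"])

output = (merged_df
          .sort_values(["combined_index", "index_x"])
          .drop(["index_x", "index_y", "combined_index"], axis=1)
          .reset_index(drop=True))
```

Rows that exist only in df2 have NaN in index_x, and sort_values places NaN last by default, so within each original position the df1 row comes before the df2 row.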

Merge two dataframes by index

Use merge, which is an inner join by default:

pd.merge(df1, df2, left_index=True, right_index=True)

Or join, which is a left join by default:

df1.join(df2)

Or concat, which is an outer join by default:

pd.concat([df1, df2], axis=1)

Samples:

df1 = pd.DataFrame({'a':range(6),
                    'b':[5,3,6,9,2,4]}, index=list('abcdef'))

print (df1)
   a  b
a  0  5
b  1  3
c  2  6
d  3  9
e  4  2
f  5  4

df2 = pd.DataFrame({'c':range(4),
                    'd':[10,20,30, 40]}, index=list('abhi'))

print (df2)
   c   d
a  0  10
b  1  20
h  2  30
i  3  40

# Default inner join
df3 = pd.merge(df1, df2, left_index=True, right_index=True)
print (df3)
   a  b  c   d
a  0  5  0  10
b  1  3  1  20

# Default left join
df4 = df1.join(df2)
print (df4)
   a  b    c     d
a  0  5  0.0  10.0
b  1  3  1.0  20.0
c  2  6  NaN   NaN
d  3  9  NaN   NaN
e  4  2  NaN   NaN
f  5  4  NaN   NaN

# Default outer join
df5 = pd.concat([df1, df2], axis=1)
print (df5)
     a    b    c     d
a  0.0  5.0  0.0  10.0
b  1.0  3.0  1.0  20.0
c  2.0  6.0  NaN   NaN
d  3.0  9.0  NaN   NaN
e  4.0  2.0  NaN   NaN
f  5.0  4.0  NaN   NaN
h  NaN  NaN  2.0  30.0
i  NaN  NaN  3.0  40.0

How to merge two dataframes according to their indexes?

Given these two DataFrames:

>>> df1 = pd.DataFrame({'col_a': [1, 2, 3]}, index=['a/aa/aaa','b/bb/bbb', 'c/cc/ccc'])
>>> df2 = pd.DataFrame({'col_b': [4, 5, 6]}, index=['bb/bbb', 'ccc', 'hello'])

Move each index into a regular column:

>>> df1=df1.reset_index(drop=False)
>>> df1 = df1.rename(columns={'index': 'value_df1'})
>>> df1
    value_df1   col_a
0   a/aa/aaa    1
1   b/bb/bbb    2
2   c/cc/ccc    3

>>> df2=df2.reset_index(drop=False)
>>> df2 = df2.rename(columns={'index': 'value_df2'})
>>> df2
    value_df2       col_b
0   bb/bbb          4
1   ccc             5
2   hello           6

Merge both DataFrames on a constant join column, which pairs every row of df1 with every row of df2:

>>> df1['join'] = 1
>>> df2['join'] = 1
>>> dfFull = df1.merge(df2, on='join').drop('join', axis=1)
>>> dfFull
    value_df1   col_a   value_df2       col_b
0   a/aa/aaa    1       bb/bbb          4
1   a/aa/aaa    1       ccc             5
2   a/aa/aaa    1       hello           6
3   b/bb/bbb    2       bb/bbb          4
4   b/bb/bbb    2       ccc             5
5   b/bb/bbb    2       hello           6
6   c/cc/ccc    3       bb/bbb          4
7   c/cc/ccc    3       ccc             5
8   c/cc/ccc    3       hello           6

Then use apply to check whether each value_df2 string occurs inside the corresponding value_df1 string:

>>> df2.drop('join', axis=1, inplace=True)
>>> dfFull['match'] = dfFull.apply(lambda x: x['value_df1'].find(x['value_df2']), axis=1).ge(0)
>>> dfFull
    value_df1   col_a   value_df2       col_b   match
0   a/aa/aaa    1       bb/bbb          4       False
1   a/aa/aaa    1       ccc             5       False
2   a/aa/aaa    1       hello           6       False
3   b/bb/bbb    2       bb/bbb          4       True
4   b/bb/bbb    2       ccc             5       False
5   b/bb/bbb    2       hello           6       False
6   c/cc/ccc    3       bb/bbb          4       False
7   c/cc/ccc    3       ccc             5       True
8   c/cc/ccc    3       hello           6       False

Filtering on the rows where match is True and dropping the match column gives the expected result:

>>> dfFull[dfFull['match']].drop(['match'], axis=1)
    value_df1   col_a   value_df2   col_b
3   b/bb/bbb    2       bb/bbb      4       
7   c/cc/ccc    3       ccc         5

This solution is inspired by this post.
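
As a variation on the steps above, and not part of the original answer, the dummy join column can be replaced by how='cross', which merge supports from pandas 1.2 onward. A rough sketch under that assumption:

```python
import pandas as pd

df1 = pd.DataFrame({"col_a": [1, 2, 3]},
                   index=["a/aa/aaa", "b/bb/bbb", "c/cc/ccc"])
df2 = pd.DataFrame({"col_b": [4, 5, 6]},
                   index=["bb/bbb", "ccc", "hello"])

# Name each index, then move it into a column.
left = df1.rename_axis("value_df1").reset_index()
right = df2.rename_axis("value_df2").reset_index()

# Cartesian product of the two frames (requires pandas >= 1.2).
pairs = left.merge(right, how="cross")

# Keep the pairs where the df2 label is a substring of the df1 label.
is_match = [v2 in v1 for v1, v2 in zip(pairs["value_df1"], pairs["value_df2"])]
matched = pairs[is_match]
```

Either way, the intermediate frame has len(df1) * len(df2) rows, so this approach is only practical for small inputs.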

pandas not matching initial index when I try to join/merge/loc

You can try with merge() method:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

It would help a lot if you could provide a snippet of dataframes you are working on.

Is there a way to merge on Interval Index and another Column Value in pandas?

Merge your dataframes on the UniqueID column, then check whether Trip_Date falls between Start_Date and End_Date. Finally, set Trip_Date and Value to NaN in every row where the condition is not met:

# Only needed if these columns do not already have datetime64 dtype
df1['Start_Date'] = pd.to_datetime(df1['Start_Date'], dayfirst=True)
df1['End_Date'] = pd.to_datetime(df1['End_Date'], dayfirst=True)
df2['Trip_Date'] = pd.to_datetime(df2['Trip_Date'], dayfirst=True)

out = pd.merge(df1, df2, on='UniqueID', how='left')
m = out['Trip_Date'].between(out['Start_Date'], out['End_Date'])

out.loc[~m, ['Trip_Date', 'Value']] = np.nan  # np.NaN was removed in NumPy 2.0

Output:

>>> out
  UniqueID Start_Date   End_Date  Trip_Date  Value
0      ID1 2020-01-01 2020-08-01 2020-02-10    1.0
1      ID1 2020-01-01 2020-08-01 2020-02-15  207.0
2      ID2 2020-02-01 2020-04-01 2020-03-06   10.0
3      ID3 2020-03-01 2020-05-01        NaT    NaN
4      ID4 2020-04-01 2020-09-01        NaT    NaN
5      ID5 2020-05-01 2020-10-01        NaT    NaN
6      ID6 2020-06-01 2020-11-01        NaT    NaN
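
A small self-contained version of the same approach, with invented input rows (the original inputs are not shown, so the data below is an assumption). It masks each column with Series.where instead of assigning through loc, which is one way to avoid depending on NumPy directly:

```python
import pandas as pd

# Hypothetical inputs: ID2 has a trip outside its interval, ID3 has no trip.
df1 = pd.DataFrame({
    "UniqueID": ["ID1", "ID2", "ID3"],
    "Start_Date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "End_Date": pd.to_datetime(["2020-08-01", "2020-04-01", "2020-05-01"]),
})
df2 = pd.DataFrame({
    "UniqueID": ["ID1", "ID2"],
    "Trip_Date": pd.to_datetime(["2020-02-10", "2020-05-06"]),
    "Value": [1, 10],
})

out = pd.merge(df1, df2, on="UniqueID", how="left")

# True where the trip date lies inside [Start_Date, End_Date]; NaT compares False.
in_range = out["Trip_Date"].between(out["Start_Date"], out["End_Date"])

# Blank out trip details for rows that fall outside the interval.
out["Trip_Date"] = out["Trip_Date"].where(in_range)
out["Value"] = out["Value"].where(in_range)
```

Note that between is inclusive on both ends by default, so a trip on the start or end date itself counts as inside the interval.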
