Pandas "Can Only Compare Identically-Labeled Dataframe Objects" Error

Pandas Can only compare identically-labeled DataFrame objects error

Here's a small example to demonstrate this. (Before pandas 0.19 the restriction applied only to DataFrames; from 0.19 onwards it applies to Series as well.)

In [1]: df1 = pd.DataFrame([[1, 2], [3, 4]])

In [2]: df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

In [3]: df1 == df2
ValueError: Can only compare identically-labeled DataFrame objects

One solution is to sort the index first (Note: some functions require sorted indexes):

In [4]: df2.sort_index(inplace=True)

In [5]: df1 == df2
Out[5]:
      0     1
0  True  True
1  True  True

Note: == is also sensitive to the order of columns, so you may have to use sort_index(axis=1):

In [11]: df1.sort_index().sort_index(axis=1) == df2.sort_index().sort_index(axis=1)
Out[11]:
      0     1
0  True  True
1  True  True

Note: This can still raise (if the index/columns aren't identically labelled after sorting).
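Since sorting alone does not guarantee matching labels, one defensive option is to check the labels before comparing. This is only a minimal sketch: labels_match is a hypothetical helper (not part of pandas), df3 is made-up data, and it assumes the labels are sortable.

```python
import pandas as pd

def labels_match(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: True if both frames carry the same row and
    column labels, regardless of their order."""
    return (
        left.index.sort_values().equals(right.index.sort_values())
        and left.columns.sort_values().equals(right.columns.sort_values())
    )

df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])
df3 = pd.DataFrame([[1, 2], [3, 4]], index=[0, 2])  # label 2 has no partner in df1

ok_12 = labels_match(df1, df2)  # same labels, different order
ok_13 = labels_match(df1, df3)  # different labels, == would still raise

if ok_12:
    # Safe to compare once both axes are sorted
    result = (df1.sort_index().sort_index(axis=1)
              == df2.sort_index().sort_index(axis=1))
```

If the check fails, you need to decide what to do with the unmatched labels (drop them, reindex, or merge) before comparing.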

Comparing 2 dataframes gives: Can only compare identically-labeled DataFrame objects

You can use reindex_like to make bru2 have the same indexing as bru then compare the dataframes.

bru2.reindex_like(bru).compare(bru)

And you can use pd.Index.difference to find the row or column labels that are in bru but not in bru2.

bru.index.difference(bru2.index) #and like wise with bru.columns and bru2.columns
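To see how these two calls behave, here is a small sketch. The contents of bru and bru2 are invented for illustration (the original frames are not shown), and bru3 is a hypothetical third frame that is missing one row. DataFrame.compare requires pandas 1.1 or later.

```python
import pandas as pd

# Made-up stand-ins for the frames in the question
bru = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])
bru2 = pd.DataFrame({"a": [3, 1, 2], "b": [6, 4, 50]}, index=["z", "x", "y"])

# Reorder bru2 to match bru's row and column labels, then list the differences
aligned = bru2.reindex_like(bru)
diff = aligned.compare(bru)  # "self" is bru2's value, "other" is bru's value

# Hypothetical frame lacking row "z": find the labels present only in bru
bru3 = bru2.drop(index="z")
missing = bru.index.difference(bru3.index)
```

Note that reindex_like fills labels that are missing from bru2 with NaN, so those rows show up in compare as differences.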

Compare two DataFrames for differences but getting 'Can only compare identically-labeled DataFrame objects' error

It seems some index values differ between the two DataFrames. You can select only the values present in both with Index.intersection:

BOOL_FIELDS = ['is_mobile','is_desktop','is_cancelled','is_existing_customer']

customer_df_2020.set_index('customer_id',inplace=True)
customer_df_2021.set_index('customer_id',inplace=True)

sameidx = customer_df_2020.index.intersection(customer_df_2021.index)

temp_df = (customer_df_2020.loc[sameidx, BOOL_FIELDS] !=
           customer_df_2021.loc[sameidx, BOOL_FIELDS])
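A self-contained version of the same idea, with made-up customer data and a shortened BOOL_FIELDS list (both are assumptions for illustration only):

```python
import pandas as pd

BOOL_FIELDS = ['is_mobile', 'is_cancelled']  # shortened, illustrative subset

customer_df_2020 = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'is_mobile': [True, False, True],
    'is_cancelled': [False, False, False],
}).set_index('customer_id')

customer_df_2021 = pd.DataFrame({
    'customer_id': [2, 3, 4],
    'is_mobile': [False, False, True],
    'is_cancelled': [True, False, False],
}).set_index('customer_id')

# Customers present in both years
sameidx = customer_df_2020.index.intersection(customer_df_2021.index)

# True where a flag changed between the two years
temp_df = (customer_df_2020.loc[sameidx, BOOL_FIELDS] !=
           customer_df_2021.loc[sameidx, BOOL_FIELDS])

# Keep only customers with at least one changed flag
changed = temp_df[temp_df.any(axis=1)]
```

Customers 1 and 4 appear in only one year, so they are left out of the comparison rather than causing the error.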

Error: Can only compare identically-labeled Series objects and sort_index

I think you need reset_index so that both objects have the same index values, and then compare. To create the new column, it is better to use mask or numpy.where.

Also, use | instead of + because you are working with booleans.

df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)

# Option 1: Series.mask
df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
                                  (df1['choice'] | df2['choice']) * 0.5)

# Option 2: numpy.where (needs: import numpy as np)
df1['v_100'] = np.where(df1['choice'] != df2['choice'],
                        (df1['choice'] | df2['choice']) * 0.5,
                        df1['choice'])

Samples:

print (df1)
   v_100  choice
5      7    True
6      0    True
7      7   False
8      2    True

print (df2)
   v_100  choice
4      1   False
5      2    True
6     74    True
7      6    True

df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
print (df1)
   v_100  choice
0      7    True
1      0    True
2      7   False
3      2    True

print (df2)
   v_100  choice
0      1   False
1      2    True
2     74    True
3      6    True

df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
                                  (df1['choice'] | df2['choice']) * 0.5)

print (df1)
   v_100  choice
0    0.5    True
1    1.0    True
2    0.5   False
3    1.0    True
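For reference, the error itself can be reproduced with just the choice columns from the samples above. This is a minimal sketch: s1 and s2 are illustrative names for those two columns with their original index values.

```python
import pandas as pd

# The 'choice' columns from the samples, before reset_index
s1 = pd.Series([True, True, False, True], index=[5, 6, 7, 8])
s2 = pd.Series([False, True, True, True], index=[4, 5, 6, 7])

try:
    s1 != s2          # the index labels differ, so pandas refuses to compare
    raised = False
except ValueError:
    raised = True

# After reset_index(drop=True) both have labels 0..3 and compare by position
positional = s1.reset_index(drop=True) != s2.reset_index(drop=True)
```

Keep in mind that this compares row 5 of df1 with row 4 of df2, and so on. That is only correct if the rows are meant to be matched by position rather than by label.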

Pandas Join- Can only compare identically-labeled Series objects

I suggest you use pd.merge

df = pd.merge(telemetry, errors1, how='left', left_on=['machineID','datetime'], right_on = ['machineID','datetime'])
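A small runnable sketch of that merge. The sample rows and the volt and errorID columns are invented for illustration; only machineID and datetime come from the call above. Because the key names are identical on both sides, on= is used here as a shorter equivalent of left_on/right_on.

```python
import pandas as pd

# Made-up telemetry readings
telemetry = pd.DataFrame({
    'machineID': [1, 1, 2],
    'datetime': pd.to_datetime(['2015-01-01 06:00',
                                '2015-01-01 07:00',
                                '2015-01-01 06:00']),
    'volt': [176.2, 162.9, 170.9],
})

# Made-up error log with a single matching row
errors1 = pd.DataFrame({
    'machineID': [1],
    'datetime': pd.to_datetime(['2015-01-01 07:00']),
    'errorID': ['error1'],
})

# Left join: keep every telemetry row, attach errors where the keys match
df = pd.merge(telemetry, errors1, how='left', on=['machineID', 'datetime'])
```

Rows without a matching error get NaN in errorID, so no label comparison between the two frames is needed.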

Python Pandas Only Compare Identically Labeled DataFrame Objects

To get around this, you can compare the underlying NumPy arrays, which compares the values by position and ignores the row and column labels.

import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'], index=['One', 'Two'])
df2 = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'], index=['one', 'two'])

df1.values == df2.values

array([[ True,  True],
       [ True,  True]], dtype=bool)
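If you want the result back as a labeled DataFrame, something like the following sketch may help. The shape check and the choice to reuse df1's labels are my own assumptions, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'], index=['One', 'Two'])
df2 = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'], index=['one', 'two'])

# A positional comparison only makes sense when the shapes agree
if df1.shape == df2.shape:
    mask = df1.to_numpy() == df2.to_numpy()

    # Wrap the boolean array again, borrowing df1's labels
    same = pd.DataFrame(mask, index=df1.index, columns=df1.columns)
    all_equal = bool(mask.all())
```

to_numpy() is the currently recommended spelling of .values and behaves the same way here.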

