Pandas Can only compare identically-labeled DataFrame objects error
Here's a small example that reproduces the error (it applied only to DataFrames, not Series, until pandas 0.19; since then it applies to both):
In [1]: df1 = pd.DataFrame([[1, 2], [3, 4]])
In [2]: df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])
In [3]: df1 == df2
Exception: Can only compare identically-labeled DataFrame objects
One solution is to sort the index first (Note: some functions require sorted indexes):
In [4]: df2.sort_index(inplace=True)
In [5]: df1 == df2
Out[5]:
      0     1
0  True  True
1  True  True
Note: == is also sensitive to the order of columns, so you may have to use sort_index(axis=1):
In [11]: df1.sort_index().sort_index(axis=1) == df2.sort_index().sort_index(axis=1)
Out[11]:
      0     1
0  True  True
1  True  True
Note: This can still raise (if the index/columns aren't identically labelled after sorting).
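Since sorting only helps when both frames carry the same set of labels, it can be worth checking the labels before comparing. The following is a minimal sketch of that idea; the helper name compare_sorted is hypothetical and not part of pandas.

```python
import pandas as pd


def compare_sorted(left, right):
    """Hypothetical helper: sort both axes, then compare element-wise.

    Raises a ValueError with an explicit message if the labels still differ
    after sorting.
    """
    left = left.sort_index().sort_index(axis=1)
    right = right.sort_index().sort_index(axis=1)
    if not left.index.equals(right.index) or not left.columns.equals(right.columns):
        raise ValueError("frames do not share the same row/column labels")
    return left == right


df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])
result = compare_sorted(df1, df2)  # same labels, different order: works

df3 = pd.DataFrame([[1, 2], [3, 4]], index=[0, 2])  # label 2 has no match in df1
try:
    compare_sorted(df1, df3)
    raised = False
except ValueError:
    raised = True
```

Here df3 has a row label that df1 lacks, so no amount of sorting makes the two frames comparable with ==.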
Comparing 2 dataframes gives : Can only compare identically-labeled DataFrame objects
You can use reindex_like to make bru2 have the same indexing as bru, then compare the dataframes.
bru2.reindex_like(bru).compare(bru)
And you can use pd.Index.difference to find the row or column labels in bru that are missing from bru2.
bru.index.difference(bru2.index)  # and likewise with bru.columns and bru2.columns
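A small worked sketch of both steps follows. Only the names bru and bru2 come from the answer; the data is made up for illustration, and DataFrame.compare assumes pandas 1.1 or later.

```python
import pandas as pd

# Hypothetical data; only the names bru and bru2 come from the answer above.
bru = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])
bru2 = pd.DataFrame({"b": [6, 4], "a": [3, 10]}, index=["z", "x"])

# Labels present in bru but absent from bru2.
missing_rows = bru.index.difference(bru2.index)
missing_cols = bru.columns.difference(bru2.columns)

# Give bru2 the same row and column labels (and order) as bru.
# Rows that bru2 lacks are filled with NaN.
aligned = bru2.reindex_like(bru)

# The labels now match, so compare() no longer raises.
diff = aligned.compare(bru)
print(diff)
```

Rows that exist only in bru show up in the result as NaN on the "self" side, so it may be worth inspecting missing_rows first to tell genuinely changed values apart from absent rows.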
Compare two DataFrames for differences but getting 'Can only compare identically-labeled DataFrame objects' error
It seems some index values differ between the two frames. You can extract the labels common to both with Index.intersection:
BOOL_FIELDS = ['is_mobile','is_desktop','is_cancelled','is_existing_customer']
customer_df_2020.set_index('customer_id',inplace=True)
customer_df_2021.set_index('customer_id',inplace=True)
sameidx = customer_df_2020.index.intersection(customer_df_2021.index)
temp_df = (customer_df_2020.loc[sameidx, BOOL_FIELDS] !=
           customer_df_2021.loc[sameidx, BOOL_FIELDS])
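To see the effect on concrete data, here is a self-contained sketch of the same approach. The sample rows and the shortened BOOL_FIELDS list are invented for illustration, and it assumes customer_id is unique in each frame.

```python
import pandas as pd

# Hypothetical sample data; frame and column names follow the snippet above.
BOOL_FIELDS = ["is_mobile", "is_cancelled"]
customer_df_2020 = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "is_mobile": [True, False, True],
    "is_cancelled": [False, False, False],
}).set_index("customer_id")
customer_df_2021 = pd.DataFrame({
    "customer_id": [3, 2, 4],
    "is_mobile": [True, True, False],
    "is_cancelled": [True, False, False],
}).set_index("customer_id")

# Keep only customers present in both years.
sameidx = customer_df_2020.index.intersection(customer_df_2021.index)

# Both sides are selected with the same labels in the same order,
# so the element-wise comparison is well defined.
changed = (customer_df_2020.loc[sameidx, BOOL_FIELDS]
           != customer_df_2021.loc[sameidx, BOOL_FIELDS])
print(changed)
```

Customers 1 and 4 appear in only one year and are therefore left out of the comparison.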
Error "Can only compare identically-labeled Series objects" and sort_index
I think you need reset_index so that both frames have the same index values before you compare them. To create the new column, it is better to use mask or numpy.where. Also, use | instead of + because you are working with booleans:
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
# Option 1: Series.mask
df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
                                  (df1['choice'] | df2['choice']) * 0.5)
# Option 2: numpy.where (requires import numpy as np)
df1['v_100'] = np.where(df1['choice'] != df2['choice'],
                        (df1['choice'] | df2['choice']) * 0.5,
                        df1['choice'])
Samples:
print (df1)
   v_100  choice
5      7    True
6      0    True
7      7   False
8      2    True
print (df2)
   v_100  choice
4      1   False
5      2    True
6     74    True
7      6    True
df1 = df1.reset_index(drop=True)
df2 = df2.reset_index(drop=True)
print (df1)
   v_100  choice
0      7    True
1      0    True
2      7   False
3      2    True
print (df2)
   v_100  choice
0      1   False
1      2    True
2     74    True
3      6    True
df1['v_100'] = df1['choice'].mask(df1['choice'] != df2['choice'],
                                  (df1['choice'] | df2['choice']) * 0.5)
print (df1)
   v_100  choice
0    0.5    True
1    1.0    True
2    0.5   False
3    1.0    True
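For completeness, the sketch below reproduces the error on the same choice values and then applies the numpy.where variant after resetting the index. It assumes a pandas version in which the mismatch is raised as a ValueError; the variable names a, b and raised are only for illustration.

```python
import numpy as np
import pandas as pd

# Same 'choice' values and index labels as the samples above.
df1 = pd.DataFrame({"choice": [True, True, False, True]}, index=[5, 6, 7, 8])
df2 = pd.DataFrame({"choice": [False, True, True, True]}, index=[4, 5, 6, 7])

# Same length but different labels: the comparison raises.
try:
    df1["choice"] != df2["choice"]
    raised = False
except ValueError:
    raised = True

# After reset_index(drop=True) both Series are labelled 0..3.
a = df1["choice"].reset_index(drop=True)
b = df2["choice"].reset_index(drop=True)

# Where the values differ take 0.5, otherwise keep the original value.
v_100 = np.where(a != b, (a | b) * 0.5, a)
print(v_100)
```

Note that reset_index(drop=True) pairs rows purely by position, so it is only appropriate when the two frames are already in the intended order.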
Pandas Join- Can only compare identically-labeled Series objects
I suggest you use pd.merge, which joins on the key columns rather than relying on index alignment:
df = pd.merge(telemetry, errors1, how='left', left_on=['machineID','datetime'], right_on = ['machineID','datetime'])
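A runnable sketch of that merge on made-up data is shown below. Only the frame names and the key columns machineID and datetime come from the answer; the volt and errorID columns and all values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the frame and key names come from the answer.
telemetry = pd.DataFrame({
    "machineID": [1, 1, 2],
    "datetime": pd.to_datetime(
        ["2015-01-01 06:00", "2015-01-01 07:00", "2015-01-01 06:00"]
    ),
    "volt": [170.0, 162.5, 158.3],
})
errors1 = pd.DataFrame({
    "machineID": [1],
    "datetime": pd.to_datetime(["2015-01-01 07:00"]),
    "errorID": ["error1"],
})

# Both frames use the same key names, so on= is equivalent to
# passing identical left_on= and right_on= lists.
df = pd.merge(telemetry, errors1, how="left", on=["machineID", "datetime"])
print(df)
```

With how='left', every telemetry row is kept and errorID is NaN wherever no matching error exists.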
Python Pandas Only Compare Identically Labeled DataFrame Objects
To get around this, compare the underlying NumPy arrays, which ignores the labels and compares values by position (the two frames must have the same shape):
import pandas as pd
df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'], index=['One', 'Two'])
df2 = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'], index=['one', 'two'])
df1.values == df2.values
array([[ True,  True],
       [ True,  True]], dtype=bool)
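Because a positional comparison silently ignores labels, a shape check beforehand can help. The sketch below is one possible way to do it, using to_numpy() in place of .values and wrapping the result back into a DataFrame; the variable names mask, same and all_equal are illustrative, and one value in df2 is changed so the result is not trivially all True.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"], index=["One", "Two"])
df2 = pd.DataFrame([[1, 2], [3, 5]], columns=["a", "b"], index=["one", "two"])

# A positional comparison only makes sense when the shapes agree.
if df1.shape != df2.shape:
    raise ValueError("shapes differ, cannot compare by position")

# Element-wise comparison of the raw values, labels ignored.
mask = df1.to_numpy() == df2.to_numpy()

# Optionally reattach df1's labels so the result is easier to read.
same = pd.DataFrame(mask, index=df1.index, columns=df1.columns)

# Single True/False answer for the whole frame.
all_equal = np.array_equal(df1.to_numpy(), df2.to_numpy())
print(same)
```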