Find Rows in a Data Frame Where Two Columns Are Equal

python pandas select rows where two columns are (not) equal

Ways to be confused by == versus != when comparing pd.Series

As expected

df[['Len_old', 'Len_new']].assign(NE=df.Len_old != df.Len_new)

   Len_old  Len_new     NE
0       15       15  False
1       12       12  False
2       10        8   True
3        4        5   True
4        9       10   True

But what if one of the columns holds strings?

df[['Len_old', 'Len_new']].assign(NE=df.Len_old.astype(str) != df.Len_new)

   Len_old  Len_new    NE
0       15       15  True
1       12       12  True
2       10        8  True
3        4        5  True
4        9       10  True

Make sure both columns have the same dtype before comparing them.
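As a minimal sketch of the fix, assuming a frame where Len_old was read in as strings (the data below is hypothetical and only mirrors the example above), you can check the dtypes and cast to a common type before comparing:

```python
import pandas as pd

# Hypothetical data: Len_old holds strings, Len_new holds integers.
df = pd.DataFrame({
    "Len_old": ["15", "12", "10", "4", "9"],
    "Len_new": [15, 12, 8, 5, 10],
})

# Inspect the dtypes first: object (str) versus int64.
print(df.dtypes)

# Comparing str to int element-wise: no pair is equal, so the mask is all True.
ne_mixed = df["Len_old"] != df["Len_new"]

# Cast to a common dtype, then compare.
ne_fixed = df["Len_old"].astype(int) != df["Len_new"]

print(ne_mixed.tolist())
print(ne_fixed.tolist())
```

Casting the string column to int (rather than the int column to str) also avoids surprises from formatting differences such as leading zeros or whitespace.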

Pandas: select rows where two columns are different

I am a fan of readability, so I would use query:

df.query('a != b')

Output:

   a  b   c
1  0  2  74
3  1  4  44
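For comparison, here is a small sketch showing that query and a plain boolean mask select the same rows. The values in rows 0 and 2 are made up for illustration; only rows 1 and 3 come from the output above.

```python
import pandas as pd

# Hypothetical frame; rows 0 and 2 have a == b, rows 1 and 3 do not.
df = pd.DataFrame({
    "a": [1, 0, 3, 1],
    "b": [1, 2, 3, 4],
    "c": [10, 74, 5, 44],
})

# String expression evaluated against the column names.
by_query = df.query("a != b")

# Equivalent boolean-mask form.
by_mask = df[df["a"] != df["b"]]

print(by_query)
print(by_query.equals(by_mask))
```

The mask form is handy when column names are not valid Python identifiers, although query can also handle those with backticks.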

Pandas Identify Rows where two columns have same strings

One option to ignore the order between two columns is to sort each row within itself, which np.sort can do. Then you can form a new dataframe with these ordered rows. duplicated with keep=False will mark all the duplicated rows as True which we can use as a mask to index the original dataframe:

rows_sorted_df = pd.DataFrame(np.sort(df))
dups = rows_sorted_df.duplicated(keep=False)
result = df[dups]

to get

>>> rows_sorted_df

     0    1
0  BOS   SF
1   LA   SF
2   LA  NYC
3  BOS   SF

>>> dups

0     True
1    False
2    False
3     True

>>> result

  Source destination
0    BOS          SF
3     SF         BOS
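A self-contained sketch of the same idea follows. The input frame is an assumption reconstructed from the outputs above (the original order within rows 1 and 2 is not known), and passing index=df.index is my addition so the mask still aligns if the frame does not have a default index.

```python
import numpy as np
import pandas as pd

# Assumed input; rows 0 and 3 are the same pair in opposite order.
df = pd.DataFrame({
    "Source": ["BOS", "LA", "LA", "SF"],
    "destination": ["SF", "SF", "NYC", "BOS"],
})

# Sort the values within each row so that (SF, BOS) becomes (BOS, SF).
rows_sorted_df = pd.DataFrame(np.sort(df.to_numpy(), axis=1), index=df.index)

# keep=False flags every member of a duplicated group, not just the repeats.
dups = rows_sorted_df.duplicated(keep=False)

result = df[dups]
print(result)
```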

Select rows that match values in multiple columns in pandas

TLDR

Use one of the following, based on your requirements:

df[(df[key_names] == keys).all(1)]

df[df[key_names].isin(keys).all(1)]

You're quite close: you have successfully created your mask, and you just need to reduce it to a single dimension for indexing.

>>> df[key_names].isin(keys)
     k1     k2
0  True  False
1  True   True
2  True  False

You are only interested in rows where all values are True, so you can reduce the dimension by calling all along axis 1, that is, across the columns of each row.

>>> df[key_names].isin(keys).all(1)
0    False
1     True
2    False
dtype: bool

The one caveat here is that isin does not depend on order: it only checks whether each value appears anywhere in keys, so you would get the same results using another ordering of your values.

>>> df[key_names].isin([5, 1]).all(1)
0    False
1     True
2    False
dtype: bool

If you only want an exact ordering match, use == for a broadcasted comparison instead of isin:

>>> (df[key_names] == keys).all(1)
0    False
1     True
2    False
dtype: bool

>>> (df[key_names] == [5, 1]).all(1)
0    False
1    False
2    False
dtype: bool

The last step here is using the 1D mask you've created to index the original DataFrame:

>>> df[(df[key_names] == keys).all(1)]
   k1  k2  v1  v2
1   1   5   5   6
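To make the difference between the two options concrete, here is a sketch on a hypothetical frame (not the one from the question) in which row 0 holds the key values in swapped order. isin accepts that row, while == rejects it:

```python
import pandas as pd

# Hypothetical data: row 0 has the keys swapped, row 1 matches exactly.
df = pd.DataFrame({
    "k1": [5, 1, 1],
    "k2": [1, 5, 2],
    "v1": [3, 5, 7],
    "v2": [4, 6, 8],
})
key_names = ["k1", "k2"]
keys = [1, 5]

# Membership test: each cell only needs to appear somewhere in keys.
isin_mask = df[key_names].isin(keys).all(axis=1)

# Positional test: k1 must equal keys[0] and k2 must equal keys[1].
eq_mask = (df[key_names] == keys).all(axis=1)

print(df[isin_mask])
print(df[eq_mask])
```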

Find rows in a data frame where two columns are equal

mteq <- mtcars[mtcars$gear==mtcars$carb, ]

Pandas - find rows with matching values in two columns and multiply value in another column

One way is to group by A and C, take the product and the count of D, filter out the groups that have only a single item, then inner merge the result back onto your original frame on A and C, e.g.:

df.merge(
    df.groupby(['A', 'C']).D.agg(['prod', 'count'])
    [lambda r: r['count'] > 1],
    left_on=['A', 'C'],
    right_index=True
)

Gives you:

     A   C  D  id  prod  count
0  foo  10  9   1    63      2
2  foo  10  7   3    63      2
4  foo  50  5   5    15      2
6  foo  50  3   7    15      2

Then drop/rename columns as appropriate.
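Here is the same pipeline split into steps on a small made-up frame, in case you want to inspect the intermediate result. The data and the name stats are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data: only the group (foo, 10) has more than one row.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [10, 10, 10, 20],
    "D": [9, 8, 7, 6],
})

# Product and size of D per (A, C) group; the result is indexed by (A, C).
stats = df.groupby(["A", "C"])["D"].agg(["prod", "count"])

# Keep only groups with at least two rows.
stats = stats[stats["count"] > 1]

# Inner merge (the default) back onto the original rows via the group keys.
out = df.merge(stats, left_on=["A", "C"], right_index=True)
print(out)
```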

pandas: get rows by comparing two columns of dataframe to list of tuples

Use DataFrame.merge with a DataFrame created from the tuples. No on parameter is needed, because merge defaults to the intersection of the columns in both DataFrames, here A and B. Note that merge returns a fresh index:

df = my_df.merge(pd.DataFrame(my_tuples, columns=['A','B']))
print (df)
   A  B  C  D
0  0  1  1  2
1  4  5  1  2

Or, to keep the original index, test the (A, B) pairs against the tuples with isin:

df = my_df[my_df.set_index(['A','B']).index.isin(my_tuples)]
print (df)
   A  B  C  D
0  0  1  1  2
2  4  5  1  2
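A runnable sketch of both approaches side by side, using assumed inputs that are consistent with the outputs above (the middle row of my_df is invented):

```python
import pandas as pd

# Assumed inputs; the row (2, 3, 1, 2) is made up for illustration.
my_df = pd.DataFrame({
    "A": [0, 2, 4],
    "B": [1, 3, 5],
    "C": [1, 1, 1],
    "D": [2, 2, 2],
})
my_tuples = [(0, 1), (4, 5)]

# Approach 1: inner merge on the shared columns A and B.
merged = my_df.merge(pd.DataFrame(my_tuples, columns=["A", "B"]))

# Approach 2: build a MultiIndex from A and B and test membership.
masked = my_df[my_df.set_index(["A", "B"]).index.isin(my_tuples)]

print(merged)
print(masked)
```

Both select the same rows; they differ only in the resulting index. Also keep in mind that merge will repeat rows if my_tuples contains duplicates, whereas the isin mask will not.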

Find rows with similar values in another dataframe

This is a perfect use case for melt as a starting point before merging your two dataframes. melt flattens your value columns (FeatureX) into rows. After merging, you have two columns to compare: value_x (features from df1) and value_y (features from df2).

Now, with query, keep the rows where these two columns are equal. Then use value_counts on the (Fruit, Order) columns and reformat the result with rename and reset_index. Finally, call drop_duplicates on the Fruit column to keep the first count for each fruit, which is the highest because value_counts already sorts in descending order.

You can execute this one-liner step by step to see how the dataframe is transformed:

out = pd.merge(df1.melt(['Fruit', 'Site']),
               df2.melt(['Order', 'Site']),
               on=['Site', 'variable']) \
        .query('value_x == value_y') \
        .value_counts(['Fruit', 'Order']) \
        .rename('Matches') \
        .reset_index() \
        .drop_duplicates('Fruit')

Final output:

>>> out
     Fruit Order  Matches
0    Apple    XY        3
1   Banana    XY        3
6   Cherry    XY        2
7   Durian    YY        2
12   Grape    ZZ        1

Note: check my result carefully, because it is not equal to your expected output.
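If you want to try the chain on something small first, here is a sketch with invented inputs. The column names Feature1 and Feature2 and all values are assumptions; only Fruit, Site and Order come from the question.

```python
import pandas as pd

# Hypothetical inputs: two feature columns per row.
df1 = pd.DataFrame({
    "Fruit": ["Apple", "Banana"],
    "Site": ["A", "B"],
    "Feature1": [1, 3],
    "Feature2": [2, 4],
})
df2 = pd.DataFrame({
    "Order": ["XY", "YY", "ZZ"],
    "Site": ["A", "A", "B"],
    "Feature1": [1, 1, 3],
    "Feature2": [2, 9, 0],
})

out = (
    pd.merge(
        df1.melt(id_vars=["Fruit", "Site"]),   # one row per (Fruit, feature)
        df2.melt(id_vars=["Order", "Site"]),   # one row per (Order, feature)
        on=["Site", "variable"],               # pair up the same feature at the same site
    )
    .query("value_x == value_y")               # keep matching feature values
    .value_counts(["Fruit", "Order"])          # matches per pair, sorted descending
    .rename("Matches")
    .reset_index()
    .drop_duplicates("Fruit")                  # best order per fruit
)
print(out)
```

Here Apple matches XY on both features and YY on one, so XY is kept for Apple; Banana matches ZZ on one feature.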

Pandas Dataframe: how can I check whether the values in two columns of a row are equal to those in the same columns of the subsequent row?

You had the right idea about shifted comparison, but you need to shift backwards so you compare the current row with the next one. Finally use an all condition to enforce that ALL columns are equal in a row:

df['Validity'] = df[['Fruit', 'Color']].eq(df[['Fruit', 'Color']].shift(-1)).all(axis=1)

df
    Fruit   Color  Weight  Validity
0   apple     red      50      True
1   apple     red      75     False
2   apple   green      45     False
3  orange  orange      80      True
4  orange  orange      90     False
5  orange     red      90     False
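For completeness, here is a self-contained version you can run, with the input frame rebuilt from the output above. The name cols is just a helper I introduced.

```python
import pandas as pd

# Input reconstructed from the printed result.
df = pd.DataFrame({
    "Fruit": ["apple", "apple", "apple", "orange", "orange", "orange"],
    "Color": ["red", "red", "green", "orange", "orange", "red"],
    "Weight": [50, 75, 45, 80, 90, 90],
})

cols = ["Fruit", "Color"]

# shift(-1) moves each row up by one, so row i is compared with row i + 1.
# The last row is compared with NaN and therefore comes out False.
df["Validity"] = df[cols].eq(df[cols].shift(-1)).all(axis=1)
print(df)
```

To compare with the previous row instead, use shift(1); in that case the first row is the one that comes out False.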