Find Rows in a Data Frame Where Two Columns Are Equal

python pandas select rows where two columns are (not) equal

Ways to be confused by == versus != when comparing pd.Series

As expected

df[['Len_old', 'Len_new']].assign(NE=df.Len_old != df.Len_new)

   Len_old  Len_new     NE
0       15       15  False
1       12       12  False
2       10        8   True
3        4        5   True
4        9       10   True

But what if one of the columns' values were strings?

df[['Len_old', 'Len_new']].assign(NE=df.Len_old.astype(str) != df.Len_new)

   Len_old  Len_new    NE
0       15       15  True
1       12       12  True
2       10        8  True
3        4        5  True
4        9       10  True

Make sure both columns are the same type.
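A minimal sketch of the fix (data made up here): cast both sides to a common dtype before comparing, otherwise an int is never equal to its string representation.

```python
import pandas as pd

# Hypothetical frame where Len_new was read in as strings
df = pd.DataFrame({"Len_old": [15, 12, 10, 4, 9],
                   "Len_new": ["15", "12", "8", "5", "10"]})

# Naive comparison: int vs str is never equal, so every row looks "not equal"
naive = df.Len_old != df.Len_new

# Cast both sides to the same dtype first, then compare
fixed = df.Len_old != df.Len_new.astype(int)
print(fixed.tolist())  # [False, False, True, True, True]
```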

Pandas: select rows where two columns are different

I am a fan of readability, so use query:

df.query('a != b')

Output:

   a  b   c
1  0  2  74
3  1  4  44
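A runnable sketch with hypothetical data shaped to match the output above:

```python
import pandas as pd

# Sample frame (made up to match the output shown)
df = pd.DataFrame({"a": [1, 0, 3, 1],
                   "b": [1, 2, 3, 4],
                   "c": [10, 74, 20, 44]})

out = df.query('a != b')   # same as df[df.a != df.b], but reads better
print(out)
```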

Pandas Identify Rows where two columns have same strings

One option to ignore the order between the two columns is to sort each row within itself, which np.sort can do. You can then form a new dataframe from these row-sorted values. duplicated with keep=False marks all duplicated rows as True, which we can use as a mask to index the original dataframe:

import numpy as np

rows_sorted_df = pd.DataFrame(np.sort(df, axis=1))  # sort each row internally
dups = rows_sorted_df.duplicated(keep=False)
result = df[dups]

to get

>>> rows_sorted_df

     0    1
0  BOS   SF
1   LA   SF
2   LA  NYC
3  BOS   SF

>>> dups

0     True
1    False
2    False
3     True
dtype: bool

>>> result

  Source destination
0    BOS          SF
3     SF         BOS
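Putting it together as a runnable sketch, with a made-up routes frame where rows 0 and 3 contain the same city pair in opposite order:

```python
import numpy as np
import pandas as pd

# Hypothetical routes; rows 0 and 3 are the same pair in either order
df = pd.DataFrame({"Source":      ["BOS", "SF", "NYC", "SF"],
                   "destination": ["SF",  "LA", "LA",  "BOS"]})

# Sort each row internally so {BOS, SF} and {SF, BOS} look identical
rows_sorted_df = pd.DataFrame(np.sort(df, axis=1), index=df.index)

# keep=False marks every member of a duplicate group, not just the repeats
dups = rows_sorted_df.duplicated(keep=False)
result = df[dups]
print(result)
```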

Select rows that match values in multiple columns in pandas

TLDR

Use one of the following, based on your requirements:

df[(df[key_names] == keys).all(1)]

df[df[key_names].isin(keys).all(1)]

You're quite close: you have successfully created your mask; you just need to reduce it to a single dimension for indexing.

>>> df[key_names].isin(keys)
     k1     k2
0  True  False
1  True   True
2  True  False

You are only interested in rows where all values are True, so you can reduce the dimension using all along axis 1.

>>> df[key_names].isin(keys).all(1)
0    False
1     True
2    False
dtype: bool

The one caveat here is that isin is not order-dependent, so you would get the same result using another ordering of your values.

>>> df[key_names].isin([5, 1]).all(1)
0    False
1     True
2    False
dtype: bool

If you only want an exact, order-sensitive match, use == for a broadcasted comparison instead of isin:

>>> (df[key_names] == keys).all(1)
0    False
1     True
2    False
dtype: bool

>>> (df[key_names] == [5, 1]).all(1)
0    False
1    False
2    False
dtype: bool

The last step here is using the 1D mask you've created to index the original DataFrame:

>>> df[(df[key_names] == keys).all(1)]
   k1  k2  v1  v2
1   1   5   5   6
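The whole comparison can be sketched end to end on hypothetical data matching the tables above (keys = [1, 5]):

```python
import pandas as pd

# Made-up data consistent with the tables above
df = pd.DataFrame({"k1": [1, 1, 5],
                   "k2": [2, 5, 3],
                   "v1": [7, 5, 8],
                   "v2": [9, 6, 4]})
key_names = ["k1", "k2"]
keys = [1, 5]

# isin: membership test, order of keys does not matter
mask_isin = df[key_names].isin(keys).all(axis=1)

# ==: broadcasted comparison, keys[0] must match k1 and keys[1] must match k2
mask_eq = (df[key_names] == keys).all(axis=1)

print(df[mask_eq])
```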

Find rows in a data frame where two columns are equal


In base R, logical indexing does this directly:

mteq <- mtcars[mtcars$gear == mtcars$carb, ]
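For comparison, a rough pandas equivalent of this base-R filter, using a tiny stand-in for mtcars:

```python
import pandas as pd

# Tiny stand-in for mtcars with just the two columns that matter
mtcars = pd.DataFrame({"gear": [4, 4, 3, 5],
                       "carb": [4, 1, 3, 2]})

# Same idea as mtcars[mtcars$gear == mtcars$carb, ] in R
mteq = mtcars[mtcars["gear"] == mtcars["carb"]]
print(mteq)
```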

Pandas - find rows with matching values in two columns and multiply value in another column

One way is to groupby A + C, take the product and count, filter out groups with only a single item, then inner-merge back on A + C to your original frame, e.g.:

df.merge(
    df.groupby(['A', 'C']).D.agg(['prod', 'count'])
      [lambda r: r['count'] > 1],
    left_on=['A', 'C'],
    right_index=True
)

Gives you:

     A   C  D  id  prod  count
0  foo  10  9   1    63      2
2  foo  10  7   3    63      2
4  foo  50  5   5    15      2
6  foo  50  3   7    15      2

Then drop/rename columns as appropriate.
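A self-contained sketch of the same idea on made-up data, written as separate steps rather than one chained expression:

```python
import pandas as pd

# Hypothetical frame; some pairs of rows share the same (A, C) combination
df = pd.DataFrame({"A":  ["foo", "bar", "foo", "bar", "foo", "baz", "foo", "qux"],
                   "C":  [10, 10, 10, 20, 50, 30, 50, 40],
                   "D":  [9, 2, 7, 4, 5, 6, 3, 8],
                   "id": [1, 2, 3, 4, 5, 6, 7, 8]})

# Product and size of D per (A, C) group
stats = df.groupby(["A", "C"]).D.agg(["prod", "count"])
stats = stats[stats["count"] > 1]          # keep only (A, C) pairs seen twice+

# Inner merge back onto the original rows
out = df.merge(stats, left_on=["A", "C"], right_index=True)
print(out)
```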

pandas: get rows by comparing two columns of dataframe to list of tuples

Use DataFrame.merge with a DataFrame created from the tuples; when no on parameter is given, merge defaults to the intersection of all columns in both DataFrames, here A and B:

df = my_df.merge(pd.DataFrame(my_tuples, columns=['A','B']))
print (df)
   A  B  C  D
0  0  1  1  2
1  4  5  1  2

Or:

df = my_df[my_df.set_index(['A','B']).index.isin(my_tuples)]
print (df)
   A  B  C  D
0  0  1  1  2
2  4  5  1  2
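Both options, runnable on small hypothetical inputs (note the merge version renumbers the index, while the isin version keeps the original one):

```python
import pandas as pd

# Made-up inputs
my_df = pd.DataFrame({"A": [0, 2, 4], "B": [1, 3, 5],
                      "C": [1, 1, 1], "D": [2, 2, 2]})
my_tuples = [(0, 1), (4, 5)]

# Option 1: merge on the intersection of column names (A and B here)
merged = my_df.merge(pd.DataFrame(my_tuples, columns=["A", "B"]))

# Option 2: index membership test; keeps the original row index
masked = my_df[my_df.set_index(["A", "B"]).index.isin(my_tuples)]
print(masked)
```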

Find rows with similar values in another dataframe

This is a perfect use case for melt as a starting point before merging your two dataframes. melt flattens your value columns (FeatureX). After merging, you have two columns, value_x (features from df1) and value_y (features from df2), which you need to compare.

Now, with query, keep rows where these two columns are equal. Then use value_counts on the (Fruit, Order) columns and reformat the result with rename and reset_index. Finally, drop_duplicates on the Fruit column keeps the first count per fruit, which is the highest value because value_counts sorts the Matches column in descending order.

You can execute this one-liner step by step to see the transformation of the dataframe:

out = pd.merge(df1.melt(['Fruit', 'Site']),
               df2.melt(['Order', 'Site']),
               on=['Site', 'variable']) \
        .query('value_x == value_y') \
        .value_counts(['Fruit', 'Order']) \
        .rename('Matches') \
        .reset_index() \
        .drop_duplicates('Fruit')

Final output:

>>> out
     Fruit Order  Matches
0    Apple    XY        3
1   Banana    XY        3
6   Cherry    XY        2
7   Durian    YY        2
12   Grape    ZZ        1

Note: check my result carefully, because it's not equal to your expected output.
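A compact runnable sketch of the same pipeline on tiny made-up data (two fruits, two orders, one site), to make the steps easier to trace:

```python
import pandas as pd

# Tiny hypothetical inputs: feature columns F1..F3, one shared Site
df1 = pd.DataFrame({"Fruit": ["Apple", "Banana"], "Site": ["S1", "S1"],
                    "F1": ["a", "a"], "F2": ["b", "x"], "F3": ["c", "y"]})
df2 = pd.DataFrame({"Order": ["XY", "ZZ"], "Site": ["S1", "S1"],
                    "F1": ["a", "a"], "F2": ["b", "x"], "F3": ["c", "z"]})

out = (pd.merge(df1.melt(["Fruit", "Site"]),
                df2.melt(["Order", "Site"]),
                on=["Site", "variable"])
         .query("value_x == value_y")          # keep matching feature values
         .value_counts(["Fruit", "Order"])     # count matches per pair
         .rename("Matches")
         .reset_index()
         .drop_duplicates("Fruit"))            # best Order per Fruit
print(out)
```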

Pandas Dataframe: how can I check that values in two columns of a row are equal to the ones in the same columns of the subsequent row?

You had the right idea about shifted comparison, but you need to shift backwards so you compare the current row with the next one. Finally use an all condition to enforce that ALL columns are equal in a row:

df['Validity'] = df[['Fruit', 'Color']].eq(df[['Fruit', 'Color']].shift(-1)).all(axis=1)

df
    Fruit   Color  Weight  Validity
0   apple     red      50      True
1   apple     red      75     False
2   apple   green      45     False
3  orange  orange      80      True
4  orange  orange      90     False
5  orange     red      90     False
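The same computation, runnable on the data from the table:

```python
import pandas as pd

# Data from the table above
df = pd.DataFrame({"Fruit":  ["apple", "apple", "apple",
                              "orange", "orange", "orange"],
                   "Color":  ["red", "red", "green",
                              "orange", "orange", "red"],
                   "Weight": [50, 75, 45, 80, 90, 90]})

# shift(-1) aligns each row with the NEXT one; all(axis=1) requires
# both Fruit and Color to match (the last row compares with NaN -> False)
df["Validity"] = df[["Fruit", "Color"]].eq(
    df[["Fruit", "Color"]].shift(-1)).all(axis=1)
print(df)
```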

