python pandas select rows where two columns are (not) equal
Ways to be confused by == versus != when comparing pd.Series
As expected
df[['Len_old', 'Len_new']].assign(NE=df.Len_old != df.Len_new)
Len_old Len_new NE
0 15 15 False
1 12 12 False
2 10 8 True
3 4 5 True
4 9 10 True
But if one of the columns holds strings, every row compares as not equal:
df[['Len_old', 'Len_new']].assign(NE=df.Len_old.astype(str) != df.Len_new)
Len_old Len_new NE
0 15 15 True
1 12 12 True
2 10 8 True
3 4 5 True
4 9 10 True
Make sure both columns have the same dtype before comparing them.
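A minimal sketch of that fix, assuming Len_old was loaded as strings and Len_new as integers (the data below is made up to mirror the tables above). Converting the string column with pd.to_numeric before comparing restores the expected result:

```python
import pandas as pd

# Hypothetical data: Len_old arrived as strings, Len_new as integers.
df = pd.DataFrame({
    "Len_old": ["15", "12", "10", "4", "9"],
    "Len_new": [15, 12, 8, 5, 10],
})

# A string never equals an integer, so every row is flagged as different.
naive = df["Len_old"] != df["Len_new"]

# Convert the string column to numbers first, then compare like with like.
fixed = pd.to_numeric(df["Len_old"]) != df["Len_new"]

print(df.assign(naive=naive, fixed=fixed))
```

Checking df.dtypes before comparing is a cheap way to spot this kind of mismatch.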
Pandas: select rows where two columns are different
I am a fan of readability, so use query:
df.query('a != b')
Output:
a b c
1 0 2 74
3 1 4 44
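For comparison, here is a small sketch showing that query and a plain boolean mask select the same rows. The frame is hypothetical; only rows 1 and 3 are taken from the output above, the rest is filler:

```python
import pandas as pd

# Hypothetical frame; rows 1 and 3 match the output shown above.
df = pd.DataFrame({
    "a": [1, 0, 3, 1],
    "b": [1, 2, 3, 4],
    "c": [10, 74, 20, 44],
})

different_q = df.query("a != b")        # rows where a and b differ
different_m = df[df["a"] != df["b"]]    # same selection with a boolean mask
equal_q = df.query("a == b")            # the complementary selection

print(different_q)
```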
Pandas Identify Rows where two columns have same strings
One option to ignore the order between two columns is to sort each row within itself, which np.sort can do. Then you can form a new dataframe from these ordered rows. duplicated with keep=False will mark all the duplicated rows as True, which we can use as a mask to index the original dataframe:
rows_sorted_df = pd.DataFrame(np.sort(df))
dups = rows_sorted_df.duplicated(keep=False)
result = df[dups]
to get
>>> rows_sorted_df
0 1
0 BOS SF
1 LA SF
2 LA NYC
3 BOS SF
>>> dups
0 True
1 False
2 False
3 True
>>> result
Source destination
0 BOS SF
3 SF BOS
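A self-contained version of the same idea, as a sketch. The input frame is reconstructed from the output above, so the original order within rows 1 and 2 is an assumption. Passing index=df.index is an addition that keeps the mask aligned when the frame does not have a default index:

```python
import numpy as np
import pandas as pd

# Reconstructed input (the order within rows 1 and 2 is assumed).
df = pd.DataFrame({
    "Source": ["BOS", "LA", "NYC", "SF"],
    "destination": ["SF", "SF", "LA", "BOS"],
})

# Sort the values inside each row so (SF, BOS) and (BOS, SF) look identical.
rows_sorted_df = pd.DataFrame(np.sort(df.to_numpy(), axis=1), index=df.index)

# keep=False marks every member of a duplicated group, not just the repeats.
dups = rows_sorted_df.duplicated(keep=False)

result = df[dups]
print(result)
```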
Select rows that match values in multiple columns in pandas
TLDR
Use one of the following, based on your requirements:
df[(df[key_names] == keys).all(1)]
df[df[key_names].isin(keys).all(1)]
You're quite close: you have successfully created your mask, and you just need to reduce it to a single dimension for indexing.
>>> df[key_names].isin(keys)
k1 k2
0 True False
1 True True
2 True False
You are only interested in rows where all values are True, so you can reduce the dimension using all across axis 1 (that is, across the columns of each row).
>>> df[key_names].isin(keys).all(1)
0 False
1 True
2 False
dtype: bool
The one caveat here is that isin is not order dependent, so you would get the same results using another ordering of your values.
>>> df[key_names].isin([5, 1]).all(1)
0 False
1 True
2 False
dtype: bool
If you only want an exact ordering match, use == for a broadcasted comparison instead of isin:
>>> (df[key_names] == keys).all(1)
0 False
1 True
2 False
dtype: bool
>>> (df[key_names] == [5, 1]).all(1)
0 False
1 False
2 False
dtype: bool
The last step here is using the 1D mask you've created to index the original DataFrame:
>>> df[(df[key_names] == keys).all(1)]
k1 k2 v1 v2
1 1 5 5 6
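To make the difference between the two masks concrete, here is a runnable sketch. The frame is hypothetical and chosen to reproduce the masks shown above; only row 1 comes from the original output. The axis is spelled out as axis=1, which is equivalent to the all(1) shorthand used above:

```python
import pandas as pd

# Hypothetical frame; only row 1 is taken from the output above.
df = pd.DataFrame({
    "k1": [1, 1, 5],
    "k2": [2, 5, 3],
    "v1": [3, 5, 7],
    "v2": [4, 6, 8],
})
key_names = ["k1", "k2"]
keys = [1, 5]

# isin: each cell only has to appear somewhere in keys, so order is ignored.
unordered = df[key_names].isin(keys).all(axis=1)
unordered_swapped = df[key_names].isin([5, 1]).all(axis=1)

# ==: the list is broadcast across the columns, so k1 must be 1 and k2 must be 5.
ordered = (df[key_names] == keys).all(axis=1)
ordered_swapped = (df[key_names] == [5, 1]).all(axis=1)

print(df[ordered])
```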
Find rows in a data frame where two columns are equal
# R, not pandas: keep the rows of mtcars where gear equals carb
mteq <- mtcars[mtcars$gear==mtcars$carb, ]
Pandas - find rows with matching values in two columns and multiply value in another column
One way is to group by A and C, take the product and the count, filter out the groups that only contain a single item, then inner merge the result back onto your original frame on A and C, e.g.:
df.merge(
    df.groupby(['A', 'C']).D.agg(['prod', 'count'])
    [lambda r: r['count'] > 1],
    left_on=['A', 'C'],
    right_index=True
)
Gives you:
A C D id prod count
0 foo 10 9 1 63 2
2 foo 10 7 3 63 2
4 foo 50 5 5 15 2
6 foo 50 3 7 15 2
Then drop/rename columns as appropriate.
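The same pipeline split into named steps, as a sketch. The input frame is an assumption: rows 0, 2, 4 and 6 come from the output above, while the single-member rows are invented filler:

```python
import pandas as pd

# Hypothetical input; rows 0, 2, 4 and 6 match the output above.
df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo", "baz", "foo"],
    "C": [10, 10, 10, 50, 50, 10, 50],
    "D": [9, 8, 7, 6, 5, 4, 3],
    "id": [1, 2, 3, 4, 5, 6, 7],
})

# Step 1: product and group size of D for every (A, C) pair.
stats = df.groupby(["A", "C"])["D"].agg(["prod", "count"])

# Step 2: keep only the pairs that occur more than once.
stats = stats[stats["count"] > 1]

# Step 3: inner merge back; rows whose (A, C) pair was filtered out disappear.
result = df.merge(stats, left_on=["A", "C"], right_index=True)
print(result)
```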
pandas: get rows by comparing two columns of dataframe to list of tuples
Use DataFrame.merge with a DataFrame created from the tuples. No on parameter is needed, because merge defaults to the intersection of the columns in both DataFrames, here A and B:
df = my_df.merge(pd.DataFrame(my_tuples, columns=['A','B']))
print (df)
A B C D
0 0 1 1 2
1 4 5 1 2
Or:
df = my_df[my_df.set_index(['A','B']).index.isin(my_tuples)]
print (df)
A B C D
0 0 1 1 2
2 4 5 1 2
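Both variants in one runnable sketch. The frame and tuple list are assumptions reconstructed from the outputs above; row 1 is invented. Note the difference visible in those outputs: merge returns a fresh 0..n-1 index, while the isin mask keeps the original row labels:

```python
import pandas as pd

# Hypothetical inputs; rows 0 and 2 match the outputs above.
my_df = pd.DataFrame({
    "A": [0, 1, 4],
    "B": [1, 3, 5],
    "C": [1, 1, 1],
    "D": [2, 2, 2],
})
my_tuples = [(0, 1), (4, 5)]

# Variant 1: inner merge on the shared columns A and B (index is renumbered).
merged = my_df.merge(pd.DataFrame(my_tuples, columns=["A", "B"]))

# Variant 2: boolean mask built from a MultiIndex (original labels are kept).
masked = my_df[my_df.set_index(["A", "B"]).index.isin(my_tuples)]

print(merged)
print(masked)
```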
Find rows with similar values in another dataframe
This is a perfect use case for melt as a starting point before you merge your two dataframes. melt flattens your value columns (FeatureX) into rows. After merging, you have two columns to compare: value_x (features from df1) and value_y (features from df2).
Now, with query, keep the rows where these two columns are equal. Then use value_counts on the (Fruit, Order) columns and reformat the result with rename and reset_index. Finally, drop_duplicates on the Fruit column keeps the first count for each fruit, which is the highest one because value_counts already sorts the Matches column in descending order.
You can execute this one-liner step by step to see how the dataframe is transformed:
out = pd.merge(df1.melt(['Fruit', 'Site']),
               df2.melt(['Order', 'Site']),
               on=['Site', 'variable']) \
        .query('value_x == value_y') \
        .value_counts(['Fruit', 'Order']) \
        .rename('Matches') \
        .reset_index() \
        .drop_duplicates('Fruit')
Final output:
>>> out
Fruit Order Matches
0 Apple XY 3
1 Banana XY 3
6 Cherry XY 2
7 Durian YY 2
12 Grape ZZ 1
Note: check my result carefully, because it does not match your expected output.
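Since the original df1 and df2 are not shown here, the following is only a toy sketch of the same chain. The column names Feature1 and Feature2 and all values are assumptions, picked so the intermediate frames are small enough to inspect by hand:

```python
import pandas as pd

# Hypothetical inputs with two feature columns each.
df1 = pd.DataFrame({
    "Fruit": ["Apple", "Banana"],
    "Site": ["S1", "S1"],
    "Feature1": [1, 1],
    "Feature2": [2, 9],
})
df2 = pd.DataFrame({
    "Order": ["XY", "YY"],
    "Site": ["S1", "S1"],
    "Feature1": [1, 5],
    "Feature2": [2, 2],
})

# Long format: one row per (id columns, feature name, feature value).
long1 = df1.melt(["Fruit", "Site"])
long2 = df2.melt(["Order", "Site"])

# Pair up the same feature at the same site, then keep the equal values.
pairs = pd.merge(long1, long2, on=["Site", "variable"])
matches = pairs.query("value_x == value_y")

# Count matches per (Fruit, Order); value_counts sorts in descending order,
# so the first row kept for each fruit is its best-matching order.
out = (
    matches.value_counts(["Fruit", "Order"])
    .rename("Matches")
    .reset_index()
    .drop_duplicates("Fruit")
)
print(out)
```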
Pandas Dataframe: how can i compare values in two columns of a row are equal to the ones in the same columns of a subsequent row?
You had the right idea about shifted comparison, but you need to shift backwards so you compare the current row with the next one. Finally, use an all condition to enforce that all columns are equal in a row:
df['Validity'] = df[['Fruit', 'Color']].eq(df[['Fruit', 'Color']].shift(-1)).all(axis=1)
df
Fruit Color Weight Validity
0 apple red 50 True
1 apple red 75 False
2 apple green 45 False
3 orange orange 80 True
4 orange orange 90 False
5 orange red 90 False
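A self-contained version of the above, with the frame rebuilt from the printed output. The last row is compared against the missing values that shift(-1) introduces, so it always comes out False:

```python
import pandas as pd

# Frame rebuilt from the output shown above.
df = pd.DataFrame({
    "Fruit": ["apple", "apple", "apple", "orange", "orange", "orange"],
    "Color": ["red", "red", "green", "orange", "orange", "red"],
    "Weight": [50, 75, 45, 80, 90, 90],
})

cols = ["Fruit", "Color"]

# shift(-1) moves every row up by one, so row i lines up with row i + 1.
next_rows = df[cols].shift(-1)

# True only where both Fruit and Color equal the values in the next row.
df["Validity"] = df[cols].eq(next_rows).all(axis=1)
print(df)
```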