How to Select Rows from a Dataframe That Do Not Match

How can I select rows from a dataframe that do not match?

If I understand correctly, you need the negation of the %in% operator. Something like this should work:

subset(b, !(y %in% a$x))

> subset(b, !(y %in% a$x))
y
5 5
6 6

How can I select rows from a dataframe that do not match, BUT grouped by id?

We could use anti_join

library(dplyr)
anti_join(a, b, by = c("a_id" = "b_id", "a_no" = "b_no"))
# a_id a_no
#1 2 2
#2 3 1

Or with data.table

library(data.table)
setDT(a)[!b, on = .(a_id = b_id, a_no = b_no)]

How to subset dataframe based on values that do not match in two columns in R?

You may use %in% to select rows where response_id is present in tweet_id.

subset(df, response_id %in% unique(tweet_id))

# tweet_id response_id time
#1 1 2 22:10:47
#2 3 1 22:08:27
#3 4 3 21:54:49
#4 5 4 21:49:35
#5 6 5 21:46:23
#6 8 6 21:30:45

If you want to use dplyr

library(dplyr)
df %>% filter(response_id %in% unique(tweet_id))

pandas get rows which are NOT in other dataframe

One method is to store the result of an inner merge of both dfs; we can then select the rows where a column's values are not in this common set:

In [119]:

common = df1.merge(df2,on=['col1','col2'])
print(common)
df1[(~df1.col1.isin(common.col1))&(~df1.col2.isin(common.col2))]
col1 col2
0 1 10
1 2 11
2 3 12
Out[119]:
col1 col2
3 4 13
4 5 14
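One caveat worth sketching: checking each column's isin separately can drop pairs that are not actually in df2 whenever a single column value recurs across rows. A pair-wise anti-join via merge(indicator=True) avoids this. The frames below are hypothetical, chosen to expose the difference, not taken from the question:

```python
import pandas as pd

# (1, 99) shares col1 with a common row but is not itself a row of df2
df1 = pd.DataFrame({'col1': [1, 1, 5], 'col2': [10, 99, 14]})
df2 = pd.DataFrame({'col1': [1], 'col2': [10]})

common = df1.merge(df2, on=['col1', 'col2'])

# Column-wise isin drops (1, 99) even though that *pair* is not in df2
colwise = df1[(~df1.col1.isin(common.col1)) & (~df1.col2.isin(common.col2))]

# A pair-wise anti-join keeps it: mark each df1 row by merge provenance
merged = df1.merge(df2, on=['col1', 'col2'], how='left', indicator=True)
antijoin = df1[merged['_merge'].eq('left_only').to_numpy()]

print(colwise)   # only (5, 14)
print(antijoin)  # (1, 99) and (5, 14)
```

The indicator column `_merge` labels each row `both` or `left_only`, so the filter compares whole key tuples rather than columns in isolation.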

EDIT

Another method as you've found is to use isin which will produce NaN rows which you can drop:

In [138]:

df1[~df1.isin(df2)].dropna()
Out[138]:
col1 col2
3 4 13
4 5 14

However, isin compares the frames element-wise rather than row-by-row as sets, so if df2's rows do not line up with df1's in the same manner then this won't work:

df2 = pd.DataFrame(data = {'col1' : [2, 3,4], 'col2' : [11, 12,13]})

will produce the entire df:

In [140]:

df1[~df1.isin(df2)].dropna()
Out[140]:
col1 col2
0 1 10
1 2 11
2 3 12
3 4 13
4 5 14
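That alignment problem can be sidestepped by comparing whole rows as tuples instead of element-wise. A minimal sketch with the same hypothetical data as above:

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [10, 11, 12, 13, 14]})
df2 = pd.DataFrame({'col1': [2, 3, 4], 'col2': [11, 12, 13]})

# Build a set of row-tuples from df2 and test each df1 row against it;
# this compares entire rows regardless of index alignment.
rows_in_df2 = set(map(tuple, df2.to_numpy()))
mask = [tuple(row) not in rows_in_df2 for row in df1.to_numpy()]
result = df1[mask]
print(result)  # keeps (1, 10) and (5, 14)
```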

Pandas: select DataFrame matches and does not match condition

Let's call your dataframes df1 and df2. I am assuming both have the same length.

import numpy as np
import pandas as pd

mask1 = np.select([df2['Type'] == 1], [True], default=False)  # boolean array: True where df2['Type'] is 1, False everywhere else

#### Your first dataframe would be
df1[mask1]
#### Your second dataframe is
df1[~mask1]

# if you want to save them as dataframes best use .copy() i.e.:
df_type1 = df1[mask1].copy()
df_typeNot1 = df1[~mask1].copy()
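For a single condition, np.select is arguably overkill: a plain boolean comparison builds the same mask. A minimal sketch, using stand-in frames of equal length (the column names and values here are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2, same length as the answer assumes
df1 = pd.DataFrame({'val': [10, 20, 30, 40]})
df2 = pd.DataFrame({'Type': [1, 2, 1, 3]})

# A boolean Series is enough; no np.select needed for one condition
mask1 = df2['Type'] == 1

print(df1[mask1.to_numpy()])   # rows of df1 where df2['Type'] == 1
print(df1[~mask1.to_numpy()])  # rows of df1 where df2['Type'] != 1
```

Converting the mask with .to_numpy() sidesteps index-alignment surprises if df1 and df2 happen to have different indexes.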

How to select rows from a dataframe where any value is not equal to a specified value

Look at this mask

In [88]: df.values != 1
Out[88]:
array([[ True, True, True, False],
[False, True, True, True],
[ True, True, True, True]])

Slicing is based on the NumPy convention. With this 2-D mask, every True is honored individually, so you get repeated rows in the output. You need an additional all to check whether each row is all True and return a single True/False per row.

df[(df.values != 1).all(1)]

Out[87]:
c1 c2 c3 c4
2 2 3 NaN 3

Note: my intention is to reuse your code, so I didn't change it. A more concise version would be:

df[(df != 1).all(1)]

or

df[df.ne(1).all(1)]
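Note the difference between all and any here: the question says "any value is not equal", but the answer keeps rows where all values differ from 1. A small sketch with a hypothetical frame mirroring the shape of the example shows both:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: only the last row has no value equal to 1
df = pd.DataFrame({'c1': [1, 2, 2], 'c2': [2, 1, 3],
                   'c3': [3, 3, np.nan], 'c4': [1, 1, 3]})

all_ne = df[df.ne(1).all(axis=1)]  # no value in the row equals 1
any_ne = df[df.ne(1).any(axis=1)]  # at least one value differs from 1

print(all_ne)  # only the last row survives
print(any_ne)  # here, every row has some value != 1
```

Note that NaN compares unequal to 1, so .ne(1) reports True for missing values.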

python pandas select rows where two columns are (not) equal

Ways to be confused by == versus != when comparing pd.Series

As expected

df[['Len_old', 'Len_new']].assign(NE=df.Len_old != df.Len_new)

Len_old Len_new NE
0 15 15 False
1 12 12 False
2 10 8 True
3 4 5 True
4 9 10 True

But if one of the columns' values were strings:

df[['Len_old', 'Len_new']].assign(NE=df.Len_old.astype(str) != df.Len_new)

Len_old Len_new NE
0 15 15 True
1 12 12 True
2 10 8 True
3 4 5 True
4 9 10 True

Make sure both are the same types.
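A minimal sketch of that failure mode and the fix, using hypothetical data: cast both columns to a common type before comparing.

```python
import pandas as pd

# Len_new holds strings, so int-vs-str comparisons are never equal
df = pd.DataFrame({'Len_old': [15, 12, 10, 4, 9],
                   'Len_new': ['15', '12', '8', '5', '10']})

# Direct comparison: every pair looks unequal, even 15 vs '15'
naive = df['Len_old'] != df['Len_new']

# Cast both sides to one type first, then compare
fixed = df['Len_old'].astype(str) != df['Len_new'].astype(str)
print(fixed.tolist())  # [False, False, True, True, True]
```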


