How I can select rows from a dataframe that do not match?
If I understand correctly, you need the negation of the %in%
operator. Something like this should work:
subset(b, !(y %in% a$x))
> subset(b, !(y %in% a$x))
y
5 5
6 6
How I can select rows from a dataframe that do not match BUT group by id?
We could use anti_join
library(dplyr)
anti_join(a, b, by = c("a_id" = "b_id", "a_no" = "b_no"))
# a_id a_no
#1 2 2
#2 3 1
Or with data.table
library(data.table)
setDT(a)[!b, on = .(a_id = b_id, a_no = b_no)]
How to subset dataframe based on values that do not match in two columns in R?
You may use %in%
to select rows where response_id
is present in tweet_id
.
subset(df, response_id %in% unique(tweet_id))
# tweet_id response_id time
#1 1 2 22:10:47
#2 3 1 22:08:27
#3 4 3 21:54:49
#4 5 4 21:49:35
#5 6 5 21:46:23
#6 8 6 21:30:45
If you want to use dplyr
library(dplyr)
df %>% filter(response_id %in% unique(tweet_id))
pandas get rows which are NOT in other dataframe
One method would be to store the result of an inner merge form both dfs, then we can simply select the rows when one column's values are not in this common:
In [119]:
common = df1.merge(df2,on=['col1','col2'])
print(common)
df1[(~df1.col1.isin(common.col1))&(~df1.col2.isin(common.col2))]
col1 col2
0 1 10
1 2 11
2 3 12
Out[119]:
col1 col2
3 4 13
4 5 14
EDIT
Another method as you've found is to use isin
which will produce NaN
rows which you can drop:
In [138]:
df1[~df1.isin(df2)].dropna()
Out[138]:
col1 col2
3 4 13
4 5 14
However if df2 does not start rows in the same manner then this won't work:
df2 = pd.DataFrame(data = {'col1' : [2, 3,4], 'col2' : [11, 12,13]})
will produce the entire df:
In [140]:
df1[~df1.isin(df2)].dropna()
Out[140]:
col1 col2
0 1 10
1 2 11
2 3 12
3 4 13
4 5 14
Pandas: select DataFrame matches and does not match condition
Lets call your dataframes df1
and df2
. I am assuming both have the same length.
import numpy as np
import pandas as pd
mask1 = np.select([df2['Type']==1, [True], default=False) #this creates a pandas series with True values were df2['Type'] is 1 and False everywhere else.
#### Your first dataframe would be
df1[mask1]
#### Your second dataframe is
df1[~mask1]
# if you want to save them as dataframes best use .copy() i.e.:
df_type1 = df1[mask1].copy()
df_typeNot1 = df1[~mask1].copy()
How to select rows from a dataframe were any value is not equal to a specified value
Look at this mask
In [88]: df.values != 1
Out[88]:
array([[ True, True, True, False],
[False, True, True, True],
[ True, True, True, True]])
Slicing base on numpy convention. Every True
will be repeated, so you have repeated rows in the output. You need additional all
to check each row on all True
and return a single True/False for each row.
df[(df.values != 1).all(1)]
Out[87]:
c1 c2 c3 c4
2 2 3 NaN 3
Note: my intention is reusing your code, so I didn't change it. While the concise code should be.
df[(df != 1).all(1)]
or
df[df.ne(1).all(1)]
python pandas select rows where two columns are (not) equal
Ways to be confused by ==
versus !=
when comparing pd.Series
As expected
df[['Len_old', 'Len_new']].assign(NE=df.Len_old != df.Len_new)
Len_old Len_new NE
0 15 15 False
1 12 12 False
2 10 8 True
3 4 5 True
4 9 10 True
But if one of the column's values were strings!
df[['Len_old', 'Len_new']].assign(NE=df.Len_old.astype(str) != df.Len_new)
Len_old Len_new NE
0 15 15 True
1 12 12 True
2 10 8 True
3 4 5 True
4 9 10 True
Make sure both are the same types.
Related Topics
Extract Every Nth Element of a Vector
Ggplot2 Heatmap with Colors for Ranged Values
Avoid Ggplot Sorting the X-Axis While Plotting Geom_Bar()
What Does the Capital Letter "I" in R Linear Regression Formula Mean
How to Create Grouped Barplot with R
R - Add Column That Counts Sequentially Within Groups But Repeats for Duplicates
More Than One Value for "Each" Argument in "Rep" Function
Scraping a Dynamic Ecommerce Page with Infinite Scroll
Remove Rows in R Matrix Where All Data Is Na
Cut Function in R- Labeling Without Scientific Notations for Use in Ggplot2
Select Only the First Row When Merging Data Frames with Multiple Matches
How Do Keep Only Unique Words Within Each String in a Vector
Ggplot2, Facet_Grid, Free Scales
Set Locale to System Default Utf-8
Label and Color Leaf Dendrogram
Check If Point Is in Spatial Object Which Consists of Multiple Polygons/Holes