pandas get rows which are NOT in other dataframe
One method would be to store the result of an inner merge of both dfs; then we can simply select the rows whose column values are not in this common result:
In [119]:
common = df1.merge(df2,on=['col1','col2'])
print(common)
df1[(~df1.col1.isin(common.col1))&(~df1.col2.isin(common.col2))]
col1 col2
0 1 10
1 2 11
2 3 12
Out[119]:
col1 col2
3 4 13
4 5 14
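One caveat with the snippet above: it tests each column separately, so a row can be excluded even though it is not in df2 as a whole row. A minimal sketch of this, using hypothetical data (not the frames from the question) and a left merge with indicator=True as one possible row-wise alternative:

```python
import pandas as pd

# Hypothetical data: (1, 11) is not a row of df2, but 1 and 11 each appear in it.
df1 = pd.DataFrame({'col1': [1, 2, 1], 'col2': [10, 11, 11]})
df2 = pd.DataFrame({'col1': [1, 2], 'col2': [10, 11]})

common = df1.merge(df2, on=['col1', 'col2'])

# Column-wise test: the row (1, 11) is excluded because each of its values
# appears somewhere in `common`, just not together in one row.
per_column = df1[(~df1.col1.isin(common.col1)) & (~df1.col2.isin(common.col2))]

# Row-wise test: a left merge with indicator=True labels rows absent from df2.
merged = df1.merge(df2, on=['col1', 'col2'], how='left', indicator=True)
row_wise = merged.loc[merged['_merge'] == 'left_only', ['col1', 'col2']]

print(per_column)  # empty frame
print(row_wise)    # the single row (1, 11)
```

Whether this matters depends on the data; when values never repeat across rows, both tests agree.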
EDIT
Another method, as you've found, is to use isin, which will produce NaN rows that you can drop:
In [138]:
df1[~df1.isin(df2)].dropna()
Out[138]:
col1 col2
3 4 13
4 5 14
However, if the rows of df2 do not line up with those of df1 (the same values at the same index labels), then this won't work:
df2 = pd.DataFrame(data = {'col1' : [2, 3,4], 'col2' : [11, 12,13]})
will produce the entire df:
In [140]:
df1[~df1.isin(df2)].dropna()
Out[140]:
col1 col2
0 1 10
1 2 11
2 3 12
3 4 13
4 5 14
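The reason is that DataFrame.isin, when given another DataFrame, compares values at matching index and column labels rather than searching for whole rows. A sketch of an index-independent check, assuming df1 holds the five rows printed above and comparing rows as tuples through MultiIndex.from_frame (the names keys1, keys2 and not_in_df2 are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [10, 11, 12, 13, 14]})
df2 = pd.DataFrame({'col1': [2, 3, 4], 'col2': [11, 12, 13]})

# Label-aligned comparison: index 0 of df1 is compared with index 0 of df2,
# and so on, so nothing matches and every row survives.
aligned = df1[~df1.isin(df2)].dropna()

# Row-wise comparison: treat each row as a tuple and test membership,
# regardless of where the row sits in df2.
keys1 = pd.MultiIndex.from_frame(df1[['col1', 'col2']])
keys2 = pd.MultiIndex.from_frame(df2[['col1', 'col2']])
not_in_df2 = df1[~keys1.isin(keys2)]

print(len(aligned))   # 5
print(not_in_df2)     # rows (1, 10) and (5, 14)
```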
Identify records in data frame A not contained in data frame B
Here are a few ways. #1 and #4 assume that the rows of x.1 are unique. (If rows of x.1 are not unique then they will return only one of the duplicates among the duplicated rows.) The others return all duplicates:
# 1
x.1[!duplicated(rbind(x.2, x.1))[-(1:nrow(x.2))],]
# 2
do.call("rbind", setdiff(split(x.1, rownames(x.1)), split(x.2, rownames(x.2))))
# 3
x.1p <- do.call("paste", x.1)
x.2p <- do.call("paste", x.2)
x.1[! x.1p %in% x.2p, ]
# 4
library(sqldf)
sqldf("select * from `x.1` except select * from `x.2`")
EDIT: x.1 and x.2 were swapped and this has been fixed. Also have corrected note on limitations at the beginning.
Identify records NOT in another dataframe
I think there is an alternative way. If we set both columns as index we can use the .isin method to filter out what's needed:
data1.set_index(['id1', 'id2'], inplace=True)
data2.set_index(['id1', 'id2'], inplace=True)
data1[~data1.index.isin(data2.index)].reset_index()
Yields:
id1 id2 number
0 a z 0
Regardless of what you have in the number column.
Check if a row in one data frame exist in another data frame
You can use merge with parameter indicator, then remove column Rating and use numpy.where:
df = pd.merge(df1, df2, on=['User','Movie'], how='left', indicator='Exist')
df.drop('Rating', inplace=True, axis=1)
df['Exist'] = np.where(df.Exist == 'both', True, False)
print (df)
User Movie Exist
0 1 333 False
1 1 1193 True
2 1 3 False
3 2 433 False
4 3 54 True
5 3 343 False
6 3 76 True
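A slightly shorter variant is sketched below. The original df1 and df2 are not shown, so the data here is hypothetical. It assumes Rating lives in df2, merges only the key columns so nothing has to be dropped afterwards, and uses the comparison result directly instead of numpy.where:

```python
import pandas as pd

# Hypothetical data; the frames from the question are not reproduced here.
df1 = pd.DataFrame({'User': [1, 1, 2, 3], 'Movie': [333, 1193, 433, 54]})
df2 = pd.DataFrame({'User': [1, 3], 'Movie': [1193, 54], 'Rating': [5, 4]})

# Merge only the key columns of df2; dropping duplicates keeps one row per key.
keys = df2[['User', 'Movie']].drop_duplicates()
df = df1.merge(keys, on=['User', 'Movie'], how='left', indicator='Exist')

# The comparison already yields booleans, so np.where is optional.
df['Exist'] = df['Exist'] == 'both'
print(df)
```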
How I can select rows from a dataframe that do not match?
If I understand correctly, you need the negation of the %in% operator. Something like this should work:
subset(b, !(y %in% a$x))
> subset(b, !(y %in% a$x))
y
5 5
6 6
Search for does-not-contain on a DataFrame in pandas
You can use the invert (~) operator (which acts like a not for boolean data):
new_df = df[~df["col"].str.contains(word)]
where new_df is the copy returned by RHS.
contains also accepts a regular expression...
If the above throws a ValueError or TypeError, it is likely because you have missing values or mixed datatypes in the column, so use na=False:
new_df = df[~df["col"].str.contains(word, na=False)]
Or,
new_df = df[df["col"].str.contains(word) == False]
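To illustrate the na=False case, here is a small sketch with a hypothetical column that mixes strings, a missing value and a number. It also shows regex=False, which can be useful when the search text contains regular-expression metacharacters:

```python
import pandas as pd

# Hypothetical column with a missing value and a non-string entry.
df = pd.DataFrame({'col': ['apple pie', None, 'banana', 42, 'a.b']})
word = 'apple'

# na=False makes missing and non-string entries count as "no match",
# so the mask is purely boolean and can be inverted safely.
mask = df['col'].str.contains(word, na=False)
new_df = df[~mask]

# regex=False matches the text literally, so '.' is not treated as a wildcard.
literal = df[df['col'].str.contains('.', regex=False, na=False)]

print(new_df)
print(literal)
```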
Pandas: Find rows which don't exist in another DataFrame by multiple columns
Since pandas 0.17.0 there is a new indicator param you can pass to merge which will tell you whether the rows are only present in left, right or both:
In [5]:
merged = df.merge(other, how='left', indicator=True)
merged
Out[5]:
col1 col2 extra_col _merge
0 0 a this left_only
1 1 b is both
2 1 c just left_only
3 2 b something left_only
In [6]:
merged[merged['_merge']=='left_only']
Out[6]:
col1 col2 extra_col _merge
0 0 a this left_only
2 1 c just left_only
3 2 b something left_only
So you can now filter the merged df by selecting only 'left_only' rows.
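This pattern can be wrapped in a small function. The sketch below is one possible shape: rows_not_in is a hypothetical helper name, and the frames are invented to resemble the output above. It de-duplicates the key columns of the right frame first, since repeated keys there would otherwise multiply rows in a left merge:

```python
import pandas as pd

def rows_not_in(left, right, on):
    """Return rows of `left` whose `on` values have no match in `right`.

    Hypothetical helper, not part of pandas.
    """
    # Keep only the key columns and drop duplicates so the merge cannot add rows.
    keys = right[on].drop_duplicates()
    merged = left.merge(keys, on=on, how='left', indicator=True)
    return merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')

# Hypothetical frames; `other` deliberately repeats a key.
df = pd.DataFrame({'col1': [0, 1, 1, 2],
                   'col2': ['a', 'b', 'c', 'b'],
                   'extra_col': ['this', 'is', 'just', 'something']})
other = pd.DataFrame({'col1': [1, 1], 'col2': ['b', 'b']})

result = rows_not_in(df, other, on=['col1', 'col2'])
print(result)
```

Note that merge returns a fresh integer index, so the original index labels of the left frame are not preserved unless they are carried along as a column.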
Pandas dataframe select rows where a list-column contains any of a list of strings
IIUC, re-create your df; then using isin with any should be faster than apply:
df[pd.DataFrame(df.species.tolist()).isin(selection).any(axis=1).values]
Out[64]:
molecule species
0 a [dog]
2 c [cat, dog]
3 d [cat, horse, pig]
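An alternative sketch, assuming pandas 0.25 or later where Series.explode is available. The data is hypothetical: it is shaped like the output above, with row 1 and the selection list invented for illustration:

```python
import pandas as pd

# Hypothetical data shaped like the output above; row 1 is invented.
df = pd.DataFrame({
    'molecule': ['a', 'b', 'c', 'd'],
    'species': [['dog'], ['horse'], ['cat', 'dog'], ['cat', 'horse', 'pig']],
})
selection = ['cat', 'dog']

# Expand each list into one row per element (the original index is repeated),
# test membership, then collapse back to one boolean per original row.
exploded = df['species'].explode()
mask = exploded.isin(selection).groupby(level=0).any()
result = df[mask]

print(result)
```

This avoids building a wide intermediate frame when the lists differ a lot in length; which approach is faster on real data would need measuring.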