Check If Each Row of a Data Frame Is Contained in Another Data Frame

Check if a row in one data frame exists in another data frame

Use merge with the indicator parameter, drop the Rating column, then convert the indicator to a boolean with numpy.where:

df = pd.merge(df1, df2, on=['User','Movie'], how='left', indicator='Exist')
df.drop('Rating', inplace=True, axis=1)
df['Exist'] = np.where(df.Exist == 'both', True, False)
print (df)
   User  Movie  Exist
0     1    333  False
1     1   1193   True
2     1      3  False
3     2    433  False
4     3     54   True
5     3    343  False
6     3     76   True
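
The snippet above does not show its inputs, so here is a self-contained sketch of the same approach. The contents of df1 and df2 (including the Rating values) are made up for illustration; only the column names come from the answer. It also uses a plain comparison in place of numpy.where, which should give the same boolean column.

```python
import pandas as pd

# Hypothetical inputs: only the column names are taken from the answer above.
df1 = pd.DataFrame({"User": [1, 1, 1, 2, 3, 3, 3],
                    "Movie": [333, 1193, 3, 433, 54, 343, 76]})
df2 = pd.DataFrame({"User": [1, 3, 3, 5],
                    "Movie": [1193, 54, 76, 10],
                    "Rating": [4, 5, 3, 2]})

# A left join keeps every row of df1; the indicator column records
# whether each (User, Movie) pair was found in df2 as well.
df = pd.merge(df1, df2, on=["User", "Movie"], how="left", indicator="Exist")
df = df.drop(columns="Rating")

# 'both' means the pair occurs in df1 and df2.
df["Exist"] = df["Exist"] == "both"
print(df)
```

Returning a new frame from drop, rather than using inplace=True, keeps the steps easy to chain.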

Check if each row of a data frame is contained in another data frame

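One way is to paste each row into a single string and compare the strings with %in%. The result is a logical vector of length nrow(df1), as requested. Because paste0 joins values without a separator, distinct rows can collapse to the same string (for example "1","23" and "12","3" both become "123"), so this is safest when the data cannot produce such collisions.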

do.call(paste0, df1) %in% do.call(paste0, df2)
# [1] TRUE TRUE TRUE
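
For comparison, the same row-wise membership test can be sketched in Python with pandas by comparing rows as tuples instead of pasted strings. The frames below are hypothetical, and the sketch assumes both frames have the same columns in the same order.

```python
import pandas as pd

# Hypothetical frames with the same columns in the same order.
df1 = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
df2 = pd.DataFrame({"a": [1, 3, 5], "b": ["x", "z", "q"]})

# Each row becomes a plain tuple, so (1, "23") and (12, "3") stay distinct.
rows_in_df2 = set(df2.itertuples(index=False, name=None))

# One boolean per row of df1, in row order.
in_df2 = [row in rows_in_df2 for row in df1.itertuples(index=False, name=None)]
print(in_df2)
```

Using a set for the lookup keeps each membership test cheap even when df2 is large.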

How does one check if all rows in a dataframe match another dataframe?

Is a Dataframe a subset of another:

You can solve this with a merge followed by a shape comparison.

If the second dataframe is a superset of the first, the inner join of the two contains exactly the rows of the smaller dataframe, so the two shapes match.

import pandas as pd

# df1 - smaller dataframe, df2 - larger dataframe

df1 = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
df2 = pd.DataFrame({'A': [2, 3, 1, 5], 'B': [5, 2, 2, 1], 'C': [5, 7, 3, 5]})

df1.merge(df2).shape == df1.shape

True

If either dataframe contains duplicate rows, drop them before comparing:

df1.merge(df2).drop_duplicates().shape == df1.drop_duplicates().shape

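
The subset check can be wrapped in a small helper. This is a minimal sketch: the name is_row_subset is hypothetical, and it assumes both frames share the same column names and compatible dtypes.

```python
import pandas as pd

def is_row_subset(small, large):
    """Return True if every distinct row of `small` also occurs in `large`."""
    small_unique = small.drop_duplicates()
    # With no `on` argument, merge joins on all shared column names.
    matched = small_unique.merge(large.drop_duplicates(), how="inner")
    return len(matched) == len(small_unique)

# Hypothetical data: df1 repeats one row that is present in df2.
df1 = pd.DataFrame({"A": [1, 1], "B": [2, 2], "C": [3, 3]})
df2 = pd.DataFrame({"A": [2, 3, 1, 5], "B": [5, 2, 2, 1], "C": [5, 7, 3, 5]})
missing = pd.DataFrame({"A": [9], "B": [9], "C": [9]})

print(is_row_subset(df1, df2))
print(is_row_subset(missing, df2))
```

Dropping duplicates on both sides before the merge keeps repeated rows in either frame from inflating the row count of the join.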

Check if a row in one data frame exists in another data frame but do not merge both data frames

The idea is to pass indicator=True so that merge adds a helper column _merge, then test that column for inequality with 'both', which gives False for rows that have a match. If the on parameter is omitted, the join uses the intersection of the column names in both DataFrames, here CHR, START and END.

df2['Pass_validation?'] = df2.merge(df_validation, 
                                    indicator=True, 
                                    how='left')['_merge'].ne('both')
print (df2)
    CHR  START  END  Pass_validation?
0  1    1000   2000             False
1  2    1000   2000              True
2  3    1000   2000              True
3  4    1000   2000              True
4  5    1000   2000              True

Details:

print (df2.merge(df_validation, indicator=True, how='left'))
    CHR  START  END     _merge
0  1    1000   2000       both
1  2    1000   2000  left_only
2  3    1000   2000  left_only
3  4    1000   2000  left_only
4  5    1000   2000  left_only
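
One thing to watch with this pattern: a left merge returns one row per matching row on the right, so duplicate rows in df_validation would make the result longer than df2. The sketch below, with made-up data, de-duplicates the validation keys first and assigns the flags by position. The names keys and flags are illustrative only.

```python
import pandas as pd

# Hypothetical data; df_validation deliberately repeats one row.
df2 = pd.DataFrame({"CHR": [1, 2, 3],
                    "START": [1000, 1000, 1000],
                    "END": [2000, 2000, 2000]})
df_validation = pd.DataFrame({"CHR": [1, 1],
                              "START": [1000, 1000],
                              "END": [2000, 2000]})

keys = ["CHR", "START", "END"]

# Unique right-hand keys guarantee exactly one output row per row of df2.
flags = df2.merge(df_validation[keys].drop_duplicates(),
                  on=keys, how="left", indicator=True)

# to_numpy() assigns by position, so a non-default index on df2 is harmless.
df2["Pass_validation?"] = flags["_merge"].ne("both").to_numpy()
print(df2)
```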

Check if value from one dataframe exists in another dataframe

Use isin

Df1.name.isin(Df2.IDs).astype(int)

0    1
1    1
2    0
3    0
Name: name, dtype: int32

Show the result as a new column in the data frame:

Df1.assign(InDf2=Df1.name.isin(Df2.IDs).astype(int))

   name  InDf2
0  Marc      1
1  Jake      1
2   Sam      0
3  Brad      0

As a Series indexed by name:

pd.Series(Df1.name.isin(Df2.IDs).values.astype(int), Df1.name.values)

Marc    1
Jake    1
Sam     0
Brad    0
dtype: int32
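
The answer does not show its input frames, so here is a self-contained sketch of the same isin call. The contents of Df2 are assumed; Df1 is reconstructed from the output above.

```python
import pandas as pd

Df1 = pd.DataFrame({"name": ["Marc", "Jake", "Sam", "Brad"]})
# Assumed contents: the answer does not show Df2.
Df2 = pd.DataFrame({"IDs": ["Jake", "John", "Marc", "Tony", "Bob"]})

# isin returns a boolean Series aligned with Df1; astype(int) maps it to 1/0.
in_df2 = Df1["name"].isin(Df2["IDs"])
result = Df1.assign(InDf2=in_df2.astype(int))
print(result)
```

Keeping the boolean Series around (rather than only the 0/1 version) is handy if the next step is to filter Df1 with it.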

Check if columns of one data frame are present in another data frame with non-zero element in R

We can use Map:

Loop over the 'indx1' and 'indx2' columns of 'df' in Map
Extract the corresponding columns of 'df1' - df1[[x]], df1[[y]]
Build the combined logical expression with > and &
Check whether there is any TRUE value across the rows of 'df1'
Coerce to binary with unary + (or use as.integer)
Convert the list output to a vector with unlist and assign it to create the 'count_occ' column in 'df'

df$count_occ <- unlist(Map(function(x, y) 
      +(any(df1[[x]] > 0 & df1[[y]] > 0, na.rm = TRUE)), df$indx1, df$indx2))

Output:

df
  indx1 indx2 count_occ
1  aa 1    ac         0
2    ac  tg 0         1

Data:

df <- structure(list(indx1 = c("aa 1", "ac"), indx2 = c("ac", "tg 0"
)), class = "data.frame", row.names = c(NA, -2L))

df1 <- structure(list(col1 = c("A", "B", "C", "D", "E"), `aa 1` = c(1L, 
0L, 1L, 0L, 0L), `ab 2` = c(0L, 0L, 1L, 0L, 0L), ac = c(0L, 1L, 
0L, 0L, 1L), `bd 5` = c(1L, 1L, 1L, 5L, 0L), `tg 0` = c(4L, 0L, 
1L, 5L, 9L)), class = "data.frame", row.names = c(NA, -5L))
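
For readers working in Python, the same steps can be sketched with pandas on the data from this answer. This is a rough translation rather than the answer's own method: a list comprehension over the column-name pairs stands in for Map.

```python
import pandas as pd

# Same data as the R structure() calls above.
df = pd.DataFrame({"indx1": ["aa 1", "ac"], "indx2": ["ac", "tg 0"]})
df1 = pd.DataFrame({
    "col1": ["A", "B", "C", "D", "E"],
    "aa 1": [1, 0, 1, 0, 0],
    "ab 2": [0, 0, 1, 0, 0],
    "ac":   [0, 1, 0, 0, 1],
    "bd 5": [1, 1, 1, 5, 0],
    "tg 0": [4, 0, 1, 5, 9],
})

# For each (indx1, indx2) pair: is there any row of df1 where both
# named columns are greater than zero? Store the answer as 1 or 0.
df["count_occ"] = [
    int(((df1[x] > 0) & (df1[y] > 0)).any())
    for x, y in zip(df["indx1"], df["indx2"])
]
print(df)
```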

How to check if values in one dataframe exist in another dataframe in R?

Use %in% against a single vector that combines all candidate values:

#Code
df1$reply <- df1$user_name %in% c(df2$name,df2$organisation)

Output:

df1
  id reply user_name
1  1  TRUE      John
2  2  TRUE    Amazon
3  3 FALSE       Bob
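
A comparable check in Python with pandas, using the sample data from this answer, might look like the sketch below. The variable name candidates is illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3],
                    "user_name": ["John", "Amazon", "Bob"]})
df2 = pd.DataFrame({"name": ["John", "Pat"],
                    "organisation": ["Amazon", "Apple"]})

# Stack both lookup columns into one Series of candidate values.
candidates = pd.concat([df2["name"], df2["organisation"]])

# True where user_name appears in either column of df2.
df1["reply"] = df1["user_name"].isin(candidates)
print(df1)
```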

Some data used:

#Data1
df1 <- structure(list(id = 1:3, reply = c(NA, NA, NA), user_name = c("John", 
"Amazon", "Bob")), class = "data.frame", row.names = c(NA, -3L
))

#Data2
df2 <- structure(list(name = c("John", "Pat"), organisation = c("Amazon", 
"Apple")), class = "data.frame", row.names = c(NA, -2L))

Find rows in a dataframe which contain all elements of a row of another dataframe

Using apply:

df1[ !apply(df1, 1, function(i) any(apply(df2, 1, function(j) all(j %in% i)))), ]
#   X1 X2 X3
# 5  A  C  E
# 7  B  C  D

A similar nested loop gives, for each row of df2, the number of df1 rows that contain it:

cbind(df2, 
      cnt = apply(df2, 1, function(i) sum(apply(df1, 1, function(j) all(i %in% j)))))
#   X1 X2 cnt
# 1  A  B   3
# 2  A  D   3
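
The same containment logic can be sketched in Python with sets. The input frames are not shown in the answer, so the data below is hypothetical, and the sketch treats each row as an unordered set of values.

```python
import pandas as pd

# Hypothetical data; only the column names follow the output above.
df1 = pd.DataFrame({"X1": ["A", "A", "B"],
                    "X2": ["B", "C", "C"],
                    "X3": ["D", "E", "D"]})
df2 = pd.DataFrame({"X1": ["A", "A"], "X2": ["B", "D"]})

rows1 = [set(r) for r in df1.itertuples(index=False, name=None)]
rows2 = [set(r) for r in df2.itertuples(index=False, name=None)]

# True where a df1 row contains every element of at least one df2 row.
contains_any = pd.Series([any(w <= r for w in rows2) for r in rows1],
                         index=df1.index)
unmatched = df1[~contains_any]

# For each df2 row, count the df1 rows that contain all of its elements.
counts = [sum(w <= r for r in rows1) for w in rows2]
print(unmatched)
print(counts)
```

The subset operator <= on sets plays the role of all(j %in% i) in the R version.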

Check if data.frame is a subset of another data.frame

sapply(
    chk,
    function(v) {
        sum(
            rowSums(sapply(v$a, `==`, lkp$a) &
                sapply(v$b, grepl, x = lkp$b)) > 0
        ) >= nrow(v)
    }
)

sapply(
    chk,
    function(v) {
        sum(
            colSums(
                do.call(
                    `&`,
                    Map(
                        function(x, y) outer(x, y, FUN = Vectorize(function(a, b) grepl(a, b))),
                        v,
                        lkp
                    )
                )
            ) > 0
        ) >= nrow(v)
    }
)

which gives

   c1    c2    c3    c4 
 TRUE  TRUE FALSE FALSE