Check If Each Row of a Data Frame Is Contained in Another Data Frame

Check if a row in one data frame exists in another data frame

You can use merge with the indicator parameter, then drop the Rating column and convert the indicator to booleans with numpy.where:

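import numpy as np
import pandas as pd

# Left merge on the key columns; 'Exist' records where each row was found
df = pd.merge(df1, df2, on=['User','Movie'], how='left', indicator='Exist')
df.drop('Rating', inplace=True, axis=1)
df['Exist'] = np.where(df.Exist == 'both', True, False)
print(df)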
   User  Movie  Exist
0     1    333  False
1     1   1193   True
2     1      3  False
3     2    433  False
4     3     54   True
5     3    343  False
6     3     76   True
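
The snippet above assumes df1 and df2 already exist. A minimal self-contained sketch follows; the sample values are made up to match the printed result, and the deduplication step is an added precaution rather than part of the original answer (a left merge repeats a df1 row once for every matching key in df2).

```python
import pandas as pd

# Hypothetical sample data, chosen to reproduce the output shown above.
df1 = pd.DataFrame({
    "User": [1, 1, 1, 2, 3, 3, 3],
    "Movie": [333, 1193, 3, 433, 54, 343, 76],
})
df2 = pd.DataFrame({
    "User": [1, 3, 3],
    "Movie": [1193, 54, 76],
    "Rating": [5, 4, 3],
})

# Keep only the key columns and drop duplicate keys so the left merge
# returns exactly one row per row of df1.
keys = df2[["User", "Movie"]].drop_duplicates()

merged = df1.merge(keys, on=["User", "Movie"], how="left", indicator="Exist")
# The indicator column holds 'both' or 'left_only'; turn it into booleans.
merged["Exist"] = merged["Exist"].eq("both")
print(merged)
```

Merging against the key columns only also makes the later drop of Rating unnecessary.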

Check if each row of a data frame is contained in another data frame

One way is to paste each row into a single string and compare the strings with %in%. The result is a logical vector of length nrow(df1), as requested.

do.call(paste0, df1) %in% do.call(paste0, df2)
# [1] TRUE TRUE TRUE
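
One caveat: pasting without a separator can make different rows look identical, since the rows ("1", "23") and ("12", "3") both become "123". For pandas users, a comparable row-wise check can be sketched with a MultiIndex, which compares whole rows as tuples. The data below is invented for illustration, and both frames are assumed to have the same columns in the same order.

```python
import pandas as pd

# Hypothetical data: both frames share the same columns in the same order.
df1 = pd.DataFrame({"a": ["1", "12", "x"], "b": ["23", "3", "y"]})
df2 = pd.DataFrame({"a": ["1", "x"], "b": ["23", "y"]})

# Compare rows as tuples, so ("12", "3") is not confused with ("1", "23").
rows1 = pd.MultiIndex.from_frame(df1)
rows2 = pd.MultiIndex.from_frame(df2)
in_df2 = rows1.isin(rows2)  # boolean array with one entry per row of df1

# String concatenation without a separator reports a false match for row 1.
pasted = (df1["a"] + df1["b"]).isin(df2["a"] + df2["b"])

print(in_df2.tolist())
print(pasted.tolist())
```

The tuple comparison flags only the rows that truly occur in df2, while the concatenated strings also match the colliding row.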

How does one check if all rows in a dataframe match another dataframe?

Is one DataFrame a subset of another?

You can solve this with a merge followed by a shape comparison.

If the second DataFrame is a superset of the first, the inner join of the two has the same shape as the smaller one.

import pandas as pd

# df1 - smaller dataframe, df2 - larger dataframe

df1 = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
df2 = pd.DataFrame({'A': [2, 3, 1, 5], 'B': [5, 2, 2, 1], 'C': [5, 7, 3, 5]})

df1.merge(df2).shape == df1.shape
True

If the data contains duplicate rows, drop them before comparing:

df1.merge(df2).drop_duplicates().shape == df1.drop_duplicates().shape
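
An alternative sketch of the same subset test uses a left merge with indicator=True, which avoids comparing shapes altogether. The helper name is_row_subset is hypothetical, and both frames are assumed to share the same column names.

```python
import pandas as pd


def is_row_subset(small: pd.DataFrame, big: pd.DataFrame) -> bool:
    """Return True if every row of `small` also appears in `big`."""
    # Deduplicate `big` so each row of `small` matches at most once.
    merged = small.merge(big.drop_duplicates(), how="left", indicator=True)
    # A row found in both frames is labelled 'both' in the _merge column.
    return bool(merged["_merge"].eq("both").all())


df1 = pd.DataFrame({"A": [1], "B": [2], "C": [3]})
df2 = pd.DataFrame({"A": [2, 3, 1, 5], "B": [5, 2, 2, 1], "C": [5, 7, 3, 5]})

print(is_row_subset(df1, df2))
print(is_row_subset(df2, df1))
```

Here the first call checks the small frame against the large one, and the second call reverses the roles.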


Check if a row in one data frame exists in another data frame without merging the two data frames

The idea is to pass indicator=True so that merge adds a helper column named _merge, then test that column for inequality with 'both', so that rows with a match get False. If the on parameter is omitted, the join uses the intersection of the column names in both DataFrames, here CHR, START and END.

df2['Pass_validation?'] = df2.merge(df_validation,
                                    indicator=True,
                                    how='left')['_merge'].ne('both')
print(df2)
   CHR  START   END  Pass_validation?
0    1   1000  2000             False
1    2   1000  2000              True
2    3   1000  2000              True
3    4   1000  2000              True
4    5   1000  2000              True

Details:

print(df2.merge(df_validation, indicator=True, how='left'))
   CHR  START   END     _merge
0    1   1000  2000       both
1    2   1000  2000  left_only
2    3   1000  2000  left_only
3    4   1000  2000  left_only
4    5   1000  2000  left_only
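
A self-contained sketch of this approach is shown below. The input frames are assumed values reconstructed from the printed output. The drop_duplicates call and the positional assignment are added safeguards, not part of the original answer: duplicate rows in df_validation would lengthen the merge result, and a non-default index on df2 would misalign a Series assignment.

```python
import pandas as pd

# Hypothetical inputs matching the printed frames above.
df2 = pd.DataFrame({"CHR": [1, 2, 3, 4, 5], "START": 1000, "END": 2000})
df_validation = pd.DataFrame({"CHR": [1], "START": [1000], "END": [2000]})

flags = df2.merge(
    df_validation.drop_duplicates(),  # avoid duplicated rows in the result
    how="left",
    indicator=True,
)["_merge"].ne("both")

# Assign by position so a non-default index on df2 cannot misalign the flags.
df2["Pass_validation?"] = flags.to_numpy()
print(df2)
```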

Check if value from one dataframe exists in another dataframe

Use isin

Df1.name.isin(Df2.IDs).astype(int)

0 1
1 1
2 0
3 0
Name: name, dtype: int32

Show the result as a new column in the data frame:

Df1.assign(InDf2=Df1.name.isin(Df2.IDs).astype(int))

   name  InDf2
0  Marc      1
1  Jake      1
2   Sam      0
3  Brad      0

Or as a Series indexed by name:

pd.Series(Df1.name.isin(Df2.IDs).values.astype(int), Df1.name.values)

Marc    1
Jake    1
Sam     0
Brad    0
dtype: int32
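
When the goal is to filter rows rather than to display a 0/1 flag, the boolean mask returned by isin can be used directly. In this small sketch Df1 matches the names printed above, while the contents of Df2 are assumed.

```python
import pandas as pd

# Df1 matches the names printed above; the contents of Df2 are assumed.
Df1 = pd.DataFrame({"name": ["Marc", "Jake", "Sam", "Brad"]})
Df2 = pd.DataFrame({"IDs": ["Marc", "Jake", "Anna"]})

# Element-wise membership test, kept as booleans for filtering.
mask = Df1["name"].isin(Df2["IDs"])

found = Df1[mask]      # rows whose name appears in Df2.IDs
missing = Df1[~mask]   # rows whose name does not
print(found)
print(missing)
```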

Check if columns of one data frame are present in another data frame with non-zero element in R

We can use Map:

  1. Loop over the 'indx1' and 'indx2' columns of 'df' in Map
  2. Extract the corresponding columns of 'df1' with df1[[x]] and df1[[y]]
  3. Build the combined logical expression with > and &
  4. Check whether any row of 'df1' gives TRUE
  5. Coerce the result to binary with unary + (or use as.integer)
  6. Convert the list output to a vector with unlist and assign it to the new 'count_occ' column in 'df'

df$count_occ <- unlist(Map(function(x, y)
  +(any(df1[[x]] > 0 & df1[[y]] > 0, na.rm = TRUE)), df$indx1, df$indx2))

Output:

df
  indx1 indx2 count_occ
1  aa 1    ac         0
2    ac  tg 0         1

Data:

df <- structure(list(indx1 = c("aa 1", "ac"), indx2 = c("ac", "tg 0"
)), class = "data.frame", row.names = c(NA, -2L))

df1 <- structure(list(col1 = c("A", "B", "C", "D", "E"), `aa 1` = c(1L,
0L, 1L, 0L, 0L), `ab 2` = c(0L, 0L, 1L, 0L, 0L), ac = c(0L, 1L,
0L, 0L, 1L), `bd 5` = c(1L, 1L, 1L, 5L, 0L), `tg 0` = c(4L, 0L,
1L, 5L, 9L)), class = "data.frame", row.names = c(NA, -5L))
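
For readers working in pandas, the same steps can be sketched as follows. This is a translation of the R logic using the sample data from this answer, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"indx1": ["aa 1", "ac"], "indx2": ["ac", "tg 0"]})
df1 = pd.DataFrame({
    "col1": list("ABCDE"),
    "aa 1": [1, 0, 1, 0, 0],
    "ab 2": [0, 0, 1, 0, 0],
    "ac": [0, 1, 0, 0, 1],
    "bd 5": [1, 1, 1, 5, 0],
    "tg 0": [4, 0, 1, 5, 9],
})

# For each (indx1, indx2) pair, test whether any row of df1 is positive
# in both named columns, then store the result as 0/1.
df["count_occ"] = [
    int(((df1[x] > 0) & (df1[y] > 0)).any())
    for x, y in zip(df["indx1"], df["indx2"])
]
print(df)
```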

How to check if values in one dataframe exist in another dataframe in R?

Use %in% against a single vector that combines the values of both lookup columns:

#Code
df1$reply <- df1$user_name %in% c(df2$name,df2$organisation)

Output:

df1
  id reply user_name
1  1  TRUE      John
2  2  TRUE    Amazon
3  3 FALSE       Bob
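
A pandas counterpart can be sketched by pooling the two lookup columns with concat and calling isin. The frames below mirror this answer's sample data, and the translation itself is an illustration rather than part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "user_name": ["John", "Amazon", "Bob"]})
df2 = pd.DataFrame({"name": ["John", "Pat"], "organisation": ["Amazon", "Apple"]})

# Pool the values of both lookup columns, then test membership.
lookup = pd.concat([df2["name"], df2["organisation"]])
df1["reply"] = df1["user_name"].isin(lookup)
print(df1)
```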

Some data used:

#Data1
df1 <- structure(list(id = 1:3, reply = c(NA, NA, NA), user_name = c("John",
"Amazon", "Bob")), class = "data.frame", row.names = c(NA, -3L
))

#Data2
df2 <- structure(list(name = c("John", "Pat"), organisation = c("Amazon",
"Apple")), class = "data.frame", row.names = c(NA, -2L))

Find rows in a dataframe which contain all elements of a row of another dataframe

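Using apply (the leading ! keeps the rows of df1 that do not contain every element of any row of df2; drop it to keep the rows that do):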

df1[ !apply(df1, 1, function(i) any(apply(df2, 1, function(j) all(j %in% i)))), ]
# X1 X2 X3
# 5 A C E
# 7 B C D

A similar pair of nested loops gives, for each row of df2, the number of df1 rows that contain all of its elements:

cbind(df2,
      cnt = apply(df2, 1, function(i) sum(apply(df1, 1, function(j) all(i %in% j)))))
# X1 X2 cnt
# 1 A B 3
# 2 A D 3
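
The same containment test can be sketched in pandas with Python sets. The frames here are hypothetical, since the original df1 and df2 are not shown in the answer, so the results differ from the R output above.

```python
import pandas as pd

# Hypothetical data; the original df1 and df2 are not shown in the answer.
df1 = pd.DataFrame({"X1": list("AAAB"), "X2": list("BBCC"), "X3": list("CDED")})
df2 = pd.DataFrame({"X1": ["A", "A"], "X2": ["B", "D"]})

rows1 = [set(r) for r in df1.itertuples(index=False)]
rows2 = [set(r) for r in df2.itertuples(index=False)]

# True where a df1 row contains every element of at least one df2 row.
contains_any = pd.Series(
    [any(r2 <= r1 for r2 in rows2) for r1 in rows1], index=df1.index
)
no_match = df1[~contains_any]

# For each df2 row, count the df1 rows that contain all of its elements.
df2_counts = df2.assign(cnt=[sum(r2 <= r1 for r1 in rows1) for r2 in rows2])

print(no_match)
print(df2_counts)
```

The `<=` operator on sets is the subset test, which plays the role of all(j %in% i) in the R version.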

Check if data.frame is a subset of another data.frame

sapply(
  chk,
  function(v) {
    sum(
      rowSums(sapply(v$a, `==`, lkp$a) &
        sapply(v$b, grepl, x = lkp$b)) > 0
    ) >= nrow(v)
  }
)

or

sapply(
  chk,
  function(v) {
    sum(
      colSums(
        do.call(
          `&`,
          Map(
            function(x, y) outer(x, y, FUN = Vectorize(function(a, b) grepl(a, b))),
            v,
            lkp
          )
        )
      ) > 0
    ) >= nrow(v)
  }
)

which gives

   c1    c2    c3    c4
 TRUE  TRUE FALSE FALSE

