Check Whether Values in One Data Frame Column Exist in a Second Data Frame

Use %in% as follows:

A$C %in% B$C

This tells you which values in column C of A are also in column C of B.

The result is a logical vector. For the example data in the question, you get:

A$C %in% B$C
# [1]  TRUE FALSE  TRUE  TRUE
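For comparison, the same membership test can be sketched in plain Python. Put the lookup column into a set and check each element against it. The function name in_values and the values of a_c and b_c are hypothetical. The question's data is not reproduced here, so the values are only chosen to give the same result as above.

```python
def in_values(x, table):
    """Rough analogue of R's `x %in% table`: one bool per element of x."""
    lookup = set(table)  # hash set: fast average-case membership tests
    return [value in lookup for value in x]

# Hypothetical stand-ins for A$C and B$C
a_c = [1, 2, 3, 4]
b_c = [1, 3, 4, 5]

mask = in_values(a_c, b_c)
print(mask)  # [True, False, True, True]
```

A set requires hashable values. That is fine for numbers and strings, which is what %in% is typically used with.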

You can use this logical vector to index the rows of A, or to index A$C to get the actual values:

# as a row index
A[A$C %in% B$C,  ]  # note the comma to indicate we are indexing rows

# as an index to A$C
A$C[A$C %in% B$C]
# [1] 1 3 4   (all values of A$C that are in B$C)

We can negate it too:

A$C[!A$C %in% B$C]   # %in% binds tighter than !, so this is !(A$C %in% B$C)
# [1] 2   (all values of A$C that are NOT in B$C)
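The row and value indexing above can be sketched in plain Python with list comprehensions that filter on set membership. The rows, their id field, and the values of b_c are hypothetical stand-ins for A and B, chosen only to match the printed output:

```python
# Hypothetical stand-ins for the rows of A and the values of B$C
a_rows = [{"id": i, "C": c} for i, c in enumerate([1, 2, 3, 4], start=1)]
b_c = [1, 3, 4, 5]
lookup = set(b_c)

# Like A[A$C %in% B$C, ]: keep whole rows whose C value is in b_c
matching_rows = [row for row in a_rows if row["C"] in lookup]

# Like A$C[A$C %in% B$C] and A$C[!A$C %in% B$C]
kept = [row["C"] for row in a_rows if row["C"] in lookup]
dropped = [row["C"] for row in a_rows if row["C"] not in lookup]

print([row["id"] for row in matching_rows])  # [1, 3, 4]
print(kept)     # [1, 3, 4]
print(dropped)  # [2]
```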

If you want to know if a specific value is in B$C, use the same function:

2 %in% B$C         # "is the value 2 in B$C?"
# [1] FALSE

A$C[2] %in% B$C    # "is the 2nd element of A$C in B$C?"
# [1] FALSE

Check if columns of one data frame are present in another data frame with non-zero element in R

We can use Map:

1. Loop over the 'indx1' and 'indx2' columns of 'df' with Map.
2. Extract the corresponding columns of 'df1' with df1[[x]] and df1[[y]].
3. Build the combined logical expression with > and &.
4. Check whether any row of 'df1' gives TRUE (any, with na.rm = TRUE).
5. Coerce the result to binary with unary + (or use as.integer).
6. Convert the list output to a vector with unlist, and assign it to a new 'count_occ' column in 'df'.

df$count_occ <- unlist(Map(function(x, y) 
      +(any(df1[[x]] > 0 & df1[[y]] > 0, na.rm = TRUE)), df$indx1, df$indx2))

Output:

df
  indx1 indx2 count_occ
1  aa 1    ac         0
2    ac  tg 0         1

Data:

df <- structure(list(indx1 = c("aa 1", "ac"), indx2 = c("ac", "tg 0"
)), class = "data.frame", row.names = c(NA, -2L))

df1 <- structure(list(col1 = c("A", "B", "C", "D", "E"), `aa 1` = c(1L, 
0L, 1L, 0L, 0L), `ab 2` = c(0L, 0L, 1L, 0L, 0L), ac = c(0L, 1L, 
0L, 0L, 1L), `bd 5` = c(1L, 1L, 1L, 5L, 0L), `tg 0` = c(4L, 0L, 
1L, 5L, 9L)), class = "data.frame", row.names = c(NA, -5L))
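The steps above can be mirrored in plain Python, storing df1 as a dict of column lists with the same data. This is only a sketch. The helper co_occurs is a hypothetical name, and NA handling (na.rm = TRUE in the R code) is not modeled because this data has no missing values.

```python
# Columns of df1 from the data above, stored as a dict of lists
df1 = {
    "col1": ["A", "B", "C", "D", "E"],
    "aa 1": [1, 0, 1, 0, 0],
    "ab 2": [0, 0, 1, 0, 0],
    "ac":   [0, 1, 0, 0, 1],
    "bd 5": [1, 1, 1, 5, 0],
    "tg 0": [4, 0, 1, 5, 9],
}
# Column-name pairs taken from df$indx1 and df$indx2
indx1 = ["aa 1", "ac"]
indx2 = ["ac", "tg 0"]

def co_occurs(x, y):
    """1 if some row has a positive value in both column x and column y, else 0."""
    return int(any(a > 0 and b > 0 for a, b in zip(df1[x], df1[y])))

# Like unlist(Map(...)): one result per (indx1, indx2) pair
count_occ = [co_occurs(x, y) for x, y in zip(indx1, indx2)]
print(count_occ)  # [0, 1]
```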

How to check if values in one dataframe exist in another dataframe in R?

Try %in% against a single vector that combines both columns of df2:

#Code
df1$reply <- df1$user_name %in% c(df2$name,df2$organisation)

Output:

df1
  id reply user_name
1  1  TRUE      John
2  2  TRUE    Amazon
3  3 FALSE       Bob

Some data used:

#Data1
df1 <- structure(list(id = 1:3, reply = c(NA, NA, NA), user_name = c("John", 
"Amazon", "Bob")), class = "data.frame", row.names = c(NA, -3L
))

#Data2
df2 <- structure(list(name = c("John", "Pat"), organisation = c("Amazon", 
"Apple")), class = "data.frame", row.names = c(NA, -2L))
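The key step is pooling both lookup columns into one collection before testing membership. Here is a plain-Python sketch with the same data (the variable names are illustrative):

```python
# Same data as above, stored as plain lists
df1_user_name = ["John", "Amazon", "Bob"]
df2_name = ["John", "Pat"]
df2_organisation = ["Amazon", "Apple"]

# Like c(df2$name, df2$organisation): pool both columns into one lookup set
lookup = set(df2_name) | set(df2_organisation)

# Like df1$user_name %in% ...: one bool per user name
reply = [user in lookup for user in df1_user_name]
print(reply)  # [True, True, False]
```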

Check if value from one dataframe exists in another dataframe

Use isin, and convert the Boolean result to integers with astype(int):

Df1.name.isin(Df2.IDs).astype(int)

0    1
1    1
2    0
3    0
Name: name, dtype: int32

To show the result as a new column of the data frame:

Df1.assign(InDf2=Df1.name.isin(Df2.IDs).astype(int))

   name  InDf2
0  Marc      1
1  Jake      1
2   Sam      0
3  Brad      0

As a Series indexed by name:

pd.Series(Df1.name.isin(Df2.IDs).values.astype(int), Df1.name.values)

Marc    1
Jake    1
Sam     0
Brad    0
dtype: int32
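Conceptually, isin is a per-element membership test against the values of Df2.IDs. Here is a plain-Python sketch of the same logic, using the names and IDs from the R translation further down (Df1$name and Df2$IDs). It is not how pandas implements isin internally.

```python
names = ["Marc", "Jake", "Sam", "Brad"]        # Df1.name
ids = ["Jake", "John", "Marc", "Tony", "Bob"]  # Df2.IDs

lookup = set(ids)
# Like Df1.name.isin(Df2.IDs).astype(int)
in_df2 = [int(name in lookup) for name in names]
# Like the Series indexed by name (assumes the names are unique)
by_name = dict(zip(names, in_df2))

print(in_df2)   # [1, 1, 0, 0]
print(by_name)  # {'Marc': 1, 'Jake': 1, 'Sam': 0, 'Brad': 0}
```

The dict stands in for the name-indexed Series only because the names are unique. A pandas Series would keep duplicate index labels, whereas a dict would collapse them.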

Check if values of one dataframe exist in another dataframe in exact order

We can also do this with mget, which returns a list of the Reference_* data frames. Bind them together by rows, bind the columns of df1 alongside, and take the group-wise mean of the logical comparison vector:

library(dplyr)
mget(ls(pattern = '^Reference_[A-Z]$')) %>%
    bind_rows() %>% 
    bind_cols(df1) %>%   # duplicated names are repaired to type...1, value...2, ...
    group_by(group, type = type...1) %>% 
    summarise(score = mean(value...2 == value...5))
# Groups:   group [2]
#  group type  score
#  <int> <chr> <dbl>
#1     1 A     1    
#2     2 B     0    
#3     2 C     0.667
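The core idea is to compare two aligned columns element by element and average the resulting logical vector per group. This can be sketched in plain Python. The Reference_* data frames and df1 are not shown, so the rows below are entirely hypothetical and only illustrate how such scores arise.

```python
from collections import defaultdict

# Hypothetical aligned rows: (group, type, reference_value, observed_value).
# Row i of the references is paired with row i of df1, so order matters.
rows = [
    (1, "A", 10, 10),
    (1, "A", 20, 20),
    (2, "B", 30, 31),
    (2, "C", 40, 40),
    (2, "C", 50, 50),
    (2, "C", 60, 61),
]

# Collect a logical vector of element-wise matches per (group, type)
matches = defaultdict(list)
for group, typ, ref, obs in rows:
    matches[(group, typ)].append(ref == obs)

# Like summarise(score = mean(...)): the share of True values in each group
scores = {key: sum(flags) / len(flags) for key, flags in matches.items()}
print(scores)
```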

Check if value from one dataframe exists in another dataframe in R

Using the same data and expected outcome as the original Python example:

Df1 <- data.frame(name =  c('Marc', 'Jake', 'Sam', 'Brad'))
Df2 <- data.frame(IDs = c('Jake', 'John', 'Marc', 'Tony', 'Bob'))
Df1$presentinDf2 <- as.integer(Df1$name %in% Df2$IDs)
Df1
#>   name presentinDf2
#> 1 Marc            1
#> 2 Jake            1
#> 3  Sam            0
#> 4 Brad            0

How do I check if pandas df column value exists based on value in another column?

Compare Year with 2018, then use groupby(...).transform('all') to test whether every row of each ID is from 2018:

mask = df['Year'].eq(2018).groupby(df['ID']).transform('all')

Another idea is to test whether Year is not 2018, select the IDs that have at least one such non-2018 row, check membership with isin, and finally invert the mask with ~ so that only IDs whose rows are all from 2018 are True:

mask = ~df['ID'].isin(df.loc[df['Year'].ne(2018), 'ID'])

Finally, convert the mask to integers:

df['ID_only_in_2018'] = mask.astype(int)

Or:

import numpy as np
df['ID_only_in_2018'] = np.where(mask, 1, 0)

Or:

df['ID_only_in_2018'] = mask.view('i1')  # Series.view is deprecated since pandas 2.2; prefer astype

print(df)
   Year  ID  Value  ID_only_in_2018
0  2016   1    100                0
1  2017   1    102                0
2  2017   1    105                0
3  2018   1     98                0
4  2016   2    121                0
5  2016   2    101                0
6  2016   2    133                0
7  2018   3    102                1
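Both masks can be sketched in plain Python on the (Year, ID) pairs from the printed df. The first checks that every year within an ID equals 2018. The second collects the IDs with any non-2018 year and inverts the membership test. This is a sketch of the logic, not the pandas implementation.

```python
from collections import defaultdict

# (Year, ID) pairs taken from the printed df above
rows = [(2016, 1), (2017, 1), (2017, 1), (2018, 1),
        (2016, 2), (2016, 2), (2016, 2), (2018, 3)]

# Approach 1: per ID, is every year equal to 2018? (like transform('all'))
all_2018 = defaultdict(lambda: True)
for year, id_ in rows:
    all_2018[id_] = all_2018[id_] and year == 2018
flag_1 = [int(all_2018[id_]) for _, id_ in rows]

# Approach 2: IDs with at least one non-2018 row, then invert membership (like ~isin)
not_only_2018 = {id_ for year, id_ in rows if year != 2018}
flag_2 = [int(id_ not in not_only_2018) for _, id_ in rows]

print(flag_1)  # [0, 0, 0, 0, 0, 0, 0, 1]
print(flag_2)  # [0, 0, 0, 0, 0, 0, 0, 1]
```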

Check if a row in one data frame exists in another data frame but do not merge both data frames

The idea is to use the indicator=True parameter, which adds a helper column _merge, and then compare it with 'both' using ne, so rows that have a match get False. If the on parameter is omitted, the frames are joined on the intersection of their column names, here CHR, START and END.

df2['Pass_validation?'] = df2.merge(df_validation, 
                                    indicator=True, 
                                    how='left')['_merge'].ne('both')
print(df2)
   CHR  START   END  Pass_validation?
0    1   1000  2000             False
1    2   1000  2000              True
2    3   1000  2000              True
3    4   1000  2000              True
4    5   1000  2000              True

Details:

print(df2.merge(df_validation, indicator=True, how='left'))
   CHR  START   END     _merge
0    1   1000  2000       both
1    2   1000  2000  left_only
2    3   1000  2000  left_only
3    4   1000  2000  left_only
4    5   1000  2000  left_only
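The same row-existence check can be sketched without a merge. Treat each row as a (CHR, START, END) tuple and test membership in a set built from df_validation. The df2 rows come from the printed output. The contents of df_validation are not shown, so its rows here are hypothetical: one matching row and one that matches nothing.

```python
# Rows of df2 as (CHR, START, END) tuples, from the printed output above
df2_rows = [(1, 1000, 2000), (2, 1000, 2000), (3, 1000, 2000),
            (4, 1000, 2000), (5, 1000, 2000)]
# Hypothetical df_validation rows (its real contents are not shown)
df_validation_rows = [(1, 1000, 2000), (7, 500, 900)]

# Like merge(..., indicator=True)['_merge'].ne('both'):
# True when the whole row is absent from df_validation
validation = set(df_validation_rows)
pass_validation = [row not in validation for row in df2_rows]
print(pass_validation)  # [False, True, True, True, True]
```

A set also sidesteps one pitfall of the merge approach. If df_validation contains duplicate rows, a left merge returns more rows than df2, and the assigned column no longer lines up with df2's rows.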
