Check Whether Values in One Data Frame Column Exist in a Second Data Frame

Use %in% as follows:

A$C %in% B$C

This tells you which values of column C of A are also in column C of B.

The result is a logical vector. For the data in your example, you get:

A$C %in% B$C
# [1] TRUE FALSE TRUE TRUE

You can use this vector as a row index into A, or as an index into A$C to get the actual values:

# as a row index
A[A$C %in% B$C, ] # note the comma to indicate we are indexing rows

# as an index to A$C
A$C[A$C %in% B$C]
# [1] 1 3 4   (all values of A$C that are in B$C)

We can negate it too:

A$C[!A$C %in% B$C]
# [1] 2   (all values of A$C that are NOT in B$C)


If you want to know if a specific value is in B$C, use the same function:

2 %in% B$C   # "is the value 2 in B$C?"
# FALSE

A$C[2] %in% B$C # "is the 2nd element of A$C in B$C ?"
# FALSE

Check if columns of one data frame are present in another data frame with non-zero element in R

We can use Map:

  1. Loop over the 'indx1' and 'indx2' columns of 'df' with Map.
  2. Extract the corresponding columns of 'df1': df1[[x]] and df1[[y]].
  3. Build a combined logical expression with > and &.
  4. Check whether any row of 'df1' yields TRUE.
  5. Coerce the result to binary with unary + (or use as.integer).
  6. Convert the list output to a vector with unlist and assign it to a new 'count_occ' column in 'df'.

df$count_occ <- unlist(Map(function(x, y) 
+(any(df1[[x]] > 0 & df1[[y]] > 0, na.rm = TRUE)), df$indx1, df$indx2))

Output:

df
  indx1 indx2 count_occ
1  aa 1    ac         0
2    ac  tg 0         1

Data:

df <- structure(list(indx1 = c("aa 1", "ac"), indx2 = c("ac", "tg 0"
)), class = "data.frame", row.names = c(NA, -2L))

df1 <- structure(list(col1 = c("A", "B", "C", "D", "E"), `aa 1` = c(1L,
0L, 1L, 0L, 0L), `ab 2` = c(0L, 0L, 1L, 0L, 0L), ac = c(0L, 1L,
0L, 0L, 1L), `bd 5` = c(1L, 1L, 1L, 5L, 0L), `tg 0` = c(4L, 0L,
1L, 5L, 9L)), class = "data.frame", row.names = c(NA, -5L))

How to check if values in one dataframe exist in another dataframe in R?

Try this using %in% and a vector for all values:

#Code
df1$reply <- df1$user_name %in% c(df2$name,df2$organisation)

Output:

df1
  id reply user_name
1  1  TRUE      John
2  2  TRUE    Amazon
3  3 FALSE       Bob

Some data used:

#Data1
df1 <- structure(list(id = 1:3, reply = c(NA, NA, NA), user_name = c("John",
"Amazon", "Bob")), class = "data.frame", row.names = c(NA, -3L
))

#Data2
df2 <- structure(list(name = c("John", "Pat"), organisation = c("Amazon",
"Apple")), class = "data.frame", row.names = c(NA, -2L))
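
The same idea, pooling several lookup columns into one collection before testing membership, might look like this in pandas. It is only a sketch for comparison with the R answer; the data mirrors the structure() calls above, and the variable name known is an illustrative choice.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "user_name": ["John", "Amazon", "Bob"]})
df2 = pd.DataFrame({"name": ["John", "Pat"], "organisation": ["Amazon", "Apple"]})

# Pool every value from both lookup columns into one flat set,
# the counterpart of c(df2$name, df2$organisation) in the R code.
known = set(df2["name"]) | set(df2["organisation"])

# isin tests each user_name against the pooled values.
df1["reply"] = df1["user_name"].isin(known)
print(df1)
```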

Check if value from one dataframe exists in another dataframe

Use isin

Df1.name.isin(Df2.IDs).astype(int)

0 1
1 1
2 0
3 0
Name: name, dtype: int32

Show the result in the data frame:

Df1.assign(InDf2=Df1.name.isin(Df2.IDs).astype(int))

name InDf2
0 Marc 1
1 Jake 1
2 Sam 0
3 Brad 0

As a Series object indexed by name:

pd.Series(Df1.name.isin(Df2.IDs).values.astype(int), Df1.name.values)

Marc 1
Jake 1
Sam 0
Brad 0
dtype: int32
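
The snippets above do not show how Df1 and Df2 were built. A minimal self-contained sketch follows, assuming the data matches what the R translation further down this page uses; the names in_df2 and result are illustrative.

```python
import pandas as pd

# Assumed input data, taken from the R version of this example.
Df1 = pd.DataFrame({"name": ["Marc", "Jake", "Sam", "Brad"]})
Df2 = pd.DataFrame({"IDs": ["Jake", "John", "Marc", "Tony", "Bob"]})

# isin compares by value only; the row order and index of Df2 do not matter.
in_df2 = Df1["name"].isin(Df2["IDs"])

# assign returns a new frame and leaves Df1 itself unchanged.
result = Df1.assign(InDf2=in_df2.astype(int))
print(result)
```

Bracket access (Df1["name"]) is used here instead of attribute access because it keeps working even when a column name collides with a DataFrame attribute.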

Check if values of one dataframe exist in another dataframe in exact order

We can also do this with mget, which returns a list of data frames: bind them together, then take the grouped mean of a logical vector:

library(dplyr)
mget(ls(pattern = '^Reference_[A-Z]$')) %>%
bind_rows() %>%
bind_cols(df1) %>%
group_by(group, type = type...1) %>%
summarise(score = mean(value...2 == value...5))
# Groups: group [2]
# group type score
# <int> <chr> <dbl>
#1 1 A 1
#2 2 B 0
#3 2 C 0.667

Check if value from one dataframe exists in another dataframe in R

Using the same data and outcome as the original Python example:

Df1 <- data.frame(name =  c('Marc', 'Jake', 'Sam', 'Brad'))
Df2 <- data.frame(IDs = c('Jake', 'John', 'Marc', 'Tony', 'Bob'))
Df1$presentinDf2 <- as.integer(Df1$name %in% Df2$IDs)
Df1
#> name presentinDf2
#> 1 Marc 1
#> 2 Jake 1
#> 3 Sam 0
#> 4 Brad 0

How do I check if pandas df column value exists based on value in another column?

Compare Year with 2018, then test within each ID group whether all values are 2018:

mask = df['Year'].eq(2018).groupby(df['ID']).transform('all')

Another approach is to find the rows where Year is not 2018, collect the IDs that have at least one such row, and then invert the mask with ~ to keep only the IDs that appear in 2018 alone:

mask = ~df['ID'].isin(df.loc[df['Year'].ne(2018), 'ID'])

Finally, convert the mask to integers:

df['ID_only_in_2018'] = mask.astype(int)

Or:

df['ID_only_in_2018'] = np.where(mask, 1, 0)

Or, on older pandas versions (Series.view is deprecated as of pandas 2.2):

df['ID_only_in_2018'] = mask.view('i1')


print (df)
Year ID Value ID_only_in_2018
0 2016 1 100 0
1 2017 1 102 0
2 2017 1 105 0
3 2018 1 98 0
4 2016 2 121 0
5 2016 2 101 0
6 2016 2 133 0
7 2018 3 102 1
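
To make the two masks easy to try, here is a self-contained sketch. The input frame is reconstructed from the printed output above; the names mask_groupby, ids_with_other_years and mask_isin are illustrative.

```python
import pandas as pd

# Data reconstructed from the printed result above.
df = pd.DataFrame({
    "Year":  [2016, 2017, 2017, 2018, 2016, 2016, 2016, 2018],
    "ID":    [1, 1, 1, 1, 2, 2, 2, 3],
    "Value": [100, 102, 105, 98, 121, 101, 133, 102],
})

# Approach 1: per ID group, are all Year values equal to 2018?
mask_groupby = df["Year"].eq(2018).groupby(df["ID"]).transform("all")

# Approach 2: collect IDs that have any non-2018 row, then exclude them.
ids_with_other_years = df.loc[df["Year"].ne(2018), "ID"]
mask_isin = ~df["ID"].isin(ids_with_other_years)

df["ID_only_in_2018"] = mask_groupby.astype(int)
print(df)
```

Both masks select the same rows for this data; only ID 3 has no rows outside 2018.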

Check if a row in one data frame exist in another data frame but do not merge both data frames

The idea is to pass indicator=True so that the merge adds a helper column, _merge, and then test that column for inequality with 'both': a matching row gives False. If the on parameter is omitted, the join uses the intersection of the column names in both DataFrames, here CHR, START and END.

df2['Pass_validation?'] = df2.merge(df_validation, 
indicator=True,
how='left')['_merge'].ne('both')
print (df2)
CHR START END Pass_validation?
0 1 1000 2000 False
1 2 1000 2000 True
2 3 1000 2000 True
3 4 1000 2000 True
4 5 1000 2000 True

Details:

print (df2.merge(df_validation, indicator=True, how='left'))
CHR START END _merge
0 1 1000 2000 both
1 2 1000 2000 left_only
2 3 1000 2000 left_only
3 4 1000 2000 left_only
4 5 1000 2000 left_only
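
A runnable sketch of the same approach is shown below. The contents of df_validation are not given in the answer, so the single validation row here is an assumption chosen to reproduce the output above. Dropping duplicate validation rows and assigning by position are extra precautions added in this sketch, not part of the original answer.

```python
import pandas as pd

df2 = pd.DataFrame({
    "CHR": [1, 2, 3, 4, 5],
    "START": [1000] * 5,
    "END": [2000] * 5,
})

# Assumption: the validation table holds only the row for CHR 1.
df_validation = pd.DataFrame({"CHR": [1], "START": [1000], "END": [2000]})

merged = df2.merge(
    df_validation.drop_duplicates(),  # avoid duplicating left rows on repeated keys
    on=["CHR", "START", "END"],       # explicit, instead of relying on shared names
    how="left",
    indicator=True,                   # adds the _merge helper column
)

# Rows found in both frames get False; rows only in df2 get True.
df2["Pass_validation?"] = merged["_merge"].ne("both").to_numpy()
print(df2)
```

Because the merge result is only used to build the boolean column, df2 keeps its original columns and the two frames are never actually combined.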

