Check whether values in one data frame column exist in a second data frame
Use %in% as follows:
A$C %in% B$C
This tells you which values of column C of A are also in column C of B.
What is returned is a logical vector. In the specific case of your example, you get:
A$C %in% B$C
# [1] TRUE FALSE TRUE TRUE
You can use this as an index to the rows of A, or as an index to A$C, to get the actual values:
# as a row index
A[A$C %in% B$C, ] # note the comma to indicate we are indexing rows
# as an index to A$C
A$C[A$C %in% B$C]
# [1] 1 3 4   (all values of A$C that are in B$C)
We can negate it too:
A$C[!A$C %in% B$C]
# [1] 2   (all values of A$C that are NOT in B$C)
If you want to know if a specific value is in B$C, use the same function:
2 %in% B$C # "is the value 2 in B$C ?"
# FALSE
A$C[2] %in% B$C # "is the 2nd element of A$C in B$C ?"
# FALSE
Check if columns of one data frame are present in another data frame with non-zero element in R
We can use Map:
- Loop over the 'indx1' and 'indx2' columns of 'df' in Map
- Extract the corresponding columns of 'df1' with df1[[x]], df1[[y]]
- Create the compound logical expression with > and &
- Check whether there is any TRUE value across the rows of 'df1'
- Coerce to binary with +( (or use as.integer)
- Convert the list output to a vector with unlist and assign it to create the 'count_occ' column in 'df'
df$count_occ <- unlist(Map(function(x, y)
+(any(df1[[x]] > 0 & df1[[y]] > 0, na.rm = TRUE)), df$indx1, df$indx2))
Output:
df
  indx1 indx2 count_occ
1  aa 1    ac         0
2    ac  tg 0         1
Data:
df <- structure(list(indx1 = c("aa 1", "ac"), indx2 = c("ac", "tg 0"
)), class = "data.frame", row.names = c(NA, -2L))
df1 <- structure(list(col1 = c("A", "B", "C", "D", "E"), `aa 1` = c(1L,
0L, 1L, 0L, 0L), `ab 2` = c(0L, 0L, 1L, 0L, 0L), ac = c(0L, 1L,
0L, 0L, 1L), `bd 5` = c(1L, 1L, 1L, 5L, 0L), `tg 0` = c(4L, 0L,
1L, 5L, 9L)), class = "data.frame", row.names = c(NA, -5L))
How to check if values in one dataframe exist in another dataframe in R?
Try this using %in% and a single vector that combines all the values to match against:
#Code
df1$reply <- df1$user_name %in% c(df2$name,df2$organisation)
Output:
df1
id reply user_name
1 1 TRUE John
2 2 TRUE Amazon
3 3 FALSE Bob
Some data used:
#Data1
df1 <- structure(list(id = 1:3, reply = c(NA, NA, NA), user_name = c("John",
"Amazon", "Bob")), class = "data.frame", row.names = c(NA, -3L
))
#Data2
df2 <- structure(list(name = c("John", "Pat"), organisation = c("Amazon",
"Apple")), class = "data.frame", row.names = c(NA, -2L))
Check if value from one dataframe exists in another dataframe
Use isin
Df1.name.isin(Df2.IDs).astype(int)
0 1
1 1
2 0
3 0
Name: name, dtype: int32
Show result in data frame
Df1.assign(InDf2=Df1.name.isin(Df2.IDs).astype(int))
name InDf2
0 Marc 1
1 Jake 1
2 Sam 0
3 Brad 0
In a Series object
pd.Series(Df1.name.isin(Df2.IDs).values.astype(int), Df1.name.values)
Marc 1
Jake 1
Sam 0
Brad 0
dtype: int32
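The snippets above assume Df1 and Df2 already exist. A minimal, self-contained sketch follows; the two frames are sample data chosen to match the output shown above, not necessarily the asker's exact input. The integer dtype printed by astype(int) depends on the platform (int32 or int64).

```python
import pandas as pd

# Sample data (an assumption: chosen to reproduce the output shown above)
Df1 = pd.DataFrame({"name": ["Marc", "Jake", "Sam", "Brad"]})
Df2 = pd.DataFrame({"IDs": ["Jake", "John", "Marc", "Tony", "Bob"]})

# isin returns a boolean Series aligned with Df1; cast it to 0/1
in_df2 = Df1["name"].isin(Df2["IDs"]).astype(int)

# assign returns a new frame and leaves Df1 itself unchanged
result = Df1.assign(InDf2=in_df2)
print(result)
```

Because assign returns a copy, Df1 keeps its original single column unless the result is assigned back.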
Check if values of one dataframe exist in another dataframe in exact order
We may also do this with mget, which returns a list of data.frames: bind them together, then take the grouped mean of a logical vector:
library(dplyr)
mget(ls(pattern = '^Reference_[A-Z]$')) %>%
bind_rows() %>%
bind_cols(df1) %>%
group_by(group, type = type...1) %>%
summarise(score = mean(value...2 == value...5))
# Groups: group [2]
# group type score
# <int> <chr> <dbl>
#1 1 A 1
#2 2 B 0
#3 2 C 0.667
Check if value from one dataframe exists in another dataframe in R
Using the same data and outcome as the original Python example
Df1 <- data.frame(name = c('Marc', 'Jake', 'Sam', 'Brad'))
Df2 <- data.frame(IDs = c('Jake', 'John', 'Marc', 'Tony', 'Bob'))
Df1$presentinDf2 <- as.integer(Df1$name %in% Df2$IDs)
Df1
#> name presentinDf2
#> 1 Marc 1
#> 2 Jake 1
#> 3 Sam 0
#> 4 Brad 0
How do I check if pandas df column value exists based on value in another column?
Compare Year with 2018, then test per ID whether all of its rows are 2018:
mask = df['Year'].eq(2018).groupby(df['ID']).transform('all')
Another idea is to select the rows where Year is not 2018, collect the ID values that have at least one such row, and finally invert the membership mask with ~ to keep only the IDs that appear in 2018 alone:
mask = ~df['ID'].isin(df.loc[df['Year'].ne(2018), 'ID'])
Finally, convert the mask to integers:
df['ID_only_in_2018'] = mask.astype(int)
Or:
df['ID_only_in_2018'] = np.where(mask, 1, 0)
Or:
# Series.view is deprecated in recent pandas; view the underlying NumPy array instead
df['ID_only_in_2018'] = mask.to_numpy().view('i1')
print (df)
Year ID Value ID_only_in_2018
0 2016 1 100 0
1 2017 1 102 0
2 2017 1 105 0
3 2018 1 98 0
4 2016 2 121 0
5 2016 2 101 0
6 2016 2 133 0
7 2018 3 102 1
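As a quick check that the two masks agree, here is a self-contained sketch. The input frame is reconstructed from the printed output above, which is an assumption about the asker's data.

```python
import pandas as pd

# Input reconstructed from the printed result above (an assumption)
df = pd.DataFrame({
    "Year": [2016, 2017, 2017, 2018, 2016, 2016, 2016, 2018],
    "ID": [1, 1, 1, 1, 2, 2, 2, 3],
    "Value": [100, 102, 105, 98, 121, 101, 133, 102],
})

# Approach 1: per ID, are all Year values equal to 2018?
mask_groupby = df["Year"].eq(2018).groupby(df["ID"]).transform("all")

# Approach 2: exclude every ID that has at least one non-2018 row
ids_with_other_years = df.loc[df["Year"].ne(2018), "ID"]
mask_isin = ~df["ID"].isin(ids_with_other_years)

df["ID_only_in_2018"] = mask_groupby.astype(int)
print(df)
```

On this data both masks select only the row for ID 3, since ID 1 also has rows from 2016 and 2017.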
Check if a row in one data frame exist in another data frame but do not merge both data frames
The idea is to use the indicator=True parameter, which adds a helper column _merge. To get False for rows that match, compare _merge for inequality with 'both'. If the on parameter is omitted, the join uses the intersection of the column names in both DataFrames, here CHR, START and END.
df2['Pass_validation?'] = df2.merge(df_validation,
indicator=True,
how='left')['_merge'].ne('both')
print (df2)
CHR START END Pass_validation?
0 1 1000 2000 False
1 2 1000 2000 True
2 3 1000 2000 True
3 4 1000 2000 True
4 5 1000 2000 True
Details:
print (df2.merge(df_validation, indicator=True, how='left'))
CHR START END _merge
0 1 1000 2000 both
1 2 1000 2000 left_only
2 3 1000 2000 left_only
3 4 1000 2000 left_only
4 5 1000 2000 left_only
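A self-contained sketch of the same idea follows. df2 is reconstructed from the output above, while df_validation is hypothetical because the original does not show it; a single matching row is assumed. Dropping duplicates from the validation table first is an added precaution, since duplicate validation rows would make the merge return more rows than df2.

```python
import pandas as pd

# df2 reconstructed from the printed output above
df2 = pd.DataFrame({
    "CHR": [1, 2, 3, 4, 5],
    "START": [1000] * 5,
    "END": [2000] * 5,
})

# Hypothetical validation table (not shown in the original)
df_validation = pd.DataFrame({"CHR": [1], "START": [1000], "END": [2000]})

# A left merge keeps every row of df2; _merge records where each row was found
merged = df2.merge(df_validation.drop_duplicates(), how="left", indicator=True)

# Rows found in both tables get False, everything else True
df2["Pass_validation?"] = merged["_merge"].ne("both").to_numpy()
print(df2)
```

Assigning through to_numpy() copies the values by position, so the result does not depend on df2 having a default 0..n-1 index.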