Check if a row in one data frame exists in another data frame
You can use merge with the indicator parameter, then remove the Rating column and use numpy.where:
df = pd.merge(df1, df2, on=['User','Movie'], how='left', indicator='Exist')
df.drop('Rating', inplace=True, axis=1)
df['Exist'] = np.where(df.Exist == 'both', True, False)
print (df)
User Movie Exist
0 1 333 False
1 1 1193 True
2 1 3 False
3 2 433 False
4 3 54 True
5 3 343 False
6 3 76 True
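For a self-contained run, here is a minimal sketch of the same approach. The input frames are assumptions (the original inputs are not shown) chosen to reproduce the output above, and the Rating values are made up. It uses .eq('both') in place of numpy.where, which should give the same boolean column.

```python
import pandas as pd

# Hypothetical inputs, chosen to match the output shown above.
df1 = pd.DataFrame({'User': [1, 1, 1, 2, 3, 3, 3],
                    'Movie': [333, 1193, 3, 433, 54, 343, 76]})
df2 = pd.DataFrame({'User': [1, 3, 3],
                    'Movie': [1193, 54, 76],
                    'Rating': [4, 5, 3]})  # made-up ratings

# A left merge keeps every row of df1; the indicator column records
# whether each (User, Movie) pair was also found in df2.
df = pd.merge(df1, df2, on=['User', 'Movie'], how='left', indicator='Exist')
df = df.drop(columns='Rating')

# 'both' means the pair is present in df1 and df2.
df['Exist'] = df['Exist'].eq('both')
print(df)
```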
Check if each row of a data frame is contained in another data frame
One way is to paste the rows together and compare them with %in%. The result is a logical vector the length of nrow(df1), as requested.
do.call(paste0, df1) %in% do.call(paste0, df2)
# [1] TRUE TRUE TRUE
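In pandas, a comparable check can be sketched with row tuples instead of pasted strings. This is an illustration, not a translation of the R answer, and both frames are made-up data. Comparing tuples also sidesteps a pitfall of pasting without a separator: the rows ("ab", "c") and ("a", "bc") both paste to "abc" even though they differ.

```python
import pandas as pd

# Made-up data: the first rows differ, the second rows are identical.
df1 = pd.DataFrame({'x': ['ab', 'x'], 'y': ['c', 'y']})
df2 = pd.DataFrame({'x': ['a', 'x'], 'y': ['bc', 'y']})

# Compare whole rows as tuples, one boolean per row of df1.
rows_in_df2 = set(df2.itertuples(index=False, name=None))
in_df2 = [row in rows_in_df2 for row in df1.itertuples(index=False, name=None)]
print(in_df2)

# Pasting the columns together without a separator reports a false match.
pasted_match = (df1['x'] + df1['y']).isin(df2['x'] + df2['y']).tolist()
print(pasted_match)
```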
How does one check if all rows in a dataframe match another dataframe?
Is a Dataframe a subset of another:
You can try solving this using merge and then comparison.
The inner-join of the 2 dataframes would be the same as the smaller dataframe if the second one is a superset for the first.
import pandas as pd
# df1 - smaller dataframe, df2 - larger dataframe
df1 = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
df2 = pd.DataFrame({'A': [2, 3, 1, 5], 'B': [5, 2, 2, 1], 'C': [5, 7, 3, 5]})
df1.merge(df2).shape == df1.shape
True
If either dataframe has duplicate rows, drop them before comparing:
df1.merge(df2).drop_duplicates().shape == df1.drop_duplicates().shape
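The check can be wrapped in a small helper. This is a sketch under assumptions: is_subset is a hypothetical name, df3 is made-up data, and the larger frame is assumed to contain every column of the smaller one with compatible dtypes. Both frames are deduplicated before the merge so that repeated rows on either side cannot inflate the row count.

```python
import pandas as pd

def is_subset(small: pd.DataFrame, big: pd.DataFrame) -> bool:
    """Return True if every distinct row of `small` also appears in `big`."""
    cols = list(small.columns)
    small_rows = small.drop_duplicates()
    # Restrict `big` to the shared columns so each row matches at most once.
    big_rows = big[cols].drop_duplicates()
    matched = small_rows.merge(big_rows, on=cols, how='inner')
    return len(matched) == len(small_rows)

df1 = pd.DataFrame({'A': [1], 'B': [2], 'C': [3]})
df2 = pd.DataFrame({'A': [2, 3, 1, 5], 'B': [5, 2, 2, 1], 'C': [5, 7, 3, 5]})
df3 = pd.DataFrame({'A': [1, 9], 'B': [2, 9], 'C': [3, 9]})  # second row is not in df2

print(is_subset(df1, df2))
print(is_subset(df3, df2))
```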
Check if a row in one data frame exists in another data frame but do not merge both data frames
The idea is to use the indicator=True parameter, which adds the helper column _merge, and then compare that column for not equal to both, so that matched rows give False. If the on parameter is omitted, the join uses the intersection of column names in both DataFrames, here CHR, START and END.
df2['Pass_validation?'] = df2.merge(df_validation,
indicator=True,
how='left')['_merge'].ne('both')
print (df2)
CHR START END Pass_validation?
0 1 1000 2000 False
1 2 1000 2000 True
2 3 1000 2000 True
3 4 1000 2000 True
4 5 1000 2000 True
Details:
print (df2.merge(df_validation, indicator=True, how='left'))
CHR START END _merge
0 1 1000 2000 both
1 2 1000 2000 left_only
2 3 1000 2000 left_only
3 4 1000 2000 left_only
4 5 1000 2000 left_only
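A self-contained sketch of this pattern follows. The contents of df_validation and the letter index on df2 are assumptions for illustration. Because merge returns a fresh 0..n-1 index, assigning its result to a frame with a different index would align on labels and could produce NaN; converting to a NumPy array assigns by position instead. That is only safe when the left merge keeps one output row per row of df2, so the validation frame is deduplicated first.

```python
import pandas as pd

# df2 deliberately has a non-default index to show positional assignment.
df2 = pd.DataFrame({'CHR': [1, 2, 3, 4, 5],
                    'START': [1000] * 5,
                    'END': [2000] * 5},
                   index=list('abcde'))
df_validation = pd.DataFrame({'CHR': [1], 'START': [1000], 'END': [2000]})  # hypothetical

# No `on` given: the join uses the shared columns CHR, START and END.
merged = df2.merge(df_validation.drop_duplicates(), indicator=True, how='left')

# True where the row was NOT found in df_validation.
df2['Pass_validation?'] = merged['_merge'].ne('both').to_numpy()
print(df2)
```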
Check if value from one dataframe exists in another dataframe
Use isin
Df1.name.isin(Df2.IDs).astype(int)
0 1
1 1
2 0
3 0
Name: name, dtype: int32
Show result in data frame
Df1.assign(InDf2=Df1.name.isin(Df2.IDs).astype(int))
name InDf2
0 Marc 1
1 Jake 1
2 Sam 0
3 Brad 0
In a Series object
pd.Series(Df1.name.isin(Df2.IDs).values.astype(int), Df1.name.values)
Marc 1
Jake 1
Sam 0
Brad 0
dtype: int32
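To run the snippets above end to end, sample frames are needed. The ones below are assumptions: Df1 matches the names in the output shown, while the contents of Df2 are made up. The sketch uses bracket access (Df1['name']) rather than attribute access, which avoids clashes with existing attributes and methods.

```python
import pandas as pd

Df1 = pd.DataFrame({'name': ['Marc', 'Jake', 'Sam', 'Brad']})
Df2 = pd.DataFrame({'IDs': ['Jake', 'John', 'Marc', 'Tony', 'Bob']})  # made-up IDs

# Boolean mask: is each name present anywhere in Df2['IDs']?
in_df2 = Df1['name'].isin(Df2['IDs'])

# Attach the mask as a 0/1 column without modifying Df1 in place.
result = Df1.assign(InDf2=in_df2.astype(int))
print(result)
```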
Check if columns of one data frame are present in another data frame with non-zero element in R
We can use Map
- Loop over the 'indx1', 'indx2' columns of 'df' in Map
- Extract the corresponding columns of 'df1' - df1[[x]], df1[[y]]
- Create the multiple logical expression with > and &
- Check if there is any TRUE value from the rows of 'df1'
- Coerce to binary (+( or use as.integer)
- Convert the list output to a vector with unlist and assign it to create the 'count_occ' column in 'df'
df$count_occ <- unlist(Map(function(x, y)
+(any(df1[[x]] > 0 & df1[[y]] > 0, na.rm = TRUE)), df$indx1, df$indx2))
Output:
df
indx1 indx2 count_occ
1 aa 1 ac 0
2 ac tg 0 1
Data:
df <- structure(list(indx1 = c("aa 1", "ac"), indx2 = c("ac", "tg 0"
)), class = "data.frame", row.names = c(NA, -2L))
df1 <- structure(list(col1 = c("A", "B", "C", "D", "E"), `aa 1` = c(1L,
0L, 1L, 0L, 0L), `ab 2` = c(0L, 0L, 1L, 0L, 0L), ac = c(0L, 1L,
0L, 0L, 1L), `bd 5` = c(1L, 1L, 1L, 5L, 0L), `tg 0` = c(4L, 0L,
1L, 5L, 9L)), class = "data.frame", row.names = c(NA, -5L))
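For readers working in pandas, the same logic might be sketched as below. This is an illustrative equivalent, not part of the R answer; the data mirrors the R structures above, and a list comprehension over zip plays the role of Map.

```python
import pandas as pd

df = pd.DataFrame({'indx1': ['aa 1', 'ac'], 'indx2': ['ac', 'tg 0']})
df1 = pd.DataFrame({'col1': list('ABCDE'),
                    'aa 1': [1, 0, 1, 0, 0],
                    'ab 2': [0, 0, 1, 0, 0],
                    'ac':   [0, 1, 0, 0, 1],
                    'bd 5': [1, 1, 1, 5, 0],
                    'tg 0': [4, 0, 1, 5, 9]})

# For each (indx1, indx2) pair, check whether any row of df1 has
# both named columns greater than zero, then coerce to 0/1.
df['count_occ'] = [
    int(((df1[x] > 0) & (df1[y] > 0)).any())
    for x, y in zip(df['indx1'], df['indx2'])
]
print(df)
```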
How to check if values in one dataframe exist in another dataframe in R?
Try this using %in% and a vector for all values:
#Code
df1$reply <- df1$user_name %in% c(df2$name,df2$organisation)
Output:
df1
id reply user_name
1 1 TRUE John
2 2 TRUE Amazon
3 3 FALSE Bob
Some data used:
#Data1
df1 <- structure(list(id = 1:3, reply = c(NA, NA, NA), user_name = c("John",
"Amazon", "Bob")), class = "data.frame", row.names = c(NA, -3L
))
#Data2
df2 <- structure(list(name = c("John", "Pat"), organisation = c("Amazon",
"Apple")), class = "data.frame", row.names = c(NA, -2L))
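A rough pandas counterpart, offered as a sketch and not as part of the R answer, stacks the two lookup columns into one Series and tests membership with isin. The data mirrors the R structures above.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3],
                    'user_name': ['John', 'Amazon', 'Bob']})
df2 = pd.DataFrame({'name': ['John', 'Pat'],
                    'organisation': ['Amazon', 'Apple']})

# Pool every candidate value from both columns, like c(df2$name, df2$organisation).
all_values = pd.concat([df2['name'], df2['organisation']], ignore_index=True)

df1['reply'] = df1['user_name'].isin(all_values)
print(df1)
```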
Find rows in a dataframe which contain all elements of a row of another dataframe
Using apply:
df1[ !apply(df1, 1, function(i) any(apply(df2, 1, function(j) all(j %in% i)))), ]
# X1 X2 X3
# 5 A C E
# 7 B C D
Use similar loops to count, for each row of df2, how many rows of df1 contain all of its elements:
cbind(df2,
cnt = apply(df2, 1, function(i) sum(apply(df1, 1, function(j) all(i %in% j)))))
# X1 X2 cnt
# 1 A B 3
# 2 A D 3
Check if data.frame is a subset of another data.frame
sapply(
chk,
function(v) {
sum(
rowSums(sapply(v$a, `==`, lkp$a) &
sapply(v$b, grepl, x = lkp$b)) > 0
) >= nrow(v)
}
)
or
sapply(
chk,
function(v) {
sum(
colSums(
do.call(
`&`,
Map(
function(x, y) outer(x, y, FUN = Vectorize(function(a, b) grepl(a, b))),
v,
lkp
)
)
) > 0
) >= nrow(v)
}
)
which gives
c1 c2 c3 c4
TRUE TRUE FALSE FALSE