Subsetting a Data Frame Based on Contents of Another Data Frame

Subset of dataframe based on values in another dataframe

As mentioned in the comments, the data contained stray whitespace, which is why the values didn't match. We can use trimws to strip the whitespace and then subset. If df1 is a plain vector:

df2[trimws(df2$relevantcolumn) %in% trimws(df1), ]

Or, if df1 is a data frame:

df2[trimws(df2$relevantcolumn) %in% trimws(df1$relevant_column), ]
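To see why trimming matters, here is a minimal sketch with made-up data. The values and the value column are assumptions for illustration; only the column names relevantcolumn and relevant_column come from the answer above.

```r
# Hypothetical data: df1 holds the lookup keys, df2 the rows to filter.
# Both contain stray leading or trailing spaces.
df1 <- data.frame(relevant_column = c("a ", " b"), stringsAsFactors = FALSE)
df2 <- data.frame(relevantcolumn = c(" a", "b ", "c"), value = 1:3,
                  stringsAsFactors = FALSE)

# Without trimming, no key matches because of the extra spaces.
raw_match <- df2[df2$relevantcolumn %in% df1$relevant_column, ]
nrow(raw_match)

# After trimming both sides, the rows for "a" and "b" are kept.
trimmed <- df2[trimws(df2$relevantcolumn) %in% trimws(df1$relevant_column), ]
trimmed
```

Note that the subset keeps the original, untrimmed values in df2; trimws is only applied for the comparison.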

Subset a data frame based on two conditions from another data frame

Maybe we need an inner_join:

library(dplyr)
inner_join(DF1, DF2, by = c("Date", "First.Name" = "Participant.ID"))

Or using data.table

library(data.table)
# This returns one row per row of DF2; add nomatch = NULL to drop rows without a match in DF1
setDT(DF1)[DF2, on = .(Date, First.Name = Participant.ID)]
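If neither dplyr nor data.table is available, the same two-key match can be sketched in base R with merge. The sample rows and the score column below are invented for illustration; only the column names Date, First.Name and Participant.ID come from the question.

```r
# Hypothetical data, assumed for illustration only.
DF1 <- data.frame(Date = c("2020-01-01", "2020-01-02", "2020-01-03"),
                  First.Name = c("Ann", "Bob", "Cat"),
                  score = c(10, 20, 30),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(Date = c("2020-01-01", "2020-01-03", "2020-01-03"),
                  Participant.ID = c("Ann", "Bob", "Cat"),
                  stringsAsFactors = FALSE)

# Keep only the rows of DF1 whose (Date, First.Name) pair also appears
# in DF2 as (Date, Participant.ID). Bob is dropped: his dates differ.
matched <- merge(DF1, DF2,
                 by.x = c("Date", "First.Name"),
                 by.y = c("Date", "Participant.ID"))
matched
```

Both conditions have to hold on the same row of DF2, which is what distinguishes a join from two separate %in% tests.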

Subset a data frame based on another

Use setdiff to keep the observations of x whose id does not appear in y (indexing rows by the result works here because each id equals its row number):

> x[setdiff(x$id, y$id),]
  id  g
1  1 21
2  2 52
5  5 35

Use merge to keep the observations present in both data frames:

> merge(x, y)
  id  g  u
1  3 43 55
2  4 94 77

Or, if you are looking for this subset instead:

> x[intersect(x$id, y$id),]
  id  g
3  3 43
4  4 94
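The setdiff and intersect calls above pass id values as row positions, so they rely on id matching the row number. A sketch that avoids this assumption uses a logical %in% test. The data frames below are reconstructed from the printed outputs; the exact contents of y are an assumption.

```r
# x and y reconstructed from the outputs shown above (y is assumed
# to contain only ids 3 and 4).
x <- data.frame(id = 1:5, g = c(21, 52, 43, 94, 35))
y <- data.frame(id = c(3, 4), u = c(55, 77))

# Rows of x whose id does NOT occur in y.
only_in_x <- x[!x$id %in% y$id, ]

# Rows of x whose id DOES occur in y.
in_both <- x[x$id %in% y$id, ]

only_in_x
in_both
```

Because the test is logical rather than positional, it keeps working if the rows of x are reordered or the ids are not 1, 2, 3, and so on.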

How to subset a dataframe based on columns from another dataframe?

When using merge, by default the data frames are joined by the variables they have in common, and the results are sorted. So you can do:

merge(df2, df1[c('x', 'y')])

#   x y value
# 1 1 1    12
# 2 1 2    11
# 3 1 3     9
# 4 1 4    10
# 5 1 5     8

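To keep the rows in the order of df1 instead, use @Mankind_008's method: put df1 first and turn sorting off. Note that the merge documentation describes the row order for sort = FALSE as unspecified, so check the result on your data.

merge(df1[c('x', 'y')], df2, sort = FALSE)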

Example:

set.seed(0)
df1 <- df1[sample(seq_len(nrow(df1))),]
df2 <- df2[sample(seq_len(nrow(df2))),]
df1
#   x y value
# 5 1 5     7
# 2 1 2     4
# 4 1 4     6
# 3 1 3     5
# 1 1 1     3
merge(df1[c('x', 'y')], df2, sort = FALSE)
#   x y value
# 1 1 5     8
# 2 1 2    11
# 3 1 4    10
# 4 1 3     9
# 5 1 1    12
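The example above starts from df1 and df2 that are not defined in the answer. A self-contained sketch, with both data frames invented for illustration, shows how merge joins on the shared x and y columns once the value column of df1 is dropped:

```r
# Hypothetical data, assumed for illustration only.
df1 <- data.frame(x = 1, y = c(5, 2, 4), value = c(7, 4, 6))
df2 <- data.frame(x = c(1, 1, 1, 1, 2),
                  y = c(1, 2, 4, 5, 5),
                  value = c(12, 11, 10, 8, 99))

# Only x and y are common to both inputs, so they become the join keys;
# the value column in the result comes from df2. Rows are sorted by x, y.
sorted_result <- merge(df2, df1[c("x", "y")])
sorted_result

# Same rows with sorting turned off; the row order is not guaranteed
# by the documentation.
unsorted_result <- merge(df1[c("x", "y")], df2, sort = FALSE)
unsorted_result
```

Dropping value from df1 before merging matters: if it were left in, merge would treat value as a third key and only rows with identical values in both data frames would match.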

Flag subset of a dataframe based on another dataframe values

First create a MultiIndex on both dataframes, then use MultiIndex.isin to test whether each index value of the first dataframe occurs in the index of the second dataframe. The result is the boolean flag:

i1 = first_df.set_index([first_df['A'] * 10, 'B']).index
i2 = second_df.set_index(['C1', 'C2']).index

first_df['Match'] = i1.isin(i2)

Result

print(first_df)

   A  B  C   D  Match
1  1  a  q  zz   True
2  2  b  w  xx   True
3  3  c  e  yy  False
4  4  d  r  vv  False

