Subset of dataframe based on values in another dataframe
As mentioned in the comments, the data contained stray whitespace, so the values did not match. We can use trimws to remove the whitespace and then subset:
df2[trimws(df2$relevantcolumn) %in% trimws(df1), ]
Or, if df1 is a data frame:
df2[trimws(df2$relevantcolumn) %in% trimws(df1$relevant_column), ]
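To see why trimming matters, here is a minimal sketch in base R. The data frames and their values are made up for illustration; only the column names follow the answer above.

```r
# Hypothetical data: some keys carry stray leading/trailing spaces
df1 <- data.frame(relevant_column = c("a", "b "), stringsAsFactors = FALSE)
df2 <- data.frame(relevantcolumn = c(" a", "b", "c "),
                  value = 1:3, stringsAsFactors = FALSE)

# Without trimming, no key in df2 is found in df1
no_trim <- df2[df2$relevantcolumn %in% df1$relevant_column, ]

# After trimming both sides, "a" and "b" match
trimmed <- df2[trimws(df2$relevantcolumn) %in% trimws(df1$relevant_column), ]
print(trimmed)
```

Trimming both sides is the safer choice when it is not known which of the two data frames holds the padded values.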
Subset a data frame based on two conditions from another data frame
We may need an inner_join:
library(dplyr)
inner_join(DF1, DF2, by = c("Date", "First.Name" = "Participant.ID"))
Or using data.table
library(data.table)
setDT(DF1)[DF2, on = .(Date, First.Name = Participant.ID)]
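If neither package is available, the same kind of join on two differently named key columns can probably be done with base R's merge, using by.x and by.y. This is only a sketch: the data below is invented, and only the column names Date, First.Name and Participant.ID come from the question.

```r
# Hypothetical data for illustration
DF1 <- data.frame(Date = c("2020-01-01", "2020-01-02", "2020-01-03"),
                  First.Name = c("ann", "bob", "cy"),
                  score = c(10, 20, 30),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(Date = c("2020-01-01", "2020-01-03", "2020-01-03"),
                  Participant.ID = c("ann", "cy", "dee"),
                  stringsAsFactors = FALSE)

# Keep only the rows of DF1 whose (Date, First.Name) pair
# appears as (Date, Participant.ID) in DF2
res <- merge(DF1, DF2,
             by.x = c("Date", "First.Name"),
             by.y = c("Date", "Participant.ID"))
print(res)
```

The key columns in the result take their names from the first data frame, so res has the columns Date, First.Name and score.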
Subset a data frame based on another
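Use setdiff to exclude observations that appear in both data frames. Note that this indexes x by row position, so it relies on id being equal to the row number, as it is here: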
> x[setdiff(x$id, y$id),]
  id  g
1  1 21
2  2 52
5  5 35
Use merge to include observations present in both data frames:
> merge(x, y)
  id  g  u
1  3 43 55
2  4 94 77
Or are you looking for this subset?
> x[intersect(x$id, y$id),]
  id  g
3  3 43
4  4 94
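Indexing with x[setdiff(x$id, y$id), ] or x[intersect(x$id, y$id), ] selects rows by position, so it returns the intended rows only when id happens to equal the row number. A sketch of a variant that filters on the id values themselves, using %in%, is shown below. The ids are hypothetical and deliberately differ from the row numbers.

```r
# Hypothetical data: ids do not coincide with row numbers
x <- data.frame(id = c(11, 12, 13, 14, 15), g = c(21, 52, 43, 94, 35))
y <- data.frame(id = c(13, 14), u = c(55, 77))

# Rows of x whose id does NOT appear in y
only_in_x <- x[!x$id %in% y$id, ]

# Rows of x whose id DOES appear in y
in_both <- x[x$id %in% y$id, ]

print(only_in_x)
print(in_both)
```

With these ids, x[setdiff(x$id, y$id), ] would ask for rows 11, 12 and 15, which do not exist, and return rows of NA instead.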
How to subset a dataframe based on columns from another dataframe?
When using merge, by default the data frames are joined by the variables they have in common, and the results are sorted. So you can do:
merge(df2, df1[c('x', 'y')])
#   x y value
# 1 1 1    12
# 2 1 2    11
# 3 1 3     9
# 4 1 4    10
# 5 1 5     8
To sort by the order of df1, use @Mankind_008's method:
merge(df1[c('x', 'y')], df2, sort = FALSE)
Example:
set.seed(0)
df1 <- df1[sample(seq_len(nrow(df1))),]
df2 <- df2[sample(seq_len(nrow(df2))),]
df1
#   x y value
# 5 1 5     7
# 2 1 2     4
# 4 1 4     6
# 3 1 3     5
# 1 1 1     3
merge(df1[c('x', 'y')], df2, sort = FALSE)
#   x y value
# 1 1 5     8
# 2 1 2    11
# 3 1 4    10
# 4 1 3     9
# 5 1 1    12
Flag subset of a dataframe based on another dataframe values
First create a MultiIndex on both dataframes, then use MultiIndex.isin to test whether the index values of the first dataframe occur in the index of the second dataframe, in order to create a boolean flag:
i1 = first_df.set_index([first_df['A'] * 10, 'B']).index
i2 = second_df.set_index(['C1', 'C2']).index
first_df['Match'] = i1.isin(i2)
Result:
print(first_df)
   A  B  C   D  Match
1  1  a  q  zz   True
2  2  b  w  xx   True
3  3  c  e  yy  False
4  4  d  r  vv  False