Subset of dataframe based on values in another dataframe
As mentioned in the comments, there was whitespace in the data, so the values didn't match. We can use trimws to remove the whitespace and then subset:
df2[trimws(df2$relevantcolumn) %in% trimws(df1), ]
Or, if df1 is a data frame:
df2[trimws(df2$relevantcolumn) %in% trimws(df1$relevant_column), ]
pandas get rows which are NOT in other dataframe
One method would be to store the result of an inner merge of both dfs; we can then select the rows of df1 whose column values do not appear in this common result:
In [119]:
common = df1.merge(df2,on=['col1','col2'])
print(common)
df1[(~df1.col1.isin(common.col1))&(~df1.col2.isin(common.col2))]
   col1  col2
0     1    10
1     2    11
2     3    12
Out[119]:
   col1  col2
3     4    13
4     5    14
EDIT
Another method, as you've found, is to use isin, which will produce NaN rows that you can drop:
In [138]:
df1[~df1.isin(df2)].dropna()
Out[138]:
   col1  col2
3     4    13
4     5    14
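However, when isin is given a DataFrame it matches on index and column labels as well as values, so if the rows of df2 do not line up with the same index positions in df1, this won't work: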
df2 = pd.DataFrame(data = {'col1' : [2, 3,4], 'col2' : [11, 12,13]})
will produce the entire df:
In [140]:
df1[~df1.isin(df2)].dropna()
Out[140]:
   col1  col2
0     1    10
1     2    11
2     3    12
3     4    13
4     5    14
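One way to compare whole rows regardless of index alignment is a left merge with indicator=True, keeping only the rows marked left_only. This is a minimal sketch, not part of the original answer: df1 is assumed to be the five-row frame implied by the output above, df2 is the misaligned frame from the last example, and the names merged and only_in_df1 are illustrative.

```python
import pandas as pd

# Assumed sample data: df1 as implied by the outputs above,
# df2 as in the misaligned example.
df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5], 'col2': [10, 11, 12, 13, 14]})
df2 = pd.DataFrame({'col1': [2, 3, 4], 'col2': [11, 12, 13]})

# Left merge on both columns; the indicator column '_merge' records
# whether each row was found only in df1 or in both frames.
# drop_duplicates() keeps the merge from repeating df1 rows.
merged = df1.merge(df2.drop_duplicates(), on=['col1', 'col2'],
                   how='left', indicator=True)

# Keep only the rows that have no match in df2.
only_in_df1 = merged.loc[merged['_merge'] == 'left_only', ['col1', 'col2']]
print(only_in_df1)
```

With this data the rows (1, 10) and (5, 14) should remain, because the comparison is made on the (col1, col2) pair rather than on index position.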
subset a column in data frame based on another data frame/list
We can use %in% to get a logical vector and subset the rows of 'table1' based on that.
subset(table1, gene_ID %in% accessions40$V1)
A faster option for large data would be data.table:
library(data.table)
setDT(table1)[gene_ID %chin% accessions40$V1]
Or use filter from dplyr:
library(dplyr)
table1 %>%
filter(gene_ID %in% accessions40$V1)
Subsetting a dataframe based on values in another dataframe
d <- read.table(text="hh_id trans_type transaction_value
hh1 food 4
hh1 water 5
hh1 transport 4
hh2 water 3
hh3 transport 1
hh3 food 10
hh4 food 5
hh4 transport 15
hh4 water 10", header=T)
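# household ids that have at least one "water" transaction
dw <- as.character(with(d, hh_id[trans_type=="water"]))
# keep every row belonging to those households
ds <- d[which(d$hh_id%in%dw),]
ds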
#   hh_id trans_type transaction_value
# 1   hh1       food                 4
# 2   hh1      water                 5
# 3   hh1  transport                 4
# 4   hh2      water                 3
# 7   hh4       food                 5
# 8   hh4  transport                15
# 9   hh4      water                10
Subset a dataframe using start and stop points from another dataframe?
Using dplyr, we can do a left_join on dat and df and select only those rows whose dat_frame lies between the first and last values of their respective id.
library(dplyr)
left_join(dat, df, by = c("dat_id" = "id")) %>%
filter(between(dat_frame, first, last)) %>%
select(-first, -last)
Or, using the same logic in base R:
subset(merge(dat, df, by.x = "dat_id", by.y = "id", all.x = TRUE),
dat_frame >= first & dat_frame <= last)