Subset a column in a data frame based on another data frame/list
We can use %in% to get a logical vector and subset the rows of 'table1' based on that.
subset(table1, gene_ID %in% accessions40$V1)
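A faster option for large tables is data.table, where %chin% is a version of %in% optimized for character vectors (gene_ID and V1 must both be character):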
library(data.table)
setDT(table1)[gene_ID %chin% accessions40$V1]
Or use filter from dplyr:
library(dplyr)
table1 %>%
  filter(gene_ID %in% accessions40$V1)
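To make the logical-vector step concrete, here is a minimal base-R sketch. The values in table1 and accessions40 are invented for illustration; only the column names gene_ID and V1 come from the answer above.

```r
# Toy stand-ins for the real data (all values are invented for illustration)
table1 <- data.frame(
  gene_ID = c("AT1G01010", "AT1G01020", "AT1G01030", "AT1G01040"),
  expr    = c(1.2, 3.4, 5.6, 7.8),
  stringsAsFactors = FALSE
)
accessions40 <- data.frame(
  V1 = c("AT1G01020", "AT1G01040", "AT9G99999"),
  stringsAsFactors = FALSE
)

# %in% returns one TRUE/FALSE per row of table1
keep <- table1$gene_ID %in% accessions40$V1

# Rows whose gene_ID appears in accessions40$V1
matched <- table1[keep, ]

# Negate the logical vector to get the rows that do not appear
unmatched <- table1[!keep, ]
```

IDs that occur only in accessions40 (here the made-up "AT9G99999") are simply ignored, since %in% is evaluated once per row of table1.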
Subset of dataframe based on values in another dataframe
As mentioned in the comments, there was whitespace in the data, so the values didn't match. We can use trimws to remove the leading and trailing whitespace and then subset. If df1 is a vector:
df2[trimws(df2$relevantcolumn) %in% trimws(df1), ]
Or, if df1 is a data frame:
df2[trimws(df2$relevantcolumn) %in% trimws(df1$relevant_column), ]
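The effect of the stray whitespace can be seen in a small sketch. The data below is hypothetical; only the column names relevantcolumn and relevant_column are taken from the answer.

```r
# Hypothetical data: some keys carry stray spaces on either side
df1 <- data.frame(relevant_column = c("a", "b "), stringsAsFactors = FALSE)
df2 <- data.frame(relevantcolumn = c(" a", "b", "c "),
                  value = 1:3,
                  stringsAsFactors = FALSE)

# Without trimming, " a" != "a" and "b" != "b ", so nothing matches
raw_match <- df2$relevantcolumn %in% df1$relevant_column

# After trimming both sides, "a" and "b" match
trimmed_match <- trimws(df2$relevantcolumn) %in% trimws(df1$relevant_column)

result <- df2[trimmed_match, ]
```

Trimming both sides of the comparison matters here, because the whitespace can sit in either data frame.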
R: Filter a dataframe based on another dataframe
If you only want to keep the rows of e whose rownames occur in pf (or, for the rows that don't occur, negate the condition with !rownames(e) %in% rownames(pf)), you can filter on the rownames:
library(tidyverse)
e %>%
  filter(rownames(e) %in% rownames(pf))
Another possibility is to create a rownames column in both dataframes, do a semi_join on that column (i.e., rn), and then convert the rn column back to rownames.
library(tidyverse)
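list(e, pf) %>%
  # move the rownames of each dataframe into a column called rn
  map(~ .x %>%
        as.data.frame %>%
        rownames_to_column('rn')) %>%
  # keep only the rows of e whose rn also occurs in pf
  reduce(semi_join, by = 'rn') %>%
  # restore rn as the rownames
  column_to_rownames('rn')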
Output
JHU_113_2.CEL JHU_144.CEL JHU_173.CEL JHU_176R.CEL JHU_182.CEL JHU_186.CEL JHU_187.CEL JHU_188.CEL JHU_203.CEL
2315374 6.28274 6.79161 6.11265 6.13997 6.68056 6.48156 6.45415 6.04542 5.99176
2315376 5.81678 5.71165 6.02794 5.37082 5.95527 5.75999 5.87863 5.54830 6.35571
2315587 8.88557 8.95699 8.36898 8.28993 8.41361 8.64980 8.74305 8.31915 8.43548
2315588 6.28650 6.66750 6.07503 6.76625 6.19819 6.84260 6.13916 6.40219 6.45059
2315591 6.97515 6.61705 6.51994 6.74982 6.60917 6.55182 6.62240 6.44394 5.76592
2315595 5.94179 5.39178 5.09497 4.96199 2.96431 4.95204 5.00979 4.06493 5.38048
2315598 4.99420 5.56888 5.57912 5.43960 5.19249 5.87991 5.60540 5.09513 5.43618
2315603 7.67845 7.90005 7.47594 6.75087 7.62805 8.00069 7.34296 6.81338 7.52014
2315604 6.20952 6.59687 6.14608 5.70518 6.49572 6.12622 6.23690 6.39569 6.70869
2315640 5.85307 6.07303 6.41875 6.07282 6.28283 6.13699 6.16377 6.48616 6.34162
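For comparison, the same rowname filter can be sketched in base R. The e and pf below are small hypothetical stand-ins: the row IDs are borrowed from the output above, but the column names and values are invented.

```r
# Hypothetical stand-ins for e and pf (columns and values are invented)
e <- data.frame(s1 = c(6.2, 5.8, 8.9),
                s2 = c(6.7, 5.7, 8.9),
                row.names = c("2315374", "2315376", "2315587"))
pf <- data.frame(flag = c(TRUE, TRUE),
                 row.names = c("2315376", "2315587"))

# Rows of e whose rownames occur in pf; drop = FALSE keeps a data frame
in_pf <- e[rownames(e) %in% rownames(pf), , drop = FALSE]

# Rows of e whose rownames do not occur in pf
not_in_pf <- e[!rownames(e) %in% rownames(pf), , drop = FALSE]
```

Base-R indexing keeps the rownames in place, so no round trip through an rn column is needed.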
How to subset windows in a dataframe using start- and end-values from another dataframe in R?
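One approach is purrr::map2, which calls a function once for each (window_start, window_end) pair and returns a list with one subset per window: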
library(tidyverse)

# dataframe of data to subset
df1 <- tibble(my_values = rnorm(100, mean = 45, sd = 30) %>% abs())
# dataframe of windows: each row gives the lower and upper bound
# of the my_values range to extract
df2 <- tibble::tribble(
  ~window_start, ~window_end,
  3L, 10L,
  21L, 25L,
  52L, 63L,
  78L, 90L
)
# keep the rows of df1 whose my_values lie in [mini, maxi]
subset_thats_in <- function(mini, maxi) {
  df1 %>%
    filter(between(my_values, mini, maxi))
}
# apply the function to each (window_start, window_end) pair
purrr::map2(df2$window_start,
            df2$window_end,
            subset_thats_in)
[[1]]
# A tibble: 4 × 1
my_values
<dbl>
1 6.47
2 8.69
3 7.73
4 7.35
[[2]]
# A tibble: 12 × 1
my_values
<dbl>
1 24.2
2 22.9
3 22.4
4 24.4
5 22.6
6 21.7
7 23.2
8 21.3
9 23.3
10 21.1
11 23.5
12 22.6
[[3]]
# A tibble: 10 × 1
my_values
<dbl>
1 54.0
2 61.4
3 62.5
4 60.8
5 60.5
6 55.5
7 61.4
8 59.0
9 57.9
10 53.3
[[4]]
# A tibble: 6 × 1
my_values
<dbl>
1 87.8
2 79.1
3 80.5
4 82.7
5 85.2
6 80.6
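If window_start and window_end are meant as row positions rather than bounds on my_values, a base-R sketch could slice by position instead. The fixed seed and the use of a plain data.frame are assumptions added here, not part of the original answer.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
df1 <- data.frame(my_values = abs(rnorm(100, mean = 45, sd = 30)))
df2 <- data.frame(window_start = c(3L, 21L, 52L, 78L),
                  window_end   = c(10L, 25L, 63L, 90L))

# One data frame per window, taking rows window_start..window_end by position
windows <- Map(function(start, end) df1[start:end, , drop = FALSE],
               df2$window_start, df2$window_end)

# Number of rows in each extracted window
window_sizes <- sapply(windows, nrow)
```

Each element of windows keeps the original row numbers as rownames, which makes it easy to trace a subset back to its position in df1.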
Subsetting a dataframe based on values in another dataframe
d <- read.table(text="hh_id trans_type transaction_value
hh1 food 4
hh1 water 5
hh1 transport 4
hh2 water 3
hh3 transport 1
hh3 food 10
hh4 food 5
hh4 transport 15
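hh4 water 10", header = TRUE)
# IDs of households with at least one "water" transaction
dw <- as.character(with(d, hh_id[trans_type == "water"]))
# keep every row that belongs to one of those households
ds <- d[d$hh_id %in% dw, ]
ds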
# hh_id trans_type transaction_value
# 1 hh1 food 4
# 2 hh1 water 5
# 3 hh1 transport 4
# 4 hh2 water 3
# 7 hh4 food 5
# 8 hh4 transport 15
# 9 hh4 water 10
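The same selection can also be written without the intermediate vector of IDs, by computing a per-household flag with ave. This is an alternative sketch rather than the original answer's method; the data frame is rebuilt with data.frame so the block runs on its own.

```r
# Same data as above, rebuilt so this block is self-contained
d <- data.frame(
  hh_id = c("hh1", "hh1", "hh1", "hh2", "hh3", "hh3", "hh4", "hh4", "hh4"),
  trans_type = c("food", "water", "transport", "water", "transport",
                 "food", "food", "transport", "water"),
  transaction_value = c(4, 5, 4, 3, 1, 10, 5, 15, 10),
  stringsAsFactors = FALSE
)

# For every row, TRUE if any row of the same household is a "water" transaction
has_water <- ave(d$trans_type == "water", d$hh_id, FUN = any)

# Keep all rows of households with at least one "water" transaction
ds2 <- d[has_water, ]
```

Household hh3 has no "water" row, so both of its rows are dropped, matching the output of the %in% version.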