How to 'subset' a named vector in R?
How about this:
foo[c('a','b')]
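A small self-contained sketch of the same idea, using a hypothetical vector foo. It also shows that a name that is not present gives NA rather than an error, and that dropping elements by name needs a logical index, because negative indices only work with positions.

```r
# Hypothetical example vector
foo <- c(a = 1, b = 2, c = 3)

# Keep elements by name; the result follows the order of the requested names
kept <- foo[c('a', 'b')]
kept
#> a b
#> 1 2

# A name that is not present yields NA (with an NA name), not an error
missing_name <- foo["z"]

# foo[-c('a', 'b')] is an error, so drop by name with a logical index instead
dropped <- foo[!names(foo) %in% c('a', 'b')]
dropped
#> c
#> 3
```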
R: Subset vector by names
x <- c(0.1234, 9.345, 8.888, 5.345, 1.234)
names(x) <- c("GSM12", "GSM13", "GSM15", "GSM16", "GSM17")
y <- c("GSM12", "GSM15", "GSM16")
As @Gregor mentioned:
x[y]
#> GSM12 GSM15 GSM16
#> 0.1234 8.8880 5.3450
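Reusing the x from above: x[y] returns elements in the order of y and gives NA for any name in y that x does not have. If that is not wanted, a logical index built with %in% keeps only the names that exist, in the order of x. The vector y2 below is a hypothetical lookup vector that contains an unknown name.

```r
x <- c(0.1234, 9.345, 8.888, 5.345, 1.234)
names(x) <- c("GSM12", "GSM13", "GSM15", "GSM16", "GSM17")

# Hypothetical lookup vector: different order, plus a name x does not have
y2 <- c("GSM16", "GSM12", "GSM99")

# Character indexing follows the order of y2 and yields NA for "GSM99"
by_name <- x[y2]

# Logical indexing keeps only matching names, in the order of x
by_match <- x[names(x) %in% y2]
```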
Subset vector by value and corresponding name
You could combine names and values:
bar[!paste0(bar, names(bar)) %in% paste0(foo, names(foo))]
#b b e
#4 3 2
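A self-contained sketch of that idea with hypothetical foo and bar. One caveat with paste0() is that different pairs can collapse to the same key (value 1 with name "1a" and value 11 with name "a" both give "11a"), so this version joins name and value with a separator. The helper pair_key is hypothetical, and the "|" separator is an arbitrary choice that assumes "|" never occurs in the names.

```r
# Hypothetical example data
foo <- c(a = 1, b = 2)
bar <- c(a = 1, b = 4, b = 2, e = 2)

# Hypothetical helper: one key per element from its name and value;
# assumes "|" never appears inside a name
pair_key <- function(v) paste(names(v), v, sep = "|")

# Keep the elements of bar whose (name, value) pair does not occur in foo
res <- bar[!pair_key(bar) %in% pair_key(foo)]
res
#> b e
#> 4 2
```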
Simple and efficient way to subset a data frame using values and names in a vector
Personally, I wonder whether it is a good idea to use a named vector to subset a data frame, since it can only express equality (==), while comparisons such as greater than (>) and less than (<) cannot be expressed this way. I would recommend using a quoted expression instead of a named vector (see the approach below).
However, I figured out a tidyverse way to write a function with said functionality:
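library(tidyverse)
set.seed(123)
n <- 10
ds.df <- data.frame(col1 = round(rnorm(n, 2, 4), digits = 1),
                    col2 = sample.int(2, n, replace = TRUE),
                    col3 = sample.int(n * 10, n),
                    col4 = sample(letters, n, replace = TRUE))

# Turn each name/value pair into the expression `name == value` and
# splice all of them into filter(), which combines them with &.
# Note: c() coerces every value to character, so col1 is compared with
# the string "-0.2" here; passing a list would keep the numeric type.
new_filter <- function (data, expr) {
  exprs_ls <- purrr::imap(expr, ~ rlang::exprs(!! rlang::sym(.y) == !!.x))
  filter(data, !!! unname(unlist(exprs_ls)))
}
new_filter(ds.df, c(col1 = -0.2, col4 = "i"))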
#> col1 col2 col3 col4
#> 1 -0.2 1 9 i
Created on 2020-06-17 by the reprex package (v0.3.0)
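For comparison, the same equality-only filter can be sketched in base R. base_filter is a hypothetical name; it takes a named list rather than a named vector, so each value keeps its own type, and it uses which() so that rows where a comparison is NA are dropped, as dplyr's filter() does.

```r
# Hypothetical helper: keep rows where every named column equals its value
base_filter <- function(data, values) {
  # One logical vector per name/value pair
  checks <- Map(function(col, val) data[[col]] == val, names(values), values)
  # Combine them with & and keep only rows that are TRUE (NA rows are dropped)
  keep <- Reduce(`&`, checks)
  data[which(keep), , drop = FALSE]
}

# Hypothetical data
df <- data.frame(col1 = c(-0.2, 1.5, -0.2, NA),
                 col4 = c("i", "i", "k", "i"))
res <- base_filter(df, list(col1 = -0.2, col4 = "i"))
```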
Below is my alternative approach.
In base R you can use quote to quote the subset expression (instead of creating a vector) and then use eval to evaluate it inside subset.
n <- 10
ds.df <- data.frame(col1 = round(rnorm(n, 2, 4), digits = 1),
                    col2 = sample.int(2, n, replace = TRUE),
                    col3 = sample.int(n * 10, n),
                    col4 = sample(letters, n, replace = TRUE))
subset_v = quote(col1 > 2 & col3 > 40)
subset(ds.df, eval(subset_v))
#> col1 col2 col3 col4
#> 1 6.6 1 93 m
#> 2 7.0 2 62 j
#> 4 3.9 1 94 t
#> 7 4.5 1 46 r
#> 8 2.8 2 98 h
#> 10 4.9 1 78 p
Created on 2020-06-17 by the reprex package (v0.3.0)
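Because a quoted expression is an ordinary R call, conditions can also be built up programmatically. The sketch below, on a small hypothetical data frame, stores each condition separately, folds them into one call with Reduce() and call(), and evaluates the result with the data frame as the environment, which is what subset() does with its condition argument.

```r
# Hypothetical data
df <- data.frame(col1 = c(1, 3, 5), col3 = c(50, 30, 60))

# Each condition is kept as a separate quoted expression
conds <- list(quote(col1 > 2), quote(col3 > 40))

# Fold them into a single call: col1 > 2 & col3 > 40
combined <- Reduce(function(lhs, rhs) call("&", lhs, rhs), conds)

# Evaluate the call with the columns of df in scope and keep matching rows
res <- df[eval(combined, df), , drop = FALSE]
```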
The same approach, but using dplyr's filter:
library(dplyr)
n <- 10
ds.df <- data.frame(col1 = round(rnorm(n, 2, 4), digits = 1),
                    col2 = sample.int(2, n, replace = TRUE),
                    col3 = sample.int(n * 10, n),
                    col4 = sample(letters, n, replace = TRUE))
filter_v = expr(col1 > 2 & col3 > 40)
filter(ds.df, !! filter_v)
#> col1 col2 col3 col4
#> 1 3.3 1 70 a
#> 2 2.5 2 82 q
#> 3 3.6 1 51 z
Created on 2020-06-17 by the reprex package (v0.3.0)
R: How to slice a window of elements in named vector
You could do
x[which(names(x) == "b"):which(names(x) == "d")]
#> b c d
#> 36 67 25
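One way to keep the window well defined when names repeat is match(), which returns the position of the first occurrence of each name (or NA if the name is absent). With which(), a duplicated name yields several positions, which : is not designed to take. slice_names is a hypothetical helper, and x reuses the values shown in the outputs of this answer.

```r
x <- c(a = 54, b = 36, c = 67, d = 25, e = 76)

# Hypothetical helper: elements from the name `from` through the name `to`
slice_names <- function(x, from, to) {
  # match() gives the first position of each name, or NA if it is absent
  pos <- match(c(from, to), names(x))
  if (anyNA(pos)) stop("name not found in names(x)")
  x[pos[1]:pos[2]]
}

slice_names(x, "b", "d")

# With a duplicated name, the window starts at its first occurrence
x_dup <- c(a = 1, b = 2, b = 3, c = 4)
slice_names(x_dup, "b", "c")
```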
The catch is that names in a named vector are not guaranteed to be unique; if a name is duplicated, a window between two names is no longer well defined.
If you want a complete solution that allows tidyverse-style non-standard evaluation and gives sensible error messages, you could use:
subset_named <- function(data, exp)
{
  # No selector: return the whole vector
  if (missing(exp)) return(data)
  # Capture the selector unevaluated, plus the caller's name for `data`
  exp <- substitute(exp)
  data_name <- deparse(substitute(data))
  env <- parent.frame()
  # Plain numbers or strings are passed straight to `[`
  if (is.numeric(exp) || is.character(exp)) return(data[exp])
  tryCatch(
    # Selectors that evaluate cleanly (e.g. 1:3) are used as ordinary indices
    data[suppressWarnings(eval(exp, env))],
    error = function(e)
    {
      # Otherwise only a name range built with ':' (b:d or "b":"d") is supported
      if (!is.call(exp) || !identical(exp[[1]], as.name(":")))
        stop("`exp` must be a sequence created by ':'")
      n <- names(data)
      first <- as.character(exp[[2]])
      second <- as.character(exp[[3]])
      first_match <- which(n == first)
      second_match <- which(n == second)
      if (length(first_match) == 0)
        stop("\"", first, "\" not found in names(", data_name, ")")
      if (length(second_match) == 0)
        stop("\"", second, "\" not found in names(", data_name, ")")
      if (length(first_match) > 1) {
        warning("\"", first,
                "\" found more than once. Using first occurrence only")
        first_match <- first_match[1]
      }
      if (length(second_match) > 1) {
        warning("\"", second,
                "\" found more than once. Using first occurrence only")
        second_match <- second_match[1]
      }
      data[first_match:second_match]
    })
}
That allows the following behaviour:
subset_named(x, "b":"d")
#> b c d
#> 36 67 25
subset_named(x, b:d)
#> b c d
#> 36 67 25
subset_named(x, 1:3)
#> a b c
#> 54 36 67
subset_named(x, "e")
#> e
#> 76
subset_named(x)
#> a b c d e
#> 54 36 67 25 76
Logical comparison of elements from named list vs named vector in R
To access an element of a list by its name, you have to use double brackets:
means_list[["condition1"]] > means_list[["condition2"]]
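To see why, compare single and double brackets on a hypothetical means_list: single brackets return a sub-list, while double brackets return the element itself, here a plain number that can be compared directly. On a named atomic vector (the hypothetical means_vec), single brackets already return a named number, so either form works there.

```r
# Hypothetical data mirroring the question's structure
means_list <- list(condition1 = 5.2, condition2 = 3.1)
means_vec <- c(condition1 = 5.2, condition2 = 3.1)

# [ keeps the list wrapper; [[ extracts the numeric value
sub_list <- means_list["condition1"]
value <- means_list[["condition1"]]

# Comparing extracted values works for lists, vectors, and a mix of both
list_cmp <- means_list[["condition1"]] > means_list[["condition2"]]
vec_cmp <- means_vec["condition1"] > means_vec["condition2"]
mixed_cmp <- means_list[["condition1"]] > means_vec[["condition2"]]
```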
Replacement of column values based on a named vector
You could convert col to character and use it to index the named vector vec by name:
df$col1 <- vec[as.character(df$col)]
Or inside mutate:
library(dplyr)
df %>% mutate(col1 = vec[as.character(col)])
# col col1
# <int> <chr>
# 1 1 a
# 2 1 a
# 3 1 a
# 4 1 a
# 5 2 b
# 6 2 b
# 7 3 c
# 8 3 c
# 9 3 c
#10 3 c
#11 3 c
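The as.character() call matters: indexing with an integer column selects by position, not by name, and the two only coincide when the names of vec happen to be "1", "2", "3" in that order. A sketch with a hypothetical vec whose names are in a different order:

```r
# Hypothetical lookup vector whose names are not in positional order
vec <- c(`3` = "c", `1` = "a", `2` = "b")
col <- c(1L, 2L, 3L, 3L)

# Integer index: picks elements 1, 2, 3, 3 by position
by_position <- unname(vec[col])

# Character index: looks each code up by name
by_name <- unname(vec[as.character(col)])
```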
Subsetting a logical vector with a logical vector in R
[] is used to subset a vector. You can subset a vector using integer indices or logical values.
When you use a logical vector to subset a vector, an element is selected wherever the index is TRUE. In your example you are subsetting a logical vector with a logical vector, which might be confusing. Let's take another example:
a <- c(10, 20)
b <- c(TRUE, FALSE)
a[b]
#[1] 10
Since the first value is TRUE and the second is FALSE, the first value is selected.
Now if we invert the values, 20 is selected because !b returns FALSE TRUE.
a[!b]
#[1] 20
Now apply the same logic to your example:
a = c(FALSE, FALSE)
b <- a
Since b is a copy of a, !b (and likewise !a) returns TRUE TRUE, so both values are selected when you do b[!a], and none of the values is selected when you do b[a].
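A quick check of the two cases, plus one related rule: a logical index shorter than the vector is recycled along it. The last line uses a hypothetical four-element vector.

```r
a <- c(FALSE, FALSE)
b <- a

both <- b[!a]   # !a is TRUE TRUE, so both elements are kept
none <- b[a]    # a is FALSE FALSE, so nothing is kept

# A shorter logical index is recycled: TRUE, FALSE, TRUE, FALSE
recycled <- c(10, 20, 30, 40)[c(TRUE, FALSE)]
```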
Select Subset of Columns based on Vector R
Use %in%:
names.use <- names(df)[!(names(df) %in% f)]
Then names.use will contain the names of all the columns that are not contained in your vector of names f.
To subset your data frame using the columns you want, you can use the following:
df.subset <- df[, names.use, drop = FALSE]  # drop = FALSE keeps a data frame even if one column remains
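A self-contained sketch with a hypothetical df and f. It also shows setdiff() as a shorter way to get the remaining names (equivalent here because column names are unique), and why drop = FALSE is worth using: with a single remaining column, df[, name] returns a vector instead of a data frame.

```r
# Hypothetical data and exclusion vector
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)
f <- c("b")

# Names of the columns not listed in f
names.use <- names(df)[!(names(df) %in% f)]
same_names <- setdiff(names(df), f)   # same result when names are unique

df.subset <- df[, names.use, drop = FALSE]

# With only one column selected, drop = FALSE keeps the data frame structure
one_col <- df[, "a", drop = FALSE]
as_vector <- df[, "a"]
```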