How to 'Subset' a Named Vector in R

How to 'subset' a named vector in R?

How about this:

foo[c('a','b')]
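As a small sketch of what this does, assuming a hypothetical named vector foo (its values are made up here, since the question does not show them): indexing with a character vector returns the matching elements in the order requested, and a name that is not present gives NA rather than an error.

```r
# Hypothetical example data; the original question does not show foo.
foo <- c(a = 1, b = 2, c = 3)

# Select by name; the result follows the order of the index vector.
picked <- foo[c("b", "a")]

# A name that does not exist yields an NA element instead of an error.
with_missing <- foo[c("a", "z")]

# To keep only the names that are actually present, intersect first.
wanted <- c("a", "z")
safe <- foo[intersect(wanted, names(foo))]
```

Whether a silent NA or a filtered result is preferable depends on the use case, so the intersect() step is only one option.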

R: Subset vector by names

x <- c(0.1234,   9.345,   8.888,  5.345,  1.234)
names(x) <- c("GSM12", "GSM13", "GSM15", "GSM16", "GSM17")
y <- c("GSM12", "GSM15", "GSM16")

As @Gregor mentioned:

x[y]

#>  GSM12  GSM15  GSM16
#> 0.1234 8.8880 5.3450

Subset vector by value and corresponding name

You could combine names and values:

bar[!paste0(bar, names(bar)) %in% paste0(foo, names(foo))]

#b b e
#4 3 2
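The question's foo and bar are not shown above, so the following is only a sketch with hypothetical data chosen to give the same result. It also uses a separator when pasting value and name together, which is an addition to the answer: without one, two different value/name pairs could produce the same key.

```r
# Hypothetical data chosen to reproduce the output shown above.
foo <- c(a = 1, c = 5)
bar <- c(a = 1, b = 4, b = 3, c = 5, e = 2)

# Build one key per element from its value and name. The separator guards
# against collisions such as value 1 + name "1a" versus value 11 + name "a".
key <- function(v) paste(v, names(v), sep = "|")

# Keep the elements of bar whose value/name pair does not occur in foo.
res <- bar[!key(bar) %in% key(foo)]
res
#> b b e
#> 4 3 2
```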

Simple and efficient way to subset a data frame using values and names in a vector

Personally, I am not sure a named vector is a good way to subset a data frame: it can only express equality (==), not comparisons such as greater than or less than. I would recommend a quoted expression instead of a named vector (see the alternative approach below).

However, I figured out a tidyverse way to write a function with said functionality:

library(tidyverse)

set.seed(123)
n <- 10

ds.df <- data.frame(col1 = round(rnorm(n, 2, 4), digits = 1),
                    col2 = sample.int(2, n, replace = TRUE),
                    col3 = sample.int(n * 10, n),
                    col4 = sample(letters, n, replace = TRUE))

new_filter <- function (data, expr) {
  # Build one `column == value` expression per named element
  exprs_ls <- purrr::imap(expr, ~ rlang::exprs(!! rlang::sym(.y) == !!.x))
  # Splice the expressions into filter() as separate conditions
  filter(data, !!! unname(unlist(exprs_ls)))
}

new_filter(ds.df, c(col1 = -0.2, col4 = "i"))
#> col1 col2 col3 col4
#> 1 -0.2 1 9 i

Created on 2020-06-17 by the reprex package (v0.3.0)
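
For comparison, the same equality-only filter can be sketched in base R. The helper name filter_equal and the toy data are hypothetical. It takes a named list rather than a named vector, because c() would coerce mixed numeric and character values to a single type.

```r
# Hypothetical helper: keep rows where every named column equals its value.
filter_equal <- function(data, conditions) {
  keep <- rep(TRUE, nrow(data))
  for (col in names(conditions)) {
    # Combine the per-column equality tests with a logical AND.
    keep <- keep & data[[col]] == conditions[[col]]
  }
  data[keep, , drop = FALSE]
}

# Made-up data, not the ds.df from the answer.
toy <- data.frame(col1 = c(-0.2, 1.5, -0.2),
                  col4 = c("i", "i", "k"),
                  stringsAsFactors = FALSE)

res <- filter_equal(toy, list(col1 = -0.2, col4 = "i"))
```

This sketch does not handle NA values in the compared columns; a row with NA in a tested column would come back as an all-NA row.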
Below is my alternative approach.
In base R you can capture the subset expression with quote() (instead of creating a vector) and then evaluate it with eval() inside subset().

n <- 10

ds.df <- data.frame(col1 = round(rnorm(n, 2, 4), digits = 1),
                    col2 = sample.int(2, n, replace = TRUE),
                    col3 = sample.int(n * 10, n),
                    col4 = sample(letters, n, replace = TRUE))

subset_v = quote(col1 > 2 & col3 > 40)

subset(ds.df, eval(subset_v))
#> col1 col2 col3 col4
#> 1 6.6 1 93 m
#> 2 7.0 2 62 j
#> 4 3.9 1 94 t
#> 7 4.5 1 46 r
#> 8 2.8 2 98 h
#> 10 4.9 1 78 p
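
A quoted expression can also be evaluated directly against a data frame and used as a logical row index, which may be easier to reason about inside functions than subset(). This is a minimal sketch with made-up data; toy and cond are hypothetical names.

```r
# Made-up data, not the random ds.df from the answer.
toy <- data.frame(col1 = c(6.6, 1.0, 3.9),
                  col3 = c(93, 50, 12))

# Store the condition unevaluated.
cond <- quote(col1 > 2 & col3 > 40)

# eval() looks up col1 and col3 in the data frame first.
keep <- eval(cond, toy)

# Use the resulting logical vector as a row index.
res <- toy[keep, ]
```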



Same approach but using dplyr filter

library(dplyr)

n <- 10

ds.df <- data.frame(col1 = round(rnorm(n, 2, 4), digits = 1),
                    col2 = sample.int(2, n, replace = TRUE),
                    col3 = sample.int(n * 10, n),
                    col4 = sample(letters, n, replace = TRUE))

filter_v = expr(col1 > 2 & col3 > 40)

filter(ds.df, !! filter_v)

#> col1 col2 col3 col4
#> 1 3.3 1 70 a
#> 2 2.5 2 82 q
#> 3 3.6 1 51 z


R: How to slice a window of elements in named vector

You could do

x[which(names(x) == "b"):which(names(x) == "d")]
#> b c d
#> 36 67 25

The problem is that names in a named vector are not guaranteed to be unique, and if a boundary name is duplicated, the window between the two names is ambiguous.
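
One way to make the one-liner robust to duplicates is to use match(), which returns only the first position of a name. The sketch below assumes x has the values shown in the outputs in this answer; the helper name slice_between is hypothetical.

```r
# x reconstructed from the printed outputs in this answer.
x <- c(a = 54, b = 36, c = 67, d = 25, e = 76)

# Hypothetical helper: slice from the first `from` name to the first `to` name.
slice_between <- function(v, from, to) {
  i <- match(from, names(v))  # first occurrence, or NA if absent
  j <- match(to, names(v))
  if (is.na(i) || is.na(j)) stop("both names must be present in names(v)")
  v[i:j]
}

res <- slice_between(x, "b", "d")
res
#>  b  c  d
#> 36 67 25

# With a duplicated name, the first occurrence is used as the boundary.
y <- c(a = 1, b = 2, b = 3, c = 4)
dup <- slice_between(y, "b", "c")
```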

If you wanted a complete solution that allows for tidyverse-style non-standard evaluation and sensible error messages you could have

subset_named <- function(data, exp)
{
  if(missing(exp)) return(data)
  # Record the caller's name for `data` now, for use in error messages;
  # substitute(data) inside the handler below would only yield "data".
  data_name <- deparse(substitute(data))
  exp <- as.list(match.call())$exp
  if(is.numeric(exp)) return(data[exp])
  if(is.character(exp)) return(data[exp])

  tryCatch({
    # Ordinary index expressions such as 1:3 evaluate without error
    ss <- suppressWarnings(eval(exp))
    return(data[ss])},
    error = function(e)
    {
      # Otherwise interpret `exp` as first:second, given as names
      if(as.character(exp[[1]]) != ":")
        stop("`exp` must be a sequence created by ':'")
      n <- names(data)
      first <- as.character(exp[[2]])
      second <- as.character(exp[[3]])
      first_match <- which(n == first)
      second_match <- which(n == second)
      if(length(first_match) == 0)
        stop("\"", first, "\" not found in names(", data_name, ")")
      if(length(second_match) == 0)
        stop("\"", second, "\" not found in names(", data_name, ")")
      if(length(first_match) > 1) {
        warning("\"", first,
                "\" found more than once. Using first occurrence only")
        first_match <- first_match[1]
      }
      if(length(second_match) > 1) {
        warning("\"", second,
                "\" found more than once. Using first occurrence only")
        second_match <- second_match[1]
      }
      return(data[first_match:second_match])
    })
}

That allows the following behaviour:

subset_named(x, "b":"d")
#> b c d
#> 36 67 25

subset_named(x, b:d)
#> b c d
#> 36 67 25

subset_named(x, 1:3)
#> a b c
#> 54 36 67

subset_named(x, "e")
#> e
#> 76

subset_named(x)
#> a b c d e
#> 54 36 67 25 76

Logical comparison of elements from named list vs named vector in R

To access an element of a list by its name, you have to use double brackets:

means_list[["condition1"]] > means_list[["condition2"]]
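
To illustrate the difference, here is a minimal sketch with made-up values; means_vec is a hypothetical named vector added for the comparison. Single brackets on a list return a sub-list, while double brackets return the element itself, which can then be compared with an element of a named vector.

```r
# Made-up values for illustration.
means_list <- list(condition1 = 5.2, condition2 = 3.1)
means_vec <- c(condition1 = 5.2, condition2 = 3.1)

single <- means_list["condition1"]    # still a list of length 1
double <- means_list[["condition1"]]  # the number itself

# Compare two list elements, or a list element with a vector element.
list_cmp <- means_list[["condition1"]] > means_list[["condition2"]]
mixed_cmp <- means_list[["condition1"]] > means_vec[["condition2"]]
```

Using [[ on the named vector as well drops the name, so the comparison result is a plain TRUE or FALSE.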

Replacement of column values based on a named vector

You could index vec by the values of col, converted to character so that they are matched against the names of vec:

df$col1 <- vec[as.character(df$col)]

Or inside mutate():

library(dplyr)
df %>% mutate(col1 = vec[as.character(col)])
# col col1
# <int> <chr>
# 1 1 a
# 2 1 a
# 3 1 a
# 4 1 a
# 5 2 b
# 6 2 b
# 7 3 c
# 8 3 c
# 9 3 c
#10 3 c
#11 3 c
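
The data are not shown above, so the following base R sketch uses hypothetical df and vec values. It also shows why as.character() matters: a numeric index selects by position, not by name.

```r
# Hypothetical data: integer codes and a named lookup vector.
df <- data.frame(col = c(1L, 1L, 2L, 3L))
vec <- c("1" = "a", "2" = "b", "3" = "c")

# Look up by name; unname() keeps the new column free of names.
df$col1 <- unname(vec[as.character(df$col)])

# With codes that are not 1, 2, 3, ..., positional indexing goes wrong.
vec2 <- c("10" = "x", "20" = "y")
by_position <- unname(vec2[c(10, 20)])           # positions 10 and 20: NA
by_name <- unname(vec2[as.character(c(10, 20))]) # names "10" and "20"
```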

Subsetting a logical vector with a logical vector in R

[] is used for subsetting a vector. You can subset a vector using an integer index, logical values, or (for a named vector) names.

When you use a logical vector to subset a vector, an element is selected if the corresponding logical value is TRUE. In your example you are subsetting a logical vector with a logical vector, which can be confusing. Let's take another example:

a <- c(10, 20)
b <- c(TRUE, FALSE)
a[b]
#[1] 10

Since the first value of b is TRUE and the second is FALSE, only the first value of a is selected.

If we negate b, 20 is selected instead, because !b returns FALSE TRUE.

a[!b]
#[1] 20

Now apply the same logic to your example:

a = c(FALSE, FALSE)
b <- a

Since b is a copy of a, !a returns TRUE TRUE, hence both values are selected when you do b[!a], and none of the values is selected when you do b[a].
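
Run as code, the two cases look like this (the variable names all_selected and none_selected are only for illustration):

```r
a <- c(FALSE, FALSE)
b <- a

# !a is TRUE TRUE, so both elements of b are selected.
all_selected <- b[!a]
all_selected
#> [1] FALSE FALSE

# a is FALSE FALSE, so nothing is selected.
none_selected <- b[a]
none_selected
#> logical(0)
```

The selected values print as FALSE because those are the contents of b; the index only decides which positions are kept.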

Select Subset of Columns based on Vector R

Use %in%:

names.use <- names(df)[!(names(df) %in% f)]

Then names.use will contain the names of all the columns which are not contained in your vector of names f.

To subset your data frame using the columns you want, you can use the following:

df.subset <- df[, names.use]
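
As a small worked sketch with a hypothetical data frame and exclusion vector: when only one column remains, df[, names.use] returns a plain vector, so adding drop = FALSE may be worth considering if a data frame is always expected.

```r
# Hypothetical data frame and vector of column names to exclude.
df <- data.frame(x = 1:3, y = 4:6, z = 7:9)
f <- c("y", "z")

names.use <- names(df)[!(names(df) %in% f)]

# Only "x" is left, so the default result drops to a vector.
one_col <- df[, names.use]

# drop = FALSE keeps the result a data frame.
still_df <- df[, names.use, drop = FALSE]
```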

