Dplyr::Select One Column and Output as Vector

The best way to do it (IMO):

library(dplyr)
df <- tibble(x = 1:10, y = LETTERS[1:10])

df %>% 
  filter(x > 5) %>% 
  .$y

As of dplyr 0.7.0, you can use pull() instead:

df %>% filter(x > 5) %>% pull(y)
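To see why pull() matters here, a small sketch contrasting it with select() may help. It rebuilds the same df so that it runs on its own; the names as_tbl and as_vec are only illustrative.

```r
library(dplyr)

df <- tibble(x = 1:10, y = LETTERS[1:10])

# select() always returns a tibble, even for a single column
as_tbl <- df %>% filter(x > 5) %>% select(y)

# pull() returns the column itself as a plain vector
as_vec <- df %>% filter(x > 5) %>% pull(y)

is.data.frame(as_tbl)  # TRUE
is.character(as_vec)   # TRUE
as_vec                 # "F" "G" "H" "I" "J"
```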

Extract a dplyr tbl column as a vector

With dplyr >= 0.7.0, you can use pull() to get a vector from a tbl.

library(dplyr, warn.conflicts = FALSE)
# src_sqlite() is deprecated; connect through DBI instead (needs DBI, RSQLite, dbplyr)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
iris2 <- copy_to(con, iris)
vec <- pull(iris2, Species)
head(vec)
#> [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"
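pull() can also pick the column in a few different ways. The sketch below uses a local tibble rather than a database table, with made-up data (scores, id, value), and assumes dplyr >= 1.0.0 for the name argument.

```r
library(dplyr)

scores <- tibble(id = c("a", "b", "c"), value = c(10, 20, 30))

by_name <- pull(scores, value)             # column chosen by name
by_pos  <- pull(scores, 1)                 # first column, counted from the left
by_last <- pull(scores)                    # with no column given, the last one
named   <- pull(scores, value, name = id)  # values named by another column

named
```

The named form is handy when the vector will later be used as a lookup table.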

dplyr r : selecting columns whose names are in an external vector

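We can use any_of() inside select(). It keeps the columns named in col_names and silently skips any names that are not present in the data: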

library(dplyr)
data %>%
     select(any_of(col_names))

Output

   a b
1  1 e
2  4 e
3 13 f
4  8 m
5 10 z
6  3 y
...
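The original data and col_names are not shown, so here is a minimal sketch with invented values that illustrates how any_of() differs from its strict counterpart all_of() when a name is missing.

```r
library(dplyr)

# Hypothetical stand-ins for the question's `data` and `col_names`
data <- tibble(a = c(1, 4, 13), b = c("e", "e", "f"), c = 1:3)
col_names <- c("a", "b", "not_a_column")

# any_of() ignores names that do not exist in the data
picked <- data %>% select(any_of(col_names))
names(picked)  # "a" "b"

# all_of() raises an error for the same input; capture it instead of stopping
strict <- tryCatch(
  data %>% select(all_of(col_names)),
  error = function(e) "error"
)
strict  # "error"
```

Which one to use depends on whether a missing column should be treated as a bug (all_of) or as expected (any_of).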

Convert data.frame column to a vector?

I'm going to attempt to explain this without making any mistakes, but I'm betting this will attract a clarification or two in the comments.

A data frame is a list. When you subset a data frame with [ and the name of a column, what you get back is a sublist (that is, a one-column data frame). If you want the actual atomic column, use [[, or, somewhat confusingly (to me), aframe[,2], which for a base data.frame returns a vector, not a sublist. Note that a tibble behaves differently: [,2] on a tibble still returns a tibble.

So try running this sequence and maybe things will be clearer:

avector <- as.vector(aframe['a2'])
class(avector) 

avector <- aframe[['a2']]
class(avector)

avector <- aframe[,2]
class(avector)
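Since aframe is not defined in the answer, here is a self-contained sketch with a made-up two-column data frame. It shows the class each subsetting form returns, plus drop = FALSE, which keeps the data frame structure with [, ].

```r
# Hypothetical example data; only base R is needed
aframe <- data.frame(a1 = c(1.5, 2.5, 3.5),
                     a2 = c("x", "y", "z"),
                     stringsAsFactors = FALSE)

sub_df   <- aframe["a2"]                # single bracket: one-column data frame
col_vec  <- aframe[["a2"]]              # double bracket: the column itself
col_vec2 <- aframe[, 2]                 # matrix-style index: also a vector
kept_df  <- aframe[, 2, drop = FALSE]   # drop = FALSE keeps the data frame

class(sub_df)    # "data.frame"
class(col_vec)   # "character"
class(col_vec2)  # "character"
class(kept_df)   # "data.frame"
```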

Dplyr syntax selecting columns and converting them to a single list

The unlist() function is probably what you are looking for.

Quoting from the built-in documentation for ?unlist: "Given a list structure x, unlist simplifies it to produce a vector which contains all the atomic components which occur in x."

Since R data frames (and tibbles) are implemented as lists of column vectors with equal lengths, unlist() flattens a data frame into a single vector, coercing the columns to a common type if they differ.

Subset for the desired rows and columns with filter and select, then pipe the result through unlist() and then unique(). The result will be a vector with the distinct elements.

library(dplyr)

# The example data
tibble(a = c("Gene_1", "Gene_2", "Gene_1"),
       b = c("Gene_2", "Gene_3", "Gene_4"),
       c = c("X", "R", "X")) %>%
    
    # Subset data for desired feature
    filter(c == "X") %>%
    
    # Select identifier columns
    select(a, b) %>%
    
    # convert to a vector
    unlist() %>%
    
    # derive unique elements
    unique()

Result

[1] "Gene_1" "Gene_2" "Gene_4"
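Two details of unlist() are worth knowing when using it this way: it builds element names from the column names, and it coerces mixed column types to one common type. A small base R sketch with invented data (genes, mixed) shows both.

```r
genes <- data.frame(a = c("Gene_1", "Gene_2"),
                    b = c("Gene_2", "Gene_3"),
                    stringsAsFactors = FALSE)

# Columns are concatenated in order; names come from column name + row index
flat <- unlist(genes)
names(flat)  # "a1" "a2" "b1" "b2"

# use.names = FALSE returns the bare values
bare <- unlist(genes, use.names = FALSE)
bare  # "Gene_1" "Gene_2" "Gene_2" "Gene_3"

# Mixed column types are coerced to a common type (here: character)
mixed <- unlist(data.frame(n = 1:2, s = c("x", "y"), stringsAsFactors = FALSE))
class(mixed)  # "character"
```

unique() drops the names anyway, so the pipeline above is unaffected by them.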

Can I get dplyr output as value, and not as data frame?

a. length() returns the number of elements in an object. When the object is a data.frame (or tibble) it will return the number of columns. The L after the number means it is an integer.

b. mean() requires a numeric vector to work. In b you are passing a tibble to the function.

c. dplyr functions are meant to receive tibbles for input and produce tibbles for output. You can pull() a column of your tibble so it becomes a vector.

C <- flights %>% filter(carrier == "AA") %>% pull(hour) %>% mean()
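The three points above can be checked on a tiny stand-in for flights. The tibble flights_small and its values are invented for illustration, so the block does not need the nycflights13 package.

```r
library(dplyr)

# Hypothetical miniature of the flights data
flights_small <- tibble(carrier = c("AA", "AA", "UA"), hour = c(6, 8, 9))

# select() keeps a tibble, so length() counts columns, not rows
hour_tbl <- flights_small %>% filter(carrier == "AA") %>% select(hour)
length(hour_tbl)  # 1

# pull() gives the underlying numeric vector, which mean() can handle
hour_vec <- flights_small %>% filter(carrier == "AA") %>% pull(hour)
length(hour_vec)  # 2
mean(hour_vec)    # 7
```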

Use vector of columns in custom dplyr function

You don't necessarily need the function, as you can just mutate across the columns and get sums for each category.

library(tidyverse)

dat %>%
  group_by(category) %>%
  mutate(across(ends_with("take"), .fns = list(count = ~sum(. == "yes"))))

Or, if you have a long list of columns, store the names in a character vector and wrap it in all_of() inside across():

vars <- c("intake", "outtake", "pretake")

dat %>%
  group_by(category) %>%
  mutate(across(all_of(vars), .fns = list(count = ~sum(. == "yes"))))

Output

   category intake outtake pretake intake_count outtake_count pretake_count
   <chr>    <fct>  <fct>   <fct>          <int>         <int>         <int>
 1 a        no     yes     no                 0             2             0
 2 b        no     yes     yes                0             1             2
 3 c        no     yes     no                 1             1             0
 4 d        no     yes     yes                1             1             2
 5 e        no     yes     no                 1             1             0
 6 f        no     yes     yes                1             1             2
 7 g        no     yes     no                 1             1             0
 8 h        no     yes     yes                1             1             2
 9 i        no     yes     no                 1             1             0
10 j        no     yes     yes                1             1             2
11 a        no     yes     no                 0             2             0
12 b        no     no      yes                0             1             2
13 c        yes    no      no                 1             1             0
14 d        yes    no      yes                1             1             2
15 e        yes    no      no                 1             1             0
16 f        yes    no      yes                1             1             2
17 g        yes    no      no                 1             1             0
18 h        yes    no      yes                1             1             2
19 i        yes    no      no                 1             1             0
20 j        yes    no      yes                1             1             2
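If you do want a function that takes the column vector as an argument, a minimal sketch could look like the following. The function name count_yes and the example data are assumptions, not part of the question, and it uses summarise() to return one row per category rather than mutate().

```r
library(dplyr)

# Hypothetical helper: count "yes" values per category for the given columns
count_yes <- function(data, cols) {
  data %>%
    group_by(category) %>%
    summarise(
      across(all_of(cols), ~ sum(.x == "yes"), .names = "{.col}_count"),
      .groups = "drop"
    )
}

# Invented example data
dat <- tibble(
  category = c("a", "a", "b"),
  intake   = c("yes", "no", "yes"),
  outtake  = c("yes", "yes", "no")
)

counts <- count_yes(dat, c("intake", "outtake"))
counts
```

Passing the names through all_of() makes it explicit that cols is an external character vector and not a column of the data.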

How to select quoted columns with a vector as input using dplyr

What's wrong with select? Did you try it? Do you have a different version of dplyr than mine (0.7.2)?

> dat %>% select(pp)
# A tibble: 6 x 2
  `YY_164.XXX-ad` `YY_165.XXX-ad`
            <dbl>           <dbl>
1           0.004           0.022
2           0.000           0.001
3           0.000           0.000
4           0.001           0.001
5           0.001           0.002
6           0.001           0.000
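With current versions of dplyr, passing a bare external vector to select() is discouraged in favor of all_of(). The sketch below assumes pp is a character vector of the non-syntactic column names; the data values are invented.

```r
library(dplyr)

# Hypothetical data with non-syntactic column names (backticks needed here)
dat <- tibble(
  `YY_164.XXX-ad` = c(0.004, 0.000),
  `YY_165.XXX-ad` = c(0.022, 0.001),
  other           = c(1, 2)
)

# The names as plain strings; no backticks or extra quoting required
pp <- c("YY_164.XXX-ad", "YY_165.XXX-ad")

res <- dat %>% select(all_of(pp))
names(res)
```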

R using dplyr::select() in a list-column workflow

Both are list columns. We can extract the data with [[ and unlist the column names inside select:

dplyr::select(df_list_col$data[[1]], unlist(df_list_col$cols))

Or another option with !!!

select(df_list_col$data[[1]], !!! df_list_col$cols)

Or using the tidyverse syntax

library(dplyr)
library(purrr)
df_list_col %>% 
         mutate(subset = map2(data, cols, ~ .x %>% select(all_of(.y))))

Output

# A tibble: 1 x 3
#  data              cols      subset           
#  <list>            <list>    <list>           
#1 <tibble [10 × 4]> <chr [2]> <tibble [10 × 2]>

Or with pmap

# pick() replaces cur_data(), which is deprecated as of dplyr 1.1.0
df_list_col %>%
     mutate(subset = pmap(pick(data, cols), ~ select(..1, all_of(..2))))
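The answer does not show how df_list_col was built, so here is a self-contained sketch of the map2() approach under the assumption that data holds tibbles and cols holds character vectors of column names. The example values are invented.

```r
library(dplyr)
library(purrr)

# Hypothetical one-row tibble with two list columns
df_list_col <- tibble(
  data = list(tibble(a = 1:3, b = 4:6, c = 7:9)),
  cols = list(c("a", "c"))
)

# For each row, select the columns named in `cols` from the tibble in `data`
out <- df_list_col %>%
  mutate(subset = map2(data, cols, ~ select(.x, all_of(.y))))

names(out$subset[[1]])  # "a" "c"
```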
