Dplyr::Select One Column and Output as Vector

The best way to do it (IMO):

library(dplyr)
# data_frame() is deprecated; tibble() is the current constructor
df <- tibble(x = 1:10, y = LETTERS[1:10])

df %>%
  filter(x > 5) %>%
  .$y

As of dplyr 0.7.0, you can also use pull():

df %>% filter(x > 5) %>% pull(y)
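
As a quick check, the sketch below rebuilds the same df and compares the approaches. The names by_dollar, by_pull, and by_bracket are only illustrative, and the [[ variant is added here for comparison rather than taken from the answer.

```r
library(dplyr)

df <- tibble(x = 1:10, y = LETTERS[1:10])

# Three ways to get column y as a plain vector after filtering
by_dollar  <- df %>% filter(x > 5) %>% .$y
by_pull    <- df %>% filter(x > 5) %>% pull(y)
by_bracket <- (df %>% filter(x > 5))[["y"]]

identical(by_dollar, by_pull)    # TRUE
identical(by_pull, by_bracket)   # TRUE

# pull() also accepts a position; -1 means the last column (y here)
by_position <- df %>% filter(x > 5) %>% pull(-1)
```

pull() tends to read best at the end of a pipeline, since it needs no dot placeholder.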

Extract a dplyr tbl column as a vector

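With dplyr >= 0.7.0, you can use pull() to get a vector from a tbl, including a tbl backed by a database:

library(dplyr, warn.conflicts = FALSE)
# src_sqlite() is deprecated in recent dplyr releases; connect through DBI instead
# (needs the DBI, RSQLite, and dbplyr packages installed)
db <- DBI::dbConnect(RSQLite::SQLite(), tempfile())
iris2 <- copy_to(db, iris)
vec <- pull(iris2, Species)
head(vec)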
#> [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"

dplyr r : selecting columns whose names are in an external vector

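We can pass the vector to any_of() inside select(). any_of() ignores names that are not present in the data, whereas all_of() raises an error for them:

library(dplyr)
data %>%
  select(any_of(col_names))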

Output

   a b
1  1 e
2  4 e
3 13 f
4  8 m
5 10 z
6  3 y
...
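
The question's data and col_names are not shown, so here is a minimal sketch with made-up stand-ins that illustrates the difference between any_of() and all_of() when the vector contains a name that does not exist in the data.

```r
library(dplyr)

# Hypothetical data; the original `data` is not shown
data <- tibble(a = c(1, 4, 13), b = c("e", "e", "f"), c = c(TRUE, FALSE, TRUE))

# External vector of names; "zz" does not exist in `data`
col_names <- c("a", "b", "zz")

# any_of() silently skips the missing name
picked <- data %>% select(any_of(col_names))
names(picked)

# all_of() is strict: the missing "zz" raises an error, caught here
strict <- tryCatch(
  data %>% select(all_of(col_names)),
  error = function(e) "error"
)
strict
```

Which one to use depends on whether a missing column should be treated as a mistake (all_of) or as something to tolerate (any_of).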

Convert data.frame column to a vector?

I'm going to attempt to explain this without making any mistakes, but I'm betting this will attract a clarification or two in the comments.

A data frame is a list. When you subset a data frame with [ and the name of a column, what you get is a sublist (that is, a smaller data frame). If you want the actual atomic column, use [[ or, somewhat confusingly (to me), aframe[,2], which returns a vector, not a sublist. That last behaviour applies to base data frames; a tibble returns a one-column tibble from aframe[,2].

So try running this sequence and maybe things will be clearer:

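# [ with a name keeps the list structure, and as.vector() does not flatten it
avector <- as.vector(aframe['a2'])
class(avector)

# [[ extracts the column itself
avector <- aframe[['a2']]
class(avector)

# matrix-style indexing also drops to a vector (base data frames only)
avector <- aframe[,2]
class(avector)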
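
Since aframe is not defined in the answer, the following sketch uses a small made-up data frame to run the same comparison. Depending on the R version, as.vector() on a data frame returns either the data frame itself or a plain list, so the sketch only checks that the result is not atomic.

```r
# Hypothetical data frame standing in for `aframe`
aframe <- data.frame(a1 = c(10, 20, 30), a2 = c(1.5, 2.5, 3.5))

sub_list  <- aframe["a2"]             # single brackets: a one-column data frame
coerced   <- as.vector(aframe["a2"])  # still list-like, not an atomic vector
by_double <- aframe[["a2"]]           # double brackets: the numeric column itself
by_matrix <- aframe[, 2]              # matrix-style indexing: also a numeric vector
kept      <- aframe[, 2, drop = FALSE]  # drop = FALSE keeps the data frame

is.data.frame(sub_list)   # TRUE
is.atomic(coerced)        # FALSE
class(by_double)          # "numeric"
class(by_matrix)          # "numeric"
is.data.frame(kept)       # TRUE
```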

Dplyr syntax selecting columns and converting them to a single list

The unlist() function is probably what you are looking for.

Quoting from the built in documentation for ?unlist: "Given a list structure x, unlist simplifies it to produce a vector which contains all the atomic components which occur in x."

Since R data frames (and tibbles) are implemented as lists of column vectors with equal lengths, the unlist function will effectively convert a data frame into a vector.

Subset for the desired rows and columns with filter and select, then pipe the result through unlist() and then unique(). The result will be a vector with the distinct elements.

library(dplyr)

# The example data
tibble(a = c("Gene_1", "Gene_2", "Gene_1"),
       b = c("Gene_2", "Gene_3", "Gene_4"),
       c = c("X", "R", "X")) %>%

  # Subset data for desired feature
  filter(c == "X") %>%

  # Select identifier columns
  select(a, b) %>%

  # Convert to a vector
  unlist() %>%

  # Derive unique elements
  unique()

Result

[1] "Gene_1" "Gene_2" "Gene_4"
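
One detail worth seeing is what unlist() does before unique() runs: it stacks the columns one after another and builds names from the column name plus a position. The sketch below reuses the example data (stored under the illustrative name genes) and also shows use.names = FALSE, a base R argument that skips the names.

```r
library(dplyr)

genes <- tibble(a = c("Gene_1", "Gene_2", "Gene_1"),
                b = c("Gene_2", "Gene_3", "Gene_4"),
                c = c("X", "R", "X"))

# Columns are stacked in order: all of a, then all of b
stacked <- genes %>%
  filter(c == "X") %>%
  select(a, b) %>%
  unlist()

names(stacked)   # "a1" "a2" "b1" "b2"

# use.names = FALSE returns the same values without the generated names
plain <- genes %>%
  filter(c == "X") %>%
  select(a, b) %>%
  unlist(use.names = FALSE)

distinct_genes <- unique(plain)
distinct_genes   # "Gene_1" "Gene_2" "Gene_4"
```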

Can I get dplyr output as value, and not as data frame?

a. length() returns the number of elements in an object. When the object is a data.frame (or tibble), that is the number of columns. The L after the number means it is an integer.

b. mean() requires a numeric vector to work. In b you are passing a tibble to the function.

c. dplyr verbs are designed to take a tibble as input and return a tibble as output. You can pull() a column of your tibble so it becomes a vector.

C <- flights %>% filter(carrier == "AA") %>% pull(hour) %>% mean()
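
To make points a to c concrete without needing the flights data, here is a sketch that substitutes the built-in mtcars dataset (an assumption made purely so the example is self-contained). The names one_col, mpg_vec, and avg are illustrative.

```r
library(dplyr)

# mtcars stands in for flights so the sketch needs no extra package
one_col <- as_tibble(mtcars) %>% filter(cyl == 4) %>% select(mpg)

length(one_col)          # 1: the number of columns, not rows
is.data.frame(one_col)   # TRUE: still a tibble, which mean() cannot average

# pull() turns the column into a plain numeric vector
mpg_vec <- one_col %>% pull(mpg)
length(mpg_vec)          # 11: one element per remaining row
avg <- mean(mpg_vec)

# summarise() followed by pull() is an alternative that gives the same number
avg2 <- as_tibble(mtcars) %>%
  filter(cyl == 4) %>%
  summarise(avg = mean(mpg)) %>%
  pull(avg)
```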

Use vector of columns in custom dplyr function

You don't necessarily need the function, as you can just mutate across the columns and get sums for each category.

library(tidyverse)

dat %>%
  group_by(category) %>%
  mutate(across(ends_with("take"), .fns = list(count = ~sum(. == "yes"))))

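Or if you have a long list, store the names in a vector and wrap it in all_of() inside across(). Passing the bare vector still works in older versions, but recent versions of tidyselect warn that it is deprecated:

vars <- c("intake", "outtake", "pretake")

dat %>%
  group_by(category) %>%
  mutate(across(all_of(vars), .fns = list(count = ~sum(. == "yes"))))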

Output

   category intake outtake pretake intake_count outtake_count pretake_count
   <chr>    <fct>  <fct>   <fct>          <int>         <int>         <int>
 1 a        no     yes     no                 0             2             0
 2 b        no     yes     yes                0             1             2
 3 c        no     yes     no                 1             1             0
 4 d        no     yes     yes                1             1             2
 5 e        no     yes     no                 1             1             0
 6 f        no     yes     yes                1             1             2
 7 g        no     yes     no                 1             1             0
 8 h        no     yes     yes                1             1             2
 9 i        no     yes     no                 1             1             0
10 j        no     yes     yes                1             1             2
11 a        no     yes     no                 0             2             0
12 b        no     no      yes                0             1             2
13 c        yes    no      no                 1             1             0
14 d        yes    no      yes                1             1             2
15 e        yes    no      no                 1             1             0
16 f        yes    no      yes                1             1             2
17 g        yes    no      no                 1             1             0
18 h        yes    no      yes                1             1             2
19 i        yes    no      no                 1             1             0
20 j        yes    no      yes                1             1             2
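
Because dat itself is not shown, the following sketch uses a tiny made-up dataset to illustrate the same pattern with all_of(). It uses summarise() rather than mutate() (a deliberate variation) so that each category collapses to one row of counts, and the .names argument of across() to build the _count suffix.

```r
library(dplyr)

# Hypothetical data; the original `dat` is not shown
dat <- tibble(
  category = c("a", "a", "b", "b"),
  intake   = c("yes", "no", "no", "no"),
  outtake  = c("yes", "yes", "no", "yes")
)

vars <- c("intake", "outtake")

# One row per category, one count column per name in `vars`
counts <- dat %>%
  group_by(category) %>%
  summarise(across(all_of(vars), ~ sum(.x == "yes"), .names = "{.col}_count"))

counts
```

With mutate() the counts are repeated on every row of the group, which is what the output above shows; summarise() is the more compact choice when only the totals are needed.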

How to select quoted columns with a vector as input using dplyr

What's wrong with select()? Did you try it? Perhaps you have a different version of dplyr than mine (0.7.2):

> dat %>% select(pp)
# A tibble: 6 x 2
  `YY_164.XXX-ad` `YY_165.XXX-ad`
            <dbl>           <dbl>
1           0.004           0.022
2           0.000           0.001
3           0.000           0.000
4           0.001           0.001
5           0.001           0.002
6           0.001           0.000
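
In current dplyr, the same selection is usually written with all_of(), which makes it explicit that pp is an external character vector. The sketch below uses made-up data with non-syntactic column names like those in the output above; the third column, other, is hypothetical.

```r
library(dplyr)

# Hypothetical data; backticks are needed only when creating the columns
dat <- tibble(
  `YY_164.XXX-ad` = c(0.004, 0.000),
  `YY_165.XXX-ad` = c(0.022, 0.001),
  other           = c("u", "v")
)

# Inside a character vector the names are ordinary strings, no backticks needed
pp <- c("YY_164.XXX-ad", "YY_165.XXX-ad")

picked <- dat %>% select(all_of(pp))
names(picked)
```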

R using dplyr::select() in a list-column workflow

Both are list columns. We can pass the column names to select() either by unlisting them or by extracting the element with [[:

dplyr::select(df_list_col$data[[1]], unlist(df_list_col$cols))

Or another option with !!!

select(df_list_col$data[[1]], !!! df_list_col$cols)

Or using the tidyverse syntax

library(dplyr)
library(purrr)
df_list_col %>%
  mutate(subset = map2(data, cols, ~ .x %>% select(all_of(.y))))

-output

# A tibble: 1 x 3
# data cols subset
# <list> <list> <list>
#1 <tibble [10 × 4]> <chr [2]> <tibble [10 × 2]>

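Or with pmap (note that cur_data() is deprecated as of dplyr 1.1.0, where pick(everything()) is the replacement):

df_list_col %>%
  mutate(subset = pmap(cur_data(), ~ select(..1, all_of(..2))))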
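
The answer does not show how df_list_col was built, so here is a self-contained sketch of the map2() variant under an assumed structure: one row, a data list column holding a 10 × 4 tibble (taken from iris for convenience), and a cols list column holding two column names.

```r
library(dplyr)
library(purrr)

# Hypothetical one-row tibble with two list columns
df_list_col <- tibble(
  data = list(as_tibble(iris[1:10, 1:4])),
  cols = list(c("Sepal.Length", "Petal.Width"))
)

# For each row, select from `data` the columns named in `cols`
result <- df_list_col %>%
  mutate(subset = map2(data, cols, ~ select(.x, all_of(.y))))

names(result$subset[[1]])
dim(result$subset[[1]])
```

map2() pairs the two list columns element by element, so the same code should carry over to tibbles with more than one row, each with its own set of column names.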

