dplyr::select one column and output as vector
The best way to do it (IMO):
library(dplyr)
df <- tibble(x = 1:10, y = LETTERS[1:10])  # data_frame() is deprecated in favour of tibble()
df %>%
filter(x > 5) %>%
.$y
Since dplyr 0.7.0, you can use pull() instead:
df %>% filter(x > 5) %>% pull(y)
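A minimal sketch comparing the approaches above, assuming dplyr >= 1.0. pull() also accepts a column position (a negative number counts from the right), and base R's [[ gives the same atomic vector without pull().

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = 1:10, y = LETTERS[1:10])

# magrittr dot placeholder, as in the first answer
dot_version <- df %>% filter(x > 5) %>% .$y

# pull() by column name
by_name <- df %>% filter(x > 5) %>% pull(y)

# pull() by position: -1 is the last column
by_position <- df %>% filter(x > 5) %>% pull(-1)

# Base R equivalent: subset the rows, then [[ extracts the atomic column
base_version <- df[df$x > 5, ][["y"]]

by_name  # "F" "G" "H" "I" "J"
```

All four return the same character vector; pull() has the advantage of also working on remote (database) tables.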
Extract a dplyr tbl column as a vector
With dplyr >= 0.7.0, you can use pull() to get a vector from a tbl.
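library(dplyr, warn.conflicts = FALSE)
# src_sqlite() is deprecated; open the connection with DBI instead
# (requires the DBI, RSQLite and dbplyr packages)
con <- DBI::dbConnect(RSQLite::SQLite(), tempfile())
iris2 <- copy_to(con, iris)
vec <- pull(iris2, Species)
head(vec)
#> [1] "setosa" "setosa" "setosa" "setosa" "setosa" "setosa"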
dplyr r : selecting columns whose names are in an external vector
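We could use any_of() with select(). any_of() keeps the columns whose names appear in col_names and silently skips any names that don't exist in data: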
library(dplyr)
data %>%
select(any_of(col_names))
Output:
a b
1 1 e
2 4 e
3 13 f
4 8 m
5 10 z
6 3 y
...
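To make the difference between any_of() and its strict sibling all_of() concrete, here is a small sketch. The data and col_names below are hypothetical stand-ins for the question's objects; "zz" is deliberately missing from the data.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data; col_names contains one name ("zz") that is not a column
data <- tibble(a = c(1, 4, 13), b = c("e", "e", "f"), c = c(TRUE, FALSE, TRUE))
col_names <- c("a", "b", "zz")

# any_of() keeps the columns that exist and silently skips "zz"
kept <- data %>% select(any_of(col_names))

# all_of() is strict: a name that is not a column raises an error
strict <- tryCatch(
  data %>% select(all_of(col_names)),
  error = function(e) "error: column not found"
)

names(kept)  # "a" "b"
```

Use all_of() when a missing column should stop the script, and any_of() when the name vector may legitimately contain extras.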
Convert data.frame column to a vector?
I'm going to attempt to explain this without making any mistakes, but I'm betting this will attract a clarification or two in the comments.
A data frame is a list. When you subset a data frame using the name of a column and [, what you're getting is a sublist (or a sub data frame). If you want the actual atomic column, you could use [[, or, somewhat confusingly (to me), you could use aframe[,2], which returns a vector, not a sublist.
So try running this sequence and maybe things will be clearer:
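# Hypothetical example data; the question's aframe was not shown
aframe <- data.frame(a1 = 1:3, a2 = c(2.5, 3.5, 4.5))
avector <- as.vector(aframe['a2'])
class(avector)   # still list-like, not an atomic vector
avector <- aframe[['a2']]
class(avector)   # "numeric"
avector <- aframe[,2]
class(avector)   # "numeric"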
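In a dplyr workflow this distinction matters more, because tibbles never drop to a vector with [. A short sketch, assuming the tibble package is installed and using a hypothetical aframe:

```r
library(tibble)

# Hypothetical example data
aframe <- data.frame(a1 = 1:3, a2 = c(2.5, 3.5, 4.5))
atibble <- as_tibble(aframe)

# data.frame: selecting a single column with [, 2] drops to a vector
df_col <- aframe[, 2]

# tibble: [, 2] never drops, so the result is a one-column tibble
tb_col <- atibble[, 2]

# [[ returns the atomic column for both
tb_vec <- atibble[[2]]

# In base R you can opt out of dropping explicitly
kept_df <- aframe[, 2, drop = FALSE]
```

So [[ (or pull() in a pipe) is the reliable way to get a vector regardless of whether the object is a data.frame or a tibble.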
Dplyr syntax selecting columns and converting them to a single list
The unlist() function is probably what you are looking for.
Quoting from the built-in documentation for ?unlist: "Given a list structure x, unlist simplifies it to produce a vector which contains all the atomic components which occur in x."
Since R data frames (and tibbles) are implemented as lists of column vectors of equal length, unlist() effectively converts a data frame into a single vector, coercing the columns to a common type if they differ.
Subset for the desired rows and columns with filter() and select(), then pipe the result through unlist() and then unique(). The result will be a vector with the distinct elements.
library(dplyr)
# The example data
tibble(a = c("Gene_1", "Gene_2", "Gene_1"),
b = c("Gene_2", "Gene_3", "Gene_4"),
c = c("X", "R", "X")) %>%
# Subset data for desired feature
filter(c == "X") %>%
# Select identifier columns
select(a, b) %>%
# convert to a vector
unlist() %>%
# derive unique elements
unique()
Result
[1] "Gene_1" "Gene_2" "Gene_4"
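Two side effects of unlist() on a data frame are worth knowing: mixed column types are coerced to one common type, and the result carries generated names (column name plus row index). A base R sketch with hypothetical data:

```r
# Hypothetical mixed-type data frame
mixed <- data.frame(id = c(1, 2), gene = c("Gene_1", "Gene_2"),
                    stringsAsFactors = FALSE)

# The numeric column is coerced to character; names are built as column + row index
flat <- unlist(mixed)

# use.names = FALSE drops the generated names
flat_unnamed <- unlist(mixed, use.names = FALSE)

names(flat)  # "id1" "id2" "gene1" "gene2"
```

In the gene example above, both selected columns are already character, so no coercion happens, and unique() discards the generated names.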
Can I get dplyr output as value, and not as data frame?
a. length() returns the number of elements in an object. When the object is a data.frame (or tibble), it returns the number of columns. The L after the number means it is an integer.
b. mean() requires a numeric (or logical) vector to work. In b you are passing a tibble to the function, so mean() warns "argument is not numeric or logical: returning NA" and returns NA.
c. dplyr functions are meant to receive tibbles as input and produce tibbles as output. You can pull() a column of your tibble so it becomes a vector:
C <- flights %>% filter(carrier == "AA") %>% pull(hour) %>% mean()
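The three points above can be checked on a small hypothetical tibble (not the flights data). The sketch also shows a second route: compute the mean inside summarise() and pull() the single value out.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data with one missing value
tb <- tibble(hour = c(5, 6, 7, NA))

length(tb)  # 1: the number of columns, not rows
nrow(tb)    # 4: the number of rows

# mean(tb) would warn and return NA because a tibble is not a numeric vector
m <- tb %>% pull(hour) %>% mean(na.rm = TRUE)

# Alternative: stay in dplyr with summarise(), then pull the single value
m2 <- tb %>% summarise(avg = mean(hour, na.rm = TRUE)) %>% pull(avg)
```

Both m and m2 are plain numeric values; na.rm = TRUE is only needed if the column contains missing values.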
Use vector of columns in custom dplyr function
You don't necessarily need the function, as you can just use mutate() with across() over the columns and get sums for each category.
library(tidyverse)
dat %>%
group_by(category) %>%
mutate(across(ends_with("take"), .fns = list(count = ~sum(. == "yes"))))
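Or, if you have a long list of columns, you can store their names in vars and pass it to across(), wrapped in all_of():
vars <- c("intake", "outtake", "pretake")
# all_of() avoids tidyselect's deprecation warning for bare external vectors
dat %>%
group_by(category) %>%
mutate(across(all_of(vars), .fns = list(count = ~sum(. == "yes"))))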
Output
category intake outtake pretake intake_count outtake_count pretake_count
<chr> <fct> <fct> <fct> <int> <int> <int>
1 a no yes no 0 2 0
2 b no yes yes 0 1 2
3 c no yes no 1 1 0
4 d no yes yes 1 1 2
5 e no yes no 1 1 0
6 f no yes yes 1 1 2
7 g no yes no 1 1 0
8 h no yes yes 1 1 2
9 i no yes no 1 1 0
10 j no yes yes 1 1 2
11 a no yes no 0 2 0
12 b no no yes 0 1 2
13 c yes no no 1 1 0
14 d yes no yes 1 1 2
15 e yes no no 1 1 0
16 f yes no yes 1 1 2
17 g yes no no 1 1 0
18 h yes no yes 1 1 2
19 i yes no no 1 1 0
20 j yes no yes 1 1 2
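Because mutate() keeps every input row, each count is repeated for every row in its category. If you only want one row per category, summarise() with the same across() call is an option. A sketch on a small hypothetical dat:

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical small version of the question's data
dat <- tibble(
  category = c("a", "b", "a", "b"),
  intake   = c("no", "yes", "yes", "yes"),
  outtake  = c("yes", "no", "yes", "no")
)
vars <- c("intake", "outtake")

# summarise() returns one row per category; .names controls the output column names
counts <- dat %>%
  group_by(category) %>%
  summarise(across(all_of(vars), ~ sum(.x == "yes"), .names = "{.col}_count"))
```

Here counts has the columns category, intake_count and outtake_count, with a single row for each of "a" and "b".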
How to select quoted columns with a vector as input using dplyr
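What's wrong with select()? Did you try it? Or do you have a different version of dplyr than mine (0.7.2)? It works for me. In current dplyr, prefer select(all_of(pp)), because passing an external vector directly now triggers a deprecation warning: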
> dat %>% select(pp)
# A tibble: 6 x 2
`YY_164.XXX-ad` `YY_165.XXX-ad`
<dbl> <dbl>
1 0.004 0.022
2 0.000 0.001
3 0.000 0.000
4 0.001 0.001
5 0.001 0.002
6 0.001 0.000
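With a recent dplyr, the same selection written with all_of() looks like this. The data below are hypothetical, but they use the same non-syntactic column names as the output above; the names work as plain strings without backticks.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data with non-syntactic column names
dat <- tibble(
  `YY_164.XXX-ad` = c(0.004, 0),
  `YY_165.XXX-ad` = c(0.022, 0.001),
  other = c(1, 2)
)
pp <- c("YY_164.XXX-ad", "YY_165.XXX-ad")

# all_of() marks pp as an external character vector of column names
sel <- dat %>% select(all_of(pp))
```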
R using dplyr::select() in a list-column workflow
Both are list columns. We can extract them by unlisting, or by extracting with [[, inside select():
dplyr::select(df_list_col$data[[1]], unlist(df_list_col$cols))
Another option is to splice the names with !!!:
select(df_list_col$data[[1]], !!! df_list_col$cols)
Or using tidyverse syntax:
library(dplyr)
library(purrr)
df_list_col %>%
mutate(subset = map2(data, cols, ~ .x %>% select(all_of(.y))))
Output:
# A tibble: 1 x 3
# data cols subset
# <list> <list> <list>
#1 <tibble [10 × 4]> <chr [2]> <tibble [10 × 2]>
Or with pmap():
# list(data, cols) replaces cur_data(), which is deprecated since dplyr 1.1.0
df_list_col %>%
mutate(subset = pmap(list(data, cols), ~ select(..1, all_of(..2))))
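The variants above all need a df_list_col object, which the question did not show. A sketch that builds a hypothetical one with the same shape as the printed output (one row, a nested 10 x 4 tibble, and a character vector of two column names), then checks that the approaches agree:

```r
library(dplyr, warn.conflicts = FALSE)
library(purrr)

# Hypothetical list-column tibble: a nested data frame plus the columns to keep
df_list_col <- tibble(
  data = list(as_tibble(mtcars[1:10, 1:4])),
  cols = list(c("mpg", "hp"))
)

# Direct extraction with unlist() and with !!! splicing
v1 <- select(df_list_col$data[[1]], unlist(df_list_col$cols))
v2 <- select(df_list_col$data[[1]], !!! df_list_col$cols)

# Row-wise version with map2(): one subset per row of df_list_col
out <- df_list_col %>%
  mutate(subset = map2(data, cols, ~ select(.x, all_of(.y))))
```

The map2() form scales to many rows, where each nested data frame can have its own set of columns to keep.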