dplyr::select() with Some Variables That May Not Exist in the Data Frame

dplyr::select() with some variables that may not exist in the data frame?

Another option is select_if(), passing it a logical vector that flags the wanted names:

d2 %>% select_if(names(.) %in% c('taxon', 'model', 'z'))

# # A tibble: 1 x 2
#   taxon     z
#   <dbl> <dbl>
# 1     2     3

select_if() is now superseded. Use select() with any_of() instead:

d2 %>% select(any_of(c('taxon', 'model', 'z')))
# # A tibble: 1 x 2
#   taxon     z
#   <dbl> <dbl>
# 1     2     3

Type ?dplyr::select in R and you will find this:

These helpers select variables from a character vector:
all_of(): Matches variable names in a character vector. All names must
be present, otherwise an out-of-bounds error is thrown.
any_of(): Same as all_of(), except that no error is thrown for names
that don't exist.
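A minimal sketch of that difference. The question's d2 is not shown, so the one-row tibble below is a hypothetical stand-in built to match the output above (it has no model column):

```r
library(dplyr)

# Hypothetical stand-in for d2: there is no `model` column
d2 <- tibble(taxon = 2, z = 3)
wanted <- c("taxon", "model", "z")

# any_of(): silently skips names that are not in the data
res <- d2 %>% select(any_of(wanted))
names(res)
#> [1] "taxon" "z"

# all_of(): the same request errors because `model` does not exist
err <- tryCatch(
  d2 %>% select(all_of(wanted)),
  error = function(e) e
)
inherits(err, "error")
#> [1] TRUE
```

Use all_of() when a missing column should be treated as a bug, and any_of() when missing columns are expected.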

How do I select columns that may or may not exist?

In the development version of dplyr at the time of writing (and in current releases),

df %>%
   select(year, contains("boo"))
#     year
#1  2000
#2  2001
#3  2002
#4  2003
#5  2004
#6  2005
#7  2006
#8  2007
#9  2008
#10 2009
#11 2010

gives the expected output: contains("boo") matches no column, so it simply adds nothing to the selection.
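A self-contained sketch of that behavior, assuming a hypothetical df with a year column and no column whose name contains "boo" (the question's data is not shown):

```r
library(dplyr)

# Hypothetical data: no column name contains "boo"
df <- data.frame(year = 2000:2010, value = 1:11)

# contains() that matches nothing contributes zero columns, without error
out <- df %>% select(year, contains("boo"))
names(out)
#> [1] "year"
```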

Otherwise, one option would be to use one_of():

df %>%
   select(one_of("year", "boo"))

It returns a warning for any column that is not available. Note that one_of() is now superseded by any_of() and all_of().
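A minimal sketch of the warning, using a hypothetical df without a boo column; suppressWarnings() is shown only as one way to keep the result while silencing it:

```r
library(dplyr)

df <- data.frame(year = 2000:2002, value = 1:3)  # hypothetical; no `boo` column

# one_of() keeps the columns it finds but warns about the missing ones
w <- tryCatch(
  df %>% select(one_of("year", "boo")),
  warning = function(w) w
)
inherits(w, "warning")
#> [1] TRUE

# Keep the result and silence the warning
out <- suppressWarnings(df %>% select(one_of("year", "boo")))
names(out)
#> [1] "year"
```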

Another option is matches(), which takes a regular expression:

df %>%
  select(matches("year|boo"))
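Because matches() uses a regular expression, "year|boo" also picks up any column whose name merely contains "year" or "boo". A small sketch with hypothetical column names shows how anchoring the pattern restricts it to exact names:

```r
library(dplyr)

# Hypothetical data: `year_start` also contains the substring "year"
df <- data.frame(year = 2000:2002, year_start = 1:3, value = 4:6)

# Unanchored pattern: any name containing "year" or "boo"
loose <- names(select(df, matches("year|boo")))
loose
#> [1] "year"       "year_start"

# Anchored pattern: only names that are exactly "year" or "boo"
exact <- names(select(df, matches("^(year|boo)$")))
exact
#> [1] "year"
```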

dplyr::select() to reorder columns which may not exist

We can use intersect() to keep only the names in col_order that exist in the data, in the order given by col_order:

library(dplyr)
tibby %>%
     select(intersect(col_order, names(.)))
# A tibble: 10 x 3
#         a      b      d
#     <dbl>  <dbl>  <dbl>
# 1 -0.0449 0.935  -0.626
# 2 -0.0162 0.212   0.184
# 3  0.944  0.652  -0.836
# 4  0.821  0.126   1.60 
# 5  0.594  0.267   0.330
# 6  0.919  0.386  -0.820
# 7  0.782  0.0134  0.487
# 8  0.0746 0.382   0.738
# 9 -1.99   0.870   0.576
#10  0.620  0.340  -0.305
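A reproducible sketch with hypothetical inputs (tibby and col_order are not shown in the question). It also shows that any_of() gives the same result in one step, since it selects columns in the order of the supplied vector:

```r
library(dplyr)

# Hypothetical inputs: tibby lacks column "c" and its columns are out of order
tibby <- tibble(d = 1:3, b = 4:6, a = 7:9)
col_order <- c("a", "b", "c", "d")

# intersect() keeps col_order's order and drops names missing from tibby
v1 <- tibby %>% select(all_of(intersect(col_order, names(tibby))))

# any_of() does the same in one step
v2 <- tibby %>% select(any_of(col_order))

names(v1)
#> [1] "a" "b" "d"
identical(v1, v2)
#> [1] TRUE
```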

dplyr: select all variables except for those contained in vector

select(df, -any_of(excluded_vars))
is now the safest way to do this: the code will not break if excluded_vars contains a name that doesn't exist in df.
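A minimal sketch with hypothetical data, where one of the excluded names is not a column:

```r
library(dplyr)

df <- tibble(x = 1, y = 2, z = 3)          # hypothetical data
excluded_vars <- c("y", "not_a_column")    # one name does not exist in df

# -any_of() drops the names that exist and ignores the rest
kept <- select(df, -any_of(excluded_vars))
names(kept)
#> [1] "x" "z"

# -all_of(excluded_vars) would instead fail on "not_a_column"
```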

dplyr/R: selecting columns whose names are in an external vector

We could use any_of() with select():

library(dplyr)
data %>%
     select(any_of(col_names))

Output:

 a b
1  1 e
2  4 e
3 13 f
4  8 m
5 10 z
6  3 y
...

dplyr::select - Including All Other Columns at End of New Data Frame (or Beginning or Middle)

Update: using dplyr::relocate()

Selected columns **at the beginning**:

flights %>%  
  relocate(carrier, tailnum, year, month, day)

Selected columns **at the end**:

flights %>%  
  relocate(carrier, tailnum, year, month, day, .after = last_col())
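The same two calls on a small hypothetical tibble, so the resulting column order is easy to check:

```r
library(dplyr)

df <- tibble(a = 1, b = 2, c = 3, d = 4)  # hypothetical data

# Move d and b to the front; the other columns keep their relative order
front <- names(relocate(df, d, b))
front
#> [1] "d" "b" "a" "c"

# Move d and b to the end instead
back <- names(relocate(df, d, b, .after = last_col()))
back
#> [1] "a" "c" "d" "b"
```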

Old answer

If you want to **reorder the columns**:

All other columns **at the end**:

select(flights, carrier, tailnum, year, month, day, everything())

Or, in two steps, using one_of() (as in one_of("x", "y", "z")) to select variables provided in a character vector:

col <- c("carrier", "tailnum", "year", "month", "day")
select(flights, one_of(col), everything())

All other columns **at the beginning**:

select(flights, -one_of(col), one_of(col))
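Since one_of() is superseded, here is a sketch of the same two patterns with any_of(), which also tolerates names that are missing from the data. The tibble and col vector are hypothetical. When the first selection is negative, select() starts from all columns, removes the selected ones, and then appends them:

```r
library(dplyr)

df <- tibble(a = 1, b = 2, c = 3, d = 4)  # hypothetical data
col <- c("d", "b", "zzz")                 # "zzz" is not a column

# Selected columns first, everything else after
front <- select(df, any_of(col), everything())

# Everything else first, selected columns last
back <- select(df, -any_of(col), any_of(col))

names(front)
#> [1] "d" "b" "a" "c"
names(back)
#> [1] "a" "c" "d" "b"
```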

If you want to add the whole data frame again using dplyr (so the selected columns appear twice):

Whole data frame at the end:

bind_cols(select(flights, one_of(col)), flights)

Whole data frame at the beginning:

bind_cols(flights, select(flights, one_of(col)))
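One thing to be aware of: in dplyr 1.0 and later, bind_cols() repairs duplicated column names by default and prints a message listing the new names, so the repeated columns get suffixed names rather than identical ones. A small sketch on hypothetical data (suppressMessages() only hides that message):

```r
library(dplyr)

df <- tibble(a = 1, b = 2)   # hypothetical data
col <- "a"

# Selected column first, then the whole data frame again
both <- suppressMessages(bind_cols(select(df, all_of(col)), df))

ncol(both)
#> [1] 3
# The duplicated `a` has been renamed, so no names repeat
anyDuplicated(names(both))
#> [1] 0
```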

dplyr::select Object not found in self-made function

There are two issues with your function. The first error arises because calendario is not a column of the df passed to the function: it is only created later inside the pipe by mutate(), so df$calendario does not exist. Simply remove the df$ when specifying the aesthetics. Second, even after removing the df$, you set the y aesthetic equal to the string stored in dato, i.e. "indice_covid" in your example. That gives the same value "indice_covid" for every date, which is why you get a flat line. To tell ggplot2 that you want the column named by dato, convert the string to a symbol with sym() and unquote it with the bang-bang operator !!, i.e. !!sym(dato). Try this:

library(ggplot2)
library(dplyr)

plot_by_reg <- function(df, reg, dato) {

  df %>% 
    dplyr::filter(denominazione_regione == reg) %>%
    dplyr::mutate(calendario = format(as.Date(paste(mese,giorno , sep = "-" )  , format = "%m-%d" ), "%m-%d")) %>%
    dplyr::select(c(denominazione_regione, calendario, all_of(dato))) %>%
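    # Original (broken): ggplot(aes(x = df$calendario, y = df$dato))
    # sym() (from rlang, re-exported by dplyr) turns the string into a column reference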
    ggplot(aes(x = calendario, y = !!sym(dato))) +
    geom_line(aes(group = 1)) +
    theme_dark()
}

plot_by_reg(df = data.moving, reg = "Toscana", dato = "indice_covid")
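An alternative to !!sym(dato) is the .data pronoun, which aes() supports in ggplot2 3.0.0 and later: .data[[dato]] looks the column up by its name. The sketch below uses a hypothetical toy data set in place of data.moving (only the column names come from the question) and a hypothetical function name plot_by_reg2:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for data.moving
toy <- tibble(
  denominazione_regione = rep(c("Toscana", "Lazio"), each = 3),
  calendario = rep(c("05-01", "05-02", "05-03"), times = 2),
  indice_covid = c(1.2, 1.1, 0.9, 1.4, 1.3, 1.0)
)

plot_by_reg2 <- function(df, reg, dato) {
  df %>%
    filter(denominazione_regione == reg) %>%
    # all_of() selects the column named by the string in `dato`
    select(denominazione_regione, calendario, all_of(dato)) %>%
    # .data[[dato]] refers to that column by name inside aes()
    ggplot(aes(x = calendario, y = .data[[dato]], group = 1)) +
    geom_line()
}

p <- plot_by_reg2(toy, "Toscana", "indice_covid")
```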


Created on 2020-05-25 by the reprex package (v0.3.0)

How can I select only the dummy variable columns?

You can pass a function (or an rlang-style ~ lambda) to select_if() and keep the columns whose values are all 0 or 1:

tribble(
    ~id, ~gender, ~height, ~smoking,
    1, 1, 170, 0,
    2, 0, 150, 0,
    3, 1, 160, 1
  ) %>%
  select_if(~ all(. %in% 0:1))
# # A tibble: 3 x 2
#   gender smoking
#    <dbl>   <dbl>
# 1      1       0
# 2      0       0
# 3      1       1

If a dummy-variable column may contain NA, use %in% c(0:1, NA) in the predicate instead, because NA %in% 0:1 is FALSE and such a column would otherwise be dropped.
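Since select_if() is superseded, here is a sketch of the same idea with select() and where(), on a variant of the toy data above in which one smoking value is missing (that NA is a hypothetical change for illustration):

```r
library(dplyr)

# Same toy data as above, except one smoking value is missing
dat <- tribble(
  ~id, ~gender, ~height, ~smoking,
  1,   1,       170,     0,
  2,   0,       150,     NA,
  3,   1,       160,     1
)

# Allowing NA in the predicate keeps 0/1 columns that contain missing values
dummies <- dat %>% select(where(function(x) all(x %in% c(0:1, NA))))
names(dummies)
#> [1] "gender" "smoking"
```

Note that with this predicate a column that is entirely NA would also be selected.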
