dplyr::select() with Some Variables That May Not Exist in the Data Frame

dplyr::select() with some variables that may not exist in the data frame?

Another option is select_if:

d2 %>% select_if(names(.) %in% c('taxon', 'model', 'z'))

# # A tibble: 1 x 2
# taxon z
# <dbl> <dbl>
# 1 2 3

select_if is superseded. In dplyr 1.0.0 and later, use any_of inside select instead:

d2 %>% select(any_of(c('taxon', 'model', 'z')))
# # A tibble: 1 x 2
# taxon z
# <dbl> <dbl>
# 1 2 3

Type ?dplyr::select in R and you will find this:

These helpers select variables from a character vector:

all_of(): Matches variable names in a character vector. All names must
be present, otherwise an out-of-bounds error is thrown.

any_of(): Same as all_of(), except that no error is thrown for names
that don't exist.
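The difference between the two helpers can be seen in a minimal sketch. The data frame d2 and the vector wanted below are illustrative stand-ins (the original d2 is not shown in full), so treat the values as assumptions:

```r
library(dplyr)

# Hypothetical one-row data frame; note there is no 'model' column.
d2 <- tibble(taxon = 2, z = 3)
wanted <- c("taxon", "model", "z")

# any_of() keeps the names that exist and silently skips 'model'.
lenient <- d2 %>% select(any_of(wanted))

# all_of() is strict: the missing 'model' column raises an error,
# which is caught here so the script keeps running.
strict <- tryCatch(
  d2 %>% select(all_of(wanted)),
  error = function(e) "error"
)
```

Use any_of when a missing column is acceptable, and all_of when a missing column should stop the script.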

How do I select columns that may or may not exist?

In the development version of dplyr:

df %>%
select(year, contains("boo"))
# year
#1 2000
#2 2001
#3 2002
#4 2003
#5 2004
#6 2005
#7 2006
#8 2007
#9 2008
#10 2009
#11 2010

This gives the expected output.

Otherwise, one option is to use one_of:

df %>%
select(one_of("year", "boo"))

It returns a warning message (rather than an error) if a column is not available.

Another option is matches:

df %>%
select(matches("year|boo"))
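One caveat with matches is that it takes a regular expression, so an unanchored pattern also picks up columns that merely contain the text. The sketch below uses a made-up data frame with a hypothetical yearly column to show the difference; anchoring the pattern is one possible way to restrict it to exact names:

```r
library(dplyr)

# Hypothetical data: 'yearly' contains the substring "year".
df <- data.frame(year = 2000:2002, yearly = 1:3)

# Unanchored pattern: any name containing "year" or "boo" is selected.
loose <- df %>% select(matches("year|boo"))

# Anchored pattern: only names that are exactly "year" or "boo".
exact <- df %>% select(matches("^(year|boo)$"))
```

Here loose keeps both columns, while exact keeps only year. Neither call fails when boo is absent.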

dplyr::select() to reorder columns which may not exist

We can use intersect

library(dplyr)
tibby %>%
select(intersect(col_order, names(.)))
# A tibble: 10 x 3
# a b d
# <dbl> <dbl> <dbl>
# 1 -0.0449 0.935 -0.626
# 2 -0.0162 0.212 0.184
# 3 0.944 0.652 -0.836
# 4 0.821 0.126 1.60
# 5 0.594 0.267 0.330
# 6 0.919 0.386 -0.820
# 7 0.782 0.0134 0.487
# 8 0.0746 0.382 0.738
# 9 -1.99 0.870 0.576
#10 0.620 0.340 -0.305
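For comparison, here is a small sketch showing that any_of gives the same result as intersect for this task. The tibby and col_order values are invented for illustration, since the originals are not shown:

```r
library(dplyr)

# Hypothetical data: there is no column 'c'.
tibby <- tibble(a = 1:3, b = 4:6, d = 7:9)
col_order <- c("d", "c", "a", "b")

# intersect() drops the names that are not in the data before selecting.
via_intersect <- tibby %>% select(intersect(col_order, names(tibby)))

# any_of() skips the missing names itself and keeps the order of col_order.
via_any_of <- tibby %>% select(any_of(col_order))
```

Both calls return the columns in the order d, a, b.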

dplyr: select all variables except for those contained in vector

The safest way to do this now is with any_of, because the code will not break if excluded_vars includes a variable name that does not exist in df:

select(df, -any_of(excluded_vars))
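A minimal sketch of the exclusion, assuming a made-up df and an excluded_vars vector that deliberately includes a name that is not a column:

```r
library(dplyr)

# Hypothetical data and exclusion list; 'not_a_column' is not in df.
df <- data.frame(id = 1:3, x = c(0.1, 0.2, 0.3), y = c("a", "b", "c"))
excluded_vars <- c("x", "not_a_column")

# Minus form: drops 'x' and ignores the missing name.
kept <- select(df, -any_of(excluded_vars))

# The ! (complement) form is an equivalent spelling.
kept_bang <- select(df, !any_of(excluded_vars))
```

Both forms leave the columns id and y.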

dplyr r : selecting columns whose names are in an external vector

We could use any_of with select

library(dplyr)
data %>%
select(any_of(col_names))

-output

 a b
1 1 e
2 4 e
3 13 f
4 8 m
5 10 z
6 3 y
...

dplyr::select - Including All Other Columns at End of New Data Frame (or Beginning or Middle)

Update: using dplyr::relocate()

  • Selected columns **at the beginning**:

    flights %>%
      relocate(carrier, tailnum, year, month, day)

  • Selected columns **at the end**:

    flights %>%
      relocate(carrier, tailnum, year, month, day, .after = last_col())
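As a self-contained sketch that does not need the flights data, the tibble flights_mini below is a hypothetical one-row stand-in with the same column names. The last call assumes you also want to tolerate missing columns, which relocate allows through any_of:

```r
library(dplyr)

# Hypothetical one-row stand-in for flights.
flights_mini <- tibble(
  year = 2013, month = 1, day = 1,
  carrier = "UA", tailnum = "N14228"
)

# Move the selected columns to the front.
front <- flights_mini %>% relocate(carrier, tailnum)

# Move the selected columns after the last column.
back <- flights_mini %>% relocate(year, month, .after = last_col())

# any_of() skips names that are absent ('origin' here) without an error.
safe <- flights_mini %>% relocate(any_of(c("tailnum", "origin")))
```

The remaining columns keep their original relative order in each case.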

Old answer

If you want to **reorder the columns**:

  • All other columns **at the end**:

    select(flights, carrier, tailnum, year, month, day, everything())

    Or in two steps, using one_of() to select variables provided in a character vector:

    col <- c("carrier", "tailnum", "year", "month", "day")
    select(flights, one_of(col), everything())

  • All other columns **at the beginning**:

    select(flights, -one_of(col), one_of(col))

If you want to add the whole data frame again using dplyr:

  • Whole data frame at the end:

    bind_cols(select(flights, one_of(col)), flights)

  • Whole data frame at the beginning:

    bind_cols(flights, select(flights, one_of(col)))

dplyr::select Object not found in self-made function

There are two issues with your function. First, the error arises because calendario is not a column of the df passed to the function; simply remove the df$ when specifying the aesthetics. Second, even after removing the df$, you set the y aesthetic to the string stored in the variable dato, i.e. "indice_covid" in your example. Every date then has the same value "indice_covid", which is why you get a flat line. To tell ggplot2 that you want the column of df named by dato, convert the string to a symbol with sym() and unquote it with the bang-bang operator !!, i.e. !!sym(dato). Try this:

library(ggplot2)
library(dplyr)

plot_by_reg <- function(df, reg, dato) {
  df %>%
    dplyr::filter(denominazione_regione == reg) %>%
    # Build a month-day label from the mese and giorno columns
    dplyr::mutate(calendario = format(as.Date(paste(mese, giorno, sep = "-"), format = "%m-%d"), "%m-%d")) %>%
    # all_of() selects the column whose name is stored in the string dato
    dplyr::select(c(denominazione_regione, calendario, all_of(dato))) %>%
    # Original (broken) call: ggplot(aes(x = df$calendario, y = df$dato)) +
    ggplot(aes(x = calendario, y = !!sym(dato))) +
    geom_line(aes(group = 1)) +
    theme_dark()
}

plot_by_reg(df = data.moving, reg = "Toscana", dato = "indice_covid")

Created on 2020-05-25 by the reprex package (v0.3.0)
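An alternative to !!sym(dato) that is often used for string column names is the .data pronoun, which ggplot2 supports inside aes(). The sketch below is not the original answer's code: the toy data frame and the plot_col function are hypothetical, and only the column names denominazione_regione, giorno and indice_covid are borrowed from the example above.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for data.moving.
toy <- data.frame(
  denominazione_regione = c("Toscana", "Toscana", "Lazio"),
  giorno = c(1, 2, 1),
  indice_covid = c(0.5, 0.7, 0.9)
)

# dato is a string; .data[[dato]] looks up the column with that name.
plot_col <- function(df, reg, dato) {
  df %>%
    filter(denominazione_regione == reg) %>%
    ggplot(aes(x = giorno, y = .data[[dato]])) +
    geom_line()
}

p <- plot_col(toy, reg = "Toscana", dato = "indice_covid")
```

Printing p draws the line for the two Toscana rows. Either spelling should avoid the flat line caused by mapping y to a constant string.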

How can I select only the dummy variable columns?

You can pass a function (or a purrr-style lambda formula) to select_if and look for columns that contain only the values 0 and 1.

tribble(
  ~id, ~gender, ~height, ~smoking,
  1, 1, 170, 0,
  2, 0, 150, 0,
  3, 1, 160, 1
) %>%
  select_if(~ all(. %in% 0:1))
# # A tibble: 3 x 2
# gender smoking
# <dbl> <dbl>
# 1 1 0
# 2 0 0
# 3 1 1

If a dummy-variable column may contain NA, use %in% c(0:1, NA) in the predicate instead.
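Because select_if is superseded, the same predicate can also be written with select and where. This is a minimal sketch on invented data (the people tibble is hypothetical and adds one NA to show the NA-tolerant predicate):

```r
library(dplyr)

# Hypothetical data; gender has a missing value.
people <- tibble(
  id = c(1, 2, 3),
  gender = c(1, 0, NA),
  height = c(170, 150, 160),
  smoking = c(0, 0, 1)
)

# where() applies the predicate to each column and keeps those returning TRUE.
dummies <- people %>% select(where(~ all(.x %in% c(0, 1, NA))))
```

The result keeps gender and smoking. Note that a column consisting only of NA would also pass this predicate, so add a further check if that matters for your data.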


