Select Multiple Columns with Dplyr::Select() with Numbers as Names

Select multiple columns with dplyr::select() with numbers as names

Column names that start with a number, such as "1" and "8" in your data, are not syntactically valid names (see ?make.names). The 'Names and Identifiers' section of ?Quotes explains: "other [syntactically invalid] names can be used provided they are quoted. The preferred quote is the backtick".

Thus, wrap the invalid column names in backticks (`):

dd %>% dplyr::select(a:f, `1`:`8`)

#           a        a2         b        b2          f         1         4         8
# 1 0.2510023 0.4109819 0.6787226 0.4974859 0.01828614 0.7449878 0.1648462 0.5875638

Another option is the standard-evaluation (SE) version of select, select_(), which is deprecated in current dplyr:

dd %>% dplyr::select_(.dots = c("a", "a2", ..., "1", "4", "8"))
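In current dplyr, the usual replacement for select_() is to pass a character vector of names through all_of(). A minimal sketch, assuming a small made-up dd (the values are illustrative, not the question's data):

```r
library(dplyr)

# Hypothetical data frame with some numeric column names;
# check.names = FALSE stops data.frame() from renaming `1` to X1, etc.
dd <- data.frame(a = 0.25, a2 = 0.41, b = 0.68,
                 `1` = 0.74, `4` = 0.16, `8` = 0.59,
                 check.names = FALSE)

# all_of() matches the strings by name, so "1" means the column
# called `1`, not the first column
cols <- c("a", "a2", "1", "8")
out <- dd %>% select(all_of(cols))
names(out)
```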

Using a variable to select multiple columns in select (dplyr)

If possible, take two separate inputs from the user. You can then use match() to find the position of each column and build the sequence of column numbers between them.

For example, in base R with the mtcars dataset:

col1 <- 'mpg'
col2 <- 'am'
mtcars[match(col1, names(mtcars)) : match(col2, names(mtcars))]

#                     mpg cyl  disp  hp drat    wt  qsec vs am
#Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1
#Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1
#Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1
#Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0
#...

If you can't get separate inputs, you could split the combined input on ":":

col <- 'mpg:am'
col <- strsplit(col, ":")[[1]]

Now you can use col[1] and col[2] in place of col1 and col2 in the method above:

mtcars[match(col[1], names(mtcars)) : match(col[2], names(mtcars))]
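The same range can also be built inside dplyr::select(). A minimal sketch using rlang (installed with dplyr): sym() turns each string into a column symbol and !! injects it, so select() sees the ordinary range mpg:am. For the combined "mpg:am" form, parse_expr() parses the string as R code, so it is best kept to trusted input.

```r
library(dplyr)

col1 <- "mpg"
col2 <- "am"

# Inject the two symbols so that select() receives mpg:am
out1 <- mtcars %>% select(!!rlang::sym(col1):!!rlang::sym(col2))

# Parse a combined string such as "mpg:am" into the expression mpg:am
col <- "mpg:am"
out2 <- mtcars %>% select(!!rlang::parse_expr(col))

names(out1)
```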

Selecting with dplyr by variable name, some column names are numbers

Found out I can just use one_of:

select(df, one_of(vars))

# A 2015
# a    1
# b    1
# c    1
# d    1
# e    1

But probably @akrun is right that I should avoid numeric column names.
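In current tidyselect, one_of() is superseded by all_of(), which errors if a name is missing, and any_of(), which silently skips missing names. A sketch with a hypothetical df whose second column is named `2015`:

```r
library(dplyr)

# Hypothetical data; check.names = FALSE keeps the name `2015`
# instead of letting data.frame() rename it to X2015
df <- data.frame(A = letters[1:5], `2015` = 1, check.names = FALSE)
vars <- c("A", "2015")

# all_of() looks the strings up by name, so "2015" is the column
# called `2015`, not column position 2015
out <- df %>% select(all_of(vars))
out
```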

Select columns from a data frame that start with a number

Since column names that start with a number are converted to syntactically valid names prefixed with "X" (for example by read.csv() or data.frame() with the default check.names = TRUE), you could do:

library(tidyverse)
df %>%
  select(matches("^X[0-9]"))

which gives:

   X1..A X2..B X3..C X4..D
1                         
2      D     A           G
3                        G
4     NA                 G
5            A           G
6      D     A           G
7            A           G
8            A           G
9      D                  
10
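To see where the X comes from: data.frame() and read.csv() pass names through make.names() by default (check.names = TRUE), which prepends "X" to names starting with a digit and replaces invalid characters such as spaces with ".". A sketch with hypothetical headers "1. A" and "2. B":

```r
library(dplyr)

# Hypothetical headers that start with a digit and contain a space;
# data.frame() rewrites them with make.names()
df <- data.frame(`1. A` = c("D", ""), `2. B` = c("A", "A"),
                 Concatenate = c("DA", "A"))
names(df)  # c("X1..A", "X2..B", "Concatenate")

# The X-prefixed columns can then be picked with a regex
sel <- df %>% select(matches("^X[0-9]"))
```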

With the same logic you can do your counts:

df %>% 
  summarize(across(c(matches("^X[0-9]"), Concatenate), ~ sum(!is.na(.) & . != "" & . != "NA")))

which gives:

  X1..A X2..B X3..C X4..D Concatenate
1     3     5     0     7           8

I'm not sure, though, whether you want to exclude the "NAG" value in the Concatenate column.
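The three conditions in the counting lambda cover three different kinds of "empty" cell: a real NA, an empty string, and the literal text "NA". A small check on made-up data:

```r
library(dplyr)

# Hypothetical data mixing a real NA, an empty string and the text "NA"
df <- data.frame(X1..A = c("D", "", NA, "NA"),
                 Concatenate = c("D", "A", "", "G"))

# Count cells that are not NA, not "" and not the string "NA"
counts <- df %>%
  summarize(across(c(matches("^X[0-9]"), Concatenate),
                   ~ sum(!is.na(.x) & .x != "" & .x != "NA")))
counts
```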

dplyr select column when column name is number

Use backticks to select columns whose names are numbers:

data(ChickWeight)
library(dplyr)
library(tidyr)
chick <- ChickWeight %>% spread(Time,weight) %>% filter(Diet==2) %>% select(`0`)
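spread() still works but is superseded in tidyr by pivot_wider(). A sketch of the same selection with pivot_wider(), assuming tidyr >= 1.0.0; converting ChickWeight with as.data.frame() first is a precaution to drop its groupedData classes. The numeric column can be selected with backticks or, when the name is held in a string, with all_of():

```r
library(dplyr)
library(tidyr)

# One row per chick, one column per measurement time ("0", "2", ...)
wide <- as.data.frame(ChickWeight) %>%
  pivot_wider(names_from = Time, values_from = weight)

# Backticks for a literal numeric name
chick <- wide %>% filter(Diet == 2) %>% select(Chick, `0`)

# all_of() when the name comes from a string
chick2 <- wide %>% filter(Diet == 2) %>% select(Chick, all_of("0"))
```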

R dplyr how to select variables by column number rather than column name with summarise

Using the .data pronoun from rlang, you could write a custom function that takes a data frame, the names of two variables and some grouping variables, and computes your desired summary table like so:

library(dplyr)
library(Hmisc)

summary_table <- function(df, x, y, ...) {
  # Inside summarise(), .data is the rlang pronoun for the current group's data,
  # so .data[[x]] and .data[[y]] pick the columns named by the strings x and y
  df %>%
    group_by(...) %>%                                         # group by the supplied variables
    summarise(n = n(),                                        # number of records
              WtMn = wtd.mean(.data[[x]], .data[[y]]),        # weighted mean
              WtSd = sqrt(wtd.var(.data[[x]], .data[[y]])),   # weighted SD
              WtCV = WtMn/WtSd,                               # mean/SD: the inverse of the usual CV (SD/mean)
              Minm = min(.data[[x]]),                         # minimum
              Wp05 = wtd.quantile(.data[[x]], .data[[y]], 0.05),    # p05
              Wp50 = wtd.quantile(.data[[x]], .data[[y]], 0.50),    # p50
              Wp95 = wtd.quantile(.data[[x]], .data[[y]], 0.95),    # p95
              Wp975 = wtd.quantile(.data[[x]], .data[[y]], 0.975),  # p975
              Wp99 = wtd.quantile(.data[[x]], .data[[y]], 0.99),    # p99
              Maxm = max(.data[[x]])                          # maximum
    )
}

summary_table(iris, "Sepal.Length", "Petal.Width", Species)
#> # A tibble: 3 x 12
#>   Species        n  WtMn  WtSd  WtCV  Minm  Wp05  Wp50  Wp95 Wp975  Wp99  Maxm
#>   <fct>      <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa        50  5.05 0.356  14.2   4.3  4.61  5.06  5.62  5.70  5.72   5.8
#> 2 versicolor    50  5.98 0.508  11.8   4.9  5.13  6     6.80  6.97  7      7  
#> 3 virginica     50  6.61 0.626  10.6   4.9  5.8   6.5   7.7   7.7   7.9    7.9

summary_table(iris, "Sepal.Width", "Petal.Width", Species)
#> # A tibble: 3 x 12
#>   Species        n  WtMn  WtSd  WtCV  Minm  Wp05  Wp50  Wp95 Wp975  Wp99  Maxm
#>   <fct>      <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa        50  3.47 0.399  8.69   2.3  3.06  3.46  4.27  4.4    4.4   4.4
#> 2 versicolor    50  2.80 0.310  9.04   2    2.3   2.86  3.20  3.37   3.4   3.4
#> 3 virginica     50  3.00 0.320  9.38   2.2  2.5   3     3.6   3.8    3.8   3.8
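The function above still takes column names. To go from a column number to a column, one option is to look the name up with names(df)[i] and pass that string to .data[[ ]]. A minimal sketch with a hypothetical helper summarise_by_position(), using base mean() and sd() instead of the Hmisc weighted versions:

```r
library(dplyr)

# Hypothetical helper: summarise the column at position i, grouped by Species
summarise_by_position <- function(df, i) {
  col <- names(df)[i]  # translate the column number into its name
  df %>%
    group_by(Species) %>%
    summarise(n = n(),
              mean_x = mean(.data[[col]]),
              sd_x = sd(.data[[col]]),
              .groups = "drop")
}

res <- summarise_by_position(iris, 1)  # column 1 of iris is Sepal.Length
res
```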

Averaging selected columns whose names are numbers

You can do:

library(dplyr)

df %>% 
  type.convert(as.is = TRUE) %>%
  mutate(Column1 = rowMeans(select(., `1`:`2`), na.rm = TRUE), 
         Column2 = rowMeans(select(., `34`:`158`), na.rm = TRUE)) %>%
  select(A, Column1, Column2, Column3 = `190`)

#  A Column1 Column2 Column3
#1 A     1.9    3.40    22.1
#2 B     6.8    1.65     7.4
#3 C     4.7   23.85    56.0
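In dplyr 1.1.0 and later, pick() is the documented way to grab a set of columns inside mutate() as a data frame, which avoids the magrittr `.` placeholder. A sketch on hypothetical data with numeric column names, assuming dplyr >= 1.1.0:

```r
library(dplyr)  # pick() needs dplyr >= 1.1.0

# Hypothetical data with numeric column names
df <- data.frame(A = c("A", "B"), `1` = c(1, 2), `2` = c(3, NA), `3` = c(10, 20),
                 check.names = FALSE)

out <- df %>%
  # row-wise mean of columns `1` through `2`, ignoring NAs
  mutate(Column1 = rowMeans(pick(`1`:`2`), na.rm = TRUE)) %>%
  select(A, Column1, Column3 = `3`)
out
```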

select columns based on multiple strings with dplyr contains()

You can use matches() with a regular expression:

 mtcars %>%
        select(matches('m|ar')) %>%
        head(2)
 #              mpg am gear carb
 #Mazda RX4      21  1    4    4
 #Mazda RX4 Wag  21  1    4    4

According to the ?select documentation

‘matches(x, ignore.case = TRUE)’: selects all variables whose
name matches the regular expression ‘x’

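contains(), by contrast, matches its argument as a literal string rather than a regular expression, so 'm|ar' would look for that exact text. A single string works as expected: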

mtcars %>% 
       select(contains('m'))
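With the tidyselect versions used by recent dplyr, contains() also accepts a character vector and selects the union of the matches, still treating each string literally. A sketch on mtcars that should pick the same four columns as the matches() call:

```r
library(dplyr)

# Each string is matched literally; columns matching any of them are kept
out <- mtcars %>% select(contains(c("m", "ar")))
names(out)
```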

How to select columns if column names contain digits?

dplyr comes with a set of helper functions to match column names in select.

You want:

data %>% 
  select(matches("[[:digit:]]"))

The issue with the original attempt is that str_detect() returns a logical vector, while select() expects column names, positions, or selection helpers such as matches().
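If you already have a logical vector from str_detect() (or base grepl(), used here to avoid a stringr dependency), index names() with it and pass the result through all_of(). A sketch on a hypothetical data frame:

```r
library(dplyr)

# Hypothetical data: two of the three names contain a digit
data <- data.frame(a = 1, b2 = 2, `3c` = 3, check.names = FALSE)

# A logical vector over the column names ...
has_digit <- grepl("[[:digit:]]", names(data))
# ... turned back into names that select() accepts
out1 <- data %>% select(all_of(names(data)[has_digit]))

# Equivalent, using the selection helper directly
out2 <- data %>% select(matches("[[:digit:]]"))
```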
