Select multiple columns with dplyr::select() with numbers as names

Column names that start with a number, such as "1" and "8" in your data, are not syntactically valid names (see ?make.names). The 'Names and Identifiers' section of ?Quotes explains how to use them anyway: "other [syntactically invalid] names can be used provided they are quoted. The preferred quote is the backtick".

Thus, wrap the invalid column names in backticks (`):

dd %>% dplyr::select(a:f, `1`:`8`)

#           a        a2         b        b2          f         1         4         8
# 1 0.2510023 0.4109819 0.6787226 0.4974859 0.01828614 0.7449878 0.1648462 0.5875638

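Another option is the standard-evaluation version of select, select_(). Note that select_() has been deprecated since dplyr 0.7.0; in current dplyr, pass the character vector to all_of() inside select() instead.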

dd %>% dplyr::select_(.dots = c("a", "a2", ..., "1", "4", "8"))
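
As a minimal sketch of both approaches, assuming dplyr 1.0.0 or later (for all_of()): the data frame dd_demo and its values are made up for illustration, and check.names = FALSE is used so that the numeric names survive construction.

```r
library(dplyr)

# Hypothetical data; check.names = FALSE keeps "1", "4", "8" as column names
dd_demo <- data.frame(a = 1:2, f = 3:4, `1` = 5:6, `4` = 7:8, `8` = 9:10,
                      check.names = FALSE)

# Backticks let the numeric names take part in a range
by_range <- dd_demo %>% select(a:f, `1`:`8`)

# A character vector passed through all_of() needs no backticks
by_name <- dd_demo %>% select(all_of(c("a", "f", "1", "4", "8")))

# Without backticks, bare numbers are treated as column positions
by_position <- dd_demo %>% select(1:2)
```

The last call shows why the backticks matter: 1:2 picks the first two columns (a and f), not the columns named "1" and "2".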

Using a variable to select multiple columns in select (dplyr)

If possible, take two separate inputs from the user. You can then use match() to find their positions and generate the sequence of column indices between them.

For example, using base R with the mtcars dataset:

col1 <- 'mpg'
col2 <- 'am'
mtcars[match(col1, names(mtcars)) : match(col2, names(mtcars))]

#                 mpg cyl  disp  hp drat    wt  qsec vs am
# Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1
# Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1
# Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1
# Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0
# ...

If you can't take separate inputs, you could split the combined input on ":".

col <- 'mpg:am'
col <- strsplit(col, ":")[[1]]

Now you can use col[1] as col1 and col[2] as col2 in the method above:

mtcars[match(col[1], names(mtcars)) : match(col[2], names(mtcars))]
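
The split-and-match steps could be wrapped in a small helper. This is only a sketch in base R; select_range is a hypothetical name, and the error handling is an assumption about what a caller would want when an endpoint is missing.

```r
# Hypothetical helper: select a contiguous block of columns from a
# "first:last" string using base R only
select_range <- function(data, spec) {
  ends <- strsplit(spec, ":", fixed = TRUE)[[1]]
  idx <- match(ends, names(data))   # NA for any name that is not a column
  if (length(ends) != 2 || anyNA(idx)) {
    stop("`spec` must look like 'first:last' and name existing columns")
  }
  data[idx[1]:idx[2]]
}

result <- select_range(mtcars, "mpg:am")
names(result)
```

Checking for NA matters because match() returns NA for an unknown name, and NA:9 would otherwise fail with a less helpful message.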

Selecting with dplyr by variable name, some column names are numbers

Found out I can just use one_of:

select(df, one_of(vars))
A 2015
a 1
b 1
c 1
d 1
e 1

But probably @akrun is right that I should avoid numeric column names.
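
In current tidyselect, one_of() is superseded by all_of() and any_of(). A minimal sketch, assuming dplyr 1.0.0 or later; df_demo and its values are hypothetical, loosely modelled on the output above.

```r
library(dplyr)

# Hypothetical data with a numeric column name
df_demo <- data.frame(A = c("a", "b"), `2015` = c(1, 1), other = c(0, 0),
                      check.names = FALSE)
vars <- c("A", "2015")

# all_of() is strict: it errors if any requested name is missing
strict <- select(df_demo, all_of(vars))

# any_of() is lenient: the non-existent "2016" is silently skipped
lenient <- select(df_demo, any_of(c(vars, "2016")))
```

Because the names are passed as strings, no backticks are needed even though "2015" is not a syntactically valid name.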

Select columns from dataframe start with number

Column names that were supposed to start with a number are converted on import to names starting with X, so you could do:

library(tidyverse)
df %>%
  select(matches("^X[0-9]"))

which gives:

   X1..A X2..B X3..C X4..D
1
2 D A G
3 G
4 NA G
5 A G
6 D A G
7 A G
8 A G
9 D
10

With the same logic you can do your counts:

df %>%
  summarize(across(c(matches("^X[0-9]"), Concatenate),
                   ~ sum(!is.na(.) & . != "" & . != "NA")))

which gives:

  X1..A X2..B X3..C X4..D Concatenate
1     3     5     0     7           8

Although I'm not sure if you want to exclude the "NAG" value in the Concatenate column.
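
The X prefix comes from make.names(), which data.frame() and read.csv() apply by default. The sketch below uses base R only; the header text is hypothetical, chosen to resemble the column names above, and shows that check.names = FALSE keeps the original names if you prefer to match on the leading digit directly.

```r
# Hypothetical header, modelled on the column names in the output above
raw_names <- c("1. A", "2. B", "Concatenate")
fixed_names <- make.names(raw_names)   # spaces become ".", leading digit gets "X"

csv_text <- "1. A,2. B,Concatenate\nD,A,DA\n"
mangled <- read.csv(text = csv_text)                     # default check.names = TRUE
kept <- read.csv(text = csv_text, check.names = FALSE)   # original names preserved

# With the original names, select on the leading digit instead of "^X[0-9]"
numbered <- kept[grepl("^[0-9]", names(kept))]
```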

dplyr select column when column name is number

Use backticks to select columns whose names are numbers:

data(ChickWeight)
library(dplyr)
library(tidyr)
chick <- ChickWeight %>% spread(Time,weight) %>% filter(Diet==2) %>% select(`0`)

R dplyr how to select variables by column number rather than column name with summarise

Making use of the .data pronoun from rlang you could write a custom function which takes a dataframe, the names of two variables and some additional grouping variables and computes your desired summary table like so:

library(dplyr)
library(Hmisc)

summary_table <- function(.data, x, y, ...) {
  .data %>%
    group_by(...) %>% # group by the supplied variables (e.g. Species)
    summarise(n = n(), # number of records
              WtMn = wtd.mean(.data[[x]], .data[[y]]), # weighted mean
              WtSd = sqrt(wtd.var(.data[[x]], .data[[y]])), # weighted SD
              WtCV = WtMn/WtSd, # weighted mean / weighted SD (note: the usual CV is SD / mean)
              Minm = min(.data[[x]]), # minimum
              Wp05 = wtd.quantile(.data[[x]], .data[[y]], 0.05), # p05
              Wp50 = wtd.quantile(.data[[x]], .data[[y]], 0.50), # p50
              Wp95 = wtd.quantile(.data[[x]], .data[[y]], 0.95), # p95
              Wp975 = wtd.quantile(.data[[x]], .data[[y]], 0.975), # p975
              Wp99 = wtd.quantile(.data[[x]], .data[[y]], 0.99), # p99
              Maxm = max(.data[[x]]) # maximum
    )
}

summary_table(iris, "Sepal.Length", "Petal.Width", Species)
#> # A tibble: 3 x 12
#>   Species        n  WtMn  WtSd  WtCV  Minm  Wp05  Wp50  Wp95 Wp975  Wp99  Maxm
#>   <fct>      <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa        50  5.05 0.356  14.2   4.3  4.61  5.06  5.62  5.70  5.72   5.8
#> 2 versicolor    50  5.98 0.508  11.8   4.9  5.13  6     6.80  6.97  7      7
#> 3 virginica     50  6.61 0.626  10.6   4.9  5.8   6.5   7.7   7.7   7.9    7.9

summary_table(iris, "Sepal.Width", "Petal.Width", Species)
#> # A tibble: 3 x 12
#>   Species        n  WtMn  WtSd  WtCV  Minm  Wp05  Wp50  Wp95 Wp975  Wp99  Maxm
#>   <fct>      <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa        50  3.47 0.399  8.69   2.3  3.06  3.46  4.27  4.4    4.4   4.4
#> 2 versicolor    50  2.80 0.310  9.04   2    2.3   2.86  3.20  3.37   3.4   3.4
#> 3 virginica     50  3.00 0.320  9.38   2.2  2.5   3     3.6   3.8    3.8   3.8
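
Since the function takes column names as strings, selecting by column number only needs a lookup in names() first. The sketch below is a simplified stand-in, not the function above: weighted_mean_by is a hypothetical name, and stats::weighted.mean() replaces the Hmisc helpers so that the example depends only on dplyr.

```r
library(dplyr)

# Simplified, hypothetical stand-in for summary_table(): only the record
# count and weighted mean, computed with stats::weighted.mean()
weighted_mean_by <- function(data, x, w, ...) {
  data %>%
    group_by(...) %>%
    summarise(n = n(),
              WtMn = weighted.mean(.data[[x]], .data[[w]]),
              .groups = "drop")
}

# Translate column positions into names, then pass the names along
cols <- names(iris)   # positions 1 and 4 are Sepal.Length and Petal.Width
by_position <- weighted_mean_by(iris, cols[1], cols[4], Species)
by_name <- weighted_mean_by(iris, "Sepal.Length", "Petal.Width", Species)
```

Both calls address the same columns, so the two results should be identical.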

Averaging selected columns with head number name

You can do:

library(dplyr)

df %>%
  type.convert(as.is = TRUE) %>%
  mutate(Column1 = rowMeans(select(., `1`:`2`), na.rm = TRUE),
         Column2 = rowMeans(select(., `34`:`158`), na.rm = TRUE)) %>%
  select(A, Column1, Column2, Column3 = `190`)

#   A Column1 Column2 Column3
# 1 A     1.9    3.40    22.1
# 2 B     6.8    1.65     7.4
# 3 C     4.7   23.85    56.0
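
A runnable variant of the same idea, as a sketch: df_demo and its values are hypothetical (the original df is not shown), and pick() is used in place of select(., ...), which assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical data with the same kind of numeric column names
df_demo <- data.frame(A = c("A", "B"),
                      `1` = c(1, 3), `2` = c(3, 5),
                      `34` = c(10, 20), `158` = c(30, NA),
                      `190` = c(7, 8),
                      check.names = FALSE)

res <- df_demo %>%
  mutate(Column1 = rowMeans(pick(`1`:`2`), na.rm = TRUE),      # mean of columns "1" to "2"
         Column2 = rowMeans(pick(`34`:`158`), na.rm = TRUE)) %>% # NA in "158" is ignored
  select(A, Column1, Column2, Column3 = `190`)
```

pick() returns the selected columns of the current data as a data frame, so it works without the `.` placeholder and therefore also with the native |> pipe.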

select columns based on multiple strings with dplyr contains()

You can use matches():

mtcars %>%
  select(matches('m|ar')) %>%
  head(2)
#               mpg am gear carb
# Mazda RX4      21  1    4    4
# Mazda RX4 Wag  21  1    4    4

According to the ?select documentation

‘matches(x, ignore.case = TRUE)’: selects all variables whose
name matches the regular expression ‘x’

contains(), by contrast, matches a literal string rather than a regular expression:

mtcars %>%
  select(contains('m'))
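
In recent tidyselect versions (1.0.0 and later, as far as I know), contains() also accepts a character vector and takes the union of the matches, so the regular expression is not strictly required. A short sketch comparing the two on mtcars:

```r
library(dplyr)

# Regular expression: names containing "m" or "ar"
by_regex <- mtcars %>% select(matches("m|ar"))

# Literal strings: one vector element per substring
by_literals <- mtcars %>% select(contains(c("m", "ar")))
```

Both helpers ignore case by default (ignore.case = TRUE), and both calls should pick the same four columns here.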

How to select columns if column names contain digits?

dplyr comes with a set of helper functions to match column names in select.

You want:

data %>%
  select(matches("[[:digit:]]"))

The issue here is that str_detect() returns a vector of booleans but select() expects column names.
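
To make the difference concrete, here is a small sketch on a made-up data frame (data_demo is hypothetical). Base grepl() stands in for str_detect() so that only dplyr is needed; both return a logical vector, which has to be turned into names (or positions) before select() can use it.

```r
library(dplyr)

# Hypothetical data: two column names contain digits
data_demo <- data.frame(id = 1:2, x1 = 3:4, y = 5:6, z22 = 7:8)

# Selection helper: works directly on the column names
with_helper <- data_demo %>% select(matches("[[:digit:]]"))

# Logical vector (what str_detect() would return), converted to names first
has_digit <- grepl("[[:digit:]]", names(data_demo))
with_names <- data_demo %>% select(all_of(names(data_demo)[has_digit]))
```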


