Select multiple columns with dplyr::select() with numbers as names
Column names starting with a number, such as "1" and "8" in your data, are not syntactically valid names (see ?make.names).
The 'Names and Identifiers' section in ?Quotes says: "other [syntactically invalid] names can be used provided they are quoted. The preferred quote is the backtick".
Thus, wrap the invalid column names in backticks (`):
dd %>% dplyr::select(a:f, `1`:`8`)
# a a2 b b2 f 1 4 8
# 1 0.2510023 0.4109819 0.6787226 0.4974859 0.01828614 0.7449878 0.1648462 0.5875638
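To make the quoting rule concrete, here is a minimal base R sketch. The data frame dd_demo and its values are made up for illustration; check.names = FALSE is what lets the numeric names survive when the data frame is built.

```r
# Hypothetical data; check.names = FALSE keeps "1" and "8" as column names
dd_demo <- data.frame(a = 0.25, `1` = 0.74, `8` = 0.59, check.names = FALSE)

names(dd_demo)  # the numeric names are kept unchanged

# After `$`, a non-syntactic name has to be quoted with backticks
first <- dd_demo$`1`

# `[[` takes the name as an ordinary string, so no backticks are needed
eighth <- dd_demo[["8"]]
```

The same backtick quoting is what select() relies on when the names are written unquoted.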
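Another option is to use the standard-evaluation (SE) version of select, select_. Note that select_ has been deprecated since dplyr 0.7.0; in current dplyr the supported equivalent is to pass a character vector through all_of():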
dd %>% dplyr::select_(.dots = c("a", "a2", ..., "1", "4", "8"))
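In current dplyr, the same string-based selection can be sketched with all_of(). The data frame dd_demo and the vector wanted below are hypothetical stand-ins, not the data from the question.

```r
library(dplyr)

# Hypothetical data with a mix of ordinary and numeric column names
dd_demo <- data.frame(a = 1, a2 = 2, f = 3, `1` = 4, `4` = 5, `8` = 6,
                      check.names = FALSE)

# Column names held as plain strings, so no backticks are required
wanted <- c("a", "f", "1", "8")

# all_of() selects exactly these columns and errors if one is missing
picked <- dd_demo %>% select(all_of(wanted))
names(picked)
```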
Using a variable to select multiple columns in select (dplyr)
If possible, take two separate inputs from the user. We can then generate the sequence of column positions between them using match.
For example, using base R with the mtcars dataset:
col1 <- 'mpg'
col2 <- 'am'
mtcars[match(col1, names(mtcars)) : match(col2, names(mtcars))]
# mpg cyl disp hp drat wt qsec vs am
#Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1
#Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0
#...
If you can't have separate inputs, you could split the combined input on ":".
col <- 'mpg:am'
col <- strsplit(col, ":")[[1]]
Now you can use col[1] as col1 and col[2] as col2 in the method above:
mtcars[match(col[1], names(mtcars)) : match(col[2], names(mtcars))]
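The two steps can be wrapped into one helper. This is only a sketch: the function name select_range and its error message are assumptions for illustration, not part of base R or dplyr.

```r
# Hypothetical helper: select a contiguous block of columns given "first:last"
select_range <- function(df, spec) {
  # Split the combined input on the literal ":" separator
  ends <- strsplit(spec, ":", fixed = TRUE)[[1]]
  # Translate both names into column positions (NA when a name is unknown)
  idx <- match(ends, names(df))
  if (length(ends) != 2 || anyNA(idx)) {
    stop("spec must look like 'first:last' with two existing column names")
  }
  df[idx[1]:idx[2]]
}

block <- select_range(mtcars, "mpg:am")
names(block)
```

Checking the result of match() for NA up front avoids an obscure error from the `:` operator when a column name is mistyped.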
Selecting with dplyr by variable name, some column names are numbers
I found out I can just use one_of (superseded by any_of and all_of in current dplyr):
select(df, one_of(vars))
A 2015
a 1
b 1
c 1
d 1
e 1
But probably @akrun is right that I should avoid numeric column names.
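For reference, a sketch of the same idea with any_of(), which takes over the role of one_of in current dplyr. The data frame df_demo and the contents of vars are made up here, since the original vars is not shown.

```r
library(dplyr)

# Hypothetical data with one numeric column name
df_demo <- data.frame(A = letters[1:3], `2015` = 1, check.names = FALSE)

# "2016" is deliberately absent from the data
vars <- c("A", "2015", "2016")

# any_of() keeps the names that exist and silently skips the rest
kept <- select(df_demo, any_of(vars))
names(kept)
```

Use all_of(vars) instead when a missing column should be treated as an error.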
Select columns from dataframe start with number
Names that start with a number are converted to syntactically valid names with an X prefix when the data is read in. Taking that into account, you could do:
library(tidyverse)
df %>%
select(matches("^X[0-9]"))
which gives:
X1..A X2..B X3..C X4..D
1
2 D A G
3 G
4 NA G
5 A G
6 D A G
7 A G
8 A G
9 D
10
With the same logic you can do your counts:
df %>%
summarize(across(c(matches("^X[0-9]"), Concatenate), ~ sum(!is.na(.) & . != "" & . != "NA")))
which gives
X1..A X2..B X3..C X4..D Concatenate
1 3 5 0 7 8
Although I'm not sure if you want to exclude the "NAG" value in the Concatenate column.
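The X prefix and the counting rule can be checked in base R. This is a sketch on made-up data: the raw names and cell values below are assumptions chosen to resemble the output above, not the original data.

```r
# Hypothetical raw headers as they might appear in the source file
raw_names <- c("1. A", "2. B", "Concatenate")

# make.names() prepends "X" and replaces invalid characters with "."
fixed <- make.names(raw_names)

df_demo <- data.frame(c("D", "", NA), c("A", "A", "NA"), c("DA", "A", ""),
                      stringsAsFactors = FALSE)
names(df_demo) <- fixed

# Columns whose repaired name starts with X followed by a digit
num_cols <- grep("^X[0-9]", names(df_demo), value = TRUE)

# Count cells that are neither missing, empty, nor the literal string "NA"
counts <- sapply(df_demo[num_cols],
                 function(x) sum(!is.na(x) & x != "" & x != "NA"))
```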
dplyr select column when column name is number
Use backticks to select columns whose names are numbers:
data(ChickWeight)
library(dplyr)
library(tidyr)
chick <- ChickWeight %>% spread(Time,weight) %>% filter(Diet==2) %>% select(`0`)
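spread() has since been superseded by pivot_wider() in tidyr. The sketch below shows the same pattern on a small made-up data frame (long_demo is hypothetical) so that the numeric column names produced by the reshape are easy to see.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per id and time point
long_demo <- data.frame(id     = c(1, 1, 2, 2),
                        time   = c(0, 2, 0, 2),
                        weight = c(42, 51, 40, 49))

# Reshaping turns the time values into column names "0" and "2"
wide <- long_demo %>%
  pivot_wider(names_from = time, values_from = weight)

# Backticks are needed to refer to the numeric name unquoted
baseline <- wide %>% select(id, `0`)
```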
R dplyr how to select variables by column number rather than column name with summarise
Making use of the .data pronoun from rlang, you could write a custom function that takes a data frame, the names of two variables, and some additional grouping variables, and computes your desired summary table like so:
library(dplyr)
library(Hmisc)
summary_table <- function(.data, x, y, ...) {
  .data %>%
    group_by(...) %>% # group by the supplied variables
    summarise(n = n(), # number of records
              WtMn = wtd.mean(.data[[x]], .data[[y]]), # weighted mean
              WtSd = sqrt(wtd.var(.data[[x]], .data[[y]])), # weighted SD
              WtCV = WtMn/WtSd, # weighted mean / weighted SD (note: the conventional CV is SD / mean)
              Minm = min(.data[[x]]), # minimum
              Wp05 = wtd.quantile(.data[[x]], .data[[y]], 0.05), # p05
              Wp50 = wtd.quantile(.data[[x]], .data[[y]], 0.50), # p50
              Wp95 = wtd.quantile(.data[[x]], .data[[y]], 0.95), # p95
              Wp975 = wtd.quantile(.data[[x]], .data[[y]], 0.975), # p975
              Wp99 = wtd.quantile(.data[[x]], .data[[y]], 0.99), # p99
              Maxm = max(.data[[x]]) # maximum
    )
}
summary_table(iris, "Sepal.Length", "Petal.Width", Species)
#> # A tibble: 3 x 12
#> Species n WtMn WtSd WtCV Minm Wp05 Wp50 Wp95 Wp975 Wp99 Maxm
#> <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa 50 5.05 0.356 14.2 4.3 4.61 5.06 5.62 5.70 5.72 5.8
#> 2 versicolor 50 5.98 0.508 11.8 4.9 5.13 6 6.80 6.97 7 7
#> 3 virginica 50 6.61 0.626 10.6 4.9 5.8 6.5 7.7 7.7 7.9 7.9
summary_table(iris, "Sepal.Width", "Petal.Width", Species)
#> # A tibble: 3 x 12
#> Species n WtMn WtSd WtCV Minm Wp05 Wp50 Wp95 Wp975 Wp99 Maxm
#> <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa 50 3.47 0.399 8.69 2.3 3.06 3.46 4.27 4.4 4.4 4.4
#> 2 versicolor 50 2.80 0.310 9.04 2 2.3 2.86 3.20 3.37 3.4 3.4
#> 3 virginica 50 3.00 0.320 9.38 2.2 2.5 3 3.6 3.8 3.8 3.8
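Since the question asks about column numbers, one way to bridge the two is to translate positions into names first and then pass the names along. The sketch below uses a deliberately reduced, hypothetical helper (weighted_mean_by) built on stats::weighted.mean rather than Hmisc, so it runs with dplyr alone.

```r
library(dplyr)

# Hypothetical reduced version of the summary function above
weighted_mean_by <- function(df, x, w, ...) {
  df %>%
    group_by(...) %>%
    summarise(n    = n(),                                     # number of records
              WtMn = weighted.mean(.data[[x]], .data[[w]]),   # weighted mean
              .groups = "drop")
}

# Translate column positions 1 and 4 into their names
cols <- names(iris)[c(1, 4)]

res <- weighted_mean_by(iris, cols[1], cols[2], Species)
```

Looking names up by position once, at the call site, keeps the function itself independent of column order.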
Averaging selected columns with head number name
You can do:
library(dplyr)
df %>%
type.convert(as.is = TRUE) %>%
mutate(Column1 = rowMeans(select(., `1`:`2`), na.rm = TRUE),
Column2 = rowMeans(select(., `34`:`158`), na.rm = TRUE)) %>%
select(A, Column1, Column2, Column3 = `190`)
# A Column1 Column2 Column3
#1 A 1.9 3.40 22.1
#2 B 6.8 1.65 7.4
#3 C 4.7 23.85 56.0
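A range such as `34`:`158` is positional, so it picks up every column that sits between the two names. When only specific columns are wanted, they can be listed by name instead. A base R sketch on made-up data (df_demo and its values are assumptions, not the data from the question):

```r
# Hypothetical data; check.names = FALSE keeps the numeric column names
df_demo <- data.frame(A = c("A", "B"),
                      `1` = c(1, 3), `2` = c(3, 5),
                      `34` = c(10, NA), `158` = c(20, 4),
                      check.names = FALSE)

# Inside [ ] the names are plain strings, so no backticks are needed
df_demo$Column1 <- rowMeans(df_demo[c("1", "2")], na.rm = TRUE)
df_demo$Column2 <- rowMeans(df_demo[c("34", "158")], na.rm = TRUE)
```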
select columns based on multiple strings with dplyr contains()
You can use matches:
mtcars %>%
select(matches('m|ar')) %>%
head(2)
# mpg am gear carb
#Mazda RX4 21 1 4 4
#Mazda RX4 Wag 21 1 4 4
According to the ?select documentation:
‘matches(x, ignore.case = TRUE)’: selects all variables whose name matches the regular expression ‘x’
contains, on the other hand, works with a single literal string rather than a regular expression:
mtcars %>%
select(contains('m'))
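The difference between the two helpers shows up when the same pattern is passed to both. A short sketch on mtcars:

```r
library(dplyr)

# matches() treats "m|ar" as a regular expression: "m" OR "ar"
by_regex <- names(select(mtcars, matches("m|ar")))

# contains() looks for the literal text, so "m" finds names containing an m
by_literal <- names(select(mtcars, contains("m")))

# No mtcars column contains the literal text "m|ar", so nothing is selected
none <- select(mtcars, contains("m|ar"))
```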
How to select columns if column names contain digits?
dplyr comes with a set of helper functions to match column names in select. You want:
data %>%
select(matches("[[:digit:]]"))
The issue here is that str_detect() returns a vector of booleans but select() expects column names.
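If a logical vector is what you already have, it can be turned into column names before it reaches select(). The sketch below uses base R's grepl() in place of str_detect() to avoid an extra dependency; data_demo is a made-up data frame.

```r
library(dplyr)

# Hypothetical data: two of the four names contain a digit
data_demo <- data.frame(id = 1:2, x1 = 3:4, y = 5:6, z2 = 7:8)

# One TRUE/FALSE per column name
has_digit <- grepl("[[:digit:]]", names(data_demo))

# Convert the logical vector into the names select() expects
digit_names <- names(data_demo)[has_digit]
picked <- select(data_demo, all_of(digit_names))

# Base R subsetting accepts the logical vector directly
picked_base <- data_demo[has_digit]
```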