Dplyr: Nonstandard Column Names (White Space, Punctuation, Starts With Numbers)

You can select the variable by wrapping its name in backticks (`).

select(df, `a a`)
# a a
# 1 1
# 2 2
# 3 3
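For a reproducible version of the call above, here is a minimal sketch that builds a data frame with a column named "a a". The data are made up for illustration. Note that check.names = FALSE is needed at construction time, because data.frame() would otherwise convert the name to a.a.

```r
# Hypothetical data; check.names = FALSE keeps the space in the column name
df <- data.frame(`a a` = 1:3, check.names = FALSE)

names(df)      # "a a"
df$`a a`       # backticks work with $
df[["a a"]]    # a plain string works with [[ ]]
```

The same backtick syntax is what select() expects when the name is passed unquoted.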

However, if your main objective is to rename the column, you can use rename() from the plyr package, which accepts the old name in either double quotes or backticks.

plyr::rename(df, replace = c("a a" = "a"))
plyr::rename(df, replace = c(`a a` = "a"))

Or in base R:

names(df)[names(df) == "a a"] <- "a"

For a more thorough description on the use of various quotes, see ?Quotes. The 'Names and Identifiers' section is especially relevant here:

"other [syntactically invalid] names can be used provided they are quoted. The preferred quote is the backtick".

See also ?make.names about valid names.
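As a quick illustration of what counts as a valid name, the sketch below runs make.names() on a few example names (chosen here for illustration). A name is syntactically valid if make.names() returns it unchanged.

```r
nms <- c("a a", "80% height", "1a", "height")

# Invalid characters become "." and an "X" is prepended if needed
fixed <- make.names(nms)
fixed            # "a.a" "X80..height" "X1a" "height"

# TRUE only for names that were already syntactically valid
nms == fixed     # FALSE FALSE FALSE TRUE
```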

See also this post about renaming in dplyr
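For reference, renaming with dplyr itself might look like the sketch below. The data frame is a made-up example; dplyr::rename() takes new_name = old_name, and the old name can be given in backticks or as a string.

```r
library(dplyr)

# Hypothetical data with a nonstandard column name
df <- data.frame(`a a` = 1:3, check.names = FALSE)

renamed  <- rename(df, a = `a a`)   # old name in backticks
renamed2 <- rename(df, a = "a a")   # old name as a string

names(renamed)   # "a"
```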

Select columns with spaced heading in R

We can use backticks to select columns with unusual names, i.e. names that contain spaces or punctuation, or that don't start with a letter.

subset(df, select = c(height, `80% height`))

-output

#   height 80% height
#1    1020      816.0
#2    2053     1642.4
#3    1840     1472.0
#4    3301     2640.8
#5    2094     1675.2

Also, with dplyr there is no need to specify df twice. We can use the select function from dplyr:

library(dplyr)
df %>%
  select(height, `80% height`)

-output

#   height 80% height
#1    1020      816.0
#2    2053     1642.4
#3    1840     1472.0
#4    3301     2640.8
#5    2094     1675.2

It may also be better to remove the spaces and prepend a letter to column names that start with a number. clean_names from janitor does this:

library(janitor)
df %>%
  clean_names()
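A self-contained sketch of that cleanup, assuming the janitor package is installed and reusing two rows of the height data from above. The exact cleaned names depend on janitor's defaults, so the code only checks that the result is syntactically valid rather than asserting a particular spelling.

```r
library(janitor)

# Hypothetical subset of the data shown earlier
df <- data.frame(height = c(1020, 2053),
                 `80% height` = c(816, 1642.4),
                 check.names = FALSE)

cleaned <- clean_names(df)
names(cleaned)

# Every cleaned name should now be usable without backticks
all(names(cleaned) == make.names(names(cleaned)))
```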

Dealing with spaces and weird characters in column names with dplyr::rename()

To refer to variables that contain non-standard characters or start with a number, wrap the name in backticks, e.g., `Instruction..Mode!`

R dplyr filter column with column name that starts with number

You can use backticks to refer to variables with non-standard names. This works whether they are columns of a data frame or not.

For this specific case

df %>% dplyr::filter(`1a`)  # note that == TRUE is never needed

Or generally,

`2b` = 1:5
mean(`2b`)
# [1] 3

Of course you shouldn't make a bad habit of this - use standard names whenever possible.
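To make the filter call above reproducible, here is a small sketch with a made-up data frame whose logical column is named `1a`. The id column is only there so the result is easy to inspect.

```r
library(dplyr)

# Hypothetical data; `1a` is a logical column with a nonstandard name
df <- data.frame(id = 1:4,
                 `1a` = c(TRUE, FALSE, TRUE, FALSE),
                 check.names = FALSE)

kept    <- df %>% filter(`1a`)    # rows where `1a` is TRUE
dropped <- df %>% filter(!`1a`)   # rows where `1a` is FALSE

kept$id      # 1 3
dropped$id   # 2 4
```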


As mentioned in comments, the ?Quotes documentation is helpful. It states (in the Names and Identifiers section):

Almost always, other names can be used provided they are quoted. The preferred quote is the backtick (`), and deparse will normally use it, but under many circumstances single or double quotes can be used (as a character constant will often be converted to a name). One place where backticks may be essential is to delimit variable names in formulae: see formula.
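The point about formulae can be seen in a short sketch. It reuses the height values from the example earlier on this page; the model itself is only an illustration. Inside a formula a quoted string would not be treated as a variable name, so backticks are the way to refer to the column.

```r
# Values borrowed from the height example; the second column is 80% of the first
df <- data.frame(height = c(1020, 2053, 1840, 3301, 2094))
df[["80% height"]] <- df$height * 0.8

# Backticks delimit the nonstandard name inside the formula
fit <- lm(`80% height` ~ height, data = df)
coef(fit)   # slope on height is 0.8 up to floating-point error
```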

use dplyr to combine columns of data.frame when column names are not known

With a little trial and error:

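library(dplyr)  # transmute(); syms() comes from rlang
# Turn every column name into a symbol, then splice them all into paste()
colNames_as_symbols <- rlang::syms(names(myTibble))
transmute(myTibble, concat = paste(!!!colNames_as_symbols, sep = '.'))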

Here was the hint that put me on to the solution... From the documentation for !!!:

The big-bang operator !!! forces-splice a list of objects. The
elements of the list are spliced in place, meaning that they each
become one single argument.

vars <- syms(c("height", "mass"))

Force-splicing is equivalent to supplying the elements separately:

starwars %>% select(!!!vars)
starwars %>% select(height, mass)

In fact, the entire documentation entitled "Force parts of an expression" is fascinating reading. It can be accessed by issuing ?qq_show
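A runnable sketch of the splicing approach, using a small made-up tibble whose column names are deliberately nonstandard. Because syms() builds symbols from the names, no backticks are needed in the code.

```r
library(dplyr)

# Hypothetical tibble with nonstandard column names
myTibble <- tibble(`a a` = 1:2, `2b` = c("x", "y"))

# One symbol per column, spliced into paste() as separate arguments
cols <- rlang::syms(names(myTibble))
res  <- transmute(myTibble, concat = paste(!!!cols, sep = "."))

res$concat   # "1.x" "2.y"
```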

Renaming dataframe column names which contain a space

You can use the dplyr function rename_with() to rename all columns that match a certain condition (in this case, that the name contains a space). In this example I replace each space in the column name with an underscore:

library(dplyr)

df <- data.frame(a = 1:2,
                 b = LETTERS[1:2],
                 c = 101:102)
names(df) <- c("a", "b b", "c e f")

df %>%
  rename_with(~ gsub(" ", "_", .x), contains(" "))
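To check the result, the sketch below stores the output and inspects its names. It also shows a base R equivalent for comparison; fixed = TRUE is an addition here that treats the space as a literal string rather than a regular expression.

```r
library(dplyr)

df <- data.frame(a = 1:2, b = LETTERS[1:2], c = 101:102)
names(df) <- c("a", "b b", "c e f")

# Only columns whose names contain a space are passed to gsub()
out <- df %>%
  rename_with(~ gsub(" ", "_", .x, fixed = TRUE), contains(" "))
names(out)    # "a" "b_b" "c_e_f"

# Base R equivalent, applied to all names at once
base_names <- gsub(" ", "_", names(df), fixed = TRUE)
```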

Replace underscore with white space in column names of datatable

You can use str_replace_all from stringr (str_replace would replace only the first underscore in each name)

names(f) <- stringr::str_replace_all(names(f), "_", " ")
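One detail worth checking when a name has more than one underscore: str_replace() changes only the first match in each string, while str_replace_all() changes every match. A short sketch with made-up names:

```r
library(stringr)

nms <- c("first_name", "date_of_birth")

first_only <- str_replace(nms, "_", " ")       # "first name" "date of_birth"
all_of_them <- str_replace_all(nms, "_", " ")  # "first name" "date of birth"

# Base R equivalent of the second call
base_all <- gsub("_", " ", nms, fixed = TRUE)
```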

compute sum for space string column

Does this work?

df %>% group_by(`a 1`) %>% summarise(tx = sum(`t t`))
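A quick way to try it is with a small made-up data frame that has the same two column names. The values are assumptions chosen for illustration.

```r
library(dplyr)

# Hypothetical data with spaces in both column names
df <- data.frame(`a 1` = c("x", "x", "y"),
                 `t t` = c(1, 2, 5),
                 check.names = FALSE)

res <- df %>%
  group_by(`a 1`) %>%
  summarise(tx = sum(`t t`))

res$tx   # 3 5
```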
