Variable Name Restrictions in R

Variable name restrictions in R

You might be looking for the discussion from ?make.names:

A syntactically valid name consists of letters, numbers and the dot or
underline characters and starts with a letter or the dot not followed
by a number. Names such as ".2way" are not valid, and neither are the
reserved words.

In the help file itself, there's a link to a list of reserved words, which are:

if else repeat while function for in next break

TRUE FALSE NULL Inf NaN NA NA_integer_ NA_real_ NA_complex_ NA_character_
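You can see these rules in action with make.names() itself. Its help page says it prepends "X" where needed, turns invalid characters into ".", and appends a dot to names that match reserved words. Here is a minimal sketch. The test strings are arbitrary.

```r
# make.names() coerces arbitrary strings to syntactically valid names
candidates <- c("ok_name", ".2way", "if", "TRUE", "my var")
fixed <- make.names(candidates)
print(fixed)   # "ok_name" "X.2way" "if." "TRUE." "my.var"

# A string is already valid and unreserved when make.names() leaves it unchanged
candidates == fixed   # TRUE FALSE FALSE FALSE FALSE
```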

Other good notes from the comments include James's pointer to the R FAQ entry on this issue and Josh's pointer to a related SO question about checking for syntactically valid names.

Importing txt file in R --- Why does # symbol in variable names cause problems?

The # symbol is read.table's default comment character, so change comment.char to something else. Also, as @JonSpring mentioned, a # in a column name is not syntactically valid, so by default read.table converts it. To keep it, use check.names = FALSE.

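read.table(text = "|VAR1  |VAR2# |
|12    |F     |
|56    |B     |
|18    |A     |", header = TRUE, fill = FALSE, sep = '|',
strip.white = TRUE,
comment.char = "@",  # any character except "#", so "VAR2#" is not cut off
check.names = FALSE)[, 2:3]  # keep "VAR2#" as-is; columns 1 and 4 are empty edge fields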

#  VAR1 VAR2#
#1   12     F
#2   56     B
#3   18     A
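For comparison, here is a sketch of the same call with check.names left at its default (TRUE). The header then goes through make.names(), so the # is replaced by a dot. The input is the same as above.

```r
# Same input as above, but with check.names = TRUE (the default)
df <- read.table(text = "|VAR1  |VAR2# |
|12    |F     |
|56    |B     |
|18    |A     |", header = TRUE, sep = "|",
strip.white = TRUE, comment.char = "@")[, 2:3]

names(df)            # "VAR1" "VAR2." : the "#" became "."
make.names("VAR2#")  # the same conversion, applied directly
```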

Convert string to variable name in R

Firstly, it's a backtick (`), not an apostrophe ('). In R, backticks quote names, such as non-syntactic variable names. Apostrophes are single quotes and delimit strings.

The issue you're having is that your variable names start with a digit, which is not allowed in a syntactic R name. Since you have such names anyway, you need backticks to tell R to interpret 2011_Q4 as a variable name rather than as a number.
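Here is a minimal sketch of the backtick approach. The value 5 and the sales data frame are made up for illustration.

```r
# Backticks let you create and use a non-syntactic name
`2011_Q4` <- 5
`2011_Q4` * 2        # 10

# get() takes the name as a string, so no backticks are needed
get("2011_Q4") * 2   # 10

# The same applies to data frame columns (check.names = FALSE keeps the name)
sales <- data.frame(`2011_Q4` = c(1, 2), check.names = FALSE)
sales$`2011_Q4`      # 1 2
```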

From ?Quotes:

Names and Identifiers

Identifiers consist of a sequence of letters, digits, the period (.)
and the underscore. They must not start with a digit nor underscore,
nor with a period followed by a digit. Reserved words are not valid
identifiers.

The definition of a letter depends on the current locale, but only
ASCII digits are considered to be digits.

Such identifiers are also known as syntactic names and may be used
directly in R code. Almost always, other names can be used provided
they are quoted. The preferred quote is the backtick (`), and deparse
will normally use it, but under many circumstances single or double
quotes can be used (as a character constant will often be converted to
a name). One place where backticks may be essential is to delimit
variable names in formulae: see formula.

The best solution is simply to rename your variables so that they start with a letter, e.g. Y2011_Q4.
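If the names came from imported data, you can rename them all at once, as in the sketch below. The column names are hypothetical. The "Y" prefix follows the suggestion above, whereas make.names() prepends "X" instead.

```r
# Hypothetical data frame whose column names start with digits
quarters <- data.frame(`2011_Q4` = 1, `2012_Q1` = 2, check.names = FALSE)

# Option 1: let make.names() repair them (prepends "X")
make.names(names(quarters))   # "X2011_Q4" "X2012_Q1"

# Option 2: prefix a letter of your choice to names that start with a digit
names(quarters) <- sub("^([0-9])", "Y\\1", names(quarters))
names(quarters)               # "Y2011_Q4" "Y2012_Q1"
```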

How long is too long for a variable name?

You are right to keep variable names as short as you can while still retaining enough meaning to describe what the variable does from its name alone.

No, the length of a variable name has absolutely nothing to do with performance.

Edit:

For some reason I thought you were talking about C++. If you are (or about C, Delphi, or another compiled language), the above is correct (barring debug information, which won't appear in a release executable).

For dynamic languages such as Lua, Python, or Ruby, the length of a variable name could affect runtime performance, depending on how variable name lookups are performed. If variable names are hashed and the hash is then used to index a table of values, then naturally the more data the hash function has to process, the longer it will take.

That said, do not sacrifice meaningful variable names for short ones just because you think they'll be faster. The speed gain will usually be negligible, and certainly not worth sacrificing the maintainability of your program.

Is there maximum number of characters permissible in rownames or colnames in R?

Row and column names are attributes of a data frame or matrix object. As such, their length is limited only by the system resources available to R (and by R's general limit of 2^31 - 1 bytes for a single string).

x <- data.frame(col = 0)
object.size(x)
# 680 bytes

# Huge name for a column
colnames(x) <- paste(rep("x", 10^8), collapse = "")
object.size(x)
# 100000680 bytes
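This applies to names stored as character strings. Turning such a string into an R symbol (an actual variable name) is more restricted. As far as I know, ?name documents a limit of 10000 bytes. The sketch below contrasts the two. The sizes are arbitrary.

```r
x <- data.frame(col = 0)

# A 100,000-character column name is fine: it is just a string attribute
long_name <- strrep("x", 1e5)
colnames(x) <- long_name
nchar(colnames(x))   # 100000
x[[long_name]]       # 0, accessed by the string, not by a symbol

# Converting a string longer than 10000 bytes to a symbol fails
res <- tryCatch(as.name(strrep("x", 10001)), error = function(e) e)
inherits(res, "error")   # TRUE
```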

R: Referring to a variable name with special characters

I've fixed it by wrapping the Shiny input$ variables in as.name(). For the example above it looks like this:

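server <- function(input, output) {
  output$chemPlot <- renderPlot({
    plot.data <- ggplot(data = dat)
    # as.name() turns the selected column name (which may contain special
    # characters) into a symbol, so aes_string() does not try to parse it
    point <- plot.data + geom_point(
      aes_string(x = as.name(input$varX), y = as.name(input$varY1)))
    plot(point)
  })
}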

This appears to work now as intended. Thank you aocall for your efforts.
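The same idea works outside Shiny and ggplot2. (Note that newer ggplot2 versions soft-deprecate aes_string() and document aes(x = .data[[input$varX]]) instead.) Below is a base-R sketch with a made-up column name containing spaces and punctuation. as.name() turns the string into a symbol, which is then evaluated inside the data frame. Alternatively, [[ ]] takes the string directly.

```r
# A column name with special characters (kept because check.names = FALSE)
dat <- data.frame(`Conc. (mg/L)` = 1:3, check.names = FALSE)
var_x <- "Conc. (mg/L)"   # e.g. a value that came from a UI input

# String -> symbol, then evaluate the symbol with dat as the environment
eval(as.name(var_x), dat)   # 1 2 3

# Equivalent: index with the string
dat[[var_x]]                # 1 2 3
```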

Why are names(x)<-y and "names<-"(x,y) not equivalent?

Short answer: names(x)<-y is actually syntactic sugar for x<-"names<-"(x,y), not just "names<-"(x,y). See the R-lang manual, pages 18-19 (pages 23-24 of the PDF), which works through essentially the same example.

For example, names(x) <- c("a","b") is equivalent to:
`*tmp*`<-x
x <- "names<-"(`*tmp*`, value=c("a","b"))
rm(`*tmp*`)
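The difference in practice is shown in the short sketch below. Calling the replacement function directly only returns a modified copy, while the assignment form also rebinds x.

```r
x <- c(1, 2)

# Direct call: returns a named copy; x itself is not changed
y <- `names<-`(x, c("a", "b"))
names(y)   # "a" "b"
names(x)   # NULL

# Replacement syntax: the result is assigned back to x
names(x) <- c("a", "b")
names(x)   # "a" "b"
```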

If you are more familiar with getters and setters, you can think of somefunction<- as the setter corresponding to the getter somefunction. In R, where objects are (semantically) immutable, it's more accurate to call the setter a replacement function. The function actually creates a new object identical to the old one, but with an attribute added, modified, or removed, and then replaces the old object with this new one.

In the example, for instance, the names attribute is not simply added to x. Rather, a new object with the same values as x but with the names is created and bound to the symbol x.

Since there are still some doubts about why this is discussed in the language documentation instead of directly in ?names, here is a short recap of this property of the R language.

You can define a function with any name you wish (within some restrictions, of course), and the name has no effect when the function is called "normally".
However, if you give a function name the <- suffix, it becomes a replacement function. When you then write foo(x)<-value, R applies it through the mechanism described at the beginning of this answer. Note that you never call foo<- explicitly. The slightly different syntax, together with the function's name, is what produces the object replacement.
Although there are no formal restrictions, getters and setters in R commonly share a name (for instance names and names<-). In this case, the function with the <- suffix is the replacement function for the corresponding version without the suffix.
As stated at the beginning, this behaviour is general and a property of the language, so it doesn't need to be discussed in the documentation of any individual replacement function.
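To illustrate, here is a minimal user-defined getter/replacement pair. The name second is hypothetical. What matters for the replacement form is the <- suffix and a final argument named value, since R passes the right-hand side as value = ...

```r
# Getter: return the second element (hypothetical example function)
second <- function(x) x[2]

# Replacement function: the "<-" suffix and the `value` argument let R
# rewrite second(v) <- 10 into v <- `second<-`(v, value = 10)
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- c(1, 2, 3)
second(v) <- 10
v           # 1 10 3
second(v)   # 10
```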

Check if character value is a valid R object name

Edited 2013-1-9 to fix regular expression. Previous regular expression, lifted from page 456 of John Chambers' "Software for Data Analysis", was (subtly) incomplete. (h.t. Hadley Wickham)

There are a couple of issues here. A simple regular expression can be used to identify all syntactically valid names --- but some of those names (like if and while) are 'reserved', and cannot be assigned to.
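For instance, the parser rejects a reserved word such as if on the left-hand side of an assignment, and assigning to a reserved constant such as TRUE fails when evaluated. The small sketch below uses parse() so that the errors can be caught. The helper name fails is made up for illustration.

```r
# Hypothetical helper: TRUE if parsing or evaluating the code raises an error
fails <- function(code) {
  res <- tryCatch(eval(parse(text = code)), error = function(e) e)
  inherits(res, "error")
}

fails("if <- 1")      # TRUE: syntax error at parse time
fails("TRUE <- 1")    # TRUE: invalid left-hand side of assignment
fails("mean2 <- 1")   # FALSE: an ordinary, valid name
```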

Identifying syntactically valid names:
?make.names explains that a syntactically valid name:
[...] consists of letters, numbers and the
dot or underline characters and starts with a letter or the dot
not followed by a number. Names such as '".2way"' are not valid [...]
Here is the corresponding regular expression:
```
 "^((([[:alpha:]]|[.][._[:alpha:]])[._[:alnum:]]*)|[.])$"
```
Identifying unreserved syntactically valid names
To identify unreserved names, you can take advantage of the base function make.names(), which constructs syntactically valid names from arbitrary character strings.
```
isValidAndUnreserved <- function(string) {
    make.names(string) == string
}

isValidAndUnreserved(".jjj")
# [1] TRUE
isValidAndUnreserved(" jjj")
# [1] FALSE
```

Putting it all together

isValidName <- function(string) {
    grepl("^((([[:alpha:]]|[.][._[:alpha:]])[._[:alnum:]]*)|[.])$", string)
}

isValidAndUnreservedName <- function(string) {
    make.names(string) == string
}

testValidity <- function(string) {
    valid <- isValidName(string)
    unreserved <- isValidAndUnreservedName(string)
    reserved <- (valid & ! unreserved)
    list("Valid"=valid,
         "Unreserved"=unreserved,
         "Reserved"=reserved)
}

testNames <- c("mean", ".j_j", ".", "...", "if", "while", "TRUE", "NULL",
               "_jj", "  j", ".2way") 
t(sapply(testNames, testValidity))

      Valid Unreserved Reserved
mean  TRUE  TRUE       FALSE   
.j_j  TRUE  TRUE       FALSE
.     TRUE  TRUE       FALSE     
...   TRUE  TRUE       FALSE   
if    TRUE  FALSE      TRUE    
while TRUE  FALSE      TRUE    
TRUE  TRUE  FALSE      TRUE    
NULL  TRUE  FALSE      TRUE    
_jj   FALSE FALSE      FALSE   
  j   FALSE FALSE      FALSE   # Note: these tests are for "  j", not "j"
.2way FALSE FALSE      FALSE

For more discussion of these issues, see the r-devel thread linked to by @Hadley in the comments below.

Replace all underscores in feature names with a space

What about:

example_df %>% select_all(funs(gsub("_", " ", .)))

Output:

  a nice day quick brown fox blah ha ha
1          1               A          4
2          2               B          5
3          3               C          6

You could also use rename_all, although in this case the function is passed in a different way:

example_df %>% rename_all(function(x) gsub("_", " ", x))

Or simply:

example_df %>% rename_all(~ gsub("_", " ", .))
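Without dplyr, a base R equivalent is to rewrite names() directly. (In dplyr 1.0.0 and later, rename_with(example_df, ~ gsub("_", " ", .x)) supersedes rename_all().) The original example_df isn't shown, so the data frame below is an assumption reconstructed from the printed output.

```r
# Assumed input, reconstructed from the output shown above
example_df <- data.frame(a_nice_day = 1:3,
                         quick_brown_fox = c("A", "B", "C"),
                         blah_ha_ha = 4:6)

# Replace every underscore in the column names with a space
names(example_df) <- gsub("_", " ", names(example_df))
names(example_df)   # "a nice day" "quick brown fox" "blah ha ha"
```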

CMD Variable name restrictions?

The colon (:) is the special character for string manipulation during variable expansion. Example:

%var:~0,1%

Therefore, if anything follows a : in the variable name, cmd tries to perform string manipulation and the expansion fails. A colon on its own, or with nothing after it, is allowed.

Rule regarding expanding variable names: a variable name must not contain a : followed by other characters; otherwise, the variable expansion will fail.

See set /?

set :)=123
set a)=123
set :a=123
set :=123
set )=123
echo %:)%
echo %a)%
echo %:a%
echo %:%
echo %)%

Output:

%:)%
123
%:a%
123
123
