Why is it not advisable to use attach() in R, and what should I use instead?
When to use it:
I use attach() when I want the environment you get in most stats packages (e.g. Stata, SPSS) of working with one rectangular dataset at a time.
When not to use it:
However, it gets very messy and code quickly becomes unreadable when you have several different datasets. This is particularly true if you are in effect using R as a crude relational database: different rectangles of data, all relevant to the problem at hand and perhaps being matched against each other in various ways, may have variables with the same name.
The with() function, or the data= argument to many functions, are excellent alternatives in many instances where attach() is tempting.
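As a minimal sketch of the same-name problem, here are two made-up data frames (sales and returns are hypothetical, not from the question) that both have a column called value. Naming the data frame at each call site keeps it obvious which value is meant:

```r
# Hypothetical data: two tables that both have a column called "value"
sales   <- data.frame(id = 1:3, value = c(10, 20, 30))
returns <- data.frame(id = 1:3, value = c(1, 2, 3))

# with() evaluates the expression inside one named data frame,
# so "value" is unambiguous at each call site
mean_sales   <- with(sales, mean(value))
mean_returns <- with(returns, mean(value))

# The data= argument does the same job for functions that offer it
fit <- lm(value ~ id, data = sales)
```

With both data frames attached instead, a bare value would resolve to whichever one was attached last.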
Trouble using attach() in R
Don't use attach(). Ever. Forget it exists.
glm() has a data argument. Using that proves much less stressful.
glm(O.ring ~ Temp + Pressure, family = binomial(logit), data = data)
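The data frame from the question isn't shown, so here is a runnable sketch of the same pattern on the built-in mtcars data instead (the model am ~ wt is only an illustration, not the asker's model). Because the variables are looked up through data, predict() can later be handed new rows by column name:

```r
# Logistic regression with the variables resolved through data=
fit <- glm(am ~ wt, family = binomial(logit), data = mtcars)

# New observations are supplied as a data frame with the same column names
p <- predict(fit, newdata = data.frame(wt = 3), type = "response")
```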
If you want to know why attach() is not advisable, see Why is it not advisable to use attach() in R and what should I use instead?
Should attach be avoided in this situation?
I think you'd be better off using do.call. do.call accepts a list and passes its elements to the function as arguments.
myfun <- function(x1, x2, x3){
x1 + x2 + x3
}
xlist <- list(x1 = 1, x2= 2, x3 = 3)
do.call(myfun, xlist)
This has the benefit of being explicit about what the arguments are, which makes it much easier to reason about the code, maintain it, and debug it.
The place where this gets tricky is if xlist has more values in it than just those required by the function. For example, the following throws an error:
xlist <- list(x1 = 1, x2 = 2, x3 = 3, x4 = 4)
do.call(myfun, xlist)
You can circumvent this by matching the list names against the function's formals:
do.call(myfun, xlist[names(xlist) %in% names(formals(myfun))])
It's still a bit of typing, but if you're talking about 10+ arguments, it's still a lot easier than xlist$x1, xlist$x2, xlist$x3, etc.
LAP gives a useful solution as well, but it would be better to have with outside the call:
with(xlist, myfun(x1, x2, x3))
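If the filtering step comes up often, it could be wrapped in a small helper. This is only a sketch: call_with_matching is a made-up name, and it assumes fun is an ordinary closure with named arguments (formals() returns NULL for primitives, and a function taking ... would need different handling).

```r
myfun <- function(x1, x2, x3) {
  x1 + x2 + x3
}

# Hypothetical helper: keep only the list elements whose names
# match the formal arguments of fun, then call fun with them
call_with_matching <- function(fun, args) {
  do.call(fun, args[names(args) %in% names(formals(fun))])
}

xlist <- list(x1 = 1, x2 = 2, x3 = 3, x4 = 4)

# The unfiltered call fails because x4 is not an argument of myfun
failed <- inherits(try(do.call(myfun, xlist), silent = TRUE), "try-error")

# The filtered call drops x4 and succeeds
result <- call_with_matching(myfun, xlist)
```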
Do you use attach() or call variables by name or slicing?
I never use attach. with and within are your friends.
Example code:
> N <- 3
> df <- data.frame(x1=rnorm(N),x2=runif(N))
> df$y <- with(df,{
x1+x2
})
> df
x1 x2 y
1 -0.8943125 0.24298534 -0.6513271
2 -0.9384312 0.01460008 -0.9238312
3 -0.7159518 0.34618060 -0.3697712
>
> df <- within(df,{
x1.sq <- x1^2
x2.sq <- x2^2
y <- x1.sq+x2.sq
x1 <- x2 <- NULL
})
> df
y x2.sq x1.sq
1 0.8588367 0.0590418774 0.7997948
2 0.8808663 0.0002131623 0.8806532
3 0.6324280 0.1198410071 0.5125870
Edit: hadley mentions transform in the comments. Here is some code:
> transform(df, xtot=x1.sq+x2.sq, y=NULL)
x2.sq x1.sq xtot
1 0.41557079 0.021393571 0.43696436
2 0.57716487 0.266325959 0.84349083
3 0.04935442 0.004226069 0.05358049
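The console output above comes from random draws, so here is the same trio on fixed, made-up numbers as a sketch that can be re-run and checked. One thing it is meant to show: with() returns a value and leaves the data frame alone, while within() and transform() return a modified copy.

```r
df <- data.frame(x1 = c(1, 2, 3), x2 = c(4, 5, 6))

# with(): evaluate an expression using the columns; df itself is unchanged
y <- with(df, x1 + x2)

# within(): later lines in the block can use variables created earlier in it
df2 <- within(df, {
  x1.sq <- x1^2
  y <- x1.sq + x2
})

# transform(): each new column is computed from the original columns
df3 <- transform(df, y = x1 + x2)
```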
When to use 'with' function and why is it good?
with is a wrapper for functions with no data argument
There are many functions that work on data frames and take a data argument so that you don't need to retype the name of the data frame every time you reference a column. lm, plot.formula, subset, transform are just a few examples.
with is a general purpose wrapper to let you use any function as if it had a data argument.
Using the mtcars data set, we could fit a model with or without using the data argument:
# this is obviously annoying
mod = lm(mtcars$mpg ~ mtcars$cyl + mtcars$disp + mtcars$wt)
# this is nicer
mod = lm(mpg ~ cyl + disp + wt, data = mtcars)
However, if (for some strange reason) we wanted to find the mean of cyl + disp + wt, there is a problem because mean doesn't have a data argument like lm does. This is the issue that with addresses:
# without with(), we would be stuck here:
z = mean(mtcars$cyl + mtcars$disp + mtcars$wt)
# using with(), we can clean this up:
z = with(mtcars, mean(cyl + disp + wt))
Wrapping foo() in with(data, foo(...)) lets us use any function foo as if it had a data argument, which is to say we can use unquoted column names and avoid the repetitive data_name$column_name or data_name[, "column_name"].
When to use with
Use with whenever you like interactively (R console) and in R scripts to save typing and make your code clearer. The more frequently you would need to re-type your data frame name for a single command (and the longer your data frame name is!), the greater the benefit of using with.
Also note that with isn't limited to data frames. From ?with:
For the default with method this may be an environment, a list, a data frame, or an integer as in sys.call.
I don't often work with environments, but when I do I find with very handy.
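For instance, here is a minimal sketch of with on a plain list and on an environment (params, env, and the values in them are made up for illustration):

```r
# A list: its elements become visible by name inside with()
params <- list(rate = 0.05, years = 10)
growth <- with(params, (1 + rate)^years)

# An environment works the same way
env <- new.env()
assign("radius", 2, envir = env)
area <- with(env, pi * radius^2)
```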
When you need pieces of a result for one line only
As @Rich Scriven suggests in comments, with can be very useful when you need to use the results of something like rle. If you only need the results once, then his example with(rle(data), lengths[values > 1]) lets you use the rle(data) results anonymously.
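A small worked version of that one-liner, on a made-up vector:

```r
x <- c(1, 1, 2, 2, 2, 3, 1)

# rle(x) returns a list with components lengths and values;
# with() lets us use both without storing the intermediate result
runs <- with(rle(x), lengths[values > 1])
```

Here the runs are 1,1 then 2,2,2 then 3 then 1, so the run lengths for values greater than 1 are 3 and 1.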
When to avoid with
When there is a data argument
Many functions that have a data argument use it for more than just easier syntax when you call it. Most modeling functions (like lm), and many others too (ggplot!) do a lot with the provided data. If you use with instead of a data argument, you'll limit the features available to you. If there is a data argument, use the data argument, not with.
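One small, checkable difference (not the only one) is what the fitted model remembers about its call. In this sketch both fits give the same coefficients, but only the data = version records which data set was used:

```r
fit_data <- lm(mpg ~ wt, data = mtcars)
fit_with <- with(mtcars, lm(mpg ~ wt))

# The estimates are identical...
same_coefs <- isTRUE(all.equal(coef(fit_data), coef(fit_with)))

# ...but only the first call stores a data argument
data_recorded      <- !is.null(fit_data$call$data)
with_data_recorded <- !is.null(fit_with$call$data)
```

Printing fit_with shows a call with just the formula, which is less helpful when you come back to the object later.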
Adding to the environment
In my example above, the result was assigned to the global environment (z = with(...)). To make an assignment inside the list/environment/data, you can use within. (In the case of data.frames, transform is also good.)
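A minimal sketch of that difference on a plain list (settings and its elements are made up): with hands back a value, within hands back a modified copy of the list.

```r
settings <- list(width = 4, height = 3)

# with(): the result lives outside the list
area_only <- with(settings, width * height)

# within(): the assignment ends up inside the returned copy
settings <- within(settings, area <- width * height)
```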
In packages
Don't use with in R packages. There is a warning in help(subset) that could apply just about as well to with:
Warning: This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.
If you build an R package using with, when you check it you will probably get warnings or notes about using variables without a visible binding. This can make the package unacceptable to CRAN.
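In package-style code, the usual way around this is to pass column names as strings and use standard subsetting. A minimal sketch, where col_ratio is a hypothetical helper and not part of any package:

```r
# Hypothetical helper: columns are named by strings, so no bare
# column names appear in the function body
col_ratio <- function(data, num, den) {
  stopifnot(is.data.frame(data), num %in% names(data), den %in% names(data))
  data[[num]] / data[[den]]
}

ratio <- col_ratio(mtcars, "hp", "wt")
```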
Alternatives to with
Don't use attach
Many (mostly dated) R tutorials use attach to avoid re-typing data frame names by placing the columns on the search path, where they can be referenced by name. attach is widely considered to be bad practice and should be avoided. One of the main dangers of attach is that data columns can become out of sync if they are modified individually. with avoids this pitfall because it is invoked one expression at a time. There are many, many questions on Stack Overflow where new users are following an old tutorial and run into problems because of attach. The easy solution is always: don't use attach.
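The out-of-sync problem is easy to reproduce. In this sketch (readings and temp_c are made-up names) the data frame is updated after being attached, but the bare column name still refers to the copy made at attach time:

```r
readings <- data.frame(temp_c = c(10, 20, 30))

attach(readings)                               # columns are copied onto the search path
readings$temp_c <- readings$temp_c + 273.15    # update the data frame itself
stale <- temp_c                                # still the old, attached values
detach(readings)

# with() looks the column up at the moment it is called
fresh <- with(readings, temp_c)
```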
Using with all the time seems too repetitive
If you are doing many steps of data manipulation, you may find yourself beginning every line of code with with(my_data, ... . You might think this repetition is almost as bad as not using with. Both the data.table and dplyr packages offer efficient data manipulation with non-repetitive syntax. I'd encourage you to learn to use one of them. Both have excellent documentation.
selecting and handling a row by looking its name in data [R]
climate_change[climate_change$`Country Name` == 'Turkey', ]
or
subset(climate_change, `Country Name` == 'Turkey')
or
climate_change[grep('Turkey', climate_change$`Country Name`), ]
gives:
# Country Name x
# 1 Turkey 1
# 2 Turkey 2
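The three forms agree on this data, but they are not interchangeable in general: grep matches a pattern anywhere in the string, while == compares whole values. A sketch with made-up country names shows the difference:

```r
# Hypothetical names where one is a prefix of another
places <- data.frame(country = c("Niger", "Nigeria", "Chad"), x = 1:3)

exact   <- places[places$country == "Niger", ]       # whole-value comparison
pattern <- places[grep("Niger", places$country), ]   # pattern match, also hits "Nigeria"
```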
Notes
- Beware of attach()! There are better alternatives.
- Avoid spaces in variable names so that you don't need quotes or backticks after the $. You can easily clean your names with names(climate_change) <- make.names(names(climate_change))
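As a quick sketch of what the name cleaning does (countries is a made-up stand-in for the real data): make.names turns the space into a dot, after which the column can be used after $ without backticks.

```r
# check.names = FALSE keeps the space in the column name on creation
countries <- data.frame(`Country Name` = c("Turkey", "Greece"), x = 1:2,
                        check.names = FALSE)

# Replace the names with syntactically valid ones
names(countries) <- make.names(names(countries))

turkey <- countries[countries$Country.Name == "Turkey", ]
```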
Data:
climate_change <- structure(list(`Country Name` = c("Turkey", "Turkey", "Greece",
"Greece", "Tuvalu", "Tuvalu"), x = c(1L, 2L, 1L, 2L, 1L, 2L)), class = "data.frame", row.names = c(NA,
-6L))
R // Recognizing variables in a data frame
Don't use attach! Never ever! Since you have several data frames, attaching them means their variables can mask one another, so the chances of reading or overwriting the wrong thing are quite high.
What I'd try to do is the following:
eleven$stage[eleven$locprim < 9 & eleven$stadpt == 6 & eleven$stadpn == 0 & eleven$stadpm == 0] <- 0
So do it the way you say you already know, by writing eleven$ before every variable. Note the single &: it compares element by element, which is what row selection needs, whereas && is meant for a single TRUE/FALSE.
Hope it clarifies a little bit! :)
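Since the real data isn't shown, here is a sketch on a tiny made-up data frame with the same column names. It builds the row condition with the element-wise & operator, which returns one TRUE/FALSE per row, and then assigns only to the matching rows:

```r
# Hypothetical stand-in for the data frame in the question
eleven <- data.frame(locprim = c(5, 5, 12),
                     stadpt  = c(6, 6, 6),
                     stadpn  = c(0, 1, 0),
                     stadpm  = c(0, 0, 0),
                     stage   = NA)

# One logical value per row
keep <- eleven$locprim < 9 & eleven$stadpt == 6 &
        eleven$stadpn == 0 & eleven$stadpm == 0

# Only the rows where keep is TRUE are changed
eleven$stage[keep] <- 0
```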