Writing functions in R, keeping scoping in mind
If I know that I'm going to need a function parametrized by some values and called repeatedly, I avoid globals by using a closure:
make.fn2 <- function(a, b) {
  fn2 <- function(x) {
    return(x + a + b)
  }
  return(fn2)
}
a <- 2; b <- 3
fn2.1 <- make.fn2(a, b)
fn2.1(3) # 8
fn2.1(4) # 9
a <- 4
fn2.2 <- make.fn2(a, b)
fn2.2(3) # 10
fn2.1(3) # 8
This neatly avoids referencing global variables, instead using the enclosing environment of the function for a and b. Modification of globals a and b doesn't lead to unintended side effects when fn2 instances are called.
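One caveat worth keeping in mind: R evaluates function arguments lazily, so a and b are only captured when the closure first uses them. In the example above fn2.1 is called before a is reassigned, so the result is as expected. If the global changed between creating the closure and its first call, the new value would be picked up. A minimal sketch of guarding against this with force() (the names make.lazy and make.eager are made up for illustration):

```r
# Lazy version: 'a' and 'b' are promises, evaluated at the first call.
make.lazy <- function(a, b) {
  function(x) x + a + b
}

# Eager version: force() evaluates the arguments when the closure is created.
make.eager <- function(a, b) {
  force(a)
  force(b)
  function(x) x + a + b
}

a <- 2; b <- 3
lazy  <- make.lazy(a, b)
eager <- make.eager(a, b)
a <- 4                      # changed before either closure has been called

lazy.result  <- lazy(3)     # 10: the promise saw the new value of 'a'
eager.result <- eager(3)    # 8: 'a' was fixed at 2 on creation
```

Forcing the arguments is cheap, so it is a reasonable habit in any function that builds and returns a closure.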
Passing arguments to functions, and variable scopes in R
You would generally read your data outside of any function, like so:
outcome.data <- read.csv("outcome-of-care-measures.csv", colClasses = "character")
Otherwise, since a function has its own environment, all the variables defined inside of it will vanish upon its return, unless they themselves are returned by the function with return(...). Several objects can be returned by putting them in a list: return(list(item1=var1, item2=var2)).
Some functions, such as assign, have the envir parameter that can be set to .GlobalEnv to change this behavior. An object outside the function can also be altered from inside it using the <<- operator instead of <-, although this practice is generally discouraged.
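A short sketch of these three behaviours. The function and variable names (summarise.values, running.total, bump, counter, flag) are invented for illustration:

```r
# Local variables exist only while the function runs;
# several results are handed back in a named list.
summarise.values <- function(values) {
  running.total <- sum(values)
  n <- length(values)
  return(list(total = running.total, mean = running.total / n))
}

res <- summarise.values(c(1, 2, 3, 4))
res$total                                    # 10
res$mean                                     # 2.5
leaked <- exists("running.total", inherits = FALSE)   # FALSE: the local is gone

# '<<-' modifies a variable in an enclosing environment (generally discouraged).
counter <- 0
bump <- function() {
  counter <<- counter + 1
  invisible(NULL)
}
bump(); bump()
counter                                      # 2

# assign() with envir = .GlobalEnv writes into the workspace explicitly.
set.flag <- function() assign("flag", TRUE, envir = .GlobalEnv)
set.flag()
flag.value <- get("flag", envir = .GlobalEnv)   # TRUE
```

Returning a list keeps the data flow visible at the call site, which is why it is usually preferred over the other two.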
As a side note, when using a function, you need to define clearly:
- What are its inputs
- What does it do
- What does it return
It's not useful, for instance, to use outcome as a function parameter and then read the content of a csv file into a variable that is also named outcome. Your argument is then useless, as it will be overwritten. That's why you had to comment out the line defining your state variable inside the function to actually be able to use state as it was received by the function.
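To make the overwriting problem concrete, here is a small hypothetical pair of functions (bad.lookup and good.lookup are not from the original question):

```r
# The argument 'state' is overwritten, so the caller's value is ignored.
bad.lookup <- function(state) {
  state <- "TX"              # clobbers whatever was passed in
  paste("Looking up", state)
}

# The argument is used exactly as it was received.
good.lookup <- function(state) {
  paste("Looking up", state)
}

bad.result  <- bad.lookup("NY")    # "Looking up TX"
good.result <- good.lookup("NY")   # "Looking up NY"
```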
This surely won't answer all your questions, but hopefully it can help you clarify certain things. For the rest there are plenty of good tutorials to learn further on how to program in R and how/when to use functions. Best of luck and happy learning!
Making a Variable Constant in a Function in R
A possible solution is to define your function within another function:
g <- function( index ){
  function( x ) x + index
}
index <- 3
f <- g( index )
f(4)
index <- 20
f(4)
Now the output of g( index ) is a function which is defined within the (execution) environment of g. This function (f) will look at the value of index in this environment, where it is fixed to 3. That's why it works, but maybe there is a simpler solution.
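To see where the fixed value lives, you can inspect the environment attached to the returned function. This sketch adds a force() call, which is an addition to the code above: it makes sure index is evaluated when g is called rather than at the first call of f.

```r
g <- function(index) {
  force(index)             # evaluate now, so later changes cannot leak in
  function(x) x + index
}

index <- 3
f <- g(index)
index <- 20

result <- f(4)                       # 7
captured <- environment(f)$index     # 3: the copy held by the closure
current <- index                     # 20: the variable outside the closure
```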
Is it possible to make functions recognize variables in scopes above them?
Maybe:
printx <- function() {
  x <- 1
  printy()
  return(x)
}
printy <- function() {
  print(get('x', envir = parent.frame()))
}
> x<-0
> printy()
[1] 0
> printx()
[1] 1
[1] 1
This makes printy print the x that is defined in the environment from which printy was called.
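For contrast, a plain reference to x inside a function follows R's lexical scoping and is resolved where the function was defined, not where it was called. A minimal sketch comparing the two lookups (lookup.lexical, lookup.caller and wrapper are illustrative names):

```r
x <- 0

# Lexical scoping: 'x' is resolved where the function was defined.
lookup.lexical <- function() x

# Caller lookup: 'x' is resolved in the frame that called the function.
lookup.caller <- function() get("x", envir = parent.frame())

wrapper <- function() {
  x <- 1
  lexical <- lookup.lexical()
  caller <- lookup.caller()
  c(lexical = lexical, caller = caller)
}

res <- wrapper()   # lexical = 0, caller = 1
```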
One other possibility would be to create a new environment:
> e1 <- new.env(parent = baseenv())
> assign('x', 12, envir = e1)
> x
[1] 0
> get('x',e1)
[1] 12
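Once the value is stored in its own environment, expressions can also be evaluated there directly. A small sketch, assuming the same e1 as above; the helper variables doubled, in.e1 and names.in.e1 are just for illustration:

```r
e1 <- new.env(parent = baseenv())
assign("x", 12, envir = e1)

x <- 0                                               # unrelated 'x' in the workspace

doubled <- eval(quote(x * 2), envir = e1)            # 24: uses the 'x' stored in e1
in.e1 <- exists("x", envir = e1, inherits = FALSE)   # TRUE
names.in.e1 <- ls(e1)                                # "x"
```

Because the parent of e1 is baseenv(), base functions such as * are still found, but nothing from the workspace is.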
Writing functions vs. line-by-line interpretation in an R workflow
I don't think there is a single answer. The best thing to do is keep the relative merits in mind and then pick an approach for that situation.
1) Functions. The advantage of not using functions is that all your variables are left in the workspace and you can examine them at the end. That may help you figure out what is going on if you have problems.
On the other hand, the advantage of well-designed functions is that you can unit test them, that is, test them in isolation from the rest of the code. Also, when you use a function, modulo certain lower-level constructs, you know that the results of one function won't affect the others unless they are passed out, and this may limit the damage that one function's erroneous processing can do to another's. You can also use the debug facility in R to single-step through your functions.
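As a rough illustration of testing a function in isolation, here is a sketch using base R's stopifnot(). The rescale function is a made-up example, and a dedicated testing package would work just as well:

```r
# Hypothetical function under test: maps a numeric vector onto [0, 1].
rescale <- function(v) {
  rng <- range(v)
  (v - rng[1]) / (rng[2] - rng[1])
}

# Checks that run without touching the rest of the script.
stopifnot(
  isTRUE(all.equal(rescale(c(0, 5, 10)), c(0, 0.5, 1))),
  min(rescale(c(3, 7, 9))) == 0,
  max(rescale(c(3, 7, 9))) == 1
)

# To single-step through the function interactively:
# debug(rescale); rescale(c(1, 2, 3)); undebug(rescale)
```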
2) LCFD. Whether you should use a load/clean/func/do decomposition at all is a second question. The problem with this decomposition, regardless of whether it's done via source or functions, is that you need to run one step just to be able to test out the next, so you can't really test them independently. From that viewpoint it's not the ideal structure.
On the other hand, it does have the advantage that you may be able to replace the load step independently of the other steps if you want to try it on different data and can replace the other steps independently of the load and clean steps if you want to try different processing.
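One way to soften the testing limitation, if the steps are written as functions that take their input as arguments, is to feed a later step a small hand-made data frame instead of the real output of the earlier one. This is only a sketch: the function names and the score column are assumptions, not part of any particular project.

```r
# Hypothetical steps; the column name 'score' is made up for illustration.
load.data <- function(path) read.csv(path, colClasses = "character")

clean.data <- function(df) {
  # Non-numeric entries become NA and are dropped.
  df$score <- suppressWarnings(as.numeric(df$score))
  df[!is.na(df$score), , drop = FALSE]
}

do.analysis <- function(df) mean(df$score)

# The clean and do steps can be tried on a small in-memory data frame,
# without running load.data() first.
raw <- data.frame(score = c("1", "2", "Not Available", "3"),
                  stringsAsFactors = FALSE)
cleaned <- clean.data(raw)
result <- do.analysis(cleaned)   # 2
```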
3) No. of files. There may be a third question implicit in what you are asking: whether everything should be in one or multiple source files. The advantage of putting things in different source files is that you don't have to look at irrelevant items. In particular, routines that are unused or not relevant to the function you are looking at won't interrupt the flow, since you can arrange for them to be in other files.
On the other hand, there may be an advantage in putting everything in one file from the viewpoint of (a) deployment, i.e. you can just send someone that single file; (b) editing convenience, as you can put the entire program in a single editor session, which facilitates searching since you don't have to determine which file a routine is in, lets successive undo commands move backward across all units of your program, and lets a single save store the current state of all modules; and (c) speed, i.e. if you are working over a slow network it may be faster to keep a single file on your local machine and just write it out occasionally, rather than going back and forth to the slow remote.
Note: One other thing to think about is that using packages may be superior for your needs relative to sourcing files in the first place.
Scoping: Local vs Var
They are very similar, but not exactly the same. Both exist only inside of a function, but they work slightly differently.
The var version works its way through all the default variable scopes. See http://help.adobe.com/en_US/ColdFusion/9.0/Developing/WSc3ff6d0ea77859461172e0811cbec09af4-7fdf.html
Local will match only a variable in the local scope. Consider the following:
<cffunction name="himom">
<cfoutput>
<p><b>try 0:</b> #request_method#</p>
<!--- you might think that the variable does not exist,
but it does because it came from cgi scope --->
</cfoutput>
<cfquery name="myData" datasource="Scorecard3">
SELECT 'This is via query' AS request_method
</cfquery>
<!--- Data is now being loaded into a query --->
<cfoutput query="myData">
<p><b>try 1:</b> #request_method#</p>
</cfoutput>
<!--- This one now came from the query --->
<cfset var request_method = "This is Var">
<!--- We now declare via var --->
<cfoutput query="myData">
<p><b>try 2:</b> #request_method#</p>
</cfoutput>
<!--- the query version disappears and now
the var version takes precedence --->
<cfset local.request_method = "This is local">
<!--- now we declare via local --->
<cfoutput query="myData">
<p><b>try 3:</b> #request_method#</p>
</cfoutput>
<!--- The local method takes precedence --->
<cfoutput>
<p><b>try 4:</b> #request_method#</p>
<!--- in fact it even takes precedence over the var --->
<p><b>try 5:</b> #local.request_method#</p>
<!--- there is no question where this comes from --->
</cfoutput>
</cffunction>
<cfset himom()>
Results of the above
try 0: GET
try 1: This is via query
try 2: This is Var
try 3: This is local
try 4: This is local
try 5: This is local
In summary
When developing, you could use either to make sure that variables only exist inside of a function, but always prefixing your variables with local goes a long way in making sure that your code is clearly understood.
How does local function myFunc() work in Lua?
In Lua, when you write:
local function myFunc()
--...
end
It is essentially the same thing as:
local myFunc = function()
--...
end
In the same manner, the following:
function myFunc()
--...
end
Is the same as:
myFunc = function()
--...
end
It's simply a shortcut for assigning a function to a variable. That's because in Lua functions are first-class values: there is no special place where declared functions are stored, and they are held in variables the same as any other data type.
Caveat
It's worth noting that there is a very small difference in behavior when using local function myFunc() instead of local myFunc = function().
When you declare the function using the former syntax, code inside the function has access to the variable myFunc, so the function can refer to itself. With the latter syntax, the local myFunc is not yet in scope inside the function body, so accessing myFunc there refers to a global of that name, which is nil unless one has been defined.
So that means the following code:
local function myFunc()
--...
end
Is actually more accurately represented as:
local myFunc
myFunc = function()
--...
end
This is a small difference, but may be worth keeping in mind e.g. if you need to write a recursive function.