How to Prevent Functions Polluting Global Namespace

How to prevent functions polluting global namespace?

Firstly, as @Spacedman has said, you'll be best served by a package, but there are other options.

S3 Methods

R's original "object orientation" is known as S3. The majority of R's code base uses this particular paradigm. It is what makes plot() work for all kinds of objects. plot() is a generic function, and the R Core Team and package developers can write, and have written, their own methods for plot(). Strictly, these methods have names like plot.foo(), where foo is a class of object for which the function defines a plot() method. The beauty of S3 is that you hardly ever need to know about or call plot.foo(); you just use plot(bar) and R works out which plot() method to dispatch to based on the class of object bar.

In the comments on your question, you mention that you have a function populate() that has methods (in effect) for classes "crossvalidate" and "prod", which you keep in separate .r files. The S3 way to set this up is:

populate <- function(x, ...) { ## add whatever args you want/need
    UseMethod("populate")
}

populate.crossvalidate <-
    function(x, y, z, ...) { ## add args, but must include those of the generic
    ## function code here
}

populate.prod <-
    function(x, y, z, ...) { ## add args, but must include those of the generic
    ## function code here
}

Then, given some object bar with class "prod", calling

populate(bar)

will result in R calling populate() (the generic). It then looks for a function named populate.prod, because "prod" is the class of bar. It finds our populate.prod() and so dispatches to that function, passing on to it the arguments we initially specified.

So you see that you only ever refer to the methods using the name of the generic, not the full function name. R works out for you what method needs to be called.

The two populate() methods can have very different additional arguments, with the exception that, strictly, they must also include all the arguments of the generic function. So in the example above, all methods should have the arguments x and ... (the dots). (There is an exception for methods that employ formula objects, but we don't need to worry about that here.)

Package Namespaces

Since R 2.14.0, all R packages have had their own namespace, even if the package author did not provide one, although namespaces have been around in R for a lot longer than that.

In your example, we wish to register the populate() generic and its two methods with the S3 system. We also wish to export the generic function. Usually we don't want or need to export the individual methods. So, pop your functions into .R files in the R folder of the package sources, and then in the top level of the package sources create a file named NAMESPACE and add the following statements:

export(populate) ## export generic

S3method(populate, crossvalidate) ## register methods
S3method(populate, prod)

Then, once you have installed your package, you will note that you can call populate(), but R will complain if you try to call populate.prod() etc. directly by name, from the prompt or in another function. This is because the functions that are the individual methods have not been exported from the namespace and hence are not visible outside it. Any function in your package that calls populate() will be able to access the methods you have defined, and code outside your package that calls populate() is still dispatched to them because they are registered, but code outside your package can't call the methods by name. If you want, you can call non-exported functions using the ::: operator, i.e.

mypkg:::populate.crossvalidate(foo, bar)

will work, where mypkg is the name of your package.

To be honest, you don't even need a NAMESPACE file, as R will auto-generate one when you install the package, one that automatically exports all functions. That way your two methods will be visible as populate.xxx() (where xxx is the particular method) and will operate as S3 methods.

Read Section 1, "Creating R packages", of the Writing R Extensions manual for details of what is involved, but you won't need to do half of this if you don't want to, especially if the package is for your own use. Just create a DESCRIPTION file and the appropriate package folders (i.e. R and man) in the top level of the package sources, and stick your .R files in R. Write a single .Rd file in man where you add

\name{Misc Functions}
\alias{populate}
\alias{populate.crossvalidate}
\alias{populate.prod}

at the top of the file. Add \alias{} for any other functions you have. Then you'll need to build and install the package.

Alternative using `sys.source()`

Although I don't (can't!) really recommend what I mention below as a long-term viable option here, there is an alternative that will allow you to isolate the functions from individual .r files as you initially requested. This is achieved through the use of environments not namespaces and doesn't involve creating a package.

The sys.source() function can be used to source R code/functions from a .R file and evaluate it in an environment. As your .R file is creating/defining functions, if you source it inside another environment then those functions will be defined there, in that environment. They won't be visible on the standard search path by default, and hence a populate() function defined in crossvalidate.R will not clash with a populate() defined in prod.R as long as you use two separate environments. When you need to use one set of functions, you can attach the environment to the search path, upon which its functions will be visible to everything, and when you are done you can detach it. Then attach the other environment, use it, detach it, etc. Or you can arrange for R code to be evaluated in a specific environment using things like eval().

Like I said, this isn't a recommended solution but it will work, after a fashion, in the manner you describe. For example

## two source files that both define the same function
writeLines("populate <- function(x) 1:10", con = "crossvalidate.R")
writeLines("populate <- function(x) letters[1:10]", con = "prod.R")

## create two environments
crossvalidate <- new.env()
prod <- new.env()

## source the .R files into their respective environments
sys.source("crossvalidate.R", envir = crossvalidate)
sys.source("prod.R", envir = prod)

## show that there are no populates find-able on the search path

> ls()
[1] "crossvalidate" "prod" 
> find("populate")
character(0)

Now, attach one of the environments and call populate():

> attach(crossvalidate)
> populate()
 [1]  1  2  3  4  5  6  7  8  9 10
> detach(crossvalidate)

Now call the function in the other environment

> attach(prod)
> populate()
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
> detach(prod)

Clearly, each time you want to use a particular function, you need to attach() its environment and then call it, followed by a detach() call, which is a pain.

I did say you can arrange for R code (expressions, really) to be evaluated in a stated environment. You can use eval() or with() for this, for example:

> with(crossvalidate, populate())
 [1]  1  2  3  4  5  6  7  8  9 10

At least now you only need a single call to run the version of populate() of your choice. However, if calling the functions by their full name, e.g. populate.crossvalidate(), is too much effort (as per your comments), then I dare say that even the with() idea will be too much hassle. And anyway, why would you use this when you can quite easily have your own R package?

How not to pollute the global namespace with declarations of a C header?

In the comments on the question, @AnalPhabet suggested, sarcastically, that one should #include a C header inside a namespace. @n.m. confirmed that it is actually a working solution, and I have now tested it on my own setup, and fortunately it works fine.

(I have no idea whether this is implementation-specific, but I tested it with both g++ and clang++ and it works.)

It does not solve the opaqueness problem, but at least it makes it a bit harder to access the raw C data directly, as it now lives in a separate namespace: the user can't access it accidentally, only deliberately.

So, my_header.hpp should look like this:

namespace my
{
    extern "C"
    {
        #include "my_header.h"
    }

    enum class Consts
    {
        ALPHA = my_Consts_ALPHA,
        BETA  = my_Consts_BETA,
    };

    class Type : public my_Type
    {
        public:
            void
            method(Consts constant);
    };
}

So wherever my_header.hpp is #include'd, the user can only access the C values as follows:

my::my_Consts_ALPHA       // The wrapped value is => my::Consts::ALPHA
my::my_Type               // The wrapped value is => my::Type
my::my_Type_method(t,..)  // The wrapped value is => t.method(..)

How do I avoid polluting the global namespace with this function?

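You are not polluting the global namespace by declaring the variable loopBoolean with var inside your function: a var declaration is scoped to the function that contains it.

It would only be a problem if you left out the var keyword, because in non-strict code an assignment to an undeclared variable creates a global.

A better way to write it would be: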

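// prompt() returns null if the user cancels, so fall back to "" to keep toLowerCase() from throwing
while ((prompt("type something") || "").toLowerCase() !== "gogo") {
  // do something here if you need to
}
alert("good answer!");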
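To see the difference between the two cases described above, here is a minimal sketch that can be run in Node.js or a browser console. The function names (withVar, withoutVar, withoutVarStrict) are made up for illustration. The sloppy-mode version is built with the Function constructor only because such a body is never strict unless it says so, which keeps the demonstration independent of how the file itself is loaded; in an ordinary non-strict script, a plain function would leak in the same way.

```javascript
// Declared with var: loopBoolean is local to the function.
function withVar() {
  var loopBoolean = true;
  return loopBoolean;
}

// The same assignment without var. Function-constructor bodies run in
// sloppy mode (unless they contain "use strict"), so this assignment
// silently creates a property on the global object.
const withoutVar = new Function("loopBoolean = true; return loopBoolean;");

// The same mistake in strict mode throws instead of leaking.
function withoutVarStrict() {
  "use strict";
  loopBoolean2 = true; // ReferenceError: loopBoolean2 is not defined
  return loopBoolean2;
}

withVar();
console.log("after withVar:", typeof globalThis.loopBoolean); // undefined

withoutVar();
console.log("after withoutVar:", typeof globalThis.loopBoolean); // boolean

try {
  withoutVarStrict();
} catch (e) {
  console.log("strict mode:", e.name); // ReferenceError
}
```

This is also why "use strict" is a cheap safety net: the forgotten var becomes an immediate error instead of a hidden global.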

Can I use $(document).ready(function() { ... }); to prevent global namespace pollution?

It's a good start, but you also need to ensure you don't create any global variables in the function. Setting strict mode (with "use strict";) should go most of the way towards that. Checking every mention of window will go the rest of the way.

It does have the side effect of delaying execution of the code until the DOM is ready. That isn't always desirable. You might want to use a simple IIFE (immediately invoked function expression) instead.
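A minimal sketch of that IIFE approach, assuming the code does not need to wait for the DOM; the names counter and increment are hypothetical. Everything declared inside the function stays local, and "use strict" turns an accidental undeclared assignment into an error rather than a new global.

```javascript
(function () {
  "use strict";

  // Local to this function: neither name is added to the global object.
  var counter = 0;
  function increment() {
    counter += 1;
    return counter;
  }

  increment();
  increment();
  console.log("inside:", counter); // inside: 2

  // Under "use strict", a typo such as `countr = 5;` would throw a
  // ReferenceError here instead of silently creating a global.
})();

// Outside the IIFE nothing leaked.
console.log(typeof counter);   // undefined
console.log(typeof increment); // undefined
```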

What is namespace pollution?

A namespace is simply the space in which names exist (seems obvious enough now).

Let's say you have two pieces of code, one to handle linked lists, the other to handle trees. Now both of these pieces of code would benefit from a getNext() function, to assist in traversal of the data structure.

However, if they both define that function with the same name, you may have a clash. What will your compiler do when you enter the following code?

xyzzy = getNext (xyzzy);

In other words, which getNext() do you actually want to use? There are numerous ways to solve this, such as with object-oriented code, where you would use:

xyzzy = xyzzy.getNext();

and that would auto-magically select the correct one by virtue of the fact you've specified the type via the variable xyzzy itself.

But, even with mostly-OO code, there may be situations where you have a conflict, and that's where namespaces enter the picture. They allow you to place the names into their own area so as to distinguish them.
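The same idea can be sketched in JavaScript, where a plain object is often used as a lightweight namespace. This is an illustrative assumption rather than part of the original answer: the LinkedList and Tree objects, the node shapes, and the simplified Tree.getNext() (which just returns the first child instead of doing a full traversal) are all hypothetical.

```javascript
// Two hypothetical "namespaces" (plain objects), each with its own getNext().
const LinkedList = {
  getNext(node) {
    return node.next; // next node in the list, or null at the end
  },
};

const Tree = {
  getNext(node) {
    // Simplified for illustration: "next" means the first child.
    return node.children.length > 0 ? node.children[0] : null;
  },
};

const listNode = { value: 1, next: { value: 2, next: null } };
const treeNode = { value: "root", children: [{ value: "leaf", children: [] }] };

// Qualifying the name says which getNext() you mean, so the two never clash.
console.log(LinkedList.getNext(listNode).value); // 2
console.log(Tree.getNext(treeNode).value);       // leaf
```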

C++, as one example, places all its standard library stuff into the std namespace. If, for some reason, you need an fopen() or rand() function that works differently from the one in the library, you can place it in your own namespace to keep them separate.

Now that describes namespace clashes. Technically, namespace pollution is simply leaving your symbols in a namespace where they shouldn't really be. This doesn't necessarily lead to clashes but it makes it more likely.

The reason making a function static helps (in C-like languages) has to do with the names being made available to the world outside the given translation unit (when linking, for example). With the code:

int get42 (void) { return 42; }
int main (void) { return get42(); }

both of those functions are made available to the linker.

Unless you have a need to call get42() from somewhere else, making it static:

static int get42 (void) { return 42; }
int main (void) { return get42(); }

will prevent it from polluting the namespace maintained by the linker – in C, applying the static qualifier to a file-level object or function gives it internal linkage.

It's similar to the C++ namespaces in that you can have a static int get42() in four hundred different source files and they won't interfere with each other.

What does it mean that the global namespace would be polluted?

Quick Note On Garbage Collection

As variables lose scope, they will be eligible for garbage collection. If they are scoped globally, then they will not be eligible for collection until the global namespace loses scope.

Here is an example:

var arra = [];
for (var i = 0; i < 2003000; i++) {
 arra.push(i * i + i);
}

Adding this to your global namespace should (at least for me, with Firefox on Windows 7) add about 10,000 KB of memory usage, which will not be collected. Other browsers may handle this differently.

Whereas having that same code in a scope which goes out of scope like this:

(function(){
 var arra = [];
 for (var i = 0; i < 2003000; i++) {
  arra.push(i * i + i);
 }
})();

will allow arra to go out of scope after the closure executes and become eligible for garbage collection.

Global Namespace Is Your Friend

Despite the many claims against using the global namespace, it is your friend. And like a good friend, you should not abuse your relationship.

Be Gentle

Don't abuse (usually referred to as "polluting") the global namespace. By not abusing the global namespace, I mean: do not create multiple global variables. Here is a bad example of using the global namespace.

var x1 = 5;
var x2 = 20;
var y1 = 3;
var y2 = 16;

var rise = y2 - y1;
var run = x2 - x1;

var slope = rise / run;

var risesquared = rise * rise;
var runsquared = run * run;

var distancesquared = risesquared + runsquared;

var distance = Math.sqrt(distancesquared);

This is going to create 11 global variables which could possibly be overwritten or misconstrued somewhere.
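To illustrate how such a global can be overwritten, here is a small sketch that simulates two scripts loaded into the same page. The script names and values are hypothetical; the explicit globalThis assignments stand in for top-level var declarations, which in a classic (non-module) browser script become properties of the global object.

```javascript
// "Script A" (hypothetical) stores its result in a global...
function scriptA() {
  globalThis.distance = 25;
}

// ...and "script B" (hypothetical, written by someone else) reuses the name.
function scriptB() {
  globalThis.distance = "far";
}

scriptA();
scriptB(); // silently replaces script A's value

// Script A later reads "its" global back and gets script B's value instead.
console.log(globalThis.distance * 2); // NaN
```

Neither script gets an error; script A simply computes with the wrong value, which is the kind of silent clash that keeping everything behind a single global object avoids.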

Be Resourceful

A more resourceful approach, which does not pollute the global namespace, would be to wrap it all in the module pattern and use only one global variable while exposing multiple methods.

Here is an example: (Please note this is simple and there is no error handling)

//Calculate is the only exposed global variable
var Calculate = function () {
  //all definitions in this closure are local, and will not be exposed to the global namespace
  var Coordinates = []; //array for coordinates
  var Coordinate = function (xcoord, ycoord) { //definition for type Coordinate
    this.x = xcoord; //assign values similar to a constructor
    this.y = ycoord;
  };

  return { //these methods are exposed through the object that Calculate() returns
    AddCoordinate: function (x, y) {
      Coordinates.push(new Coordinate(x, y)); //Add a new coordinate
    },

    Slope: function () { //Calculates slope and returns the value
      var c1 = Coordinates[0];
      var c2 = Coordinates[1];
      //rise over run; the parentheses matter because / binds tighter than -
      return (c2.y - c1.y) / (c2.x - c1.x);
    },

    Distance: function () {
      //even with an excessive amount of variables declared, these are all still local
      var c1 = Coordinates[0];
      var c2 = Coordinates[1];

      var rise = c2.y - c1.y;
      var run = c2.x - c1.x;

      var risesquared = rise * rise;
      var runsquared = run * run;

      var distancesquared = risesquared + runsquared;

      var distance = Math.sqrt(distancesquared);

      return distance;
    }
  };
};

//this is a "self executing closure" and is used because these variables will be
//scoped to the function, and will not be available globally nor will they collide
//with any variable names in the global namespace
(function () {
 var calc = Calculate();
 calc.AddCoordinate(5, 3);   //(x1, y1) from the example above
 calc.AddCoordinate(20, 16); //(x2, y2)
 console.log(calc.Slope());
 console.log(calc.Distance());
})();
