When to Use 'With' Function and Why Is It Good

with is a wrapper for functions with no data argument

There are many functions that work on data frames and take a data argument so that you don't need to retype the name of the data frame every time you reference a column. lm, plot.formula, subset, and transform are just a few examples.

with is a general purpose wrapper to let you use any function as if it had a data argument.

Using the mtcars data set, we could fit a model with or without using the data argument:

# this is obviously annoying
mod = lm(mtcars$mpg ~ mtcars$cyl + mtcars$disp + mtcars$wt)

# this is nicer
mod = lm(mpg ~ cyl + disp + wt, data = mtcars)

However, if (for some strange reason) we wanted to find the mean of cyl + disp + wt, there is a problem because mean doesn't have a data argument like lm does. This is the issue that with addresses:

# without with(), we would be stuck here:
z = mean(mtcars$cyl + mtcars$disp + mtcars$wt)

# using with(), we can clean this up:
z = with(mtcars, mean(cyl + disp + wt))

Wrapping foo() in with(data, foo(...)) lets us use any function foo as if it had a data argument - which is to say we can use unquoted column names, preventing repetitive data_name$column_name or data_name[, "column_name"].

When to use with

Use with whenever you like interactively (R console) and in R scripts to save typing and make your code clearer. The more frequently you would need to re-type your data frame name for a single command (and the longer your data frame name is!), the greater the benefit of using with.

Also note that with isn't limited to data frames. From ?with:

For the default with method this may be an environment, a list, a data frame, or an integer as in sys.call.

I don't often work with environments, but when I do I find with very handy.

When you need pieces of a result for one line only

As @Rich Scriven suggests in comments, with can be very useful when you need to use the results of something like rle. If you only need the results once, then his example with(rle(data), lengths[values > 1]) lets you use the rle(data) results anonymously.

When to avoid with

When there is a data argument

Many functions that have a data argument use it for more than just easier syntax when you call it. Most modeling functions (like lm), and many others too (ggplot!) do a lot with the provided data. If you use with instead of a data argument, you'll limit the features available to you. If there is a data argument, use the data argument, not with.

Adding to the environment

In my example above, the result was assigned in the global environment (z = with(...)). To make an assignment inside the list/environment/data, you can use within. (In the case of data.frames, transform is also good.)

In packages

Don't use with in R packages. There is a warning in help(subset) that could apply just about as well to with:

Warning This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.

If you build an R package using with, then when you check it you will probably get warnings or notes about variables with no visible binding. This can make the package unacceptable to CRAN.

Alternatives to with

Don't use attach

Many (mostly dated) R tutorials use attach to avoid re-typing data frame names by making columns accessible to the global environment. attach is widely considered to be bad practice and should be avoided. One of the main dangers of attach is that data columns can become out of sync if they are modified individually. with avoids this pitfall because it is invoked one expression at a time. There are many, many questions on Stack Overflow where new users are following an old tutorial and run into problems because of attach. The easy solution is always: don't use attach.

Using with all the time seems too repetitive

If you are doing many steps of data manipulation, you may find yourself beginning every line of code with with(my_data, .... You might think this repetition is almost as bad as not using with. Both the data.table and dplyr packages offer efficient data manipulation with non-repetitive syntax. I'd encourage you to learn to use one of them. Both have excellent documentation.

In Python, when should I use a function instead of a method?

My general rule is this - is the operation performed on the object or by the object?

If it is done by the object, it should be a member operation. If it could apply to other things too, or is done by something else to the object, then it should be a function (or perhaps a member of something else).

When introducing programming, it is traditional (albeit implementation incorrect) to describe objects in terms of real-world objects such as cars. You mention a duck, so let's go with that.

class duck:
    def __init__(self): pass
    def eat(self, o): pass
    def crap(self): pass
    def die(self): pass
    # ....

In the context of the "objects are real things" analogy, it is "correct" to add a method for anything which the object can do. So say I want to kill off a duck, do I add a .kill() to the duck? No... as far as I know animals do not commit suicide. Therefore if I want to kill a duck I should do this:

def kill(o):
    # duck, dog and nyancat are classes assumed to be defined elsewhere
    if isinstance(o, duck):
        o.die()
    elif isinstance(o, dog):
        print("WHY????")
        o.die()
    elif isinstance(o, nyancat):
        raise Exception("NYAN " * 9001)
    else:
        print("can't kill it.")

Moving away from this analogy, why do we use methods and classes? Because we want to contain data and hopefully structure our code in a manner such that it will be reusable and extensible in the future. This brings us to the notion of encapsulation which is so dear to OO design.

The encapsulation principle is really what this comes down to: as a designer you should hide everything about the implementation and class internals that is not absolutely necessary for any user or other developer to access. Because we deal with instances of classes, this reduces to "what operations are crucial on this instance". If an operation is not instance specific, then it should not be a member function.

TL;DR:
what @Bryan said. If it operates on an instance and needs to access data which is internal to the class instance, it should be a member function.
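
A minimal sketch of that rule, assuming a toy Duck class with an internal _alive flag (both names are invented for illustration). die() needs the instance's internal data, so it is a method; kill() is something done to the object from outside and works on anything with a die() method, so it is a plain function:

```python
class Duck:
    """Toy class; the names here are illustrative only."""

    def __init__(self):
        self._alive = True  # internal state, only touched by methods

    def die(self):
        # Needs the instance's internal data, so it is a method.
        self._alive = False

    def is_alive(self):
        return self._alive


def kill(creature):
    """Done *to* an object by something else, so it is a plain function.

    Works on anything that offers a die() method, not only Duck.
    """
    creature.die()


donald = Duck()
kill(donald)
print(donald.is_alive())  # False
```

Unlike the isinstance chain above, this version relies on duck typing, which is one possible design choice rather than the only one.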

Classes vs. Functions

Create a function. Functions do specific things, classes are specific things.

Classes often have methods, which are functions that are associated with a particular class, and do things associated with the thing that the class is - but if all you want is to do something, a function is all you need.

Essentially, a class is a way of grouping functions (as methods) and data (as properties) into a logical unit revolving around a certain kind of thing. If you don't need that grouping, there's no need to make a class.
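
As a rough illustration (the names mean and RunningMean are made up for this sketch), a computation that needs no memory between calls fits a function, while one that has to carry data from call to call fits a class:

```python
def mean(values):
    """Stateless: just does something with its input."""
    return sum(values) / len(values)


class RunningMean:
    """Stateful: keeps a count and a total between calls."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        # Update the stored data, then report the mean so far.
        self.count += 1
        self.total += value
        return self.total / self.count


print(mean([2, 4, 6]))  # 4.0

rm = RunningMean()
rm.add(2)
rm.add(4)
latest = rm.add(6)
print(latest)  # 4.0
```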

Is it a good practice to use a function call inside a function?

Q1. I want to know if there is any way to check how much stack memory my code is going to use, performance check of the code(optimization), memory leakage.

Not really. The C standard does not even mention the stack, and any C compiler can create binaries that do not use the stack and still conform to the C standard.

However, in reality a stack is almost always used and the overhead is very small. Just sum up the sizes of all the local variables and you will get a good estimate.

Performance should not be predicted. It should be measured, and then you optimize if necessary.

It's very hard, if not impossible, for a compiler to detect memory leaks in a reliable way. You can use programs like Valgrind for that.

Q2. if the structure of my code is like,nested functions,
because of the function2 call inside a for loop is it going to use more stack memory than using once only?

No. Each time function2() is called, a new stackframe is created with enough space for 100000 doubles. But it will be released immediately when the function returns. The problem here is not calling a function in a loop. The problem is that you're allocating HUGE arrays on the stack, which might become a problem. You should consider allocating them dynamically instead. Basically, it will look like this:

#include <stdlib.h>

void function2(void)
{
    double *var = malloc(100000 * sizeof(*var));
    if (var == NULL) return; /* allocation failed */
    /* Code */
    free(var);
}

If you are using recursive functions, the stack might become a problem. Let's consider this sum function that sums all natural numbers up to num:

unsigned long long sum(unsigned long long num)
{
    if (num == 0) return num;
    return num + sum(num - 1);
}

A long long is typically 8 bytes, so if you use this function for very large num (maybe 100000 or 1000000) you might encounter stack problems.

When should I be using classes in Python?

Classes are the pillar of Object Oriented Programming. OOP is highly concerned with code organization, reusability, and encapsulation.

First, a disclaimer: OOP is partially in contrast to Functional Programming, which is a different paradigm used a lot in Python. Not everyone who programs in Python (or surely most languages) uses OOP. You can do a lot in Java 8 that isn't very Object Oriented. If you don't want to use OOP, then don't. If you're just writing one-off scripts to process data that you'll never use again, then keep writing the way you are.

However, there are a lot of reasons to use OOP.

Some reasons:

  • Organization:
    OOP defines well known and standard ways of describing and defining both data and procedure in code. Both data and procedure can be stored at varying levels of definition (in different classes), and there are standard ways of talking about these definitions. That is, if you use OOP in a standard way, it will help your later self and others understand, edit, and use your code. Also, instead of using a complex, arbitrary data storage mechanism (dicts of dicts or lists or dicts or lists of dicts of sets, or whatever), you can name pieces of data structures and conveniently refer to them.

  • State: OOP helps you define and keep track of state. For instance, in a classic example, if you're creating a program that processes students (for instance, a grade program), you can keep all the info you need about them in one spot (name, age, gender, grade level, courses, grades, teachers, peers, diet, special needs, etc.), and this data is persisted as long as the object is alive, and is easily accessible. In contrast, in pure functional programming, state is never mutated in place.

  • Encapsulation:
    With encapsulation, procedure and data are stored together. Methods (an OOP term for functions) are defined right alongside the data that they operate on and produce. In a language like Java that allows for access control, or in Python, depending upon how you describe your public API, this means that methods and data can be hidden from the user. What this means is that if you need or want to change code, you can do whatever you want to the implementation of the code, but keep the public APIs the same.

  • Inheritance:
    Inheritance allows you to define data and procedure in one place (in one class), and then override or extend that functionality later. For instance, in Python, I often see people creating subclasses of the dict class in order to add additional functionality. A common change is overriding the method that is invoked when a requested key doesn't exist in the dictionary, so that it returns a default value instead of throwing an exception. This allows you to extend your own code now or later, allows others to extend your code, and allows you to extend other people's code.

  • Reusability: All of these reasons and others allow for greater reusability of code. Object oriented code allows you to write solid (tested) code once, and then reuse it over and over. If you need to tweak something for your specific use case, you can inherit from an existing class and override the existing behavior. If you need to change something, you can change it all while maintaining the existing public method signatures, and no one is the wiser (hopefully).
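
The dict subclass mentioned under Inheritance could look roughly like this. It is only a sketch: the class name DefaultingDict and its default parameter are invented here. The __missing__ hook itself is standard Python: dict's [] lookup calls it on subclasses when the key is absent.

```python
class DefaultingDict(dict):
    """dict subclass that returns a default instead of raising KeyError."""

    def __init__(self, default, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default = default

    def __missing__(self, key):
        # Called by dict's [] lookup (on subclasses) when key is absent.
        # The key is not stored; only the default is returned.
        return self.default


grades = DefaultingDict(0.0, {"math": 3.3})
print(grades["math"])     # 3.3
print(grades["history"])  # 0.0
```

For this exact use case the standard library's collections.defaultdict may already be enough. The subclass is shown only to illustrate overriding inherited behavior.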

Again, there are several reasons not to use OOP, and you don't need to. But luckily with a language like Python, you can use just a little bit or a lot, it's up to you.

An example of the student use case (no guarantee on code quality, just an example):

Object Oriented

class Student(object):
    def __init__(self, name, age, gender, level, grades=None):
        self.name = name
        self.age = age
        self.gender = gender
        self.level = level
        self.grades = grades or {}

    def setGrade(self, course, grade):
        self.grades[course] = grade

    def getGrade(self, course):
        return self.grades[course]

    def getGPA(self):
        return sum(self.grades.values())/len(self.grades)

# Define some students
john = Student("John", 12, "male", 6, {"math":3.3})
jane = Student("Jane", 12, "female", 6, {"math":3.5})

# Now we can get to the grades easily
print(john.getGPA())
print(jane.getGPA())

Standard Dict

def calculateGPA(gradeDict):
    return sum(gradeDict.values())/len(gradeDict)

students = {}
# We can set the keys to variables so we might minimize typos
name, age, gender, level, grades = "name", "age", "gender", "level", "grades"
john, jane = "john", "jane"
math = "math"
students[john] = {}
students[john][age] = 12
students[john][gender] = "male"
students[john][level] = 6
students[john][grades] = {math:3.3}

students[jane] = {}
students[jane][age] = 12
students[jane][gender] = "female"
students[jane][level] = 6
students[jane][grades] = {math:3.5}

# At this point, we need to remember who the students are and where the grades are stored. Not a huge deal, but avoided by OOP.
print(calculateGPA(students[john][grades]))
print(calculateGPA(students[jane][grades]))

Should a function be created only if I use it more than once?

This is totally up to you.

However,

Separating code blocks into different functions can make the code more readable (when it's not done excessively). Functions are not only meant for repeated use of code; they're also intended to make the code more organized and easier to understand. You might get lost if you try to read through a long function that does a lot of tasks at once. However, if you take this function and break some parts of it into smaller functions with proper naming, it will be much shorter and clearer, both for you to maintain in the future and for the next programmer working on your project to understand what you've done.

Also, a good practice is to create objects that deal with certain more specific tasks. This will allow you (among many other benefits) to update the code by extending the classes without having to harm the original functionality.

As per your edit, a good way to determine whether or not you should split your function into pieces is found in the "function summary" you've written. When you have more than 1-2 tasks, it is a good idea to break them into separate functions. I recommend writing a function for each of the following:

  • Fill arrays with info of files in directories
  • Process the TXT line by line and check whether the ID in the TXT matches the "Completed" files array
  • Publish the array in an external product
  • Check the other arrays to make a report of what is missing
  • Save the errors found in an array, then save the array to an errors.txt
  • Of course, the function that wraps everything together and, when done, returns the report
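
A rough Python sketch of that shape, using in-memory lists instead of real files. All function names and data shapes here are hypothetical, not taken from the asker's code. Each helper does one small job, and a wrapper ties them together and returns the report:

```python
# Hypothetical helpers; names and data shapes are assumptions for illustration.

def find_missing(expected_ids, completed_ids):
    """Return the IDs that were expected but never completed, in order."""
    completed = set(completed_ids)
    return [i for i in expected_ids if i not in completed]


def format_errors(missing_ids):
    """Turn missing IDs into human-readable error lines."""
    return [f"missing: {i}" for i in missing_ids]


def build_report(expected_ids, completed_ids):
    """Wrapper that ties the small steps together and returns the report."""
    missing = find_missing(expected_ids, completed_ids)
    return {"missing": missing, "errors": format_errors(missing)}


report = build_report(["a1", "b2", "c3"], ["b2"])
print(report["errors"])  # ['missing: a1', 'missing: c3']
```

Each helper can be read and tested on its own, which is most of the benefit even if none of them is ever called twice.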

Is it bad practice to use a function that changes stuff inside a condition, making the condition order-dependent?

Conditions are order-dependent whether you change the variables used in the condition or not. The two if statements that you used as an example are different and will be different whether you use myFunction() or not. They are equivalent to:

if (myFunction()) {
    if (a === 2) {
        alert("Hello, world!")
    }
}

// Alert does not pop up.
if (a === 3) {
    if (myFunction()) {
        alert("Hello, universe!")
    }
}

In my opinion, the bad practice in your code is not the fact that you change the value of the condition's operands inside the condition, but the fact that your application state is exposed and manipulated inside a function that does not even accept this state-changing variable as a parameter. We usually try to isolate functions from the code outside their scope and use their return value to affect the rest of the code. Global variables are 90% of the time a bad idea, and as your code base gets larger and larger they tend to create problems that are difficult to trace, debug and solve.
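
The same idea, sketched in Python rather than JavaScript. The names counter, bump_hidden and bump are invented for this illustration and do not come from the asker's code. The first function changes module-level state behind the caller's back. The second takes the state as a parameter and hands the new value back:

```python
# Hypothetical names throughout; this mirrors the idea, not the asker's code.

counter = 2  # module-level state


def bump_hidden():
    """Avoid: changes state the caller cannot see from the call site."""
    global counter
    counter += 1
    return True


def bump(value):
    """Prefer: state comes in as a parameter and goes out as a result."""
    return value + 1


state = 2
state = bump(state)  # the change is visible right where it happens
print(state == 3)
```

With the second style, a condition that uses state only depends on values that were assigned in plain sight, so the order of the tests is easier to reason about.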

When to use a Class vs. Function in PHP

Classes are used for representing data as objects. If you're representing something like a user's data, or an auction bid, creating a User object or AuctionBid object makes it easier to keep that data together, pass it around in your code, and make it more easily understandable to readers. These classes would have attributes (data fields like numbers, strings, or other objects) as well as methods (functions that operate on any instance of the class).

Classes don't usually offer any benefits in terms of performance, but they very rarely have any negative effects either. Their real benefit is in making the code clearer.

I recommend you read the PHP5 Object-Oriented Programming guide and the Wikipedia OOP entry.
