How are Python in-place operator functions different than the standard operator functions?
First, you need to understand the difference between __add__ and __iadd__.

An object's __add__ method is regular addition: it takes two parameters, returns their sum, and doesn't modify either parameter.

An object's __iadd__ method also takes two parameters, but makes the change in-place, modifying the contents of the first parameter. Because this requires object mutation, immutable types (like the standard number types) shouldn't have an __iadd__ method.
a + b uses __add__. a += b uses __iadd__ if it exists; if it doesn't, it is emulated via __add__, as in tmp = a + b; a = tmp. operator.add and operator.iadd differ in the same way.

To the other question: operator.iadd(x, y) isn't equivalent to z = x; z += y, because if no __iadd__ exists, __add__ will be used instead. You need to assign the value to ensure that the result is stored in both cases: x = operator.iadd(x, y).
You can see this yourself easily enough:
import operator
a = 1
operator.iadd(a, 2)
# a is still 1, because ints don't have __iadd__; iadd returned 3
b = ['a']
operator.iadd(b, ['b'])
# lists do have __iadd__, so b is now ['a', 'b']
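The same split shows up when you define your own class; a minimal sketch (the Bag class and its items attribute are invented for illustration):

```python
class Bag:
    """Minimal container showing __add__ vs __iadd__."""
    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        # Regular addition: build and return a brand-new Bag.
        return Bag(self.items + other.items)

    def __iadd__(self, other):
        # In-place addition: mutate self and return it.
        self.items.extend(other.items)
        return self

a = Bag([1])
b = Bag([2])
c = a + b          # new object; a is untouched
a += b             # same object, mutated in place
print(c.items)     # [1, 2]
print(a.items)     # [1, 2]
```

Because __iadd__ returns self, the name a stays bound to the same (now modified) object after a += b.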
Difference between operators and methods
If I understand the question correctly...

In a nutshell, every operator ultimately calls a method on an object. You can find the methods behind the "expression operators" among Python's magic ("dunder") class methods.

So why does Python have "sexy" syntax like [x:y], [x], +, -? Because these notations are familiar to most developers, and even to people who don't program: math symbols like + and - catch the eye, and the reader immediately knows what happens. Indexing, similarly, is common syntax in many languages.

But there is no natural symbolic way to express methods like upper, replace, or strip, so there are no "expression operators" for them.

So, as for what is different between "expression operators" and methods, I'd say it's just the way they look.
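A short sketch of that correspondence between operator syntax and the underlying magic methods:

```python
a = [1, 2, 3]

# Operator syntax, and the magic method it calls under the hood
print(a + [4])                      # [1, 2, 3, 4]
print(a.__add__([4]))               # [1, 2, 3, 4]

print(a[0:2])                       # [1, 2]
print(a.__getitem__(slice(0, 2)))   # [1, 2]

# Methods like upper/replace/strip have no operator form
print("hi".upper())                 # HI
```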
+ and += operators are different?
The docs explain it very well, I think:

__iadd__(), etc.

These methods are called to implement the augmented arithmetic assignments (+=, -=, *=, /=, //=, %=, **=, <<=, >>=, &=, ^=, |=). These methods should attempt to do the operation in-place (modifying self) and return the result (which could be, but does not have to be, self). If a specific method is not defined, the augmented assignment falls back to the normal methods. For instance, to execute the statement x += y, where x is an instance of a class that has an __iadd__() method, x.__iadd__(y) is called.
+= is designed to allow in-place modification. In the case of simple addition, a new object is created and then bound to the already-used name (c).

Also, you'd notice that such behaviour of the += operator is only possible because of the mutable nature of lists. Integers - an immutable type - won't produce the same result:
>>> c = 3
>>> print(c, id(c))
3 505389080
>>> c += c
>>> print(c, id(c))
6 505389128
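For contrast, running the same experiment on a mutable list shows that += preserves the object's identity:

```python
c = [1, 2]
before = id(c)

c += [3]                     # __iadd__ mutates the list in place
print(c, id(c) == before)    # [1, 2, 3] True

c = c + [4]                  # __add__ builds a new list; the name is rebound
print(c, id(c) == before)    # [1, 2, 3, 4] False
```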
Difference between assignment and compound operators in Python
This is a difference between mutable and immutable objects. A mutable object can implement obj *= something by actually modifying the object in place; an immutable object can only return a new object with the updated value (in which case the result is identical to obj = obj * something). The compound assignment statements can handle either case; it's entirely up to the object's implementation.
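A quick sketch of both cases with *=:

```python
lst = [0, 1]
same = lst
lst *= 2             # list implements __imul__: mutated in place
print(lst is same)   # True
print(lst)           # [0, 1, 0, 1]

tup = (0, 1)
same = tup
tup *= 2             # tuples are immutable: a new tuple is bound to the name
print(tup is same)   # False
print(tup)           # (0, 1, 0, 1)
```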
Which operator (+ vs +=) should be used for performance? (In-place Vs not-in-place)
x = x + 1 vs x += 1
Performance
It seems that you understand the semantic difference between x += 1 and x = x + 1.
For benchmarking, you can use timeit in IPython.
After defining those functions:
import numpy as np

def in_place(n):
    x = np.arange(n)
    x += 1

def not_in_place(n):
    x = np.arange(n)
    x = x + 1

def in_place_no_broadcast(n):
    x = np.arange(n)
    x += np.ones(n, dtype=int)  # note: the np.int alias was removed from NumPy; use the builtin int
You can simply use the %timeit magic to compare performance:
%timeit in_place(10**7)
20.3 ms ± 81.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit not_in_place(10**7)
30.4 ms ± 253 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit in_place_no_broadcast(10**7)
35.4 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
not_in_place is 50% slower than in_place.
Note that broadcasting also makes a huge difference: numpy understands x += 1 as adding 1 to every single element of x, without having to create yet another array.
Warning
in_place should be the preferred function: it's faster and uses less memory. You might run into bugs if you use and mutate this object at different places in your code, though. The typical example would be:
x = np.arange(5)
y = [x, x]
y[0][0] = 10
y
# [array([10, 1, 2, 3, 4]), array([10, 1, 2, 3, 4])]
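The same aliasing concern applies whenever two names refer to one array; a small sketch contrasting the two approaches above:

```python
import numpy as np

x = np.arange(5)
alias = x
x += 1             # in place: every name bound to the array sees the change
print(alias)       # [1 2 3 4 5]

x = np.arange(5)
alias = x
x = x + 1          # a new array is created; alias keeps the old values
print(alias)       # [0 1 2 3 4]
```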
Sorting
Your understanding of the advantages of in-place sorting is correct. It can make a huge difference in memory requirements when sorting large data sets.
There are other desirable features for a sorting algorithm (stable, acceptable worst-case complexity, ...) and it looks like the standard Python algorithm (Timsort) has many of them.
Timsort is a hybrid algorithm. Some parts of it are in-place and some require extra memory. It will never use more than n/2 extra memory, though.
difference between adding lists in python with + and +=
p = p + test1 assigns a new value to the variable p, while p += test1 extends the list stored in p. And since the list in p is the same list as in test, appending to p also appends to test, while assigning a new value to the variable p does not change the value assigned to test in any way.
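A minimal sketch of that aliasing (the starting values are invented; test here plays the role of the shared list):

```python
test = [1, 2]
p = test           # p and test name the same list object

p += [3]           # __iadd__ extends the shared list
print(test)        # [1, 2, 3] -- test sees the change

p = p + [4]        # __add__ creates a new list and rebinds p
print(test)        # [1, 2, 3] -- test is unaffected
print(p)           # [1, 2, 3, 4]
```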
Inplace functions in Python
You can't have a priori knowledge about whether a given function operates in place. You need to either look at the source and deduce this information, or examine its doc-string and hope the developer documented this behavior.
For example, in list.sort
:
help(list.sort)
Help on method_descriptor:
sort(...)
L.sort(key=None, reverse=False) -> None -- stable sort *IN PLACE*
For functions operating on certain types, their mutability generally lets you extract some knowledge about the operation. You can be certain, for example, that all functions operating on strings will eventually return a new one, meaning, they can't perform in-place operations. This is because you should be aware that strings in Python are immutable objects.
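For example, because strings are immutable, str.upper has no choice but to return a new string:

```python
s = "hello"
t = s.upper()
print(t)        # HELLO
print(s)        # hello -- the original string is unchanged
print(s is t)   # False: a new object was returned
```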
The difference between x += y and x = x + y
The object "on the left" handles the operator (usually; see the reflected r-operator forms such as __radd__); in this case it is an in-place operator.
10.3.2. Inplace Operators
Many operations have an “in-place” version. Listed below are functions providing a more primitive access to in-place operators than the usual syntax does; for example, the statement x += y is equivalent to x = operator.iadd(x, y).
The actual result is determined by the "x" object and whether it handles __iadd__ (e.g. mutated, as with lists) or just __add__ (e.g. a new result object, as with strings) - the selection of which protocol to use, and what value to return for the assignment, is handled by operator.iadd itself1.

So the shorthand x += y ~~ x = x + y is only true for some objects - notably those that are immutable and [only] implement __add__.
See How are Python in-place operator functions different than the standard operator functions?
1 Semantically, the operator.iadd function works roughly like:

def iadd(x, y):
    if hasattr(x, '__iadd__'):
        return x.__iadd__(y)  # side effect performed on x; returns the original-but-modified object
    else:
        return x.__add__(y)   # returns a new object; __add__ should not have side effects
Is making in-place operations return the object a bad idea?
Yes, it is a bad idea. The reason is that if in-place and non-in-place operations have apparently identical output, then programmers will frequently mix up in-place and non-in-place operations (list.sort() vs. sorted()), and that results in hard-to-detect errors.
In-place operations returning themselves would allow you to perform "method chaining"; however, this is bad practice because you may accidentally bury functions with side effects in the middle of a chain.
To prevent errors like this, method chains should only have one method with side effects, and that method should be at the end of the chain. Functions before it in the chain should transform the input without side effects (for instance, navigating a tree, slicing a string, etc.). If in-place operations returned themselves, a programmer would be bound to accidentally use one in place of an alternative function that returns a copy and therefore has no side effects (again, list.sort() vs. sorted()), which may result in an error that is difficult to debug.
This is the reason Python standard library functions always either return a copy or return None and modify objects in-place, but never modify objects in-place and also return themselves. Other Python libraries like Django also follow this practice (see this very similar question about Django).
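The classic mix-up that this convention protects against looks like this:

```python
data = [3, 1, 2]

# Easy mistake: list.sort() sorts in place and returns None
result = data.sort()
print(result)        # None -- not the sorted list!
print(data)          # [1, 2, 3]

# sorted() returns a new sorted list and leaves the original alone
data = [3, 1, 2]
result = sorted(data)
print(result)        # [1, 2, 3]
print(data)          # [3, 1, 2]
```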