Python pandas DataFrame: is it pass-by-value or pass-by-reference?
The short answer is, Python always does pass-by-value, but every Python variable is actually a pointer to some object, so sometimes it looks like pass-by-reference.
In Python every object is either mutable or immutable. For example, lists, dicts, modules and pandas DataFrames are mutable, and ints, strings and tuples are immutable. Mutable objects can be changed internally (e.g., by adding an element to a list), but immutable objects cannot.
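Here is a small sketch of that distinction using only built-in types (the variable names are made up for illustration). Mutating a list keeps the same object, a tuple refuses in-place changes, and "changing" a string really builds a new object:

```python
# Mutable: the list is changed in place, so its identity stays the same.
nums = [1, 2]
nums_id_before = id(nums)
nums.append(3)
same_list = id(nums) == nums_id_before

# Immutable: a tuple cannot be changed in place.
point = (1, 2)
try:
    point[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False

# "Modifying" a string builds a new string and rebinds the name;
# any other name still refers to the original object.
s = "ab"
alias = s
s = s + "c"
```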
As I said at the start, you can think of every Python variable as a pointer to an object. When you pass a variable to a function, the variable (pointer) within the function is always a copy of the variable (pointer) that was passed in. So if you assign something new to the internal variable, all you are doing is changing the local variable to point to a different object. This doesn't alter (mutate) the original object that the variable pointed to, nor does it make the external variable point to the new object. At this point, the external variable still points to the original object, but the internal variable points to a new object.
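To make that concrete, here is a minimal sketch with a plain list (the function names `rebind` and `mutate` are hypothetical, chosen just for this illustration). Assigning to the parameter only repoints the local name, while calling a mutating method changes the object the caller also sees:

```python
def rebind(items):
    # Assignment only repoints the local name; the caller's list is untouched.
    items = [0, 0, 0]
    return items

def mutate(items):
    # append() changes the object that both names point to.
    items.append(99)

outer = [1, 2, 3]
new_list = rebind(outer)
after_rebind = list(outer)   # snapshot of outer right after rebind()
mutate(outer)
```

After `rebind(outer)` the caller's list still holds its three original elements; only after `mutate(outer)` does it gain the extra one.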
If you want to alter the original object (only possible with mutable data types), you have to do something that alters the object without assigning a completely new value to the local variable. This is why letgo() and letgo3() leave the external item unaltered, but letgo2() alters it.
As @ursan pointed out, if letgo() used something like this instead, then it would alter (mutate) the original object that df points to, which would change the value seen via the global a variable:
import pandas as pd

def letgo(df):
    df.drop('b', axis=1, inplace=True)

a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
letgo(a) # will alter a
In some cases, you can completely hollow out the original variable and refill it with new data, without actually doing a direct assignment, e.g. this will alter the original object that v points to, which will change the data seen when you use v later:
import numpy as np

def letgo3(x):
    x[:] = np.array([[3,3],[3,3]])

v = np.empty((2, 2))
letgo3(v) # will alter v
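The same contrast can be checked with a plain Python list, which supports slice assignment in the same spirit. This is only a sketch; the helper names `rebind` and `refill` are made up here and are not part of the original question:

```python
def rebind(x):
    x = [3, 3]        # the local name now points to a brand-new list

def refill(x):
    x[:] = [3, 3]     # replaces the contents of the existing list

v = [0, 0]
alias = v                # second name for the same list object
rebind(v)
after_rebind = list(v)   # copy of the contents at this point
refill(v)
```

Here `after_rebind` still holds the original zeros, while `v` (and `alias`, which is the same object) holds the new values after `refill(v)`.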
Notice that I'm not assigning something directly to x; I'm assigning something to the entire internal range of x.
If you absolutely must create a completely new object and make it visible externally (which is sometimes the case with pandas), you have two options. The 'clean' option would be just to return the new object, e.g.,
def letgo(df):
    df = df.drop('b', axis=1)
    return df

a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
a = letgo(a)
Another option would be to reach outside your function and directly alter a global variable. This changes a to point to a new object, and any function that refers to a afterward will see that new object:
def letgo():
    global a
    a = a.drop('b', axis=1)

a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
letgo() # will alter a!
Directly altering global variables is usually a bad idea, because anyone who reads your code will have a hard time figuring out how a got changed. (I generally use global variables for shared parameters used by many functions in a script, but I don't let them alter those global variables.)
When does pandas do pass-by-reference vs. pass-by-value when passing a dataframe to a function?
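By default, Python passes a reference to the same object, so in-place changes made inside the function are visible to the caller. The original object is left unchanged only if the function rebinds the local name to a new object (for example, by assigning the result of drop()) or works on an explicit copy made with copy().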
Example with explicit copy:
#1. Assignment
def dropdf_copy1(df):
    df = df.drop('y', axis=1)

#2. copy()
def dropdf_copy2(df):
    df = df.copy()
    df.drop('y', axis=1, inplace=True)
If no explicit copy is made, the original object that was passed in is changed:
def dropdf_inplace(df):
    df.drop('y', axis=1, inplace=True)
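A quick way to confirm this is to call all three functions on the same frame and look at its columns after each call. The sketch below assumes a small frame with columns x and y (hypothetical sample data, not from the original question):

```python
import pandas as pd

def dropdf_copy1(df):
    # Rebinds the local name to the new frame returned by drop().
    df = df.drop('y', axis=1)

def dropdf_copy2(df):
    # Works on an explicit copy, so the caller's frame is untouched.
    df = df.copy()
    df.drop('y', axis=1, inplace=True)

def dropdf_inplace(df):
    # Mutates the very object the caller passed in.
    df.drop('y', axis=1, inplace=True)

frame = pd.DataFrame({'x': [1, 2], 'y': [3, 4]})  # hypothetical sample data

dropdf_copy1(frame)
cols_after_copy1 = list(frame.columns)

dropdf_copy2(frame)
cols_after_copy2 = list(frame.columns)

dropdf_inplace(frame)
cols_after_inplace = list(frame.columns)
```

Only the last call removes column y from the caller's frame; the first two leave it as it was.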
How to pass and return dataframe by reference in Python
Solution:
Use the drop() method and set inplace=True:
import pandas as pd

def ChangeDF(df):
    df.drop(["Col3"], axis=1, inplace=True)

df = pd.DataFrame([[1, "One", "Hello"], [2, "Two", "Hi"]], columns=["Col1", "Col2", "Col3"])
ChangeDF(df)
print(df)
Pandas DataFrame as an Argument to a Function - Python
If a function parameter is a mutable object (e.g. a DataFrame), then any changes you make in the function will be applied to the object.
E.g.
In [200]: df = pd.DataFrame({1:[1,2,3]})

In [201]: df
Out[201]:
   1
0  1
1  2
2  3

In [202]: def f(frame):
     ...:     frame['new'] = 'a'
     ...:

In [203]: f(df)

In [204]: df
Out[204]:
   1 new
0  1   a
1  2   a
2  3   a
See this article for a good explanation on how Python passes function parameters.
Best practice for passing Pandas DataFrame to functions
I use DataFrame.pipe a lot to organize my code, so I'm going to say option 2. pipe takes and returns a DataFrame, and you can chain multiple steps together.
def step1(main_df):
    df = main_df.copy()
    df['col1'] = df['col1']+1
    return df

def step2(main_df):
    df = main_df.copy()
    df['col1'] = df['col1']+1
    return df

def step3(main_df):
    df = main_df.copy()
    df['col1'] = df['col1']+1
    return df

main_df = (main_df.pipe(step1)
                  .pipe(step2)
                  .pipe(step3)
)
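For a runnable version of the same pattern, here is a minimal sketch with made-up sample data and two hypothetical steps (`add_one` and `double` are illustrative names, not from the original answer). Because each step copies its input, the frame you start from is left untouched:

```python
import pandas as pd

def add_one(main_df):
    df = main_df.copy()            # work on a copy, not the caller's frame
    df['col1'] = df['col1'] + 1
    return df

def double(main_df):
    df = main_df.copy()
    df['col1'] = df['col1'] * 2
    return df

original = pd.DataFrame({'col1': [1, 2, 3]})   # hypothetical sample data
result = original.pipe(add_one).pipe(double)
```

The trade-off is one extra copy per step, which is usually fine for small and medium frames but may matter for very large ones.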