Pandas: Subindexing dataframes: Copies vs views
Your answer lies in the pandas docs, in the section "Returning a view versus a copy":
"Whenever an array of labels or a boolean vector are involved in the indexing operation, the result will be a copy. With single label / scalar indexing and slicing, e.g. df.ix[3:6] or df.ix[:, 'A'], a view will be returned."
In your example, bar is a view of slices of foo. If you wanted a copy, you could have used the copy method. Modifying bar also modifies foo. pandas does not appear to have a copy-on-write mechanism, at least in the version used here. (The .ix indexer in the quote has since been removed from pandas; .loc and .iloc are its replacements.) See my code example below to illustrate:
In [1]: import pandas as pd
...: import numpy as np
...: foo = pd.DataFrame(np.random.random((10,5)))
...:
In [2]: pd.__version__
Out[2]: '0.12.0.dev-35312e4'
In [3]: np.__version__
Out[3]: '1.7.1'
In [4]: # DataFrame has copy method
...: foo_copy = foo.copy()
In [5]: bar = foo.iloc[3:5,1:4]
In [6]: (bar == foo.iloc[3:5,1:4]) & (bar == foo_copy.iloc[3:5,1:4])
Out[6]:
      1     2     3
3  True  True  True
4  True  True  True
In [7]: # Changing the view
...: bar.loc[3,1] = 5
In [8]: # View and DataFrame still equal
...: bar == foo.iloc[3:5,1:4]
Out[8]:
      1     2     3
3  True  True  True
4  True  True  True
In [9]: # It is now different from a copy of original
...: bar == foo_copy.iloc[3:5,1:4]
Out[9]:
       1     2     3
3  False  True  True
4   True  True  True
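Whether writing to a plain slice reaches the original depends on the pandas version and its copy-on-write settings, so the session above should be read as a record of that specific version. What holds in every version is the point about the copy method. The following is a minimal sketch of that, with variable names (original, bar_copy) chosen here for illustration:

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame(np.random.random((10, 5)))
original = foo.copy()  # snapshot used only for the comparison below

# An explicit copy of the slice owns its own data.
bar_copy = foo.iloc[3:5, 1:4].copy()
bar_copy.loc[3, 1] = 5.0

# The write went to bar_copy only, so foo still matches the snapshot.
foo_unchanged = foo.equals(original)
print(foo_unchanged)
```

If the intent is to modify foo itself, assigning through foo directly (for example foo.iloc[3, 1] = 5.0) avoids relying on view semantics at all.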
What rules does Pandas use to generate a view vs a copy?
Here are the rules; later rules override earlier ones:
1. All operations generate a copy.
2. If inplace=True is provided, the operation modifies in place; only some operations support this.
3. An indexer that sets, e.g. .loc/.iloc/.iat/.at, will set in place.
4. An indexer that gets on a single-dtyped object is almost always a view (depending on the memory layout it may not be, which is why this is not reliable). This is mainly for efficiency. (The example from above is for .query; this will always return a copy, as it is evaluated by numexpr.)
5. An indexer that gets on a multiple-dtyped object is always a copy.
Chained indexing such as
df[df.C <= df.B].loc[:,'B':'E']
is not guaranteed to work (and thus you should never do this). Instead do:
df.loc[df.C <= df.B, 'B':'E']
as this is faster and will always work.
The chained indexing is two separate Python operations and thus cannot be reliably intercepted by pandas (you will oftentimes get a SettingWithCopyWarning, but that is not 100% detectable either). The dev docs, which you pointed to, offer a much fuller explanation.
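To make the single-step form concrete, here is a small sketch. The DataFrame contents are made up for illustration; only the column names A through E and the condition df.C <= df.B come from the question.

```python
import pandas as pd

# Hypothetical data; only the column names mirror the question.
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [5, 1, 7, 2],
    "C": [3, 4, 6, 9],
    "D": [0, 0, 0, 0],
    "E": [1, 1, 1, 1],
})

mask = df.C <= df.B  # True for rows 0 and 2

# Getting: one .loc call selects rows and columns together.
selected = df.loc[mask, "B":"E"]

# Setting: the same single .loc call on the left-hand side writes into df.
df.loc[mask, "B":"E"] = -1

print(selected)
print(df)
```

Because the row mask and the column slice go through one indexer call, pandas sees the whole selection at once, which is what lets the assignment target df itself rather than an intermediate object.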
Checking whether data frame is copy or view in Pandas
Answers from HYRY and Marius in comments!
One can check either by:
1. testing equivalence of the values.base attribute rather than the values attribute, as in df.values.base is df2.values.base instead of df.values is df2.values,
2. or using the (admittedly internal) _is_view attribute (df2._is_view is True).
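As a possible alternative that avoids internal attributes, NumPy's public np.shares_memory function can compare the arrays behind two frames. This is only a sketch: the helper name shares_data is made up here, and it assumes both frames hold a single numeric dtype so that to_numpy() does not have to build a fresh array.

```python
import numpy as np
import pandas as pd


def shares_data(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: True if the two frames' arrays overlap in memory."""
    return bool(np.shares_memory(left.to_numpy(), right.to_numpy()))


df = pd.DataFrame(np.arange(12.0).reshape(4, 3), columns=list("abc"))
df_copy = df.copy()

# An explicit copy never overlaps with its source.
copy_overlaps = shares_data(df, df_copy)
print(copy_overlaps)

# For a slice, the answer depends on the pandas version and memory layout.
print(shares_data(df, df.iloc[1:3]))
```

For mixed-dtype frames to_numpy() typically has to assemble a new array, so a False result there says little about the frames themselves.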
Pandas: What is a view?
To understand what a view is, you have to know what an array is. An array is not only the "stuff" (items) you put in it. It also needs, among other things, information about the number of elements, the shape of the array, and how to interpret the elements.
So an array would be an object at least containing these attributes:
class Series:
    data: memoryview  # A pointer to where your array is stored
    size: int         # The number of items in your array
    shape: tuple      # The shape of your array
    dtype: type       # How to interpret the array
So when you create a view, a new array object is created, but (and that's important) the view's data pointer points into the original array. It could be offset, but it still points to a memory location that belongs to the original array. Even though it shares some data with the original, the size, shape, dtype, and so on might have changed, so it requires a new object. That's why they have different ids.
Think of it like windows. You have a garden (the array) and you have several windows; each window is a different object, but all of them look out at the same garden. OK, granted, with some slicing operations you would have more Escher-like windows, but a metaphor always lacks some details :-)
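The garden-and-window picture can be tried directly on a NumPy array, which is what usually sits underneath a pandas object. A minimal sketch, with the variable names garden and window borrowed from the metaphor:

```python
import numpy as np

garden = np.arange(10)   # the original array, which owns its data
window = garden[2:6]     # basic slicing creates a view

# The view is a separate object with its own shape...
same_object = window is garden
print(same_object, window.shape, garden.shape)

# ...but its data belongs to the original array.
print(window.base is garden)

# Writing through the window changes what the garden contains.
window[0] = 99
print(garden[2])
```

The base attribute is how NumPy records which array a view borrows its memory from; an array that owns its data has base set to None.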
How to create a view of dataframe in pandas?
You generally can't return a view.
Your answer lies in the pandas docs, in the section "Returning a view versus a copy":
"Whenever an array of labels or a boolean vector are involved in the indexing operation, the result will be a copy. With single label / scalar indexing and slicing, e.g. df.ix[3:6] or df.ix[:, 'A'], a view will be returned."