pandas loc vs. iloc vs. at vs. iat?
loc: works on index labels
iloc: works on integer positions
at: gets a scalar value. It's a very fast loc
iat: gets a scalar value. It's a very fast iloc
Also, at and iat are meant to access a scalar, that is, a single element in the DataFrame, while loc and iloc are meant to access several elements at the same time, potentially to perform vectorized operations.
http://pyciencia.blogspot.com/2015/05/obtener-y-filtrar-datos-de-un-dataframe.html
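As a minimal sketch of the four accessors side by side (the DataFrame, its column names, and its labels are made up for illustration):

```python
import pandas as pd

# Hypothetical data: string labels make the label/position difference visible.
df = pd.DataFrame(
    {"price": [1.5, 2.5, 3.5], "qty": [10, 20, 30]},
    index=["a", "b", "c"],
)

# All four return the same single cell (row "b" / position 1, column "price" / position 0).
by_label = df.loc["b", "price"]        # label based, general purpose
by_position = df.iloc[1, 0]            # position based, general purpose
scalar_label = df.at["b", "price"]     # label based, scalar only
scalar_position = df.iat[1, 0]         # position based, scalar only

# Only loc and iloc can select several elements at once.
label_block = df.loc[["a", "c"], ["price", "qty"]]
position_block = df.iloc[[0, 2], [0, 1]]
```

Here label_block and position_block hold the same two rows, selected once by label and once by position.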
Difference between pandas .iloc and .iat?
iat and at work with scalars only, so they are very fast. The slower, more general functions are iloc and loc.
You can check docs:
Since indexing with [] must handle a lot of cases (single-label access, slicing, boolean indexing, etc.), it has a bit of overhead in order to figure out what you’re asking for. If you only want to access a scalar value, the fastest way is to use the at and iat methods, which are implemented on all of the data structures.
Similarly to loc, at provides label based scalar lookups, while, iat provides integer based lookups analogously to iloc.
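If you want to check the speed difference on your own machine, a rough timing sketch along these lines may help. The DataFrame and the label "row500" are invented for the example, and the absolute numbers will vary with the pandas version and hardware, so no particular result is claimed here.

```python
import timeit

import pandas as pd

# Hypothetical frame with 1000 labelled rows.
df = pd.DataFrame({"x": range(1000)}, index=[f"row{i}" for i in range(1000)])

# Both accessors return the same scalar.
value_loc = df.loc["row500", "x"]
value_at = df.at["row500", "x"]

# Time repeated scalar lookups through each accessor.
t_loc = timeit.timeit(lambda: df.loc["row500", "x"], number=2000)
t_at = timeit.timeit(lambda: df.at["row500", "x"], number=2000)

print(f"loc: {t_loc:.4f}s  at: {t_at:.4f}s for 2000 lookups")
```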
Is .ix() always better than .loc() and .iloc() since it is faster and supports integer and label access?
Please refer to the doc section Different Choices for Indexing. It states clearly when and why you should use .loc or .iloc over .ix; it comes down to being explicit about the use case:
.ix supports mixed integer and label based access. It is primarily
label based, but will fall back to integer positional access unless
the corresponding axis is of integer type. .ix is the most general and
will support any of the inputs in .loc and .iloc. .ix also supports
floating point label schemes. .ix is exceptionally useful when dealing
with mixed positional and label based hierarchical indexes.
However, when an axis is integer based, ONLY label based access and
not positional access is supported. Thus, in such cases, it’s usually
better to be explicit and use .iloc or .loc.
Update 22 Mar 2017
Thanks to a comment from @Alexander: pandas is going to deprecate ix in 0.20, details in here.
One of the strong reasons behind this is that mixing indexes, positional and label (effectively using ix), has been a significant source of problems for users.
Users are expected to migrate to iloc and loc instead; here is a link on how to convert code.
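As a sketch of what such a conversion can look like, assume an old call that mixed positional rows with a column label, such as df.ix[[0, 2], "A"]. The frame below is hypothetical; the idea is to translate one side so that both axes use the same kind of indexer.

```python
import pandas as pd

# Hypothetical frame with string row labels.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=list("abc"))

# Old mixed access (no longer available in current pandas):
#     df.ix[[0, 2], "A"]

# Option 1: turn the row positions into labels, then use loc.
via_loc = df.loc[df.index[[0, 2]], "A"]

# Option 2: turn the column label into a position, then use iloc.
via_iloc = df.iloc[[0, 2], df.columns.get_loc("A")]
```

Both options select rows "a" and "c" of column "A"; which one reads better depends on whether the surrounding code thinks in labels or in positions.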
Do loc and iloc methods behave differently in assignment?
You can read more about the differences between .iloc and .loc here, but for your particular case, the reason you're getting NaN is what you're assigning. .iloc completely ignores the index of the value that you're assigning (which is pd.Series([order_id])), so it works fine and doesn't produce NaN.
With .loc, however, the index of the assigned value is respected. In your example, pd.Series([order_id]) has an index of [0], as you can see:
>>> order_id = '123'
>>> pd.Series([order_id])
0    123
dtype: object
Now look at the index of the row where the NaN is occurring. It's 1. But the index of the value you're trying to assign to it is 0, as shown above. Mismatched indexes! What happens? The missing value: NaN.
If you want to use .loc instead of .iloc, you can avoid this mismatched-index problem by converting the Series object to a numpy array (using .to_numpy()) before assigning:
b.loc[b.customer_id == customer_id, 'order_id'] = pd.Series([order_id]).to_numpy()
That will work as expected.
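The question's original DataFrame is not shown here, so the following is a small reconstruction under assumptions: the contents of b and the placeholder order ids are invented, and only the column names follow the snippet above. It reproduces the index mismatch with .loc and then the .to_numpy() fix.

```python
import pandas as pd

order_id = "123"
customer_id = 2

# Hypothetical stand-in for the question's frame; the target row has index 1.
b = pd.DataFrame({"customer_id": [1, 2], "order_id": ["old-1", "old-2"]})

# Assigning a Series through .loc aligns on the index: the value sits at
# index 0, the target row at index 1, so the cell ends up as NaN.
mismatched = b.copy()
mismatched.loc[mismatched.customer_id == customer_id, "order_id"] = pd.Series([order_id])

# A plain numpy array has no index, so nothing is aligned and the value lands.
fixed = b.copy()
fixed.loc[fixed.customer_id == customer_id, "order_id"] = pd.Series([order_id]).to_numpy()

print(mismatched)
print(fixed)
```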
Pandas iloc returns different range than loc
As mentioned in the docs for loc:
Warning: Note that contrary to usual python slices, both the start and
the stop are included
On the other hand, iloc selects by integer position and follows the usual Python slice semantics, so it doesn't include the stop index.
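A short sketch of the difference, using a made-up Series with the default integer index so that the same slice 1:3 can be read both as labels and as positions:

```python
import pandas as pd

# Hypothetical Series; the default index is 0..4, so labels and positions coincide.
s = pd.Series([10, 20, 30, 40, 50])

label_slice = s.loc[1:3]      # labels 1, 2 and 3: the stop label is included
position_slice = s.iloc[1:3]  # positions 1 and 2: the stop position is excluded

print(label_slice.tolist())     # [20, 30, 40]
print(position_slice.tolist())  # [20, 30]
```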
Why do double square brackets create a DataFrame with loc or iloc?
In principle, when the selector is a list, it can contain more than one column name, so it's natural for pandas to give you a DataFrame, because only a DataFrame can hold more than one column. When the selector is a string instead of a list, pandas can safely say that it's just one column, so giving you a Series isn't a problem. Treat the two formats and two outcomes as reasonable flexibility for getting whichever you need, a Series or a DataFrame; sometimes you need specifically one of the two.
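To make the two outcomes concrete, here is a minimal sketch with an invented two-column frame; the same rule applies to loc with labels and to iloc with positions.

```python
import pandas as pd

# Hypothetical frame.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# A single label or position selects one column and gives a Series.
series_by_label = df.loc[:, "A"]
series_by_position = df.iloc[:, 0]

# A list (the inner pair of brackets) gives a DataFrame, even with one entry.
frame_by_label = df.loc[:, ["A"]]
frame_by_position = df.iloc[:, [0]]

print(type(series_by_label).__name__, type(frame_by_label).__name__)
```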
Getting Scalar Value with pandas loc/iloc/at/iat/ix
You're getting an error because the only index in b is 24. You could use that or (more easily) index by location using:
b.iloc[0]
This is a common gotcha for new Pandas users. Indices are preserved when pulling data out of a Series or DataFrame. They do not, in general, run from 0 -> N-1 where N is the length of the Series or the number of rows in the DataFrame.
This will help a bit http://pandas.pydata.org/pandas-docs/stable/indexing.html although I admit it was confusing for me at first as well.
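The question's data is not shown here, so the sketch below assumes a one-element Series whose only index label is 24 (the value 3.5 is made up). It shows the working lookups and the failing one.

```python
import pandas as pd

# Hypothetical one-element Series whose only label is 24, e.g. left over from filtering.
b = pd.Series([3.5], index=[24])

by_position = b.iloc[0]   # first element, whatever its label is
by_label = b.loc[24]      # the label that actually exists
fast_position = b.iat[0]  # scalar-only positional access
fast_label = b.at[24]     # scalar-only label access

# Looking up label 0 fails because no such label exists in the index.
try:
    b.loc[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(by_position, by_label, lookup_failed)
```

If the original labels are not needed, calling reset_index(drop=True) on the result gives it a fresh 0 to N-1 index.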