Pandas loc vs. iloc vs. at vs. iat

pandas loc vs. iloc vs. at vs. iat?

loc: works only on labels (the index)

iloc: works on integer positions

at: gets a scalar value by label. It's a very fast loc

iat: gets a scalar value by integer position. It's a very fast iloc
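A minimal sketch of the four accessors side by side. The data and labels are made up for illustration; the row labels deliberately differ from the positions so the label/position distinction is visible:

```python
import pandas as pd

# Made-up data; the row labels deliberately differ from the positions 0..2
df = pd.DataFrame(
    {"price": [1.5, 2.0, 3.25], "qty": [10, 20, 30]},
    index=["apple", "banana", "cherry"],
)

by_label = df.loc["banana", "price"]   # loc: row label, column label
by_position = df.iloc[1, 0]            # iloc: row position, column position
fast_label = df.at["banana", "price"]  # at: like loc, but one cell only
fast_position = df.iat[1, 0]           # iat: like iloc, but one cell only

print(by_label, by_position, fast_label, fast_position)  # 2.0 2.0 2.0 2.0
```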

Also,

at and iat are meant to access a scalar, that is, a single element
in the dataframe, while loc and iloc are meant to access several
elements at the same time, potentially to perform vectorized
operations.

http://pyciencia.blogspot.com/2015/05/obtener-y-filtrar-datos-de-un-dataframe.html
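To illustrate the split between multi-element and single-element access, here is a sketch on the same kind of made-up frame: loc/iloc pull out several elements that you can do vectorized arithmetic on, while at/iat read or write exactly one cell.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [1.5, 2.0, 3.25], "qty": [10, 20, 30]},
    index=["apple", "banana", "cherry"],
)

# loc/iloc: select several elements, then operate on them as a whole
subtotal = df.loc["apple":"banana", "price"] * df.loc["apple":"banana", "qty"]
subtotal_pos = df.iloc[0:2, 0] * df.iloc[0:2, 1]  # same rows, selected by position

# at/iat: read or write a single cell
df.at["cherry", "qty"] = 31
df.iat[0, 1] = 11
```

Note that the label slice "apple":"banana" includes "banana", while the position slice 0:2 stops before position 2; both select the same two rows here.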

Difference between pandas .iloc and .iat?

at and iat work with scalars only, so they are very fast. The slower, more general accessors are loc and iloc.

You can check the docs:

Since indexing with [] must handle a lot of cases (single-label access, slicing, boolean indexing, etc.), it has a bit of overhead in order to figure out what you’re asking for. If you only want to access a scalar value, the fastest way is to use the at and iat methods, which are implemented on all of the data structures.

Similarly to loc, at provides label based scalar lookups, while iat provides integer based lookups analogously to iloc.
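If you want to see the speed difference on your own setup, a rough timing sketch like the one below can help. It first checks that all four accessors return the same cell, then times each one with timeit; the frame is made up, and the absolute numbers (and even the size of the gap) depend on your machine and pandas version, so no output is shown here.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"a": range(1000), "b": range(1000)})

# All four accessors return the same cell
assert df.loc[500, "b"] == df.at[500, "b"] == df.iloc[500, 1] == df.iat[500, 1]

# Rough timing; absolute numbers depend on the machine and pandas version
timings = {}
for stmt in ('df.loc[500, "b"]', 'df.at[500, "b"]', "df.iloc[500, 1]", "df.iat[500, 1]"):
    timings[stmt] = timeit.timeit(stmt, globals={"df": df}, number=10_000)
    print(f"{stmt:<18} {timings[stmt]:.4f} s")
```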

Is .ix() always better than .loc() and .iloc() since it is faster and supports integer and label access?

Please refer to the "Different Choices for Indexing" section of the docs. It states clearly when and why you should use .loc and .iloc over .ix; it's about being explicit about your use case:

.ix supports mixed integer and label based access. It is primarily
label based, but will fall back to integer positional access unless
the corresponding axis is of integer type. .ix is the most general and
will support any of the inputs in .loc and .iloc. .ix also supports
floating point label schemes. .ix is exceptionally useful when dealing
with mixed positional and label based hierarchical indexes.

However, when an axis is integer based, ONLY label based access and
not positional access is supported. Thus, in such cases, it’s usually
better to be explicit and use .iloc or .loc.
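A small sketch of why an integer axis is ambiguous. With a made-up Series whose integer labels don't match the positions, the same integer means different things to loc and iloc, so spelling out which one you mean removes the guesswork:

```python
import pandas as pd

# Made-up Series with an integer index that does not match the positions
s = pd.Series(["a", "b", "c"], index=[3, 2, 1])

by_label = s.loc[1]      # element labelled 1 -> "c"
by_position = s.iloc[1]  # element at position 1 -> "b"

missing_label = False
try:
    s.loc[0]             # there is no label 0 ...
except KeyError:
    missing_label = True
first = s.iloc[0]        # ... but there is a position 0 -> "a"
```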

Update 22 Mar 2017

Thanks to a comment from @Alexander: pandas is going to deprecate ix in 0.20, details here. (It was later removed entirely in pandas 1.0.)

One of the strongest reasons behind this is that mixing positional and label-based indexing (which is effectively what ix does) has been a significant source of problems for users.

Users are expected to migrate to iloc and loc instead; here is a link on how to convert code.
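As a sketch of the usual conversion, along the lines of the patterns in the pandas 0.20 deprecation notes, a mixed "row positions plus column label" lookup can be rewritten either fully label-based or fully position-based. The frame below is made up:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=["a", "b", "c"])

# Old mixed lookup (ix was removed in pandas 1.0): df.ix[[0, 2], "A"]

# Label-based: turn the row positions into labels first
via_loc = df.loc[df.index[[0, 2]], "A"]

# Position-based: turn the column label into a position first
via_iloc = df.iloc[[0, 2], df.columns.get_loc("A")]
```

Both return the rows labelled "a" and "c" from column "A"; which form reads better depends on whether the rest of your code thinks in labels or positions.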

Do loc and iloc methods behave differently in assignment?

You can read more about the differences between .iloc and .loc here, but for your particular case, the reason you're getting NaN is what you're assigning. With .iloc, pandas completely ignores the index of the value you're assigning (which is pd.Series([order_id])), so it works fine and doesn't produce NaN.

With .loc, however, it does respect the index. In your example, pd.Series([order_id]) has an index of [0], as you can see:

>>> import pandas as pd
>>> order_id = '123'
>>> pd.Series([order_id])
0    123
dtype: object

Now look at the index of the row where the NaN is occurring. It's 1. But the index of the value you're trying to assign to it is 0, as shown above. Mismatched indexes! What happens? There is no value for label 1, so you get a missing value: NaN.
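A small reconstruction of the situation, with made-up customer_id values since the original frame isn't shown: the selected row has label 1, the assigned Series has label 0, and .loc aligns on labels. Giving the value the matching label makes the same assignment work.

```python
import pandas as pd

# Hypothetical frame with the default index 0, 1
b = pd.DataFrame({"customer_id": ["c1", "c2"], "order_id": [None, None]})
order_id = "123"

# The selected row has label 1, but pd.Series([order_id]) only has label 0,
# so loc aligns the two indexes, finds nothing for label 1 and writes NaN
b.loc[b.customer_id == "c2", "order_id"] = pd.Series([order_id])
after_mismatch = b.loc[1, "order_id"]

# A value whose index does contain label 1 aligns as intended
b.loc[b.customer_id == "c2", "order_id"] = pd.Series([order_id], index=[1])
after_match = b.loc[1, "order_id"]
```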


If you want to use .loc instead of .iloc, you can avoid this mismatched-index problem by converting the Series object to a NumPy array (using .to_numpy()) before assigning:

b.loc[b.customer_id == customer_id, 'order_id'] = pd.Series([order_id]).to_numpy()

That will work as expected.
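A self-contained version of that fix, again on a made-up frame. The second assignment shows an alternative that isn't in the original answer: when you only have one value, assigning the plain scalar also sidesteps alignment, because pandas broadcasts it to every selected row.

```python
import pandas as pd

b = pd.DataFrame({"customer_id": ["c1", "c2"], "order_id": [None, None]})
order_id = "123"

# .to_numpy() returns a plain array with no index, so nothing gets aligned
b.loc[b.customer_id == "c2", "order_id"] = pd.Series([order_id]).to_numpy()

# For a single value, assigning the scalar itself also avoids alignment;
# it is broadcast to every selected row
b.loc[b.customer_id == "c1", "order_id"] = "456"
```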

Pandas iloc returns different range than loc

As mentioned in the docs for loc:

Warning: Note that contrary to usual python slices, both the start and
the stop are included

On the other hand, iloc selects by integer position and follows the usual Python slicing rules, so the stop position is not included.
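A quick sketch of the off-by-one difference, using made-up frames. Even with the default RangeIndex, where labels and positions coincide, loc[1:3] and iloc[1:3] return a different number of rows:

```python
import pandas as pd

df = pd.DataFrame({"x": range(5)}, index=["a", "b", "c", "d", "e"])

loc_rows = df.loc["b":"d"]  # label slice, both ends included -> b, c, d
iloc_rows = df.iloc[1:3]    # position slice, stop excluded   -> b, c

# Even when labels equal positions (default RangeIndex), the results differ
df2 = pd.DataFrame({"x": range(5)})
n_loc = len(df2.loc[1:3])    # labels 1, 2, 3 -> 3 rows
n_iloc = len(df2.iloc[1:3])  # positions 1, 2 -> 2 rows
```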

Why do double square brackets create a DataFrame with loc or iloc?

In principle, a list can hold more than one column name, so it's natural for pandas to give you a DataFrame, because only a DataFrame can hold more than one column. When you pass a single string instead of a list, pandas can safely conclude that it's just one column, so giving you a Series is not a problem. Treat the two forms and their two outcomes as useful flexibility for getting whichever you need, a Series or a DataFrame; sometimes you specifically need one of the two.
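A minimal sketch of the rule on a made-up frame: a single label or position gives a Series, while a list, even a one-element list, gives a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

col_series = df.loc[:, "a"]   # one label -> Series
col_frame = df.loc[:, ["a"]]  # list holding one label -> DataFrame with one column

row_series = df.iloc[0]       # one position -> Series (the first row)
row_frame = df.iloc[[0]]      # list holding one position -> DataFrame with one row
```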

Getting Scalar Value with pandas loc/iloc/at/iat/ix

You're getting an error because the only index label in b is 24. You could use that label or, more easily, index by position using:

b.iloc[0]

This is a common gotcha for new Pandas users. Indices are preserved when pulling data out of a Series or DataFrame. They do not, in general, run from 0 -> N-1 where N is the length of the Series or the number of rows in the DataFrame.
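Here is a minimal sketch that recreates the gotcha, assuming a made-up frame where filtering leaves a single row labelled 24 (the original b isn't shown). The reset_index(drop=True) call at the end is one common way to get a 0..N-1 index back when that's what you want; it isn't part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"score": range(30)})

# Filtering keeps the original row label, so b's only label is 24
b = df[df.score == 24]

by_label = b.loc[24, "score"]  # the label exists
by_position = b.iloc[0, 0]     # first (and only) row, first column

label_zero_missing = False
try:
    b.loc[0, "score"]          # no row is labelled 0
except KeyError:
    label_zero_missing = True

# To get labels 0..N-1 back, rebuild the index explicitly
b_reset = b.reset_index(drop=True)
```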

The indexing docs will help a bit: http://pandas.pydata.org/pandas-docs/stable/indexing.html, although I admit they were confusing for me at first as well.


