Pandas .At Versus .Loc

Update: df.get_value is deprecated as of version 0.21.0. Using df.at or df.iat is the recommended method going forward.


df.at can only access a single value at a time.

df.loc can select multiple rows and/or columns.

Note that there is also df.get_value (now deprecated, see the update above), which may be even quicker at accessing single values:

In [25]: %timeit df.loc[('a', 'A'), ('c', 'C')]
10000 loops, best of 3: 187 µs per loop

In [26]: %timeit df.at[('a', 'A'), ('c', 'C')]
100000 loops, best of 3: 8.33 µs per loop

In [35]: %timeit df.get_value(('a', 'A'), ('c', 'C'))
100000 loops, best of 3: 3.62 µs per loop

Under the hood, df.at[...] calls df.get_value, but it also does some type checking on the keys.
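As a minimal sketch of the difference between the two accessors, assuming a small hypothetical DataFrame (the labels and values below are illustrative and not taken from the question):

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {"x": [1, 2, 3], "y": [4, 5, 6]},
    index=["A", "B", "C"],
)

# .at takes exactly one row label and one column label and returns a scalar.
single_at = df.at["B", "y"]

# .loc returns the same scalar when given scalar labels...
single_loc = df.loc["B", "y"]

# ...but it also accepts lists and slices, returning a Series or DataFrame.
block = df.loc[["A", "C"], "x":"y"]

print(single_at, single_loc)
print(block)
```

For one cell the two are interchangeable; .at just skips the machinery that .loc needs for lists, slices and masks.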

pandas loc vs. iloc vs. at vs. iat?

loc: works only on index labels

iloc: works on integer positions

at: gets scalar values. It's a very fast loc

iat: gets scalar values. It's a very fast iloc

Also,

at and iat are meant to access a scalar, that is, a single element
in the dataframe, while loc and iloc are meant to access several
elements at the same time, potentially to perform vectorized
operations.
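The four accessors can be compared side by side. The sketch below assumes a hypothetical frame whose integer labels deliberately differ from the row positions, so that label-based and position-based lookups give different answers:

```python
import pandas as pd

# Hypothetical frame: the index labels (2, 0, 1) are not the positions (0, 1, 2).
df = pd.DataFrame({"v": [10, 20, 30]}, index=[2, 0, 1])

by_label = df.loc[0, "v"]       # row labelled 0 -> 20
by_position = df.iloc[0, 0]     # first row, first column -> 10
fast_label = df.at[0, "v"]      # scalar-only counterpart of loc -> 20
fast_position = df.iat[0, 0]    # scalar-only counterpart of iloc -> 10

# loc and iloc can also return several elements at once.
several = df.iloc[0:2, 0]       # first two rows by position

print(by_label, by_position, fast_label, fast_position)
print(several)
```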

http://pyciencia.blogspot.com/2015/05/obtener-y-filtrar-datos-de-un-dataframe.html

What is the difference between using loc and using just square brackets to filter for columns in Pandas/Python?

In the following situations, they behave the same:

  1. Selecting a single column (df['A'] is the same as df.loc[:, 'A'] -> selects column A)
  2. Selecting a list of columns (df[['A', 'B', 'C']] is the same as df.loc[:, ['A', 'B', 'C']] -> selects columns A, B and C)
  3. Slicing by rows (df[1:3] is the same as df.iloc[1:3] -> selects rows 1 and 2. Note, however, that if you slice rows with loc instead of iloc, you'll get rows 1, 2 and 3, assuming you have a RangeIndex, because loc slices include the end label.)

However, the following can be done with .loc but not with []:

  1. You can select a single row with df.loc[row_label]
  2. You can select a list of rows with df.loc[[row_label1, row_label2]]
  3. You can slice columns with df.loc[:, 'A':'C']

These three cannot be done with [].
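The equivalences and the .loc-only selections listed above can be checked with a short sketch. The frame below is hypothetical and uses the default RangeIndex:

```python
import pandas as pd

# Hypothetical data with the default RangeIndex 0..3.
df = pd.DataFrame(
    {"A": [0, 1, 2, 3], "B": [4, 5, 6, 7], "C": [8, 9, 10, 11]}
)

# Cases where [] and .loc/.iloc behave the same.
same_column = df["A"].equals(df.loc[:, "A"])
same_columns = df[["A", "B"]].equals(df.loc[:, ["A", "B"]])
same_rows = df[1:3].equals(df.iloc[1:3])

# Label-based row slice: the end label 3 is included.
rows_by_label = df.loc[1:3]

# Selections that need .loc.
one_row = df.loc[2]              # a single row, returned as a Series
some_rows = df.loc[[0, 3]]       # a list of row labels
col_slice = df.loc[:, "A":"B"]   # a slice of columns (inclusive)

print(same_column, same_columns, same_rows)
print(len(rows_by_label), list(col_slice.columns))
```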
More importantly, if your selection involves both rows and columns, then assignment becomes problematic.

df[1:3]['A'] = 5

This selects rows 1 and 2, then selects column 'A' of the returned object and assigns the value 5 to it. The problem is that the returned object might be a copy, so this may not change the actual DataFrame. This raises SettingWithCopyWarning. The correct way of making this assignment is shown below (assuming a default RangeIndex; because .loc slices include the end label, rows 1 and 2 are written as 1:2):

df.loc[1:2, 'A'] = 5

With .loc, you are guaranteed to modify the original DataFrame. It also allows you to slice columns (df.loc[:, 'C':'F']), select a single row (df.loc[5]), and select a list of rows (df.loc[[1, 2, 5]]).
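A minimal sketch of the single-step assignment, assuming a hypothetical frame with a default RangeIndex. The chained form df[1:3]['A'] = 5 is left out on purpose, since its effect depends on whether the intermediate object is a copy:

```python
import pandas as pd

# Hypothetical data with the default RangeIndex 0..3.
df = pd.DataFrame({"A": [0, 0, 0, 0], "B": [1, 2, 3, 4]})

# One indexing call on df itself, so the assignment writes into df.
# With a RangeIndex, the label slice 1:2 is inclusive and covers the
# same rows as the positional slice 1:3.
df.loc[1:2, "A"] = 5

print(df)
```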

Also note that these two were not included in the API at the same time. .loc was added much later as a more powerful and explicit indexer. See unutbu's answer for more detail.


Note: Getting columns with [] vs . is a completely different topic. . is only there for convenience. It only allows accessing columns whose names are valid Python identifiers (i.e. they cannot contain spaces, they cannot start with a digit...). It cannot be used when the names conflict with Series/DataFrame methods. It also cannot be used to create new columns (i.e. the assignment df.a = 1 won't add a column if there is no column a; it sets a plain attribute instead). Other than that, . and [] are the same.
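The limits of dot access can be seen in a short sketch. The column names here are hypothetical and chosen to trigger each case:

```python
import pandas as pd

# Hypothetical columns: a plain identifier, a name with a space,
# and a name that collides with a DataFrame method.
df = pd.DataFrame({"price": [1.5, 2.5], "unit cost": [1.0, 2.0], "count": [3, 4]})

# Valid identifier: dot access and [] agree.
dot_ok = df.price.equals(df["price"])

# A name with a space is only reachable with [].
spaced = df["unit cost"]

# df.count is the DataFrame method, not the column.
is_method = callable(df.count)
count_column = df["count"]

# Assigning to a new attribute does not create a column.
df.a = 1
has_a_column = "a" in df.columns

print(dot_ok, is_method, has_a_column)
```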

Python: Pandas Series - Why use loc?

  • Explicit is better than implicit.

    df[boolean_mask] selects rows where boolean_mask is True, but there is a corner case when you might not want it to: when df has boolean-valued column labels:

    In [229]: df = pd.DataFrame({True:[1,2,3],False:[3,4,5]}); df
    Out[229]:
       False  True
    0      3     1
    1      4     2
    2      5     3

    You might want to use df[[True]] to select the True column. Instead it raises a ValueError:

    In [230]: df[[True]]
    ValueError: Item wrong length 1 instead of 3.

    Versus using loc:

    In [231]: df.loc[[True]]
    Out[231]:
       False  True
    0      3     1

    In contrast, the following does not raise ValueError even though the structure of df2 is almost the same as that of df above:

    In [258]: df2 = pd.DataFrame({'A':[1,2,3],'B':[3,4,5]}); df2
    Out[258]:
       A  B
    0  1  3
    1  2  4
    2  3  5

    In [259]: df2[['B']]
    Out[259]:
       B
    0  3
    1  4
    2  5

    Thus, df[boolean_mask] does not always behave the same as df.loc[boolean_mask]. Even though this is arguably an unlikely use case, I would recommend always using df.loc[boolean_mask] instead of df[boolean_mask] because the meaning of df.loc's syntax is explicit. With df.loc[indexer] you know automatically that df.loc is selecting rows. In contrast, it is not clear if df[indexer] will select rows or columns (or raise ValueError) without knowing details about indexer and df.

  • df.loc[row_indexer, column_indexer] can select rows and columns at the same time. df[indexer] can only select rows or columns, depending on the type of values in indexer and the type of column labels df has (again, are they boolean?).

    In [237]: df2.loc[[True,False,True], 'B']
    Out[237]:
    0    3
    2    5
    Name: B, dtype: int64

  • When a slice is passed to df.loc the end-points are included in the range. When a slice is passed to df[...], the slice is interpreted as a half-open interval:

    In [239]: df2.loc[1:2]
    Out[239]:
       A  B
    1  2  4
    2  3  5

    In [271]: df2[1:2]
    Out[271]:
       A  B
    1  2  4
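The last two points can be reproduced in a script rather than an interactive session. This sketch reuses the df2 defined above; the variable names for the results are illustrative:

```python
import pandas as pd

df2 = pd.DataFrame({"A": [1, 2, 3], "B": [3, 4, 5]})

# Boolean row mask combined with a column label in a single .loc call.
masked = df2.loc[[True, False, True], "B"]

# Label slice with .loc: both end-points are included.
inclusive = df2.loc[1:2]

# Plain [] slice: positional and half-open, so only row 1 is returned.
half_open = df2[1:2]

print(masked.tolist(), len(inclusive), len(half_open))
```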

How are iloc and loc different?

Label vs. Location

The main distinction between the two methods is:

  • loc gets rows (and/or columns) with particular labels.

  • iloc gets rows (and/or columns) at integer locations.

To demonstrate, consider a series s of characters with a non-monotonic integer index:

>>> s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])
>>> s
49    a
48    b
47    c
0     d
1     e
2     f

>>> s.loc[0] # value at index label 0
'd'

>>> s.iloc[0] # value at index location 0
'a'

>>> s.loc[0:1] # rows at index labels between 0 and 1 (inclusive)
0    d
1    e

>>> s.iloc[0:1] # rows at index location between 0 and 1 (exclusive)
49    a
Here are some of the differences/similarities between s.loc and s.iloc when passed various objects:

<object> | description | s.loc[<object>] | s.iloc[<object>]
--- | --- | --- | ---
0 | single item | Value at index label 0 (the string 'd') | Value at index location 0 (the string 'a')
0:1 | slice | Two rows (labels 0 and 1) | One row (first row at location 0)
1:47 | slice with out-of-bounds end | Zero rows (empty Series) | Five rows (location 1 onwards)
1:47:-1 | slice with negative step | three rows (labels 1 back to 47) | Zero rows (empty Series)
[2, 0] | integer list | Two rows with given labels | Two rows with given locations
s > 'e' | Bool series (indicating which values have the property) | One row (containing 'f') | NotImplementedError
(s>'e').values | Bool array | One row (containing 'f') | Same as loc
999 | int object not in index | KeyError | IndexError (out of bounds)
-1 | int object not in index | KeyError | Returns last value in s
lambda x: x.index[3] | callable applied to series (here returning 3rd item in index) | s.loc[s.index[3]] | s.iloc[s.index[3]]
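A few of the cases listed above can be checked directly. The sketch below reuses the series s from the example and covers only the slice, list and out-of-range cases; the result variable names are illustrative:

```python
import pandas as pd

s = pd.Series(list("abcdef"), index=[49, 48, 47, 0, 1, 2])

empty_by_label = s.loc[1:47]         # label 1 comes after label 47: empty
five_by_position = s.iloc[1:47]      # positions 1 to the end
reversed_by_label = s.loc[1:47:-1]   # labels 1, 0, 47 in that order
labels_2_0 = s.loc[[2, 0]]           # rows labelled 2 and 0
positions_2_0 = s.iloc[[2, 0]]       # rows at positions 2 and 0
last = s.iloc[-1]                    # negative positions count from the end

# -1 is not a label in the index, so .loc raises KeyError.
try:
    s.loc[-1]
    loc_error = None
except KeyError:
    loc_error = "KeyError"

# Position 999 is out of bounds, so .iloc raises IndexError.
try:
    s.iloc[999]
    iloc_error = None
except IndexError:
    iloc_error = "IndexError"

print(reversed_by_label.tolist(), last, loc_error, iloc_error)
```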

Change values in DataFrame - .iloc vs .loc

First of all, don't use a for loop with DataFrames unless you really, really have to.
Just use a boolean array to filter your DataFrame with loc and assign your values that way.
You can do what you want with a simple merge.

df1 = df1.merge(df2, on='KEY', how='left').rename(columns={'value_alternative': 'value 2'})
df1.loc[df1['value 2'].isna(), 'value 2'] = df1['value']

The reason the iloc assignment does not work is that in pandas you can't set a value in a copy of a DataFrame. Pandas does this in order to work fast. To have access to the underlying data you need to use loc for filtering. Don't forget that loc and iloc do different things: loc looks at the labels of the index, while iloc looks at the integer positions.

For this to work, you also have to delete the line

df1["value 2"] = "nothing"

from your program.
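The merge-then-fill approach can be sketched end to end. The column names KEY, value and value_alternative come from the answer; the data itself is hypothetical, since the question's frames are not shown here:

```python
import pandas as pd

# Hypothetical input frames; only the column names come from the answer.
df1 = pd.DataFrame({"KEY": ["a", "b", "c"], "value": [1, 2, 3]})
df2 = pd.DataFrame({"KEY": ["a", "c"], "value_alternative": [10, 30]})

# A left merge keeps every row of df1; unmatched keys get NaN.
merged = df1.merge(df2, on="KEY", how="left").rename(
    columns={"value_alternative": "value 2"}
)

# The boolean mask selects rows without a match; a single .loc call
# assigns the fallback value for those rows only.
mask = merged["value 2"].isna()
merged.loc[mask, "value 2"] = merged.loc[mask, "value"]

print(merged)
```

The mask is applied on both sides of the assignment here to make the row selection explicit.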

Set value for particular cell in pandas DataFrame using index

RukTech's answer, df.set_value('C', 'x', 10), is far and away faster than the options I've suggested below. However, it has been slated for deprecation.

Going forward, the recommended method is .iat/.at.


Why df.xs('C')['x']=10 does not work:

By default, df.xs('C') returns a new DataFrame with a copy of the data, so

df.xs('C')['x']=10

modifies this new dataframe only.

df['x'] returns a view of the df dataframe, so

df['x']['C'] = 10

modifies df itself.

Warning: It is sometimes difficult to predict if an operation returns a copy or a view. For this reason the docs recommend avoiding assignments with "chained indexing".


So the recommended alternative is

df.at['C', 'x'] = 10

which does modify df.
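A minimal sketch of setting a single cell, assuming a hypothetical frame with row labels 'A', 'B', 'C' and columns 'x', 'y'. It also shows the position-based counterpart .iat:

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=["A", "B", "C"])

# Set one cell by row label and column label.
df.at["C", "x"] = 10

# Set one cell by integer positions: row 0 ("A"), column 1 ("y").
df.iat[0, 1] = 40

print(df)
```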


In [18]: %timeit df.set_value('C', 'x', 10)
100000 loops, best of 3: 2.9 µs per loop

In [20]: %timeit df['x']['C'] = 10
100000 loops, best of 3: 6.31 µs per loop

In [81]: %timeit df.at['C', 'x'] = 10
100000 loops, best of 3: 9.2 µs per loop

What would be the syntactical classification of 'loc' and 'iloc' in pandas?

Strictly speaking, loc and iloc are not methods. They are properties of the pandas DataFrame and Series classes that return indexer objects, which is why they are normally used with square brackets (df.loc[...]) rather than called with parentheses.

Both loc and iloc are used to access values from rows and columns within a DataFrame. One can use them to filter and modify values within the DataFrame.

loc - loc is a label-based selection indexer, which means that we have to pass the name of the row or column that we want to select. It includes the last element of the range passed to it, unlike iloc.

iloc - iloc is an integer position-based selection indexer, which means that we have to pass integer positions to select a specific row/column. It does not include the last element of the range passed to it, unlike loc.

Example:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(10,100, (5, 4)), columns = list("ABCD"))

df.loc[1:3, "A":"C"]

Before the comma, the slice selects rows; after the comma, the slice selects columns. Here we have to specify the labels of the rows as well as the columns, and both end labels are included.

df.iloc[1:3, 1:3] 

Before the comma, the slice selects rows; after the comma, the slice selects columns. Here we have to specify the integer positions of the rows as well as the columns, and the end positions are excluded.
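To make the inclusive/exclusive difference visible, the sketch below uses a deterministic 5x4 frame (np.arange in place of the random integers above, an assumption made so the result is reproducible) and compares what each slice returns:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the random frame: values 0..19 in a 5x4 grid.
df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list("ABCD"))

# Label-based: rows 1, 2, 3 and columns A, B, C (end labels included).
by_label = df.loc[1:3, "A":"C"]

# Position-based: rows 1, 2 and columns 1, 2 (end positions excluded).
by_position = df.iloc[1:3, 1:3]

print(by_label.shape, by_position.shape)
```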


