Accessing Pandas Column Using Squared Brackets VS Using a Dot (Like an Attribute)

The "dot notation", i.e. df.col2 is the attribute access that's exposed as a convenience.

As the pandas documentation puts it, you may access an index on a Series, a column on a DataFrame, and an item on a Panel directly as an attribute. (Panel has since been removed from pandas.)

df['col2'] does the same: it returns a pd.Series of the column.

A few caveats about attribute access:

  • you cannot add a column this way: df.new_col = x does not create a column. Worse, it creates a plain Python attribute on the object instead (think monkey-patching); recent pandas versions may emit a UserWarning when this happens, older ones do it silently.
  • it won't work if you have spaces in the column name or if the column name is an integer.
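The caveats above can be checked with a short sketch. The DataFrame and its column names are made up for illustration, and the warning filter is only there because newer pandas versions are expected to warn on the attribute assignment:

```python
import warnings

import pandas as pd

# Hypothetical data: one identifier-like name, one name with a space, one integer name
df = pd.DataFrame({"col2": [1, 2, 3], "my col": [4, 5, 6], 0: [7, 8, 9]})

# Reading an existing column works both ways
same = df.col2.equals(df["col2"])

# Assigning through a new attribute name does not create a column
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # newer pandas versions may warn here
    df.new_col = [10, 11, 12]
print("new_col" in df.columns)   # False

# Bracket assignment does create a column
df["new_col2"] = [10, 11, 12]
print("new_col2" in df.columns)  # True

# Names with spaces and integer names are only reachable with brackets
print(df["my col"].tolist())     # [4, 5, 6]
print(df[0].tolist())            # [7, 8, 9]
```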

Speed difference between bracket notation and dot notation for accessing columns in pandas

df['CID'] delegates to NDFrame.__getitem__ and it is more obvious you are performing an indexing operation.

On the other hand, df.CID delegates to NDFrame.__getattr__, which has to do some additional heavy lifting, mainly to determine whether 'CID' is an attribute, a function, or a column you're calling using the attribute access (a convenience, but not recommended for production code).
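If you want to see the difference on your own machine, a rough timing sketch follows. The column contents and repetition count are arbitrary, and the absolute numbers (and even the size of the gap) will depend on the pandas version and hardware:

```python
import timeit

import pandas as pd

df = pd.DataFrame({"CID": range(1000)})

# Time the same lookup through both access paths
bracket = timeit.timeit(lambda: df["CID"], number=10_000)
dot = timeit.timeit(lambda: df.CID, number=10_000)

print(f"df['CID']: {bracket:.4f}s   df.CID: {dot:.4f}s")
```

Either way, both expressions return the same column; any difference is per-lookup overhead only.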


Now, why is it not recommended? Consider,

df = pd.DataFrame({'A': [1, 2, 3]})
df.A

0    1
1    2
2    3
Name: A, dtype: int64

There are no issues referring to column "A" as df.A, because it does not conflict with any attribute or function namings in pandas. However, consider the pop function (just as an example).

df.pop
# <bound method NDFrame.pop of ...>

df.pop is a bound method of df. Now, I'd like to create a column called "pop" for various reasons.

df['pop'] = [4, 5, 6]
df
   A  pop
0  1    4
1  2    5
2  3    6

Great, but,

df.pop
# <bound method NDFrame.pop of ...>

I cannot use the attribute notation to access this column. However...

df['pop']

0    4
1    5
2    6
Name: pop, dtype: int64

Bracket notation still works. That's why this is better.

pandas dataframe where clause with dot versus brackets column selection

The dot notation is just a convenient shortcut for the standard brackets. Notably, it doesn't work when the column name is something like sum that is already a DataFrame method. My bet would be that the column name in your real example is running into that issue: the bracket selection works fine, but the dot version is actually testing whether a method is equal to 'blah'.

Quick example below:

In [67]: df = pd.DataFrame(np.arange(10).reshape(5,2), columns=["number", "sum"])

In [68]: df
Out[68]:
   number  sum
0       0    1
1       2    3
2       4    5
3       6    7
4       8    9

In [69]: df.number == 0
Out[69]:
0     True
1    False
2    False
3    False
4    False
Name: number, dtype: bool

In [70]: df.sum == 0
Out[70]: False

In [71]: df['sum'] == 0
Out[71]:
0    False
1    False
2    False
3    False
4    False
Name: sum, dtype: bool
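One way to spot such clashes ahead of time is to check each column name against the attributes of the DataFrame class. This is only a sketch of the idea using hasattr; it reuses the example frame from above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(5, 2), columns=["number", "sum"])

# Column names that are shadowed by an existing DataFrame attribute or method
clashes = [c for c in df.columns if isinstance(c, str) and hasattr(pd.DataFrame, c)]
print(clashes)  # ['sum']

# The dot form compares a bound method with 0, which gives a plain bool
print(df.sum == 0)  # False
```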

What is the difference between using loc and using just square brackets to filter for columns in Pandas/Python?

In the following situations, they behave the same:

  1. Selecting a single column (df['A'] is the same as df.loc[:, 'A'] -> selects column A)
  2. Selecting a list of columns (df[['A', 'B', 'C']] is the same as df.loc[:, ['A', 'B', 'C']] -> selects columns A, B and C)
  3. Slicing by rows (df[1:3] is the same as df.iloc[1:3] -> selects rows 1 and 2. Note, however, that if you slice rows with loc instead of iloc, you'll get rows 1, 2 and 3, assuming you have a RangeIndex, because label slices include the end point.)

However, the following selections cannot be done with [] and need .loc:

  1. Selecting a single row: df.loc[row_label]
  2. Selecting a list of rows: df.loc[[row_label1, row_label2]]
  3. Slicing columns: df.loc[:, 'A':'C']

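A small sketch of these three .loc selections, on a made-up frame with string row labels so that rows and columns are easy to tell apart:

```python
import pandas as pd

# Hypothetical data with string row labels
df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9], "D": [10, 11, 12]},
    index=["x", "y", "z"],
)

row = df.loc["y"]            # single row, returned as a Series
rows = df.loc[["x", "z"]]    # list of rows, returned as a DataFrame
cols = df.loc[:, "A":"C"]    # label slice of columns, both ends included

print(row.tolist())          # [2, 5, 8, 11]
print(rows.index.tolist())   # ['x', 'z']
print(cols.columns.tolist()) # ['A', 'B', 'C']

# Plain brackets look up a column label, so a row label fails
try:
    df["y"]
    bracket_failed = False
except KeyError:
    bracket_failed = True
print(bracket_failed)        # True
```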
More importantly, if your selection involves both rows and columns, then assignment becomes problematic.

df[1:3]['A'] = 5

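This selects rows 1 and 2, then selects column 'A' of the returned object and assigns the value 5 to it. The problem is that the returned object might be a copy, so this may not change the actual DataFrame; pandas typically signals this with a SettingWithCopyWarning. The correct way of making this assignment is a single .loc call. Because a label slice includes its end point, the equivalent of the positional slice 1:3 is the label slice 1:2 (assuming a RangeIndex):

df.loc[1:2, 'A'] = 5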

With .loc, you are guaranteed to modify the original DataFrame. It also allows you to slice columns (df.loc[:, 'C':'F']), select a single row (df.loc[5]), and select a list of rows (df.loc[[1, 2, 5]]).
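The slicing difference and the single-step assignment can be sketched as follows, on a made-up frame with a default RangeIndex. The chained form is left out on purpose, since its behaviour depends on the pandas version:

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 0, 0, 0], "B": [1, 2, 3, 4, 5]})

# Positional slice with []: rows at positions 1 and 2 (end excluded)
print(df[1:3].index.tolist())      # [1, 2]

# Label slice with .loc: labels 1 through 3 (end included)
print(df.loc[1:3].index.tolist())  # [1, 2, 3]

# One .loc call selects rows and column together and writes to df itself
df.loc[1:2, "A"] = 5
print(df["A"].tolist())            # [0, 5, 5, 0, 0]
```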

Also note that these two were not included in the API at the same time. .loc was added much later as a more powerful and explicit indexer.


Note: Getting columns with [] vs . is a completely different topic. . is only there for convenience. It only allows accessing columns whose names are valid Python identifiers (i.e. they cannot contain spaces, they cannot start with a digit...). It cannot be used when the names conflict with Series/DataFrame methods. It also cannot be used for non-existing columns (i.e. the assignment df.a = 1 won't create a column if there is no column a). Other than that, . and [] return the same thing when reading an existing column.
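The "valid Python identifier" rule can be tested with the standard library. The helper name below is hypothetical, and it only checks syntax; it does not detect clashes with existing Series/DataFrame methods:

```python
import keyword


def is_valid_attribute_name(name):
    """Return True if `name` could syntactically be written after a dot."""
    return (
        isinstance(name, str)
        and name.isidentifier()
        and not keyword.iskeyword(name)
    )


for candidate in ["col_2", "my col", "2020", "class", 5]:
    print(repr(candidate), is_valid_attribute_name(candidate))
```

Here only "col_2" passes: "my col" contains a space, "2020" starts with a digit, "class" is a reserved word, and 5 is not a string.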

What's the difference between the square bracket and dot notations in Python?

The dot operator is used for accessing attributes of any object. For example, a complex number

>>> c = 3+4j

has (among others) the two attributes real and imag:

>>> c.real
3.0
>>> c.imag
4.0

As well as those, it has a method, conjugate(), which is also an attribute:

>>> c.conjugate
<built-in method conjugate of complex object at 0x7f4422d73050>
>>> c.conjugate()
(3-4j)

Square bracket notation is used for accessing members of a collection, whether that's by key in the case of a dictionary or other mapping:

>>> d = {'a': 1, 'b': 2}
>>> d['a']
1

... or by index in the case of a sequence like a list or string:

>>> s = ['x', 'y', 'z']
>>> s[2]
'z'
>>> t = 'Kapow!'
>>> t[3]
'o'

These collections also, separately, have attributes:

>>> d.pop
<built-in method pop of dict object at 0x7f44204068c8>
>>> s.reverse
<built-in method reverse of list object at 0x7f4420454d08>
>>> t.lower
<built-in method lower of str object at 0x7f4422ce2688>

... and again, in the above cases, these attributes happen to be methods.

While all objects have some attributes, not all objects have members. For example, if we try to use square bracket notation to access a member of our complex number c:

>>> c[0]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'complex' object is not subscriptable

... we get an error (which makes sense, since there's no obvious way for a complex number to have members).

It's possible to define how [] and . access work in a user-defined class, using the special methods __getitem__() and __getattr__() respectively. Explaining how to do so is beyond the scope of this question, but you can read more about it in the Python Tutorial.
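As a rough illustration only, here is a toy class (the name Record and its keys method are invented for this sketch) that supports both notations. It also shows why a real method wins over an item of the same name under dot access:

```python
class Record:
    """Toy container whose items can be read with [] or with a dot."""

    def __init__(self, **items):
        self._items = dict(items)

    def keys(self):
        # An ordinary method, found by normal attribute lookup
        return list(self._items)

    def __getitem__(self, key):
        # Called for record[key]
        return self._items[key]

    def __getattr__(self, name):
        # Called only when normal attribute lookup has already failed
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._items[name]
        except KeyError:
            raise AttributeError(name) from None


r = Record(keys="k", size=2)
print(r["size"], r.size)  # 2 2
print(r["keys"])          # k
print(callable(r.keys))   # True: the method shadows the item named "keys"
```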

Propagation in Python - Pandas Series TypeError

Two options come to mind.

Option 1: use numpy.sqrt:

import numpy as np

joinedDF['combined_error'] = np.sqrt(joinedDF['error1']**2 +
                                     joinedDF['error2']**2)

Option 2: if you want/need to avoid numpy for some reason, you can apply math.sqrt to the numeric column. This is likely slower than Option 1, but works in my testing:

import math

joinedDF['combined_error'] = (joinedDF['error1']**2 +
                              joinedDF['error2']**2).apply(math.sqrt)

Minor style comment: it's generally recommended to refer to DataFrame columns using indexing (square brackets) rather than attribute access (dot notation), so I modified your code accordingly. More reading: What is the difference between using squared brackets or dot to access a column?
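To compare the two options side by side, here is a small sketch. The frame contents are invented (the real joinedDF is not shown in the question); only the column names are taken from the code above:

```python
import math

import numpy as np
import pandas as pd

# Hypothetical stand-in for joinedDF
joinedDF = pd.DataFrame({"error1": [3.0, 5.0], "error2": [4.0, 12.0]})

sum_sq = joinedDF["error1"] ** 2 + joinedDF["error2"] ** 2

vectorized = np.sqrt(sum_sq)           # Option 1: works on the whole Series
elementwise = sum_sq.apply(math.sqrt)  # Option 2: calls math.sqrt per element

print(vectorized.tolist())   # [5.0, 13.0]
print(elementwise.tolist())  # [5.0, 13.0]

# Passing the whole Series to math.sqrt is what triggers the TypeError
try:
    math.sqrt(sum_sq)
    raised = False
except TypeError:
    raised = True
print(raised)                # True
```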

The difference between double brace `[[...]]` and single brace `[..]` indexing in Pandas

Consider this:

Source DF:

In [79]: df
Out[79]:
   Brains  Bodies
0      42      34
1      32      23

Selecting one column results in a pandas.Series:

In [80]: df['Brains']
Out[80]:
0    42
1    32
Name: Brains, dtype: int64

In [81]: type(df['Brains'])
Out[81]: pandas.core.series.Series

Selecting a subset of the DataFrame with a list of columns results in a DataFrame:

In [82]: df[['Brains']]
Out[82]:
   Brains
0      42
1      32

In [83]: type(df[['Brains']])
Out[83]: pandas.core.frame.DataFrame

Conclusion: the second approach lets us select multiple columns from the DataFrame and always returns a DataFrame. The first one selects a single column and returns a Series.

Demo:

In [84]: df = pd.DataFrame(np.random.rand(5,6), columns=list('abcdef'))

In [85]: df
Out[85]:
          a         b         c         d         e         f
0  0.065196  0.257422  0.273534  0.831993  0.487693  0.660252
1  0.641677  0.462979  0.207757  0.597599  0.117029  0.429324
2  0.345314  0.053551  0.634602  0.143417  0.946373  0.770590
3  0.860276  0.223166  0.001615  0.212880  0.907163  0.437295
4  0.670969  0.218909  0.382810  0.275696  0.012626  0.347549

In [86]: df[['e','a','c']]
Out[86]:
          e         a         c
0  0.487693  0.065196  0.273534
1  0.117029  0.641677  0.207757
2  0.946373  0.345314  0.634602
3  0.907163  0.860276  0.001615
4  0.012626  0.670969  0.382810

and if we specify only one column in the list we will get a DataFrame with one column:

In [87]: df[['e']]
Out[87]:
          e
0  0.487693
1  0.117029
2  0.946373
3  0.907163
4  0.012626
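The same distinction can be checked programmatically. This sketch rebuilds the small Brains/Bodies frame from above and compares the types and shapes of the two results:

```python
import pandas as pd

df = pd.DataFrame({"Brains": [42, 32], "Bodies": [34, 23]})

single = df["Brains"]    # string key -> Series
double = df[["Brains"]]  # list key   -> DataFrame with one column

print(type(single).__name__, single.shape)  # Series (2,)
print(type(double).__name__, double.shape)  # DataFrame (2, 1)

# A list key can reorder or repeat the selection of several columns
print(df[["Bodies", "Brains"]].columns.tolist())  # ['Bodies', 'Brains']
```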

Series to tolist() gives elements in square brackets when appended python

Take the first matching value as a scalar:

ratio = df_fd.loc[(df_fd['variable'] == col) & (df_fd['Value'] == val)]['ratio'].values[0]

and remove this line, which turns the value back into a list:

  • ratio=list(ratio)
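A minimal sketch of why the brackets appear, using invented data for df_fd (only the column names come from the line above). Appending a scalar gives a flat list, while appending the result of tolist() nests a list inside the list:

```python
import pandas as pd

# Hypothetical stand-in for df_fd
df_fd = pd.DataFrame(
    {"variable": ["x", "x", "y"], "Value": [1, 2, 1], "ratio": [0.25, 0.5, 0.75]}
)
col, val = "x", 2

mask = (df_fd["variable"] == col) & (df_fd["Value"] == val)

# Scalar: first matching value
ratio = df_fd.loc[mask, "ratio"].values[0]
flat = []
flat.append(float(ratio))
print(flat)    # [0.5]

# List: every matching value, still wrapped in a list
as_list = df_fd.loc[mask, "ratio"].tolist()
nested = []
nested.append(as_list)
print(nested)  # [[0.5]]
```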
