What Does Axis in Pandas Mean

What does axis in pandas mean?

It specifies the axis along which the means are computed. By default, axis=0. This is consistent with numpy.mean when axis is given explicitly: axis=0 runs along the rows (the index, in pandas) and axis=1 runs along the columns. (In numpy.mean itself, axis=None is the default, which computes the mean over the flattened array.) For added clarity, you may specify axis='index' instead of axis=0, or axis='columns' instead of axis=1.

+------------+---------+--------+
|            |  A      |  B     |
+------------+---------+--------+
|      0     | 0.626386| 1.52325|----axis=1----->
+------------+---------+--------+
             |         |
             | axis=0  |
             ↓         ↓
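As a minimal sketch of the two directions in the diagram, the snippet below uses a small made-up frame (the values are assumptions, chosen so the means are easy to check by hand) and compares the integer and string spellings of axis.

```python
import pandas as pd

# Hypothetical two-row frame, used only for illustration.
df = pd.DataFrame({"A": [1.0, 3.0], "B": [2.0, 6.0]})

col_means = df.mean(axis=0)              # default: one mean per column
row_means = df.mean(axis=1)              # one mean per row

# The string aliases are interchangeable with the integers.
same_as_axis0 = df.mean(axis="index")
same_as_axis1 = df.mean(axis="columns")

print(col_means)
print(row_means)
```

The result of axis=0 is labelled by the column names, and the result of axis=1 is labelled by the row index.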

What is the meaning of axis=1 in the pandas sort_values function?

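The parameter axis=1 refers to columns, while 0 refers to rows. In this case you are sorting the columns, that is, changing their left-to-right order. Note that with axis=1 the value passed to by (here, 1) is looked up among the row labels, so it names the row whose values decide the column order rather than the second column.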

Some good examples here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html
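A small sketch may help here. The frame and its column names below are hypothetical and not taken from the original question. It relies on the documented behaviour that, with axis=1, by is matched against index (row) labels.

```python
import pandas as pd

# Hypothetical frame; the values in row 0 are 3, 1, 2.
df = pd.DataFrame({"col1": [3, 1], "col2": [1, 2], "col3": [2, 3]})

# axis=0 (default): reorder the rows by the values in column "col1".
rows_sorted = df.sort_values(by="col1", axis=0)

# axis=1: reorder the columns by the values in the row labelled 0.
cols_sorted = df.sort_values(by=0, axis=1)

print(list(rows_sorted.index))
print(list(cols_sorted.columns))
```

With axis=1 the rows stay where they are and only the column order changes.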

Ambiguity in Pandas Dataframe / Numpy Array axis definition

It's perhaps simplest to remember it as 0=down and 1=across.

This means:

  • Use axis=0 to apply a method down each column, or to the row labels (the index).
  • Use axis=1 to apply a method across each row, or to the column labels.

Here's a picture to show the parts of a DataFrame that each axis refers to:

It's also useful to remember that Pandas follows NumPy's use of the word axis. The usage is explained in NumPy's glossary of terms:

Axes are defined for arrays with more than one dimension. A 2-dimensional array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1). [my emphasis]

So the method in the question, df.mean(axis=1), seems to be correctly defined. It takes the mean of entries horizontally across columns, that is, along each individual row. On the other hand, df.mean(axis=0) would be an operation acting vertically downwards across rows.
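To see that the two libraries agree, here is a short sketch that runs the same reduction on a NumPy array and on a DataFrame built from it. The array values and column names are made up for illustration.

```python
import numpy as np
import pandas as pd

arr = np.array([[1.0, 2.0],
                [3.0, 6.0]])
df = pd.DataFrame(arr, columns=["A", "B"])  # hypothetical column names

flat_mean = arr.mean()           # NumPy default axis=None: mean of all entries
np_down = arr.mean(axis=0)       # collapse rows: one value per column
np_across = arr.mean(axis=1)     # collapse columns: one value per row

pd_down = df.mean(axis=0).to_numpy()
pd_across = df.mean(axis=1).to_numpy()

print(flat_mean, np_down, np_across)
```

The only difference is the default: numpy.mean flattens when no axis is given, while DataFrame.mean falls back to axis=0.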

Similarly, df.drop(name, axis=1) refers to an action on column labels, because they intuitively go across the horizontal axis. Specifying axis=0 would make the method act on rows instead.
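The drop case can be sketched the same way. The labels "A", "B", "x" and "y" below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

without_col = df.drop("A", axis=1)   # looks up "A" among the column labels
without_row = df.drop("x", axis=0)   # looks up "x" among the row labels

print(without_col)
print(without_row)
```

Here axis says which set of labels is searched for the name, rather than a direction of computation.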

Definition of axis in pandas any

Axis zero is the index. Axis one is the columns. That’s it.

The interpretation of why the different axis choices behave the way they do is confusing. It is my belief that it is consistent though.

For dropna it refers to the axis from which keys will be dropped.

For any, sum, mean, and many more, it refers to the axis over which we will evaluate the reduction function.
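A minimal sketch of both readings, using a made-up frame with a single missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 1 of column "A" is missing.
df = pd.DataFrame({"A": [1.0, np.nan], "B": [2.0, 3.0]})

# dropna: axis names the axis whose labels get dropped.
rows_dropped = df.dropna(axis=0)   # drops row 1, which holds a NaN
cols_dropped = df.dropna(axis=1)   # drops column "A", which holds a NaN

# Reductions: axis names the axis that is collapsed.
nan_in_column = df.isna().any(axis=0)   # index collapsed: one bool per column
nan_in_row = df.isna().any(axis=1)      # columns collapsed: one bool per row

print(rows_dropped)
print(cols_dropped)
```

In both cases axis=0 concerns the index and axis=1 the columns. What changes is whether labels are removed from that axis or the axis itself is reduced away.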

For apply it refers to the axis that is used in each of the series objects that get passed to the function being applied.
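One way to check this is to have the applied function report the labels of the Series it receives. The sketch below does that with a hypothetical frame whose rows are labelled "x" and "y".

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# Each call receives one Series; join its labels into a single string.
labels_axis0 = df.apply(lambda s: ",".join(s.index), axis=0)
labels_axis1 = df.apply(lambda s: ",".join(s.index), axis=1)

print(labels_axis0)   # one entry per column; each Series was labelled by the index
print(labels_axis1)   # one entry per row; each Series was labelled by the columns
```

With axis=0 the function sees whole columns (labelled by the row index), and with axis=1 it sees whole rows (labelled by the column names).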

For add, mul, etc. it refers to the axis that is used as a reference when adding a series to a dataframe.
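For the arithmetic methods, a short sketch with made-up labels shows how axis picks which set of labels the Series is aligned with:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

per_column = pd.Series({"A": 10, "B": 20})   # labels match df.columns
per_row = pd.Series({"x": 100, "y": 200})    # labels match df.index

added_by_columns = df.add(per_column, axis=1)   # align the Series with the columns
added_by_index = df.add(per_row, axis=0)        # align the Series with the index

print(added_by_columns)
print(added_by_index)
```

In the first call every row gets 10 added in column "A" and 20 in column "B". In the second, every entry of row "x" gets 100 added and every entry of row "y" gets 200.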

You can make arguments why you may have made different choices. But I think the developers made good choices. If something specific confuses you, ask a question.

Why is the axis for the .mean() method in pandas the opposite in this scenario?

You just need to tell mean to work across columns with axis=1:

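import pandas as pd

df = pd.DataFrame({"height_1":[1.78,1.7,1.74,1.66],"height_2":[1.8,1.7,1.75,1.68],"height_3":[1.8,1.69,1.73,1.67]})
# Option 1: row-wise mean over every column in the frame
df = df.assign(height_mean=df.mean(axis=1))
# Option 2: row-wise mean over explicitly named columns (same result here)
df = df.assign(height_mean=df.loc[:,['height_1','height_2','height_3']].mean(axis=1))
print(df.to_string(index=False))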

Output:

 height_1  height_2  height_3  height_mean
     1.78      1.80      1.80     1.793333
     1.70      1.70      1.69     1.696667
     1.74      1.75      1.73     1.740000
     1.66      1.68      1.67     1.670000

Why does pandas.DataFrame.sum(axis=0) return the sum of values in each column when axis=0 represents rows?

I think the right way to interpret the axis parameter is what axis you sum 'over' (or 'across'), rather than the 'direction' the sum is computed in. Specifying axis = 0 computes the sum over the rows, giving you a total for each column; axis = 1 computes the sum across the columns, giving you a total for each row.
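A quick sketch of this "summing over" reading, using a made-up frame whose totals are easy to verify by hand:

```python
import pandas as pd

# Hypothetical frame with three rows and two columns.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

totals_per_column = df.sum(axis=0)   # rows summed over: result labelled by columns
totals_per_row = df.sum(axis=1)      # columns summed over: result labelled by rows

print(totals_per_column)
print(totals_per_row)
```

The axis that is named is the one that disappears from the result, which is why axis=0 leaves one total per column.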

Pandas Mean Axis Argument

You can use .mean(axis=1) on the selected columns. Passing axis=1 means the mean is taken along the horizontal axis, that is, row-wise:

import pandas as pd

a = {'A':[1],'B':[2],'C':[3],'D':[4],'E':['AE']}
df = pd.DataFrame(a)
# Average only the numeric columns of interest, one value per row
cols = ['B','C','D']
df['Average'] = df[cols].mean(axis=1)
print(df)

Output:

   A  B  C  D   E  Average
0  1  2  3  4  AE      3.0

