What does axis in pandas mean?
It specifies the axis along which the means are computed. By default, axis=0. This is consistent with numpy.mean when axis is specified explicitly (in numpy.mean, axis=None by default, which computes the mean over the flattened array): axis=0 runs along the rows (namely, the index in pandas), and axis=1 runs along the columns. For added clarity, one may choose to specify axis='index' (instead of axis=0) or axis='columns' (instead of axis=1).
+------------+---------+--------+
|            |  A      |  B     |
+------------+---------+--------+
|      0     | 0.626386| 1.52325|----axis=1----->
+------------+---------+--------+
             |         |
             | axis=0  |
             ↓         ↓
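As a minimal sketch of the point above (the two-row frame and its values are made up for illustration, not taken from the question), the integer and string forms of axis can be compared directly, along with NumPy's flattened default:

```python
import numpy as np
import pandas as pd

# Hypothetical data chosen so the means are easy to check by hand.
df = pd.DataFrame({"A": [1.0, 3.0], "B": [2.0, 6.0]})

col_means = df.mean(axis=0)  # collapses the rows: one mean per column
row_means = df.mean(axis=1)  # collapses the columns: one mean per row

# The string aliases give the same results as the integers.
assert col_means.equals(df.mean(axis="index"))
assert row_means.equals(df.mean(axis="columns"))

# numpy.mean with no axis averages over the flattened array.
flat_mean = float(np.mean(df.to_numpy()))

print(col_means.to_dict())  # {'A': 2.0, 'B': 4.0}
print(row_means.to_dict())  # {0: 1.5, 1: 4.5}
print(flat_mean)            # 3.0
```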
What is the meaning of axis=1 in the pandas sort_values function?
The parameter axis=1 refers to columns, while axis=0 refers to rows. In this case you are sorting by columns, specifically index 1, which is col2 (indexing in Python starts at 0).
Some good examples here: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html
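The original question's data is not shown here, so the following is only a sketch with an invented frame (the labels r0, r1 and col1 to col3 are assumptions). It illustrates that with axis=1, sort_values reorders the columns and the by argument names a row label, whereas with axis=0 it reorders the rows and by names a column:

```python
import pandas as pd

# Hypothetical frame; every value has the same dtype so a row can be sorted.
df = pd.DataFrame(
    {"col1": [3, 1], "col2": [1, 2], "col3": [2, 3]},
    index=["r0", "r1"],
)

# axis=0 (the default): reorder the rows by the values in column "col1".
by_col = df.sort_values(by="col1", axis=0)

# axis=1: reorder the columns by the values in row "r0".
by_row = df.sort_values(by="r0", axis=1)

print(list(by_col.index))    # ['r1', 'r0']
print(list(by_row.columns))  # ['col2', 'col3', 'col1']
```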
Ambiguity in Pandas Dataframe / Numpy Array axis definition
It's perhaps simplest to remember it as 0=down and 1=across.
This means:
- Use axis=0 to apply a method down each column, or to the row labels (the index).
- Use axis=1 to apply a method across each row, or to the column labels.
Here's a picture to show the parts of a DataFrame that each axis refers to:
It's also useful to remember that pandas follows NumPy's use of the word axis. The usage is explained in NumPy's glossary of terms:
Axes are defined for arrays with more than one dimension. A 2-dimensional array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1). [my emphasis]
So, concerning the method in the question, df.mean(axis=1) seems to be correctly defined. It takes the mean of entries horizontally across columns, that is, along each individual row. On the other hand, df.mean(axis=0) would be an operation acting vertically downwards across rows.
Similarly, df.drop(name, axis=1) refers to an action on column labels, because they intuitively go across the horizontal axis. Specifying axis=0 would make the method act on rows instead.
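A short sketch of the "0=down, 1=across" rule, using a small invented frame (the labels x, y, A and B are assumptions made for illustration):

```python
import pandas as pd

# Hypothetical frame with named row labels so the effect of drop is visible.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

down = df.mean(axis=0)    # down each column: A -> 1.5, B -> 3.5
across = df.mean(axis=1)  # across each row:  x -> 2.0, y -> 3.0

no_col = df.drop("A", axis=1)  # acts on the column labels
no_row = df.drop("x", axis=0)  # acts on the row labels (the index)

print(list(no_col.columns))  # ['B']
print(list(no_row.index))    # ['y']
```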
definition of axis in pandas any
Axis zero is the index. Axis one is the columns. That's it.
The interpretation of why the different axis choices behave the way they do is confusing. It is my belief that it is consistent though.
For dropna, it refers to the axis from which keys will be dropped.
For any, sum, mean, and many more, it refers to the axis over which we will evaluate the reduction function.
For apply, it refers to the axis that is used in each of the series objects that get passed to the function being applied.
For add, mul, etc., it refers to the axis that is used as a reference when adding a series to a dataframe.
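The four cases above can be checked on a small invented frame. This is only a sketch; the data and variable names are assumptions, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in column "A", row 1.
df = pd.DataFrame({"A": [1.0, np.nan], "B": [3.0, 4.0]})

# dropna: axis names the axis whose labels get dropped.
rows_kept = df.dropna(axis=0)  # drops row label 1
cols_kept = df.dropna(axis=1)  # drops column label "A"

# Reductions such as any: axis is the one that is collapsed.
any_per_col = df.isna().any(axis=0)  # one result per column
any_per_row = df.isna().any(axis=1)  # one result per row

# apply: each Series handed to the function is indexed by the labels of `axis`.
first_label_axis0 = df.apply(lambda s: s.index[0], axis=0)  # sees row labels
first_label_axis1 = df.apply(lambda s: s.index[0], axis=1)  # sees column labels

# add: axis says which labels the Series is aligned against.
per_row = pd.Series([10.0, 20.0], index=df.index)
added = df.add(per_row, axis=0)  # row 0 gets +10, row 1 gets +20

print(list(rows_kept.index), list(cols_kept.columns))  # [0] ['B']
print(first_label_axis0.to_dict())  # {'A': 0, 'B': 0}
print(first_label_axis1.to_dict())  # {0: 'A', 1: 'A'}
print(added["B"].tolist())          # [13.0, 24.0]
```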
You can make arguments why you may have made different choices. But I think the developers made good choices. If something specific confuses you, ask a question.
Why is the axes for the .mean() method in pandas the opposite in this scenario?
You just need to tell mean to work across columns with axis=1:
import pandas as pd

df = pd.DataFrame({"height_1":[1.78,1.7,1.74,1.66],"height_2":[1.8,1.7,1.75,1.68],"height_3":[1.8,1.69,1.73,1.67]})
# Either average every column in each row...
df = df.assign(height_mean=df.mean(axis=1))
# ...or name the columns explicitly (same result here, and safe to re-run).
df = df.assign(height_mean=df.loc[:,['height_1','height_2','height_3']].mean(axis=1))
print(df.to_string(index=False))
Output:
height_1 height_2 height_3 height_mean
1.78 1.80 1.80 1.793333
1.70 1.70 1.69 1.696667
1.74 1.75 1.73 1.740000
1.66 1.68 1.67 1.670000
why pandas.DataFrame.sum(axis=0) returns sum of values in each column where axis =0 represent rows?
I think the right way to interpret the axis parameter is as the axis you sum 'over' (or 'across'), rather than the 'direction' the sum is computed in. Specifying axis=0 computes the sum over the rows, giving you a total for each column; axis=1 computes the sum across the columns, giving you a total for each row.
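One way to see this "summed over" reading, sketched with an invented 3x2 frame: the axis you pass is the one that disappears from the result's shape, in pandas and in the underlying NumPy array alike.

```python
import pandas as pd

# Hypothetical frame: 3 rows, 2 columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

col_totals = df.sum(axis=0)  # rows are summed over: one total per column
row_totals = df.sum(axis=1)  # columns are summed over: one total per row

arr = df.to_numpy()          # shape (3, 2)
np_axis0 = arr.sum(axis=0)   # shape (2,): axis 0 is gone
np_axis1 = arr.sum(axis=1)   # shape (3,): axis 1 is gone

print(col_totals.to_dict())  # {'a': 6, 'b': 60}
print(row_totals.tolist())   # [11, 22, 33]
print(np_axis0.shape, np_axis1.shape)  # (2,) (3,)
```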
Pandas Mean Axis Argument
You can use .mean(axis=1) on the selected columns. Adding axis=1 means the mean is applied along the horizontal axis, that is, row-wise:
import pandas as pd

a = {'A':[1],'B':[2],'C':[3],'D':[4],'E':['AE']}
df = pd.DataFrame(a)
# Average only the numeric columns of interest, one value per row.
cols = ['B','C','D']
df['Average'] = df[cols].mean(axis=1)
print(df)
Output:
A B C D E Average
0 1 2 3 4 AE 3.0