Ambiguity in Pandas Dataframe / Numpy Array axis definition
It's perhaps simplest to remember it as 0=down and 1=across.
This means:
- Use axis=0 to apply a method down each column, or to the row labels (the index).
- Use axis=1 to apply a method across each row, or to the column labels.
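A small sketch of the rule above, using a hypothetical two-column DataFrame: DataFrame.apply hands the function one Series at a time, and printing each Series' name shows which slices it receives for each axis value.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# axis=0: the function is called once per column (moving down each column),
# so each Series it receives is named after a column label
col_names = df.apply(lambda s: s.name, axis=0)

# axis=1: the function is called once per row (moving across each row),
# so each Series it receives is named after a row label from the index
row_names = df.apply(lambda s: s.name, axis=1)

print(col_names.tolist())  # ['x', 'y']
print(row_names.tolist())  # [0, 1, 2]
```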
Here's a picture to show the parts of a DataFrame that each axis refers to: [image not available]
It's also useful to remember that Pandas follows NumPy's use of the word axis. The usage is explained in NumPy's glossary of terms:
Axes are defined for arrays with more than one dimension. A 2-dimensional array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1). [my emphasis]
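The glossary definition can be checked directly on a plain NumPy array. This is a minimal sketch on a hypothetical 2 x 3 array: reducing along axis 0 collapses the rows (one result per column), and reducing along axis 1 collapses the columns (one result per row).

```python
import numpy as np

# Hypothetical 2 x 3 array: [[0, 1, 2], [3, 4, 5]]
a = np.arange(6).reshape(2, 3)

down = a.sum(axis=0)    # runs vertically down the rows: one value per column
across = a.sum(axis=1)  # runs horizontally across the columns: one value per row

print(down.tolist(), down.shape)      # [3, 5, 7] (3,)
print(across.tolist(), across.shape)  # [3, 12] (2,)
```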
So, concerning the method in the question, df.mean(axis=1) seems to be correctly defined. It takes the mean of entries horizontally across columns, that is, along each individual row. On the other hand, df.mean(axis=0) would be an operation acting vertically downwards across rows.
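To make the contrast concrete, here is a short sketch with a hypothetical 2 x 2 DataFrame: axis=0 returns one mean per column (indexed by column labels), while axis=1 returns one mean per row (indexed by the row labels).

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

per_column = df.mean(axis=0)  # vertically down the rows -> one value per column
per_row = df.mean(axis=1)     # horizontally across the columns -> one value per row

print(per_column.to_dict())  # {'a': 1.5, 'b': 3.5}
print(per_row.to_dict())     # {0: 2.0, 1: 3.0}
```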
Similarly, df.drop(name, axis=1) refers to an action on column labels, because they intuitively go across the horizontal axis. Specifying axis=0 would make the method act on rows instead.
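A quick sketch of the drop case on a hypothetical DataFrame: with axis=1 the name is looked up among the column labels, with axis=0 among the row labels. pandas also accepts the columns= and index= keywords, which avoid the axis number altogether.

```python
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

no_b = df.drop("b", axis=1)    # "b" is looked up among the column labels
no_row0 = df.drop(0, axis=0)   # 0 is looked up among the row labels (the index)

print(no_b.columns.tolist())   # ['a']
print(no_row0.index.tolist())  # [1]

# Keyword form that states the target explicitly
print(df.drop(columns="b").equals(no_b))  # True
```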
Why are the axes for the .mean() method in pandas the opposite in this scenario?
You just need to tell mean() to work across the columns with axis=1:
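import pandas as pd
df = pd.DataFrame({"height_1": [1.78, 1.7, 1.74, 1.66], "height_2": [1.8, 1.7, 1.75, 1.68], "height_3": [1.8, 1.69, 1.73, 1.67]})
# Option 1: row-wise mean over every column
df = df.assign(height_mean=df.mean(axis=1))
# Option 2: row-wise mean over the height columns only (safe to re-run once height_mean exists)
df = df.assign(height_mean=df.loc[:, ['height_1', 'height_2', 'height_3']].mean(axis=1))
print(df.to_string(index=False))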
Output:
height_1 height_2 height_3 height_mean
1.78 1.80 1.80 1.793333
1.70 1.70 1.69 1.696667
1.74 1.75 1.73 1.740000
1.66 1.68 1.67 1.670000
numpy maximum reduce error for pandas series and int
See the docs for ufunc.reduce:
ufunc.reduce(array, axis=0, dtype=None, out=None, keepdims=False, initial=<no value>, where=True)
Reduces array’s dimension by one, by applying ufunc along one axis.
[df['a'], 2] is not an array with a well-defined 0th axis, so I'm not sure how this could work. The other operations are clear element-wise max operations, which operate on each argument after broadcasting them against each other, but a NumPy ufunc reduction operates on a single array.
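As a rough sketch, assuming a hypothetical single-column DataFrame, here is the difference between the element-wise call and a reduction. For the reduction, the scalar is first expanded with np.full to an array of the same length, so the stacked input is a single 2-D array with a well-defined axis 0.

```python
import numpy as np
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"a": [1, 5, 2, 7]})

# Element-wise: the scalar 2 is broadcast against the Series
elementwise = np.maximum(df["a"], 2)

# Reduction: build one (2, 4) array first, then reduce along axis 0
stacked = np.stack([df["a"].to_numpy(), np.full(len(df), 2)])
reduced = np.maximum.reduce(stacked, axis=0)

print(elementwise.tolist())  # [2, 5, 2, 7]
print(reduced.tolist())      # [2, 5, 2, 7]
```

For this particular task the element-wise form is the natural one; the reduction only makes sense once all inputs share one array.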
Filtering byte stream efficiently before converting to numpy array / pandas dataframe
You can specify an offset for each field during dtype construction:
import numpy as np
struct_dtypes = np.dtype({'names': ['n1', 'n2'], 'formats': ['d', 'd'], 'offsets': [0, 16]})
or
struct_dtypes = np.dtype({'n1': ('d', 0), 'n2': ('d', 16)})
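A minimal sketch of how such a dtype filters a byte stream, assuming a hypothetical stream of records that each hold three float64 values (24 bytes). Only the first and third values are read; the middle 8 bytes of every record are skipped without a separate filtering pass. The DataFrame is built column by column from the named fields.

```python
import numpy as np
import pandas as pd

# Hypothetical byte stream: 3 records of three float64 values each (24 bytes per record)
raw = np.arange(9, dtype=np.float64).tobytes()

# Read the 1st and 3rd double of each record; bytes 8..15 are never read
struct_dtypes = np.dtype({'names': ['n1', 'n2'], 'formats': ['d', 'd'], 'offsets': [0, 16]})

records = np.frombuffer(raw, dtype=struct_dtypes)
df = pd.DataFrame({name: records[name] for name in records.dtype.names})

print(struct_dtypes.itemsize)  # 24
print(df["n1"].tolist())       # [0.0, 3.0, 6.0]
print(df["n2"].tolist())       # [2.0, 5.0, 8.0]
```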
Update:
If you don't read the last element in the record, you need to specify the itemsize:
struct_dtypes = np.dtype({'names': ['n1', 'ch'],
'formats': ['d', '8V'],
'offsets': [0, 8],
'itemsize': 24})
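Why itemsize matters, sketched on the same hypothetical 24-byte records: without it, NumPy sizes the record from the last declared field (16 bytes here), which no longer matches the stride of the stream. This sketch spells the void field as 'V8', 8 raw bytes.

```python
import numpy as np

# Hypothetical byte stream: 3 records of three float64 values each (24 bytes per record)
raw = np.arange(9, dtype=np.float64).tobytes()

fields = {'names': ['n1', 'ch'], 'formats': ['d', 'V8'], 'offsets': [0, 8]}
without_size = np.dtype(fields)                 # record ends after 'ch' -> 16 bytes
with_size = np.dtype({**fields, 'itemsize': 24})  # matches the real record length

records = np.frombuffer(raw, dtype=with_size)

print(without_size.itemsize, with_size.itemsize)  # 16 24
print(records['n1'].tolist())                     # [0.0, 3.0, 6.0]
```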