Dataframe Standard Deviation issue due to a single column of text
Try:
df.iloc[:, :-1].std()
In English, this means: use all rows, and all but the last column.
If you want the standard deviation per row, you will need:
df.iloc[:, :-1].std(axis=1)
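As a rough illustration of the two calls above, here is a minimal sketch on a made-up frame (the column names `a`, `b`, and `label` are hypothetical, not taken from the question). It also shows `select_dtypes` as a possible alternative when the text column is not guaranteed to be last.

```python
import pandas as pd

# Hypothetical data: two numeric columns followed by one text column.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [2.0, 4.0, 6.0],
    "label": ["x", "y", "z"],
})

numeric = df.iloc[:, :-1]        # all rows, all but the last column
col_std = numeric.std()          # one value per column
row_std = numeric.std(axis=1)    # one value per row

# Alternative that selects by dtype instead of by column position.
col_std_by_dtype = df.select_dtypes(include="number").std()

print(col_std)
print(row_std)
```

Selecting by dtype is arguably more robust if columns get reordered, while `iloc` matches the answer exactly.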
Value error when calculating standard deviation on dataframe
Why is it not working?
Because axis=1 computes std across columns (one value per row), but df_stats.distance is a Series. A Series has no columns, so an error is raised.
The std of a single column is a scalar:
print (df_stats.distance.std())
df_stats['std'] = df_stats.distance.std()
If you need to process several columns together, axis=1 computes std per row:
df_stats['std'] = df_stats[['distance','a1/a2','mean_distance']].std(axis=1)
If you need std per time period, e.g. per day:
df_stats['std'] = df_stats.groupby(pd.Grouper(freq='d')).distance.transform('std')
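The three cases above can be tried side by side. This is only a sketch on invented values: the column names follow the answer, but the data and the DatetimeIndex are assumptions (`pd.Grouper(freq=...)` needs a datetime-like index), and the uppercase alias `D` is used for the daily frequency.

```python
import pandas as pd

# Hypothetical frame with a DatetimeIndex; values are made up.
idx = pd.to_datetime([
    "2021-01-01 00:00", "2021-01-01 12:00",
    "2021-01-02 00:00", "2021-01-02 12:00",
])
df_stats = pd.DataFrame({
    "distance": [1.0, 3.0, 10.0, 14.0],
    "a1/a2": [2.0, 2.0, 2.0, 2.0],
    "mean_distance": [3.0, 1.0, 6.0, 2.0],
}, index=idx)

# A Series only has axis 0, so asking for axis=1 is rejected.
try:
    df_stats.distance.std(axis=1)
    raised = False
except ValueError:
    raised = True

# Scalar std of one column.
scalar_std = df_stats.distance.std()

# Row-wise std across several columns.
row_std = df_stats[["distance", "a1/a2", "mean_distance"]].std(axis=1)

# Per-day std, broadcast back to every row of that day.
daily_std = df_stats.groupby(pd.Grouper(freq="D")).distance.transform("std")

print(raised, scalar_std)
print(row_std)
print(daily_std)
```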
Standard deviation of dataframe?
I think there is a misunderstanding of the docs.
What pandas is deprecating is specifically the level parameter, in favor of its groupby counterpart (the link you shared). Nowhere does it say that pandas.Series.std is deprecated as a whole:
level: int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a scalar. Deprecated since version 1.3.0: The level keyword is deprecated. Use groupby instead.
and:
numeric_only: bool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series. Deprecated since version 1.5.0: Specifying numeric_only=None is deprecated. The default value will be False in a future version of pandas.
Given the line of code you propose, I see no reason for changing it. Keep using:
df['col'].std()
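For completeness, here is a small sketch of what the groupby replacement for the deprecated level keyword might look like. The MultiIndex and its level names (`group`, `item`) are invented for illustration.

```python
import pandas as pd

# Hypothetical Series with a two-level MultiIndex.
index = pd.MultiIndex.from_product(
    [["x", "y"], [1, 2, 3]], names=["group", "item"]
)
s = pd.Series([1.0, 2.0, 3.0, 10.0, 20.0, 30.0], index=index)

# Plain call: not affected by the deprecation.
overall = s.std()

# groupby form that replaces the deprecated s.std(level="group").
per_group = s.groupby(level="group").std()

print(overall)
print(per_group)
```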
Python dataframe: Standard deviation of last one year of data
It looks like you are trying to calculate a rolling standard deviation, with the rolling window consisting of the previous 252 rows.
Pandas has many .rolling() methods, including one for standard deviation:
df['Daily_SD'] = df['Interday_Close_change'].rolling(252).std().shift()
If fewer than 252 rows are available from which to calculate the standard deviation, the result for the row will be a null value (NaN). Think about whether you really want to apply the .fillna('') method to fill null values, as you are doing. That will convert the entire column from a numeric (float) data type to the object data type.
Without the .shift() method, the current row's value will be included in the calculation. The .shift() method shifts all rolling standard deviation values down by 1 row, so the current row's result will be the standard deviation of the previous 252 rows, as you want.
With pandas version >= 1.2 you can use this instead:
df['Daily_SD'] = df['Interday_Close_change'].rolling(252, closed='left').std()
The closed='left' parameter excludes the last point in the window (the current row) from the calculation.
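To check that the two spellings agree, here is a sketch with a window of 3 instead of 252 so the effect is visible on a handful of rows. The values are invented; only the column names come from the answer.

```python
import pandas as pd

# Hypothetical daily changes; a window of 3 stands in for 252.
df = pd.DataFrame({
    "Interday_Close_change": [0.1, -0.2, 0.3, 0.0, 0.5, -0.1],
})
window = 3

# Std of the previous `window` rows via rolling + shift.
shifted = df["Interday_Close_change"].rolling(window).std().shift()

# Same idea using closed='left' (fixed windows need pandas >= 1.2).
left_closed = df["Interday_Close_change"].rolling(window, closed="left").std()

# Std including the current row, for comparison.
inclusive = df["Interday_Close_change"].rolling(window).std()

print(pd.DataFrame({
    "shifted": shifted,
    "left_closed": left_closed,
    "inclusive": inclusive,
}))
```

In this sketch the first three rows of both "previous rows" variants are NaN, because a full window of earlier rows only exists from the fourth row onward.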
standard deviation on dataframe does not work
sd on data.frames has been defunct since R-3.0.0:
> ## Build a db of all R news entries.
> db <- news()
> ## sd
> news(grepl("sd", Text), db=db)
Changes in version 3.0.3:
PACKAGE INSTALLATION
o The new field SysDataCompression in the DESCRIPTION file allows
user control over the compression used for sysdata.rda objects in
the lazy-load database.
Changes in version 3.0.0:
DEPRECATED AND DEFUNCT
o mean() for data frames and sd() for data frames and matrices are
defunct.
Use sapply(x, sd) instead.
Standard deviation of lists in pandas columns
The column type shows up as object because you have lists in your cells, and a list is indeed an object in Python.
You can easily compute the standard deviation of each cell with df_final.applymap(lambda x: np.std(x)).
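A minimal sketch of the cell-wise computation, on an invented frame (`u` and `v` are hypothetical column names). Since `DataFrame.applymap` was deprecated in pandas 2.1 in favor of `DataFrame.map`, the sketch uses `Series.map` per column, which should behave the same on older and newer versions. Note that `np.std` uses the population formula (`ddof=0`) by default, unlike pandas' `.std()`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose cells hold lists of numbers.
df_final = pd.DataFrame({
    "u": [[1, 2, 3], [4, 4, 4]],
    "v": [[0, 10], [2, 4, 6, 8]],
})

print(df_final.dtypes)  # both columns are reported as object

# Apply np.std to every cell, column by column (population std, ddof=0).
cell_std = df_final.apply(lambda col: col.map(np.std))

# Sample std (ddof=1), matching the pandas default.
sample_std = df_final.apply(
    lambda col: col.map(lambda x: np.std(x, ddof=1))
)

print(cell_std)
print(sample_std)
```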
How to get standard deviation of multiple columns in R?
With the help of @shghm I found a way:
sd_list <- as.list(unname(apply(data[specific_variables], 2, sd, na.rm = TRUE)))
Pandas Standard Deviation returns NaN
You could use fillna to replace the missing values, passing in a DataFrame with the last value of each group.
In [86]: (df.groupby('Category').std()
...: .fillna(df.groupby('Category').last()))
Out[86]:
A B C D E F
Category
A 0.500200 0.791039 0.498083 0.360320 0.965992 0.537068
B 0.714371 0.636975 0.153347 0.936872 0.000649 0.692558
C 0.295330 0.638823 0.133570 0.272600 0.647285 0.737942
D 0.350996 0.276052 0.389051 0.275708 0.269005 0.074137
E 0.639271 0.486151 0.860172 0.870838 0.831571 0.404813
F 0.407883 0.180813 0.091941 0.155699 0.501884 0.271024
G 0.384157 0.858391 0.278563 0.677627 0.998458 0.829019
H 0.109465 0.085861 0.440557 0.925500 0.767791 0.626924
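The question itself is not shown here, so the following is an assumption: a common reason for NaN from a grouped std is a group with a single row, because the default sample standard deviation (ddof=1) is undefined for one observation. The sketch below reproduces that on made-up data, applies the fillna approach from the answer, and shows `ddof=0` as another option.

```python
import pandas as pd

# Hypothetical data: category B has only one row.
df = pd.DataFrame({
    "Category": ["A", "A", "B"],
    "A": [1.0, 3.0, 5.0],
})

# Sample std (ddof=1): NaN for the single-row group.
std = df.groupby("Category").std()

# The answer's approach: fill NaN with the last value of each group.
filled = std.fillna(df.groupby("Category").last())

# Population std (ddof=0) gives 0.0 for a single row instead of NaN.
pop_std = df.groupby("Category").std(ddof=0)

print(std)
print(filled)
print(pop_std)
```

Whether filling with the group's last value is meaningful depends on the use case, since the filled number is an observation rather than a spread.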
Standard deviation and mean of complete pandas dataframe
To my knowledge, there is no direct way to do it in pandas. You have two options:
- Get the underlying numpy array and calculate mean or std on it. In contrast to pandas, this will evaluate the function across all dimensions by default. For example, you can do df.values.mean() or df.to_numpy().mean() in pandas 0.24+.
- Transform the table into a single column and then run the desired operation on that column.
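Both options can be sketched on a tiny made-up frame. One detail worth keeping in mind: numpy's `std` defaults to the population formula (`ddof=0`), while pandas' `std` defaults to the sample formula (`ddof=1`), so the two options only agree on std if `ddof` is set explicitly. Using `stack()` for the single-column option is one possible choice, not the only one.

```python
import pandas as pd

# Hypothetical all-numeric frame.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})

# Option 1: operate on the underlying numpy array (all dimensions at once).
np_mean = df.to_numpy().mean()
np_std_population = df.to_numpy().std()        # ddof=0
np_std_sample = df.to_numpy().std(ddof=1)      # ddof=1, like pandas

# Option 2: collapse the table into a single column, then reduce.
stacked = df.stack()
pd_mean = stacked.mean()
pd_std = stacked.std()                         # ddof=1 by default

print(np_mean, pd_mean)
print(np_std_population, np_std_sample, pd_std)
```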