Standard Deviation on Dataframe Does Not Work

Dataframe Standard Deviation issue due to a single column of text

Try:

df.iloc[:, :-1].std()

In English, this means: use all rows, and all columns except the last one (here, the text column).

If you want a standard deviation per row instead, you need:

df.iloc[:, :-1].std(axis=1)
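Here is a minimal sketch of both calls. It assumes a small made-up frame whose last column holds text, and the column names and values are hypothetical. The select_dtypes line is an alternative that does not come from the answer. It keeps only the numeric columns, so it works even if the text column is not last.

```python
import pandas as pd

# Hypothetical data: three numeric columns followed by one text column
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [2.0, 4.0, 6.0],
    "c": [0.0, 0.0, 0.0],
    "label": ["x", "y", "z"],
})

# One standard deviation per column, skipping the last (text) column
col_sd = df.iloc[:, :-1].std()

# One standard deviation per row, across the same numeric columns
row_sd = df.iloc[:, :-1].std(axis=1)

# Position-independent alternative: keep only numeric columns by dtype
col_sd_by_dtype = df.select_dtypes("number").std()

print(col_sd)
print(row_sd)
```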

Value error when calculating standard deviation on dataframe

Why is it not working?

Because axis=1 computes the std across columns (one value per row). You are calling it on a Series, df_stats.distance, which has no columns, so an error is raised.

If you take the std of a single column, the output is a scalar:

print(df_stats.distance.std())

Assigning it to a new column repeats that scalar on every row:

df_stats['std'] = df_stats.distance.std()

If you need to process multiple columns, axis=1 computes the std per row:

df_stats['std'] = df_stats[['distance','a1/a2','mean_distance']].std(axis=1)

If you need the std per time period, e.g. per day (this requires df_stats to have a DatetimeIndex):

df_stats['std'] = df_stats.groupby(pd.Grouper(freq='d')).distance.transform('std')
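The cases above can be sketched on a small hypothetical stand-in for df_stats. The values, the contents of the extra columns, and the DatetimeIndex are all assumptions. The per-day grouping only works because pd.Grouper without key= groups on the index. The output column names std_all, std_row and std_day are illustrative and do not come from the question.

```python
import pandas as pd

# Hypothetical stand-in for df_stats with an assumed DatetimeIndex
idx = pd.to_datetime([
    "2021-01-01 00:00", "2021-01-01 08:00", "2021-01-01 16:00",
    "2021-01-02 00:00", "2021-01-02 08:00", "2021-01-02 16:00",
])
df_stats = pd.DataFrame({
    "distance": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    "a1/a2": [0.5, 1.0, 1.5, 2.0, 2.5, 3.0],
    "mean_distance": [2.0, 2.0, 2.0, 20.0, 20.0, 20.0],
}, index=idx)

# A Series only has axis 0, so asking for axis=1 raises ValueError
axis1_failed = False
try:
    df_stats.distance.std(axis=1)
except ValueError as err:
    axis1_failed = True
    print("ValueError:", err)

# Scalar std of one column, broadcast to every row
df_stats["std_all"] = df_stats.distance.std()

# One std per row across several columns
df_stats["std_row"] = df_stats[["distance", "a1/a2", "mean_distance"]].std(axis=1)

# One std per calendar day, repeated on each row of that day
df_stats["std_day"] = (
    df_stats.groupby(pd.Grouper(freq="D"))["distance"].transform("std")
)

print(df_stats)
```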

Standard deviation of dataframe?

I think there is a misunderstanding of the docs.

What pandas is deprecating is specifically the level parameter, in favor of its groupby counterpart (the link you shared). Nowhere does it say that pandas.Series.std is deprecated as a whole:

level: int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a particular level, collapsing into a scalar.

Deprecated since version 1.3.0: The level keyword is deprecated. Use groupby instead.

and:

numeric_only: bool, default None
Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.

Deprecated since version 1.5.0: Specifying numeric_only=None is deprecated. The default value will be False in a future version of pandas.

Given the line of code you propose, I see no reason to change it. Keep using:

df['col'].std()
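The sketch below illustrates the distinction. It uses a small Series with a hypothetical two-level MultiIndex. Plain .std() keeps working, and groupby(level=...) is the form the docs point to in place of the level keyword. The numeric_only=True call runs on a made-up DataFrame and shows how to avoid relying on the deprecated None default.

```python
import pandas as pd

# Hypothetical Series with a two-level MultiIndex
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1), ("b", 2)], names=["outer", "inner"]
)
s = pd.Series([1.0, 3.0, 10.0, 14.0], index=index)

# Series.std() itself is not deprecated
overall = s.std()

# groupby replacement for the deprecated s.std(level="outer")
per_outer = s.groupby(level="outer").std()

# On a DataFrame with a text column, pass numeric_only explicitly
df = pd.DataFrame({"col": [1.0, 2.0, 3.0], "name": ["x", "y", "z"]})
col_sd = df.std(numeric_only=True)

print(overall, per_outer, col_sd, sep="\n")
```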

Python dataframe: Standard deviation of last one year of data

It looks like you are trying to calculate a rolling standard deviation, with the rolling window consisting of the previous 252 rows (roughly the number of trading days in a year).

Pandas has many .rolling() methods, including one for standard deviation:

df['Daily_SD'] = df['Interday_Close_change'].rolling(252).std().shift()

If there are fewer than 252 rows available from which to calculate the standard deviation, the result for that row will be a null value (NaN). Think about whether you really want to apply .fillna('') to fill the null values, as you are doing. That converts the entire column from a numeric (float) data type to the object data type.

Without the .shift() method, the current row's value will be included in calculations. The .shift() method will shift all rolling standard deviation values down by 1 row, so the current row's result will be the standard deviation of the previous 252 rows, as you want.

With pandas version 1.2 or later, you can use this instead:

df['Daily_SD'] = df['Interday_Close_change'].rolling(252, closed='left').std()

The closed='left' parameter excludes the right endpoint of each window, i.e. the current row, from the calculation, so no .shift() is needed.
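This small sketch compares the two approaches on made-up data. It uses a window of 3 instead of 252 so the output stays readable, and the series values are hypothetical. Both versions give NaN for the first three rows. After that, each row holds the sample standard deviation of the three preceding values.

```python
import pandas as pd

# Hypothetical daily changes; a window of 3 stands in for 252
s = pd.Series([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
window = 3

# Std of the previous `window` rows (current row excluded), via shift()
via_shift = s.rolling(window).std().shift()

# Same windows via closed="left" (pandas >= 1.2)
via_closed = s.rolling(window, closed="left").std()

print(pd.DataFrame({"shift": via_shift, "closed_left": via_closed}))
```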

Standard deviation on dataframe does not work

sd() on data frames has been defunct since R 3.0.0 (the search for "sd" below also matches an unrelated entry mentioning sysdata.rda):

> ## Build a db of all R news entries.
> db <- news()
> ## sd
> news(grepl("sd", Text), db=db)
Changes in version 3.0.3:

PACKAGE INSTALLATION

o The new field SysDataCompression in the DESCRIPTION file allows
user control over the compression used for sysdata.rda objects in
the lazy-load database.

Changes in version 3.0.0:

DEPRECATED AND DEFUNCT

o mean() for data frames and sd() for data frames and matrices are
defunct.

Use sapply(x, sd) instead.

Standard deviation of lists in pandas columns

The columns show up with the object dtype because your cells contain lists. A list is an arbitrary Python object, and pandas stores such values with the object dtype.

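You can compute the standard deviation of each cell with df_final.applymap(lambda x: np.std(x)). In pandas 2.1 and later, applymap is deprecated in favor of the equivalent DataFrame.map. Note that np.std defaults to ddof=0 (population standard deviation), whereas pandas' .std() defaults to ddof=1.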
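Here is a sketch on a hypothetical frame with list-valued cells. It shows both the NumPy default (ddof=0) and the pandas-style sample standard deviation (ddof=1). The hasattr check is a convenience added here, not part of the answer. It lets the same code run before and after the applymap rename.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose cells hold Python lists, so the columns are object dtype
df_final = pd.DataFrame({
    "a": [[1, 2, 3], [2, 4, 6]],
    "b": [[0, 0], [1, 3]],
})

# applymap became DataFrame.map in pandas 2.1; use whichever exists
elementwise = df_final.map if hasattr(pd.DataFrame, "map") else df_final.applymap

pop_sd = elementwise(np.std)                          # NumPy default: ddof=0
sample_sd = elementwise(lambda x: np.std(x, ddof=1))  # same convention as pandas .std()

print(pop_sd)
print(sample_sd)
```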

How to get standard deviation of multiple columns in R?

With the help of @shghm, I found a way:

sd_list <- as.list(unname(apply(data[specific_variables], 2, sd, na.rm = TRUE)))

Pandas Standard Deviation returns NaN

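The NaN values typically come from groups that contain a single row. The sample standard deviation (ddof=1) divides by n - 1, which is 0 for a single value. You could use fillna to replace those missing values, passing in a DataFrame with the last value of each group: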

In [86]: (df.groupby('Category').std()
...: .fillna(df.groupby('Category').last()))

Out[86]:
                 A         B         C         D         E         F
Category
A         0.500200  0.791039  0.498083  0.360320  0.965992  0.537068
B         0.714371  0.636975  0.153347  0.936872  0.000649  0.692558
C         0.295330  0.638823  0.133570  0.272600  0.647285  0.737942
D         0.350996  0.276052  0.389051  0.275708  0.269005  0.074137
E         0.639271  0.486151  0.860172  0.870838  0.831571  0.404813
F         0.407883  0.180813  0.091941  0.155699  0.501884  0.271024
G         0.384157  0.858391  0.278563  0.677627  0.998458  0.829019
H         0.109465  0.085861  0.440557  0.925500  0.767791  0.626924
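Here is a minimal reproduction on made-up data, where the Category values and column A are hypothetical. The one-row group gets NaN, and fillna with the group's last value fills it as in the answer. Passing ddof=0 is a separate option, not from the answer, if a spread of 0 is preferred for single-row groups.

```python
import pandas as pd

# Hypothetical data: category "b" has only one row
df = pd.DataFrame({
    "Category": ["a", "a", "b"],
    "A": [1.0, 3.0, 5.0],
})
grouped = df.groupby("Category")

# Sample std (ddof=1) of a one-row group is NaN because n - 1 = 0
raw = grouped.std()

# The answer's approach: fill each NaN with that group's last value
filled = raw.fillna(grouped.last())

# Alternative: population std gives 0.0 for one-row groups
population = grouped.std(ddof=0)

print(raw, filled, population, sep="\n\n")
```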

Standard deviation and mean of complete pandas dataframe

To my knowledge, there is no direct way to do it in pandas. You have two options:

  1. Get the underlying NumPy array and calculate the mean or std on it. Unlike pandas, NumPy evaluates the function across all dimensions by default. For example, you can use df.values.mean(), or df.to_numpy().mean() in pandas 0.24+. Note that NumPy's std() uses ddof=0 by default, while pandas uses ddof=1.
  2. Transform the table into a single column (e.g. with df.stack() or df.melt()) and then run the desired operation on that column.
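Both options are sketched below on a small hypothetical all-numeric frame. For option 2, df.melt() is just one way to get a single column. When you compare the results, keep the ddof difference between NumPy and pandas in mind.

```python
import pandas as pd

# Hypothetical all-numeric frame
df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})

# Option 1: reduce over the whole NumPy array
arr = df.to_numpy()
mean_all = arr.mean()
std_all_np = arr.std()               # NumPy default: ddof=0
std_all_np_sample = arr.std(ddof=1)  # same convention as pandas

# Option 2: reshape to a single column, then use pandas methods
long = df.melt()["value"]
mean_all_pd = long.mean()
std_all_pd = long.std()              # pandas default: ddof=1

print(mean_all, std_all_np, std_all_np_sample, mean_all_pd, std_all_pd)
```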

