Calculating standard deviation of each row
You can use the apply and transform functions:
set.seed(007)
X <- data.frame(matrix(sample(c(10:20, NA), 100, replace=TRUE), ncol=10))
transform(X, SD=apply(X,1, sd, na.rm = TRUE))
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10       SD
1  NA 12 17 18 19 16 12 13 20  14 3.041381
2  14 12 13 13 14 18 16 17 20  10 3.020302
3  11 19 NA 12 19 19 19 20 12  20 3.865805
4  10 11 20 12 15 17 18 17 18  12 3.496029
5  12 15 NA 14 20 18 16 11 14  18 2.958040
6  19 11 10 20 13 14 17 16 10  16 3.596294
7  14 16 17 15 10 11 15 15 11  16 2.449490
8  NA 10 15 19 19 12 15 15 19  14 3.201562
9  11 NA NA 20 20 14 14 17 14  19 3.356763
10 15 13 14 15 NA 13 15 NA 15  12 1.195229
From ?apply you can see the ... argument, which passes optional arguments on to FUN; in this case you can use na.rm=TRUE to omit NA values.
Using rowSds from the matrixStats package also requires setting na.rm=TRUE to omit NA values:
library(matrixStats)
transform(X, SD=rowSds(as.matrix(X), na.rm=TRUE)) # rowSds expects a matrix; same result as before.
How do you recalculate Standard Deviation at each row in a Dataframe?
This would do the trick:
df["Standard Deviation"] = df.groupby("Client ID")["Cost"].expanding(2).std(ddof=0).reset_index()["Cost"]
   Client ID  Session  Cost  Standard Deviation
0          1        0    10                 NaN
1          1        1    11            0.500000
2          1        2    14            1.699673
3          2        0    15                 NaN
4          2        1    16            0.500000
5          2        2    14            0.816497
6          2        3    22            3.112475
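For reference, the one-liner can be reproduced end to end. This is a minimal sketch that rebuilds the sample frame from the values in the table; it assumes the frame is sorted by "Client ID" and carries a default 0..n-1 index, which is what the positional assignment relies on.

```python
import pandas as pd

# Sample data copied from the table in this answer.
df = pd.DataFrame({
    "Client ID": [1, 1, 1, 2, 2, 2, 2],
    "Session": [0, 1, 2, 0, 1, 2, 3],
    "Cost": [10, 11, 14, 15, 16, 14, 22],
})

# Expanding (cumulative) population standard deviation within each client.
# reset_index() flattens the (Client ID, original index) MultiIndex, so the
# resulting values line up with df's default 0..n-1 index on assignment.
df["Standard Deviation"] = (
    df.groupby("Client ID")["Cost"]
    .expanding(2)
    .std(ddof=0)
    .reset_index()["Cost"]
)

print(df)
```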
Explanation
You can rephrase your problem as:
Finding the cumulative standard deviation of the "Cost" column grouped by the "Client ID" column.
Pandas conveniently has built-in functions that handle both cumulative and group by computations.
Group By
A group by to compute the standard deviation looks like this:
df.groupby("Client ID")["Cost"].std()
Client ID
1 2.081666
2 3.593976
Cumulative
The cumulative standard deviation can be computed like this (note that we use ddof=0 to get the population standard deviation, which is what we want; we also use min_periods=2, otherwise the first row would have a value of 0.0 instead of NaN):
df.expanding(min_periods=2)["Cost"].std(ddof=0)
0         NaN
1    0.500000
2    1.699673
3    2.061553
4    2.315167
5    2.134375
6    3.619674
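To make the effect of the two parameters concrete, the sketch below compares ddof=0 with the pandas default ddof=1 on the same "Cost" values and checks each expanding value against NumPy on the matching prefix. The variable names (cost, pop, sample, first_row) are illustrative only.

```python
import numpy as np
import pandas as pd

# "Cost" values copied from the sample data in this answer.
cost = pd.Series([10, 11, 14, 15, 16, 14, 22], name="Cost")

# Population standard deviation (divide by n) over an expanding window.
pop = cost.expanding(min_periods=2).std(ddof=0)
# Sample standard deviation (divide by n - 1), which is the pandas default.
sample = cost.expanding(min_periods=2).std()

# Each expanding value equals the standard deviation of the rows seen so far.
for i in range(1, len(cost)):
    prefix = cost.iloc[: i + 1].to_numpy()
    assert np.isclose(pop.iloc[i], np.std(prefix, ddof=0))
    assert np.isclose(sample.iloc[i], np.std(prefix, ddof=1))

# With min_periods=1 and ddof=0, the first row is 0.0 rather than NaN.
first_row = cost.expanding(min_periods=1).std(ddof=0).iloc[0]

print(pd.DataFrame({"ddof=0": pop, "ddof=1": sample}))
print(first_row)
```

The ddof=1 column is what you get if the argument is left out, so it is worth checking which variant a given output was produced with.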
Group By + Cumulative
Combining the two, we get our result (note, we need to reset the index to lose the group by indexing and use the original index):
df.groupby("Client ID")["Cost"].expanding(2).std(ddof=0).reset_index()["Cost"]
0 NaN
1 0.500000
2 1.699673
3 NaN
4 0.500000
5 0.816497
6 3.112475
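The reset_index()["Cost"] step relies on the frame being sorted by "Client ID" and having a default 0..n-1 index. When that is not guaranteed, one possible variant (not taken from the original answer) is to drop only the group level, so the result keeps the original row labels and aligns correctly on assignment. The interleaved row order below is a made-up arrangement of the same sample values.

```python
import pandas as pd

# Same sample costs, but with rows interleaved across clients (hypothetical order).
df = pd.DataFrame({
    "Client ID": [1, 2, 1, 2, 1, 2, 2],
    "Cost": [10, 15, 11, 16, 14, 14, 22],
})

# Drop only the "Client ID" level; what remains is the original row label,
# so assignment aligns each value with the row it was computed for.
df["Standard Deviation"] = (
    df.groupby("Client ID")["Cost"]
    .expanding(2)
    .std(ddof=0)
    .reset_index(level=0, drop=True)
)

print(df)
```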
Calculating standard deviation across rows
Try this, using rowSds from the matrixStats package:
library(dplyr)
library(matrixStats)
columns <- c('colB', 'colC', 'colD')
df %>%
  mutate(Mean = rowMeans(.[columns]), stdev = rowSds(as.matrix(.[columns])))
Returns
   colA colB colC colD     Mean    stdev
1 SampA   21   15   10 15.33333 5.507571
2 SampB   20   14   22 18.66667 4.163332
3 SampC   30   12   18 20.00000 9.165151
Your data
colA <- c("SampA", "SampB", "SampC")
colB <- c(21, 20, 30)
colC <- c(15, 14, 12)
colD <- c(10, 22, 18)
df <- data.frame(colA, colB, colC, colD)
df
How to calculate standard deviation per row?
apply lets you apply a function to all rows of your data:
apply(values_for_all, 1, sd, na.rm = TRUE)
To compute the standard deviation for each column instead, replace the 1 with 2.
How to calculate standard deviation with pandas for each row?
You can use .std(axis=1) [pandas-doc] instead. This returns a Series whose index matches the index of your dataframe and whose values are the standard deviations of the two values in the corresponding columns of each row:
>>> df.std(axis=1)
0 1.414214
1 2.687006
2 1.626346
3 1.223295
4 1.025305
5 1.732412
6 1.965757
dtype: float64
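As a small illustration of the axis argument, the sketch below uses a made-up two-column frame (the column names "a" and "b" and their values are hypothetical, not the data from the question). Note that DataFrame.std defaults to ddof=1 and skips NaN values, so a row with only one non-missing value gives NaN under the default.

```python
import numpy as np
import pandas as pd

# Hypothetical data; not the frame from the question.
df = pd.DataFrame({
    "a": [1.0, 4.0, 2.0],
    "b": [3.0, 4.0, np.nan],
})

# axis=1 computes one value per row, across the columns.
row_sd = df.std(axis=1)              # sample standard deviation (ddof=1)
row_sd_pop = df.std(axis=1, ddof=0)  # population standard deviation

print(row_sd)
print(row_sd_pop)
```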