Calculating Standard Deviation of Each Row

You can use the apply and transform functions:

set.seed(007)
X <- data.frame(matrix(sample(c(10:20, NA), 100, replace=TRUE), ncol=10))
transform(X, SD=apply(X,1, sd, na.rm = TRUE))
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10       SD
1  NA 12 17 18 19 16 12 13 20  14 3.041381
2  14 12 13 13 14 18 16 17 20  10 3.020302
3  11 19 NA 12 19 19 19 20 12  20 3.865805
4  10 11 20 12 15 17 18 17 18  12 3.496029
5  12 15 NA 14 20 18 16 11 14  18 2.958040
6  19 11 10 20 13 14 17 16 10  16 3.596294
7  14 16 17 15 10 11 15 15 11  16 2.449490
8  NA 10 15 19 19 12 15 15 19  14 3.201562
9  11 NA NA 20 20 14 14 17 14  19 3.356763
10 15 13 14 15 NA 13 15 NA 15  12 1.195229

From ?apply you can see that apply accepts a ... argument, which passes optional arguments on to FUN. In this case, na.rm=TRUE tells sd to omit NA values.

rowSds from the matrixStats package also needs na.rm=TRUE to omit NA values. It expects a matrix, so convert the data frame with as.matrix first:

library(matrixStats)
transform(X, SD=rowSds(as.matrix(X), na.rm=TRUE)) # same result as before.
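
For readers working in Python, a rough analog of apply(X, 1, sd, na.rm = TRUE) is sketched below. It assumes NumPy is available and hard-codes rows 1 and 10 of the table above rather than reproducing R's random sample. np.nanstd skips NaN values, and ddof=1 is needed to match R's sd, which divides by n - 1.

```python
import numpy as np

# Rows 1 and 10 of the table above, with NA written as np.nan.
rows = np.array([
    [np.nan, 12, 17, 18, 19, 16, 12, 13, 20, 14],
    [15, 13, 14, 15, np.nan, 13, 15, np.nan, 15, 12],
])

# axis=1 computes one value per row; ddof=1 gives the sample standard
# deviation (divide by n - 1), like R's sd().
row_sd = np.nanstd(rows, axis=1, ddof=1)
print(row_sd)
```

The two printed values should agree with the SD column for rows 1 and 10 above.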

How do you recalculate Standard Deviation at each row in a Dataframe?

This would do the trick:

df["Standard Deviation"] = df.groupby("Client ID")["Cost"].expanding(2).std(ddof=0).reset_index(level=0, drop=True)

   Client ID  Session  Cost  Standard Deviation
0          1        0    10                 NaN
1          1        1    11            0.500000
2          1        2    14            1.699673
3          2        0    15                 NaN
4          2        1    16            0.500000
5          2        2    14            0.816497
6          2        3    22            3.112475
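
To try this end to end, here is a minimal self-contained sketch. The data frame is rebuilt from the table above. Dropping only the group level of the index (rather than resetting the whole index) is a choice made here so that the values align with the original rows by label.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "Client ID": [1, 1, 1, 2, 2, 2, 2],
    "Session": [0, 1, 2, 0, 1, 2, 3],
    "Cost": [10, 11, 14, 15, 16, 14, 22],
})

# Expanding (cumulative) population standard deviation within each client.
# The result carries a (Client ID, original row label) MultiIndex.
expanding_sd = (
    df.groupby("Client ID")["Cost"]
    .expanding(min_periods=2)
    .std(ddof=0)
)

# Drop the group level so the values align with df's original index.
df["Standard Deviation"] = expanding_sd.reset_index(level=0, drop=True)
print(df)
```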

Explanation

You can rephrase your problem as:

Finding the cumulative standard deviation of the "Cost" column grouped by the "Client ID" column.

Pandas conveniently has built-in functions that handle both cumulative and group by computations.

Group By

A group by to compute the standard deviation looks like this:

df.groupby("Client ID")["Cost"].std()

Client ID
1    2.081666
2    3.593976

Cumulative

The cumulative standard deviation can be computed as shown below. We use ddof=0 to get the population standard deviation (divide by n), which is what we want here, and min_periods=2 so that the first row is NaN instead of 0.0:

df.expanding(min_periods=2)["Cost"].std(ddof=0)

0         NaN
1    0.500000
2    1.699673
3    2.061553
4    2.315167
5    2.134375
6    3.619674
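
The effect of ddof and min_periods is easy to check on the same costs. The sketch below assumes pandas and NumPy and uses a standalone Series (named cost here for illustration) instead of the full data frame. ddof=0 divides by n, ddof=1 divides by n - 1, so the two give different values for the same data.

```python
import numpy as np
import pandas as pd

cost = pd.Series([10, 11, 14, 15, 16, 14, 22])

# The first two costs under both conventions.
print(np.std([10, 11], ddof=0))  # population SD, divide by n
print(np.std([10, 11], ddof=1))  # sample SD, divide by n - 1

# Expanding versions of both conventions.
pop_sd = cost.expanding(min_periods=2).std(ddof=0)
sample_sd = cost.expanding(min_periods=2).std(ddof=1)

# With min_periods=1 and ddof=0, the first row is 0.0 rather than NaN.
first_row = cost.expanding(min_periods=1).std(ddof=0).iloc[0]

print(pd.DataFrame({"ddof=0": pop_sd, "ddof=1": sample_sd}))
print(first_row)
```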

Group By + Cumulative

Combining the two gives our result. The grouped result is indexed by (Client ID, original row), so we drop the Client ID level to get back the original index:

df.groupby("Client ID")["Cost"].expanding(2).std(ddof=0).reset_index(level=0, drop=True)

0         NaN
1    0.500000
2    1.699673
3         NaN
4    0.500000
5    0.816497
6    3.112475
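
Index handling matters once the rows of different clients are interleaved. The sketch below uses a hypothetical reordered data set (not the one from the question) and drops only the group level of the index, so each value is assigned back to the row it was computed from.

```python
import pandas as pd

# Hypothetical data with the two clients' rows interleaved.
df = pd.DataFrame({
    "Client ID": [1, 2, 1, 2, 1],
    "Cost": [10, 15, 11, 19, 14],
})

grouped_sd = (
    df.groupby("Client ID")["Cost"]
    .expanding(min_periods=2)
    .std(ddof=0)
)

# grouped_sd is ordered by client, then by original row label.
# Dropping the "Client ID" level leaves the original labels, so the
# assignment aligns each value with its source row.
df["Standard Deviation"] = grouped_sd.reset_index(level=0, drop=True)
print(df)
```

Here rows 0 and 1 are NaN (the first observation of each client), and the later rows hold each client's own running value.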

Calculating standard deviation across rows

Try this, using dplyr with rowSds from the matrixStats package:

library(dplyr)
library(matrixStats)

columns <- c('colB', 'colC', 'colD')

df %>% 
  mutate(Mean= rowMeans(.[columns]), stdev=rowSds(as.matrix(.[columns])))

Returns

   colA colB colC colD     Mean    stdev
1 SampA   21   15   10 15.33333 5.507571
2 SampB   20   14   22 18.66667 4.163332
3 SampC   30   12   18 20.00000 9.165151

Your data

colA <- c("SampA", "SampB", "SampC")
colB <- c(21, 20, 30)
colC <- c(15, 14, 12)
colD <- c(10, 22, 18)
df <- data.frame(colA, colB, colC, colD)
df
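
For comparison, the same row-wise mean and standard deviation can be sketched in pandas. This is an illustrative translation, not part of the original answer. It assumes pandas is installed and reuses the column names from the data above.

```python
import pandas as pd

df = pd.DataFrame({
    "colA": ["SampA", "SampB", "SampC"],
    "colB": [21, 20, 30],
    "colC": [15, 14, 12],
    "colD": [10, 22, 18],
})

columns = ["colB", "colC", "colD"]

# axis=1 aggregates across the selected columns, one value per row.
# pandas' std defaults to ddof=1, the same n - 1 divisor used by rowSds.
df["Mean"] = df[columns].mean(axis=1)
df["stdev"] = df[columns].std(axis=1)
print(df)
```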

How to calculate standard deviation per row?

apply lets you apply a function to all rows of your data:

apply(values_for_all, 1, sd, na.rm = TRUE)

To compute the standard deviation for each column instead, replace the 1 with 2.

How to calculate standard deviation with pandas for each row?

You can use .std(axis=1) instead. This returns a Series that shares the index of your dataframe and whose values are the standard deviation of the two values in the corresponding columns:

>>> df.std(axis=1)
0    1.414214
1    2.687006
2    1.626346
3    1.223295
4    1.025305
5    1.732412
6    1.965757
dtype: float64
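
A small self-contained sketch of the same call follows. The frame and its column names a and b are hypothetical, since the original data is not shown here. Note that pandas' std defaults to ddof=1 (sample standard deviation); pass ddof=0 for the population version.

```python
import math
import pandas as pd

# Hypothetical two-column frame used only for illustration.
df = pd.DataFrame({"a": [2.0, 5.0, 1.0], "b": [4.0, 2.0, 1.0]})

sample_sd = df.std(axis=1)               # default ddof=1, divide by n - 1
population_sd = df.std(axis=1, ddof=0)   # divide by n

# For exactly two values, the sample SD reduces to |a - b| / sqrt(2).
shortcut = (df["a"] - df["b"]).abs() / math.sqrt(2)

print(pd.DataFrame({
    "sample": sample_sd,
    "population": population_sd,
    "shortcut": shortcut,
}))
```

With only two values per row, the sample and population results differ by a factor of sqrt(2), so it is worth being explicit about which one is wanted.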
