How to Determine the Standard Deviation (Stddev) of a Set of Values

How do I determine the standard deviation (stddev) of a set of values?

While the sum of squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You basically may end up with a negative variance...
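To make the failure concrete, here is a minimal Java sketch (the class name, method names, and sample values are illustrative and not part of the original answer). It compares the one-pass sum-of-squares formula with a two-pass computation on four values offset by 10^9, whose exact sample variance is 30. The squares are near 10^18, where adjacent doubles are 128 apart, so the one-pass result cannot come out as 30 and may even be negative.

```java
// Illustrative sketch: class and method names are hypothetical.
final class VarianceCancellationDemo {

    // One-pass "sum of squares" sample variance: (sum(x^2) - sum(x)^2 / n) / (n - 1).
    static double sumOfSquaresVariance(double[] xs) {
        double sum = 0.0;
        double sumSq = 0.0;
        for (double x : xs) {
            sum += x;
            sumSq += x * x;
        }
        int n = xs.length;
        return (sumSq - sum * sum / n) / (n - 1);
    }

    // Two-pass sample variance: compute the mean first, then sum squared deviations.
    static double twoPassVariance(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        double mean = sum / xs.length;
        double squaredDeviations = 0.0;
        for (double x : xs) {
            double d = x - mean;
            squaredDeviations += d * d;
        }
        return squaredDeviations / (xs.length - 1);
    }

    public static void main(String[] args) {
        // Assumed sample data: a small spread around a very large offset.
        double[] xs = {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16};
        System.out.println("sum of squares: " + sumOfSquaresVariance(xs));
        System.out.println("two-pass:       " + twoPassVariance(xs));
    }
}
```

The two-pass version subtracts the mean before squaring, so it works with small numbers and avoids the cancellation between two huge, nearly equal sums.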

Plus, never, ever, ever compute a^2 as pow(a,2); a * a is almost certainly faster.

By far the best way of computing a standard deviation is Welford's method. My C# is very rusty, but it could look something like:

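public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0; // running mean
    double S = 0.0; // running sum of squared deviations from the mean
    int k = 1;      // 1-based index of the value being processed
    foreach (double value in valueList)
    {
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
        k++;
    }
    // After the loop k == n + 1, so k - 2 == n - 1 (sample standard deviation).
    return Math.Sqrt(S / (k - 2));
}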

If you have the whole population (as opposed to a sample population), then use return Math.Sqrt(S / (k-1));.
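For readers working in Java, the same running update can be sketched as a small accumulator that exposes both divisors. This is an illustrative port, not the original author's code; the class and method names are hypothetical, and returning NaN for too few values is an assumption of this sketch.

```java
// Illustrative Java port of the running-mean update; names are hypothetical.
final class RunningStdDev {
    private long count = 0;
    private double mean = 0.0;
    private double sumSqDev = 0.0; // sum of squared deviations from the current mean

    void add(double value) {
        count++;
        double delta = value - mean;        // deviation from the old mean
        mean += delta / count;
        sumSqDev += delta * (value - mean); // old deviation times new deviation
    }

    double mean() {
        return mean;
    }

    // Sample standard deviation (divides by n - 1); NaN with fewer than two values.
    double sampleStdDev() {
        return count > 1 ? Math.sqrt(sumSqDev / (count - 1)) : Double.NaN;
    }

    // Population standard deviation (divides by n); NaN when empty.
    double populationStdDev() {
        return count > 0 ? Math.sqrt(sumSqDev / count) : Double.NaN;
    }

    public static void main(String[] args) {
        RunningStdDev stats = new RunningStdDev();
        for (double v : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            stats.add(v);
        }
        System.out.println("sample:     " + stats.sampleStdDev());
        System.out.println("population: " + stats.populationStdDev());
    }
}
```

Keeping an explicit count instead of the k - 2 / k - 1 arithmetic makes the choice between sample and population divisors easier to read.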

EDIT: I've updated the code according to Jason's remarks...

EDIT: I've also updated the code according to Alex's remarks...

Finding the standard deviation from a list of numbers (user input)

Sure - this will do it.

package statistics;

/**
* Statistics
* @author Michael
* @link http://stackoverflow.com/questions/11978667/online-algorithm-for-calculating-standrd-deviation/11978689#11978689
* @link http://mathworld.wolfram.com/Variance.html
* @since 8/15/12 7:34 PM
*/
public class Statistics {

    private int n;
    private double sum;
    private double sumsq;

    public synchronized void reset() {
        this.n = 0;
        this.sum = 0.0;
        this.sumsq = 0.0;
    }

    public synchronized void addValue(double x) {
        ++this.n;
        this.sum += x;
        this.sumsq += x*x;
    }

    public synchronized double calculateMean() {
        double mean = 0.0;
        if (this.n > 0) {
            mean = this.sum/this.n;
        }
        return mean;
    }

    // Population variance (divides by n); no square root belongs here.
    public synchronized double calculateVariance() {
        double variance = 0.0;
        if (this.n > 0) {
            variance = (this.sumsq-this.sum*this.sum/this.n)/this.n;
        }
        return variance;
    }

    // Sample standard deviation (divides by n-1).
    public synchronized double calculateStandardDeviation() {
        double deviation = 0.0;
        if (this.n > 1) {
            deviation = Math.sqrt((this.sumsq-this.sum*this.sum/this.n)/(this.n-1));
        }
        return deviation;
    }
}

Is standard deviation (STDDEV) the right function for the job?

Standard deviation is just a way of characterizing how much a set of values spreads away from its average (i.e. mean). In a sense, it's an "average deviation from average", though a little more complicated than that. It is true that values which differ from the mean by many times the standard deviation tend to be rare, but that doesn't mean the standard deviation is a good benchmark for identifying anomalous values that might indicate something is wrong.

For one thing, if you set your acceptable range at the average plus or minus one standard deviation, you're probably going to get very frequent results outside that range! You could use the average plus or minus two standard deviations, or three, or however many it takes to bring the number of notifications/error conditions down as far as you want, but there's no telling whether any of this actually helps you identify error conditions.
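As a rough illustration of how often readings land outside such a range, the sketch below counts the fraction of values farther than k standard deviations from the mean. The class name, method name, and sample readings are hypothetical. For normally distributed data the expected fractions are roughly 32% for k = 1, 5% for k = 2, and 0.3% for k = 3; real readings need not follow that pattern.

```java
// Illustrative sketch: names and sample readings are hypothetical.
final class ThresholdCheck {

    // Fraction of readings farther than k population standard deviations from the mean.
    static double fractionOutside(double[] readings, double k) {
        double sum = 0.0;
        for (double r : readings) {
            sum += r;
        }
        double mean = sum / readings.length;

        double squaredDeviations = 0.0;
        for (double r : readings) {
            squaredDeviations += (r - mean) * (r - mean);
        }
        double stdDev = Math.sqrt(squaredDeviations / readings.length);

        int outside = 0;
        for (double r : readings) {
            if (Math.abs(r - mean) > k * stdDev) {
                outside++;
            }
        }
        return (double) outside / readings.length;
    }

    public static void main(String[] args) {
        double[] readings = {2, 4, 4, 4, 5, 5, 7, 9}; // mean 5, population stddev 2
        for (double k : new double[] {1, 2, 3}) {
            System.out.println("k = " + k + ": " + fractionOutside(readings, k));
        }
    }
}
```

With these eight readings, a quarter of the values already fall outside the one-standard-deviation band even though nothing is wrong with them.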

I think your main problem is not statistics. Your problem is that you don't know what kinds of results actually indicate an error. So before you program in any acceptable range, just let the system run for a while and collect some calibration data showing what kinds of values you see when it's running normally, and what kinds of values you see when it's not running normally. Make sure you have some way to tell which are which. Once you have a good amount of data for both conditions, you can analyze it (start with a simple histogram) and see what kinds of values are characteristic of normal operation and what kinds are characteristic of error conditions. Then you can set your acceptable range based on that.

If you want to get fancy, there is a statistical technique called likelihood ratio testing that can help you evaluate just how likely it is that your system is working properly. But I think it's probably overkill. Monitoring systems don't need to be super-precise about this stuff; just show a cautionary notice whenever the readings start to seem abnormal.
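For the curious, one simple form of a likelihood ratio can be sketched as follows. It assumes, purely for illustration, that the calibration data for each state is summarized by a Gaussian with its own mean and standard deviation; the class name, method names, and the numbers in main are hypothetical.

```java
// Illustrative sketch: assumes readings are roughly Gaussian in each state.
final class LikelihoodRatioSketch {

    // Natural log of the normal density with the given mean and standard deviation.
    static double logNormalPdf(double x, double mean, double stdDev) {
        double z = (x - mean) / stdDev;
        return -0.5 * Math.log(2.0 * Math.PI) - Math.log(stdDev) - 0.5 * z * z;
    }

    // log( p(x | normal operation) / p(x | error) ); positive values favor "normal".
    static double logLikelihoodRatio(double x,
                                     double normalMean, double normalStdDev,
                                     double errorMean, double errorStdDev) {
        return logNormalPdf(x, normalMean, normalStdDev)
             - logNormalPdf(x, errorMean, errorStdDev);
    }

    public static void main(String[] args) {
        // Hypothetical calibration summary: normal ~ N(10, 2), error ~ N(20, 2).
        for (double x : new double[] {10, 15, 20}) {
            System.out.println(x + " -> " + logLikelihoodRatio(x, 10, 2, 20, 2));
        }
    }
}
```

A reading halfway between the two means gives a ratio of zero, meaning it is equally consistent with both states; where to put the alert threshold is still a judgment call.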

Finding standard deviation/variance in data values

Given your code above, if you want the population variance and standard deviation, you could do:

// ... code from above
double averageMaximumX = query.Average(t => double.Parse(t.XMax));
// Population variance: mean of the squared deviations from the average.
double varianceMaximumX = query.Sum(t =>
    Math.Pow(double.Parse(t.XMax) - averageMaximumX, 2)) / query.Count();

// Take the square root only after dividing by the count.
double stdDevMaximumX = Math.Sqrt(varianceMaximumX);

Calculate standard deviation and average for an item from the dataset in excel

Use

=STDEV(IF($A$2:$A$19=G4,$C$2:$C$19))

And

=AVERAGE(IF($A$2:$A$19=G4,$C$2:$C$19))

Enter each as an array formula using Ctrl + Shift + Enter, then drag down to cover the number of names.

Names are in G4:G6. The formulas go in columns H and I, rows 4 to 6. Data is in A2:C19.

Wrap in IFERROR if you wish to suppress any errors e.g.

=IFERROR(AVERAGE(IF($A$2:$A$19=G4,$C$2:$C$19)),"")
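For readers more comfortable with code than array formulas, the calculation the two formulas perform can be sketched in Java. The class name, method names, and sample rows are hypothetical; the sketch assumes one label column and one value column, filters the values by label, and then applies the same statistics (STDEV is the sample standard deviation, with an n - 1 denominator).

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: names and sample rows are hypothetical.
final class GroupStats {

    // Values whose label equals `name`, mirroring IF($A$2:$A$19=G4,$C$2:$C$19).
    static List<Double> matching(String[] labels, double[] values, String name) {
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            if (labels[i].equals(name)) {
                out.add(values[i]);
            }
        }
        return out;
    }

    // Arithmetic mean, as AVERAGE computes.
    static double average(List<Double> xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.size();
    }

    // Sample standard deviation (n - 1 denominator), as STDEV computes.
    static double sampleStdDev(List<Double> xs) {
        double mean = average(xs);
        double squaredDeviations = 0.0;
        for (double x : xs) {
            squaredDeviations += (x - mean) * (x - mean);
        }
        return Math.sqrt(squaredDeviations / (xs.size() - 1));
    }

    public static void main(String[] args) {
        String[] labels = {"Ann", "Bob", "Ann", "Bob", "Ann"};
        double[] values = {1, 10, 2, 20, 3};
        List<Double> ann = matching(labels, values, "Ann");
        System.out.println(average(ann) + " " + sampleStdDev(ann));
    }
}
```

A name with fewer than two matching rows yields NaN here, which corresponds to the error that IFERROR suppresses in the spreadsheet.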

