Sql - Stdevp or Stdev and How to Use It

SQL - STDEVP or STDEV and how to use it?

To use it, simply:

SELECT STDEVP(OriginalValue)
FROM yourTable

From below, you probably want STDEVP.

From here:

STDEV is used when the group of numbers being evaluated are only a partial sampling of the whole population. The denominator for dividing the sum of squared deviations is N-1, where N is the number of observations ( a count of items in the data set ). Technically, subtracting the 1 is referred to as "non-biased."

STDEVP is used when the group of numbers being evaluated is complete - it's the entire population of values. In this case, the 1 is NOT subtracted and the denominator for dividing the sum of squared deviations is simply N itself, the number of observations ( a count of items in the data set ). Technically, this is referred to as "biased." Remembering that the P in STDEVP stands for "population" may be helpful. Since the data set is not a mere sample, but constituted of ALL the actual values, this standard deviation function can return a more precise result.

SQL STDEVP as analytical function (calculate standard deviation per account)

Use aggregation, not window functions:

SELECT pa.AccountID, STDEVP(pa.amountPerday) AS AmountSTDDev,
AVG(pa.amountPerday) as AmountAvg
FROM prevActivity pa
GROUP BY pa.AccountID;

I threw in the average as well, because the two often go together.

Also note that you are using CONVERT(VARCHAR(10), T.TransactionDate, 101) to extract the date. The logic CONVERT(DATE, T.TransactionDate) is not only simpler, but the intent is also much clearer.

Understanding T-SQL stdev, stdevp, var, and varp

In statistics Standard Deviation and Variance are measures of how much a metric in a population deviate from the mean (usually the average.)
The Standard Deviation is defined as the square root of the Variance and the Variance is defined as the average of the squared difference from the mean, i.e.:

For a population of size n: x1, x2, ..., xn
with mean: xmean

Stdevp = sqrt( ((x1-xmean)^2 + (x2-xmean)^2 + ... + (xn-xmean)^2)/n )

When values for the whole population are not available (most of the time) it is customary to apply Bessel's correction to get a better estimate of the actual standard deviation for the whole population. Bessel's correction is merely dividing by n-1 instead of by n when computing the variance, i.e:

Stdev = sqrt( ((x1-xmean)^2 + (x2-xmean)^2 + ... + (xn-xmean)^2)/(n-1) )

Note that for large enough data-sets it won't really matter which function is used.

You can verify my answer by running the following T-SQL script:

-- temporary data set with values 2, 3, 4
declare @t table([val] int);

insert into @t values
(2),(3),(4);

select avg(val) as [avg], -- equals to 3.0
-- Estimation of the population standard devisation using a sample and Bessel's Correction:
-- ((x1 - xmean)^2 + (x2 - xmean)^2 + ... + (xn-xmean)^2)/(n-1)
stdev(val) as [stdev],
sqrt( (square(2-3.0) + square(3-3) + square(4-3))/2) as [stdev calculated], -- calculated with value 2, 3, 4

-- Population standard deviation:
-- ((x1 - xmean)^2 + (x2 - xmean)^2 + ... + (xn-xmean)^2)/n
stdevp(val) as [stdevp],
sqrt( (square(2-3.0) + square(3-3) + square(4-3))/3) as [stdevp calculated] -- calculated with value 2, 3, 4
from @t;

Further reading wikipedia articles for: standard deviation and Bessel's Correction.

Algorithm of STDEV and STDEVP in SQL Server

STDEVP is Population Standard Deviation, STDEV is Sample Standard Deviation. Knowing this, take a look at this explanation: https://math.stackexchange.com/questions/15098/sample-standard-deviation-vs-population-standard-deviation

STDEVP for calculated fields

Your query doesn't match the sample data.

I can see the problem, though. The SUM() are calculating a single value for each group, and then you are taking the standard deviation of that value. Because you cannot nest aggregation functions, you have turned it into a window function.

Get rid of the sum(). The following should work in SQL Server:

SELECT ID, CHANNEL, VENDOR, 
STDEVP(COALESCE(SALES.A, 0) + COALESCE(QSALES.B, 0)) as STDEV_SALES
FROM SALES . . .
QSALES
GROUP BY ID, CHANNEL, VENDOR;

I would also return the COUNT(*) . . . the standard deviation doesn't make sense if you have fewer than 3 rows. (Okay, it is defined for two values, but not very useful.)



Related Topics



Leave a reply



Submit