SQL - STDEVP or STDEV and how to use it?
To use it, simply:
SELECT STDEVP(OriginalValue)
FROM yourTable
From below, you probably want STDEVP
.
From here:
STDEV is used when the group of numbers being evaluated are only a partial sampling of the whole population. The denominator for dividing the sum of squared deviations is N-1, where N is the number of observations ( a count of items in the data set ). Technically, subtracting the 1 is referred to as "non-biased."
STDEVP is used when the group of numbers being evaluated is complete - it's the entire population of values. In this case, the 1 is NOT subtracted and the denominator for dividing the sum of squared deviations is simply N itself, the number of observations ( a count of items in the data set ). Technically, this is referred to as "biased." Remembering that the P in STDEVP stands for "population" may be helpful. Since the data set is not a mere sample, but constituted of ALL the actual values, this standard deviation function can return a more precise result.
SQL STDEVP as analytical function (calculate standard deviation per account)
Use aggregation, not window functions:
SELECT pa.AccountID, STDEVP(pa.amountPerday) AS AmountSTDDev,
AVG(pa.amountPerday) as AmountAvg
FROM prevActivity pa
GROUP BY pa.AccountID;
I threw in the average as well, because the two often go together.
Also note that you are using CONVERT(VARCHAR(10), T.TransactionDate, 101)
to extract the date. The logic CONVERT(DATE, T.TransactionDate)
is not only simpler, but the intent is also much clearer.
Understanding T-SQL stdev, stdevp, var, and varp
In statistics Standard Deviation and Variance are measures of how much a metric in a population deviate from the mean (usually the average.)
The Standard Deviation is defined as the square root of the Variance and the Variance is defined as the average of the squared difference from the mean, i.e.:
For a population of size n: x1, x2, ..., xn
with mean: xmean
Stdevp = sqrt( ((x1-xmean)^2 + (x2-xmean)^2 + ... + (xn-xmean)^2)/n )
When values for the whole population are not available (most of the time) it is customary to apply Bessel's correction to get a better estimate of the actual standard deviation for the whole population. Bessel's correction is merely dividing by n-1 instead of by n when computing the variance, i.e:
Stdev = sqrt( ((x1-xmean)^2 + (x2-xmean)^2 + ... + (xn-xmean)^2)/(n-1) )
Note that for large enough data-sets it won't really matter which function is used.
You can verify my answer by running the following T-SQL script:
-- temporary data set with values 2, 3, 4
declare @t table([val] int);
insert into @t values
(2),(3),(4);
select avg(val) as [avg], -- equals to 3.0
-- Estimation of the population standard devisation using a sample and Bessel's Correction:
-- ((x1 - xmean)^2 + (x2 - xmean)^2 + ... + (xn-xmean)^2)/(n-1)
stdev(val) as [stdev],
sqrt( (square(2-3.0) + square(3-3) + square(4-3))/2) as [stdev calculated], -- calculated with value 2, 3, 4
-- Population standard deviation:
-- ((x1 - xmean)^2 + (x2 - xmean)^2 + ... + (xn-xmean)^2)/n
stdevp(val) as [stdevp],
sqrt( (square(2-3.0) + square(3-3) + square(4-3))/3) as [stdevp calculated] -- calculated with value 2, 3, 4
from @t;
Further reading wikipedia articles for: standard deviation and Bessel's Correction.
Algorithm of STDEV and STDEVP in SQL Server
STDEVP is Population Standard Deviation, STDEV is Sample Standard Deviation. Knowing this, take a look at this explanation: https://math.stackexchange.com/questions/15098/sample-standard-deviation-vs-population-standard-deviation
STDEVP for calculated fields
Your query doesn't match the sample data.
I can see the problem, though. The SUM()
are calculating a single value for each group, and then you are taking the standard deviation of that value. Because you cannot nest aggregation functions, you have turned it into a window function.
Get rid of the sum()
. The following should work in SQL Server:
SELECT ID, CHANNEL, VENDOR,
STDEVP(COALESCE(SALES.A, 0) + COALESCE(QSALES.B, 0)) as STDEV_SALES
FROM SALES . . .
QSALES
GROUP BY ID, CHANNEL, VENDOR;
I would also return the COUNT(*)
. . . the standard deviation doesn't make sense if you have fewer than 3 rows. (Okay, it is defined for two values, but not very useful.)
Related Topics
Oracle SQL "Select Date from Datetime Field "
Freetds and Unixodbc Character Converting
Exporting Binary File Data (Images) from SQL via a Stored Procedure
Freetds - Tsql Connects, Isql Fails
How to Avoid SQL Query Timeout
Get The Difference Between Two Dates Both in Months and Days in Sql
Solution for Speeding Up a Slow Select Distinct Query in Postgres
Can an SQL Procedure Return a Table
What Is The T-Sql Equivalent of MySQL Syntax Limit X, Y
Create View Using Linked Server Db in SQL Server
Is Limit Clause in Hive Really Random
Return SQL Rows Where Field Contains Only Non-Alphanumeric Characters
How to Get the Sum of All Column Values in the Last Row of a Resultset
Can You Replace or Update a SQL Constraint
Selecting Multiple Rows by Id, Is There a Faster Way Than Where In
Create a Sqlite View Where a Row Depends on The Previous Row