Quartile/Percentile in MS Access via SQL with a GROUP BY when some values can be NULL
Just a missing WHERE clause near the bottom:
SELECT T.IU,
       0.75 * (SELECT Max(GM) FROM tblFirst250
               WHERE tblFirst250.GM IN (SELECT TOP 25 PERCENT GM FROM tblFirst250
                                        WHERE tblFirst250.IU = 1 AND GM Is Not Null
                                        ORDER BY GM))
     + 0.25 * (SELECT Min(GM) FROM tblFirst250
               WHERE tblFirst250.GM IN (SELECT TOP 75 PERCENT GM FROM tblFirst250
                                        WHERE tblFirst250.IU = 1 AND GM Is Not Null
                                        ORDER BY GM DESC)) AS 25Percentile
FROM tblFirst250 AS T
WHERE T.IU = 1
GROUP BY T.IU;
Add percentile (or quartile) calculation to existing SQL query
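I am not sure about MS Access, but these two replacements should do the trick. Each one adds a condition that ties the subquery to the sector of the current outer row (t1), so the percentile is computed per sector. Replace
WHERE tp5.[Cc EV PS] Is Not Null
with
WHERE tp5.[Cc EV PS] Is Not Null AND tp5.[GICS Sector] = t1.[GICS Sector]
and replace
WHERE tp3.[Cc EV PS] Is Not Null
with
WHERE tp3.[Cc EV PS] Is Not Null AND tp3.[GICS Sector] = t1.[GICS Sector]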
Interquartile Range - Lower, Upper and Median
There might be an easier way, but to get quartiles you can use NTILE (Transact-SQL). The documentation describes it as follows:
"Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs."
So for your data:
SELECT 1 Val
INTO #temp
UNION ALL
SELECT 1
UNION ALL
SELECT 5
UNION ALL
SELECT 6
UNION ALL
SELECT 7
UNION ALL
SELECT 8
UNION ALL
SELECT 2
UNION ALL
SELECT 4
UNION ALL
SELECT 7
UNION ALL
SELECT 9
UNION ALL
SELECT 9
UNION ALL
SELECT 9
UNION ALL
SELECT 9
-- NTILE(4) specifies you require 4 partitions (quartiles)
SELECT NTILE(4) OVER ( ORDER BY Val ) AS Quartile ,
Val
INTO #tempQuartiles
FROM #temp
SELECT *
FROM #tempQuartiles
DROP TABLE #temp
DROP TABLE #tempQuartiles
This would produce:
Quartile Val
1 1
1 1
1 2
1 4
2 5
2 6
2 7
3 7
3 8
3 9
4 9
4 9
4 9
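From this you can work out what you're after. For example, while #tempQuartiles still exists (that is, before the DROP TABLE statements run), you can take the highest value in each of the first three quartiles: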
SELECT Quartile, MAX(Val) MaxVal
FROM #tempQuartiles
WHERE Quartile <= 3
GROUP BY Quartile
To produce:
Quartile MaxVal
1 4
2 7
3 9
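If you want to check the grouping by hand, here is a minimal Python sketch of how NTILE spreads rows across buckets: when the row count does not divide evenly, the earlier buckets each get one extra row. The names ntile and max_per_bucket are made up for this illustration and are not part of SQL Server or any library.

```python
def ntile(sorted_values, buckets):
    """Assign each value a 1-based bucket number, mimicking NTILE(buckets)."""
    n = len(sorted_values)
    base, extra = divmod(n, buckets)  # the first `extra` buckets get one more row
    assignments = []
    index = 0
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)
        for value in sorted_values[index:index + size]:
            assignments.append((bucket, value))
        index += size
    return assignments


def max_per_bucket(assignments, up_to):
    """Equivalent of: SELECT Quartile, MAX(Val) ... WHERE Quartile <= up_to GROUP BY Quartile."""
    result = {}
    for bucket, value in assignments:
        if bucket <= up_to:
            result[bucket] = max(value, result.get(bucket, value))
    return result


# Same 13 values that were loaded into #temp above
vals = sorted([1, 1, 5, 6, 7, 8, 2, 4, 7, 9, 9, 9, 9])
quartiles = ntile(vals, 4)
print(max_per_bucket(quartiles, 3))  # {1: 4, 2: 7, 3: 9}
```

With 13 rows and 4 buckets, the first bucket holds 4 rows and the other three hold 3 each, which is why the boundary values come out as 4, 7 and 9.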
Function to Calculate Median in SQL Server
2019 UPDATE: In the 10 years since I wrote this answer, more solutions have been uncovered that may yield better results. Also, SQL Server releases since then (especially SQL 2012) have introduced new T-SQL features that can be used to calculate medians. SQL Server releases have also improved the query optimizer, which may affect the perf of various median solutions. Net-net, my original 2009 post is still OK, but there may be better solutions for modern SQL Server apps. Take a look at this article from 2012, which is a great resource: https://sqlperformance.com/2012/08/t-sql-queries/median
This article found the following pattern to be much, much faster than all other alternatives, at least on the simple schema they tested. This solution was 373x faster (!!!) than the slowest (PERCENTILE_CONT) solution tested. Note that this trick requires two separate queries, which may not be practical in all cases. It also requires SQL 2012 or later.
DECLARE @c BIGINT = (SELECT COUNT(*) FROM dbo.EvenRows);
SELECT AVG(1.0 * val)
FROM (
SELECT val FROM dbo.EvenRows
ORDER BY val
OFFSET (@c - 1) / 2 ROWS
FETCH NEXT 1 + (1 - @c % 2) ROWS ONLY
) AS x;
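The OFFSET/FETCH arithmetic is easier to see outside SQL. Below is a rough Python sketch of the same index math, not a replacement for the query: skip (c - 1) / 2 rows, then take one row when the count is odd or two when it is even, and average them. The function name median_offset_fetch is hypothetical, and returning None for an empty input is my assumption about how to mirror AVG over zero rows.

```python
def median_offset_fetch(values):
    """Median computed with the same offset/fetch arithmetic as the T-SQL query."""
    ordered = sorted(values)          # ORDER BY val
    c = len(ordered)                  # @c = SELECT COUNT(*)
    if c == 0:
        return None                   # assumption: mirrors AVG returning NULL for no rows
    offset = (c - 1) // 2             # OFFSET (@c - 1) / 2 ROWS
    fetch = 1 + (1 - c % 2)           # FETCH NEXT 1 (odd count) or 2 (even count) ROWS
    window = ordered[offset:offset + fetch]
    return sum(window) / len(window)  # AVG(1.0 * val)


print(median_offset_fetch([1, 2, 3, 4]))  # 2.5
print(median_offset_fetch([3, 1, 2]))     # 2.0
```

For four rows the offset is 1 and two rows are fetched (2 and 3); for three rows the offset is also 1 but only the single middle row is fetched.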
Of course, just because one test on one schema in 2012 yielded great results, your mileage may vary, especially if you're on SQL Server 2014 or later. If perf is important for your median calculation, I'd strongly suggest trying and perf-testing several of the options recommended in that article to make sure that you've found the best one for your schema.
I'd also be especially careful using the (new in SQL Server 2012) function PERCENTILE_CONT that's recommended in one of the other answers to this question, because the article linked above found this built-in function to be 373x slower than the fastest solution. It's possible that this disparity has been improved in the 7 years since, but personally I wouldn't use this function on a large table until I verified its performance vs. other solutions.
ORIGINAL 2009 POST IS BELOW:
There are lots of ways to do this, with dramatically varying performance. Here's one particularly well-optimized solution, from the article "Medians, ROW_NUMBERs, and performance". It is especially efficient in terms of the actual I/Os generated during execution: it looks more costly than other solutions, but it is actually much faster.
That page also contains a discussion of other solutions and performance testing details. Note the use of a unique column as a disambiguator in case there are multiple rows with the same value of the median column.
As with all database performance scenarios, always try to test a solution out with real data on real hardware – you never know when a change to SQL Server's optimizer or a peculiarity in your environment will make a normally-speedy solution slower.
SELECT
CustomerId,
AVG(TotalDue)
FROM
(
SELECT
CustomerId,
TotalDue,
-- SalesOrderId in the ORDER BY is a disambiguator to break ties
ROW_NUMBER() OVER (
PARTITION BY CustomerId
ORDER BY TotalDue ASC, SalesOrderId ASC) AS RowAsc,
ROW_NUMBER() OVER (
PARTITION BY CustomerId
ORDER BY TotalDue DESC, SalesOrderId DESC) AS RowDesc
FROM Sales.SalesOrderHeader SOH
) x
WHERE
RowAsc IN (RowDesc, RowDesc - 1, RowDesc + 1)
GROUP BY CustomerId
ORDER BY CustomerId;
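To see why the RowAsc/RowDesc filter picks out the middle rows, here is a small Python sketch of the same idea (an illustration only, with a made-up function name median_by_group and made-up sample data). Because the tie-breaker makes the two orderings exact mirror images, RowDesc is always n + 1 - RowAsc, so the two numbers differ by at most 1 only for the single middle row (odd n) or the two middle rows (even n).

```python
from collections import defaultdict


def median_by_group(rows):
    """rows: iterable of (group_key, value, tie_breaker) tuples.

    Returns {group_key: median}, using the RowAsc/RowDesc comparison
    from the query instead of indexing the middle directly.
    """
    groups = defaultdict(list)
    for key, value, tie in rows:
        groups[key].append((value, tie))

    medians = {}
    for key, items in groups.items():
        items.sort()  # ORDER BY value ASC, tie_breaker ASC
        n = len(items)
        middle = []
        for row_asc, (value, _tie) in enumerate(items, start=1):
            row_desc = n + 1 - row_asc  # position in the fully reversed ordering
            if row_asc in (row_desc, row_desc - 1, row_desc + 1):
                middle.append(value)
        medians[key] = sum(middle) / len(middle)  # AVG over the 1 or 2 middle rows
    return medians


# Hypothetical data: (CustomerId, TotalDue, SalesOrderId)
sample = [
    ("A", 10, 1), ("A", 20, 2), ("A", 30, 3),
    ("B", 5, 4), ("B", 5, 5), ("B", 15, 6), ("B", 25, 7),
]
print(median_by_group(sample))  # {'A': 20.0, 'B': 10.0}
```

Without the tie-breaker, duplicate values could be numbered inconsistently between the two orderings, and the filter might keep the wrong rows or none at all.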