How do I select TOP 5 PERCENT from each group?
You could use a CTE (Common Table Expression) paired with the NTILE windowing function. It slices your data into as many slices as you need, e.g. in your case, into 20 slices (each 5%).
;WITH SlicedData AS
(
SELECT Category, Name, COUNT(Name) Total,
NTILE(20) OVER(PARTITION BY Category ORDER BY COUNT(Name) DESC) AS 'NTile'
FROM #TEMP
GROUP BY Category, Name
)
SELECT *
FROM SlicedData
WHERE NTile > 1
This groups your data by Category, Name, orders by something else (not sure if COUNT(Name) is really the thing you want here), and then slices each Category partition into 20 pieces, each representing 5% of that partition. The slice with NTile = 1 is the top 5% slice. The WHERE NTile > 1 filter above skips that slice; to keep only the top 5% instead, filter on NTile = 1.
See:
- MSDN docs on NTILE
- SQL Server 2005 ranking functions
- SQL SERVER – 2005 – Sample Example of RANKING Functions – ROW_NUMBER, RANK, DENSE_RANK, NTILE
for more info
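To see how NTILE sizes its slices, here is a minimal Python sketch of the distribution rule: when the row count does not divide evenly, the earlier slices each get one extra row. The function name ntile_buckets is made up for illustration and is not part of any library.

```python
def ntile_buckets(n_rows, buckets):
    """Return the 1-based NTILE bucket number for each row position
    of a single partition that holds n_rows ordered rows."""
    base, extra = divmod(n_rows, buckets)
    assignment = []
    for bucket in range(1, buckets + 1):
        # The first `extra` buckets receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        assignment.extend([bucket] * size)
    return assignment


# 100 rows split into 20 buckets: every bucket holds exactly 5 rows (5%).
even = ntile_buckets(100, 20)

# 45 rows: buckets 1-5 hold 3 rows each, buckets 6-20 hold 2 rows each.
uneven = ntile_buckets(45, 20)

# 8 rows: only buckets 1-8 are used, one row each.
small = ntile_buckets(8, 20)
```

One consequence worth keeping in mind: a partition with fewer than 20 rows still places one row in slice 1, so for small categories slice 1 covers more than 5% of the rows.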
select top 5 group by and order by
Maybe you want something like this?
select top 5 CITY, QNT, EXP, RATE
from (
select *, row_number() over (partition by CITY order by RATE desc) AS RN
from (
select CITY, QNT, EXP, (QNT-EXP)*100/EXP as RATE
from tbl_city
) X
) Y
where RN = 1
order by RATE desc
I didn't test this, but it should first take the row with the biggest rate for each city, and then take the top 5 of those rows, so that the same city is not duplicated.
Postgresql : How do I select top n percent(%) entries from each group/category
To retrieve the rows based on the percentage of the number of rows in each group you can use two window functions: one to count the rows and one to give them a unique number.
select gp,
val
from (
select gp,
val,
count(*) over (partition by gp) as cnt,
row_number() over (partition by gp order by val desc) as rn
from temp
) t
where rn / cnt::numeric <= 0.75; -- cast needed: bigint / bigint truncates to 0 or 1
SQLFiddle example: http://sqlfiddle.com/#!15/94fdd/1
Btw: using char is almost always a bad idea because it is a fixed-length data type that is padded to the defined length. I hope you only did that for setting up the example and don't use it in your real table.
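Both window functions in the percentage query return integers, and in PostgreSQL dividing one integer by another discards the fraction, so the ratio only behaves as intended when one side is converted to a decimal type. The sketch below reproduces the effect with Python's built-in sqlite3 module, used purely as a stand-in because SQLite truncates integer division the same way. The table temp_demo and its rows are made up, and CAST(... AS REAL) plays the role a numeric cast would play in PostgreSQL. It assumes an SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temp_demo (gp TEXT, val INTEGER)")
conn.executemany(
    "INSERT INTO temp_demo VALUES (?, ?)",
    [("A", v) for v in range(1, 11)] + [("B", v) for v in (10, 20, 30, 40)],
)

# Same shape as the query above; only the ratio expression is swapped in.
QUERY = """
SELECT gp, val
FROM (
    SELECT gp, val,
           COUNT(*) OVER (PARTITION BY gp) AS cnt,
           ROW_NUMBER() OVER (PARTITION BY gp ORDER BY val DESC) AS rn
    FROM temp_demo
) t
WHERE {ratio} <= 0.75
ORDER BY gp, val DESC
"""

# Integer division: the ratio is 0 for every row except the last one per group.
integer_rows = conn.execute(QUERY.format(ratio="rn / cnt")).fetchall()

# Decimal division: the ratio is the real fraction of the group seen so far.
decimal_rows = conn.execute(QUERY.format(ratio="CAST(rn AS REAL) / cnt")).fetchall()

print(len(integer_rows), len(decimal_rows))
```

With integer division the filter drops only the last row of each group (12 of the 14 rows survive), while the decimal version keeps 7 of the 10 rows in group A and 3 of the 4 rows in group B.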
SELECT TOP 10 of each group of a certain field with data across 2 tables
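Use APPLY. It is similar to JOIN, but the applied sub-select (TopCaptures) is executed once for every row in Sources, so you can get the top 10 captures per source. Note that TOP 10 without an ORDER BY inside the sub-select returns 10 arbitrary rows; add an ORDER BY on whichever column defines "top" for your data.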
Variant A: Using a CTE:
; WITH Sources AS (
SELECT SourceId
FROM Source
WHERE Type = 1
AND State = 'TX'
)
SELECT *
FROM Sources
OUTER APPLY (
SELECT TOP 10 *
FROM Captures
WHERE Captures.SourceId = Sources.SourceId
) AS TopCaptures
;
Variant B: Using another Sub-Select
SELECT *
FROM (
SELECT SourceId
FROM Source
WHERE Type = 1
AND State = 'TX'
) AS Sources
OUTER APPLY (
SELECT TOP 10 *
FROM Captures
WHERE Captures.SourceId = Sources.SourceId
) AS TopCaptures
;
Edit: If you want INNER JOIN-like behaviour, use CROSS APPLY instead of OUTER APPLY. With CROSS APPLY, Sources rows that do not have at least one Capture are not returned.
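The difference between the two APPLY flavours can be sketched in a few lines of Python. This is only an illustration of the semantics, not how SQL Server executes the query: the function apply_top_n, the (source_id, score) tuple layout, and the choice to rank captures by score are all assumptions made for the example.

```python
def apply_top_n(source_ids, captures, n, outer=True):
    """Pair every source id with up to n of its captures.

    outer=True  mimics OUTER APPLY: a source without captures yields (id, None).
    outer=False mimics CROSS APPLY: a source without captures is dropped.
    """
    result = []
    for source_id in source_ids:
        # The "sub-select" runs once per source row.
        matches = [c for c in captures if c[0] == source_id]
        # Assumption: "top" means highest score (second tuple element).
        matches.sort(key=lambda c: c[1], reverse=True)
        top = matches[:n]
        if top:
            result.extend((source_id, c) for c in top)
        elif outer:
            result.append((source_id, None))
    return result


sources = [1, 2, 3]
captures = [(1, 5), (1, 9), (1, 7), (2, 4)]  # source 3 has no captures

outer_rows = apply_top_n(sources, captures, n=2, outer=True)
cross_rows = apply_top_n(sources, captures, n=2, outer=False)
```

Here outer_rows ends with (3, None) for the source that has no captures, while cross_rows omits source 3 entirely.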
select top 30 percent of the entries for each day
You can use row_number() with partition by date and check it against 30% of the total count for each day.
select date,receipt,total
from (select *,
ceiling(tc * 30.00 / 100.00) as under30
from (select date,
receipt,
total,
row_number() over(partition by date order by (select null)) rn,
count(*) over(partition by date order by (select null)) tc
from sales) t
) t1
where rn <= under30
Output:
+------------+---------+-------+
| date | receipt | total |
+------------+---------+-------+
| 2018-04-21 | 325 | 600 |
+------------+---------+-------+
| 2018-04-21 | 326 | 800 |
+------------+---------+-------+
| 2018-04-26 | 330 | 600 |
+------------+---------+-------+
| 2018-04-26 | 331 | 1080 |
+------------+---------+-------+
| 2018-04-29 | 334 | 600 |
+------------+---------+-------+
| 2018-05-01 | 336 | 1500 |
+------------+---------+-------+
Note: If you want 30% of the total count across all days instead, change the count calculation in the above query as follows.
count(*) over(order by (select null)) tc
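The per-day arithmetic (count the rows of each day, take the ceiling of 30% of that count, keep that many rows) can be sketched in plain Python as follows. The function name and the sample sales rows are made up and are not the demo's data. Ranking rows by total in descending order is also an assumption, since the query above leaves the order within a day unspecified.

```python
from collections import defaultdict


def top_percent_per_day(rows, percent):
    """rows are (date, receipt, total) tuples.
    Keep ceil(percent% of the day's row count) rows for every date."""
    by_date = defaultdict(list)
    for row in rows:
        by_date[row[0]].append(row)

    kept = []
    for date in sorted(by_date):
        # Assumption: the highest totals count as the "top" rows.
        group = sorted(by_date[date], key=lambda r: r[2], reverse=True)
        # Integer ceiling of len(group) * percent / 100, avoiding float rounding.
        limit = (len(group) * percent + 99) // 100
        kept.extend(group[:limit])
    return kept


sales = [
    ("2018-04-21", 1, 100), ("2018-04-21", 2, 400),
    ("2018-04-21", 3, 250), ("2018-04-21", 4, 50),
    ("2018-04-22", 5, 300), ("2018-04-22", 6, 700), ("2018-04-22", 7, 20),
    ("2018-04-23", 8, 90),
]

top_rows = top_percent_per_day(sales, 30)
```

Because of the ceiling, every day contributes at least one row: the day with four rows keeps two (30% of 4 is 1.2), and the days with three rows and one row keep one each.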
Get top n records for each group of grouped results
Here is one way to do this, using UNION ALL. This works with two groups; if you have more than two groups, you would need to specify the group number and add a query for each group:
(
select *
from mytable
where `group` = 1
order by age desc
LIMIT 2
)
UNION ALL
(
select *
from mytable
where `group` = 2
order by age desc
LIMIT 2
)
There are a variety of ways to do this, see this article to determine the best route for your situation:
http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/
Edit:
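This might work for you too. It generates a row number for each record within its group. (On MySQL 8.0 and later, the ROW_NUMBER() window function gives the same numbering without user variables.) Using an example from the link above, this returns only those records with a row number less than or equal to 2: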
select person, `group`, age
from
(
select person, `group`, age,
(@num:=if(@group = `group`, @num +1, if(@group := `group`, 1, 1))) row_number
from test t
CROSS JOIN (select @num:=0, @group:=null) c
order by `Group`, Age desc, person
) as x
where x.row_number <= 2;
See Demo
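The user-variable trick is a running counter that restarts whenever the group value changes while scanning the sorted rows. A minimal Python sketch of that logic is shown below; the function name and the sample rows are made up for illustration.

```python
def number_rows_per_group(sorted_rows):
    """sorted_rows are (person, group, age) tuples, already ordered by group
    and then by the ranking column. Yields (row, row_number), restarting
    the numbering at 1 each time the group changes."""
    previous_group = object()  # sentinel that equals no real group value
    num = 0
    for row in sorted_rows:
        group = row[1]
        num = num + 1 if group == previous_group else 1
        previous_group = group
        yield row, num


people = [
    ("Bob", 1, 32), ("Jill", 1, 34), ("Shawn", 1, 42),
    ("Jake", 2, 29), ("Paul", 2, 36), ("Laura", 2, 39),
]

# Same ordering as the query: group, then age descending, then person.
ordered = sorted(people, key=lambda r: (r[1], -r[2], r[0]))

top_two = [row for row, num in number_rows_per_group(ordered) if num <= 2]
```

As in the SQL version, the result depends entirely on the rows being scanned in sorted order; feeding unsorted rows would restart the counter at every group switch.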
How to extract the top x% of rows by group and number in R?
Here is a solution. It selects the top 30% of values within each group of name and then counts the rows that were selected in each group.
library(dplyr)
data %>%
group_by(name) %>%
arrange(name, value) %>%
top_frac(0.30) %>%
count(name)
#Selecting by value
## A tibble: 4 x 2
## Groups: name [4]
# name n
# <chr> <int>
#1 A 150
#2 B 300
#3 C 6
#4 D 30
You can check that these numbers are in fact 30% of each group of name with
data %>% count(name) %>% mutate(n = n*0.3)
# name n
#1 A 150
#2 B 300
#3 C 6
#4 D 30
If you want the top 30% of values overall, regardless of which group they come from, drop the group_by() step:
data %>%
arrange(name, value) %>%
top_frac(0.30) %>%
count(name)
#Selecting by value
# name n
#1 A 46
#2 B 420
#3 C 20