PostgreSQL: How to Select Top N Percent (%) Entries from Each Group/Category

PostgreSQL: How do I select the top n percent (%) of entries from each group/category?

To retrieve rows based on a percentage of the number of rows in each group, you can use two window functions: one to count the rows and one to give each row a unique number.

select gp,
       val
from (
  select gp,
         val,
         count(*) over (partition by gp) as cnt,
         row_number() over (partition by gp order by val desc) as rn
  from temp
) t
-- rn and cnt are both bigint, so cast one of them: bigint / bigint is integer division
where rn::numeric / cnt <= 0.75;

SQLFiddle example: http://sqlfiddle.com/#!15/94fdd/1
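Integer division is an easy trap with this pattern: `row_number()` and `count(*)` both return integers, so a bare `rn / cnt` truncates to 0 for every row except the last one in each group. Below is a minimal sketch that runs the query against an in-memory SQLite database from Python. It assumes SQLite as a stand-in for PostgreSQL (SQLite also truncates integer division, and `cast(rn as real)` plays the role of `rn::numeric`; window functions need SQLite 3.25 or later). The table name `grouped_vals` and its rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data standing in for the question's "temp" table:
# group "a" has 4 rows, group "b" has 8 rows.
ROWS = [("a", v) for v in (1, 2, 3, 4)] + [("b", v) for v in range(10, 18)]

QUERY = """
select gp, val
from (
    select gp,
           val,
           count(*) over (partition by gp) as cnt,
           row_number() over (partition by gp order by val desc) as rn
    from grouped_vals
) t
where {ratio} <= 0.75
order by gp, val desc
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table grouped_vals (gp text, val integer)")
conn.executemany("insert into grouped_vals values (?, ?)", ROWS)

# Integer division: rn / cnt is 0 until rn = cnt, so only the last row of each group is dropped.
int_rows = conn.execute(QUERY.format(ratio="rn / cnt")).fetchall()
# Real division keeps the top 75% of each group (3 of 4 rows in "a", 6 of 8 in "b").
real_rows = conn.execute(QUERY.format(ratio="cast(rn as real) / cnt")).fetchall()

print(len(int_rows), len(real_rows))  # 10 9
```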


Btw: using char is almost always a bad idea because it is a fixed-length data type that is padded to the defined length. I hope you only did that for setting up the example and don't use it in your real table.

In PostgreSQL, how do I select the top n percent of rows by a column?

I would use a subquery:

select student_id, student_name, avg_grade, seqnum as grade_rank
from (select s.student_id,
             s.student_name,
             avg(ce.grade) as avg_grade,
             rank() over (order by avg(ce.grade) desc nulls last) as seqnum,
             count(*) over () as cnt
      from students s
      left join
           course_enrollment ce
           on s.student_id = ce.student_id
      -- student_name may stay ungrouped only because student_id is the primary key of students
      group by s.student_id
     ) as ce_avg
where seqnum <= cnt * 0.1;

There are other functions you can use instead, such as the window function NTILE() or the ordered-set aggregate PERCENTILE_DISC(), which returns a cutoff value rather than numbering rows. I prefer the direct calculation because it gives more control over how ties are handled.
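One consequence of using `rank()` here: tied students share the same number, so everyone tied at the cutoff is kept and the result can exceed 10% of the rows, whereas `row_number()` cuts at exactly the limit and picks among tied students arbitrarily. A minimal sketch with Python's sqlite3, assuming a single hypothetical `grades` table in place of the students/course_enrollment join (so the `nulls last` case does not arise):

```python
import sqlite3

# Hypothetical data: 10 students; students 1 and 2 tie for the best average (95).
GRADES = [(1, 95), (1, 95), (2, 90), (2, 100)] + [(s, 110 - 10 * s) for s in range(3, 11)]

QUERY = """
select student_id, avg_grade
from (
    select student_id,
           avg(grade) as avg_grade,
           {func}() over (order by avg(grade) desc) as seqnum,
           count(*) over () as cnt
    from grades
    group by student_id
) as ce_avg
where seqnum <= cnt * 0.1
order by student_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table grades (student_id integer, grade integer)")
conn.executemany("insert into grades values (?, ?)", GRADES)

# rank(): both tied students get seqnum 1, so 2 of the 10 students pass "seqnum <= 1.0".
rank_rows = conn.execute(QUERY.format(func="rank")).fetchall()
# row_number(): exactly one student passes; which of the tied two is not defined.
rownum_rows = conn.execute(QUERY.format(func="row_number")).fetchall()

print(rank_rows)  # [(1, 95.0), (2, 95.0)]
```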

PostgreSQL: count the values for the top n % of rows

If I understand correctly, you can use ntile() or a similar function:

select to_char(dt, 'DD.MM.YYYY') as "Date",
       -- nullif avoids a division-by-zero error on days with no messages of that doctype
       round(100.0 * sum(case when len is null and t.doctype = 260 then 1 else 0 end)
             / nullif(sum(case when t.doctype = 260 then 1 else 0 end), 0)) as "% for 260",
       round(100.0 * sum(case when len is null and t.doctype = 980 then 1 else 0 end)
             / nullif(sum(case when t.doctype = 980 then 1 else 0 end), 0)) as "% for 980",
       to_char(avg(len), 'HH24:MI:SS.MS') as "Avg answer time",
       -- smallest answer time within the slowest tenth (tile 10) of each day/doctype partition
       min(len) filter (where tiling = 10)
from (select date_trunc('day', sent_time) as dt,
             received_time - sent_time as len,
             doctype,
             ntile(10) over (partition by date_trunc('day', sent_time), doctype
                             order by received_time - sent_time) as tiling
      from etp_msg_log
     ) t
group by dt
order by dt;
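`ntile(10)` deals each day/doctype partition, ordered by answer time, into ten buckets, so `tiling = 10` holds the slowest tenth and its minimum is the smallest answer time in that tenth. Because the outer query groups by day only, the slowest tenths of both doctypes are pooled. A minimal sketch with Python's sqlite3, assuming a hypothetical `msg_log` table with answer times in plain seconds (SQLite has no interval type) and a CASE expression in place of the FILTER clause:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for etp_msg_log: one day, answer time in seconds.
conn.execute("create table msg_log (day text, doctype integer, len_s real)")
rows = [("2020-01-01", 260, float(s)) for s in range(1, 21)]      # 20 messages
rows += [("2020-01-01", 980, float(s)) for s in range(101, 111)]  # 10 messages
conn.executemany("insert into msg_log values (?, ?, ?)", rows)

TILED = """
select day, doctype, len_s,
       ntile(10) over (partition by day, doctype order by len_s) as tiling
from msg_log
"""

# Grouped by day only, as in the answer: tile 10 of both doctypes is pooled.
per_day = conn.execute(
    f"select day, min(case when tiling = 10 then len_s end) from ({TILED}) t "
    "group by day order by day"
).fetchall()

# Grouped by day and doctype: one slowest-tenth minimum per document type.
per_type = conn.execute(
    f"select day, doctype, min(case when tiling = 10 then len_s end) from ({TILED}) t "
    "group by day, doctype order by day, doctype"
).fetchall()

print(per_day)   # [('2020-01-01', 19.0)]
print(per_type)  # [('2020-01-01', 260, 19.0), ('2020-01-01', 980, 110.0)]
```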

How do I select TOP 5 PERCENT from each group?

In SQL Server, you could use a CTE (common table expression) paired with the NTILE window function. NTILE slices your data into as many buckets as you ask for; in your case, 20 buckets of roughly 5% each.

;WITH SlicedData AS
(
    SELECT Category, Name, COUNT(Name) AS Total,
           NTILE(20) OVER (PARTITION BY Category ORDER BY COUNT(Name) DESC) AS [NTile]
    FROM #TEMP
    GROUP BY Category, Name
)
SELECT *
FROM SlicedData
WHERE NTile > 1;  -- skip the top slice (NTile = 1) of each Category

This groups your data by Category and Name, orders it by something (not sure if COUNT(Name) is really the thing you want here), and then slices each Category into 20 buckets, each holding roughly 5% of that Category's rows. The bucket with NTile = 1 is the top 5% slice: the query above ignores it, while filtering on NTile = 1 instead returns only that top slice.
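NTILE can only approximate 5%: it deals rows into buckets as evenly as it can, so a Category with fewer than 20 rows gets one row per bucket and its "top slice" is a single row, which can be far more than 5% of that Category. A minimal sketch using Python's sqlite3 (whose NTILE follows the same bucketing rule as SQL Server's), assuming a hypothetical `items` table ordered by a `score` column instead of COUNT(Name):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table items (category text, name text, score integer)")
data = [("small", f"s{i}", i) for i in range(7)]    # 7 rows
data += [("large", f"l{i}", i) for i in range(40)]  # 40 rows
conn.executemany("insert into items values (?, ?, ?)", data)

QUERY = """
with sliced as (
    select category, name,
           ntile(20) over (partition by category order by score desc) as tile
    from items
)
select category,
       sum(tile = 1) as top_slice_rows,  -- rows in the "top 5%" bucket
       count(*) as total_rows
from sliced
group by category
order by category
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [('large', 2, 40), ('small', 1, 7)]
```

For "large", 2 of 40 rows is exactly 5%; for "small", 1 of 7 rows is about 14%.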

See:

  • MSDN docs on NTILE
  • SQL Server 2005 ranking functions
  • SQL SERVER – 2005 – Sample Example of RANKING Functions – ROW_NUMBER, RANK, DENSE_RANK, NTILE

for more info

I executed the following query, but it is not running properly. Why?

Your output tells me that you have one or more rows where sub_category is null. To find them, run:

select *
from "Recon".fk_starchi
where sub_category is null;

To fix that data, you need to either update those rows with a valid subcategory, or delete those rows.

If you don't want to fix the data, you can suppress that unwanted row like this.

select sub_category, sum(quantity) 
from "Recon".fk_starchi
where sub_category is not null
group by sub_category;
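GROUP BY treats all NULL keys as one group, which is where the extra row in the output comes from. A quick check with Python's sqlite3, assuming a small hypothetical `starchi` table shaped like `"Recon".fk_starchi`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table starchi (sub_category text, quantity integer)")
conn.executemany(
    "insert into starchi values (?, ?)",
    [("Chairs", 2), ("Chairs", 3), (None, 4), (None, 1)],
)

GROUPED = (
    "select sub_category, sum(quantity) from starchi {where} "
    "group by sub_category order by sub_category"
)

# Both NULL rows collapse into one group; SQLite sorts NULL first in ascending order.
with_nulls = conn.execute(GROUPED.format(where="")).fetchall()
without_nulls = conn.execute(GROUPED.format(where="where sub_category is not null")).fetchall()

print(with_nulls)     # [(None, 5), ('Chairs', 5)]
print(without_nulls)  # [('Chairs', 5)]
```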

Finding percent of total within a column

SELECT
*,
total_users * 100.0 / SUM(total_users) OVER () AS percentage_of_total
FROM
(
select source, count(*) as total_users
from your_table  -- placeholder name: "table" itself is a reserved word
where is_active = 1
and source in ('web','mobile')
group by source
)
totals_by_source

https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=6c0af52dcb10b072b876ae593773e148
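`SUM(total_users) OVER ()` uses an empty window, so every row of the grouped result sees the same grand total and each source can be divided by it. A minimal check with Python's sqlite3, assuming a hypothetical `users` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table users (source text, is_active integer)")
conn.executemany(
    "insert into users values (?, ?)",
    [("web", 1)] * 3 + [("web", 0)] * 2 + [("mobile", 1)] + [("desktop", 1)] * 5,
)

QUERY = """
select *,
       total_users * 100.0 / sum(total_users) over () as percentage_of_total
from (
    select source, count(*) as total_users
    from users
    where is_active = 1
      and source in ('web', 'mobile')
    group by source
) totals_by_source
order by source
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [('mobile', 1, 25.0), ('web', 3, 75.0)]
```

The percentages are relative to the filtered total (active web and mobile users only), not to every row in the table.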

Select a flag indicating whether each field is in the bottom 10 percent in Access SQL

Consider using derived tables, one for each KPI. You could even save each derived table as its own query and use it in place of the nested SELECT statements in the LEFT JOIN clauses. This should be more efficient, because you no longer run a correlated subquery nested in IIF() for every row.

Note: Access SQL requires each pair of joins to be wrapped in parentheses, so for complex queries it is easier to lay out the joins first in the query's Design View:

SELECT t.[Employee Number], t.[Full Name], t.[Business Area], 
t.[Absence], IIF(a.AbsenceEmpNum IS NOT NULL, 'Y', 'N') AS AbsenceFlag,
t.[Complaints], IIF(c.ComplaintsEmpNum IS NOT NULL, 'Y', 'N') AS ComplaintsFlag,
t.[Service Time], IIF(s.ServiceTimeEmpNum IS NOT NULL, 'Y', 'N') AS ServiceTimeFlag

FROM ((tblKPIScores t

LEFT JOIN
(SELECT TOP 10 PERCENT sub.[Employee Number] As AbsenceEmpNum
FROM tblKPIScores as sub
ORDER BY sub.Absence DESC) AS a
ON t.[Employee Number] = a.AbsenceEmpNum)

LEFT JOIN
(SELECT TOP 10 PERCENT sub.[Employee Number] As ComplaintsEmpNum
FROM tblKPIScores as sub
ORDER BY sub.Complaints DESC) AS c
ON t.[Employee Number] = c.ComplaintsEmpNum)

LEFT JOIN
(SELECT TOP 10 PERCENT sub.[Employee Number] As ServiceTimeEmpNum
FROM tblKPIScores as sub
ORDER BY sub.[Service Time] DESC) AS s
ON t.[Employee Number] = s.ServiceTimeEmpNum;
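The LEFT JOIN flag pattern can also be tried outside Access. The sketch below uses Python's sqlite3 with a hypothetical `kpi` table. SQLite has no `TOP n PERCENT`, so each derived table uses `ORDER BY ... LIMIT ?` with the limit computed in Python as 10% of the row count rounded up; that stand-in is an assumption and does not reproduce Access's handling of ties at the cutoff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table kpi (emp integer primary key, absence integer, complaints integer)")
# Hypothetical scores: employee 10 has the most absence, employee 1 the most complaints.
conn.executemany("insert into kpi values (?, ?, ?)", [(i, i, 10 - i) for i in range(1, 11)])

(row_count,) = conn.execute("select count(*) from kpi").fetchone()
top_n = (row_count + 9) // 10  # ceil(10% of rows): an assumed rounding rule

QUERY = """
select t.emp,
       case when a.emp is not null then 'Y' else 'N' end as absence_flag,
       case when c.emp is not null then 'Y' else 'N' end as complaints_flag
from kpi t
left join (select emp from kpi order by absence desc limit ?) as a
       on t.emp = a.emp
left join (select emp from kpi order by complaints desc limit ?) as c
       on t.emp = c.emp
order by t.emp
"""

# One derived table per KPI; a non-NULL join match means "in the top slice".
flags = {emp: (af, cf) for emp, af, cf in conn.execute(QUERY, (top_n, top_n))}
print(flags[10], flags[1], flags[5])  # ('Y', 'N') ('N', 'Y') ('N', 'N')
```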

