How to Select Top X But Still Get a Count of the Whole Query

How do you Select TOP x but still get a COUNT of the whole query?

You can use COUNT(*) OVER()

SELECT TOP 20 *, 
COUNT(*) OVER() AS TotalMatchingRows
FROM master..spt_values
WHERE type='P'
ORDER BY number

Running two separate queries may work out more efficient, however, especially if narrower indexes can be used to determine the matching row count but don't cover the entire SELECT list.
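
A minimal sketch of the two-query alternative, reusing the table and filter from the example above. Whether it actually beats the single windowed query depends on which indexes exist, so treat it as something to compare in the execution plan rather than a guaranteed win.

```sql
-- Query 1: the total only. This may be answered from a narrower index,
-- if one exists on the filter column, without touching the wide rows.
SELECT COUNT(*) AS TotalMatchingRows
FROM master..spt_values
WHERE type = 'P';

-- Query 2: the rows themselves, without the windowed count.
SELECT TOP 20 *
FROM master..spt_values
WHERE type = 'P'
ORDER BY number;
```

Note that the two statements can see different data if rows change between them, unless they run in a transaction with a suitable isolation level.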

Count, order desc and select top 5

In SQL Server you can use TOP to select a certain number of rows along with an order by to get the proper records:

select top 5 type, count(*) Total
from yourtable
group by type
order by total desc

SQL Server - Get the Total Count with the TOP 1 product

One option is the WITH TIES clause:

Select Top 1 with ties 
CustID
,CustName
,ProductId
,TotalQty
From (
Select C.CustID
,C.CustName
,O.ProductId
,TotalQty = count(O.CustId) over (Partition By O.CustID)
,ProdCount = count(O.CustId) over (Partition By O.CustID,O.ProductID)
From #Cust C
Left Join #Orders O on C.CustID=O.CustId
) A
Order by Row_Number() over (Partition By CustID Order by ProdCount Desc)

Returns

CustID  CustName  ProductId  TotalQty
1       Paul      1          3
2       F         1          1
3       Francis   NULL       0

SQL query for finding records where count > 1

Use the HAVING clause and GROUP BY the fields that make the row unique.

The query below finds all users that have more than one payment per day with the same account number:

SELECT 
user_id,
COUNT(*) count
FROM
PAYMENT
GROUP BY
account_no,
user_id,
date
HAVING COUNT(*) > 1

Update

If you want to include only those that have a distinct ZIP, you can get a distinct set first and then perform your GROUP BY/HAVING:

 SELECT 
user_id,
account_no,
date,
COUNT(*)
FROM
(SELECT DISTINCT
user_id,
account_no,
zip,
date
FROM
payment
) payment
GROUP BY
user_id,
account_no,
date
HAVING COUNT(*) > 1

TSQL - TOP and COUNT in one SELECT

Right join trick.

SELECT TOP 1
CASE WHEN tbl_CommunicationElements.pk_Communication IS NULL THEN 0 ELSE 1 END hasAnsweredCom
, tbl_Communication.subject AS hasAnsweredComStepName
FROM tbl_Communication
JOIN tbl_CommunicationElements ON tbl_CommunicationElements.pk_Communication = tbl_Communication.pk_Communication
RIGHT JOIN (VALUES(1)) AS Ext(x) ON (
tbl_Communication.pk_Ticket = @pk_Ticket
AND tbl_Communication.isClosed = 0
AND tbl_Communication.pk_CommunicationType = (SELECT pk_CommunicationType
FROM tbl_CommunicationType
WHERE name = 'query')
)

Get top 1 row of each group

;WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rn
FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rn = 1

If you expect 2 entries per day, then this will arbitrarily pick one. To get both entries for a day, use DENSE_RANK instead.
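
A sketch of the DENSE_RANK variant, using the same table and columns as the query above. It assumes the entries that should be returned together share the same DateCreated value; if DateCreated carries a time component, ordering by CAST(DateCreated AS date) is one way to make same-day entries tie.

```sql
;WITH cte AS
(
SELECT *,
-- Rows that tie on DateCreated get the same rank, so all of them survive the filter.
DENSE_RANK() OVER (PARTITION BY DocumentID ORDER BY DateCreated DESC) AS rk
FROM DocumentStatusLogs
)
SELECT *
FROM cte
WHERE rk = 1
```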

As for normalised or not, it depends if you want to:

  • maintain status in 2 places
  • preserve status history
  • ...

As it stands, you preserve status history. If you want the latest status in the parent table too (which is denormalisation), you'd need a trigger to maintain "status" in the parent, or you could drop this status history table.

select top 1000, but know how many rows are there?

SELECT TOP 1000 x, y, z, COUNT(*) OVER () AS TotalCount
FROM dbo.[table]

Fastest way to count exact number of rows in a very large table?

Simple answer:

  • Database vendor independent solution = use the standard = COUNT(*)
  • There are approximate SQL Server solutions, but they don't use COUNT(*), so they are out of scope

Notes:

COUNT(1) = COUNT(*) = COUNT(PrimaryKey) just in case

Edit:

SQL Server example (1.4 billion rows, 12 columns)

SELECT COUNT(*) FROM MyBigtable WITH (NOLOCK)
-- NOLOCK here is for me only to let me test for this answer: no more, no less

1 run, 5:46 minutes, count = 1,401,659,700

--Note, sp_spaceused uses this DMV
SELECT
Total_Rows= SUM(st.row_count)
FROM
sys.dm_db_partition_stats st
WHERE
object_name(object_id) = 'MyBigtable' AND (index_id < 2)

2 runs, both under 1 second, count = 1,401,659,670

The second count is lower, which makes it wrong: a correct count would be the same or higher, depending on writes (deletes are done out of hours here).

Run a query with a LIMIT/OFFSET and also get the total number of rows

Yes. With a simple window function:

SELECT *, count(*) OVER() AS full_count
FROM tbl
WHERE /* whatever */
ORDER BY col1
OFFSET ?
LIMIT ?

Be aware that the cost will be substantially higher than without the total number, but typically still cheaper than two separate queries. Postgres has to actually count all rows either way, which imposes a cost depending on the total number of qualifying rows. Details:

  • Best way to get result count before LIMIT was applied

However, as Dani pointed out, when OFFSET is at least as great as the number of rows returned from the base query, no rows are returned. So we also don't get full_count.

If that's not acceptable, a possible workaround to always return the full count would be with a CTE and an OUTER JOIN:

WITH cte AS (
SELECT *
FROM tbl
WHERE /* whatever */
)
SELECT *
FROM (
TABLE cte
ORDER BY col1
LIMIT ?
OFFSET ?
) sub
RIGHT JOIN (SELECT count(*) FROM cte) c(full_count) ON true;

You get one row of NULL values with the full_count appended if OFFSET is too big. Else, it's appended to every row like in the first query.

If a row with all NULL values is a possible valid result you have to check offset >= full_count to disambiguate the origin of the empty row.

This still executes the base query only once. But it adds more overhead to the query and only pays if that's less than repeating the base query for the count.

If indexes supporting the final sort order are available, it might pay to include the ORDER BY in the CTE (redundantly).
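
As an illustration only, here is the CTE workaround with the redundant ORDER BY inside the CTE, run against a small throwaway table. The table contents, the filter col1 > 0 and the LIMIT/OFFSET values are assumptions made up for this sketch (PostgreSQL syntax).

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE tbl (col1 int, col2 text);
INSERT INTO tbl VALUES (1, 'a'), (2, 'b'), (3, 'c');

WITH cte AS (
   SELECT *
   FROM   tbl
   WHERE  col1 > 0      -- stands in for the real filter
   ORDER  BY col1       -- redundant; may help if an index supports this order
   )
SELECT *
FROM  (
   TABLE  cte
   ORDER  BY col1
   LIMIT  2
   OFFSET 10            -- deliberately past the end of the result
   ) sub
RIGHT  JOIN (SELECT count(*) FROM cte) c(full_count) ON true;
-- Returns a single row: col1 = NULL, col2 = NULL, full_count = 3
```

With a smaller OFFSET (say 0), the same query returns the page of rows with full_count = 3 appended to each.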


