Finding The Decade with Largest Records, SQL Server

You can use the LEFT function in SQL Server to get the decade from the year: the first three digits of a four-digit year identify the decade. Group by that value and count the movies in each group. If you order the results by that count in descending order, the decade with the largest number of movies will be at the top. For example:

select
    count(id) as number_of_movies,
    left(cast([year] as varchar(4)), 3) + '0s' as decade
from movies
group by left(cast([year] as varchar(4)), 3)
order by number_of_movies desc
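
The same grouping can be done with integer arithmetic instead of string slicing. The following is only a sketch, run against SQLite through Python's sqlite3 module as a stand-in for SQL Server; the sample rows are made up, and it assumes year is stored as an integer.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, year INTEGER)")
conn.executemany(
    "INSERT INTO movies (year) VALUES (?)",
    [(1994,), (1995,), (1999,), (2001,), (2008,), (2010,)],
)

rows = conn.execute(
    """
    SELECT (year / 10) * 10 AS decade_start,  -- integer division drops the last digit
           COUNT(*)         AS number_of_movies
    FROM movies
    GROUP BY decade_start
    ORDER BY number_of_movies DESC, decade_start
    """
).fetchall()

for decade_start, number_of_movies in rows:
    print(f"{decade_start}s: {number_of_movies}")
```

The first row returned is the decade with the most movies; the secondary sort on decade_start only makes ties deterministic.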

Query to find the decade with the largest number of records

I would do this by taking each distinct year in the data as the start of a ten-year window, joining in the movies that fall inside that window, and then aggregating:

select y.year as decade_start, y.year + 9 as decade_end,
       count(*) as num_movies
from (select distinct year from movies) y join
     movies m
     on m.year >= y.year and m.year < y.year + 10
group by y.year
order by count(*) desc
limit 1;
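
Note that this counts movies in any ten-year span that starts in a year present in the data, which is not necessarily a calendar decade. A small sketch of that behaviour, using SQLite through Python's sqlite3 module and invented sample years (both are assumptions made for illustration):

```python
import sqlite3

# Invented sample data; SQLite stands in for the original database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (id INTEGER PRIMARY KEY, year INTEGER)")
conn.executemany(
    "INSERT INTO movies (year) VALUES (?)",
    [(1994,), (1995,), (1999,), (2001,), (2008,), (2010,)],
)

# Every distinct year opens a window [year, year + 9]; keep the fullest window.
best_window = conn.execute(
    """
    SELECT y.year AS decade_start, y.year + 9 AS decade_end, COUNT(*) AS num_movies
    FROM (SELECT DISTINCT year FROM movies) AS y
    JOIN movies AS m
      ON m.year >= y.year AND m.year < y.year + 10
    GROUP BY y.year
    ORDER BY num_movies DESC, y.year
    LIMIT 1
    """
).fetchone()

print(best_window)  # (1994, 2003, 4)
```

With this data the best window is 1994 to 2003 with four movies, whereas the calendar decade 1990 to 1999 holds only three.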

SQL to Count Records that Existed by Decade

You can also take advantage of the CONNECT BY clause to generate one row per decade that each record spans.
I assume you don't have any duplicate rows (these columns FILE_NUM, START_DATE, END_DATE should be unique) in your real data.

SELECT dec_start, dec_end, COUNT(*) nb
FROM (
SELECT t.*, level
, 10 * trunc( extract ( year from (START_DATE) ) / 10 ) + 10 * LEVEL - 10 dec_start
, 10 * trunc( extract ( year from (START_DATE) ) / 10 ) + 10 * LEVEL - 1 dec_end
FROM YourTable T
CONNECT BY
10 * TRUNC( EXTRACT ( YEAR FROM (START_DATE) ) / 10 ) + 10 * LEVEL - 10
< 10 * CEIL( EXTRACT ( YEAR FROM (END_DATE) ) / 10 )
AND PRIOR FILE_NUM = FILE_NUM
AND PRIOR START_DATE = START_DATE
AND PRIOR END_DATE = END_DATE
AND PRIOR SYS_GUID() IS NOT NULL
)
group by dec_start, dec_end
order by dec_start, dec_end
;

demo on db<>fiddle
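
On databases without CONNECT BY, a recursive common table expression can play the same role. The sketch below is an assumption-laden illustration rather than a translation of the query above: it runs on SQLite through Python's sqlite3 module, uses a hypothetical table files with plain integer start_year and end_year columns instead of dates, and always includes the decade that contains the end year.

```python
import sqlite3

# Hypothetical table and rows; years are stored as integers to keep the sketch short.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (file_num INTEGER, start_year INTEGER, end_year INTEGER)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [(1, 1985, 2003), (2, 1991, 1999), (3, 2005, 2012)],
)

rows = conn.execute(
    """
    WITH RECURSIVE spans(file_num, dec_start, last_dec) AS (
        -- anchor: the decade in which each record starts
        SELECT file_num, (start_year / 10) * 10, (end_year / 10) * 10
        FROM files
        UNION ALL
        -- step: move forward one decade until the record's last decade is reached
        SELECT file_num, dec_start + 10, last_dec
        FROM spans
        WHERE dec_start + 10 <= last_dec
    )
    SELECT dec_start, dec_start + 9 AS dec_end, COUNT(*) AS nb
    FROM spans
    GROUP BY dec_start
    ORDER BY dec_start
    """
).fetchall()

for row in rows:
    print(row)
```

Each record contributes one row per decade it touches, so the final COUNT(*) is the number of records that existed at some point in that decade.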

Top-N By Decade for successive decades (in SQL Server)

If I followed you correctly, you want the top 5 per decade. If so:

  • you would need to group by decade rather than by calendar year to get the proper counts; it is easier to compute the decade in a subquery so you don't have to repeat the case expression

  • the rank should be computed over decade partitions rather than per year

  • you can then use that column to filter in an outer query

Consider:

select *
from (
    select
        dtd.documenttitle as title,
        rank() over (partition by dd.decade order by count(*) desc) as rnk,
        count(*) as number_of_occurrences,
        dd.decade
    from tbldocumentTitleDimension dtd
    inner join tblDocumentFact df on dtd.documenttitleid = df.documenttitleid
    inner join (
        select
            dateid,
            case
                when calendarYear between 2010 and 2019 then '2010 - 2019'
                when calendarYear between 2000 and 2009 then '2000 - 2009'
                when calendarYear between 1990 and 1999 then '1990 - 1999'
                when calendarYear between 1980 and 1989 then '1980 - 1989'
                when calendarYear between 1970 and 1979 then '1970 - 1979'
                when calendarYear between 1960 and 1969 then '1960 - 1969'
                else 'all others'
            end AS decade
        from tblDateDimension
    ) dd on df.publicationdateid = dd.dateid
    group by dtd.documenttitle, dd.decade
) t
where rnk <= 5
order by decade, number_of_occurrences desc

Side notes:

  • don't use single quotes for identifiers (although SQL Server allows that, single quotes should be reserved for literal strings, as defined in the SQL standard) - better yet, use identifiers that do not require quoting

  • in a multi-table query, always qualify all column names with the table they belong to; I made a few assumptions here

  • unless you have null values in column documentTitle that you don't want to count, you can use count(*) instead of count(documentTitle) - this is more straightforward, and more efficient
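
The group-then-rank pattern can be tried on a small scale as follows. This is a sketch under several assumptions: it runs on SQLite (3.25 or later, for window functions) through Python's sqlite3 module, uses a single hypothetical docs table instead of the fact and dimension tables, derives the decade arithmetically rather than with a CASE expression, and keeps the top 2 instead of the top 5.

```python
import sqlite3

# Hypothetical flattened table: one row per published document.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [
        ("A", 1991), ("A", 1995), ("A", 1998), ("B", 1992), ("B", 1993), ("C", 1999),
        ("A", 2001), ("B", 2002), ("B", 2005), ("C", 2003), ("C", 2004), ("C", 2007),
    ],
)

rows = conn.execute(
    """
    WITH counts AS (
        -- step 1: count occurrences per decade and title
        SELECT (year / 10) * 10 AS decade, title, COUNT(*) AS n
        FROM docs
        GROUP BY decade, title
    ),
    ranked AS (
        -- step 2: rank titles inside each decade partition
        SELECT decade, title, n,
               RANK() OVER (PARTITION BY decade ORDER BY n DESC) AS rnk
        FROM counts
    )
    -- step 3: filter on the rank in the outer query
    SELECT decade, title, n, rnk
    FROM ranked
    WHERE rnk <= 2
    ORDER BY decade, n DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Because RANK() assigns the same rank to ties, a decade can return more rows than the cutoff when several titles share a count.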

SQL Query to return maximums over decades

SELECT
    lookup.decadeID,
    data.*
FROM
(
    SELECT
        truncate(yearid/10,0) as decadeID,
        MAX(HR) as homers
    FROM
        masterplusbatting
    GROUP BY
        truncate(yearid/10,0)
)
AS lookup
INNER JOIN
    masterplusbatting AS data
        ON data.yearid >= lookup.decadeID * 10
       AND data.yearid < lookup.decadeID * 10 + 10
       AND data.HR = lookup.homers

Edited for MySQL
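
A reduced version of this lookup-and-join idea, for experimentation: the sketch uses SQLite through Python's sqlite3 module rather than MySQL, a hypothetical batting table with made-up players and values, and integer division in place of truncate().

```python
import sqlite3

# Hypothetical table and invented rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batting (player TEXT, yearid INTEGER, hr INTEGER)")
conn.executemany(
    "INSERT INTO batting VALUES (?, ?, ?)",
    [("p1", 1961, 61), ("p2", 1965, 52), ("p3", 1998, 70), ("p4", 1999, 65), ("p5", 1990, 70)],
)

rows = conn.execute(
    """
    SELECT lookup.decade, data.player, data.yearid, data.hr
    FROM (
        -- per-decade maximum
        SELECT (yearid / 10) * 10 AS decade, MAX(hr) AS homers
        FROM batting
        GROUP BY decade
    ) AS lookup
    JOIN batting AS data
      ON data.yearid >= lookup.decade
     AND data.yearid <  lookup.decade + 10
     AND data.hr = lookup.homers       -- keep only the rows that reach the maximum
    ORDER BY lookup.decade, data.player
    """
).fetchall()

for row in rows:
    print(row)
```

Joining back on the maximum returns every row that ties for it, so the 1990s produce two rows here.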

SQL query to find dates where more records were active

Test table and data:

create table startend ( prod, startdate, enddate )
as
select 'a', date'1789-04-01', date'1799-12-14' from dual union all
select 'b', date'1797-03-04', date'1826-07-04' from dual union all
select 'c', date'1801-03-04', date'1826-07-04' from dual union all
select 'd', date'1809-03-04', date'1836-06-28' from dual union all
select 'e', date'1817-03-04', date'1831-07-04' from dual ;

SQL> select * from startend;
PROD STARTDATE ENDDATE
a 01-APR-89 14-DEC-99
b 04-MAR-97 04-JUL-26
c 04-MAR-01 04-JUL-26
d 04-MAR-09 28-JUN-36
e 04-MAR-17 04-JUL-31

Let's assume that we need to examine every possible combination of STARTDATE and ENDDATE in which the start precedes the end. We could use a JOIN like the one in the inline view below. In this query, ROWNUM is aliased as ERA, which will be used for GROUP BY at a later stage.

  select 
to_char( startdate, 'YYYY-MM-DD') start_
, to_char( enddate, 'YYYY-MM-DD') end_
, enddate - startdate as duration
, rownum as era
from (
select distinct
S1.startdate
, S2.enddate
from startend S1
join startend S2 on S1.startdate < S2.enddate
)
;

-- result
START_ END_ DURATION ERA
---------- ---------- ---------- ----------
1789-04-01 1836-06-28 17254 1
1789-04-01 1826-07-04 13607 2
1801-03-04 1831-07-04 11079 3
1809-03-04 1836-06-28 9978 4
1817-03-04 1836-06-28 7056 5
1817-03-04 1831-07-04 5235 6
1801-03-04 1826-07-04 9253 7
1809-03-04 1826-07-04 6331 8
1789-04-01 1831-07-04 15433 9
1797-03-04 1799-12-14 1015 10
1797-03-04 1826-07-04 10713 11
1797-03-04 1831-07-04 12539 12
1817-03-04 1826-07-04 3409 13
1789-04-01 1799-12-14 3909 14
1797-03-04 1836-06-28 14360 15
1801-03-04 1836-06-28 12900 16
1809-03-04 1831-07-04 8157 17

17 rows selected.

The conditions you need seem to be as follows (see the WHERE clause):

-- test dates: from your question
select prod
from startend
where startdate <= date'1817-03-04' and startdate < date'1826-07-04'
and enddate > date'1817-03-04' and enddate >= date'1826-07-04'
;

-- result
b
c
d
e

Final step: combine the ideas behind the first 2 queries, something like (Oracle 11g):

select count(*)                        as "prod_count"
, to_char( E.startdate, 'YYYY-MM-DD' ) as "StartDate"
, to_char( E.enddate, 'YYYY-MM-DD' ) as "EndDate"
from
(
select startdate, enddate, rownum as era
from
(
select distinct
S1.startdate
, S2.enddate
from startend S1 join startend S2 on S1.startdate < S2.enddate
)
) E
join
(
select distinct prod, startdate, enddate from startend
) P
on
( P.startdate <= E.startdate and P.startdate < E.enddate )
and ( P.enddate > E.startdate and P.enddate >= E.enddate )
--
group by era, E.startdate, E.enddate
order by 2, 3
;

Result

prod_count StartDate  EndDate   
---------- ---------- ----------
1 1789-04-01 1799-12-14
2 1797-03-04 1799-12-14
1 1797-03-04 1826-07-04
2 1801-03-04 1826-07-04
3 1809-03-04 1826-07-04
1 1809-03-04 1831-07-04
1 1809-03-04 1836-06-28
4 1817-03-04 1826-07-04
2 1817-03-04 1831-07-04
1 1817-03-04 1836-06-28

10 rows selected.

See also: dbfiddle here. When working with Oracle 12c (or 18c), you could use CROSS APPLY (instead of JOIN ... ON ...)
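
The combined query can also be reproduced outside Oracle. The sketch below loads the same test data into SQLite through Python's sqlite3 module (an assumption made for portability), stores the dates as ISO-8601 text so that string comparison matches date order, and drops ROWNUM and the two join conditions that are already implied by startdate < enddate.

```python
import sqlite3

# Same test rows as above; dates are ISO-8601 strings, which sort chronologically.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE startend (prod TEXT, startdate TEXT, enddate TEXT)")
conn.executemany(
    "INSERT INTO startend VALUES (?, ?, ?)",
    [
        ("a", "1789-04-01", "1799-12-14"),
        ("b", "1797-03-04", "1826-07-04"),
        ("c", "1801-03-04", "1826-07-04"),
        ("d", "1809-03-04", "1836-06-28"),
        ("e", "1817-03-04", "1831-07-04"),
    ],
)

rows = conn.execute(
    """
    SELECT e.startdate, e.enddate, COUNT(*) AS prod_count
    FROM (
        -- every start/end combination in which the start precedes the end
        SELECT DISTINCT s1.startdate, s2.enddate
        FROM startend s1
        JOIN startend s2 ON s1.startdate < s2.enddate
    ) AS e
    JOIN startend AS p
      ON p.startdate <= e.startdate   -- product already active when the era starts
     AND p.enddate   >= e.enddate     -- and still active when it ends
    GROUP BY e.startdate, e.enddate
    ORDER BY prod_count DESC, e.startdate, e.enddate
    """
).fetchall()

print(rows[0])  # ('1817-03-04', '1826-07-04', 4)
```

Sorting by the count in descending order puts the era with the most simultaneously active records first.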

query to select count of records for each year

A simple method to get all years in the data -- even when they don't meet the conditions of the where clause -- is to use conditional aggregation:

select year(fact_date) as yyyy,
sum(case when stat = 1 and id = 16 then 1 else 0 end) as cnt_16
from tbl_fact
group by year(fact_date)
order by yyyy;
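
To see why the condition belongs inside the aggregate rather than in a WHERE clause, compare both forms on a small table. This is a sketch only: it uses SQLite through Python's sqlite3 module, with strftime() standing in for year(), and the rows are invented.

```python
import sqlite3

# Invented sample rows; 2020 has data but nothing that matches stat = 1 and id = 16.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_fact (fact_date TEXT, stat INTEGER, id INTEGER)")
conn.executemany(
    "INSERT INTO tbl_fact VALUES (?, ?, ?)",
    [
        ("2019-01-05", 1, 16),
        ("2019-06-01", 1, 16),
        ("2020-03-03", 0, 16),
        ("2021-07-07", 1, 17),
        ("2021-08-08", 1, 16),
    ],
)

# Conditional aggregation: every year survives, non-matching rows add 0.
all_years = conn.execute(
    """
    SELECT CAST(strftime('%Y', fact_date) AS INTEGER) AS yyyy,
           SUM(CASE WHEN stat = 1 AND id = 16 THEN 1 ELSE 0 END) AS cnt_16
    FROM tbl_fact
    GROUP BY yyyy
    ORDER BY yyyy
    """
).fetchall()

# Filtering in WHERE: years without a matching row disappear from the result.
filtered = conn.execute(
    """
    SELECT CAST(strftime('%Y', fact_date) AS INTEGER) AS yyyy, COUNT(*) AS cnt_16
    FROM tbl_fact
    WHERE stat = 1 AND id = 16
    GROUP BY yyyy
    ORDER BY yyyy
    """
).fetchall()

print(all_years)  # [(2019, 2), (2020, 0), (2021, 1)]
print(filtered)   # [(2019, 2), (2021, 1)]
```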

