How to Add a Running Count to Rows in a 'streak' of Consecutive Days


Building on this table (deliberately not using the SQL keyword "date" as a column name):

CREATE TABLE tbl(
   pid int
 , the_date date
 , PRIMARY KEY (pid, the_date)
);

Query:

SELECT pid, the_date
     , row_number() OVER (PARTITION BY pid, grp ORDER BY the_date) AS in_streak
FROM  (
   SELECT *
        , the_date - '2000-01-01'::date
        - row_number() OVER (PARTITION BY pid ORDER BY the_date) AS grp
   FROM   tbl
   ) sub
ORDER  BY pid, the_date;

Subtracting one date from another yields an integer (the number of days). Since you are looking for consecutive days, that value grows by exactly one from row to row within a streak, and so does row_number(). If we subtract row_number() from it, every row of a streak ends up with the same value (grp) per pid, while a gap shifts grp to a new value. Then it's simple to number the rows per group.
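To see why grp stays constant inside a streak, here is a rough plain-Ruby sketch of the same two steps, not a replacement for the query. The method name in_streak_numbers, the EPOCH constant standing in for '2000-01-01'::date, and the sample data are all hypothetical.

```ruby
require 'date'

# Hypothetical reference date, playing the role of '2000-01-01'::date.
EPOCH = Date.new(2000, 1, 1)

# rows: [pid, the_date] pairs, at most one per (pid, the_date) like the PRIMARY KEY.
# Returns [pid, the_date, grp, in_streak] rows ordered by pid, the_date.
def in_streak_numbers(rows)
  result = []
  rows.group_by(&:first).each do |pid, pid_rows|            # PARTITION BY pid
    counters = Hash.new(0)                                  # row_number per (pid, grp)
    pid_rows.sort_by(&:last).each_with_index do |(_, the_date), i|
      # the_date - '2000-01-01' - row_number(): constant within a run of consecutive days
      grp = (the_date - EPOCH).to_i - (i + 1)
      counters[grp] += 1                                    # row_number() OVER (PARTITION BY pid, grp ...)
      result << [pid, the_date, grp, counters[grp]]
    end
  end
  result.sort_by { |r| [r[0], r[1]] }                       # ORDER BY pid, the_date
end
```

For pid 1 with Jan 1, Jan 2 and Jan 4, the first two rows share a grp and get in_streak 1 and 2, while Jan 4 starts a new group with in_streak 1.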

grp is calculated with two subtractions, which should be fastest. An equally fast alternative could be:

the_date - row_number() OVER (PARTITION BY pid ORDER BY the_date) * interval '1d' AS grp

One multiplication, one subtraction. String concatenation and casting are more expensive. Test with EXPLAIN ANALYZE.

Don't forget to also partition by pid in both steps, or you'll inadvertently mix groups that should be kept separate.

I'm using a subquery because that is typically faster than a CTE (before PostgreSQL 12, a CTE always acted as an optimization fence), and there is nothing here that a plain subquery couldn't do.

And since you mentioned it: dense_rank() is not necessary here. The primary key guarantees at most one row per (pid, the_date), so basic row_number() does the job.

Counting consecutive days for all items in SQL

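One thing you could do is self-join the table on consecutive days and count the matches. Note that I add one to the count, because a run of n consecutive days produces only n - 1 matching pairs. Also note that the count covers every consecutive pair for the user, so it equals the streak length only if the user has a single unbroken run.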

SELECT MIN(e.date_created) as date_created, e.username, COUNT(e.username) + 1 AS streak
FROM example e
LEFT JOIN example ee
ON e.username = ee.username
AND DATE(e.date_created) = DATE(DATE_ADD(ee.date_created, INTERVAL -1 DAY))
WHERE ee.username IS NOT NULL
GROUP BY e.username;

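A plain-Ruby sketch of what the self-join computes may make the "+ 1" and its limits concrete. The method name self_join_streaks and the data are hypothetical, and it assumes at most one row per user per day (duplicate days would multiply the joined rows in SQL).

```ruby
require 'date'

# rows: [username, date] pairs, assumed to hold at most one row per user per day.
# Returns { username => [first matched date, streak] } like the GROUP BY query.
def self_join_streaks(rows)
  by_user = rows.group_by(&:first).transform_values { |rs| rs.map(&:last) }
  by_user.each_with_object({}) do |(user, dates), out|
    # e rows whose next day (ee) also exists: the join condition
    matched = dates.select { |d| dates.include?(d + 1) }
    next if matched.empty?                      # WHERE ee.username IS NOT NULL drops these users
    out[user] = [matched.min, matched.size + 1] # MIN(e.date_created), COUNT(e.username) + 1
  end
end
```

Three consecutive days give 2 pairs, hence a streak of 3. Two separate 2-day runs (Mar 1-2 and Mar 5-6) also give 2 pairs and therefore also report 3, which is the single-run caveat above.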

MySQL count consecutive dates for current streak

The query walks backwards from today, keeping the streak count in a variable, and as soon as there's a gap it resets the count to a large negative number. It then returns the largest count, which is the length of the current streak (or 0 if there is no vote today).

Depending on how many votes a user can have, you might need to change -99999 to a larger negative value.

select if(max(maxcount) < 0, 0, max(maxcount)) streak
from (
  select
    if(datediff(@prevDate, datecreated) = 1, @count := @count + 1, @count := -99999) maxcount,
    @prevDate := datecreated
  from votes v
  cross join (select @prevDate := date(curdate() + INTERVAL 1 day), @count := 0) t1
  where username = 'bob'
    and datecreated <= curdate()
  order by datecreated desc
) t1;

http://sqlfiddle.com/#!2/37129/6
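Here is a plain-Ruby sketch that replays the @prevDate / @count logic step by step. It is only an illustration: the method name current_streak_vars and the dates are hypothetical. One thing to keep in mind is that the SQL relies on MySQL evaluating the user-variable assignments row by row in the ORDER BY sequence, and MySQL 8.0 deprecates setting user variables inside expressions like this.

```ruby
require 'date'

# vote_dates: Date objects for one user; today plays the role of curdate().
def current_streak_vars(vote_dates, today)
  prev_date = today + 1                      # @prevDate := curdate() + INTERVAL 1 day
  count = 0                                  # @count := 0
  maxcount = []
  vote_dates.select { |d| d <= today }.sort.reverse.each do |d|   # order by datecreated desc
    # datediff(@prevDate, datecreated) = 1 continues the streak; any gap resets
    count = (prev_date - d == 1) ? count + 1 : -99_999
    maxcount << count
    prev_date = d                            # @prevDate := datecreated
  end
  return nil if maxcount.empty?              # max() over no rows is NULL in SQL
  best = maxcount.max
  best < 0 ? 0 : best                        # if(max(maxcount) < 0, 0, max(maxcount))
end
```

With votes on Mar 10, 9, 8 and 6 and today being Mar 10, the counter goes 1, 2, 3, -99999, so the result is 3. If the latest vote is yesterday, the very first row already resets the counter and the result is 0.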

Update

Another variation

select * from (
  select datecreated,
         @streak := @streak+1 streak,
         datediff(curdate(),datecreated) diff
  from votes
  cross join (select @streak := -1) t1
  where username = 'bob'
    and datecreated <= curdate()
  order by datecreated desc
) t1 where streak = diff
order by streak desc limit 1

http://sqlfiddle.com/#!2/c6dd5b/20

Note that the fiddle only returns correct streaks when run on the date of this post, since the queries compare against curdate().

Update 2

The query below works with tables that allow multiple votes per day by the same user by selecting from a derived table where duplicate dates are removed.

select * from (
  select date_created,
         @streak := @streak+1 streak,
         datediff(curdate(),date_created) diff
  from (
    select distinct date(date_created) date_created
    from votes where username = 'pinkpopcold'
  ) t1
  cross join (select @streak := -1) t2
  order by date_created desc
) t1 where streak = diff
order by streak desc limit 1

http://sqlfiddle.com/#!2/5fc6d/7

You may want to replace select * with select streak + 1, depending on whether you want to include the first vote in the streak (streak starts at 0 for the most recent day).
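The streak = diff trick works because both numbers grow by one per row as long as the days are consecutive, and diff jumps ahead at the first gap and never catches up. A plain-Ruby sketch of Update 2 follows. The method name current_streak_diff and the sample times are hypothetical, and the `<= today` filter is borrowed from the first variation, since Update 2 itself does not filter future dates.

```ruby
require 'date'

# vote_times: Time/DateTime/Date values for one user; today plays curdate().
def current_streak_diff(vote_times, today)
  # select distinct date(date_created) ... order by date_created desc
  # (the <= today filter is borrowed from the first variation)
  dates = vote_times.map(&:to_date).uniq.select { |d| d <= today }.sort.reverse
  # @streak starts at -1, so the newest row gets 0, the next 1, and so on
  matching = dates.each_with_index.select { |d, streak| (today - d).to_i == streak }
  return nil if matching.empty?              # no row survives "where streak = diff"
  matching.map(&:last).max + 1               # "select streak + 1" counts the first vote too
end
```

Two votes on Mar 10, one on Mar 9 and one on Mar 7 collapse to three distinct dates; only Mar 10 and Mar 9 satisfy streak = diff, so the result is 2.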

Calculate consecutive days posting in Rails

I'm not sure if it's the best way, but here's one way to do it in SQL. First, take a look at the following query.

SELECT
series_date,
COUNT(posts.id) AS num_posts_on_date
FROM generate_series(
'2014-12-01'::timestamp,
'2014-12-17'::timestamp,
'1 day'
) AS series_date
LEFT OUTER JOIN posts ON posts.created_at::date = series_date
GROUP BY series_date
ORDER BY series_date DESC;

We use generate_series to generate a range of dates starting on 2014-12-01 and ending 2014-12-17 (today). Then we do a LEFT OUTER JOIN with our posts table. This gives us one row for every day in the range, with the number of posts on that day in the num_posts_on_date column. The result looks like this (SQL Fiddle here):

 series_date                     | num_posts_on_date
---------------------------------+-------------------
December, 17 2014 00:00:00+0000 | 1
December, 16 2014 00:00:00+0000 | 1
December, 15 2014 00:00:00+0000 | 2
December, 14 2014 00:00:00+0000 | 1
December, 13 2014 00:00:00+0000 | 0
December, 12 2014 00:00:00+0000 | 0
... | ...
December, 01 2014 00:00:00+0000 | 0
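The key point of the LEFT OUTER JOIN is that days without posts still appear, with a count of 0. A plain-Ruby sketch of that idea, using the post dates from the table above, might look like this. The method name posts_per_day and the exact post times are hypothetical.

```ruby
require 'date'

# post_times: creation times of posts; from/to bound the generated series.
# Returns [day, count] pairs newest first, with 0 for days without posts,
# like generate_series(...) LEFT OUTER JOIN posts ... GROUP BY series_date.
def posts_per_day(post_times, from, to)
  counts = post_times.each_with_object(Hash.new(0)) { |t, h| h[t.to_date] += 1 }
  to.downto(from).map { |day| [day, counts[day]] }
end
```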

Now we know there's a post on every day from Dec. 14–17, so if today's Dec. 17 we know the current "streak" is 4 days. We could do some more SQL to get e.g. the longest streak, as described in this article, but since we're only interested in the length of the "current" streak, it'll just take a small change. All we have to do is change our query to get only the first date for which num_posts_on_date is 0 (SQL Fiddle):

SELECT series_date
FROM generate_series(
'2014-12-01'::timestamp,
'2014-12-17'::timestamp,
'1 day'
) AS series_date
LEFT OUTER JOIN posts ON posts.created_at::date = series_date
GROUP BY series_date
HAVING COUNT(posts.id) = 0
ORDER BY series_date DESC
LIMIT 1;

And the result:

 series_date
---------------------------------
December, 13 2014 00:00:00+0000

But since we actually want the number of days since the last day with no posts, we can do that in SQL too (SQL Fiddle):

SELECT ('2014-12-17'::date - series_date::date) AS days
FROM generate_series(
'2014-12-01'::timestamp,
'2014-12-17'::timestamp,
'1 day'
) AS series_date
LEFT OUTER JOIN posts ON posts.created_at::date = series_date
GROUP BY series_date
HAVING COUNT(posts.id) = 0
ORDER BY series_date DESC
LIMIT 1;

Result:

 days
------
4

There you go!
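The same "days since the last empty day" logic as a plain-Ruby sketch. The method name days_since_last_empty_day is hypothetical, and the data mirrors the example above. It also makes one edge case visible: if every day in the range has a post, there is no empty day, and the SQL returns no row at all. In the Rails version below, where the range starts at the user's first post, that is what happens for a user who has posted every day, and `Integer(nil) rescue nil` then yields nil.

```ruby
require 'date'

# Number of days between `to` and the most recent day in [from, to] without a
# post, like HAVING COUNT(posts.id) = 0 ORDER BY series_date DESC LIMIT 1.
def days_since_last_empty_day(post_times, from, to)
  posted = post_times.map(&:to_date).uniq
  last_empty = to.downto(from).find { |day| !posted.include?(day) }
  return nil if last_empty.nil?              # every day had a post: the SQL returns no row
  (to - last_empty).to_i                     # '2014-12-17'::date - series_date::date
end
```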

Now, how to apply it to our Rails code? Something like this:

qry = <<-SQL
  SELECT (CURRENT_DATE - series_date::date) AS days
  FROM generate_series(
         ( SELECT created_at::date FROM posts
           WHERE posts.user_id = :user_id
           ORDER BY created_at ASC
           LIMIT 1
         ),
         CURRENT_DATE,
         '1 day'
       ) AS series_date
  LEFT OUTER JOIN posts ON posts.user_id = :user_id AND
                           posts.created_at::date = series_date
  GROUP BY series_date
  HAVING COUNT(posts.id) = 0
  ORDER BY series_date DESC
  LIMIT 1
SQL

Post.find_by_sql([ qry, { user_id: some_user.id } ]).first.days # => 4

As you can see, we added a condition to restrict results by user_id, and replaced our hard-coded dates with a query that gets the date of the user's first post (the sub-select inside the generate_series function) for the beginning of the range and CURRENT_DATE for the end of the range.

That last line is a little funny because find_by_sql returns an array of Post instances, so you then have to call days on the first one in the array to get the value. Alternatively, you could do something like this:

sql = Post.send(:sanitize_sql, [ qry, { user_id: some_user.id } ])
result_value = Post.connection.select_value(sql)
streak_days = Integer(result_value) rescue nil # => 4

Within ActiveRecord it can be made a little cleaner:

class Post < ActiveRecord::Base
  USER_STREAK_DAYS_SQL = <<-SQL
    SELECT (CURRENT_DATE - series_date::date) AS days
    FROM generate_series(
           ( SELECT created_at::date FROM posts
             WHERE posts.user_id = :user_id
             ORDER BY created_at ASC
             LIMIT 1
           ),
           CURRENT_DATE,
           '1 day'
         ) AS series_date
    LEFT OUTER JOIN posts ON posts.user_id = :user_id AND
                             posts.created_at::date = series_date
    GROUP BY series_date
    HAVING COUNT(posts.id) = 0
    ORDER BY series_date DESC
    LIMIT 1
  SQL

  def self.user_streak_days(user_id)
    sql = sanitize_sql [ USER_STREAK_DAYS_SQL, { user_id: user_id } ]
    result_value = connection.select_value(sql)
    Integer(result_value) rescue nil
  end
end

class User < ActiveRecord::Base
  def post_streak_days
    # Pass the id: user_streak_days expects a user id, not a User record
    Post.user_streak_days(id)
  end
end

# And then...
u = User.find(123)
u.post_streak_days # => 4

The above is untested, so it'll likely take some fiddling to make it work, but I hope it points you in the right direction at least.

Calculate Running count in SQL Server excluding certain rows

@HHH, I added another table variable, @TAB2, which holds the rows of @TAB where DAY IS NULL together with a sequential MASTERID. This works; please test it and let me know.

DECLARE @TAB2 TABLE (MASTERID INT IDENTITY(1,1), ID INT, DT DATE, DAY VARCHAR(15), ATTENDANCE BIT)
INSERT INTO @TAB2
SELECT * FROM @TAB WHERE DAY IS NULL

SELECT Y.*,
       LU2.Streak
FROM @TAB Y
LEFT JOIN (
    SELECT X.ID, X.MASTERID - LU.FROMID + 1 [Streak]
    FROM @TAB2 X
    LEFT JOIN (
        SELECT (SELECT MIN(MASTERID) FROM @TAB2) FROMID, MIN(MASTERID) TOID FROM @TAB2 WHERE ATTENDANCE = 0
        UNION
        SELECT A.MASTERID FROMID, B.MASTERID TOID
        FROM (SELECT MASTERID, ROW_NUMBER() OVER (ORDER BY MASTERID) R FROM @TAB2 WHERE ATTENDANCE = 0) A
        CROSS JOIN
             (SELECT MASTERID, ROW_NUMBER() OVER (ORDER BY MASTERID) R FROM @TAB2 WHERE ATTENDANCE = 0) B
        WHERE A.R = (B.R - 1)
        UNION
        SELECT MAX(MASTERID), (SELECT MAX(MASTERID) FROM @TAB2) FROM @TAB2 WHERE ATTENDANCE = 0
        UNION
        SELECT MAX(MASTERID), MAX(MASTERID) + 1 FROM @TAB2
    ) LU
      ON X.MASTERID >= LU.FROMID AND X.MASTERID < LU.TOID
) LU2
  ON Y.ID = LU2.ID


Calculate the streaks of visit of users limited to 7

My thinking ran along similar lines to forpas':

SELECT user_id, COUNT(*) streak
FROM
(
  SELECT
    user_id, streak,
    FLOOR((ROW_NUMBER() OVER (PARTITION BY user_id, streak ORDER BY visit_date)-1)/7) substreak
  FROM
  (
    SELECT
      user_id, visit_date,
      SUM(runtot) OVER (PARTITION BY user_id ORDER BY visit_date) streak
    FROM (
      SELECT
        user_id, visit_date,
        CASE WHEN DATE_ADD(visit_date, INTERVAL -1 DAY) = LAG(visit_date) OVER (PARTITION BY user_id ORDER BY visit_date) THEN 0 ELSE 1 END as runtot
      FROM visitors_data
      GROUP BY user_id, visit_date
    ) x
  ) y
) z
GROUP BY user_id, streak, substreak

As an explanation of how this works: a common trick for counting runs of successive records is to use LAG to examine the previous record and, if it is exactly one day earlier, put a 0, otherwise put a 1. The first record of a consecutive run is then 1 and the rest are 0, so the column ends up looking like 1,0,0,0,1,0... SUM OVER ORDER BY sums this as a running total, which forms a counter that ticks up every time the start of a run is encountered. A run of 4 days followed by a gap and then a run of 3 days looks like 1,1,1,1,2,2,2, so this acts as a "streak ID number".

If this is then fed into a row numbering that partitions by the streak ID number, it establishes an incrementing counter that restarts every time the streak ID changes. If we subtract 1 from it so it runs from 0 instead of 1, we can divide by 7 and round down to get a "sub streak ID": for our 9-long streak that is 0,0,0,0,0,0,0,1,1 (and so on: a streak of 25 would have 7 zeroes, 7 ones, 7 twos, and 4 threes).

All that remains then is to group by the user, the streak ID and the substreak ID, and count the result.
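The same pipeline can be replayed for a single user in plain Ruby, which shows the intermediate IDs without needing a database. This is a sketch only: the method name capped_streaks and the dates are hypothetical, and the per-user partitioning is left out by assuming the input holds one user's visits.

```ruby
require 'date'

# visit_dates: Date objects for ONE user (the SQL partitions by user_id).
# Returns the COUNT(*) of each (streak, substreak) group, oldest first.
def capped_streaks(visit_dates, cap = 7)
  days = visit_dates.uniq.sort                          # GROUP BY user_id, visit_date
  streak_id = 0
  labelled = days.each_with_index.map do |d, i|
    runtot = (i > 0 && days[i - 1] == d - 1) ? 0 : 1    # LAG(visit_date) exactly one day earlier?
    streak_id += runtot                                 # SUM(runtot) OVER (ORDER BY visit_date)
    streak_id
  end
  row_numbers = Hash.new(0)
  groups = labelled.map do |sid|
    row_numbers[sid] += 1                               # ROW_NUMBER() OVER (PARTITION BY streak ...)
    [sid, (row_numbers[sid] - 1) / cap]                 # FLOOR((row_number - 1) / 7)
  end
  groups.group_by(&:itself).map { |_, rows| rows.size } # GROUP BY streak, substreak + COUNT(*)
end
```

A 9-day run followed by a gap and a 3-day run yields group sizes 7, 2 and 3, and a 25-day run yields 7, 7, 7 and 4.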

Inspecting the output of subquery z, before the final group and count, shows each row's streak and substreak IDs, which should give some idea of how it all works.

PostgreSQL: find number of consecutive days up until now

with t as (
  SELECT distinct(uca.created_at::date) as created_at
  FROM user_challenge_activities as uca
  INNER JOIN user_challenges as uc ON user_challenge_id = uc.ID
  WHERE uc.user_id = #{user.id}
)
select count(*)
from t
where t.created_at > (
  select d.d
  from generate_series('2010-01-01'::date, CURRENT_DATE, '1 day') d(d)
  left outer join t on t.created_at = d.d::date
  where t.created_at is null
  order by d.d desc
  limit 1
)

Return rows of the latest 'streak' of data

Assuming (since you don't say) that

  • there are exactly two distinct values for result: (W, L).
  • id is sequential in the sense that the latest entry has the highest id.

This would do the job:

SELECT *
FROM tbl
WHERE id > (
SELECT max(id)
FROM tbl
GROUP BY result
ORDER BY max(id)
LIMIT 1
);

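The subquery gets the latest id for W and for L, ordered so the earlier of the two comes first, so LIMIT 1 returns the last entry of the opposite outcome. Rows with an id higher than that form the latest streak. Voilà. (If the table contains only one of the two outcomes, the cutoff is that outcome's own latest id and the query returns no rows.)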
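The cutoff logic is easy to check in plain Ruby. This sketch is illustrative only; the method name latest_streak and the sample rows are hypothetical, and it keeps the same assumptions as the query (ids grow over time).

```ruby
# rows: [id, result] pairs, e.g. result "W" or "L"; ids grow over time.
def latest_streak(rows)
  return [] if rows.empty?                   # the subquery yields NULL, so no rows match
  # SELECT max(id) ... GROUP BY result ORDER BY max(id) LIMIT 1
  cutoff = rows.group_by(&:last).values.map { |rs| rs.map(&:first).max }.min
  rows.select { |id, _| id > cutoff }.sort_by(&:first)
end
```

For W, L, L, W, W (ids 1 to 5) the latest L is id 3, so ids 4 and 5 form the latest streak. With only W rows, the cutoff is the last W itself and nothing is returned, matching the caveat above.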


