Rolling 90 Days Active Users in Bigquery, Improving Preformance (Dau/Mau/Wau)

Monthly active users in ClickHouse

You could try something like this:

SELECT
created_at,
uniqExact(user_id) AS mau
FROM
(
SELECT
created_at + n AS created_at,
user_id
FROM
(
SELECT
today() - 14 AS created_at,
123 AS user_id
UNION ALL
SELECT
today() - 20 AS created_at,
456 AS user_id
)
ARRAY JOIN range(30) AS n
)
WHERE created_at <= today()
GROUP BY created_at
FORMAT TSV

2019-04-11 1
2019-04-12 1
2019-04-13 1
2019-04-14 1
2019-04-15 1
2019-04-16 1
2019-04-17 2
2019-04-18 2
2019-04-19 2
2019-04-20 2
2019-04-21 2
2019-04-22 2
2019-04-23 2
2019-04-24 2
2019-04-25 2
2019-04-26 2
2019-04-27 2
2019-04-28 2
2019-04-29 2
2019-04-30 2
2019-05-01 2

21 rows in set. Elapsed: 0.005 sec.

Weekly Active Users for each day from log

To get a "Weekly Average User" count (per my understanding of your specification... "for each day, the count of distinct user_ids seen during that day and the previous six days"), a query along the lines of the one below could be used. (The query also returns the "Daily Average User" count.

SELECT d.day
, COUNT(DISTINCT u.user_id) AS wau
, COUNT(DISTINCT IF(u.day=d.day,u.user_id,NULL)) AS dau
FROM ( SELECT FLOOR(k.ts/86400) AS `day`
FROM `log` k
GROUP BY `day`
) d
JOIN ( SELECT FLOOR(l.ts/86400) AS `day`
, l.user_id
FROM `log` l
GROUP BY `day`, l.user_id
) u
ON u.day <= d.day
AND u.day > d.day - 7
GROUP BY d.day
ORDER BY d.day

(I have not yet run a test of this; but I will later, and I will update this statement if any corrections are needed.)

This query is joining the list of users for a given day (from the u rowsource), to a set of days from the log table (the d rowsource). Note the literal "7" that appears in the join predicate (the ON clause), that's what's getting the user list "matched" to the previous 6 days.

Note that this could also be extended to get the distinct user count over the past 3 days, for example, by adding another expression in the SELECT list.

     , COUNT(DISTINCT IF(u.day<=d.day AND u.day>d.day-3,u.user_id,NULL)) AS 3day

That literal "7" could be increased to get a larger range. And that literal 3 in the expression above could be changed to get any number of days... we just need to be sure we've got enough previous day rows (from d) joined to each row from u.

PERFORMANCE NOTE: Due to the inline views (or derived tables, as MySQL calls them), this query may not be very fast, since the resultsets for those inline views has to be materialized into intermediate MyISAM tables.

The inline view aliased as u may not be optimal; it might be faster to join directly to the log table. I was thinking in terms of getting a unique list of users for a given day, which is what that query in the inline view got me. It was just easier for me to conceptualize what was going on. And I was thinking that if you had hundreds of the same user entered for day, the inline view would weed out a whole bunch of the duplicates, before we did the join to the other days.
A WHERE clause to limit the number of days we are returning would be best added inside the u and d inline views. (The d inline view would need to include an extra earlier 6 days.)


On another note, if ts column is TIMESTAMP datatype, I would be more inclined to use a DATE(ts) expression to extract the date portion. But that would return a DATE datatype in the resultset, rather than an integer, which would be different from the resultset you specified.)

SELECT d.day
, COUNT(DISTINCT u.user_id) AS wau
, COUNT(DISTINCT IF(u.day=d.day,u.user_id,NULL)) AS dau
FROM ( SELECT DATE(k.ts) AS `day`
FROM `log` k
GROUP BY `day`
) d
JOIN ( SELECT DATE(l.ts) AS `day`
, l.user_id
FROM `log` l
GROUP BY `day`, l.user_id
) u
ON u.day <= d.day
AND u.day > DATE_ADD(d.day, INTERVAL -7 DAY)
GROUP BY d.day
ORDER BY d.day



Related Topics



Leave a reply



Submit