Count cumulative total in PostgreSQL
With larger datasets, window functions are the most efficient way to perform these kinds of queries -- the table will be scanned only once, instead of once for each date, like a self-join would do. It also looks a lot simpler. :) PostgreSQL 8.4 and up have support for window functions.
This is what it looks like:
SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM subscriptions
GROUP BY created_at;
Here OVER creates the window; ORDER BY created_at means that it has to sum up the counts in created_at order.
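To see the running total on a tiny input, here is a minimal sketch that stands in for the subscriptions table with an inline VALUES list. The four rows and the example.com addresses are made up for illustration; only the query shape comes from the answer above.

```sql
-- Hypothetical sample rows standing in for the subscriptions table.
WITH subscriptions(created_at, email) AS (
    VALUES (date '2000-04-04', 'a@example.com'),
           (date '2000-04-04', 'b@example.com'),
           (date '2000-04-05', 'c@example.com'),
           (date '2000-04-06', 'a@example.com')
)
SELECT created_at,
       count(email)                                 AS per_day,  -- plain aggregate, one row per date
       sum(count(email)) OVER (ORDER BY created_at) AS running   -- window function over the grouped rows
FROM subscriptions
GROUP BY created_at
ORDER BY created_at;
```

The per-day counts are 2, 1, 1 and the running column is 2, 3, 4. The aggregate count(email) is evaluated first for each group, and the window sum then runs over those grouped rows.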
Edit: If you want to remove duplicate emails within a single day, you can use sum(count(distinct email)). Unfortunately this won't remove duplicates that cross different dates.
If you want to remove all duplicates, I think the easiest is to use a subquery and DISTINCT ON. This will attribute emails to their earliest date (because I'm sorting by created_at in ascending order, it'll choose the earliest one):
SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM (
SELECT DISTINCT ON (email) created_at, email
FROM subscriptions ORDER BY email, created_at
) AS subq
GROUP BY created_at;
If you create an index on (email, created_at), this query shouldn't be too slow either.
(If you want to test, this is how I created the sample dataset)
create table subscriptions as
select date '2000-04-04' + (i/10000)::int as created_at,
'foofoobar@foobar.com' || (i%700000)::text as email
from generate_series(1,1000000) i;
create index on subscriptions (email, created_at);
Calculating Cumulative Sum in PostgreSQL
Basically, you need a window function. That's a standard feature nowadays. In addition to genuine window functions, you can use any aggregate function as a window function in Postgres by appending an OVER clause.
The special difficulty here is to get partitions and sort order right:
SELECT ea_month, id, amount, ea_year, circle_id
, sum(amount) OVER (PARTITION BY circle_id
ORDER BY ea_year, ea_month) AS cum_amt
FROM tbl
ORDER BY circle_id, ea_year, ea_month;
And no GROUP BY.
The sum for each row is calculated from the first row in the partition to the current row - or quoting the manual to be precise:
The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last ORDER BY peer.
Bold emphasis mine.
This is the cumulative (or "running") sum you are after.
In default RANGE mode, rows with the same rank in the sort order are "peers" - same (circle_id, ea_year, ea_month) in this query. All of those show the same running sum with all peers added to the sum. But I assume your table is UNIQUE on (circle_id, ea_year, ea_month), then the sort order is deterministic and no row has peers. (And you might as well use the cheaper ROWS mode.)
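The difference between the two modes only shows up when peers exist. A small sketch, assuming a made-up tbl in which two rows share the same (circle_id, ea_year, ea_month) and months are stored as integers for simplicity:

```sql
-- Hypothetical data: the first two rows have the same sort key, so they are peers.
WITH tbl(circle_id, ea_year, ea_month, amount) AS (
    VALUES (1, 2020, 1, 10),
           (1, 2020, 1, 20),
           (1, 2020, 2, 5)
)
SELECT ea_year, ea_month, amount,
       -- default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       sum(amount) OVER (PARTITION BY circle_id
                         ORDER BY ea_year, ea_month) AS sum_range,
       -- explicit ROWS frame: stops at the current physical row
       sum(amount) OVER (PARTITION BY circle_id
                         ORDER BY ea_year, ea_month
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sum_rows
FROM tbl;
```

In RANGE mode both January rows show 30, because each includes its peer. In ROWS mode one of them shows only its own amount and the other shows 30; which is which depends on the arbitrary order of the peers. The February row shows 35 either way.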
Postgres 11 added tools to include / exclude peers with the new frame_exclusion options. See:
- Aggregating all values not in the same group
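As a rough illustration of a frame_exclusion option (requires Postgres 11 or later), the sketch below sums every amount that is not in the current row's peer group. The table tbl and its columns grp and amount are hypothetical.

```sql
-- Hypothetical data: two rows in group 1, one row in group 2.
WITH tbl(grp, amount) AS (
    VALUES (1, 10),
           (1, 20),
           (2, 5)
)
SELECT grp, amount,
       -- Frame covers the whole partition, then EXCLUDE GROUP drops the
       -- current row together with all of its ORDER BY peers.
       sum(amount) OVER (ORDER BY grp
                         RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
                         EXCLUDE GROUP) AS sum_other_groups
FROM tbl;
```

Both rows of group 1 get 5, and the row of group 2 gets 30. The other options are EXCLUDE CURRENT ROW, EXCLUDE TIES and EXCLUDE NO OTHERS (the default).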
Now, ORDER BY ... ea_month won't work with strings for month names. Postgres would sort alphabetically according to the locale setting.
If you have actual date values stored in your table you can sort properly. If not, I suggest replacing ea_year and ea_month with a single column the_date of type date in your table.
Transform what you have with to_date():
to_date(ea_year || ea_month , 'YYYYMonth') AS the_date
For display, you can get original strings with to_char():
to_char(the_date, 'Month') AS ea_month
to_char(the_date, 'YYYY') AS ea_year
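A quick sketch of the round trip, assuming ea_year and ea_month are stored as text with English month names (the three sample rows are invented):

```sql
-- Hypothetical rows; sorted alphabetically, 'April' would come before 'February'.
WITH tbl(ea_year, ea_month) AS (
    VALUES ('2015', 'February'),
           ('2015', 'April'),
           ('2014', 'December')
)
SELECT ea_year, ea_month,
       to_date(ea_year || ea_month, 'YYYYMonth') AS the_date,
       -- 'Month' pads the name with blanks to 9 characters; the FM prefix suppresses the padding
       to_char(to_date(ea_year || ea_month, 'YYYYMonth'), 'FMMonth') AS month_name
FROM tbl
ORDER BY the_date;
```

The rows come back in calendar order: 2014-12-01, 2015-02-01, 2015-04-01. to_date() fills in the first day of the month when the input carries no day.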
While stuck with the unfortunate design, this will work:
SELECT ea_month, id, amount, ea_year, circle_id
, sum(amount) OVER (PARTITION BY circle_id ORDER BY the_date) AS cum_amt
FROM (SELECT *, to_date(ea_year || ea_month, 'YYYYMonth') AS the_date FROM tbl) sub
ORDER BY circle_id, the_date;
PostgreSQL - Cumulative sum of created users
I'd take a two-step approach. First, use an inner query to count how many users were created each month. Then, wrap this query with another query that calculates the cumulative sum of these counts:
SELECT created_at, SUM(cnt) OVER (ORDER BY created_at ASC)
FROM (SELECT TO_CHAR(created_at, 'YYYY-MM') AS created_at, COUNT(*) AS cnt
FROM users
GROUP BY TO_CHAR(created_at, 'YYYY-MM')) t
ORDER BY 1 ASC;
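The same result can probably be had in a single query level by nesting the aggregate inside the window function, as in the first answer on this page. This is a sketch only, with a made-up users table of three rows and a hypothetical output column name ym:

```sql
-- Hypothetical sample rows standing in for the users table.
WITH users(created_at) AS (
    VALUES (timestamp '2021-01-05 10:00'),
           (timestamp '2021-01-20 12:00'),
           (timestamp '2021-03-02 09:00')
)
SELECT to_char(created_at, 'YYYY-MM') AS ym,
       count(*) AS cnt,                                       -- users created in that month
       sum(count(*)) OVER (ORDER BY to_char(created_at, 'YYYY-MM')) AS running_total
FROM users
GROUP BY to_char(created_at, 'YYYY-MM')
ORDER BY ym;
```

This returns 2021-01 with cnt 2 and running_total 2, then 2021-03 with cnt 1 and running_total 3. Months without any new users (2021-02 here) produce no row in either variant; a generate_series() of months would be needed to fill such gaps.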
Running Count Total with PostgreSQL
Just the answer posted to close the question:
-- Count each order once so the counts can be summed later
WITH data AS (
SELECT
orders.id,
orders.client_id,
orders.deliver_on,
COUNT(1) AS cnt -- 1 for every row as long as orders.id is unique
FROM orders
GROUP BY orders.id, orders.client_id, orders.deliver_on
)
SELECT
id,
client_id,
deliver_on,
SUM(cnt) OVER (PARTITION BY client_id
ORDER BY deliver_on
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_count -- Running number of orders per client, in delivery order
FROM data
ORDER BY deliver_on, client_id;
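If each row of orders is a single order, the intermediate column of ones is not strictly needed: count(*) can be used directly as a window function. A minimal sketch with invented sample rows, not the original poster's data:

```sql
-- Hypothetical sample rows standing in for the orders table.
WITH orders(id, client_id, deliver_on) AS (
    VALUES (1, 100, date '2021-01-01'),
           (2, 100, date '2021-01-03'),
           (3, 200, date '2021-01-02'),
           (4, 100, date '2021-01-07')
)
SELECT id, client_id, deliver_on,
       -- Number of orders for this client up to and including the current row
       count(*) OVER (PARTITION BY client_id
                      ORDER BY deliver_on
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_count
FROM orders
ORDER BY client_id, deliver_on;
```

Client 100 gets running_count 1, 2, 3 for orders 1, 2, 4, and client 200 gets 1 for order 3.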
SQL - cumulative sum in postgres
You can use window functions and subtraction:
select t.*,
( initial_amount +
sum(amount) over (partition by item order by date_of_purchase)
) as cumulative
from t;