Add Missing Data from Previous Month or Year Cumulatively

Cumulatively adding up sales from last years closing balance to current years opening balances

Oracle Setup:

CREATE TABLE sales ( id, name, grp, year, month, sales ) AS
SELECT 4, 'A', 'group1', '2015', '10', 20 FROM DUAL UNION ALL
SELECT 5, 'C', 'group2', '2015', '12', 89 FROM DUAL UNION ALL
SELECT 13, 'B', 'group2', '2016', '01', 10 FROM DUAL UNION ALL
SELECT 14, 'A', 'group3', '2016', '02', 8 FROM DUAL UNION ALL
SELECT 15, 'B', 'group1', '2016', '02', 16 FROM DUAL UNION ALL
SELECT 16, 'D', 'group2', '2016', '04', 15 FROM DUAL UNION ALL
SELECT 17, 'D', 'group4', '2016', '05', 23 FROM DUAL UNION ALL
SELECT 18, 'D', 'group3', '2016', '06', 39 FROM DUAL UNION ALL
SELECT 19, 'D', 'group3', '2016', '07', 45 FROM DUAL UNION ALL
SELECT 20, 'D', 'group3', '2016', '08', 12 FROM DUAL UNION ALL
SELECT 21, 'D', 'group4', '2016', '09', 20 FROM DUAL UNION ALL
SELECT 22, 'D', 'group3', '2016', '10', 4 FROM DUAL UNION ALL
SELECT 23, 'D', 'group3', '2016', '11', 98 FROM DUAL UNION ALL
SELECT 24, 'D', 'group4', '2016', '12', 70 FROM DUAL;

Query:

SELECT *
FROM (
SELECT year,
month,
grp,
COALESCE(
SUM( sales ) OVER ( PARTITION BY grp
ORDER BY dt
RANGE BETWEEN UNBOUNDED PRECEDING
AND INTERVAL '1' MONTH PRECEDING
),
0
) AS opening_balance,
SUM( sales ) OVER ( PARTITION BY grp
ORDER BY dt
RANGE BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
) AS closing_balance
FROM (
SELECT s.*,
TO_DATE( year || month, 'YYYYMM' ) AS dt
FROM sales s
)
)
WHERE year = 2016
ORDER BY year, month, grp;

Query - Alternative without RANGE BETWEEN:

SELECT *
FROM (
SELECT year,
month,
grp,
SUM( sales ) OVER ( PARTITION BY grp
ORDER BY year, month )
- sales AS opening_balance,
SUM( sales ) OVER ( PARTITION BY grp
ORDER BY year, month
) AS closing_balance
FROM sales
)
WHERE year = 2016
ORDER BY year, month, grp;

Output:

YEAR MO GRP    OPENING_BALANCE CLOSING_BALANCE
---- -- ------ --------------- ---------------
2016 01 group2 89 99
2016 02 group1 20 36
2016 02 group3 0 8
2016 04 group2 99 114
2016 05 group4 0 23
2016 06 group3 8 47
2016 07 group3 47 92
2016 08 group3 92 104
2016 09 group4 23 43
2016 10 group3 104 108
2016 11 group3 108 206
2016 12 group4 43 113

Cumulative sum of values by month, filling in for missing months

This is very similar to other questions, but the best query is still tricky.

Basic query to get the running sum quickly:

SELECT to_char(date_trunc('month', date_added), 'Mon YYYY') AS mon_text
, sum(sum(qty)) OVER (ORDER BY date_trunc('month', date_added)) AS running_sum
FROM tbl
GROUP BY date_trunc('month', date_added)
ORDER BY date_trunc('month', date_added);

The tricky part is to fill in for missing months:

WITH cte AS (
SELECT date_trunc('month', date_added) AS mon, sum(qty) AS mon_sum
FROM tbl
GROUP BY 1
)
SELECT to_char(mon, 'Mon YYYY') AS mon_text
, sum(c.mon_sum) OVER (ORDER BY mon) AS running_sum
FROM (SELECT min(mon) AS min_mon FROM cte) init
, generate_series(init.min_mon, now(), interval '1 month') mon
LEFT JOIN cte c USING (mon)
ORDER BY mon;

The implicit CROSS JOIN LATERAL requires Postgres 9.3+. This starts with the first month in the table.

To start with a given month:

WITH cte AS (
SELECT date_trunc('month', date_added) AS mon, sum(qty) AS mon_sum
FROM tbl
GROUP BY 1
)
SELECT to_char(mon, 'Mon YYYY') AS mon_text
, COALESCE(sum(c.mon_sum) OVER (ORDER BY mon), 0) AS running_sum
FROM generate_series('2015-01-01'::date, now(), interval '1 month') mon
LEFT JOIN cte c USING (mon)
ORDER BY mon;

db<>fiddle here

Old sqlfiddle

Keeping months from different years apart. You did not ask for that, but you'll most likely want it.

Note that the "month" to some degree depends on the time zone setting of the current session! Details:

  • Ignoring time zones altogether in Rails and PostgreSQL

Related:

  • Calculating Cumulative Sum in PostgreSQL
  • PostgreSQL: running count of rows for a query 'by minute'
  • Postgres window function and group by exception

oracle missing expression when using Max

Move comma to another place:

ADD_MONTHS( (SELECT MAX(INSPECTION_DATE) AS MAX_DATE FROM DDS_MQC_QAP_VW) -26,))+1,-1)
^
here

ADD_MONTHS( (SELECT MAX(INSPECTION_DATE) AS MAX_DATE FROM DDS_MQC_QAP_VW), -26))+1,-1)

Add Data of previous months in next month,Historization of data in SAP HANA Calculation View

for the solution of this requirement, I can suggest you to use a numbers table on HANA database. In the following part, I've shared the SQLScript codes in which I've used this numbers table function for creating the series of dates in a specific range

with input as (
select
id,
status,
year,
cast(month as integer) month,
case when status = 'OPEN' then 1 else -1 end as COUNT_OPEN,
case when status = 'CLOSED' then 1 else 0 end as COUNT_CLOSED
from Historization
)
SELECT
y.rownum as year,
m.rownum as month,
(select sum(COUNT_OPEN) from input where year <= y.rownum and month <= m.rownum) as COUNT_OPEN,
(select sum(COUNT_CLOSED) from input where year <= y.rownum and month <= m.rownum) as COUNT_CLOSED
FROM Numbers_Table(2017) as y
CROSS JOIN Numbers_Table(12) as m
where y.rownum = 2017
order by y.rownum, m.rownum

You see I used the numbers function "Numbers_Table" in the outer SELECT statement which creates the FROM part of the join clause

For simulating the COUNTs, I assume that a record cannot be CLOSED before it is OPENED so, when it is in OPEN status I add 1 to OPEN state count.
While the record is closing, I add 1 to CLOSED state. But the trick here is, I use -1 for OPEN state for a closing item.
SUM() aggregation function helps me to get the final row data

The output is as follows,

Sample Image

Pyspark: For each month, make a cumulative sum of the previous 3 months

Sample DataFrame:

df.show()
+------+-----+
| Month|Value|
+------+-----+
|Jan/19| 1|
|Feb/19| 0|
|Mar/19| 4|
|Apr/19| 5|
|May/19| 0|
|Jun/19| 10|
+------+-----+

You can use window function, but you need to convert your month column to a proper timestamp format, and then cast that to long to compute range(3months) based on unix time or timestamp in seconds. You can partitionBy your grouping columns in your real data. (86400 is 1 day in seconds).

from pyspark.sql import functions as F
from pyspark.sql.window import Window

w=Window().orderBy(F.col("Month").cast("long")).rangeBetween(-(86400*89), 0)
df\
.withColumn("Month", F.to_timestamp("Month","MMM/yy"))\
.withColumn("Sum", F.sum("Value").over(w)).show()

+-------------------+-----+---+
| Month|Value|Sum|
+-------------------+-----+---+
|2019-01-01 00:00:00| 1| 1|
|2019-02-01 00:00:00| 0| 1|
|2019-03-01 00:00:00| 4| 5|
|2019-04-01 00:00:00| 5| 9|
|2019-05-01 00:00:00| 0| 9|
|2019-06-01 00:00:00| 10| 15|
+-------------------+-----+---+

If you would like to go back 3 months only in each year. Meaning Jan/19 will only have Jan/19 value. For this case, you should use a partitionBy of Year, and orderBy month number, and rangeBetween -2 and 0.

w=Window().partitionBy(F.year("Month")).orderBy(F.month("Month")).rangeBetween(-2, 0)
df\
.withColumn("Month", F.to_timestamp("Month","MMM/yy"))\
.withColumn("Sum", F.sum("Value").over(w)).show()

+-------------------+-----+---+
| Month|Value|Sum|
+-------------------+-----+---+
|2019-01-01 00:00:00| 1| 1|
|2019-02-01 00:00:00| 0| 1|
|2019-03-01 00:00:00| 4| 5|
|2019-04-01 00:00:00| 5| 9|
|2019-05-01 00:00:00| 0| 9|
|2019-06-01 00:00:00| 10| 15|
+-------------------+-----+---+

SQL Server - Cumulative Sum over Last 12 Months, but starting from the Last Month (SQL Server 18)

You seem to understand window functions pretty well. You just have to adjust the window frame:

SUM(Amount) OVER (PARTITION BY ID
ORDER BY Date_Month
ROWS BETWEEN 12 PRECEDING AND 1 PRECEDING
)

Dynamically adding zero-valued records for subsequent APs for analytical function to work

A fairly simplistic approach (and similar to what that AskTom link shows) is to extract all the year/month pairs, and all the name/r_group pairs, and then cross-join those:

with data as (
select 1 id, 'A' name, 'fruit' r_group, '2007' year, '04' month, 5 sales from dual union all
select 2 id, 'Z' name, 'fruit' r_group, '2007' year, '04' month, 99 sales from dual union all
select 3 id, 'A' name, 'fruit' r_group, '2008' year, '05' month, 10 sales from dual union all
select 4 id, 'B' name, 'vegetable' r_group, '2008' year, '07' month, 20 sales from dual
)
select a.year, a.month, b.name, b.r_group, nvl(d.sales, 0) as sales
from (select distinct year, month from data) a
cross join (select distinct name, r_group from data) b
left join data d on d.year = a.year and d.month = a.month and d.name = b.name and d.r_group = b.r_group
order by year, month, name, r_group;

YEAR MO N R_GROUP SALES
---- -- - --------- ----------
2007 04 A fruit 5
2007 04 B vegetable 0
2007 04 Z fruit 99
2008 05 A fruit 10
2008 05 B vegetable 0
2008 05 Z fruit 0
2008 07 A fruit 0
2008 07 B vegetable 20
2008 07 Z fruit 0

But that produces more rows than you wanted with your first level fo aggregation:

YEAR MO N R_GROUP        SALES    OPENING    CLOSING
---- -- - --------- ---------- ---------- ----------
2007 04 A fruit 5 0 5
2007 04 B vegetable 0 0 0
2007 04 Z fruit 99 0 99
2008 05 A fruit 10 5 15
2008 05 B vegetable 0 0 0
2008 05 Z fruit 0 99 99
2008 07 A fruit 0 15 15
2008 07 B vegetable 20 0 20
2008 07 Z fruit 0 99 99

and when aggregated with your second level (from the other query) would produce extra rows for, say, 2007/04/vegetable:

YEAR MO R_GROUP        SALES    OPENING    CLOSING
---- -- --------- ---------- ---------- ----------
2007 04 fruit 104 0 104
2007 04 vegetable 0 0 0
2008 05 fruit 10 104 114
2008 05 vegetable 0 0 0
2008 07 fruit 0 114 114
2008 07 vegetable 20 0 20

which you could partially filter those out before aggregating because all the intermediate columns would be zero:

with data as (
select 1 id, 'A' name, 'fruit' r_group, '2007' year, '04' month, 5 sales from dual union all
select 2 id, 'Z' name, 'fruit' r_group, '2007' year, '04' month, 99 sales from dual union all
select 3 id, 'A' name, 'fruit' r_group, '2008' year, '05' month, 10 sales from dual union all
select 4 id, 'B' name, 'vegetable' r_group, '2008' year, '07' month, 20 sales from dual
)
select year,
month,
r_group,
sum(sales) sales,
sum(opening) opening,
sum(closing) closing
from (
select t.*,
(sum(sales) over (partition by name, r_group
order by year, month
rows between unbounded preceding and current row
) -sales ) as opening,
sum(sales) over (partition by name, r_group
order by year, month
rows between unbounded preceding and current row
) as closing
from (
select a.year, a.month, b.name, b.r_group, nvl(d.sales, 0) as sales
from (select distinct year, month from data) a
cross join (select distinct name, r_group from data) b
left join data d
on d.year = a.year and d.month = a.month and d.name = b.name and d.r_group = b.r_group
) t
)
where sales != 0 or opening != 0 or closing != 0
group by year, month, r_group
order by year, month;

to get:

YEAR MO R_GROUP        SALES    OPENING    CLOSING
---- -- --------- ---------- ---------- ----------
2007 04 fruit 104 0 104
2008 05 fruit 10 104 114
2008 07 fruit 0 114 114
2008 07 vegetable 20 0 20

You could further filter that result to remove rows where the aggregated sales value is still zero, though if you're doing that the filter before aggregation isn't needed any more; but it's still a bit messy. And it isn't clear if your outermost aggregation can be modified to do that.

Mysql cumulative sum and group by month year

Try this :

select date_format(s1.date,'%b-%Y') as month,
(select sum(s2.value)
from saving s2
where s2.date <= last_day(s1.date)) as total
from saving s1
group by month
order by s1.date
limit 50;

You can see more information on this question :

Create a Cumulative Sum Column in MySQL



Related Topics



Leave a reply



Submit