Product() Aggregate Function

Why is there no PRODUCT aggregate function in SQL?

There is no PRODUCT set function in the SQL Standard. It would appear to be a worthy candidate, though (unlike, say, a CONCATENATE set function, which is not a good fit for SQL: the resulting data type would involve multivalues and pose a problem with respect to first normal form).

The SQL Standards aim to consolidate functionality across SQL products circa 1990 and to provide 'thought leadership' on future development. In short, they document what SQL does and what SQL should do. The absence of a PRODUCT set function suggests that in 1990 no vendor thought it worthy of inclusion, and that there has been no academic interest in introducing it into the Standard since.

Of course, vendors have always sought to add their own functionality, these days usually as extensions to the Standards rather than tangentially. I don't recall seeing a PRODUCT set function (or even demand for one) in any of the SQL products I've used.

In any case, the workaround is fairly simple: use the LOG and EXP scalar functions (plus logic to handle negatives) with the SUM set function; see @gbn's answer for some sample code. I've never needed to do this in a business application, though.
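The identity behind the workaround is product(x1..xn) = exp(sum(ln|xi|)), with sign and zero handled separately. Here's an illustrative Python sketch of that logic (not from any particular SQL product):

```python
import math

def product(values):
    """Emulate a PRODUCT aggregate the way EXP/SUM/LN would in SQL:
    exponentiate the sum of logs of absolute values, handling sign and
    zero separately (ln is undefined for values <= 0)."""
    if any(v == 0 for v in values):
        return 0.0
    sign = -1.0 if sum(1 for v in values if v < 0) % 2 else 1.0
    return sign * math.exp(sum(math.log(abs(v)) for v in values))

print(product([2, -3, 4]))  # ≈ -24.0
```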

In conclusion, my best guess is that there is no demand from SQL end users for a PRODUCT set function; further, that anyone with an academic interest would probably find the workaround acceptable (i.e. would not value the syntactic sugar a PRODUCT set function would provide).

Out of interest, there is indeed demand in SQL Server Land for new set functions, but for those of the window-function variety (and in Standard SQL, too). For more details, including how to get involved in further driving demand, see Itzik Ben-Gan's blog.

Product() aggregate function

Oracle exposes the following ODCI (Oracle Data Cartridge Interface) methods for building user-defined aggregates:

  • ODCIAggregateDelete() - Removes an input value from the current group.
  • ODCIAggregateInitialize() - Initializes the aggregation context and instance of the implementation object type, and returns it as an OUT parameter.
  • ODCIAggregateIterate() - Iterates through input rows by processing the input values, updating and then returning the aggregation context.
  • ODCIAggregateMerge() - Merges two aggregation contexts into a single object instance during either serial or parallel evaluation of the user-defined aggregate.
  • ODCIAggregateTerminate() - Calculates the result of the aggregate computation and performs all necessary cleanup, such as freeing memory.
  • ODCIAggregateWrapContext() - Integrates all external pieces of the current aggregation context to make the context self-contained.

Code for a PRODUCT() aggregate function:

CREATE OR REPLACE TYPE PRODUCT_IMPL AS OBJECT
(
  result NUMBER,
  STATIC FUNCTION ODCIAggregateInitialize(sctx IN OUT PRODUCT_IMPL)
    RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateIterate(self IN OUT PRODUCT_IMPL,
                                       value IN NUMBER)
    RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateTerminate(self        IN PRODUCT_IMPL,
                                         returnValue OUT NUMBER,
                                         flags       IN NUMBER)
    RETURN NUMBER,
  MEMBER FUNCTION ODCIAggregateMerge(self IN OUT PRODUCT_IMPL,
                                     ctx2 IN PRODUCT_IMPL)
    RETURN NUMBER
);
/
/* 1. Initializes the computation by initializing the aggregation context—the rows over which aggregation is performed: */
CREATE OR REPLACE TYPE BODY PRODUCT_IMPL
IS
  STATIC FUNCTION ODCIAggregateInitialize(sctx IN OUT PRODUCT_IMPL)
    RETURN NUMBER
  IS
  BEGIN
    sctx := PRODUCT_IMPL(1);
    RETURN ODCIConst.Success;
  END;

  /* 2. Iteratively processes each successive input value and updates the context: */
  MEMBER FUNCTION ODCIAggregateIterate(self IN OUT PRODUCT_IMPL,
                                       value IN NUMBER)
    RETURN NUMBER
  IS
  BEGIN
    self.result := value * self.result;
    RETURN ODCIConst.Success;
  END;

  /* 3. Computes and returns the final result: */
  MEMBER FUNCTION ODCIAggregateTerminate(self        IN PRODUCT_IMPL,
                                         returnValue OUT NUMBER,
                                         flags       IN NUMBER)
    RETURN NUMBER
  IS
  BEGIN
    returnValue := self.result;
    RETURN ODCIConst.Success;
  END;

  /* 4. Merges two partial aggregation contexts during parallel evaluation;
     the merged result is the product of both partial results: */
  MEMBER FUNCTION ODCIAggregateMerge(self IN OUT PRODUCT_IMPL,
                                     ctx2 IN PRODUCT_IMPL)
    RETURN NUMBER
  IS
  BEGIN
    self.result := self.result * ctx2.result;
    RETURN ODCIConst.Success;
  END;
END;
/

/* Create a function using the PRODUCT_IMPL implementation above */
CREATE OR REPLACE FUNCTION product(input NUMBER)
  RETURN NUMBER
  PARALLEL_ENABLE AGGREGATE USING PRODUCT_IMPL;
/
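The ODCI lifecycle above boils down to an initialize/iterate/merge/terminate protocol. As a language-neutral illustration, here is a Python sketch of the same state machine (class and method names are illustrative, not part of any API):

```python
class ProductImpl:
    """Mirrors the ODCI aggregate lifecycle: initialize the context,
    iterate over input values, merge partial contexts (parallel
    evaluation), and terminate to produce the result."""
    def __init__(self):          # ODCIAggregateInitialize
        self.result = 1.0
    def iterate(self, value):    # ODCIAggregateIterate
        self.result *= value
    def merge(self, other):      # ODCIAggregateMerge
        self.result *= other.result
    def terminate(self):         # ODCIAggregateTerminate
        return self.result

# serial evaluation
agg = ProductImpl()
for v in [2, 3, 10]:
    agg.iterate(v)
print(agg.terminate())  # 60.0

# parallel evaluation: two partial contexts merged
a, b = ProductImpl(), ProductImpl()
a.iterate(2); a.iterate(3)
b.iterate(10)
a.merge(b)
print(a.terminate())  # 60.0
```

Note that merge must fold in the other context's partial result; a merge that ignores its argument silently drops rows under parallel evaluation.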

Results:

SELECT group_name, product(num) FROM product_test GROUP BY group_name;

GROUP_NAME     PRODUCT(NUM)
----------     ------------
Mahesh               -60000
Mahesh_1                  9

How to Calculate Aggregated Product Function in SQL Server

Oh, this is a pain. Most databases do not support a product aggregation function, but you can emulate it with logs and powers. So, something like this might work:

select t.*,
       (select exp(sum(log(serial)))
        from table t2
        where t2.no <= t.no
       ) as cumeProduct
from table t;

Note that log() might be called ln() in some databases. Also, this works only for positive numbers. There are variations to handle negative numbers and zeroes, but they complicate the answer (and the sample data is all positive).
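To see what the correlated subquery computes, here is a plain-Python mirror of the running-sum-of-logs idea (the column names no/serial come from the query above; the sample values are made up, and all-positive data is assumed, per the note):

```python
import math

# sample rows as (no, serial); all positive, as the note above assumes
rows = [(1, 2.0), (2, 3.0), (3, 4.0), (4, 5.0)]

# running sum of logs, then exp: the same cumulative product the
# correlated subquery computes row by row
log_acc = 0.0
cume = []
for no, serial in rows:
    log_acc += math.log(serial)
    cume.append(math.exp(log_acc))
print(cume)  # ≈ [2.0, 6.0, 24.0, 120.0]
```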

Product Aggregate in PostgreSQL

I found a solution from a very smart guy, who realized you can use logarithms to achieve this (credit goes to him):

select exp(sum(ln(x))) from generate_series(1,5) x;
exp
-----
120
(1 row)

Aggregate function to multiply all values

Yes, there's a technique described in this blog:

Basically you take the SUM of the natural logs (LN) and then exponentiate (e^x):

SELECT EXP(SUM(LN(col))) AS product FROM t;

Since the output of this would be a floating point, you may do a FLOOR:

FLOOR( EXP (SUM (LN (col))) )


Note: I've just found that this fails if one of the rows is 0 (LN(0) is undefined), so you should add a separate condition, or a WITH clause, so that if any of the values is zero the product is zero.
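One further caution on FLOOR: the float can land just below the true integer product (the Spark answer below shows 104.99999999999997, for example), in which case FLOOR drops to the wrong integer while round-to-nearest recovers it. A quick Python check:

```python
import math

# exp(sum(ln(x))) can come back just under the true integer product;
# this value is taken from the Spark answer below
val = 104.99999999999997
print(math.floor(val))  # 104 -- FLOOR lands on the wrong integer
print(round(val))       # 105 -- round-to-nearest recovers the product
```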

is there a PRODUCT function like there is a SUM function in Oracle SQL?

select exp(sum(ln(col)))
from table;

Edit: this only works if col is always > 0.

Cumulative product in Spark

As long as the numbers are strictly positive (0 can be handled as well, if present, using coalesce), as in your example, the simplest solution is to compute the sum of logarithms and take the exponential:

import org.apache.spark.sql.functions.{exp, log, max, round, sum}

val df = Seq(
  ("rr", "gg", "20171103", 2), ("hh", "jj", "20171103", 3),
  ("rr", "gg", "20171104", 4), ("hh", "jj", "20171104", 5),
  ("rr", "gg", "20171105", 6), ("hh", "jj", "20171105", 7)
).toDF("A", "B", "date", "val")

val result = df
  .groupBy("A", "B")
  .agg(
    max($"date").as("date"),
    exp(sum(log($"val"))).as("val"))

Since this uses floating-point arithmetic, the result won't be exact:

result.show
+---+---+--------+------------------+
| A| B| date| val|
+---+---+--------+------------------+
| hh| jj|20171105|104.99999999999997|
| rr| gg|20171105|47.999999999999986|
+---+---+--------+------------------+

but after rounding it should be good enough for the majority of applications.

result.withColumn("val", round($"val")).show
+---+---+--------+-----+
| A| B| date| val|
+---+---+--------+-----+
| hh| jj|20171105|105.0|
| rr| gg|20171105| 48.0|
+---+---+--------+-----+

If that's not enough, you can define a UserDefinedAggregateFunction or Aggregator (see How to define and use a User-Defined Aggregate Function in Spark SQL?) or use the functional API with reduceGroups:

import scala.math.Ordering

case class Record(A: String, B: String, date: String, value: Long)

df.withColumnRenamed("val", "value").as[Record]
  .groupByKey(x => (x.A, x.B))
  .reduceGroups((x, y) => x.copy(
    date = Ordering[String].max(x.date, y.date),
    value = x.value * y.value))
  .toDF("key", "value")
  .select($"value.*")
  .show
+---+---+--------+-----+
| A| B| date|value|
+---+---+--------+-----+
| hh| jj|20171105| 105|
| rr| gg|20171105| 48|
+---+---+--------+-----+

How to define own aggregate function in Mysql for GROUP BY?

If you want each product to appear once, then one method is to use variables:

select p.*
from (select p.*,
             (@rn := if(@pid = p.product_id, @rn + 1,
                        if(@pid := p.product_id, 1, 1)
                       )
             ) as rn
      from products p cross join
           (select @pid := -1, @rn := 0) params
      order by product_id, field(lang_no, 3, 4, 1, 5, 2) -- or whatever
     ) p
where rn = 1;

Another method uses a correlated subquery:

select p.*
from products p
where p.lang_no = (select p2.lang_no
                   from products p2
                   where p2.product_id = p.product_id
                   order by field(lang_no, 3, 4, 1, 5, 2) -- or whatever
                   limit 1
                  );

Both of these versions use field(), which allows you to list all languages in priority order.

In your case, the correlated subquery is probably faster, assuming you have an index on product_id.

The one caveat is that all languages should be listed, because the missing values get a 0. If that is an issue, use this logic:

field(lang_no, 1, 3) desc

This will put 3 as the first priority, 1 as the second, and then anything else.
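To make the FIELD() semantics concrete, here is a small Python model of it (illustrative only): FIELD() returns the 1-based position of the value in the list, or 0 if it is absent, which is what makes unlisted values sort first ascending and last descending.

```python
def field(value, *args):
    """Python model of MySQL's FIELD(): 1-based index of value
    in args, or 0 when value is not listed."""
    return args.index(value) + 1 if value in args else 0

# ORDER BY field(lang_no, 1, 3) DESC: 3 first, then 1, then the rest
langs = [2, 1, 5, 3]
print(sorted(langs, key=lambda v: -field(v, 1, 3)))  # [3, 1, 2, 5]
```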


