Converting Aggregate Operators from SQL to Relational Algebra

Converting aggregate operators from SQL to relational algebra

This is only half an answer...

The relation "boats reserved by two or more sailors" can be found using conditional join and projection, which are both in your set of allowed operations:

SELECT DISTINCT R1.bid
FROM Reserves AS R1
JOIN Reserves AS R2
ON R1.bid = R2.bid
AND R1.sid < R2.sid;

The relation "boats reserved by three or more sailors" can be found using conditional join (twice) and projection, which are both in your set of allowed operations:

SELECT DISTINCT R1.bid
FROM Reserves AS R1
JOIN Reserves AS R2
ON R1.bid = R2.bid
AND R1.sid < R2.sid
JOIN Reserves AS R3
ON R1.bid = R3.bid
AND R2.sid < R3.sid;

If we had a minus operator e.g. EXCEPT in Standard SQL:

SELECT DISTINCT R1.bid
FROM Reserves AS R1
JOIN Reserves AS R2
ON R1.bid = R2.bid
AND R1.sid < R2.sid
EXCEPT
SELECT DISTINCT R1.bid
FROM Reserves AS R1
JOIN Reserves AS R2
ON R1.bid = R2.bid
AND R1.sid < R2.sid
JOIN Reserves AS R3
ON R1.bid = R3.bid
AND R2.sid < R3.sid;

If we had restriction (WHERE in SQL) and a semi difference (a.k.a. antijoin) operator (e.g. NOT IN in SQL):

SELECT DISTINCT R1.bid
FROM Reserves AS R1
JOIN Reserves AS R2
ON R1.bid = R2.bid
AND R1.sid < R2.sid
WHERE R1.bid NOT IN (
SELECT DISTINCT R1.bid
FROM Reserves AS R1
JOIN Reserves AS R2
ON R1.bid = R2.bid
AND R1.sid < R2.sid
JOIN Reserves AS R3
ON R1.bid = R3.bid
AND R2.sid < R3.sid
);

...but your set of allowed operations does not include restriction, semi difference or minus :(

Converting SQL Query with Aggregate Function to Relational Algebra Expression in Apache Calcite - No match found for function signature

The problem with the code is that new SqlStdOperatorTable() creates a validator which is not initialized. The correct way to use SqlStdOperatorTable is to use SqlStdOperatorTable.instance().

I found the solution after emailing the dev@calcite.apache.org mailing list. I would like to thank Yuzhao Chen for looking into the question I had and pointing out the problem with my code.

Converting a certain SQL query into relational algebra

I don't know. And anybody else is not likely to know either.

RA courses typically limit themselves to the selection, projection and join operators. Aggregations are not typically covered by an RA course. There even isn't any standard approach (that I know of) that the RA takes on aggregations.

What is the operator that your course defines for doing aggregations on relations ? What type of value does that operator produce for its result ? A relation ? Something else ? If something else, how does your course explain doing relational restrictions on that result, given that these result values aren't relations, but restriction works only on relations ?

Algebraically, this case starts with a natural join (produkt x ordre).

[The result of] this natural join is subjected to an aggregation operation. Thus this natural join is to appear where you specify the relational input argument to your aggregation operator. The other needed specs for specifying the aggregation are the output attribute names (total), and the way to compute them (SUM(...)). Those might appear in subscript next to your aggregation operator symbol as "annotations", much like the attribute lists on projection and the restriction condition on restriction. But anything concerning this operator is course-specific, because there isn't any agreed-upon standard notation for aggregations, as far as I know.

Then if your aggregation operator is defined to return a relation, you can specify your aggregation result as the input argument to a restriction with condition "total>=10000".

How do you convert a Sql command into Relational Algebra?

Aggregation and grouping operations can't be constructed from the basic relational algebra operations. You would have to define your own COUNT and GROUP BY operators to perform this.

There are many proposed extensions to the basic relational algebra that you could use, or you could even define your own - but to do this formally, I suspect the mathematics would get reasonably complex.

A simple proposal (without much formality) can be found here, http://myweb.lmu.edu/dondi/share/db/relational3.pdf. (Section 3.2)

Using the extended algebra proposed in that link, your expression would be written like:

Sample Image

How to represent GROUP BY with HAVING COUNT(*)1 in relational algebra?

There are many variants of "relational algebra", differing even on what a relation is. You need to tell us which one you are supposed to use.

Also you don't explain what it means for a pair of RA & SQL queries to "have the form of" or "be the same as" each other. (Earlier versions.) Same result? Or also some kind of parallel structure?

Also you don't explain what "get the list of clients" means. What attributes does the result have?

If you try to write a definition of the count you are trying to use in σ count(IdClient)>1 (...)--what it inputs & what it outputs based on that--you will see that you can't. That kind of count that takes just an attribute does not correspond to a relational operator. It is used in a grouping expression--which you are missing. Such count & group are not actually relational operators, they are non-terminals in so-called relational algebras that are really query languages, designed by SQL apologists, suggesting it is easy to map SQL to relational algebra, but begging the question of how we aggregate in an algebra. Still, maybe that's the sort of "relational algebra" you were told to use.

I saw the use of count() there https://cs.stackexchange.com/questions/29897/use-count-in-relational-algebra

The nature of algebras is that the only sense in which we "use" operators "with" other operators is to pass outputs of operator calls as inputs to other operator calls. (Hence some so-called algebras are not.) In your linked answer, grouping operator G inputs aggregate name count and attribute name name and that affects the output. The answer quotes Database System Concepts, 5th Ed:

 G1, G2, ..., Gn G F1(A1), F2(A2), ..., Fm(Am) (E)

where E is any relational-algebra expression; G1, G2, ..., Gn constitute a list of attributes on which to group; each Fi is an aggregate function; and each Ai is an attribute name.

G returns rows with attributes G1, ..., A1, ... where one or more rows with the same G1, ... subrows are in E and each Ai holds the output from aggregating Fi on Ai over those rows.

But the answer when you read & linked it used that definition improperly. (I got it fixed since.) Correct is:

π name (σ phone>1 (name G count(phone) (Person)))

This is clear if you carefully read the definition.

G has misleading syntax. count(phone) is not a call of an operator; it's just a pair of arguments, an aggregate name count & an attribute name phone. Less misleading syntax would be

π name (σ phone>1 (name G count phone (Person)))

One does not need a grouping operator to write your query. That makes it all the more important to know what "relational algebra" means in the exam. It is harder if you can't use a grouping operator.

"Π Rental seems like a shady business" is unclear. You do use projection incorrectly; proper use is π attributes (relation). I guess you are using π in an attempt to involve a grouping operator like G. Re "the logic behind the math" see this.



Related Topics



Leave a reply



Submit