Cannot Use Group by and Over(Partition By) in the Same Query

I found the solution.

I do not need to use OVER(PARTITION BY col_1) because it is already in the GROUP BY clause. Thus, the following query gives me the right answer:

SELECT col_1, col_2, sum(Value) as sum_value
from myTable GROUP BY col_1, col_2

since I am already grouping by col_1 and col_2.

Dave, thanks, I got the idea from your post.

Using GROUP BY and OVER

You need to nest the sums:

SELECT Year, Country, SUM([Total Sales]),
SUM(SUM([Total Sales])) OVER (PARTITION BY Year)
FROM [Table]
GROUP BY Country, Year;

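This syntax looks a little odd the first time you see it, but the window function is evaluated after the GROUP BY. The inner SUM() produces one total per (Country, Year) group, and the outer SUM() ... OVER (PARTITION BY Year) adds those group totals up for each year, which is exactly what you want.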
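To see the nested aggregate in action, here is a minimal sketch that runs the same pattern in SQLite through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The sales table, its total_sales column (standing in for [Total Sales]) and all rows are hypothetical.

```python
import sqlite3


def yearly_totals():
    """Return (year, country, country_total, year_total) rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for the question's "Table".
    conn.execute("CREATE TABLE sales (year INTEGER, country TEXT, total_sales INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [(2020, "FR", 10), (2020, "FR", 5), (2020, "DE", 20),
         (2021, "FR", 7), (2021, "DE", 3)],
    )
    rows = conn.execute(
        """
        SELECT year, country,
               SUM(total_sales) AS country_total,  -- computed by the GROUP BY
               SUM(SUM(total_sales)) OVER (PARTITION BY year) AS year_total  -- window runs on the grouped rows
        FROM sales
        GROUP BY country, year
        ORDER BY year, country
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in yearly_totals():
        print(row)
```

Each 2020 row should carry 35 (20 + 15) as year_total and each 2021 row 10, while country_total keeps the per-group sums.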

Using GROUP BY and PARTITION BY together

I think you need a HAVING COUNT clause:

SELECT
R.name AS name_of_r,
C.name AS name_of_c,
COUNT(O.id),
date
FROM
orders O
INNER JOIN restaurants R ON R.id = O.restaurant_id AND R.country = O.country
INNER JOIN customers C ON C.id = O.customer_id AND C.country = O.country
GROUP BY R.name, C.name, date
HAVING COUNT(O.id) >= 3
;

PS: This only finds customers who placed at least three orders at the same restaurant on the same day. Otherwise, date must be removed from both the SELECT list and the GROUP BY.
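The effect of keeping or dropping the date from the grouping can be checked with a small sketch in SQLite via Python's sqlite3. It assumes a hypothetical, denormalized orders table that stores restaurant and customer names directly instead of joining restaurants and customers; order_date replaces the question's date column.

```python
import sqlite3

# Hypothetical rows: Ann orders 3 times at Pizzeria over two days,
# Bob orders 3 times at Sushi on a single day.
ORDERS = [
    ("Pizzeria", "Ann", "2024-01-01"),
    ("Pizzeria", "Ann", "2024-01-01"),
    ("Pizzeria", "Ann", "2024-01-02"),
    ("Pizzeria", "Bob", "2024-01-01"),
    ("Sushi", "Bob", "2024-01-03"),
    ("Sushi", "Bob", "2024-01-03"),
    ("Sushi", "Bob", "2024-01-03"),
]


def frequent_customers(group_by_date):
    """Return groups with at least 3 orders, optionally grouped per day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (restaurant TEXT, customer TEXT, order_date TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    # Grouping by order_date as well means "3 orders on the same day".
    keys = "restaurant, customer, order_date" if group_by_date else "restaurant, customer"
    rows = conn.execute(
        f"SELECT {keys}, COUNT(*) FROM orders "
        f"GROUP BY {keys} HAVING COUNT(*) >= 3 ORDER BY restaurant, customer"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(frequent_customers(group_by_date=True))
    print(frequent_customers(group_by_date=False))
```

With the date in the grouping only Bob at Sushi qualifies; without it, Ann at Pizzeria qualifies too.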

UPDATE:
added a query that selects the customer who placed the third order (by date) at each restaurant.

SELECT name_of_r, name_of_c, date
FROM (
SELECT
R.name AS name_of_r,
C.name AS name_of_c,
date,
ROW_NUMBER() OVER (PARTITION BY R.name ORDER BY date) AS nc
FROM
orders O
INNER JOIN restaurants R ON R.id = O.restaurant_id AND R.country = O.country
INNER JOIN customers C ON C.id = O.customer_id AND C.country = O.country
) t
WHERE t.nc = 3
;

See the documentation for the ROW_NUMBER function.
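A minimal sketch of the ROW_NUMBER approach, run in SQLite through Python's sqlite3 (window functions need SQLite 3.25+). The single orders table with restaurant and customer names is a hypothetical simplification of the three joined tables, and the dates are deliberately distinct within each restaurant.

```python
import sqlite3


def third_order_customers():
    """Return (restaurant, customer, order_date) for each restaurant's 3rd order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (restaurant TEXT, customer TEXT, order_date TEXT)")
    # Hypothetical data; Sushi has only two orders.
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("Pizzeria", "Ann", "2024-01-01"),
            ("Pizzeria", "Bob", "2024-01-02"),
            ("Pizzeria", "Cid", "2024-01-03"),
            ("Pizzeria", "Dee", "2024-01-04"),
            ("Sushi", "Eve", "2024-01-01"),
            ("Sushi", "Ann", "2024-01-05"),
        ],
    )
    rows = conn.execute(
        """
        SELECT restaurant, customer, order_date
        FROM (
            SELECT restaurant, customer, order_date,
                   -- number each restaurant's orders 1, 2, 3, ... by date
                   ROW_NUMBER() OVER (PARTITION BY restaurant ORDER BY order_date) AS nc
            FROM orders
        ) t
        WHERE t.nc = 3
        ORDER BY restaurant
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(third_order_customers())
```

Restaurants with fewer than three orders simply produce no row. If two orders can share the same date, ROW_NUMBER breaks the tie arbitrarily, so adding a unique column (such as the order id) to the window's ORDER BY makes the result deterministic.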

Can't use Partition by and select * in the same query

When you use * in Oracle, it must be qualified if any other expressions are being selected. So:

SELECT e.*,
MAX(e.Salary) OVER (PARTITION BY e.ID_DEPT ORDER BY e.Salary DESC) as R
FROM SG_EMPLOYEES e;

Note that I'm a big fan of qualifying all column names.

Your query actually seems very strange. You don't need the ORDER BY inside the window:

SELECT e.*,
MAX(e.Salary) OVER (PARTITION BY e.ID_DEPT) as R
FROM SG_EMPLOYEES e;

Your version computes a running (cumulative) maximum while ordering the salaries from highest to lowest. The first row of each department already holds the highest salary, so the running maximum never changes and is the same as the overall maximum.
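The claim that both windows agree can be checked with a short sketch. It runs in SQLite through Python's sqlite3 rather than Oracle (SQLite does not require the `e.*` qualification, and `e.*` is left out here), and the sg_employees rows are hypothetical.

```python
import sqlite3


def running_vs_total_max():
    """Return (id, running_max, dept_max) for every employee."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical stand-in for SG_EMPLOYEES.
    conn.execute("CREATE TABLE sg_employees (id INTEGER, id_dept INTEGER, salary INTEGER)")
    conn.executemany(
        "INSERT INTO sg_employees VALUES (?, ?, ?)",
        [(1, 10, 3000), (2, 10, 5000), (3, 10, 4000), (4, 20, 2000), (5, 20, 2500)],
    )
    rows = conn.execute(
        """
        SELECT e.id,
               -- running max over salaries sorted high to low
               MAX(e.salary) OVER (PARTITION BY e.id_dept ORDER BY e.salary DESC) AS running_max,
               -- plain per-department max, no ORDER BY
               MAX(e.salary) OVER (PARTITION BY e.id_dept) AS dept_max
        FROM sg_employees e
        ORDER BY e.id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in running_vs_total_max():
        print(row)
```

Every row shows the same value in both columns (5000 for department 10, 2500 for department 20), so the ORDER BY adds sorting work without changing the answer.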

Why do you need to include a field in GROUP BY when using OVER (PARTITION BY x)?

You need to nest the sum()s:

select year_num, age_bucket, sum(num_cust),
sum(sum(num_cust)) over (partition by year_num) --WORKS!!
from foo
group by year_num, age_bucket
order by 1, 2;

Why? Well, the window function is not doing the aggregation. It is evaluated after the GROUP BY (because this is an aggregation query), so its argument must be an expression that is still valid after grouping. Because num_cust is not a GROUP BY key, it needs an aggregation function.

Perhaps this is clearer if you used a subquery:

select year_num, age_bucket, sum_num_cust,
sum(sum_num_cust) over (partition by year_num)
from (select year_num, age_bucket, sum(num_cust) as sum_num_cust
from foo
group by year_num, age_bucket
) ya
order by 1, 2;

These two queries do exactly the same thing. But with the subquery it should be more obvious why you need the extra aggregation.
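A small sketch that runs both queries on the same data and compares the results, using SQLite via Python's sqlite3 (3.25+ for window functions). The foo rows are hypothetical. Note that SQLite is more permissive than most engines about non-aggregated columns in grouped queries, so this only demonstrates the equivalence of the two correct forms, not the error the question hit.

```python
import sqlite3


def nested_vs_subquery():
    """Run the nested-aggregate and subquery forms and return both results."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (year_num INTEGER, age_bucket TEXT, num_cust INTEGER)")
    # Hypothetical rows.
    conn.executemany(
        "INSERT INTO foo VALUES (?, ?, ?)",
        [(2020, "18-24", 5), (2020, "18-24", 3), (2020, "25-34", 4),
         (2021, "18-24", 6), (2021, "25-34", 1), (2021, "25-34", 2)],
    )
    nested = conn.execute(
        """
        select year_num, age_bucket, sum(num_cust),
               sum(sum(num_cust)) over (partition by year_num)
        from foo
        group by year_num, age_bucket
        order by 1, 2
        """
    ).fetchall()
    subquery = conn.execute(
        """
        select year_num, age_bucket, sum_num_cust,
               sum(sum_num_cust) over (partition by year_num)
        from (select year_num, age_bucket, sum(num_cust) as sum_num_cust
              from foo
              group by year_num, age_bucket
             ) ya
        order by 1, 2
        """
    ).fetchall()
    conn.close()
    return nested, subquery


if __name__ == "__main__":
    nested, subquery = nested_vs_subquery()
    print(nested)
    print(nested == subquery)
```

Both forms should yield per-bucket sums of 8 and 4 with a 2020 total of 12, and 6 and 3 with a 2021 total of 9.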

must appear in the GROUP BY clause or be used in an aggregate function

Yes, this is a common aggregation problem. Before SQL3 (SQL:1999), every selected field had to appear in the GROUP BY clause or inside an aggregate function[*].

To work around this issue, calculate the aggregate in a subquery and then join it back to the table to get the additional columns you need to show:

SELECT m.cname, m.wmname, t.mx
FROM (
SELECT cname, MAX(avg) AS mx
FROM makerar
GROUP BY cname
) t JOIN makerar m ON m.cname = t.cname AND t.mx = m.avg
;

 cname  | wmname |           mx
--------+--------+------------------------
 canada | zoro   | 2.0000000000000000
 spain  | usopp  | 5.0000000000000000
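A minimal sketch of the aggregate-then-join-back pattern, run in SQLite via Python's sqlite3. The makerar rows are hypothetical: the value for luffy in particular is invented, since the answer does not show it.

```python
import sqlite3


def max_per_country_join():
    """Return (cname, wmname, mx) for the row(s) holding each country's max."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE makerar (cname TEXT, wmname TEXT, avg REAL)")
    # Hypothetical rows.
    conn.executemany(
        "INSERT INTO makerar VALUES (?, ?, ?)",
        [("canada", "zoro", 2.0), ("spain", "luffy", 1.0), ("spain", "usopp", 5.0)],
    )
    rows = conn.execute(
        """
        SELECT m.cname, m.wmname, t.mx
        FROM (
            SELECT cname, MAX(avg) AS mx  -- one max per country
            FROM makerar
            GROUP BY cname
        ) t
        JOIN makerar m ON m.cname = t.cname AND t.mx = m.avg  -- recover wmname
        ORDER BY m.cname
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(max_per_country_join())
```

The join on both cname and the max value is what brings back wmname, which the grouped subquery alone cannot return.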

But you may also use window functions, which look simpler:

SELECT cname, wmname, MAX(avg) OVER (PARTITION BY cname) AS mx
FROM makerar
;

The only catch with this method is that it shows all records (window functions do not group). But each row shows the correct maximum for its country (i.e. MAX computed at the cname level), so it's up to you:

 cname  | wmname |           mx
--------+--------+------------------------
 canada | zoro   | 2.0000000000000000
 spain  | luffy  | 5.0000000000000000
 spain  | usopp  | 5.0000000000000000

The solution, arguably less elegant, to show only the (cname, wmname) tuples matching the max value is:

SELECT DISTINCT /* DISTINCT matters here, because several rows may share the same max value */
m.cname, m.wmname, t.avg AS mx
FROM (
SELECT cname, wmname, avg,
RANK() OVER (PARTITION BY cname ORDER BY avg DESC) AS rn /* rank within each country; RANK keeps ties at the max */
FROM makerar
) t JOIN makerar m ON m.cname = t.cname AND m.wmname = t.wmname AND t.rn = 1
;

 cname  | wmname |           mx
--------+--------+------------------------
 canada | zoro   | 2.0000000000000000
 spain  | usopp  | 5.0000000000000000
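A sketch of the ranking approach in SQLite via Python's sqlite3, assuming the ranking is meant to be per country: the window needs PARTITION BY cname ORDER BY avg DESC (a sort direction belongs in ORDER BY, not PARTITION BY). The rows are hypothetical, with a deliberate tie in canada, to show why RANK and ROW_NUMBER differ here.

```python
import sqlite3

_ALLOWED = {"RANK", "ROW_NUMBER"}  # whitelist, since the name is spliced into SQL


def top_per_country(ranking_function="RANK"):
    """Return (cname, wmname, mx) tuples ranked first within each country."""
    if ranking_function not in _ALLOWED:
        raise ValueError(f"unsupported ranking function: {ranking_function}")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE makerar (cname TEXT, wmname TEXT, avg REAL)")
    # Hypothetical rows; canada has two names tied at its maximum.
    conn.executemany(
        "INSERT INTO makerar VALUES (?, ?, ?)",
        [("canada", "zoro", 2.0), ("canada", "nami", 2.0),
         ("spain", "luffy", 1.0), ("spain", "usopp", 5.0)],
    )
    sql = f"""
        SELECT DISTINCT m.cname, m.wmname, t.avg AS mx
        FROM (
            SELECT cname, wmname, avg,
                   {ranking_function}() OVER (PARTITION BY cname ORDER BY avg DESC) AS rn
            FROM makerar
        ) t
        JOIN makerar m ON m.cname = t.cname AND m.wmname = t.wmname AND t.rn = 1
        ORDER BY m.cname, m.wmname
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(top_per_country("RANK"))        # both canada names survive
    print(top_per_country("ROW_NUMBER"))  # one canada name, chosen arbitrarily
```

RANK returns every tuple tied at the maximum; ROW_NUMBER returns exactly one per country and picks among ties arbitrarily.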

[*]: Interestingly enough, even though the standard (since SQL:1999) allows selecting non-grouped fields when they are functionally dependent on the grouped ones, major engines support this only partially. Oracle and SQL Server don't allow it at all. MySQL used to allow it by default, but since 5.7 the ONLY_FULL_GROUP_BY SQL mode is enabled by default, so the administrator has to remove ONLY_FULL_GROUP_BY from the server's sql_mode to get the old permissive behavior.

Can I group by in SQL query with window function?

If you run your second query without the GROUP BY (which you may have already tried, judging from the extra semicolon in what you posted), you'll see that you get one row for every employee, each showing the minimum salary in their department. That minimum comes from the analytic min(), because it has a window clause. PARTITION BY is the equivalent of GROUP BY, except that it does not collapse the rows: the aggregate is computed per partition and repeated on every row of that partition.
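The difference between the analytic MIN() and the grouped MIN() can be sketched in SQLite via Python's sqlite3 (3.25+ for window functions). The three employees are taken from the sample output shown in this answer; the table layout is an assumption.

```python
import sqlite3


def min_salary_views():
    """Return (analytic_rows, grouped_rows) for the same employees."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (employee_id INTEGER, last_name TEXT,"
        " salary INTEGER, department_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?, ?)",
        [(200, "Whalen", 4400, 10), (202, "Fay", 6000, 20), (201, "Hartstein", 13000, 20)],
    )
    # Analytic: one row per employee, department minimum repeated on each row.
    analytic = conn.execute(
        """
        SELECT employee_id, department_id,
               MIN(salary) OVER (PARTITION BY department_id) AS dept_min
        FROM employees
        ORDER BY employee_id
        """
    ).fetchall()
    # Aggregate: rows collapsed to one per department.
    grouped = conn.execute(
        "SELECT department_id, MIN(salary) FROM employees"
        " GROUP BY department_id ORDER BY department_id"
    ).fetchall()
    conn.close()
    return analytic, grouped


if __name__ == "__main__":
    analytic, grouped = min_salary_views()
    print(analytic)
    print(grouped)
```

The analytic query keeps all three employees (both department 20 rows show 6000), while the grouped query returns only two rows.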

The simplest way to get the same result (almost) is to use the RANK() analytic function instead, which ranks the values based on the partition and order you supply, while allowing for ties:

SELECT employee_id, last_name, salary, department_id,
RANK() OVER (PARTITION BY department_id ORDER BY salary) AS rnk
FROM employees
ORDER BY department_id, rnk;

EMPLOYEE_ID LAST_NAME                     SALARY DEPARTMENT_ID        RNK
----------- ------------------------- ---------- ------------- ----------
        200 Whalen                          4400            10          1
        202 Fay                             6000            20          1
        201 Hartstein                      13000            20          2
        119 Colmenares                      2500            30          1
        118 Himuro                          2600            30          2
        117 Tobias                          2800            30          3
        116 Baida                           2900            30          4
        115 Khoo                            3100            30          5
        114 Raphaely                       11000            30          6
...
        102 De Haan                        17000            90          1
        101 Kochhar                        17000            90          1
        100 King                           24000            90          3
...

For departments 20 and 30 you can see the row ranked 1 is the lowest salary. For department 90 there are two employees ranked 1, because they have the same lowest salary.

You can use that as an inline view and select just those rows ranked number 1:

SELECT employee_id, last_name, salary, department_id
FROM (
SELECT employee_id, last_name, salary, department_id,
RANK() OVER (PARTITION BY department_id ORDER BY salary) AS rnk
FROM employees
)
WHERE rnk = 1
ORDER BY department_id;

EMPLOYEE_ID LAST_NAME                     SALARY DEPARTMENT_ID
----------- ------------------------- ---------- -------------
        200 Whalen                          4400            10
        202 Fay                             6000            20
        119 Colmenares                      2500            30
        203 Mavris                          6500            40
        132 Olson                           2100            50
        107 Lorentz                         4200            60
        204 Baer                           10000            70
        173 Kumar                           6100            80
        101 Kochhar                        17000            90
        102 De Haan                        17000            90
        113 Popp                            6900           100
        206 Gietz                           8300           110
        178 Grant                           7000

13 rows selected.
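The "filter on rank = 1" step can be reproduced in SQLite via Python's sqlite3 (3.25+). The employees below are a subset of the sample rows shown above; note that SQLite, like Oracle, accepts an inline view without an alias.

```python
import sqlite3


def lowest_paid():
    """Return the lowest-paid employee(s) per department, keeping ties."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (employee_id INTEGER, last_name TEXT,"
        " salary INTEGER, department_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?, ?)",
        [(202, "Fay", 6000, 20), (201, "Hartstein", 13000, 20),
         (102, "De Haan", 17000, 90), (101, "Kochhar", 17000, 90),
         (100, "King", 24000, 90)],
    )
    rows = conn.execute(
        """
        SELECT employee_id, last_name, salary, department_id
        FROM (
            SELECT employee_id, last_name, salary, department_id,
                   -- equal salaries get equal rank, so ties both get 1
                   RANK() OVER (PARTITION BY department_id ORDER BY salary) AS rnk
            FROM employees
        )
        WHERE rnk = 1
        ORDER BY department_id, employee_id
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in lowest_paid():
        print(row)
```

Both department 90 employees on 17000 come back, matching the Oracle output above.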

If you didn't have to worry about ties, there is an even simpler alternative, but it isn't appropriate here.

Notice that this gives you one more row than your original query. You are joining on sml.department_id = emp.department_id. If the department ID is null, as it is for employee 178, that join condition is never true, because null = null does not evaluate to true in an equality test. Because this solution doesn't have a join, that doesn't apply, and you see that employee in the results.
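The null-department effect can be sketched in SQLite via Python's sqlite3. The join-based query is a hypothetical reconstruction (the question's exact query is not shown); the aliases sml and emp follow the answer, and the rows come from the sample output.

```python
import sqlite3


def join_vs_window():
    """Return employee ids found by a join-based query and by RANK()."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employees (employee_id INTEGER, last_name TEXT,"
        " salary INTEGER, department_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?, ?)",
        [(200, "Whalen", 4400, 10), (202, "Fay", 6000, 20),
         (201, "Hartstein", 13000, 20), (178, "Grant", 7000, None)],
    )
    # Hypothetical join-based version: NULL = NULL is not true, so Grant drops out.
    joined = conn.execute(
        """
        SELECT emp.employee_id
        FROM employees emp
        JOIN (SELECT department_id, MIN(salary) AS min_salary
              FROM employees
              GROUP BY department_id) sml
          ON sml.department_id = emp.department_id
         AND emp.salary = sml.min_salary
        ORDER BY emp.employee_id
        """
    ).fetchall()
    # Window version: NULL department ids form their own partition.
    windowed = conn.execute(
        """
        SELECT employee_id
        FROM (SELECT employee_id,
                     RANK() OVER (PARTITION BY department_id ORDER BY salary) AS rnk
              FROM employees)
        WHERE rnk = 1
        ORDER BY employee_id
        """
    ).fetchall()
    conn.close()
    return [r[0] for r in joined], [r[0] for r in windowed]


if __name__ == "__main__":
    print(join_vs_window())
```

The join returns 200 and 202 only, while the window version also returns 178, because PARTITION BY puts all null department ids in one partition instead of comparing them with equality.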

SQL - Use dense_rank and group by together

All the columns you use in the SELECT statement are contained either in an aggregate function or in the GROUP BY clause.

You can use a rank function and GROUP BY in the same query, but every selected column must be contained either in an aggregate function or in the GROUP BY clause.

So this query groups by product and type and ranks the groups from the highest to the lowest sales amount, because you used DESC in the ORDER BY clause.
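Since the asker's query is not shown here, this is only a sketch of the pattern under assumptions: a hypothetical sales(product, type, amount) table, run in SQLite via Python's sqlite3 (3.25+). DENSE_RANK is applied to the aggregated SUM(amount), so it ranks the groups rather than the raw rows.

```python
import sqlite3


def ranked_groups():
    """Return (product, type, total, rnk) ranked by total, highest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product TEXT, type TEXT, amount INTEGER)")
    # Hypothetical rows; Pen/Blue and Pen/Red tie at 30.
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("Pen", "Blue", 10), ("Pen", "Blue", 20), ("Pen", "Red", 30),
         ("Ink", "Black", 30), ("Ink", "Black", 5), ("Pad", "A4", 12)],
    )
    rows = conn.execute(
        """
        SELECT product, type,
               SUM(amount) AS total,  -- one total per (product, type) group
               DENSE_RANK() OVER (ORDER BY SUM(amount) DESC) AS rnk  -- ranks the groups
        FROM sales
        GROUP BY product, type
        ORDER BY rnk, product, type
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in ranked_groups():
        print(row)
```

The two groups tied at 30 share rank 2, and DENSE_RANK gives the next group rank 3 with no gap (RANK would give 4).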