Cannot Use Group by and Over(Partition By) in the Same Query

I found the solution.

I do not need to use OVER(PARTITION BY col_1) because it is already in the GROUP BY clause. Thus, the following query gives me the right answer:

SELECT col_1, col_2, sum(Value) as sum_value
from myTable GROUP BY col_1, col_2

since I am already grouping w.r.t col_1 and col_2.

Dave, thanks, I got the idea from your post.

Using GROUP BY and OVER

You need to nest the sums:

SELECT Year, Country, SUM([Total Sales]),
       SUM(SUM([Total Sales])) OVER (PARTITION BY Year) 
FROM Table
GROUP BY Country, Year;

This syntax is a little funky the first time you see it. But, the window function is evaluated after the GROUP BY. What this says is to sum the sum of the total sales . . . exactly what you want.
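To see the nesting in action, here is a minimal sketch using Python's built-in sqlite3 module. The table name `sales`, its columns, and the sample rows are made up for illustration, and it assumes the bundled SQLite is 3.25 or newer (the first release with window functions).

```python
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, country TEXT, total_sales INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(2020, "US", 10), (2020, "US", 5), (2020, "FR", 7),
     (2021, "US", 3), (2021, "FR", 4)],
)

# The inner SUM aggregates each (country, year) group. The outer SUM ... OVER
# then runs across the already-grouped rows that share the same year.
rows = conn.execute(
    """
    SELECT year, country,
           SUM(total_sales) AS country_total,
           SUM(SUM(total_sales)) OVER (PARTITION BY year) AS year_total
    FROM sales
    GROUP BY country, year
    ORDER BY year, country
    """
).fetchall()

for row in rows:
    print(row)
```

Each output row carries both the per-country total and the total for the whole year, so a share of the yearly total could be computed on the same row.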

Using GROUP BY and PARTITION BY together

I think you need a HAVING COUNT:

SELECT
    R.name AS name_of_r,
    C.name AS name_of_c,
    COUNT(O.id),
    date
  FROM 
    orders O
  INNER JOIN restaurants R ON R.id = O.restaurant_id AND R.country = O.country
  INNER JOIN customers C   ON C.id = O.customer_id   AND C.country = O.country
  GROUP BY R.name, C.name, date
  HAVING COUNT(O.id) >= 3
;

PS: This only works for customers who placed at least three orders at the same restaurant on the same day. Otherwise, date must be removed from the SELECT list and from the grouping.

UPDATE:
added a query that selects the third order placed at each restaurant.

SELECT name_of_r, name_of_c, date
  FROM (
    SELECT
        R.name AS name_of_r,
        C.name AS name_of_c,
        date,
        ROW_NUMBER() OVER (PARTITION BY R.name ORDER BY date) AS nc
      FROM 
        orders O
      INNER JOIN restaurants R ON R.id = O.restaurant_id AND R.country = O.country
      INNER JOIN customers C   ON C.id = O.customer_id   AND C.country = O.country
  ) t
  WHERE t.nc = 3
;
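A stripped-down sketch of the same ROW_NUMBER idea, runnable with Python's built-in sqlite3 module (assuming SQLite 3.25+). To keep it short, the three joined tables are collapsed into a single hypothetical `orders` table with made-up rows, so this only illustrates the numbering and filtering step, not the joins.

```python
import sqlite3

# Hypothetical single-table stand-in for the joined orders/restaurants/customers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (restaurant TEXT, customer TEXT, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Roma", "ann", "2024-01-01"),
        ("Roma", "bob", "2024-01-02"),
        ("Roma", "cid", "2024-01-03"),
        ("Roma", "dee", "2024-01-04"),
        ("Oslo", "eve", "2024-01-01"),
        ("Oslo", "fay", "2024-01-05"),
    ],
)

# Number the orders of each restaurant by date, then keep only number 3.
# The window function cannot be used in WHERE directly, hence the subquery.
third_orders = conn.execute(
    """
    SELECT restaurant, customer, order_date
    FROM (
        SELECT restaurant, customer, order_date,
               ROW_NUMBER() OVER (PARTITION BY restaurant ORDER BY order_date) AS nc
        FROM orders
    ) t
    WHERE t.nc = 3
    """
).fetchall()

print(third_orders)
```

The restaurant with only two orders drops out of the result, because no row in its partition is ever numbered 3.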

See the ROW_NUMBER Function

Can't use Partition by and select * in the same query

When you use * in Oracle, it must be qualified if any other expressions are being selected. So:

SELECT e.*,
       MAX(e.Salary) OVER (PARTITION BY e.ID_DEPT ORDER BY e.Salary DESC) as R
FROM SG_EMPLOYEES e;

Note that I'm a big fan of qualifying all column names.

Your query actually seems very strange. You don't need the ORDER BY clause:

SELECT e.*,
       MAX(e.Salary) OVER (PARTITION BY e.ID_DEPT) as R
FROM SG_EMPLOYEES e;

Your version takes a cumulative maximum over salaries ordered from highest to lowest. The first row in each partition is already the largest salary, so the cumulative maximum is the same as the overall maximum on every row.
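A small sketch of that effect, using Python's built-in sqlite3 module (assuming SQLite 3.25+). The `emp` table and its rows are invented for the demonstration; the ascending variant is included only for contrast, since there the running maximum really does change from row to row.

```python
import sqlite3

# Hypothetical employee data, not the SG_EMPLOYEES table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("a", 1, 100), ("b", 1, 300), ("c", 1, 200), ("d", 2, 50)],
)

# With ORDER BY in the window, the default frame runs from the start of the
# partition up to the current row (and its peers), i.e. a running aggregate.
rows = conn.execute(
    """
    SELECT name,
           MAX(salary) OVER (PARTITION BY dept)                      AS overall_max,
           MAX(salary) OVER (PARTITION BY dept ORDER BY salary DESC) AS running_desc,
           MAX(salary) OVER (PARTITION BY dept ORDER BY salary ASC)  AS running_asc
    FROM emp
    ORDER BY name
    """
).fetchall()

for row in rows:
    print(row)
```

The `overall_max` and `running_desc` columns agree on every row, while `running_asc` simply echoes each row's own salary.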

Why do you need to include a field in GROUP BY when using OVER (PARTITION BY x)?

You need to nest the sum()s:

select year_num, age_bucket, sum(num_cust),
       sum(sum(num_cust)) over (partition by year_num)  --WORKS!!
from foo
group by year_num, age_bucket
order by 1, 2;

Why? Well, the window function is not doing aggregation. The argument needs to be an expression that can be evaluated after the group by (because this is an aggregation query). Because num_cust is not a group by key, it needs an aggregation function.

Perhaps this is clearer if you used a subquery:

select year_num, age_bucket, sum_num_cust,
       sum(sum_num_cust) over (partition by year_num)
from (select year_num, age_bucket, sum(num_cust) as sum_num_cust
      from foo
      group by year_num, age_bucket
     ) ya
order by 1, 2;

These two queries do exactly the same thing. But with the subquery it should be more obvious why you need the extra aggregation.
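If you want to check the equivalence yourself, here is a rough sketch with Python's built-in sqlite3 module (assuming SQLite 3.25+). The rows inserted into `foo` are made up; only the table and column names come from the queries above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (year_num INTEGER, age_bucket TEXT, num_cust INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [(2020, "18-25", 3), (2020, "18-25", 2), (2020, "26-35", 4), (2021, "18-25", 1)],
)

# Form 1: window function applied directly to the grouped aggregate.
nested = conn.execute(
    """
    SELECT year_num, age_bucket, SUM(num_cust),
           SUM(SUM(num_cust)) OVER (PARTITION BY year_num)
    FROM foo
    GROUP BY year_num, age_bucket
    ORDER BY 1, 2
    """
).fetchall()

# Form 2: aggregate in a subquery first, then apply the window function.
via_subquery = conn.execute(
    """
    SELECT year_num, age_bucket, sum_num_cust,
           SUM(sum_num_cust) OVER (PARTITION BY year_num)
    FROM (SELECT year_num, age_bucket, SUM(num_cust) AS sum_num_cust
          FROM foo
          GROUP BY year_num, age_bucket
         ) ya
    ORDER BY 1, 2
    """
).fetchall()

print(nested == via_subquery)
print(nested)
```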

must appear in the GROUP BY clause or be used in an aggregate function

Yes, this is a common aggregation problem. Before SQL3 (1999), the selected fields had to appear in the GROUP BY clause[*].

To work around this issue, you must calculate the aggregate in a sub-query and then join it back to the original table to get the additional columns you need to show:

SELECT m.cname, m.wmname, t.mx
FROM (
    SELECT cname, MAX(avg) AS mx
    FROM makerar
    GROUP BY cname
    ) t JOIN makerar m ON m.cname = t.cname AND t.mx = m.avg
;

 cname  | wmname |          mx           
--------+--------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | usopp  |     5.0000000000000000

But you may also use a window function, which looks simpler:

SELECT cname, wmname, MAX(avg) OVER (PARTITION BY cname) AS mx
FROM makerar
;

The only thing with this method is that it will show all records (window functions do not group). But it will show the correct (i.e. maxed at cname level) MAX for the country in each row, so it's up to you:

 cname  | wmname |          mx           
--------+--------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | luffy  |     5.0000000000000000
 spain  | usopp  |     5.0000000000000000

The solution, arguably less elegant, to show only the (cname, wmname) tuples matching the max value, is:

SELECT DISTINCT /* distinct here matters, because maybe there are various tuples for the same max value */
    m.cname, m.wmname, t.avg AS mx
FROM (
    SELECT cname, wmname, avg, ROW_NUMBER() OVER (PARTITION BY cname ORDER BY avg DESC) AS rn 
    FROM makerar
) t JOIN makerar m ON m.cname = t.cname AND m.wmname = t.wmname AND t.rn = 1
;

 cname  | wmname |          mx           
--------+--------+------------------------
 canada | zoro   |     2.0000000000000000
 spain  | usopp  |     5.0000000000000000
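
For anyone who wants to try the two window-function variants side by side, here is a minimal sketch with Python's built-in sqlite3 module (assuming SQLite 3.25+). The `makerar` rows are reconstructed from the outputs above; luffy's value of 1.0 is an assumption, since it is not shown anywhere. The second query numbers rows per cname by descending avg and filters directly on that number, which is a slightly simplified variant rather than the exact join shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE makerar (cname TEXT, wmname TEXT, avg REAL)")
# luffy's avg (1.0) is assumed for illustration.
conn.executemany(
    "INSERT INTO makerar VALUES (?, ?, ?)",
    [("canada", "zoro", 2.0), ("spain", "luffy", 1.0), ("spain", "usopp", 5.0)],
)

# Variant 1: every row is kept and simply annotated with its country's max.
all_rows = conn.execute(
    """
    SELECT cname, wmname, MAX(avg) OVER (PARTITION BY cname) AS mx
    FROM makerar
    ORDER BY cname, wmname
    """
).fetchall()

# Variant 2: number the rows of each country from highest avg down,
# then keep only the first row per country.
top_rows = conn.execute(
    """
    SELECT cname, wmname, avg AS mx
    FROM (
        SELECT cname, wmname, avg,
               ROW_NUMBER() OVER (PARTITION BY cname ORDER BY avg DESC) AS rn
        FROM makerar
    ) t
    WHERE rn = 1
    ORDER BY cname
    """
).fetchall()

print(all_rows)
print(top_rows)
```

Note that ROW_NUMBER picks a single row per country even when several rows tie for the max; RANK() would keep all tied rows instead.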

[*]: Interestingly enough, even though the spec sort of allows selecting non-grouped fields, major engines seem to not really like it. Oracle and SQL Server just don't allow this at all. MySQL used to allow it by default, but since 5.7 the ONLY_FULL_GROUP_BY SQL mode is enabled by default, so it has to be disabled explicitly (in the server configuration or for the session) for this feature to be supported...

Can I group by in SQL query with window function?

If you run your second query without the group by - which you may have already tried, from the extra semicolon in what you posted - you'll see that you get one row for every employee, each showing the minimum salary in their department. That minimum is the analytic min() because it has a window clause. The PARTITION BY is the equivalent of a GROUP BY, but without the aggregation over the whole result set.

The simplest way to get the same result (almost) is to use the RANK() analytic function instead, which ranks the values based on the partition and order you supply, while allowing for ties:

SELECT employee_id, last_name, salary, department_id,
  RANK() OVER (PARTITION BY department_id ORDER BY salary) AS rnk
FROM employees
ORDER BY department_id, rnk;

EMPLOYEE_ID LAST_NAME                     SALARY DEPARTMENT_ID        RNK
----------- ------------------------- ---------- ------------- ----------
        200 Whalen                          4400            10          1
        202 Fay                             6000            20          1
        201 Hartstein                      13000            20          2
        119 Colmenares                      2500            30          1
        118 Himuro                          2600            30          2
        117 Tobias                          2800            30          3
        116 Baida                           2900            30          4
        115 Khoo                            3100            30          5
        114 Raphaely                       11000            30          6
...
        102 De Haan                        17000            90          1
        101 Kochhar                        17000            90          1
        100 King                           24000            90          3
...

For departments 20 and 30 you can see the row ranked 1 is the lowest salary. For department 90 there are two employees ranked 1, because they have the same lowest salary.

You can use that as an inline view and select just those rows ranked number 1:

SELECT employee_id, last_name, salary, department_id
FROM (
  SELECT employee_id, last_name, salary, department_id,
    RANK() OVER (PARTITION BY department_id ORDER BY salary) AS rnk
  FROM employees
)
WHERE rnk = 1
ORDER BY department_id;

EMPLOYEE_ID LAST_NAME                     SALARY DEPARTMENT_ID
----------- ------------------------- ---------- -------------
        200 Whalen                          4400            10
        202 Fay                             6000            20
        119 Colmenares                      2500            30
        203 Mavris                          6500            40
        132 Olson                           2100            50
        107 Lorentz                         4200            60
        204 Baer                           10000            70
        173 Kumar                           6100            80
        101 Kochhar                        17000            90
        102 De Haan                        17000            90
        113 Popp                            6900           100
        206 Gietz                           8300           110
        178 Grant                           7000              

13 rows selected.

If you didn't have to worry about ties there is an even simpler alternative, but it isn't appropriate here.

Notice that this gives you one more row than your original query. You are joining on sml.department_id = emp.department_id. If the department ID is null, as it is for employee 178, that join fails because you can't compare null to null with equality tests. Because this solution doesn't have a join, that doesn't apply, and you see that employee in the results.
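
The null behaviour is easy to reproduce with a tiny sketch in Python's built-in sqlite3 module (assuming SQLite 3.25+). The four rows are taken from the outputs above; the join query is only my reconstruction of what a query joining on sml.department_id = emp.department_id might look like, not the original from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(200, 10, 4400), (202, 20, 6000), (201, 20, 13000), (178, None, 7000)],
)

# Assumed shape of a join-based query: minimum salary per department,
# joined back on department_id. NULL = NULL is not true, so 178 is lost.
joined = conn.execute(
    """
    SELECT emp.employee_id
    FROM employees emp
    JOIN (SELECT department_id, MIN(salary) AS min_salary
          FROM employees
          GROUP BY department_id) sml
      ON sml.department_id = emp.department_id
     AND sml.min_salary = emp.salary
    ORDER BY emp.employee_id
    """
).fetchall()

# RANK-based version: PARTITION BY puts the NULL department in its own
# partition, so employee 178 is ranked 1 there and survives the filter.
ranked = conn.execute(
    """
    SELECT employee_id
    FROM (
        SELECT employee_id,
               RANK() OVER (PARTITION BY department_id ORDER BY salary) AS rnk
        FROM employees
    )
    WHERE rnk = 1
    ORDER BY employee_id
    """
).fetchall()

print(joined)
print(ranked)
```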

SQL - Use dense_rank and group by together

Every column in your SELECT list is contained in either an aggregate function or the GROUP BY clause.

You can use a ranking function and GROUP BY in the same query, as long as every selected column is contained in either an aggregate function or the GROUP BY clause.

So this query groups by Product and type and ranks the groups from the highest to the lowest sales amount, because you used DESC in the ORDER BY clause.
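
Since the query from the question is not reproduced here, the following is only a guess at its general shape, sketched with Python's built-in sqlite3 module (assuming SQLite 3.25+). The `sales` table, its columns, and its rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, type TEXT, amount INTEGER)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("pen", "A", 10), ("pen", "A", 5), ("ink", "B", 15), ("cup", "A", 3)],
)

# product and type are GROUP BY keys; amount only appears inside SUM().
# DENSE_RANK orders the grouped rows by their aggregated total.
rows = conn.execute(
    """
    SELECT product, type,
           SUM(amount) AS total,
           DENSE_RANK() OVER (ORDER BY SUM(amount) DESC) AS rnk
    FROM sales
    GROUP BY product, type
    ORDER BY rnk, product
    """
).fetchall()

for row in rows:
    print(row)
```

The two groups that tie on the total share rank 1, and the next group gets rank 2 rather than 3, which is the difference between DENSE_RANK and RANK.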
