How to Group by a Calculated Field

Sure, just add the same calculation to the GROUP BY clause:

select dateadd(day, -7, convert(datetime, mwspp.DateDue) + (7 - datepart(weekday, mwspp.DateDue))),
       sum(mwspp.QtyRequired)
from manufacturingweekshortagepartpurchasing mwspp
where mwspp.buildScheduleSimID = 10109
  and mwspp.partID = 8366
group by dateadd(day, -7, convert(datetime, mwspp.DateDue) + (7 - datepart(weekday, mwspp.DateDue)))
order by dateadd(day, -7, convert(datetime, mwspp.DateDue) + (7 - datepart(weekday, mwspp.DateDue)))

Edit after comment:

Like all questions regarding the optimiser, the answer is really "it depends", but most likely the calculation will only be performed once - you'll see this in the execution plan as a Compute Scalar operator.

Based on this Compute Scalar operation, the optimiser will then decide how to perform the actual aggregation.

The other answers here (CTE, subquery, etc.) are all equally valid, but they don't really change the logic - ultimately they will perform similar operations. SQL Server might treat them differently, but it's unlikely. They do, however, help with readability.
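
For illustration, here is a CTE version of the query above - a sketch only, where weekly, WeekEnding and TotalRequired are illustrative names not taken from the original:

with weekly as (
    select dateadd(day, -7, convert(datetime, mwspp.DateDue)
               + (7 - datepart(weekday, mwspp.DateDue))) as WeekEnding,
           mwspp.QtyRequired
    from manufacturingweekshortagepartpurchasing mwspp
    where mwspp.buildScheduleSimID = 10109
      and mwspp.partID = 8366
)
select WeekEnding, sum(QtyRequired) as TotalRequired
from weekly
group by WeekEnding
order by WeekEnding;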

If you're worried about efficiency you can look at a couple of options, e.g. setting up the calculation as a persisted computed column and using it in an index, or staging interim results in a temporary table.
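
As a rough sketch of the computed-column option (WeekBucket and IX_mwspp_WeekBucket are illustrative names, and this assumes DateDue is already a date/datetime column): DATEPART(weekday, ...) depends on SET DATEFIRST and is non-deterministic, so it cannot be used in a PERSISTED column; a DATEDIFF-based week bucket is one deterministic alternative:

-- deterministic week bucket (weeks anchored to 1900-01-01, a Monday),
-- since DATEPART(weekday, ...) is not allowed in a PERSISTED computed column
alter table manufacturingweekshortagepartpurchasing
    add WeekBucket as dateadd(day, datediff(day, 0, DateDue) / 7 * 7, 0) persisted;

create index IX_mwspp_WeekBucket
    on manufacturingweekshortagepartpurchasing (WeekBucket)
    include (QtyRequired);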

The only way to really know for sure is to inspect the execution plan and IO statistics when running the query against a typical data set and see whether you're satisfied with the performance; if not, investigate one of the options above.
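
For example, in SQL Server Management Studio:

set statistics io on;
set statistics time on;
-- run the query here, then check the Messages tab for logical reads and CPU time
set statistics io off;
set statistics time off;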

Hive: group by calculated column

Hive doesn't allow referencing a SELECT alias in GROUP BY, so try repeating the expression:

select myUsualField, SOME_FUNCTION(myAnotherField) as myUnusualField 
from MYTABLE
group by myUsualField, SOME_FUNCTION(myAnotherField) ;
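
If you'd rather not repeat the expression, wrapping it in a derived table also works - a sketch using the same placeholder names; the inner query computes the value once per row, and the outer query can then group on the alias:

select myUsualField, myUnusualField
from (
    select myUsualField, SOME_FUNCTION(myAnotherField) as myUnusualField
    from MYTABLE
) t
group by myUsualField, myUnusualField;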

How to group by and calculate another column in pandas

First aggregate the sums, then divide the columns with DataFrame.eval:

df = (df.groupby(['Col1','Col2'])
        .sum()
        .eval('Weightage_count / Count')
        .reset_index(name='Result'))
print (df)
  Col1 Col2  Result
0    A   S1  0.6250
1    A   S2  0.4375
2    B   S3  1.0000
3    C   S4  0.5000

Or divide with Series.div, using DataFrame.pop to remove the columns after processing:

df = df.groupby(['Col1','Col2'], as_index=False)[['Count','Weightage_count']].sum()
df['new'] = df.pop('Weightage_count').div(df.pop('Count'))
print (df)
  Col1 Col2     new
0    A   S1  0.6250
1    A   S2  0.4375
2    B   S3  1.0000
3    C   S4  0.5000

If you also need the original columns:

df = df.groupby(['Col1','Col2'])[['Count','Weightage_count']].sum()
df['new'] = df['Weightage_count'].div(df['Count'])
print (df)
           Count  Weightage_count     new
Col1 Col2
A    S1        4             2.50  0.6250
     S2        4             1.75  0.4375
B    S3        4             4.00  1.0000
C    S4        3             1.50  0.5000

Group by Calculated Field

I don't think you want to use aggregate functions (and hence GROUP BY) at all. Instead, I think you want something like this:

SELECT acct_id, l_type, num_license, num_active
, q_date - MIN(q_date) OVER ( ) AS num_days
FROM s.table
WHERE flag = 'N';

That is, use MIN() as an analytic (window) function instead. Now, if you want to get a count of accounts for each number of active days, you can do something like this:

SELECT TRUNC(num_days), COUNT(*)
FROM (
    SELECT acct_id, l_type, num_license, num_active
         , q_date - MIN(q_date) OVER ( ) AS num_days
    FROM s.table
    WHERE flag = 'N'
)
GROUP BY TRUNC(num_days);

How to group distinct values and calculate fields in one SQL query

Below is for BigQuery Standard SQL

If you know the product names in advance (like '1', '2', '3' in your example) and there are just a few, you can use the simple version below:

#standardSQL
SELECT name,
  MAX(product = '1') AS has1,
  MAX(product = '2') AS has2,
  MAX(product = '3') AS has3
FROM `project.dataset.table`
GROUP BY name

Applying it to the sample data from your question (I assume your product column is of string type here):

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'Andy' name, '1' product UNION ALL
  SELECT 'Bill', '2' UNION ALL
  SELECT 'Cole', '2' UNION ALL
  SELECT 'Andy', '2' UNION ALL
  SELECT 'Bill', '1' UNION ALL
  SELECT 'Cole', '2' UNION ALL
  SELECT 'Dave', '3'
)
SELECT name,
  MAX(product = '1') AS has1,
  MAX(product = '2') AS has2,
  MAX(product = '3') AS has3
FROM `project.dataset.table`
GROUP BY name

The result is:

Row  name  has1   has2   has3
1    Andy  true   true   false
2    Bill  true   true   false
3    Cole  false  true   false
4    Dave  false  false  true

If the product names are not known in advance and/or there are more than just a few products, the version below can be handy:

EXECUTE IMMEDIATE '''
SELECT name,''' || (
SELECT STRING_AGG(DISTINCT "MAX(product = '" || product || "') AS has" || product)
FROM `project.dataset.table`
) || '''
FROM `project.dataset.table`
GROUP BY name
'''

with the exact same output.

As you can see, the whole query is assembled dynamically, so you don't need to worry about the number of products or their names.
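
For the sample data, the STRING_AGG subquery expands to the list of MAX(...) expressions, so the statement that actually executes is effectively the simple version shown earlier (the order of the aggregated expressions is not guaranteed):

SELECT name,MAX(product = '1') AS has1,MAX(product = '2') AS has2,MAX(product = '3') AS has3
FROM `project.dataset.table`
GROUP BY name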

The following version is identical to the dynamic query above, but easier to manage and read:

EXECUTE IMMEDIATE FORMAT('''
SELECT name, %s
FROM `project.dataset.table`
GROUP BY name
''', (
SELECT STRING_AGG(DISTINCT "MAX(product = '" || product || "') AS has" || product)
FROM `project.dataset.table`
))

Group-by and add new calculated column in Python

You can use shift per group:

# add the previous row's Col3 within each Col1 group; the first row of each group gets 0
df['Col4'] = df['Col3'] + df.groupby('Col1')['Col3'].shift(1).fillna(0)

>>> df
  Col1  Col2  Col3  Col4
0    a     1     2   2.0
1    a     2     3   5.0
2    a     4     6   9.0
3    b     3     7   7.0
4    b     5     1   8.0

group by a calculated field in SQL

What you need to do is use a subquery for your aliased CASE column. With your query as a subquery, you are able to group by the aliased column; note that the aggregate then belongs in the outer query, alongside the GROUP BY.

select product, SUM(col3) as Col3Sum
from
(
    select
        CASE
            WHEN [col1] = 's' THEN '1'
            WHEN [col1] = 't' THEN '2'
            WHEN [col1] = 'u' THEN '3'
            WHEN [col2] = 'v' THEN '4'
        END AS product,
        col3
    FROM dbo.TableA
) a
group by product
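
Alternatively - as in the first answer on this page - you can repeat the CASE expression verbatim in the GROUP BY and skip the subquery (a sketch over the same hypothetical table):

select CASE
           WHEN [col1] = 's' THEN '1'
           WHEN [col1] = 't' THEN '2'
           WHEN [col1] = 'u' THEN '3'
           WHEN [col2] = 'v' THEN '4'
       END AS product,
       SUM(col3) as Col3Sum
FROM dbo.TableA
group by CASE
             WHEN [col1] = 's' THEN '1'
             WHEN [col1] = 't' THEN '2'
             WHEN [col1] = 'u' THEN '3'
             WHEN [col2] = 'v' THEN '4'
         END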

