Weighted Average in T-Sql (Like Excel's Sumproduct)

Weighted average in T-SQL (like Excel's SUMPRODUCT)

Quassnoi's answer shows how to do the SumProduct, and using a WHERE clause would allow you to restrict by a Date field...

SELECT
SUM([tbl].data * [tbl].weight) / SUM([tbl].weight)
FROM
[tbl]
WHERE
[tbl].date >= '2009 Jan 01'
AND [tbl].date < '2010 Jan 01'

The more complex part is where you want to "dynamically specify" the what field is [data] and what field is [weight]. The short answer is that realistically you'd have to make use of Dynamic SQL. Something along the lines of:

- Create a string template

- Replace all instances of [tbl].data with the appropriate data field

- Replace all instances of [tbl].weight with the appropriate weight field

- Execute the string

Dynamic SQL, however, carries it's own overhead. Is the queries are relatively infrequent , or the execution time of the query itself is relatively long, this may not matter. If they are common and short, however, you may notice that using dynamic sql introduces a noticable overhead. (Not to mention being careful of SQL injection attacks, etc.)

EDIT:

In your lastest example you highlight three fields:

  • RecordDate
  • KPI
  • Actual

When the [KPI] is "Weight Y", then [Actual] the Weighting Factor to use.

When the [KPI] is "Tons Milled", then [Actual] is the Data you want to aggregate.


Some questions I have are:

  • Are there any other fields?
  • Is there only ever ONE actual per date per KPI?

The reason I ask being that you want to ensure the JOIN you do is only ever 1:1. (You don't want 5 Actuals joining with 5 Weights, giving 25 resultsing records)

Regardless, a slight simplification of your query is certainly possible...

SELECT
SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
CalcProductionRecords AS [baseSeries]
INNER JOIN
CalcProductionRecords AS [weightSeries]
ON [weightSeries].RecordDate = [baseSeries].RecordDate
-- AND [weightSeries].someOtherID = [baseSeries].someOtherID
WHERE
[baseSeries].KPI = 'Tons Milled'
AND [weightSeries].KPI = 'Weighty'

The commented out line only needed if you need additional predicates to ensure a 1:1 relationship between your data and the weights.


If you can't guarnatee just One value per date, and don't have any other fields to join on, you can modify your sub_query based version slightly...

SELECT
SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
(
SELECT
RecordDate,
SUM(Actual)
FROM
CalcProductionRecords
WHERE
KPI = 'Tons Milled'
GROUP BY
RecordDate
)
AS [baseSeries]
INNER JOIN
(
SELECT
RecordDate,
AVG(Actual)
FROM
CalcProductionRecords
WHERE
KPI = 'Weighty'
GROUP BY
RecordDate
)
AS [weightSeries]
ON [weightSeries].RecordDate = [baseSeries].RecordDate

This assumes the AVG of the weight is valid if there are multiple weights for the same day.


EDIT : Someone just voted for this so I thought I'd improve the final answer :)

SELECT
SUM(Actual * Weight) / SUM(Weight)
FROM
(
SELECT
RecordDate,
SUM(CASE WHEN KPI = 'Tons Milled' THEN Actual ELSE NULL END) AS Actual,
AVG(CASE WHEN KPI = 'Weighty' THEN Actual ELSE NULL END) AS Weight
FROM
CalcProductionRecords
WHERE
KPI IN ('Tons Milled', 'Weighty')
GROUP BY
RecordDate
)
AS pivotAggregate

This avoids the JOIN and also only scans the table once.

It relies on the fact that NULL values are ignored when calculating the AVG().

SQL SUMPRODUCT with a twist

You just should group by country like this

SELECT 
Country
SUM(widget.price * weight.weight) / SUM(weight.weight)
FROM
Widget
INNER JOIN
Weight ON Widget.contract = Weight.contract
WHERE
Weight.contract >= '2016-01-01'
AND Weight.contract <= '2016-12-01'
GROUP BY Country

SUMPRODUCT In SQL

Try:

SELECT Sim, ((A.TypeA * B.TypeA) + (A.TypeB * B.TypeB) + (A.TypeC * B.TypeC)) 'BA Der',
((A.TypeA * C.TypeA) + (A.TypeB * C.TypeB) + (A.TypeC * C.TypeC)) 'BSL ENH'
FROM tbl1 A, tbl2 B, tbl2 C
WHERE B.AssetID = 'BA Der'
AND C.AssetID = 'BSL ENH'

See Demo

MDX Query SUM PROD to do Weighted Average

First of all, you would need an intermediate measure, lets say Rank times Searches, in the cube. The most efficient way to implement this would be to calculate it when processing the measure group. You would extend your fact table by a column e. g. in a view or add a named calculation in the data source view. The SQL expression for this column would be something like Searches * Rank. In the cube definition, you would set the aggregation function of this measure to Sum and make it invisible. Then just define your weighted average as

[Measures].[Rank times Searches] / [Measures].[Searches]

or, to avoid irritating results for zero/null values of searches:

IIf([Measures].[Searches] <> 0, [Measures].[Rank times Searches] / [Measures].[Searches], NULL)

Since Analysis Services 2012 SP1, you can abbreviate the latter to

Divide([Measures].[Rank times Searches], [Measures].[Searches], NULL)

Then the MDX engine will apply everything automatically across all dimensions for you.

In the second expression, the <> 0 test includes a <> null test, as in numerical contexts, NULL is evaluated as zero by MDX - in contrast to SQL.

Finally, as I interpret the link you have in your question, you could leave your measure Rank times Searches on SQL/Data Source View level to be anything, maybe just 0 or null, and would then add the following to your calculation script:

({[Measures].[Rank times Searches]}, Leaves()) = [Measures].[Rank] * [Measures].[Searches];

From my point of view, this solution is not as clear as to directly calculate the value as described above. I would also think it could be slower, at least if you use aggregations for some partitions in your cube.

Conditional Weighted Average using SUMPRODUCT, SUMIF, ISNUMBER

In this set up, which is slightly simplified from yours but essentially the same, I added the last column (E is just C x D) and used this formula

=SUMIFS(E2:E6,A2:A6,"B",B2:B6,"*truck*")/SUMIFS(C2:C6,A2:A6,"B",B2:B6,"*truck*")

Sample Image

If you don't want a helper column, could use this

=SUMPRODUCT((A2:A6="B")*(ISNUMBER(SEARCH("truck",B2:B6))*(C2:C6)*(D2:D6)))/SUMPRODUCT((A2:A6="B")*(ISNUMBER(SEARCH("truck",B2:B6))*(C2:C6)))

SQL Server 2012: Weighted Average Calculation

To calculate WAM and WAIR is pretty simple. Think sumproduct() in Excel

Declare @YourTable table (SaleEventID  int,LID int,CurrentUPB money,[Interest Rate] money,RemainingMonths money)
Insert Into @YourTable values
(1,1,100000.00,6.100, 11.00),
(1,2,67000.00,6.200, 360.00),
(1,3,1400000.00,6.300, 240.00),
(1,4,500000.00,7.000, 100.00),
(2,5,1400000.00,7.100, 240.00),
(2,6,500000.00,7.000,100.00)

Select SaleEventID
,UPB = sum(CurrentUPB)
,WAM = sum(CurrentUPB*RemainingMonths)/sum(CurrentUPB)
,WAIR = sum(CurrentUPB*[Interest Rate] )/sum(CurrentUPB)
From @YourTable
Where SaleEventID = @Event
Group By SaleEventID

Returns

SaleEventID UPB         WAM         WAIR
1 2067000.00 198.9453 6.4564
2 1900000.00 203.1578 7.0736

How to replicate excel sumproduct function correctly in python?

Pandas supports (and enforces) data alignment. When you apply an operation to two data frames, the operation is applied to the rows and columns with the same index (name), not at the same position. To apply operations to a pair of columns with different names, you should extract the underlying numpy arrays from them:

# Clean the NAs
import numpy as np
df1.replace("N/A", np.nan, inplace=True)

(df1[cols_left].fillna(0).values * df1[cols_right].values).sum() / df1[cols_right].sum(1)
#0 15.25

Note that nan * 0 is still a nan. You must convert nans to finite numbers (e.g., to 0s) to obtain a numeric result.



Related Topics



Leave a reply



Submit