Weighted average in T-SQL (like Excel's SUMPRODUCT)
Quassnoi's answer shows how to do the SumProduct, and using a WHERE clause would allow you to restrict by a Date field...
SELECT
SUM([tbl].data * [tbl].weight) / SUM([tbl].weight)
FROM
[tbl]
WHERE
[tbl].date >= '2009 Jan 01'
AND [tbl].date < '2010 Jan 01'
The more complex part is where you want to "dynamically specify" the what field is [data] and what field is [weight]. The short answer is that realistically you'd have to make use of Dynamic SQL. Something along the lines of:
- Create a string template
- Replace all instances of [tbl].data with the appropriate data field
- Replace all instances of [tbl].weight with the appropriate weight field
- Execute the string
Dynamic SQL, however, carries it's own overhead. Is the queries are relatively infrequent , or the execution time of the query itself is relatively long, this may not matter. If they are common and short, however, you may notice that using dynamic sql introduces a noticable overhead. (Not to mention being careful of SQL injection attacks, etc.)
EDIT:
In your lastest example you highlight three fields:
- RecordDate
- KPI
- Actual
When the [KPI] is "Weight Y", then [Actual] the Weighting Factor to use.
When the [KPI] is "Tons Milled", then [Actual] is the Data you want to aggregate.
Some questions I have are:
- Are there any other fields?
- Is there only ever ONE actual per date per KPI?
The reason I ask being that you want to ensure the JOIN you do is only ever 1:1. (You don't want 5 Actuals joining with 5 Weights, giving 25 resultsing records)
Regardless, a slight simplification of your query is certainly possible...
SELECT
SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
CalcProductionRecords AS [baseSeries]
INNER JOIN
CalcProductionRecords AS [weightSeries]
ON [weightSeries].RecordDate = [baseSeries].RecordDate
-- AND [weightSeries].someOtherID = [baseSeries].someOtherID
WHERE
[baseSeries].KPI = 'Tons Milled'
AND [weightSeries].KPI = 'Weighty'
The commented out line only needed if you need additional predicates to ensure a 1:1 relationship between your data and the weights.
If you can't guarnatee just One value per date, and don't have any other fields to join on, you can modify your sub_query based version slightly...
SELECT
SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
(
SELECT
RecordDate,
SUM(Actual)
FROM
CalcProductionRecords
WHERE
KPI = 'Tons Milled'
GROUP BY
RecordDate
)
AS [baseSeries]
INNER JOIN
(
SELECT
RecordDate,
AVG(Actual)
FROM
CalcProductionRecords
WHERE
KPI = 'Weighty'
GROUP BY
RecordDate
)
AS [weightSeries]
ON [weightSeries].RecordDate = [baseSeries].RecordDate
This assumes the AVG of the weight is valid if there are multiple weights for the same day.
EDIT : Someone just voted for this so I thought I'd improve the final answer :)
SELECT
SUM(Actual * Weight) / SUM(Weight)
FROM
(
SELECT
RecordDate,
SUM(CASE WHEN KPI = 'Tons Milled' THEN Actual ELSE NULL END) AS Actual,
AVG(CASE WHEN KPI = 'Weighty' THEN Actual ELSE NULL END) AS Weight
FROM
CalcProductionRecords
WHERE
KPI IN ('Tons Milled', 'Weighty')
GROUP BY
RecordDate
)
AS pivotAggregate
This avoids the JOIN and also only scans the table once.
It relies on the fact that NULL
values are ignored when calculating the AVG()
.
SQL SUMPRODUCT with a twist
You just should group by country like this
SELECT
Country
SUM(widget.price * weight.weight) / SUM(weight.weight)
FROM
Widget
INNER JOIN
Weight ON Widget.contract = Weight.contract
WHERE
Weight.contract >= '2016-01-01'
AND Weight.contract <= '2016-12-01'
GROUP BY Country
SUMPRODUCT In SQL
Try:
SELECT Sim, ((A.TypeA * B.TypeA) + (A.TypeB * B.TypeB) + (A.TypeC * B.TypeC)) 'BA Der',
((A.TypeA * C.TypeA) + (A.TypeB * C.TypeB) + (A.TypeC * C.TypeC)) 'BSL ENH'
FROM tbl1 A, tbl2 B, tbl2 C
WHERE B.AssetID = 'BA Der'
AND C.AssetID = 'BSL ENH'
See Demo
MDX Query SUM PROD to do Weighted Average
First of all, you would need an intermediate measure, lets say Rank times Searches
, in the cube. The most efficient way to implement this would be to calculate it when processing the measure group. You would extend your fact table by a column e. g. in a view or add a named calculation in the data source view. The SQL expression for this column would be something like Searches * Rank
. In the cube definition, you would set the aggregation function of this measure to Sum
and make it invisible. Then just define your weighted average as
[Measures].[Rank times Searches] / [Measures].[Searches]
or, to avoid irritating results for zero/null values of searches:
IIf([Measures].[Searches] <> 0, [Measures].[Rank times Searches] / [Measures].[Searches], NULL)
Since Analysis Services 2012 SP1, you can abbreviate the latter to
Divide([Measures].[Rank times Searches], [Measures].[Searches], NULL)
Then the MDX engine will apply everything automatically across all dimensions for you.
In the second expression, the <> 0
test includes a <> null
test, as in numerical contexts, NULL is evaluated as zero by MDX - in contrast to SQL.
Finally, as I interpret the link you have in your question, you could leave your measure Rank times Searches
on SQL/Data Source View level to be anything, maybe just 0 or null
, and would then add the following to your calculation script:
({[Measures].[Rank times Searches]}, Leaves()) = [Measures].[Rank] * [Measures].[Searches];
From my point of view, this solution is not as clear as to directly calculate the value as described above. I would also think it could be slower, at least if you use aggregations for some partitions in your cube.
Conditional Weighted Average using SUMPRODUCT, SUMIF, ISNUMBER
In this set up, which is slightly simplified from yours but essentially the same, I added the last column (E is just C x D) and used this formula
=SUMIFS(E2:E6,A2:A6,"B",B2:B6,"*truck*")/SUMIFS(C2:C6,A2:A6,"B",B2:B6,"*truck*")
If you don't want a helper column, could use this
=SUMPRODUCT((A2:A6="B")*(ISNUMBER(SEARCH("truck",B2:B6))*(C2:C6)*(D2:D6)))/SUMPRODUCT((A2:A6="B")*(ISNUMBER(SEARCH("truck",B2:B6))*(C2:C6)))
SQL Server 2012: Weighted Average Calculation
To calculate WAM and WAIR is pretty simple. Think sumproduct() in Excel
Declare @YourTable table (SaleEventID int,LID int,CurrentUPB money,[Interest Rate] money,RemainingMonths money)
Insert Into @YourTable values
(1,1,100000.00,6.100, 11.00),
(1,2,67000.00,6.200, 360.00),
(1,3,1400000.00,6.300, 240.00),
(1,4,500000.00,7.000, 100.00),
(2,5,1400000.00,7.100, 240.00),
(2,6,500000.00,7.000,100.00)
Select SaleEventID
,UPB = sum(CurrentUPB)
,WAM = sum(CurrentUPB*RemainingMonths)/sum(CurrentUPB)
,WAIR = sum(CurrentUPB*[Interest Rate] )/sum(CurrentUPB)
From @YourTable
Where SaleEventID = @Event
Group By SaleEventID
Returns
SaleEventID UPB WAM WAIR
1 2067000.00 198.9453 6.4564
2 1900000.00 203.1578 7.0736
How to replicate excel sumproduct function correctly in python?
Pandas supports (and enforces) data alignment. When you apply an operation to two data frames, the operation is applied to the rows and columns with the same index (name), not at the same position. To apply operations to a pair of columns with different names, you should extract the underlying numpy arrays from them:
# Clean the NAs
import numpy as np
df1.replace("N/A", np.nan, inplace=True)
(df1[cols_left].fillna(0).values * df1[cols_right].values).sum() / df1[cols_right].sum(1)
#0 15.25
Note that nan * 0
is still a nan
. You must convert nan
s to finite numbers (e.g., to 0s) to obtain a numeric result.
Related Topics
Repeating Rows Based on Column Value in Each Row
How to Get Current Database and User Name with 'select' in Postgresql
Copy Data from One Existing Row to Another Existing Row in SQL
Cycle Detection with Recursive Subquery Factoring
PHP Is Truncating Mssql Blob Data (4096B), Even After Setting Ini Values. am I Missing One
Id Best Practices for Databases
Postgres - Comparing Two Arrays
Field Value Must Be Unique Unless It Is Null
Sql Computation of Cosine Similarity
Join Tables Using a Value Inside a JSONb Column
How to Use Max() on a Subquery Result
Get the List of Stored Procedures Created And/Or Modified on a Particular Date
Are There Any Limits on Length of String in MySQL
Order by Month and Year in SQL with Sum
Why Bulk Import Is Faster Than Bunch of Inserts