Sql: Last_Value() Returns Wrong Result (But First_Value() Works Fine)

There is nothing wrong with your script; this is how windowing works in SQL Server. If you change LAST_VALUE to MAX, the result will be the same. A solution would be:

SELECT A.EmpID,  
       First_Value(A.AverageAmount) OVER (PARTITION BY A.EmpID Order by A.DimYearMonthKey asc) AS  '200901AvgAmount', 
       Last_Value(A.AverageAmount) OVER (PARTITION BY A.EmpID Order by A.DimYearMonthKey ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS '201112AvgAmount'  
FROM  Emp_Amt  AS A

There is a great post about it, link. GL!

FIRST_VALUE working correctly but LAST_VALUE not giving desired results

As mentioned by @JeroenMostert, when you add an ORDER BY to an OVER clause (of an analytical function which takes ROWS/RANGE), the default window is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, therefore the last value in the window is always the current row.

What you want is ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING, so you need to add that explicitly. Unintuitive, but that is how it is.
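To see the two frames side by side, here is a minimal sketch. The CTE name Emp_Amt_Sample and its three rows are hypothetical sample data, not taken from the original question; only the two OVER clauses matter.

```sql
-- Hypothetical sample data: three months for one employee.
WITH Emp_Amt_Sample (EmpID, DimYearMonthKey, AverageAmount) AS (
    SELECT *
    FROM (VALUES
        (1, 200901, 10.0),
        (1, 200902, 20.0),
        (1, 200903, 30.0)
    ) AS v (EmpID, DimYearMonthKey, AverageAmount)
)
SELECT EmpID,
       DimYearMonthKey,
       -- Default frame ends at the current row, so this echoes the
       -- current row's AverageAmount: 10.0, 20.0, 30.0.
       LAST_VALUE(AverageAmount) OVER (
           PARTITION BY EmpID ORDER BY DimYearMonthKey
       ) AS last_default_frame,
       -- Explicit frame covers the whole partition: 30.0 on every row.
       LAST_VALUE(AverageAmount) OVER (
           PARTITION BY EmpID ORDER BY DimYearMonthKey
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS last_full_frame
FROM Emp_Amt_Sample
ORDER BY DimYearMonthKey;
```

FIRST_VALUE does not show the problem because the first row of the partition is inside both frames.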

You should also add this to FIRST_VALUE for performance reasons.

SELECT 
    DISTINCT *,
    ROW_NUMBER() OVER (PARTITION BY dog_guid ORDER BY created_at ASC) AS test_number,
    LAG(created_at) OVER (PARTITION BY dog_guid ORDER BY created_at ASC) AS previous_test_date,
    -- Cast before dividing; otherwise the integer division truncates the result
    CAST(DATEDIFF(SECOND, LAG(created_at) OVER (PARTITION BY dog_guid ORDER BY created_at ASC), created_at) AS FLOAT)/(60*60*24) AS date_diff_days,
    CAST(DATEDIFF(SECOND, LAG(created_at) OVER (PARTITION BY dog_guid ORDER BY created_at ASC), created_at) AS FLOAT)/(60) AS date_diff_mins,
    FIRST_VALUE(created_at) OVER (PARTITION BY dog_guid ORDER BY created_at ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS first_test_date,
    LAST_VALUE(created_at) OVER (PARTITION BY dog_guid ORDER BY created_at ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_test_date
FROM    
    complete_tests c
WHERE
    dog_guid IS NOT NULL

Strictly speaking, you could change it to FIRST_VALUE with DESC, but this is going to be bad for performance: firstly because it will need a second sort, and secondly because the window is still RANGE, which requires an on-disk worktable.

LAST_VALUE in SQL Server 2012 is returning weird results

SQL Server doesn't know or care about the order in which rows were inserted into the table. If you need a specific order, always use ORDER BY. In your example the ORDER BY is ambiguous unless you include PK in it. Besides, the LAST_VALUE function can return odd results if you are not careful - see below.

You can get your expected result using MAX or LAST_VALUE (SQLFiddle). They are equivalent in this case:

SELECT
    PK, Id1, Id2
    ,MAX(PK) OVER (PARTITION BY Id1, Id2) AS MaxValue
    ,LAST_VALUE(PK) OVER (PARTITION BY Id1, Id2 ORDER BY PK rows between unbounded preceding and unbounded following) AS LastValue
FROM
    Data
ORDER BY id1, id2, PK

Result of this query will be the same regardless of the order in which rows were originally inserted into the table. You can try to put INSERT statements in different order in the fiddle. It doesn't affect the result.

Also, LAST_VALUE does not behave quite as you'd intuitively expect with the default window (when you have just ORDER BY in the OVER clause). The default window is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, while you'd expect it to be ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. Here is a SO answer with a good explanation. The link to this SO answer is on the MSDN page for LAST_VALUE. So, once the window is specified explicitly in the query, it returns what is needed.
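The ambiguity matters because the default frame, when only ORDER BY is given, is a RANGE frame, and a RANGE frame treats rows with equal ORDER BY values as peers that all end at the same frame boundary. A small sketch with made-up rows (the alias t and the columns k and v are hypothetical):

```sql
-- Two rows share k = 2, so they are peers under a RANGE frame.
SELECT k, v,
       -- Default frame (RANGE ... CURRENT ROW): both k = 2 rows see the
       -- same frame, so they get the same value, whichever peer sorts last.
       LAST_VALUE(v) OVER (ORDER BY k) AS last_range_default,
       -- ROWS frame: each row's frame ends at that physical row,
       -- so each row returns its own v.
       LAST_VALUE(v) OVER (
           ORDER BY k
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS last_rows_current
FROM (VALUES (1, 'a'), (2, 'b'), (2, 'c'), (3, 'd')) AS t (k, v);
```

Which of 'b' or 'c' sorts last among the ties is not guaranteed, which is why adding a unique column such as the PK to the ORDER BY makes the result deterministic.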

If you want to know the order in which rows were inserted into the table, I think, the most simple way is to use IDENTITY. So, definition of your table would change to this:

CREATE TABLE Data 
(PK INT IDENTITY(1,1) PRIMARY KEY,
Id1 INT,
Id2 INT)

When you INSERT into this table you don't need to specify the value for PK; the server generates it automatically. It guarantees that generated values are unique and growing (with a positive increment parameter), even if you have many clients inserting into the table simultaneously. There may be gaps between generated values, but the relative order of the generated values will tell you which row was inserted after which row.
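A minimal sketch of how this looks in use. The temporary table name #DataDemo and the inserted values are made up for illustration:

```sql
-- Same shape as the table above, as a temp table so the sketch cleans up after itself.
CREATE TABLE #DataDemo (
    PK  INT IDENTITY(1,1) PRIMARY KEY,
    Id1 INT,
    Id2 INT
);

-- PK is omitted from the column list; the server assigns it.
INSERT INTO #DataDemo (Id1, Id2) VALUES (1, 10);
INSERT INTO #DataDemo (Id1, Id2) VALUES (1, 20);
INSERT INTO #DataDemo (Id1, Id2) VALUES (2, 10);

-- Ordering by PK reproduces the insertion order.
SELECT PK, Id1, Id2
FROM #DataDemo
ORDER BY PK;

DROP TABLE #DataDemo;
```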

LAST_VALUE() with ASC and FIRST_VALUE with DESC return different results

There's a subtlety on how OVER() functions work when they have an (ORDER BY): They work incrementally.

See this query:

SELECT x, y, 
       FIRST_VALUE(x) OVER(ORDER BY y) first, 
       LAST_VALUE(x) OVER(ORDER BY y DESC) last,
       SUM(x) OVER() plain_sum_over, 
       SUM(x) OVER(ORDER BY y) sum_over_order
FROM (SELECT 1 x, 1 y),(SELECT 2 x, 2 y),(SELECT 3 x, 3 y),(SELECT 4 x, 4 y)

The plain_sum_over and sum_over_order reveal the secret: With an ORDER BY you get incremental results - and that's what you are witnessing in your results.
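The query above uses a comma-separated FROM list that SQL Server does not accept as a union. Assuming you want to reproduce the two SUM columns on SQL Server, a sketch with the same four rows could look like this (the alias t is hypothetical):

```sql
SELECT x, y,
       -- No ORDER BY: the frame is the whole result set, so 10 on every row.
       SUM(x) OVER ()           AS plain_sum_over,
       -- With ORDER BY: the frame grows row by row, giving 1, 3, 6, 10.
       SUM(x) OVER (ORDER BY y) AS sum_over_order
FROM (VALUES (1, 1), (2, 2), (3, 3), (4, 4)) AS t (x, y)
ORDER BY y;
```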

LAST_VALUE function in sql server 2012

You'll need to tell SQL Server which rows to include in the window. By default for these functions this will be "RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", or in shorthand "RANGE UNBOUNDED PRECEDING", meaning include all rows from the start of the window until the current row. Knowing this, the following would give the result you expected.

select a,b,c,
  FIRST_VALUE(c) over (partition by a, b order by c asc 
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as first_date,
  LAST_VALUE(c) over (partition by a, b order by c asc 
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as last_date,
  row_number() over (partition by a, b order by c asc) as rn
from #temp

PS: this gives the same result but is a bit more readable and probably faster.

select a,b,c,
  min(c) over (partition by a, b ) as first_date,
  max(c) over (partition by a, b) as last_date,
  row_number() over (partition by a, b order by c asc) as rn
 from #temp

Why is LAST_VALUE() not working in SQL Server?

I believe you are looking to use ROW_NUMBER() and then get the last value based on the PayPeriodStart date:

SELECT t.EntityId
    ,t.EmployeeId
    ,t.LastIsFullTimeValue
FROM (
    SELECT EntityId
        ,EmployeeId
        ,ROW_NUMBER() OVER (
            PARTITION BY EntityId
            ,EmployeeId ORDER BY PayPeriodStart DESC
            ) AS rn
        ,LastIsFullTimeValue
    FROM dbo.Payroll
    WHERE EmployeeId = 316691   -- you could probably put this in your outer query instead
        AND PayPeriodStart <= '12/31/2014'
        AND PayPeriodEnd >= '1/1/2014'
    ) t
WHERE t.rn = 1;

Unexpected results when using FIRST_VALUE() in SQL Server 2012

The provided examples show very clearly that there is an inconsistency in the implementation of the FIRST_VALUE() analytic function.

The results differ depending on the underlying table in the FROM clause: in one case it is a base table (or a temporary table, a table variable, or even a derived table created on the fly), and in the other it is a derived table (or CTE) produced by a LEFT JOIN of two on-the-fly tables. It seems that the NULL values are ignored in the second case, or treated as high values.

And they shouldn't be different, because the result of a SQL query should not depend on how the FROM clause gets the values of the table it provides to the SELECT clause and also because documentation of the OVER clause clearly states how NULL values should be treated:

order_by_expression
Specifies a column or expression on which to sort. order_by_expression can only refer to columns made available by the FROM clause. An integer cannot be specified to represent a column name or alias.
...
ASC | DESC
Specifies that the values in the specified column should be sorted in ascending or descending order. ASC is the default sort order. Null values are treated as the lowest possible values.

So, the correct results - according to the SQL-Server documentation - are the ones that do not ignore NULL values. Any other result should not happen and since it does happen, it is a bug.
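As a sketch of the documented ordering, with made-up rows (the alias t and the columns id and sort_key are hypothetical): because NULL sorts as the lowest value in ascending order, the row with the NULL key should be the first row of the window.

```sql
SELECT id, sort_key,
       -- Going by the documented rule, the NULL key sorts first,
       -- so first_id should be 2 on every row.
       FIRST_VALUE(id) OVER (ORDER BY sort_key ASC) AS first_id
FROM (VALUES (1, 5), (2, NULL), (3, 2)) AS t (id, sort_key);
```

This simple case only illustrates the expected behaviour; it is not a reproduction of the inconsistency described above, which involved a derived table built with a LEFT JOIN.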

I suggest you test in the most recent version (and not only in the RTM), as the bug may have been identified and corrected in some service pack or update. If it's still there (or if you don't have a newer version available), submit it as a bug on the Connect site.

Update

For future reference, the bug was submitted by the OP. The link is: Connect item and (our) @Aaron Bertrand has commented there that it also appears in most current SQL 2014 builds.

Incorrect results when using last_value()

The default window for FIRST_VALUE and LAST_VALUE, once the OVER clause has an ORDER BY, is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. That is, you get the first or last value "so far".

You want it for the whole dataset, however, so you must explicitly describe the window range:

SELECT DISTINCT
  region,
  FIRST_VALUE(variety) OVER 
    (PARTITION BY region ORDER BY price DESC
     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS expensive,
  LAST_VALUE(variety) OVER 
     (PARTITION BY region ORDER BY price DESC
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS cheapest
FROM wine_list;

Using GROUP BY with FIRST_VALUE and LAST_VALUE

SELECT 
    MIN(MinuteBar) AS MinuteBar5,
    Opening,
    MAX(High) AS High,
    MIN(Low) AS Low,
    Closing,
    Interval
FROM 
(
    SELECT FIRST_VALUE([Open]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar) AS Opening,
           FIRST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar DESC) AS Closing,
           DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval,
           *
    FROM #MinuteData
) AS T
GROUP BY Interval, Opening, Closing

This is a solution close to your current one. There were two mistakes in your query:

1. FIRST_VALUE and LAST_VALUE are analytic functions, which work on a window or partition instead of a group. You can run the nested query alone and see its result.
2. LAST_VALUE is the last value of the current window, which is not specified in your query, and the default window runs from the first row of the current partition to the current row. You can either use FIRST_VALUE with descending order or specify a window:
```
LAST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 
            ORDER BY MinuteBar 
            ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Closing,
```
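To make both points concrete, here is a self-contained sketch. The CTE names MinuteDataSample and Bucketed and the three sample bars are hypothetical. It uses explicit full-partition frames and collapses the per-row window results with DISTINCT rather than GROUP BY, which is one possible variation and not the query from the answer above.

```sql
-- Hypothetical one-minute bars: two fall in the 09:00 bucket, one in 09:05.
WITH MinuteDataSample (MinuteBar, [Open], High, Low, [Close]) AS (
    SELECT *
    FROM (VALUES
        (CAST('2015-01-01 09:00:00' AS DATETIME), 10.0, 11.0,  9.5, 10.5),
        (CAST('2015-01-01 09:01:00' AS DATETIME), 10.5, 12.0, 10.0, 11.5),
        (CAST('2015-01-01 09:05:00' AS DATETIME), 11.5, 11.8, 11.0, 11.2)
    ) AS v (MinuteBar, [Open], High, Low, [Close])
),
Bucketed AS (
    -- Integer division groups minutes into 5-minute intervals.
    SELECT *,
           DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS [Interval]
    FROM MinuteDataSample
)
SELECT DISTINCT  -- window functions return one row per input row
       [Interval],
       FIRST_VALUE([Open]) OVER (
           PARTITION BY [Interval] ORDER BY MinuteBar
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS Opening,
       MAX(High) OVER (PARTITION BY [Interval]) AS High,
       MIN(Low)  OVER (PARTITION BY [Interval]) AS Low,
       LAST_VALUE([Close]) OVER (
           PARTITION BY [Interval] ORDER BY MinuteBar
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS Closing
FROM Bucketed
ORDER BY [Interval];
```

With this sample data the query yields two rows: interval 108 opens at 10.0 and closes at 11.5, and interval 109 opens at 11.5 and closes at 11.2.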
