Selecting First and Last Values in a Group

If you are using MySQL 8, the preferable solution would be to use the window functions FIRST_VALUE() and/or LAST_VALUE(), which are now available. Please have a look at Lukas Eder's answer.
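
A minimal sketch of that approach (assuming MySQL 8+ and the same mystockdata table and columns used in the queries below; the min/max/avg aggregates would still be computed with GROUP BY or with window aggregates):

-- Sketch only: first_value() with a descending order stands in for last_value(),
-- so no explicit frame clause is needed.
select distinct
    concat(year(_date), '-', lpad(week(_date), 2, '0')) as myweek,
    first_value(_open) over (
        partition by concat(year(_date), '-', lpad(week(_date), 2, '0'))
        order by _date
    ) as first_open,
    first_value(_close) over (
        partition by concat(year(_date), '-', lpad(week(_date), 2, '0'))
        order by _date desc
    ) as last_close
from mystockdata
order by myweek;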

But if you're using an older version of MySQL, those functions are not supported and you have to simulate them with a workaround. For example, you can use the aggregate string function GROUP_CONCAT() to build a list of all _open and _close values of the week (ordered by _date for _open and by _date desc for _close), and then extract the first element of each list:

select
    min(_low),
    max(_high),
    avg(_volume),
    concat(year(_date), '-', lpad(week(_date), 2, '0')) as myweek,
    substring_index(group_concat(cast(_open as char) order by _date), ',', 1) as first_open,
    substring_index(group_concat(cast(_close as char) order by _date desc), ',', 1) as last_close
from
    mystockdata
group by
    myweek
order by
    myweek
;

Another solution would make use of subqueries with LIMIT 1 in the SELECT clause:

select
    min(_low),
    max(_high),
    avg(_volume),
    concat(year(_date), '-', lpad(week(_date), 2, '0')) as myweek,
    (
        select _open
        from mystockdata m
        where concat(year(m._date), '-', lpad(week(m._date), 2, '0')) = myweek
        order by m._date
        limit 1
    ) as first_open,
    (
        select _close
        from mystockdata m
        where concat(year(m._date), '-', lpad(week(m._date), 2, '0')) = myweek
        order by m._date desc
        limit 1
    ) as last_close
from
    mystockdata
group by
    myweek
order by
    myweek
;

Please note that I added the LPAD() string function to myweek to make the week number always two digits long; otherwise the weeks won't be ordered correctly.
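
To see why, compare the two formats as plain strings (illustrative values only):

-- Without LPAD, string comparison puts week 9 after week 10:
select '2019-9'  > '2019-10';   -- 1 (true): wrong calendar order
select '2019-09' > '2019-10';   -- 0 (false): correct calendar order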

Also be careful when using SUBSTRING_INDEX() in conjunction with GROUP_CONCAT(): if one of the grouped strings contains a comma, the function might not return the expected result.
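
That cannot happen with the numeric values above, but as a general precaution you can give GROUP_CONCAT() an explicit SEPARATOR that is guaranteed not to occur in the data, and pass the same separator to SUBSTRING_INDEX(), for example:

-- Hedged variant of first_open using '|' instead of the default ',' separator
select
    concat(year(_date), '-', lpad(week(_date), 2, '0')) as myweek,
    substring_index(
        group_concat(cast(_open as char) order by _date separator '|'),
        '|', 1
    ) as first_open
from mystockdata
group by myweek;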

Select first and last row for each group and take the column value difference in MySQL?

Using a MySQL 8.0+ / MariaDB 10.2+ window function:

SELECT symbol,
       LAST - FIRST AS price_change
FROM (
    SELECT DISTINCT symbol,
           first_value(price) OVER w AS FIRST,
           last_value(price)  OVER w AS LAST
    FROM ticks
    WINDOW w AS (PARTITION BY symbol
                 ORDER BY date
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
) AS p

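For context, a hypothetical shape for the ticks table that the query assumes (column names taken from the query itself; the rows are made up purely for illustration):

CREATE TABLE ticks (
    symbol VARCHAR(10),
    price  DECIMAL(10, 2),
    date   DATETIME
);

INSERT INTO ticks (symbol, price, date) VALUES
    ('ABC', 10.00, '2021-01-04 09:30:00'),
    ('ABC', 12.50, '2021-01-04 16:00:00'),
    ('XYZ', 50.00, '2021-01-04 09:30:00'),
    ('XYZ', 48.75, '2021-01-04 16:00:00');

-- With these rows, price_change would be 2.50 for ABC and -1.25 for XYZ.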

Get values from first and last row per group

This is a bit of a pain, because Postgres has the nice window functions first_value() and last_value(), but these are not aggregation functions. So, here is one way:

select t.name,
       min(t.week) as minWeek,
       max(firstvalue) as firstValue,
       max(t.week) as maxWeek,
       max(lastvalue) as lastValue
from (select t.*,
             first_value(value) over (partition by name order by week) as firstvalue,
             -- last_value() needs an explicit frame, otherwise it only sees rows up to the current one
             last_value(value) over (partition by name order by week
                                     rows between unbounded preceding and unbounded following) as lastvalue
      from table t
     ) t
group by t.name;

select first and last record of each group horizontally

I think this is much simpler using window functions and select distinct:

select distinct facility_id, name,
       first_value(value) over (partition by facility_id, name order by created_at asc) as first_value,
       min(created_at) over (partition by facility_id, name) as first_created_at,
       first_value(value) over (partition by facility_id, name order by created_at desc) as last_value,
       max(created_at) over (partition by facility_id, name) as last_created_at
from t;

No subqueries. No joins.

You can also use arrays and group by to accomplish the same functionality in databases that support them (such as Postgres). It is a shame that first_value() is not directly available as an aggregation function.
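
In a database with array support, the array idea mentioned above would look roughly like this (Postgres syntax; a sketch only, with the column names assumed from the query above):

-- array_agg() keeps the per-group ordering, so element [1] is the first/last value
select facility_id, name,
       (array_agg(value order by created_at asc))[1]  as first_value,
       min(created_at)                                as first_created_at,
       (array_agg(value order by created_at desc))[1] as last_value,
       max(created_at)                                as last_created_at
from t
group by facility_id, name;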

get first and last values in a groupby

Option 1

def first_last(df):
    return df.iloc[[0, -1]]

df.groupby(level=0, group_keys=False).apply(first_last)




Option 2 - only works if index is unique

idx = df.index.to_series().groupby(level=0).agg(['first', 'last']).stack()
df.loc[idx]


Option 3 - per notes below, this only makes sense when there are no NAs

I also abused the agg function. The code below works, but is far uglier.

df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
.set_index('level_1', append=True).reset_index(1, drop=True) \
.rename_axis([None, None])


Note

per @unutbu: agg(['first', 'last']) takes the first and last non-NA values.

I interpreted this to mean that it must then be necessary to run this column by column. Further, forcing index level=1 to align may not even make sense.

Let's include another test:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(10, -1),
                  [list('aaaabbbccd'),
                   list('abcdefghij')],
                  list('XY'))

df.loc[tuple('aa'), 'X'] = np.nan


def first_last(df):
    return df.iloc[[0, -1]]

df.groupby(level=0, group_keys=False).apply(first_last)


df.reset_index(1).groupby(level=0).agg(['first', 'last']).stack() \
.set_index('level_1', append=True).reset_index(1, drop=True) \
.rename_axis([None, None])


Sure enough! This second solution is taking the first valid value in column X. It is now nonsensical to have forced that value to align with the index a.

MySQL best way to select first, second and last value in each group

The following code is untested but should give you a good idea:

SELECT
    t1.userID,
    t1.purchaseTime AS first,
    t2.purchaseTime AS `second`,
    t4.purchaseTime AS last
FROM purchaseLog t1
LEFT JOIN purchaseLog t0 ON t1.userID = t0.userID AND t0.purchaseTime < t1.purchaseTime
LEFT JOIN purchaseLog t2 ON t1.userID = t2.userID AND t1.purchaseTime < t2.purchaseTime
LEFT JOIN purchaseLog t3 ON t1.userID = t3.userID AND t1.purchaseTime < t3.purchaseTime
                        AND t3.purchaseTime < t2.purchaseTime
JOIN purchaseLog t4 ON t1.userID = t4.userID AND t1.purchaseTime <= t4.purchaseTime
LEFT JOIN purchaseLog t5 ON t1.userID = t5.userID AND t4.purchaseTime < t5.purchaseTime
WHERE t0.purchaseTime IS NULL AND t3.purchaseTime IS NULL AND t5.purchaseTime IS NULL

Let me break that down step-by-step:

First, I get all the rows for which no earlier row for the same userID exists:

SELECT
    t1.userID,
    t1.purchaseTime AS first
FROM purchaseLog t1
LEFT JOIN purchaseLog t0 ON t1.userID = t0.userID AND t0.purchaseTime < t1.purchaseTime
WHERE t0.purchaseTime IS NULL

Next, I get all the rows with a purchaseTime greater than the first purchaseTime and for which there are no rows with a purchaseTime in between the two:

SELECT
    t1.userID,
    t2.purchaseTime AS `second`
FROM purchaseLog t1
LEFT JOIN purchaseLog t2 ON t1.userID = t2.userID AND t1.purchaseTime < t2.purchaseTime
LEFT JOIN purchaseLog t3 ON t1.userID = t3.userID AND t1.purchaseTime < t3.purchaseTime
                        AND t3.purchaseTime < t2.purchaseTime
WHERE t3.purchaseTime IS NULL

Finally, I get the rows with a purchaseTime greater than or equal to the first for which no greater purchaseTime exists:

SELECT
    t1.userID,
    t4.purchaseTime AS last
FROM purchaseLog t1
JOIN purchaseLog t4 ON t1.userID = t4.userID AND t1.purchaseTime <= t4.purchaseTime
LEFT JOIN purchaseLog t5 ON t1.userID = t5.userID AND t4.purchaseTime < t5.purchaseTime
WHERE t5.purchaseTime IS NULL

Combine them all into one query to get the answer above.
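
If you are on MySQL 8.0+ / MariaDB 10.2+, the same result can also be sketched with ROW_NUMBER() and conditional aggregation instead of the self-joins (an untested sketch, reusing the table and column names from the query above):

SELECT
    userID,
    MAX(CASE WHEN rn_asc  = 1 THEN purchaseTime END) AS first,
    MAX(CASE WHEN rn_asc  = 2 THEN purchaseTime END) AS `second`,
    MAX(CASE WHEN rn_desc = 1 THEN purchaseTime END) AS last
FROM (
    SELECT userID, purchaseTime,
           ROW_NUMBER() OVER (PARTITION BY userID ORDER BY purchaseTime)      AS rn_asc,
           ROW_NUMBER() OVER (PARTITION BY userID ORDER BY purchaseTime DESC) AS rn_desc
    FROM purchaseLog
) ranked
GROUP BY userID;

-- rn_asc = 1 and 2 pick the first and second purchase, rn_desc = 1 picks the last one.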

Using GROUP BY with FIRST_VALUE and LAST_VALUE

SELECT
    MIN(MinuteBar) AS MinuteBar5,
    Opening,
    MAX(High) AS High,
    MIN(Low) AS Low,
    Closing,
    Interval
FROM
(
    SELECT
        FIRST_VALUE([Open]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar) AS Opening,
        FIRST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 ORDER BY MinuteBar DESC) AS Closing,
        DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 AS Interval,
        *
    FROM #MinuteData
) AS T
GROUP BY Interval, Opening, Closing

A solution close to your current one. There are two places where you went wrong:

  1. FIRST_VALUE and LAST_VALUE are analytic (window) functions, which operate on a window or partition rather than on a group. You can run the nested query alone and see its result.

  2. LAST_VALUE returns the last value of the current window, which you did not specify in your query; the default window runs from the first row of the current partition to the current row. You can either use FIRST_VALUE with a descending order, or specify the window explicitly (see the short illustration after this list):

    LAST_VALUE([Close]) OVER (PARTITION BY DATEDIFF(MINUTE, '2015-01-01 00:00:00', MinuteBar) / 5 
    ORDER BY MinuteBar
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Closing,
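
For intuition, here is a minimal, self-contained illustration (hypothetical three-row data, not from the question) contrasting the default frame with an explicit full frame:

-- With the default frame, LAST_VALUE returns the current row's own value;
-- with an explicit full frame, it returns the partition's true last value.
SELECT v,
       LAST_VALUE(v) OVER (ORDER BY v) AS last_default_frame,
       LAST_VALUE(v) OVER (ORDER BY v
                           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_full_frame
FROM (VALUES (1), (2), (3)) AS t(v);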

How to get the first and the last record per group in SQL Server 2008?

How about using ROW_NUMBER:


WITH Cte AS (
    SELECT *,
        RnAsc  = ROW_NUMBER() OVER(PARTITION BY [group] ORDER BY val),
        RnDesc = ROW_NUMBER() OVER(PARTITION BY [group] ORDER BY val DESC)
    FROM tbl
)
SELECT
    id, [group], val, start, [end]
FROM Cte
WHERE RnAsc = 1 OR RnDesc = 1
ORDER BY [group], val

