How to Get a Minimum Value by Group

Group by minimum value in one field while selecting distinct rows

How about something like:

SELECT mt.*     
FROM MyTable mt INNER JOIN
(
SELECT id, MIN(record_date) AS MinDate
FROM MyTable
GROUP BY id
) t ON mt.id = t.id AND mt.record_date = t.MinDate

This gets the minimum date per ID, and then joins back to the table to get the full rows that carry that date. The only time you would get duplicates is if the same ID has several rows sharing its minimum record_date.
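To see the tie behaviour for yourself, here is a small sketch that runs the same join against an in-memory SQLite database from Python. The payload column and the sample rows are made up for illustration, and the dates are stored as ISO text so that MIN compares them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "payload" is a hypothetical extra column standing in for the rest of the row
conn.execute("CREATE TABLE MyTable (id INTEGER, record_date TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        (1, "2020-01-05", "a"),
        (1, "2020-01-02", "b"),
        (2, "2020-03-01", "c"),
        (2, "2020-03-01", "d"),  # ties with the row above on the minimum date
        (2, "2020-04-01", "e"),
    ],
)

rows = conn.execute(
    """
    SELECT mt.id, mt.record_date, mt.payload
    FROM MyTable mt
    INNER JOIN (
        SELECT id, MIN(record_date) AS MinDate
        FROM MyTable
        GROUP BY id
    ) t ON mt.id = t.id AND mt.record_date = t.MinDate
    ORDER BY mt.id, mt.payload
    """
).fetchall()

print(rows)
```

Id 1 comes back once, while id 2 comes back twice because two of its rows share the minimum date.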

How to get a minimum value by group

Here's a step-by-step base R solution:

# short names for easier handling
x <- test.df$date
g <- test.df$id
# replace every date with the minimum date of its group
split(x, g) <- lapply(split(x, g), min)
# x is still in the original row order, so it can be assigned directly
test.df$first.login <- x
# printing the result
test.df
  id       date first.login
1  1 2016-02-13  2016-02-13
2  1 2016-06-01  2016-02-13
3  1 2016-09-01  2016-02-13
4  3 2015-08-02  2015-08-02
5  3 2015-09-21  2015-08-02
6  3 2016-12-01  2015-08-02
7  3 2017-02-11  2015-08-02

This is actually how ave works internally.
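If the split / apply / put-back idea is easier to follow outside R, here is a rough plain-Python sketch of the same three steps, using the ids and dates from the printed result. It is only an illustration of the idea, not R's actual implementation, and it assumes the dates are ISO strings so that min orders them chronologically.

```python
from collections import defaultdict

ids = [1, 1, 1, 3, 3, 3, 3]
dates = [
    "2016-02-13", "2016-06-01", "2016-09-01",
    "2015-08-02", "2015-09-21", "2016-12-01", "2017-02-11",
]

# split: remember which row positions belong to each group
positions = defaultdict(list)
for i, g in enumerate(ids):
    positions[g].append(i)

# apply + put back: write each group's minimum into every position of that group
first_login = [None] * len(dates)
for idx in positions.values():
    group_min = min(dates[i] for i in idx)
    for i in idx:
        first_login[i] = group_min

print(first_login)
```

Because the minimum is written back by position, the result lines up with the original rows even when the data is not sorted by id.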

Extract row corresponding to minimum value of a variable by group

Slightly more elegant:

library(data.table)
DT[ , .SD[which.min(Employees)], by = State]

   State Company Employees
1:    AK       D        24
2:    RI       E        19

Slightly less elegant than using .SD, but a bit faster (for data with many groups):

DT[DT[ , .I[which.min(Employees)], by = State]$V1]

Also, just replace the expression which.min(Employees) with Employees == min(Employees), if your data set has multiple identical min values and you'd like to subset all of them.

See also Subset rows corresponding to max value by group using data.table.
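The difference between taking the first minimum and taking all tied minimums can be sketched in plain Python as well. The rows below are invented sample data (the original table is not shown in full), with a deliberate tie in AK.

```python
from itertools import groupby
from operator import itemgetter

# (State, Company, Employees); hypothetical rows with a tie in AK
rows = [
    ("AK", "A", 82), ("AK", "D", 24), ("AK", "F", 24),
    ("RI", "E", 19), ("RI", "B", 104),
]
# groupby only groups adjacent items, so sort by state first (the sort is stable)
rows_sorted = sorted(rows, key=itemgetter(0))

first_min = []  # one row per group, like which.min
all_min = []    # every tied row, like Employees == min(Employees)
for state, grp in groupby(rows_sorted, key=itemgetter(0)):
    grp = list(grp)
    smallest = min(r[2] for r in grp)
    first_min.append(min(grp, key=itemgetter(2)))  # min keeps the first of equal items
    all_min.extend(r for r in grp if r[2] == smallest)

print(first_min)
print(all_min)
```

With this data the first variant returns one AK row and the second returns both tied AK rows.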

Select Rows with min value for each group

The best option, though, would be to store the dates in a date column. Then you can use all of the date functions.

CREATE TABLE table1 (
[Date] varchar(10),
[Container ID] INTEGER
);

INSERT INTO table1
([Date], [Container ID])
VALUES
('1/1', '1'),
('2/2', '1'),
('3/3', '1'),
('4/4', '2'),
('5/5', '2'),
('6/6', '3'),
('7/7', '3');
GO
SELECT MIN([Date]), [Container ID] FROM table1 GROUP BY [Container ID]
GO

(No column name) | Container ID
:--------------- | -----------:
1/1 | 1
4/4 | 2
6/6 | 3

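Why the column type matters: on a varchar column, MIN compares the values as strings, which happens to work for the sample rows but stops working once a two-digit month shows up. A quick Python sketch of the same effect, assuming the values are month/day and using an arbitrary year just to build real dates:

```python
from datetime import date

raw = ["2/2", "10/1", "3/3"]  # made-up values in the same style as the table

# string comparison: "10/1" wins because the character "1" sorts before "2"
min_as_text = min(raw)

def to_date(text, year=2023):
    # assumption: the format is month/day; the year is a placeholder
    month, day = text.split("/")
    return date(year, int(month), int(day))

# date comparison: 2 February really is the earliest
min_as_date = min(raw, key=to_date)

print(min_as_text, min_as_date)
```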

How to get min value by group conditional on another group in python?

Use DataFrame.sort_values by all 3 columns and then remove duplicates by id column with DataFrame.drop_duplicates:

df['date'] = pd.to_datetime(df['date'])

df = (df.sort_values(['id', 'value', 'date'], ascending=[True, False, True])
        .drop_duplicates(['id']))
print(df)
   id       date  value
0   1 2020-01-01      1
3   2 2020-01-01      1
5   3 2020-01-05      1
6   4 2020-01-05      0
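The sort-then-keep-first idea does not depend on pandas. Here is a rough standard-library equivalent, with invented records, in case it helps to see what the two calls are doing: sort by id, then value descending, then date ascending, and keep the first row seen for each id.

```python
from datetime import date

# hypothetical records; not the data from the question
records = [
    {"id": 1, "date": date(2020, 1, 1), "value": 1},
    {"id": 1, "date": date(2020, 1, 3), "value": 1},
    {"id": 1, "date": date(2020, 1, 2), "value": 0},
    {"id": 2, "date": date(2020, 1, 4), "value": 0},
    {"id": 2, "date": date(2020, 1, 2), "value": 0},
]

# negate value to get a descending order for that key only
ordered = sorted(records, key=lambda r: (r["id"], -r["value"], r["date"]))

first_per_id = {}
for r in ordered:
    first_per_id.setdefault(r["id"], r)  # keeps only the first row per id

result = list(first_per_id.values())
print(result)
```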

Only keep the minimum value of each group

With .SD:

dataz[,.SD[value==min(value)],by=.(group)]
    group      value
   <char>      <num>
1:    ZAS 0.39590814
2:    Car 0.42591138
3:    EEE 0.07049145
4:   EEff 0.34670793
5:   2133 0.05702904
6:  EETTE 0.31071582

How can I get the minimum values and date of each group?

The most efficient query for this kind of request uses window functions, which EF does not support and, I think, never will.
So just write the SQL and run it via Dapper or something similar.

SELECT 
s.Code,
s.Value,
s.Date
FROM
(
SELECT
t1.Code,
t2.Value,
t2.Date,
ROW_NUMBER() OVER (PARTITION BY t1.Code ORDER BY t2.Value) AS RN
FROM TABLE1 t1
JOIN TABLE3 t3 ON t3.NameOfProtocol = t1.NameOfProtocol
LEFT JOIN TABLE2 t2 ON t1.IdGroup1 = t2.IdGroup1 AND t1.IdGroup2 = t2.IdGroup2 AND t2.FechaCaudalHistorico <= @dateFilter
WHERE t3.Protocol = 'TCP'
) s
WHERE s.RN = 1
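If you want to try the ROW_NUMBER pattern in isolation, here is a cut-down sketch that runs it on an in-memory SQLite database from Python. The readings table and its rows are made up, the joins from the real query are left out, and it assumes the bundled SQLite is 3.25 or newer (the first version with window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# hypothetical single table standing in for the joined result
conn.execute("CREATE TABLE readings (code TEXT, value REAL, date TEXT)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("A", 5.0, "2020-09-14"),
        ("A", 2.5, "2020-09-15"),
        ("B", 7.0, "2020-09-13"),
        ("B", 9.0, "2020-09-16"),
    ],
)

rows = conn.execute(
    """
    SELECT s.code, s.value, s.date
    FROM (
        SELECT code, value, date,
               ROW_NUMBER() OVER (PARTITION BY code ORDER BY value) AS rn
        FROM readings
    ) s
    WHERE s.rn = 1
    ORDER BY s.code
    """
).fetchall()

print(rows)
```

Each code keeps exactly one row, the one with the smallest value, together with the date from that same row.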

If you are not tied to pure EF Core and still need LINQ, you can try the linq2db.EntityFrameworkCore extension, which supports window functions, so the query can be written in LINQ:

var dateFilter = DateTime.Parse("2020-09-16 03:00:00");
var rnQuery =
    from t1 in Table1
    join t3 in Table3 on t1.NameOfProtocol equals t3.NameOfProtocol
    from t2 in Table2.Where(t2 => t1.IdGroup1 == t2.IdGroup1 && t1.IdGroup2 == t2.IdGroup2 && t2.FechaCaudalHistorico <= dateFilter)
        .DefaultIfEmpty()
    where t3.Protocol == "TCP"
    select new
    {
        t1.Code,
        Value = Sql.ToNullable(t2.Value),
        Date = Sql.ToNullable(t2.Date),
        RN = Sql.Ext.RowNumber().Over().PartitionBy(t1.Code).OrderBy(t2.Value).ToValue()
    };

var query = from s in rnQuery
    where s.RN == 1
    select new
    {
        s.Code,
        s.Value,
        s.Date,
    };

// switch to alternative LINQ parser
query = query.ToLinqToDB();

var result = query.ToList();

As a result, you will get SQL identical to the query above.

Pyspark groupBy: Get minimum value for column but retrieve value from different column of same row

Create a window, then use a groupby. The idea is to create the start_tyre_pressure column before doing the groupby, and creating that column requires the window function.

from pyspark.sql import functions as F
from pyspark.sql import Window

w = Window.partitionBy('race_id', 'car_type', 'car_make', 'driver').orderBy('time_recorded')

df.withColumn('start_tyre_pressure', F.first('tyre_pressure').over(w).alias('start_tyre_pressure'))\
.groupby('race_id', 'car_type', 'car_make', 'driver', 'start_tyre_pressure')\
.agg(F.min('time_recorded').alias('start_time'),
F.max('time_recorded').alias('end_time')).show()

Output

+-------+---------+--------+------+-------------------+----------+--------+
|race_id| car_type|car_make|driver|start_tyre_pressure|start_time|end_time|
+-------+---------+--------+------+-------------------+----------+--------+
|      1|automatic|   mazda|   bob|                 31|     09:32|   09:43|
|      2|automatic|    merc| linda|                 33|     10:11|   10:27|
|      1|   manual|    ford|  juan|                 35|     09:32|   09:53|
+-------+---------+--------+------+-------------------+----------+--------+
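For anyone without a Spark session handy, the underlying idea (take the value from the row with the earliest time, alongside the min and max times) can be sketched in plain Python. The readings below are invented, the grouping key is shortened to race_id and driver, and the times are assumed to be zero-padded HH:MM strings so they sort correctly as text.

```python
from collections import defaultdict
from operator import itemgetter

# (race_id, driver, time_recorded, tyre_pressure); made-up readings
readings = [
    (1, "bob", "09:43", 29),
    (1, "bob", "09:32", 31),
    (1, "bob", "09:38", 30),
    (2, "linda", "10:11", 33),
    (2, "linda", "10:27", 32),
]

groups = defaultdict(list)
for race_id, driver, time_recorded, pressure in readings:
    groups[(race_id, driver)].append((time_recorded, pressure))

summary = {}
for key, rows in groups.items():
    first = min(rows, key=itemgetter(0))  # row with the earliest time
    last = max(rows, key=itemgetter(0))   # row with the latest time
    summary[key] = {
        "start_tyre_pressure": first[1],  # pressure taken from the earliest row
        "start_time": first[0],
        "end_time": last[0],
    }

print(summary)
```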

Pandas GroupBy and select rows with the minimum value in a specific column

I feel like you're overthinking this. Just use groupby and idxmin:

df.loc[df.groupby('A').B.idxmin()]

   A  B   C
2  1  2  10
4  2  4   4

df.loc[df.groupby('A').B.idxmin()].reset_index(drop=True)

   A  B   C
0  1  2  10
1  2  4   4

How to Use window function to get minimum value for bunch of groups but grouped in inplace sorted manner

I think I understand what you need: you need to define a group for each block of consecutive types and then get the minimum date per block.

This is a type of gaps-and-islands problem. In the CTE we assign an incrementing row number to rows of type LF only, and subtract from it a row number computed over all rows. The difference is the same for every row in a run of consecutive LF rows, which gives us something to partition by in order to get the minimum date per block.

with grouped as (
select id, date, type,
case when type='LF' then Row_Number() over (partition by id, type order by date) end -Row_Number() over (partition by id order by date) gp
from customer
)
select date, id, type,
case when type='LF' then Min(date) over(partition by gp) end New_Date
from grouped
order by date
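Here is a small sketch that runs this query shape against an in-memory SQLite database from Python, so you can watch the gp values do their job. The sample rows and the XX type are made up, and SQLite 3.25 or newer is assumed for window functions. One deliberate change from the query above: the final window is partitioned by id and gp rather than gp alone, since gp values from different ids could otherwise coincide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, date TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        (1, "2021-01-01", "LF"),
        (1, "2021-01-02", "LF"),
        (1, "2021-01-03", "XX"),  # breaks the run of LF rows
        (1, "2021-01-04", "LF"),
        (1, "2021-01-05", "LF"),
    ],
)

rows = conn.execute(
    """
    WITH grouped AS (
        SELECT id, date, type,
               CASE WHEN type = 'LF'
                    THEN ROW_NUMBER() OVER (PARTITION BY id, type ORDER BY date)
               END
               - ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS gp
        FROM customer
    )
    SELECT date, id, type,
           CASE WHEN type = 'LF'
                THEN MIN(date) OVER (PARTITION BY id, gp)
           END AS new_date
    FROM grouped
    ORDER BY date
    """
).fetchall()

for row in rows:
    print(row)
```

The first two LF rows share gp = 0 and the last two share gp = -1, so each block gets its own minimum date, while the XX row gets NULL.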

