Group by minimum value in one field while selecting distinct rows
How about something like:
SELECT mt.*
FROM MyTable mt
INNER JOIN
(
    SELECT id, MIN(record_date) AS MinDate
    FROM MyTable
    GROUP BY id
) t ON mt.id = t.id AND mt.record_date = t.MinDate
This gets the minimum date per ID, and then retrieves the rows matching those dates. The only time you would get duplicates is if the same ID has multiple rows with the same minimum record_date.
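For comparison, the same join-on-minimum pattern can be sketched in pandas (hypothetical column names matching the SQL):

```python
import pandas as pd

mt = pd.DataFrame({
    'id':          [1, 1, 2, 2],
    'record_date': pd.to_datetime(['2020-01-05', '2020-01-01',
                                   '2020-03-01', '2020-02-01']),
    'value':       ['a', 'b', 'c', 'd'],
})

# minimum record_date per id, like the derived table t in the SQL
t = mt.groupby('id', as_index=False)['record_date'].min()

# inner join back on id + record_date, like the SQL join condition
result = mt.merge(t, on=['id', 'record_date'])
```

As in the SQL, an ID with two rows sharing the same minimum record_date would produce duplicates here too.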
How to get a minimum value by group
Here's a step-by-step R base solution:
# rename for easier handling
x <- test.df$date
g <- test.df$id
# getting min
split(x, g) <- lapply(split(x, g), min)
# merging
test.df$first.login <- do.call("c", split(x, g))
# print the result
test.df
id date first.login
1 1 2016-02-13 2016-02-13
2 1 2016-06-01 2016-02-13
3 1 2016-09-01 2016-02-13
4 3 2015-08-02 2015-08-02
5 3 2015-09-21 2015-08-02
6 3 2016-12-01 2015-08-02
7 3 2017-02-11 2015-08-02
Actually, this is how ave works internally.
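The same idea, broadcasting each group's minimum back onto every row, is a one-liner in pandas with transform (a sketch with made-up data mirroring the example above):

```python
import pandas as pd

test_df = pd.DataFrame({
    'id':   [1, 1, 3],
    'date': pd.to_datetime(['2016-02-13', '2016-06-01', '2015-08-02']),
})

# broadcast the per-group minimum back onto every row,
# like ave(x, g, FUN = min) in R
test_df['first_login'] = test_df.groupby('id')['date'].transform('min')
```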
Extract row corresponding to minimum value of a variable by group
Slightly more elegant:
library(data.table)
DT[ , .SD[which.min(Employees)], by = State]
State Company Employees
1: AK D 24
2: RI E 19
Slightly less elegant than using .SD, but a bit faster (for data with many groups):
DT[DT[ , .I[which.min(Employees)], by = State]$V1]
Also, just replace the expression which.min(Employees) with Employees == min(Employees) if your data set has multiple identical minimum values and you'd like to subset all of them.
See also Subset rows corresponding to max value by group using data.table.
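The ties-friendly variant (Employees == min(Employees)) translates to pandas as a boolean mask against a transformed minimum; a sketch with invented data where one group has a tie:

```python
import pandas as pd

DT = pd.DataFrame({
    'State':     ['AK', 'AK', 'RI'],
    'Company':   ['D', 'X', 'E'],
    'Employees': [24, 24, 19],
})

# keep every row equal to its group's minimum, so ties are all retained
result = DT[DT['Employees'] == DT.groupby('State')['Employees'].transform('min')]
```

Here both AK rows survive because they tie at 24; a which.min-style idxmin would keep only the first.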
Select Rows with min value for each group
But it would be best to save the dates in a DATE column. Then you can use all the date functions on them:
CREATE TABLE table1 (
[Date] varchar(10),
[Container ID] INTEGER
);
INSERT INTO table1
([Date], [Container ID])
VALUES
('1/1', '1'),
('2/2', '1'),
('3/3', '1'),
('4/4', '2'),
('5/5', '2'),
('6/6', '3'),
('7/7', '3');
GO
SELECT MIN([Date]), [Container ID] FROM table1 GROUP BY [Container ID]
GO
(No column name) | Container ID
:--------------- | -----------:
1/1 | 1
4/4 | 2
6/6 | 3
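A quick illustration (in Python, just for the comparison logic) of why storing dates as text eventually mis-sorts, which is the reason a real DATE column is recommended above:

```python
from datetime import date

# stored as text, comparison is character-by-character,
# so '10/01' sorts before '2/02'
dates_as_text = ['2/02', '10/01']
print(min(dates_as_text))   # '10/01' -- lexicographic, not chronological

# stored as real dates, min() is chronological
dates = [date(2021, 2, 2), date(2021, 10, 1)]
print(min(dates))           # 2021-02-02
```

The sample data above happens to work only because every month/day is a single digit.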
How to get min value by group conditional on another group in python?
Use DataFrame.sort_values by all 3 columns and then remove duplicates by the id column with DataFrame.drop_duplicates:
df['date'] = pd.to_datetime(df['date'])
df = (df.sort_values(['id','value','date'], ascending=[True, False, True])
.drop_duplicates(['id']))
print (df)
id date value
0 1 2020-01-01 1
3 2 2020-01-01 1
5 3 2020-01-05 1
6 4 2020-01-05 0
Only keep the minimum value of each group
With .SD:
dataz[,.SD[value==min(value)],by=.(group)]
group value
<char> <num>
1: ZAS 0.39590814
2: Car 0.42591138
3: EEE 0.07049145
4: EEff 0.34670793
5: 2133 0.05702904
6: EETTE 0.31071582
How can I get the minimum values and date of each group?
The most effective query for such a request uses window functions, which are not supported by EF and, I think, never will be. So just use raw SQL and run it via Dapper or similar.
SELECT
s.Code,
s.Value,
s.Date
FROM
(
SELECT
t1.Code,
t2.Value,
t2.Date,
ROW_NUMBER() OVER (PARTITION BY t1.Code ORDER BY t2.Value) AS RN
FROM TABLE1 t1
JOIN TABLE3 t3 ON t3.NameOfProtocol = t1.NameOfProtocol
LEFT JOIN TABLE2 t2 ON t1.IdGroup1 = t2.IdGroup1 AND t1.IdGroup2 = t2.IdGroup2 AND t2.FechaCaudalHistorico <= @dateFilter
WHERE t3.Protocol = 'TCP'
) s
WHERE s.RN = 1
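The ROW_NUMBER-per-partition-then-keep-RN-1 pattern used above is worth knowing outside SQL too; here is a pandas sketch of it with hypothetical Code/Value/Date columns:

```python
import pandas as pd

df = pd.DataFrame({
    'Code':  ['A', 'A', 'B', 'B'],
    'Value': [5, 3, 7, 9],
    'Date':  pd.to_datetime(['2020-01-01', '2020-01-02',
                             '2020-01-03', '2020-01-04']),
})

# like ROW_NUMBER() OVER (PARTITION BY Code ORDER BY Value)
df['RN'] = df.groupby('Code')['Value'].rank(method='first').astype(int)

# like WHERE s.RN = 1
result = df[df['RN'] == 1].drop(columns='RN')
```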
If you are not a pure EF Core adept and still need LINQ, you can try the linq2db.EntityFrameworkCore extension, which has this capability; the query can be written via LINQ:
var dateFilter = DateTime.Parse ("2020-09-16 03:00:00");
var rnQuery =
from t1 in Table1
join t3 in Table3 on t1.NameOfProtocol equals t3.NameOfProtocol
from t2 in Table2.Where(t2 => t1.IdGroup1 == t2.IdGroup1 && t1.IdGroup2 == t2.IdGroup2 && t2.FechaCaudalHistorico <= dateFilter)
.DefaultIfEmpty()
where t3.Protocol == "TCP"
select new
{
t1.Code,
Value = Sql.ToNullable(t2.Value),
Date = Sql.ToNullable(t2.Date),
RN = Sql.Ext.RowNumber().Over().PartitionBy(t1.Code).OrderBy(t2.Value).ToValue()
};
var query = from s in rnQuery
where s.RN == 1
select new
{
s.Code,
s.Value,
s.Date,
};
// switch to alternative LINQ parser
query = query.ToLinqToDB();
var result = query.ToList();
As a result, you will get SQL identical to that described above.
Pyspark groupBy: Get minimum value for column but retrieve value from different column of same row
Create a window function, then use a groupby. The idea is to create the start_tyre_pressure column before doing the groupby; to create this column we need the window function.
from pyspark.sql import functions as F
from pyspark.sql import Window
w = Window.partitionBy('race_id', 'car_type', 'car_make', 'driver').orderBy('time_recorded')
df.withColumn('start_tyre_pressure', F.first('tyre_pressure').over(w))\
.groupby('race_id', 'car_type', 'car_make', 'driver', 'start_tyre_pressure')\
.agg(F.min('time_recorded').alias('start_time'),
F.max('time_recorded').alias('end_time')).show()
Output
+-------+---------+--------+------+-------------------+----------+--------+
|race_id| car_type|car_make|driver|start_tyre_pressure|start_time|end_time|
+-------+---------+--------+------+-------------------+----------+--------+
| 1|automatic| mazda| bob| 31| 09:32| 09:43|
| 2|automatic| merc| linda| 33| 10:11| 10:27|
| 1| manual| ford| juan| 35| 09:32| 09:53|
+-------+---------+--------+------+-------------------+----------+--------+
Pandas GroupBy and select rows with the minimum value in a specific column
I feel like you're overthinking this. Just use groupby and idxmin:
df.loc[df.groupby('A').B.idxmin()]
A B C
2 1 2 10
4 2 4 4
df.loc[df.groupby('A').B.idxmin()].reset_index(drop=True)
A B C
0 1 2 10
1 2 4 4
How to Use window function to get minimum value for bunch of groups but grouped in inplace sorted manner
I think I understand what you need: define a group for each block of types and then get the minimum date per block.
This is a type of gaps-and-islands problem. In the CTE we assign an incrementing row number for only type LF, and subtract from it a row number for all rows, this results in the same value for consecutive types. This then provides the mechanism to partition/group by in order to get the minimum date per block of consecutive types.
with grouped as (
    select id, date, type,
           case when type = 'LF'
                then Row_Number() over (partition by id, type order by date)
           end - Row_Number() over (partition by id order by date) as gp
    from customer
)
select date, id, type,
       case when type = 'LF' then Min(date) over (partition by gp) end as New_Date
from grouped
order by date
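The same gaps-and-islands trick, marking where the type changes and cumulating to get one id per consecutive block, can be sketched in pandas (invented sample data, column names matching the SQL):

```python
import pandas as pd

df = pd.DataFrame({
    'id':   [1, 1, 1, 1, 1],
    'date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03',
                            '2021-01-04', '2021-01-05']),
    'type': ['LF', 'LF', 'XX', 'LF', 'LF'],
})
df = df.sort_values(['id', 'date'])

# consecutive runs of the same type get one block id:
# True at each change point, cumsum turns runs into block numbers
block = (df['type'] != df['type'].shift()).cumsum().rename('block')

# minimum date per block, kept only for type 'LF' (NaT elsewhere),
# like the CASE WHEN ... MIN(date) OVER (PARTITION BY gp) in the SQL
df['New_Date'] = (df.groupby(['id', block])['date']
                    .transform('min')
                    .where(df['type'] == 'LF'))
```

The two LF runs get their own block numbers even though the type value repeats, which is exactly what the row-number subtraction achieves in SQL.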