Distinct() and Orderby Issue

From the Queryable.Distinct documentation:

The expected behavior is that it returns an unordered sequence of the unique items in source.

In other words, any order the existing IQueryable has is lost when you use Distinct() on it.

What you probably want is something more like this, with the OrderBy() applied after the Distinct():

var query = (from o in db.Orders
             select new
             {
                 o.CustomerID
             }).Distinct().OrderBy(x => x.CustomerID);

How to use DISTINCT and ORDER BY in same SELECT statement?

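The problem is that the columns used in the ORDER BY aren't among the columns listed in the SELECT DISTINCT. To sort on such a column, you need to wrap it in an aggregate function and use a GROUP BY on the selected column, so that each group collapses to a single row with a single value to sort on.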

Try something like this:

SELECT DISTINCT Category, MAX(CreationDate) 
FROM MonitoringJob
GROUP BY Category
ORDER BY MAX(CreationDate) DESC, Category
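
As a rough check of this approach, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the sample rows are assumptions made up for illustration; only the table and column names come from the query above.

```python
import sqlite3

# In-memory database with made-up sample rows (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MonitoringJob (Category TEXT, CreationDate TEXT)")
conn.executemany(
    "INSERT INTO MonitoringJob VALUES (?, ?)",
    [
        ("backup", "2020-01-01"),
        ("sync", "2020-02-01"),
        ("backup", "2020-03-01"),
    ],
)

# One row per Category, sorted by each category's newest CreationDate.
rows = conn.execute(
    """
    SELECT Category, MAX(CreationDate)
    FROM MonitoringJob
    GROUP BY Category
    ORDER BY MAX(CreationDate) DESC, Category
    """
).fetchall()

for category, newest in rows:
    print(category, newest)
```

GROUP BY already yields one row per Category, so adding DISTINCT to a query like this does not change its result.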

SQL: Distinct and OrderBy issue

There are a few things to note here. I'm really not fond of "natural joins" as they simply disguise useful detail in my view, so I have not used them. I had to assume that the table "GROUP" joins via CONCENTRATOR_GROUP for an example of that missing detail.

The table name "GROUP" isn't a great idea, as it is a very commonly used reserved word. I'd not recommend using such a word as a table name. Because of this, "GROUP" is quoted (it isn't normal to quote object names in Oracle, in my experience).

You talk about "distinct" as if it has some magical quality that I should intuitively understand. It doesn't, and I don't. Let's say there are just 2 departments; both are also "distinct":

DeptX
DeptY

So now let's assume there are 2 concentrators, both of these are "distinct" too:

ConcenA
ConcenB

Both concentrators are used in both departments, so we produce this query:

select distinct 
c.name as c_name, d.name as d_name
from concentrators c
inner join departments d on c.dept_id=d.dept_id

The result is:

ConcenA DeptX
ConcenB DeptX
ConcenA DeptY
ConcenB DeptY

All 4 rows are "distinct"

The point is that "select distinct" is a "row operator", i.e. it considers the entire row to determine if any part of the row is different to all other rows. There are no subtleties or options to "select distinct", it always works the same way (over the entire row). So, with this in mind, we now know that "select distinct" simply is not going to be the right technique (and due to the technical definition of distinct you might also sense it isn't a good way to describe your problem either).
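
The whole-row behaviour can be reproduced with a small sketch using Python's built-in sqlite3 module. The table layout and the dept_id values are assumptions chosen to match the two-department, two-concentrator example above.

```python
import sqlite3

# Hypothetical schema mirroring the example: each concentrator appears in both departments.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE departments (dept_id INTEGER, name TEXT);
    CREATE TABLE concentrators (name TEXT, dept_id INTEGER);
    INSERT INTO departments VALUES (1, 'DeptX'), (2, 'DeptY');
    INSERT INTO concentrators VALUES
        ('ConcenA', 1), ('ConcenB', 1), ('ConcenA', 2), ('ConcenB', 2);
    """
)

# DISTINCT compares whole rows, so every (concentrator, department) pair survives.
pairs = conn.execute(
    """
    SELECT DISTINCT c.name AS c_name, d.name AS d_name
    FROM concentrators c
    INNER JOIN departments d ON c.dept_id = d.dept_id
    ORDER BY d_name, c_name
    """
).fetchall()

# With only one column selected, the "row" is just the name, so duplicates collapse.
names_only = conn.execute(
    "SELECT DISTINCT name FROM concentrators ORDER BY name"
).fetchall()

print(len(pairs), len(names_only))
```

The first query keeps all 4 rows, while the second returns only the 2 concentrator names, because DISTINCT only ever looks at the columns that are actually selected.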

So, as "select distinct" isn't the right technique, one can typically turn to "group by" or "row_number()" instead, because these do give us subtleties and options.

Now you haven't explained why or how you would choose just one department (in fact, to me, it sounds weird that you would choose just one), but below I offer you A way to do this using row_number(). The "subtlety" being used is the ORDER BY, which gives the number 1 to the first Department Name in alphabetic order; all other departments get a number greater than 1. This occurs for each CONCENTRATOR_ID because row_number() is "partitioned by" that field.

SELECT
      department_name
    , type_name
    , NAME
    , CONCENTRATOR_ID
    , INTERNALADDRESS
    , TYPE_ID
    , DEPARTMENT_ID
FROM (
    SELECT
          d.NAME AS department_name
        , t.NAME AS type_name
        , c.CONCENTRATOR_ID
        , c.NAME
        , c.INTERNALADDRESS
        , c.TYPE_ID
        , c.DEPARTMENT_ID
        , ROW_NUMBER() OVER (PARTITION BY c.CONCENTRATOR_ID
                             ORDER BY d.NAME, t.NAME, c.NAME) AS RN
    FROM CONCENTRATOR c
    LEFT OUTER JOIN CONCENTRATOR_GROUP cg
        ON c.CONCENTRATOR_ID = cg.CONCENTRATOR_ID
    LEFT OUTER JOIN "GROUP" g
        ON cg.GROUP_ID = g.GROUP_ID
    LEFT OUTER JOIN TYPE t
        ON c.TYPE_ID = t.TYPE_ID
    LEFT OUTER JOIN DEPARTMENT d
        ON c.DEPARTMENT_ID = d.DEPARTMENT_ID
) sq
WHERE RN = 1 /* HERE is where we restrict output to one department per concentrator */
ORDER BY
      NAME ASC
    , CONCENTRATOR_ID
;

I had no reason to change the type of joins, so as you can see they remain left outer joins, but I suspect there may be no valid reason for some or all of them. Do use the more efficient INNER JOIN if you can.
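
To see the row_number() technique in isolation, here is a minimal sketch using Python's built-in sqlite3 module (it assumes the bundled SQLite is version 3.25 or later, which is when window functions were added). The single concentrator_usage table and its rows are hypothetical and much simpler than the real schema.

```python
import sqlite3

# Hypothetical, simplified data: each concentrator is linked to two departments.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concentrator_usage (concentrator TEXT, department TEXT);
    INSERT INTO concentrator_usage VALUES
        ('ConcenA', 'DeptY'), ('ConcenA', 'DeptX'),
        ('ConcenB', 'DeptY'), ('ConcenB', 'DeptX');
    """
)

# Number the departments alphabetically within each concentrator,
# then keep only the row numbered 1 for each concentrator.
first_department = conn.execute(
    """
    SELECT concentrator, department
    FROM (
        SELECT concentrator
             , department
             , ROW_NUMBER() OVER (PARTITION BY concentrator
                                  ORDER BY department) AS rn
        FROM concentrator_usage
    ) sq
    WHERE rn = 1
    ORDER BY concentrator
    """
).fetchall()

for concentrator, department in first_department:
    print(concentrator, department)
```

Each concentrator comes back exactly once, paired with the department that sorts first; changing the ORDER BY inside the OVER clause changes which department is kept.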

Entity Framework Distinct with OrderBy

Your problem is that you are sorting before using Distinct. From your comments, you said you can't order by Id because it's not available after the projection. You would then have to include the Id in the Select, but since that Id is unique per order, the Distinct would no longer remove duplicate partner entries.

Your only option is to use the Id field to preserve some order (since Ids are incremental) with an aggregate function.

var list = db.Orders.AsNoTracking()
    .GroupBy(order => order.PartnerId)
    // One row per partner, keeping the highest (most recent) order Id.
    .Select(g => new {
        PartnerId = g.Key,
        LastId = g.Max(order => order.Id)
    })
    .OrderByDescending(x => x.LastId)
    .Take(20)
    .Select(x => x.PartnerId)
    .ToList();

This works as Ids are usually incremental, but a date column would usually be preferred for this query.

I hope this helps.

Using distinct on a column and doing order by on another column gives an error

As far as I understand your question:

DISTINCT means that all selected values must be unique.
ORDER BY simply orders the selected rows as you require.

The problem with your first query is easiest to see with an example.
Say I have this table:

ID name
01 a
02 b
03 c
04 d
04 a

Now the query select distinct(ID) from table order by (name) cannot tell which record it should use for ID 04, since there are two values for it (d and a) in the name column. That is the problem the DB engine faces when you say order by (name).
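
One way to remove the ambiguity is to state explicitly which name should represent each ID, for example the smallest or the largest. The sketch below uses Python's built-in sqlite3 module with the sample table above; the table name items and the choice of MIN/MAX are assumptions for illustration.

```python
import sqlite3

# The sample table from the example above (table name "items" is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("01", "a"), ("02", "b"), ("03", "c"), ("04", "d"), ("04", "a")],
)

# Sort each distinct id by its smallest name: id 04 is represented by 'a'.
by_min = [row[0] for row in conn.execute(
    "SELECT id FROM items GROUP BY id ORDER BY MIN(name), id"
)]

# Sort each distinct id by its largest name: id 04 is represented by 'd'.
by_max = [row[0] for row in conn.execute(
    "SELECT id FROM items GROUP BY id ORDER BY MAX(name), id"
)]

print(by_min)
print(by_max)
```

The two queries return the same distinct IDs in different orders, which is exactly the choice the engine cannot make on its own.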

Does the performance change when I switch Distinct() and OrderBy() in a LINQ Query?

For LINQ to Objects, even if we assume that OrderBy(...).Distinct() and Distinct().OrderBy(...) will return the same result (which is not guaranteed), the performance will depend on the data.

If you have a lot of duplication in the data, running Distinct first should be faster, because the sort then has fewer items to process. The following benchmark shows that (at least on my machine):

using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class LinqBench
{
    // 1,000 items: the values 1..100, each repeated 10 times and interleaved.
    private static List<int> test = Enumerable.Range(1, 100)
        .SelectMany(i => Enumerable.Repeat(i, 10))
        .Select((i, index) => (i, index))
        .OrderBy(t => t.index % 10)
        .Select(t => t.i)
        .ToList();

    [Benchmark]
    public List<int> OrderByThenDistinct() => test.OrderBy(i => i).Distinct().ToList();

    [Benchmark]
    public List<int> DistinctThenOrderBy() => test.Distinct().OrderBy(i => i).ToList();
}

On my machine for .Net Core 3.1 it gives:

Method               Mean       Error     StdDev
OrderByThenDistinct  129.74 us  2.120 us  1.879 us
DistinctThenOrderBy  19.58 us   0.384 us  0.794 us

SQL Select Distinct Values, but order by a different value

If there are multiple rows for the order, which date do you want to show? Perhaps the latest one:

SELECT [orderId], MAX([datetime])
FROM [table]
GROUP BY [orderId]
ORDER BY MAX([datetime]) DESC

Django order_by() is not working along with distinct()

Which database are you using? Passing field names to distinct() is only supported on PostgreSQL.

Try:

data = list(table_name.objects.filter(experience=experience)\
.values('run_id', 'end_time').distinct('run_id').order_by('run_id', '-end_time'))[::1]

