Is Too Many Left Joins a Code Smell

Is too many Left Joins a code smell?

It's a perfectly legitimate solution for some designs.

Say you have a hierarchy of one-to-many relations like Customer - Order - Basket - Item - Price, etc., which can be unfilled on any level: a Customer may have no Orders, an Order can have no Baskets, etc.

In this case you issue something like:

SELECT  *
FROM Customer c
LEFT OUTER JOIN
Order o
ON o.CustomerID = c.ID
LEFT OUTER JOIN
Basket b
ON b.OrderID = c.ID

Note that it may be inefficient in some cases, and may be replaced with EXISTS or NOT EXISTS (if you only want to figure out that the corresponding records exist or do not exist in other tables).

See this article in my blog for performance details:

  • Finding incomplete orders - how to benefit from replacing LEFT JOIN's with NOT EXISTS

Is a long IN clause a code smell?

I think it is a code smell. For one thing, databases have limits as to the number of elements allowed in an IN clause, and if your SQL is generated dynamically, you may eventually bump up against those limits.

When the list starts to become long-ish, I would convert to using a stored procedure with a temporary table, to avoid any chance of errors.

I doubt performance is a major concern though, IN clauses are very fast, as they can short-circuit, unlike NOT IN clauses.

Query using CASE in left outer join takes long time to run

For proper answer attach full query, table structure (with indexes) and execution plan.

Original CASE is quite complicated, but hard to say if it's responsible for query performance without information from execution plan.

CASE 
WHEN da.unique_key = fo.dimension__first_report__primary_releasing_actor_key IS NOT NULL
THEN da.unique_key = fo.dimension__first_report__primary_releasing_actor_key
ELSE da.unique_key = fo.dimension__first_dictation__dictating_actor_key
END

This case can be transformed to

da.unique_key =  
CASE WHEN da.unique_key = fo.dimension__first_report__primary_releasing_actor_key IS NOT NULL
THEN fo.dimension__first_report__primary_releasing_actor_key
ELSE fo.dimension__first_dictation__dictating_actor_key
END

or even

da.unique_key = coalesce (fo.dimension__first_report__primary_releasing_actor_key ,fo.dimension__first_dictation__dictating_actor_key)

This should give a optimizer (and everybody else) better understanding which column (in da table) is key for joining

More efficient left join of big table

Your query is OK, just create the following indexes:

PeopleExtra (PeopleID) INCLUDE (DataNumber, ElementID)
FormElements (FormComponentID, FormElementID)

Rewriting the join is not required (SQL Server's optimizer can deal with the nested queries just fine), though it can make your query more human-readable.

Why bundle version a control plugin with an IDE?

Integration of the IDE with version control and, in particular, software change management (SCM) helps bringing together the philosophies of the IDE and the source control system.

One example is temporary files and binaries, that should not be checked-in and, e.g. in Visual Studio, end up within the source directory if you're not carefully creating new project and solution templates with a non-default directory configuration.

Another could be tracking of work items and complex bug fixes.

Also it saves some ceremony and context-switching when editing files.

Advanced integrations may also allow to push the change management system's concept of "configuration" ("branch", "tag", "view") into the IDE.

ClearCase integration, however, is clearly not "advanced".

Improve the speed of this string_agg?

The subquery (within a subquery) you have has a code "smell" to it that it's been written with intention, but not correctly.

Firstly you have 2 LEFT JOINs in the subquery, however, both the tables aliased as P3 and PP3 are required to have a non-NULL value; that is impossible if no related row is found. This means the JOINs are implicit INNER JOINs.

Next you have a DISTINCT against a single column when SELECTing from multiple tables; this seems wrong. DISTINCT is very expensive and the fact you are using it implies that either NAME is not unique or that due to your implicit INNER JOINs you are getting duplicate rows. I assume it's the latter. As a results, very likely you should actually be using an EXISTS, not LEFT JOINs INNER JOINs.

The following is very much a guess, but I suspect it will be more performant.

SELECT BOM.*, --Replace this with an explicit list of the columns you need
SA.childParentPartProjectName
INTO #tt2
FROM #tt1 BOM
CROSS APPLY (SELECT STRING_AGG(Prj.NAME, ', ') WITHIN GROUP (ORDER BY Prj.NAME) AS childParentPartProjectName
FROM dbo.Project Prj --Don't use an alias that is longer than the object name
WHERE EXISTS (SELECT 1
FROM dbo.Part P
JOIN dbo.PartProject PP ON P.ID = PP.SOURCE_ID
WHERE PP.Related_ID = Prg.ID
AND P.ITEM_NUMBER = BOM.childParentPart
AND P.Current = 1)) SA;

Replace OUTER APPLY

You can add a ROW_NUMBER to your subquery (and remove the TOP 1). Then you can use a LEFT JOIN.

Something like this:

SELECT e.Id,
Decision.Comment,
Decision.DATE,
Decision.IsRejected,
Decision.CommentedBy
FROM core.Event e
LEFT JOIN (
SELECT ESH.Event_StatusHistory_Comment [Comment],
ESH.Event_StatusHistory_Date [Date],
ESH.Event_StatusHistory_IsRejected [IsRejected],
U.[Name] [CommentedBy],
ROW_NUMBER() OVER (PARTITON BY e2.ID ORDER BY ESH.Event_StatusHistory_Date) as RN
FROM core.[Event] e2
JOIN core.Event_StatusHistory ESH
ON ESH.EventId = e2.Id
JOIN core.[User] U
ON ESH.Event_StatusHistory_UserId = U.Id
) Decision
ON e.id = Decision.id
AND Decision.RN = 1;


Related Topics



Leave a reply



Submit