How to Get Matching Data from Another SQL Table For Two Different Columns: Inner Join And/Or Union

How to get matching data from another SQL table for two different columns: Inner Join and/or Union?

(The following applies when the rows of every table are distinct in the SQL DISTINCT sense, and when code outside SQL similarly treats NULL as just another value.)

Every base table has a statement template, aka predicate, parameterized by column names, by which we put a row in or leave it out. We can use a (standard predicate logic) shorthand for the predicate that is like its SQL declaration.

-- facilitator [facilID] is named [facilFname] [facilLname]
facilitator(facilID, facilLname, facilFname)
-- class [classID] named [className] has prime [primeFacil] & backup [secondFacil]
class(classID, className, primeFacil, secondFacil)

Plugging a row into a predicate gives a statement aka proposition. The rows that make a true proposition go in a table and the rows that make a false proposition stay out. (So a table states the proposition of each present row and states NOT the proposition of each absent row.)

-- facilitator f1 is named Jane Doe
facilitator(f1, 'Doe', 'Jane')
-- class c1 named CSC101 has prime f1 & backup f8
class(c1, 'CSC101', f1, f8)
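As a minimal sketch of "rows that make a true proposition go in the table", the snippet below builds the two base tables in an in-memory SQLite database and checks whether a given row is present. The column types, the helper name facilitator_holds and the use of Python's sqlite3 module are assumptions made for illustration; they are not part of the original answer.

```python
import sqlite3

# In-memory SQLite database; column types are assumed for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- facilitator [facilID] is named [facilFname] [facilLname]
    CREATE TABLE facilitator (facilID TEXT PRIMARY KEY, facilLname TEXT, facilFname TEXT);
    -- class [classID] named [className] has prime [primeFacil] & backup [secondFacil]
    CREATE TABLE class (classID TEXT PRIMARY KEY, className TEXT,
                        primeFacil TEXT, secondFacil TEXT);
    INSERT INTO facilitator VALUES ('f1', 'Doe', 'Jane');
    INSERT INTO class VALUES ('c1', 'CSC101', 'f1', 'f8');
""")

def facilitator_holds(facil_id, lname, fname):
    """Hypothetical helper: True exactly when the row is present in facilitator."""
    row = conn.execute(
        "SELECT 1 FROM facilitator"
        " WHERE facilID = ? AND facilLname = ? AND facilFname = ?",
        (facil_id, lname, fname),
    ).fetchone()
    return row is not None

# Present row: the table states "facilitator f1 is named Jane Doe".
present = facilitator_holds("f1", "Doe", "Jane")
# Absent row: the table states NOT "facilitator f1 is named Jane Smith".
absent = facilitator_holds("f1", "Smith", "Jane")
print(present, absent)
```

Arguments follow the declared column order (facilID, facilLname, facilFname), so the last name comes before the first name.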

Every table expression value also has a predicate, determined by its expression. SQL is designed so that if tables T and U hold the (NULL-free non-duplicate) rows where T(...) and U(...) (respectively) then:

  • T CROSS JOIN U holds rows where T(...) AND U(...)
  • T INNER JOIN U ON condition holds rows where T(...) AND U(...) AND condition
  • T LEFT JOIN U ON condition holds rows where (for U-only columns U1,...)
        T(...) AND U(...) AND condition
     OR T(...)
        AND NOT there EXISTS values for U1,... where [U(...) AND condition]
        AND U1 IS NULL AND ...
  • T WHERE condition holds rows where T(...) AND condition
  • T INTERSECT U holds rows where T(...) AND U(...)
  • T UNION U holds rows where T(...) OR U(...)
  • T EXCEPT U holds rows where T(...) AND NOT U(...)
  • SELECT DISTINCT * FROM T holds rows where T(...)
  • SELECT DISTINCT columns to keep FROM T holds rows where
    there EXISTS values for columns to drop where T(...)
  • VALUES (C1, C2, ...)((v1, v2, ...), ...) holds rows where
    C1 = v1 AND C2 = v2 AND ... OR ...

Also:

  • (...) IN T means T(...)
  • scalar = T means T(scalar)
  • T(..., X, ...) AND X = Y means T(..., Y, ...) AND X = Y
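A few of these correspondences can be checked on toy data. The sketch below assumes two hypothetical one-column tables T and U in an in-memory SQLite database and compares SQL results with Python set operations playing the role of AND, OR and AND NOT. It only illustrates the NULL-free, duplicate-free case described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical one-column tables, NULL-free and duplicate-free.
    CREATE TABLE T (x INTEGER PRIMARY KEY);
    CREATE TABLE U (x INTEGER PRIMARY KEY);
    INSERT INTO T VALUES (1), (2), (3);
    INSERT INTO U VALUES (2), (3), (4);
""")

def rows(sql):
    """Return a query result as a set of row tuples."""
    return set(conn.execute(sql).fetchall())

t = rows("SELECT x FROM T")
u = rows("SELECT x FROM U")

intersect_rows = rows("SELECT x FROM T INTERSECT SELECT x FROM U")  # T(x) AND U(x)
union_rows = rows("SELECT x FROM T UNION SELECT x FROM U")          # T(x) OR U(x)
except_rows = rows("SELECT x FROM T EXCEPT SELECT x FROM U")        # T(x) AND NOT U(x)

# INNER JOIN ... ON condition holds the same rows as CROSS JOIN ... WHERE condition.
inner_rows = rows("SELECT T.x, U.x FROM T INNER JOIN U ON T.x < U.x")
cross_where_rows = rows("SELECT T.x, U.x FROM T CROSS JOIN U WHERE T.x < U.x")

print(intersect_rows == (t & u), union_rows == (t | u), except_rows == (t - u))
print(inner_rows == cross_where_rows)
```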

So to query we find a way of phrasing the predicate for the rows that we want in natural language using base table predicates, then in shorthand using base table predicates, then in shorthand using aliases in column names except for output columns, then in SQL using base table names plus ON & WHERE conditions etc. If we need to mention a base table twice then we give it aliases.

-- natural language
there EXISTS values for classID, primeFacil & secondFacil where
class [classID] named [className]
has prime [primeFacil] & backup [secondFacil]
AND facilitator [primeFacil] is named [pf.facilFname] [pf.facilLname]
AND facilitator [secondFacil] is named [sf.facilFname] [sf.facilLname]

-- shorthand
there EXISTS values for classID, primeFacil & secondFacil where
class(classID, className, primeFacil, secondFacil)
AND facilitator(pf.facilID, pf.facilLname, pf.facilFname)
AND pf.facilID = primeFacil
AND facilitator(sf.facilID, sf.facilLname, sf.facilFname)
AND sf.facilID = secondFacil

-- shorthand using aliases everywhere but result
-- use # to distinguish same-named result columns in specification
there EXISTS values for c.*, pf.*, sf.* where
className = c.className
AND facilLname#1 = pf.facilLname AND facilFname#1 = pf.facilFname
AND facilLname#2 = sf.facilLname AND facilFname#2 = sf.facilFname
AND class(c.classID, c.className, c.primeFacil, c.secondFacil)
AND facilitator(pf.facilID, pf.facilLname, pf.facilFname)
AND pf.facilID = c.primeFacil
AND facilitator(sf.facilID, sf.facilLname, sf.facilFname)
AND sf.facilID = c.secondFacil

-- table names & SQL (with MS Access parentheses)
SELECT className, pf.facilLname, pf.facilFname, sf.facilLname, sf.facilFname
FROM (class JOIN facilitator AS pf ON pf.facilID = primeFacil)
JOIN facilitator AS sf ON sf.facilID = secondFacil
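To see the query at work, here is a small sketch that runs an equivalent statement against an in-memory SQLite database. The sample rows, the facilitator f8 named Sam Roe, the alias c and the output column aliases are invented for illustration. The MS Access parentheses are left out because SQLite does not need them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facilitator (facilID TEXT PRIMARY KEY, facilLname TEXT, facilFname TEXT);
    CREATE TABLE class (classID TEXT PRIMARY KEY, className TEXT,
                        primeFacil TEXT, secondFacil TEXT);
    -- Hypothetical sample rows.
    INSERT INTO facilitator VALUES ('f1', 'Doe', 'Jane'), ('f8', 'Roe', 'Sam');
    INSERT INTO class VALUES ('c1', 'CSC101', 'f1', 'f8');
""")

# facilitator is mentioned twice, so it gets two aliases: pf and sf.
result = conn.execute("""
    SELECT c.className,
           pf.facilLname AS primeLname,  pf.facilFname AS primeFname,
           sf.facilLname AS secondLname, sf.facilFname AS secondFname
    FROM class AS c
    JOIN facilitator AS pf ON pf.facilID = c.primeFacil
    JOIN facilitator AS sf ON sf.facilID = c.secondFacil
""").fetchall()
print(result)
```

Giving the two name pairs distinct output aliases plays the same role as the #1/#2 markers in the specification: it keeps same-named result columns apart.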

An OUTER JOIN would be needed if a class does not always have both facilitators, or if a facilitator does not always have both names (i.e. if a column can be NULL). But you haven't given the specific predicates for your base tables and query, or the business rules about when things might be NULL, so I have assumed no NULLs.
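The difference shows up as soon as one facilitator column is NULL. The following sketch assumes a hypothetical class c2 with no backup facilitator and runs the same query once with an inner join and once with a LEFT JOIN on the backup, again in SQLite with invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE facilitator (facilID TEXT PRIMARY KEY, facilLname TEXT, facilFname TEXT);
    CREATE TABLE class (classID TEXT PRIMARY KEY, className TEXT,
                        primeFacil TEXT, secondFacil TEXT);
    INSERT INTO facilitator VALUES ('f1', 'Doe', 'Jane'), ('f8', 'Roe', 'Sam');
    -- Hypothetical rows: c2 has no backup facilitator.
    INSERT INTO class VALUES ('c1', 'CSC101', 'f1', 'f8'), ('c2', 'CSC102', 'f1', NULL);
""")

# Same query text; only the join type for the backup facilitator varies.
QUERY = """
    SELECT c.className, pf.facilLname, sf.facilLname
    FROM class AS c
    JOIN facilitator AS pf ON pf.facilID = c.primeFacil
    {join} facilitator AS sf ON sf.facilID = c.secondFacil
    ORDER BY c.classID
"""

inner_rows = conn.execute(QUERY.format(join="JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # the class without a backup is dropped
print(left_rows)   # the class without a backup is kept, with NULL (None) for the backup
```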

Is there any rule of thumb to construct SQL query from a human-readable description?

(Re MS Access JOIN parentheses see this from SO and this from MS.)

Unioning two tables with different number of columns

Add the missing columns as NULL for the table that has fewer columns, like this:

Select Col1, Col2, Col3, Col4, Col5 from Table1
Union
Select Col1, Col2, Col3, Null as Col4, Null as Col5 from Table2
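A quick way to try this is an in-memory SQLite database. In the sketch below the table contents are made up, and the ORDER BY is added only to make the output order predictable. It also shows that, in SQLite at least, a UNION with mismatched column counts is rejected outright.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical contents; Table2 lacks Col4 and Col5.
    CREATE TABLE Table1 (Col1, Col2, Col3, Col4, Col5);
    CREATE TABLE Table2 (Col1, Col2, Col3);
    INSERT INTO Table1 VALUES (1, 'a', 'b', 'c', 'd');
    INSERT INTO Table2 VALUES (2, 'x', 'y');
""")

# Pad the narrower table with NULLs so both SELECTs have five columns.
padded_rows = conn.execute("""
    SELECT Col1, Col2, Col3, Col4, Col5 FROM Table1
    UNION
    SELECT Col1, Col2, Col3, NULL AS Col4, NULL AS Col5 FROM Table2
    ORDER BY Col1
""").fetchall()
print(padded_rows)

# Without the padding the statement does not compile.
try:
    conn.execute(
        "SELECT Col1, Col2, Col3, Col4, Col5 FROM Table1"
        " UNION SELECT Col1, Col2, Col3 FROM Table2"
    )
    mismatch_rejected = False
except sqlite3.OperationalError:
    mismatch_rejected = True
print(mismatch_rejected)
```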

How can I merge the columns from two tables into one output?

Specifying the columns on your query should do the trick:

select a.col1, b.col2, a.col3, b.col4, a.category_id 
from items_a a, items_b b
where a.category_id = b.category_id

That takes care of picking the columns you want.

To get around the fact that some data is only in items_a and some data is only in items_b, you would be able to do:

select 
coalesce(a.col1, b.col1) as col1,
coalesce(a.col2, b.col2) as col2,
coalesce(a.col3, b.col3) as col3,
a.category_id
from items_a a, items_b b
where a.category_id = b.category_id

The coalesce function returns its first non-null argument, so for each row, if a.col1 is non-null it uses that, otherwise it falls back to b.col1, and likewise for col2 and col3.
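The fallback behaviour can be checked on one pair of rows. The sketch below uses SQLite with invented data in which items_a only knows col1 and items_b knows col1 and col2. It writes the join with an explicit JOIN ... ON, which for an inner join returns the same rows as the comma-and-WHERE form above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items_a (category_id INTEGER, col1 TEXT, col2 TEXT, col3 TEXT);
    CREATE TABLE items_b (category_id INTEGER, col1 TEXT, col2 TEXT, col3 TEXT);
    -- Hypothetical rows for the same category.
    INSERT INTO items_a VALUES (1, 'a1', NULL, NULL);
    INSERT INTO items_b VALUES (1, 'b1', 'b2', NULL);
""")

row = conn.execute("""
    SELECT COALESCE(a.col1, b.col1) AS col1,  -- a.col1 is set, so it wins
           COALESCE(a.col2, b.col2) AS col2,  -- a.col2 is NULL, so b.col2 is used
           COALESCE(a.col3, b.col3) AS col3,  -- both NULL, so the result is NULL
           a.category_id
    FROM items_a AS a
    JOIN items_b AS b ON a.category_id = b.category_id
""").fetchone()
print(row)
```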

get results from multiple tables using union or left join

It seems the query with UNION ALL is faster than the query with LEFT JOINs (at least for this scenario).

The LEFT JOIN query runs a full scan three times (with nested loops):

[Image: EXPLAIN output for the LEFT JOIN query]

With UNION ALL there are only two table scans:

[Image: EXPLAIN output for the UNION ALL query]

How to do a mysql select query on a table with two columns of foreign keys that relate to another table of names

You must join wp_divisions with wp_players twice:

select
d.Div_id,
p1.display_name player1,
p2.display_name player2
from wp_divisions d
inner join wp_players p1 on p1.ID = d.div_player1_id
inner join wp_players p2 on p2.ID = d.div_player2_id

If there is a case that div_player1_id or div_player2_id is null then use left joins instead of inner joins.

How to make two joins between two tables in MySQL such that they are interlinked to each other?

How to proceed towards the solution

Two joins can be formed between two tables by using table aliases. As the question specifies, one join is to be formed between the employee and the branch table, and another join needs to be formed between the branch and the employee table. The slightly tricky part of these types of joins is the condition specified after the ON keyword that joins the two tables.

As @philipxy writes in a comment to this question:

Constraints (including FKs & PKs) need not hold, be declared or be known in order to record or query. Joins are binary, the left table is the result of any previous joins in a series without parentheses. Except for output column order, inner & cross joins have no direction, t join u on c is u join t on c.
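The claim that inner joins have no direction apart from output column order is easy to check. The sketch below uses two hypothetical tables t and u in SQLite: with an explicit column list both join orders give identical rows, and with SELECT * only the column order differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables with distinct column names.
    CREATE TABLE t (a INTEGER, b TEXT);
    CREATE TABLE u (c INTEGER, d TEXT);
    INSERT INTO t VALUES (1, 'x'), (2, 'y');
    INSERT INTO u VALUES (2, 'p'), (3, 'q');
""")

# Same column list, opposite join order: the row sets are equal.
tu = sorted(conn.execute("SELECT a, b, c, d FROM t JOIN u ON t.a = u.c").fetchall())
ut = sorted(conn.execute("SELECT a, b, c, d FROM u JOIN t ON t.a = u.c").fetchall())

# With SELECT * the only difference is the order of the output columns.
star_tu = conn.execute("SELECT * FROM t JOIN u ON t.a = u.c").fetchall()
star_ut = conn.execute("SELECT * FROM u JOIN t ON t.a = u.c").fetchall()

print(tu == ut)
print(star_tu, star_ut)
```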

So according to the comment, we would form a join between employee and branch, and another join between employee and an alias of the branch table called branch2. The common confusion here is that most people (including me earlier) think that there is a "direction" of joins, which is what philipxy covers in the comment above.

The solution to the problem

You can write a SQL query which queries the first_name, last_name and branch_id from the employee table and the branch_name from the branch table and forms a join between the two tables on the basis of branch_id. You have to query the mgr_id from the alias of the branch table called branch2; you have to query the first_name and the last_name of the branch managers from the employee table. You can easily join the employee and the branch table on the basis of emp_id such that the mgr_id=emp_id.

You can finally write the SQL query for the problem like this:

SELECT employee.first_name, employee.last_name, employee.branch_id,
branch.branch_name,
branch2.mgr_id, employee.first_name AS manager_first_name, employee.last_name AS manager_last_name
FROM employee
JOIN branch ON employee.branch_id=branch.branch_id
JOIN branch branch2 ON branch2.mgr_id=employee.emp_id;

Extra information

The above mentioned query would return this:

+------------+-----------+-----------+-------------+--------+--------------------+-------------------+
| first_name | last_name | branch_id | branch_name | mgr_id | manager_first_name | manager_last_name |
+------------+-----------+-----------+-------------+--------+--------------------+-------------------+
| David      | Wallace   |         1 | Corporate   |    100 | David              | Wallace           |
| Michael    | Scott     |         2 | Scranton    |    102 | Michael            | Scott             |
| Josh       | Porter    |         3 | Stamford    |    106 | Josh               | Porter            |
+------------+-----------+-----------+-------------+--------+--------------------+-------------------+

These results might look useless as we have formed an INNER JOIN between the tables so it just returns us the name of the employees who are "managers" of a specific branch. If you form a LEFT JOIN between the tables instead of an INNER JOIN you would get results like this:

+------------+-----------+-----------+-------------+--------+--------------------+-------------------+
| first_name | last_name | branch_id | branch_name | mgr_id | manager_first_name | manager_last_name |
+------------+-----------+-----------+-------------+--------+--------------------+-------------------+
| David      | Wallace   |         1 | Corporate   |    100 | David              | Wallace           |
| Jan        | Levinson  |         1 | Corporate   |   NULL | Jan                | Levinson          |
| Michael    | Scott     |         2 | Scranton    |    102 | Michael            | Scott             |
| Angela     | Martin    |         2 | Scranton    |   NULL | Angela             | Martin            |
| Kelly      | Kapoor    |         2 | Scranton    |   NULL | Kelly              | Kapoor            |
| Stanley    | Hudson    |         2 | Scranton    |   NULL | Stanley            | Hudson            |
| Josh       | Porter    |         3 | Stamford    |    106 | Josh               | Porter            |
| Andy       | Bernard   |         3 | Stamford    |   NULL | Andy               | Bernard           |
| Jim        | Halpert   |         3 | Stamford    |   NULL | Jim                | Halpert           |
+------------+-----------+-----------+-------------+--------+--------------------+-------------------+

These results are not what we wanted: employees who do not manage any branch get a NULL mgr_id, even though the branch that they work in actually has a manager. The manager_first_name and manager_last_name columns are wrong too, since they just repeat the employee's own name.

This happens because the condition branch2.mgr_id=employee.emp_id compares mgr_id with each employee's own emp_id, so it can match only for employees who are themselves managers. The manager name columns are also read from that same employee row rather than from the manager's row.
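One possible way to list each employee together with the manager of their branch, offered here as a sketch and not as the query from the answer above, is to alias the employee table a second time and join it to branch on mgr_id. The table layout, the aliases e, b and m, and the emp_id values for the non-managers are assumptions made for this illustration in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema; emp_id values 101 and 103 are hypothetical.
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, first_name TEXT,
                           last_name TEXT, branch_id INTEGER);
    CREATE TABLE branch (branch_id INTEGER PRIMARY KEY, branch_name TEXT, mgr_id INTEGER);
    INSERT INTO employee VALUES
        (100, 'David', 'Wallace', 1),
        (101, 'Jan', 'Levinson', 1),
        (102, 'Michael', 'Scott', 2),
        (103, 'Angela', 'Martin', 2);
    INSERT INTO branch VALUES (1, 'Corporate', 100), (2, 'Scranton', 102);
""")

# e is the employee being listed, m is the employee row of that branch's manager.
manager_rows = conn.execute("""
    SELECT e.first_name, e.last_name, b.branch_name,
           m.first_name AS manager_first_name, m.last_name AS manager_last_name
    FROM employee AS e
    JOIN branch AS b ON b.branch_id = e.branch_id
    JOIN employee AS m ON m.emp_id = b.mgr_id
    ORDER BY e.emp_id
""").fetchall()
for r in manager_rows:
    print(r)
```

Here the employee table is the one mentioned twice, so it is the one that needs two aliases. If a branch could lack a manager, the second join would presumably need to be a LEFT JOIN to keep its employees in the result.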

Credits

  • @philipxy's comments on this question

