MacLochlainns Weblog

Michael McLaughlin's Technical Blog


MySQL Join Tutorial


Some believe the most important part of SQL is the ability to query data. Queries typically retrieve data by joining many tables together into useful result sets. This tutorial takes the position that visibility into the data helps those new to SQL understand how joins work. To that end, the queries use Common Table Expressions (CTEs) instead of tables.

The default behavior of a JOIN without a qualifying keyword (such as INNER, LEFT, RIGHT, or CROSS) is not simple because it may return:

  • A CROSS JOIN (or Cartesian Product) when there is no ON or USING subclause, or
  • An INNER JOIN when you use an ON or USING subclause.

The following query uses JOIN without a qualifier or an ON or USING subclause. It also uses two copies of a single CTE, which behaves much like a derived table: the result of a subquery held in memory. This demonstrates the key reason for table aliases, which is that they let you put two copies of the same table in memory under different identifiers or labels.

WITH alpha AS
 (SELECT 'A' AS letter, 130 AS amount
  UNION
  SELECT 'B' AS letter, 150 AS amount
  UNION
  SELECT 'C' AS letter, 321 AS amount)
SELECT * FROM alpha a JOIN alpha b;

It returns a Cartesian product:

+--------+--------+--------+--------+
| letter | amount | letter | amount |
+--------+--------+--------+--------+
| A      |    130 | A      |    130 |
| B      |    150 | A      |    130 |
| C      |    321 | A      |    130 |
| A      |    130 | B      |    150 |
| B      |    150 | B      |    150 |
| C      |    321 | B      |    150 |
| A      |    130 | C      |    321 |
| B      |    150 | C      |    321 |
| C      |    321 | C      |    321 |
+--------+--------+--------+--------+
9 rows in set (0.00 sec)
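MySQL treats JOIN, CROSS JOIN, and INNER JOIN as syntactic equivalents, so spelling out CROSS JOIN produces the same nine rows while making the intent obvious to the next reader. The sketch below also qualifies every column with the a and b aliases; the output column names (a_letter, a_amount, and so on) are illustrative choices, not anything from the original query.

```sql
-- Same 3 x 3 Cartesian product, written with the explicit CROSS JOIN keyword.
-- The a and b aliases keep the two copies of alpha distinguishable,
-- and the illustrative column aliases make the output headers unique.
WITH alpha AS
 (SELECT 'A' AS letter, 130 AS amount
  UNION
  SELECT 'B' AS letter, 150 AS amount
  UNION
  SELECT 'C' AS letter, 321 AS amount)
SELECT a.letter AS a_letter
,      a.amount AS a_amount
,      b.letter AS b_letter
,      b.amount AS b_amount
FROM   alpha a CROSS JOIN alpha b;
```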

By adding an ON clause on line 8, the unqualified JOIN keyword returns an INNER JOIN result.

WITH alpha AS
 (SELECT 'A' AS letter, 130 AS amount
  UNION
  SELECT 'B' AS letter, 150 AS amount
  UNION
  SELECT 'C' AS letter, 321 AS amount)
SELECT * FROM alpha a JOIN alpha b
ON a.letter = b.letter;

It displays the following results:

+--------+--------+--------+--------+
| letter | amount | letter | amount |
+--------+--------+--------+--------+
| A      |    130 | A      |    130 |
| B      |    150 | B      |    150 |
| C      |    321 | C      |    321 |
+--------+--------+--------+--------+
3 rows in set (0.00 sec)
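The same inner join can use the USING subclause mentioned earlier instead of ON. A minimal sketch follows; one difference worth knowing is that, with USING, MySQL's SELECT * shows the shared letter column only once, followed by the amount column from each copy, so you should see three columns rather than four across the same three rows.

```sql
-- Inner join on the shared letter column, written with USING instead of ON.
-- SELECT * returns letter once, then a.amount, then b.amount.
WITH alpha AS
 (SELECT 'A' AS letter, 130 AS amount
  UNION
  SELECT 'B' AS letter, 150 AS amount
  UNION
  SELECT 'C' AS letter, 321 AS amount)
SELECT * FROM alpha a JOIN alpha b USING (letter);
```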

The next example uses two CTEs. One uses the letters 'A', 'B', 'C', and 'D', and the other uses the letters 'A', 'B', 'C', and 'E'. The letter 'D' exists only in the alpha derived table, and the letter 'E' exists only in the beta derived table. The amount column values differ for the same letters in the two CTEs.

The basic query below the comma-delimited CTEs joins the alpha and beta derived tables with an INNER JOIN, using an ON clause that compares the letter column values found in both CTEs.

WITH alpha AS
 (SELECT 'A' AS letter, 130 AS amount
  UNION
  SELECT 'B' AS letter, 150 AS amount
  UNION
  SELECT 'C' AS letter, 321 AS amount
  UNION
  SELECT 'D' AS letter, 783 AS amount)
, beta AS
 (SELECT 'A' AS letter, 387 AS amount
  UNION
  SELECT 'B' AS letter, 268 AS amount
  UNION
  SELECT 'C' AS letter, 532 AS amount
  UNION
  SELECT 'E' AS letter, 391 AS amount)
SELECT * FROM alpha a INNER JOIN beta b
ON a.letter = b.letter;

The INNER JOIN returns only those rows in the alpha and beta CTEs where the letter column values match:

+--------+--------+--------+--------+
| letter | amount | letter | amount |
+--------+--------+--------+--------+
| A      |    130 | A      |    387 |
| B      |    150 | B      |    268 |
| C      |    321 | C      |    532 |
+--------+--------+--------+--------+
3 rows in set (0.01 sec)

If you change line 17, the SELECT statement, from an INNER JOIN to a LEFT JOIN, you return all the rows from the alpha CTE and only those rows from the beta CTE that have a matching letter column value. The new line 17 for a LEFT JOIN is:

SELECT * FROM alpha a LEFT JOIN beta b

It returns the three matching rows plus the one non-matching row from the alpha CTE, which is on the left side of the LEFT JOIN operator. Note that a left outer join puts null values into the beta CTE columns when no beta row matches, as happens for the letter 'D' from the alpha CTE.

The results are shown below:

+--------+--------+--------+--------+
| letter | amount | letter | amount |
+--------+--------+--------+--------+
| A      |    130 | A      |    387 |
| B      |    150 | B      |    268 |
| C      |    321 | C      |    532 |
| D      |    783 | NULL   |   NULL |
+--------+--------+--------+--------+
4 rows in set (0.01 sec)
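Those null values are useful on their own. Filtering the LEFT JOIN for a null in a beta column keeps only the alpha rows that found no partner, a pattern usually called an anti-join. The sketch below assumes, as holds for this sample data, that beta.letter is never null in the source rows, so a null there can only come from a missing match; with this data it should return just the 'D' row.

```sql
-- Anti-join: alpha rows with no matching letter in beta.
WITH alpha AS
 (SELECT 'A' AS letter, 130 AS amount
  UNION
  SELECT 'B' AS letter, 150 AS amount
  UNION
  SELECT 'C' AS letter, 321 AS amount
  UNION
  SELECT 'D' AS letter, 783 AS amount)
, beta AS
 (SELECT 'A' AS letter, 387 AS amount
  UNION
  SELECT 'B' AS letter, 268 AS amount
  UNION
  SELECT 'C' AS letter, 532 AS amount
  UNION
  SELECT 'E' AS letter, 391 AS amount)
SELECT a.letter, a.amount
FROM   alpha a LEFT JOIN beta b
ON     a.letter = b.letter
WHERE  b.letter IS NULL;  -- null here means no beta row matched
```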

If you change line 17 from a LEFT JOIN to a RIGHT JOIN, you return all the rows from the beta CTE and only those rows from the alpha CTE that have a matching letter column value. The new line 17 for a RIGHT JOIN is:

SELECT * FROM alpha a RIGHT JOIN beta b

It returns the following result set:

+--------+--------+--------+--------+
| letter | amount | letter | amount |
+--------+--------+--------+--------+
| A      |    130 | A      |    387 |
| B      |    150 | B      |    268 |
| C      |    321 | C      |    532 |
| NULL   |   NULL | E      |    391 |
+--------+--------+--------+--------+
4 rows in set (0.00 sec)
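A RIGHT JOIN is a LEFT JOIN with its operands swapped, and the MySQL manual recommends LEFT JOIN for portability. The following sketch returns the same four rows by putting beta on the left; the explicit select list restores the alpha-then-beta column order, although the row order in the output may differ.

```sql
-- RIGHT JOIN rewritten as a LEFT JOIN with the CTEs swapped.
WITH alpha AS
 (SELECT 'A' AS letter, 130 AS amount
  UNION
  SELECT 'B' AS letter, 150 AS amount
  UNION
  SELECT 'C' AS letter, 321 AS amount
  UNION
  SELECT 'D' AS letter, 783 AS amount)
, beta AS
 (SELECT 'A' AS letter, 387 AS amount
  UNION
  SELECT 'B' AS letter, 268 AS amount
  UNION
  SELECT 'C' AS letter, 532 AS amount
  UNION
  SELECT 'E' AS letter, 391 AS amount)
SELECT a.letter, a.amount, b.letter, b.amount
FROM   beta b LEFT JOIN alpha a   -- beta is now the preserved (left) side
ON     a.letter = b.letter;
```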

MySQL does not support a FULL JOIN (FULL OUTER JOIN) operation, but you can mimic one by combining a LEFT JOIN and a RIGHT JOIN with the UNION operator. The UNION operator returns only distinct rows, which reduces the two copies of the matching rows, one returned by the left join and one by the right join, to a single set.

This is one way to write the equivalent of a full join:

WITH alpha AS
 (SELECT 'A' AS letter, 130 AS amount
  UNION
  SELECT 'B' AS letter, 150 AS amount
  UNION
  SELECT 'C' AS letter, 321 AS amount
  UNION
  SELECT 'D' AS letter, 783 AS amount)
, beta AS
 (SELECT 'A' AS letter, 387 AS amount
  UNION
  SELECT 'B' AS letter, 268 AS amount
  UNION
  SELECT 'C' AS letter, 532 AS amount
  UNION
  SELECT 'E' AS letter, 391 AS amount)
SELECT * FROM alpha LEFT JOIN beta
ON alpha.letter = beta.letter
UNION
SELECT * FROM alpha RIGHT JOIN beta
ON alpha.letter = beta.letter;

It returns one copy of each matching row plus the non-matching rows from both the alpha and beta CTEs:

+--------+--------+--------+--------+
| letter | amount | letter | amount |
+--------+--------+--------+--------+
| A      |    130 | A      |    387 |
| B      |    150 | B      |    268 |
| C      |    321 | C      |    532 |
| D      |    783 | NULL   |   NULL |
| NULL   |   NULL | E      |    391 |
+--------+--------+--------+--------+
5 rows in set (0.00 sec)
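Because UNION removes every duplicate row, it would also collapse rows that are legitimately repeated in the source data. An alternative sketch, not the approach used above, combines the LEFT JOIN with UNION ALL and keeps only the unmatched rows from the RIGHT JOIN, so no duplicate removal is needed. With this sample data it should return the same five rows.

```sql
-- Full outer join emulation without relying on UNION's duplicate removal.
WITH alpha AS
 (SELECT 'A' AS letter, 130 AS amount
  UNION
  SELECT 'B' AS letter, 150 AS amount
  UNION
  SELECT 'C' AS letter, 321 AS amount
  UNION
  SELECT 'D' AS letter, 783 AS amount)
, beta AS
 (SELECT 'A' AS letter, 387 AS amount
  UNION
  SELECT 'B' AS letter, 268 AS amount
  UNION
  SELECT 'C' AS letter, 532 AS amount
  UNION
  SELECT 'E' AS letter, 391 AS amount)
-- Every alpha row, matched or not.
SELECT a.letter, a.amount, b.letter, b.amount
FROM   alpha a LEFT JOIN beta b
ON     a.letter = b.letter
UNION ALL
-- Only the beta rows the left join could not match.
SELECT a.letter, a.amount, b.letter, b.amount
FROM   alpha a RIGHT JOIN beta b
ON     a.letter = b.letter
WHERE  a.letter IS NULL;
```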

A NATURAL JOIN would return no rows here. It works by implicitly discovering every column name the two CTEs share (both letter and amount) and then joining on equality of all of those columns. While the letter column values match between the CTEs, the amount column values never do, and a NATURAL JOIN returns a row only when both the letter and amount values match.
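The following sketch lets you confirm that behavior. Because the CTEs share the letter and amount column names, this NATURAL JOIN behaves like JOIN beta USING (letter, amount), and it should report an empty set.

```sql
-- NATURAL JOIN matches on every shared column name: letter AND amount.
-- No letter/amount pair appears in both CTEs, so no rows come back.
WITH alpha AS
 (SELECT 'A' AS letter, 130 AS amount
  UNION
  SELECT 'B' AS letter, 150 AS amount
  UNION
  SELECT 'C' AS letter, 321 AS amount
  UNION
  SELECT 'D' AS letter, 783 AS amount)
, beta AS
 (SELECT 'A' AS letter, 387 AS amount
  UNION
  SELECT 'B' AS letter, 268 AS amount
  UNION
  SELECT 'C' AS letter, 532 AS amount
  UNION
  SELECT 'E' AS letter, 391 AS amount)
SELECT * FROM alpha NATURAL JOIN beta;
```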

You can also override the cost-based optimizer and force MySQL to read the left table before the right table by using the STRAIGHT_JOIN operator. As always, I hope this helps those looking for a solution with an explanation.
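A minimal STRAIGHT_JOIN sketch against the same CTEs follows. STRAIGHT_JOIN is an inner join, so it should return the same three matching rows as the INNER JOIN example; prefixing the statement with EXPLAIN should show alpha read first. With CTEs this small the forced order makes no practical difference, so treat this as a syntax illustration rather than a tuning example.

```sql
-- Inner join that forces MySQL to read alpha (left) before beta (right).
WITH alpha AS
 (SELECT 'A' AS letter, 130 AS amount
  UNION
  SELECT 'B' AS letter, 150 AS amount
  UNION
  SELECT 'C' AS letter, 321 AS amount
  UNION
  SELECT 'D' AS letter, 783 AS amount)
, beta AS
 (SELECT 'A' AS letter, 387 AS amount
  UNION
  SELECT 'B' AS letter, 268 AS amount
  UNION
  SELECT 'C' AS letter, 532 AS amount
  UNION
  SELECT 'E' AS letter, 391 AS amount)
SELECT *
FROM   alpha a STRAIGHT_JOIN beta b
ON     a.letter = b.letter;
```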

Written by maclochlainn

January 26th, 2021 at 10:55 pm

Posted in MySQL, MySQL 8, sql
