Inner vs. Outer Joins

Christian Emmer
Christian Emmer
Apr 29, 2022 · 16 min read
Inner vs. Outer Joins

They can't be used interchangeably, and the differences are important.

Across a few different companies now I've seen developers use outer joins, specifically left outer joins, as the default for how to join tables in a SQL query. I'll outline some dangers with that here.

The differences

Inner joins are used to find the commonality between two or more tables. When an inner join is used, results will only be returned when matches are found between the two tables. This is the default type of join for most databases and is the join that people will want to use the gross majority of the time.

Outer joins are used to optionally join two or more tables together. Depending on the direction used ("left" or "right"), one of the tables will have all of its results returned even if there are no matches in the other table. This has very specific use cases and should not be used by default.

Outer joins aren't commutative

Consider this example where a customer can have some orders associated with them:

customersintidPKstringnameordersintidPKintcustomer_idFKnumbertotalcan have many

customers: idname
1Isabel
2Roy
3Dora
orders: idcustomer_idtotal
12 (Roy)51.26
23 (Dora)27.20
33 (Dora)92.17

Isabel has no orders, Roy has one order, and Dora has two orders.

Consider this SQL to query all orders, complete with customer info. It makes use of a "left outer join":

SELECT o.id
     , c.name
     , o.total
FROM orders o
LEFT OUTER JOIN customers c ON o.customer_id = c.id;

You will get the 3 expected orders rows along with the associated customer's name:

idnametotal
1Roy51.26
2Dora27.20
3Dora92.17

But if you swap the order of those two tables you get different results:

SELECT o.id
     , c.name
     , o.total
FROM customers c
LEFT OUTER JOIN orders o ON o.customer_id = c.id;
idnametotal
1Roy51.26
2Dora27.20
3Dora92.17
(null)Isabel(null)

Why is that? It's because "outer" joins do not require matches between the two tables. The direction of the outer join ("left" or "right") determines which table will have all of its rows returned, and which table will have rows optionally matched.

In the last query above, the customers table is on the "left" and because we used a LEFT OUTER JOIN we will have every row from that table returned. The orders table is on the "right" so it will be optionally joined to customers.

We can construct a RIGHT OUTER JOIN that will return the same results, where the table specified on the "right" will have all of its rows returned:

SELECT o.id
     , c.name
     , o.total
FROM orders o
RIGHT OUTER JOIN customers c ON o.customer_id = c.id;
idnametotal
1Roy51.26
2Dora27.20
3Dora92.17
(null)Isabel(null)

Dangers with outer joins

Let's add a third table to our example to represent what reward tier a customer has earned:

customersintidPKstringnamecustomer_rewardsintidPKintcustomer_idFKintreward_levelordersintidPKintcustomer_idFKnumbertotalcan have onecan have many

customer_rewards: idcustomer_idreward_level
11 (Isabel)1
22 (Roy)1
33 (Dora)2

Let's say we want to find the number of orders created by VIP customers (those reward level 2 and above), the total revenue from those orders, and the average revenue per order - all using outer joins:

SELECT count(*)                AS order_count
     , sum(o.total)            AS total_revenue
     , sum(o.total) / count(*) AS average_revenue
FROM customers c
LEFT OUTER JOIN customer_rewards cr ON cr.customer_id = c.id
LEFT OUTER JOIN orders o ON o.customer_id = c.id
WHERE cr.reward_level >= 2;
order_counttotal_revenueaverage_revenue
2119.3759.685

That result looks right. Dora is the only VIP customer, and she has two orders.

Now let's try that query again for non-VIP customers:

SELECT count(*)                AS order_count
     , sum(o.total)            AS total_revenue
     , sum(o.total) / count(*) AS average_revenue
FROM customers c
LEFT OUTER JOIN customer_rewards cr ON cr.customer_id = c.id
LEFT OUTER JOIN orders o ON o.customer_id = c.id
WHERE cr.reward_level = 1;
order_counttotal_revenueaverage_revenue
251.2625.63

Something is wrong! Isabel and Roy are the two non-VIP customers, and they only have one order between the two of them.

It's because customers appears on the "left" of orders, so all customer rows will be returned regardless of the customer having any orders or not. If we change all joins to be "inner" then we get the expected result:

SELECT count(*)                AS order_count
     , sum(o.total)            AS total_revenue
     , sum(o.total) / count(*) AS average_revenue
FROM customers c
INNER JOIN customer_rewards cr ON cr.customer_id = c.id
INNER JOIN orders o ON o.customer_id = c.id
WHERE cr.reward_level = 1;
order_counttotal_revenueaverage_revenue
151.2651.26

Use cases for outer joins

When optional/nullable results are acceptable or expected.

There is no restriction in our database that every customer must have a reward level. We might have a use case for querying every customer in the database, also including their reward level, if it exists.

If we add a fourth customer Lloyd, but we don't give him a reward level, here would be our results:

SELECT c.id
     , c.name
     , cr.reward_level
FROM customers c
LEFT JOIN customer_rewards cr ON cr.customer_id = c.id;
idnamereward_level
1Isabel1
2Roy1
3Dora2
4Lloyd(null)

When a category/grouping is well-defined but related data is sparse.

If we add a new table countries which contains the five most populous countries in North America:

countries: idname
1United States
2Mexico
3Canada
4Guatemala
5Cuba

And assign each of our customers a country:

customers: idnamecountry_id
1Isabel2 (Mexico)
2Roy1 (United States)
3Dora3 (Canada)
4Lloyd1 (United States)

We can find how many customers we have in every country, even if there are none in the country:

SELECT co.name
     , count(cu.id)
FROM countries co
LEFT OUTER JOIN customers cu ON cu.country_id = co.id
GROUP BY 1
ORDER BY 2 DESC;
namecount
United States2
Canada1
Mexico1
Cuba0
Guatemala0

Join syntax for different databases

In both PostgreSQL and MySQL the default join type is "inner", and if you specify "left" or "right" it automatically implies "outer". Each of these pairs of queries are semantically the same:

SELECT * FROM customers c JOIN orders o ON o.customer_id = c.id;
SELECT * FROM customers c INNER JOIN orders o ON o.customer_id = c.id;

SELECT * FROM customers c LEFT JOIN orders o ON o.customer_id = c.id;
SELECT * FROM customers c LEFT OUTER JOIN orders o ON o.customer_id = c.id;

SELECT * FROM orders o RIGHT JOIN customers c ON o.customer_id = c.id;
SELECT * FROM orders o RIGHT OUTER JOIN customers c ON o.customer_id = c.id;

Summary

There are other types of joins such as "cross" joins and "full outer" joins that we didn't cover here, but hopefully this helps clarify when and where you would want to use an outer join over an inner join.