SQL Self-Joins with Examples

Hold up, another join?! 😵‍💫

Yes, you guessed it right! After learning about the different joins in this tutorial, now it's time to learn self-joins!

Self-joins might sound complex, but they're not. It's just like a table joining hands with itself. Think of it as a table of Amazon transactions joining with another table of Amazon transactions. Don't worry, it's simpler than it seems! 😉

In this tutorial, we'll explore self-joins using the goodreads dataset in this Goodreads Books SQL question.

Self-Joins Example: Personalizing Book Suggestions

Imagine you're part of the Goodreads Books team and you're designing a book recommendation system that provides users with personalized book suggestions tailored to their preferences. In other words, if someone loves romance, you want to give them more romantic options!

Finding Similar Books Within the Same Genre

Here's the query that generates meaningful book recommendations:

SELECT b1.genre, b1.book_title AS current_book, b2.book_title AS suggested_book FROM goodreads AS b1 INNER JOIN goodreads AS b2 ON b1.genre = b2.genre WHERE b1.book_id != b2.book_id ORDER BY b1.book_title;

Here's the breakdown:

  1. We're using a self-join on the genre column to find books of the same genre.
  2. The WHERE clause ensures that we don't suggest the same book as itself.
  3. The result will show their genres, the book they're reading, and the suggested book from the same genre.
current_genrecurrent_booksuggested_book
Non-FictionAce the Data Science InterviewData Engineering Cookbook
Non-FictionAce the Data Science InterviewBuilding Data-Intensive Apps with Flask and SQLAlchemy
Non-FictionAce the Data Science InterviewBlink: The Power of Thinking Without Thinking
Non-FictionAce the Data Science InterviewThe Power of Habit
Non-FictionAce the Data Science InterviewStorytelling with Data: A Data Visualization Guide for Business Professionals

Cool, right? 🤓 You're suggesting awesome books to readers based on their genre preferences!

Expanding the Suggestions: A Second Book

What if you want to suggest a second book? Well, just like that, we'll add another twist:

Taking Book Recommendations to the Next Level

‼️ Hold on, though! Be cautious when dealing with large tables. Joining a table to itself can create a lot of matches, which might slow things down. In this case, we're talking about 2,886 rows, so we've got a trick up our sleeve—a LIMIT clause:

SELECT b1.genre, b1.book_title AS current_book, b2.book_title AS suggested_book_1, b3.book_title AS suggested_book_2 FROM goodreads AS b1 INNER JOIN goodreads AS b2 ON b1.genre = b2.genre INNER JOIN goodreads as b3 ON b1.genre = b3.genre WHERE b1.book_id != b2.book_id AND b1.book_id != b3.book_id AND b2.book_id != b3.book_id ORDER BY b1.book_title, LIMIT 50;

Here's what's happening:

  1. We've levelled up with 2 additional self-joins (INNER JOIN goodreads b3 ON b1.genre = b3.genre) to generate a 2nd suggested book.
  2. The conditions (b1.book_id != b3.book_id and b2.book_id != b3.book_id) ensure that the 2nd suggestion is different from both the 1st suggestion and the current book.
current_genrecurrent_booksuggested_book_1suggested_book_2
Non-FictionAce the Data Science InterviewPython for Data AnalysisDesigning Data-Intensive Applications
Non-FictionAce the Data Science InterviewPython for Data AnalysisData Engineering with Python
Non-FictionAce the Data Science InterviewPython for Data AnalysisFundamentals of Data Engineering
Non-FictionAce the Data Science InterviewPython for Data AnalysisEducated: A Memoir
Non-FictionAce the Data Science InterviewPython for Data AnalysisData Science for Business

And there you have it! Self-joins in action—making book recommendations exciting and personal.

FAANG Self-Join SQL Interview Question

Now you're ready to test your knowledge of Self-Joins with this classic FAANG SQL interview question: Well-Paid Employees.

Suppose you had a table of salary data, where each row has the ID of an employee, their name, salary, department, and their manager's ID. Write a SQL query to identify all employees who earn more than their direct managers.

employee Example Input:
employee_idnamesalarydepartment_idmanager_id
1Emma Thompson380016
2Daniel Rodriguez223017
3Olivia Smith700018
4Noah Johnson680029
5Sophia Martinez1750111
6Liam Brown130003NULL
7Ava Garcia125003NULL
8William Davis68002NULL
Example Output:
employee_idemployee_name
3Olivia Smith

The output shows that Olivia Smith earns $7,000, surpassing her manager, William David who earns $6,800. Try solving this problem:

Employees Earning More Than Their Manager

What's Next: JOINING TABLES (WITHOUT JOINS)

Often joining two tables is synonymous with combining two tables. But did you know there is another way to combine tables WITHOUT USING A JOIN?

In the next tutorial, we'll think about joining tables VERTICALLY using UNION, instead of HORIZONTALLY (which is how the joins we've covered so far work).

SQL JOIN vs. UNION

Spatially challenged, and confused by what we mean by vertical vs. horizontal? That's a good sign to dive into the next tutorial!

SQL UNION, INTERCEPT, EXCEPT


Next Lesson

SQL UNION 🙏