In SQL, NULL indicates the absence of a value. Unlike an empty string or zero, NULL doesn't represent a specific value, but rather a missing or unknown piece of information.
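This tutorial covers how to find NULL values with IS NULL and IS NOT NULL, how to replace them with COALESCE() and IFNULL(), and how those two functions differ.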
Handling NULL values is a common task in real-world scenarios where data may be incomplete or missing. In databases, NULL values serve as placeholders for uncertain or absent data.
Here are a few examples of real-world scenarios:
In essence, NULL values in SQL are like flexible placeholders that cater to the uncertainties and variations in real-world data. They allow databases to accommodate missing or pending information without resorting to assumptions or default values.
To demonstrate the usage of SQL functions for handling NULL values, we'll be working with an illustrative table that contains details about best-selling books across diverse genres.
book_title | author | book_genre | year_released | country | publication | book_rating | number_of_reviews |
---|---|---|---|---|---|---|---|
Ace the Data Science Interview | Kevin Huo, Nick Singh | Non-Fiction | 2021 | United States | Ace the Data Science Interview | 4.5 | 850 |
Becoming | Michelle Obama | Biography | 2018 | United States | Crown | 4.7 | 11600 |
Blink: The Power of Thinking Without Thinking | Malcolm Gladwell | Non-Fiction | 2005 | United States | Little Brown | 4.2 | 8400 |
Crazy Rich Asians | Kevin Kwan | Romance | 2013 | China | Anchor | 3.8 | 8900 |
Imagine you're a data analyst at Goodreads and you're identifying records where the book titles are null to ensure data completeness and integrity.
A first attempt might filter with `WHERE book_title = NULL`. That query returns an empty result set. The sample table happens to contain no NULL titles, but the query would return nothing even if it did.
While it might seem like the query is looking for records where the book_title is blank, SQL doesn't work that way with blank values. The word NULL in SQL doesn't mean empty like it does in everyday language. Instead, it means "unknown" or "missing". So, when SQL tries to compare something to NULL using `=`, the result is itself unknown rather than true, and the row is filtered out. It's like trying to compare apples and oranges: they're just not the same thing.
That's why SQL has special conditions and functions for handling NULL values, which we'll go through below:
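To see this behavior in isolation, here is a minimal sketch. The `books` CTE is a hypothetical two-row stand-in for the tutorial's table (the name and rows are assumptions made for illustration), and one row deliberately has a missing title.

```sql
-- `books` is a hypothetical stand-in table, built inline so the query runs on its own.
WITH books AS (
    SELECT 'Becoming' AS book_title, 4.7 AS book_rating
    UNION ALL
    SELECT NULL, 4.2  -- a row whose title is genuinely missing
)
SELECT book_title, book_rating
FROM books
WHERE book_title = NULL;  -- the comparison is never true, so no rows come back
```

Even though one row really does have a NULL title, the filter matches nothing.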
Now, to correctly identify NULL values, we can use the IS NULL condition, as in `WHERE book_title IS NULL`.
Such a query retrieves all records where the book_title column is indeed NULL.
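The IS NULL check can be sketched as follows, again using a hypothetical inline `books` table (an assumption for illustration, not the tutorial's real data) that contains one missing title.

```sql
-- `books` is a hypothetical stand-in table, built inline so the query runs on its own.
WITH books AS (
    SELECT 'Becoming' AS book_title, 4.7 AS book_rating
    UNION ALL
    SELECT NULL, 4.2  -- a row whose title is genuinely missing
)
SELECT book_title, book_rating
FROM books
WHERE book_title IS NULL;  -- matches only the row with the missing title
```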
And, to retrieve rows with actual values (not NULL) in a column, use IS NOT NULL, as in `WHERE book_title IS NOT NULL`.
Such a query delivers rows where the book_title column isn't NULL.
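The opposite filter looks almost identical. This sketch reuses the same hypothetical inline `books` table, which is assumed purely for illustration.

```sql
-- `books` is a hypothetical stand-in table, built inline so the query runs on its own.
WITH books AS (
    SELECT 'Becoming' AS book_title, 4.7 AS book_rating
    UNION ALL
    SELECT NULL, 4.2  -- a row whose title is genuinely missing
)
SELECT book_title, book_rating
FROM books
WHERE book_title IS NOT NULL;  -- keeps only the 'Becoming' row
```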
Fun Fact
Did you know that many databases treat NULL as the smallest value when sorting? In MySQL, SQLite, and SQL Server, sorting a column in ascending order puts rows with NULL values at the top of the result. PostgreSQL and Oracle do the opposite by default and place them at the bottom. This behavior can be both surprising and useful, since it affects how query results are organized.
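A quick way to check how a given database sorts NULL is to order a small set of rows that includes one. The sketch below assumes a hypothetical inline `books` table with three rows taken from the sample data. No expected output is shown because the position of the NULL row depends on the engine.

```sql
-- `books` is a hypothetical stand-in table, built inline so the query runs on its own.
WITH books AS (
    SELECT 'Becoming' AS book_title, 4.7 AS book_rating
    UNION ALL
    SELECT 'Crazy Rich Asians', 3.8
    UNION ALL
    SELECT 'Blink: The Power of Thinking Without Thinking', NULL
)
SELECT book_title, book_rating
FROM books
ORDER BY book_rating;  -- ascending; where the NULL row lands depends on the engine
```

PostgreSQL and Oracle also accept `NULLS FIRST` or `NULLS LAST` after the sort column to make the placement explicit.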
Let's solve a real Data Analyst SQL Interview Question from Tesla, where you're given a table of car parts and the date each one finished being assembled (NULL if it isn't finished yet).
part | finish_date | assembly_step |
---|---|---|
battery | 01/22/2022 00:00:00 | 1 |
battery | 02/22/2022 00:00:00 | 2 |
battery | 03/22/2022 00:00:00 | 3 |
bumper | 01/22/2022 00:00:00 | 1 |
bumper | 02/22/2022 00:00:00 | 2 |
bumper | NULL | 3 |
bumper | NULL | 4 |
You're asked to write a SQL query that checks for NULLs to determine which car parts have begun the assembly process but are not yet finished.
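One possible approach is sketched below. The table name `parts_assembly` is an assumption, since the text doesn't name the table, and the dates are stored as plain text in ISO format to keep the example portable.

```sql
-- `parts_assembly` is a hypothetical name; rows mirror the sample table above.
WITH parts_assembly AS (
    SELECT 'battery' AS part, '2022-01-22' AS finish_date, 1 AS assembly_step
    UNION ALL SELECT 'battery', '2022-02-22', 2
    UNION ALL SELECT 'battery', '2022-03-22', 3
    UNION ALL SELECT 'bumper',  '2022-01-22', 1
    UNION ALL SELECT 'bumper',  '2022-02-22', 2
    UNION ALL SELECT 'bumper',  NULL, 3
    UNION ALL SELECT 'bumper',  NULL, 4
)
SELECT part, assembly_step
FROM parts_assembly
WHERE finish_date IS NULL;  -- steps that have no finish date yet
```

With the sample rows, this returns the bumper at assembly steps 3 and 4.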
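The COALESCE() function takes multiple inputs and returns the first non-null value.
Syntax: `COALESCE(column_name, 'expression')`
If column_name is NULL, it returns the specified 'expression'. Otherwise, it returns the value of column_name.
Let's consider the following version of the books table, in which some ratings are missing: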
book_title | author | book_genre | year_released | book_rating |
---|---|---|---|---|
Ace the Data Science Interview | Kevin Huo, Nick Singh | Non-Fiction | 2021 | 4.5 |
Becoming | Michelle Obama | Biography | 2018 | 4.7 |
Blink: The Power of Thinking Without Thinking | Malcolm Gladwell | Non-Fiction | 2005 | NULL |
Crazy Rich Asians | Kevin Kwan | Romance | 2013 | 3.8 |
Data Engineering with Python | Paul Crickard and Andreas Müller | Non-Fiction | 2019 | NULL |
Now, let's apply the COALESCE() function to the book_rating column, as in `COALESCE(book_rating, 0) AS coalesced_rating`.
Here's how the results look:
book_title | coalesced_rating |
---|---|
Ace the Data Science Interview | 4.5 |
Becoming | 4.7 |
Blink: The Power of Thinking Without Thinking | 0.0 |
Crazy Rich Asians | 3.8 |
Data Engineering with Python | 0.0 |
In this example, the COALESCE() function checks each book_rating value. If a value isn't NULL, then it's returned unchanged. However, if book_rating is NULL, then the function replaces it with 0.
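The COALESCE() call described above can be sketched as a full query. The inline `books` table is a hypothetical three-row excerpt of the sample data, assumed here so the query runs by itself.

```sql
-- `books` is a hypothetical stand-in table, built inline so the query runs on its own.
WITH books AS (
    SELECT 'Becoming' AS book_title, 4.7 AS book_rating
    UNION ALL
    SELECT 'Blink: The Power of Thinking Without Thinking', NULL
    UNION ALL
    SELECT 'Crazy Rich Asians', 3.8
)
SELECT
    book_title,
    COALESCE(book_rating, 0) AS coalesced_rating  -- NULL ratings become 0
FROM books;
```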
Suppose some of the book ratings are NULL. We can use the IFNULL() function to fill in the gaps with default values.
Syntax: `IFNULL(column_name, 'expression')`
Consider the same scenario with the books table, this time using `IFNULL(book_rating, 0) AS rated_books`.
In this instance, the IFNULL() function evaluates each book_rating value, and if a value is present, it remains unchanged. However, if book_rating is NULL, the function substitutes it with 0.
Here are the expected results:
book_title | rated_books |
---|---|
Ace the Data Science Interview | 4.5 |
Becoming | 4.7 |
Blink: The Power of Thinking Without Thinking | 0.0 |
Crazy Rich Asians | 3.8 |
Data Engineering with Python | 0.0 |
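A query along these lines could look like the sketch below, again over a hypothetical inline `books` table assumed for illustration. Note that IFNULL() exists in MySQL and SQLite but not in PostgreSQL, where COALESCE() is the usual substitute.

```sql
-- `books` is a hypothetical stand-in table, built inline so the query runs on its own.
-- IFNULL() is available in MySQL and SQLite; PostgreSQL does not provide it.
WITH books AS (
    SELECT 'Becoming' AS book_title, 4.7 AS book_rating
    UNION ALL
    SELECT 'Blink: The Power of Thinking Without Thinking', NULL
    UNION ALL
    SELECT 'Crazy Rich Asians', 3.8
)
SELECT
    book_title,
    IFNULL(book_rating, 0) AS rated_books  -- NULL ratings become 0
FROM books;
```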
While both the COALESCE() and IFNULL() functions serve a similar purpose of handling NULL values, there is a key difference between them.
COALESCE() function: Versatile for multiple arguments, it returns the first non-null value among them.
IFNULL() function: Handles two arguments, returning the second if the first is null; else, it returns the first.
Example
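Suppose you have three values, in this order: NULL, 5, and 10.
With `COALESCE(NULL, 5, 10)`, the result would be 5 (the first non-null value). However, if the second value were also NULL, then the result would be 10 (the first non-null value).
With `IFNULL(NULL, 5)`, the result would be 5 (since the first argument is NULL, it returns the second argument, which is 5).
So, while they both can serve similar purposes, COALESCE() accepts any number of arguments, whereas IFNULL() is limited to two.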
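The comparison can be tried directly with literal values. This is a small sketch with hypothetical column aliases; it runs as written in MySQL and SQLite, while in PostgreSQL the IFNULL() column would need to be dropped or rewritten with COALESCE().

```sql
-- Column aliases are hypothetical names chosen for readability.
SELECT
    COALESCE(NULL, 5, 10)    AS first_non_null,   -- 5: the second argument is the first non-null
    COALESCE(NULL, NULL, 10) AS skips_two_nulls,  -- 10: the first two arguments are NULL
    IFNULL(NULL, 5)          AS ifnull_fallback;  -- 5: the first argument is NULL
```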
What if we wanted to re-code all NULLs to some special value?
For example, an e-commerce company might treat a NULL number of sales as 0. Or, in a user survey where you ask people to rate statements from "strongly disagree" to "strongly agree" (a Likert scale), you could treat all NULLs as "Neither Agree Nor Disagree".
In these cases, to treat NULL values conditionally as something else, we can use the CASE statement, covered in the next lesson!