In SQL, `NULL` indicates the absence of a value. Unlike an empty string or zero, `NULL` doesn't represent a specific value, but rather a missing or unknown piece of information.
Handling `NULL` values is a common task in real-world scenarios where data may be incomplete or missing. In databases, `NULL` values serve as placeholders for uncertain or absent data.
Here are a few examples of real-world scenarios:
In essence, `NULL` values in SQL are like flexible placeholders that cater to the uncertainties and variations in real-world data. They allow databases to accommodate missing or pending information without resorting to assumptions or default values.
To demonstrate the usage of SQL functions for handling `NULL` values, we'll be working with an illustrative table that contains details about the best-selling books across diverse genres.
book_title | author | book_genre | year_released | country | publication | book_rating | number_of_reviews |
---|---|---|---|---|---|---|---|
Ace the Data Science Interview | Kevin Huo, Nick Singh | Non-Fiction | 2021 | United States | Ace the Data Science Interview | 4.5 | 850 |
Becoming | Michelle Obama | Biography | 2018 | United States | Crown | 4.7 | 11600 |
Blink: The Power of Thinking Without Thinking | Malcolm Gladwell | Non-Fiction | 2005 | United States | Little Brown | 4.2 | 8400 |
Crazy Rich Asians | Kevin Kwan | Romance | 2013 | China | Anchor | 3.8 | 8900 |
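Consider a SQL query that filters rows with a condition of the form `column = NULL`.
At first glance, it appears to retrieve rows with `NULL` values in that column. However, there's a subtle but crucial misunderstanding here.
SQL's `NULL` defies straightforward comparisons. Even a common equality check like `= NULL` fails because `NULL` doesn't equate to any value, not even itself. The comparison evaluates to unknown rather than true, so no row passes the filter. The same holds for other comparison operators, such as `<>`, `<`, and `>`.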
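The pitfall can be reproduced with a minimal sketch. It assumes Python's built-in `sqlite3` module and a hypothetical two-row `books` table (the table name and rows are illustrative, not the lesson's dataset); other databases treat `= NULL` the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one rating present, one missing.
conn.execute("CREATE TABLE books (book_title TEXT, book_rating REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("Becoming", 4.7), ("Blink", None)],  # Python None is stored as NULL
)

# NULL = NULL evaluates to NULL (unknown), which Python receives as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# The comparison is never true, so the filter matches no rows,
# even though one row really does have a NULL rating.
matched = conn.execute(
    "SELECT book_title FROM books WHERE book_rating = NULL"
).fetchall()

print(null_equals_null)
print(matched)
```

The empty result is the giveaway: the query runs without error, so the mistake is easy to miss.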
To correctly identify `NULL` values, use the `IS NULL` condition:
This query retrieves all records where the column is indeed `NULL`.
To retrieve rows with actual values (not `NULL`) in a column, use `IS NOT NULL`:
This query delivers rows where the column isn't NULL.
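Both conditions can be tried side by side. The following is a sketch under assumptions: it uses Python's built-in `sqlite3` module, and the `books` table with its three rows is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with one missing rating.
conn.execute("CREATE TABLE books (book_title TEXT, book_rating REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("Becoming", 4.7), ("Blink", None), ("Crazy Rich Asians", 3.8)],
)

# IS NULL keeps only the rows whose rating is missing.
missing = [row[0] for row in conn.execute(
    "SELECT book_title FROM books "
    "WHERE book_rating IS NULL ORDER BY book_title"
)]

# IS NOT NULL keeps only the rows that have a rating.
present = [row[0] for row in conn.execute(
    "SELECT book_title FROM books "
    "WHERE book_rating IS NOT NULL ORDER BY book_title"
)]

print(missing)
print(present)
```

Together the two filters partition the table: every row lands in exactly one of the two lists.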
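💡 Fun Fact
Did you know that where `NULL` lands in a sorted result depends on the database? MySQL, SQLite, and SQL Server treat `NULL` as the smallest value, so rows containing `NULL` rise to the top of an ascending sort. PostgreSQL and Oracle treat it as the largest, so those rows sink to the bottom unless you ask for `NULLS FIRST`. This behavior can be both surprising and useful, impacting how query results are organized.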
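Sort placement of `NULL` varies by engine, so the sketch below only demonstrates SQLite, accessed through Python's built-in `sqlite3` module. The `books` table is hypothetical. It also shows one portable way to push `NULL` rows to the end by sorting on an `IS NULL` expression first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with one missing rating.
conn.execute("CREATE TABLE books (book_title TEXT, book_rating REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [("Becoming", 4.7), ("Blink", None), ("Crazy Rich Asians", 3.8)],
)

# SQLite's default ascending sort places NULL before every other value.
nulls_first = [row[0] for row in conn.execute(
    "SELECT book_title FROM books ORDER BY book_rating"
)]

# (book_rating IS NULL) is 0 for real ratings and 1 for NULL,
# so sorting on it first moves the NULL rows to the end.
nulls_last = [row[0] for row in conn.execute(
    "SELECT book_title FROM books "
    "ORDER BY (book_rating IS NULL), book_rating"
)]

print(nulls_first)
print(nulls_last)
```

If the position of `NULL` rows matters to a report, stating it explicitly in the `ORDER BY` is safer than relying on an engine's default.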
Let's solve a real Data Analyst SQL Interview Question from Tesla, where you're given a table of car parts and the date they were finished being assembled (if they were finished; otherwise it's NULL).
part | finish_date | assembly_step |
---|---|---|
battery | 01/22/2022 00:00:00 | 1 |
battery | 02/22/2022 00:00:00 | 2 |
battery | 03/22/2022 00:00:00 | 3 |
bumper | 01/22/2022 00:00:00 | 1 |
bumper | 02/22/2022 00:00:00 | 2 |
bumper | NULL | 3 |
bumper | NULL | 4 |
You're asked to write a SQL query that checks for NULLs to determine which car parts have begun the assembly process but are not yet finished:
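One possible approach is sketched below, not necessarily the intended interview answer. It assumes Python's built-in `sqlite3` module and a hypothetical table name, `parts_assembly`; the rows are copied from the sample table above, with dates stored as plain text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# parts_assembly is an assumed name; the columns match the sample table.
conn.execute(
    "CREATE TABLE parts_assembly "
    "(part TEXT, finish_date TEXT, assembly_step INTEGER)"
)
conn.executemany(
    "INSERT INTO parts_assembly VALUES (?, ?, ?)",
    [
        ("battery", "01/22/2022 00:00:00", 1),
        ("battery", "02/22/2022 00:00:00", 2),
        ("battery", "03/22/2022 00:00:00", 3),
        ("bumper", "01/22/2022 00:00:00", 1),
        ("bumper", "02/22/2022 00:00:00", 2),
        ("bumper", None, 3),
        ("bumper", None, 4),
    ],
)

# A step with no finish_date has not been completed yet.
unfinished = conn.execute(
    "SELECT part, assembly_step FROM parts_assembly "
    "WHERE finish_date IS NULL "
    "ORDER BY part, assembly_step"
).fetchall()

print(unfinished)
```

On the sample data, only the bumper's steps 3 and 4 lack a finish date, so those are the rows returned.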
Imagine you're dealing with data where missing values (represented as NULL) can hinder your analysis. This is where the handy `COALESCE()` function helps. It takes multiple inputs and returns the first non-null value.
Let's consider a scenario with the following books table:
book_title | author | book_genre | year_released | book_rating |
---|---|---|---|---|
Ace the Data Science Interview | Kevin Huo, Nick Singh | Non-Fiction | 2021 | 4.5 |
Becoming | Michelle Obama | Biography | 2018 | 4.7 |
Blink: The Power of Thinking Without Thinking | Malcolm Gladwell | Non-Fiction | 2005 | NULL |
Crazy Rich Asians | Kevin Kwan | Romance | 2013 | 3.8 |
Data Engineering with Python | Paul Crickard and Andreas Müller | Non-Fiction | 2019 | NULL |
Now, applying the `COALESCE()` function to the `book_rating` column:
Here's what the results look like:
book_title | coalesced_rating |
---|---|
Ace the Data Science Interview | 4.5 |
Becoming | 4.7 |
Blink: The Power of Thinking Without Thinking | 0.0 |
Crazy Rich Asians | 3.8 |
Data Engineering with Python | 0.0 |
In this example, the `COALESCE()` function checks each `book_rating` value. If a value isn't null, it's returned unchanged. However, if `book_rating` is null, the function swaps it with 0.
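A query of this kind can be sketched as follows. The sketch assumes Python's built-in `sqlite3` module and a hypothetical table name, `books`; the titles and ratings are taken from the table above, and the `coalesced_rating` alias matches the results shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# books is an assumed table name; only the two relevant columns are kept.
conn.execute("CREATE TABLE books (book_title TEXT, book_rating REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [
        ("Ace the Data Science Interview", 4.5),
        ("Becoming", 4.7),
        ("Blink: The Power of Thinking Without Thinking", None),
        ("Crazy Rich Asians", 3.8),
        ("Data Engineering with Python", None),
    ],
)

# COALESCE returns book_rating when it is present, otherwise the fallback 0.
coalesced = dict(conn.execute(
    "SELECT book_title, COALESCE(book_rating, 0) AS coalesced_rating "
    "FROM books"
).fetchall())

for title, rating in coalesced.items():
    print(title, rating)
```

Note that replacing a missing rating with 0 changes any average computed afterwards, so a fallback value should be chosen with the downstream analysis in mind.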
Imagine you're navigating through data containing gaps due to missing values, represented as `NULL`. Here's where the `IFNULL()` function comes into play, offering a compact way to fill these gaps.
Consider the same scenario using the books table:
In this instance, `IFNULL()` inspects each `book_rating` value, and if a value is present, it remains unchanged. However, when `book_rating` is `NULL`, the function substitutes it with 0.
Now, let's see how this translates into actual results:
book_title | rated_books |
---|---|
Ace the Data Science Interview | 4.5 |
Becoming | 4.7 |
Blink: The Power of Thinking Without Thinking | 0.0 |
Crazy Rich Asians | 3.8 |
Data Engineering with Python | 0.0 |
In these results, you can see how the `IFNULL()` function replaces the NULL values in the `book_rating` column with the value 0, creating a more uniform data representation.
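With a single fallback value, the two-argument form and `COALESCE()` should be interchangeable. The sketch below checks this in SQLite through Python's built-in `sqlite3` module; the `books` table name is hypothetical, and `IFNULL()` is not available in every database (PostgreSQL, for example, offers only `COALESCE()`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# books is an assumed table name; ratings come from the table above.
conn.execute("CREATE TABLE books (book_title TEXT, book_rating REAL)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [
        ("Becoming", 4.7),
        ("Crazy Rich Asians", 3.8),
        ("Data Engineering with Python", None),
    ],
)

# Two-argument null replacement with IFNULL.
ifnull_rows = conn.execute(
    "SELECT book_title, IFNULL(book_rating, 0) AS rated_books "
    "FROM books ORDER BY book_title"
).fetchall()

# The same replacement written with COALESCE.
coalesce_rows = conn.execute(
    "SELECT book_title, COALESCE(book_rating, 0) AS rated_books "
    "FROM books ORDER BY book_title"
).fetchall()

print(ifnull_rows == coalesce_rows)
```

Because `COALESCE()` is part of standard SQL, it is usually the more portable choice when a query has to run on several engines.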
While both the `COALESCE()` and `IFNULL()` functions serve a similar purpose of handling `NULL` values, there is a key difference between them.
`COALESCE()` function: Versatile for multiple arguments, it returns the first non-null value among them.
`IFNULL()` function: Handles two arguments, returning the second if the first is null; otherwise, it returns the first.
Example
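Suppose you have three values: the first is NULL, the second is 5, and the third is 10.
With `COALESCE()` applied to all three in that order, the result would be 5 (the first non-null value). However, if the second value were also NULL, the result would be 10 (now the first non-null value).
With `IFNULL()` applied to the first and second values, the result would be 5 (since the first value is NULL, it returns the second argument, which is 5).
So, while they both can serve similar purposes, `COALESCE()` accepts any number of arguments, whereas `IFNULL()` takes exactly two.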
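The comparison above can be checked with a short sketch. It assumes Python's built-in `sqlite3` module; the variable names `a`, `b`, and `c` are hypothetical stand-ins for the three values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical names for the three values: NULL, 5, and 10.
a, b, c = None, 5, 10

# COALESCE scans left to right and returns the first non-null argument.
first_non_null = conn.execute(
    "SELECT COALESCE(?, ?, ?)", (a, b, c)
).fetchone()[0]

# With the second value also NULL, the third one is returned.
fallback = conn.execute(
    "SELECT COALESCE(?, ?, ?)", (a, None, c)
).fetchone()[0]

# IFNULL returns its second argument when the first is NULL.
two_arg = conn.execute("SELECT IFNULL(?, ?)", (a, b)).fetchone()[0]

# IFNULL takes exactly two arguments, so a third one is rejected by SQLite.
try:
    conn.execute("SELECT IFNULL(?, ?, ?)", (a, b, c))
    ifnull_accepts_three = True
except sqlite3.Error:
    ifnull_accepts_three = False

print(first_non_null, fallback, two_arg, ifnull_accepts_three)
```

When a chain of fallbacks is needed, for example a preferred value, then a backup, then a constant, `COALESCE()` expresses it in one call.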
What if we wanted to re-code all `NULL`s to some special value?
For example, at an e-commerce company, if the number of sales is `NULL`, you could treat it as 0. Or in a user survey where people rate statements from "strongly disagree" to "strongly agree" (a Likert scale), you could treat all `NULL`s as "Neither Agree Nor Disagree".
In these cases, to treat NULL values conditionally as something else, we can use the `CASE` statement, covered in the next lesson!