The statement in SQL allows you to shape, transform, manipulate, and filter data based on specified conditions. It's a conditional expression tool that lets you customize query results, create new categories, and apply conditional logic.
Here's what we'll be covering:
The statement in the statement is used to create new columns, categorize data, or perform calculations based on specified conditions. It helps tailor the output of your query to meet specific requirements.
The statement evaluates the specified conditions sequentially and returns a value when it encounters the first true condition. If none of the conditions are met, the clause provides a default value.
The statement in the clause is used to filter rows based on specified conditions within the dataset.
The statement evaluates conditions for each row, determining whether it meets the filtering criteria. Rows satisfying the conditions specified in the statement are included in the result.
In this tutorial, we'll explore different ways of using the statement with the dataset. This dataset contains information about Marvel Avengers actors and their associated social media presence.
Example:
actor | character | superhero_alias | platform | followers | posts | engagement_rate | avg_likes | avg_comments |
---|---|---|---|---|---|---|---|---|
Robert Downey Jr. | Tony Stark | Iron Man | 500000 | 200 | 8.20 | 12000 | 800 | |
Chris Evans | Steve Rogers | Captain America | 300000 | 150 | 6.50 | 8000 | 500 | |
Scarlett Johansson | Natasha Romanoff | Black Widow | 700000 | 300 | 7.80 | 15000 | 1000 | |
Chris Hemsworth | Thor | Thor | YouTube | 400000 | 100 | 9.10 | 20000 | 1200 |
Mark Ruffalo | Bruce Banner | Hulk | 200000 | 80 | 5.30 | 6000 | 400 |
Here's an example that determines the popularity category of each Marvel character based on their number of followers ("Popular" if they have 500,000 or more followers).
Here's the output:
character | superhero_alias | platform | popularity_category |
---|---|---|---|
Tony Stark | Iron Man | Popular | |
Steve Rogers | Captain America | NULL | |
Natasha Romanoff | Black Widow | Popular | |
Thor | Thor | YouTube | NULL |
Bruce Banner | Hulk | NULL |
We can create groups using multiple conditions in the statement. For example, we can categorize the Marvel actors based on their social media engagement rates as follows:
Result
actor | character | engagement_category |
---|---|---|
Robert Downey Jr. | Tony Stark | High Engagement |
Scarlett Johansson | Natasha Romanoff | Moderate Engagement |
Jeremy Renner | Clint Barton | Moderate Engagement |
Tom Hiddleston | Loki | High Engagement |
Zoe Saldana | Gamora | Low Engagement |
When none of the previous conditions match, we can use the clause to provide a default value.
For the Marvel Avengers characters, let's classify their into three categories based on their number of followers:
Result
character | superhero_alias | platform | popularity_category |
---|---|---|---|
Tony Stark | Iron Man | Moderately Popular | |
Steve Rogers | Captain America | Moderately Popular | |
Natasha Romanoff | Black Widow | Highly Popular | |
Thor | Thor | YouTube | Moderately Popular |
Bruce Banner | Hulk | Less Popular |
Now, it's your turn to practice writing a SQL query in an exercise which uses and statements.
Analyze the marvel actors dataset, and categorize them based on the average number of likes:
Suppose we want to filter the dataset based on the social media platforms, but we want to include an option to filter based on different criteria for each platform. We'll use the statement in the clause to achieve this.
Here's the output:
actor | character | platform |
---|---|---|
Robert Downey Jr. | Tony Stark | |
Scarlett Johansson | Natasha Romanoff | |
Chris Hemsworth | Thor | YouTube |
Mark Ruffalo | Bruce Banner |
The aggregate function within a statement is used to count occurrences based on various conditions within the dataset. If you need a refresher, check out our SQL Aggregate Functions tutorial with COUNT(), SUM() and AVG()!
Assume we're given the following conditions:
Results shown:
platform | popular_count | less_popular_count |
---|---|---|
2 | 0 | |
0 | 2 | |
YouTube | 1 | 0 |
The aggregate function within a statement allows us to add values based on specific conditions within the dataset, providing a cumulative value.
In the example below, we're adding up the followers of actors based on their engagement rates grouped by platform:
Here's the results:
platform | high_engagement_followers_sum | low_engagement_followers_sum |
---|---|---|
1200000 | 0 | |
0 | 500000 | |
YouTube | 400000 | 0 |
Let's use the statement, along with function to solve a real SQL Interview Question from NYTimes. In this SQL assesment, you're given a table on user viewership categorized by device type where the three types are laptop, tablet, and phone.
Example Input:
user_id | device_type | view_time |
---|---|---|
123 | tablet | 01/02/2022 00:00:00 |
125 | laptop | 01/07/2022 00:00:00 |
128 | laptop | 02/09/2022 00:00:00 |
129 | phone | 02/09/2022 00:00:00 |
145 | tablet | 02/24/2022 00:00:00 |
Write a query that calculates the total viewership for laptops and mobile devices where mobile is defined as the sum of tablet and phone viewership. Output the total viewership for laptops as and the total viewership for mobile devices as .
Example Output:
laptop_views | mobile_views |
---|---|
2 | 3 |
Based on the example output, can you give the question a go?
The aggregate function within a statement calculates averages based on specific conditions within the dataset, providing an average value.
Let's say we're calculating the average number of followers of actors based on their engagement rates. If the engagement rate is 8.0 or higher, assign to "high_engagement_followers", otherwise, assign to "low_engagement_followers" for each platform.
Output shown:
platform | avg_high_engagement_followers | avg_low_engagement_followers |
---|---|---|
600000 | NULL | |
NULL | 250000 | |
YouTube | 400000 | NULL |
Up until now, we've focused on analyzing a single table of data at a time. Whether it be the , , or datasets, we've only been applying all these fancy arithmetic operators, hunting down nulls, and applying to data that come from a single table.
It's time to change it up, and start analyzing data from TWO tables using the keyword!
Next Lesson
SQL JOINS π€