The SQL command is used in conjunction with the statement to return only distinct (different) values. Here's an example DISTINCT SQL query used to find only unique names of pharmaceutical manufacturers:
Here's a comparison of the above query's output which uses the keyword, versus a query without the keyword:
As you can see, the keyword finds and returns only unique values in the "manufacturer" column, and removed all duplicate manufacturer names.
DISTINCT can be particularly helpful when exploring a new data set. In many real-world scenarios, you will generally end up writing several exploratory queries in order to figure out what data you have access too, and how you might want to group or filter the data.
If you include two (or more) columns in a SELECT DISTINCT clause, your results will contain all of the unique pairs of those two columns.
For example, imagine you worked at stock trading app Robinhood and had access to their trades dataset. Here's a SQL query that uses DISTINCT on two columns β user_id's and trade statuses:
Note: You only need to include DISTINCT once in your SELECT clauseβyou do not need to add it for each column name.
You can try this out yourself by running the with two columns query against data from a real Robinhood Data Scientist SQL Interview Question:
You can use with aggregate functions β the most common one being . Here's an example that finds the number of unique user's who made trades:
Here's that query in action:
Notice that goes inside the COUNT() aggregate function, rather at the beginning of the SELECT statement.
While you could use DISTINCT with SUM or AVG, in practice it's rare to want to just sum or average just the unique values. When it comes to MAX and MIN, they aren't affected by DISTINCT β whether there are duplicates or not, the lowest/highest value in the dataset will be the same.
Imagine you're given a table containing data on Amazon customers and their spending on products in different category. Write a query using to identify the number of unique products within each product category.
category | product | user_id | spend | transaction_date |
---|---|---|---|---|
appliance | refrigerator | 165 | 246.00 | 12/26/2021 12:00:00 |
appliance | refrigerator | 123 | 299.99 | 03/02/2022 12:00:00 |
appliance | washing machine | 123 | 219.80 | 03/02/2022 12:00:00 |
electronics | vacuum | 178 | 152.00 | 04/05/2022 12:00:00 |
electronics | wireless headset | 156 | 249.90 | 07/08/2022 12:00:00 |
category | count |
---|---|
appliance | 2 |
electronics | 2 |
So far, we've only been doing simple math, like , , covered in the aggregate functions tutorial.
Earlier, in the filtering data tutorial, we also covered simple comparison operators like (equals) and (less than or equal to).
In the next tutorial, we'll cover arithmetic, like , , etc.
Next Lesson
SQL ARITHMETIC π’