Now that you've thoroughly learned how to filter data in SQL, it's time to learn how to sort data in SQL with !
Theoretically, rows in a relational database such as Postgres, MySQL, SQL Server, etc. aren't stored in any particular or guaranteed sort order. In fact, executing an identical query twice on the same dataset doesn't ensure the results will come back in the same order.
This is quite different than how a spreadsheet application like Excel or Google Sheets operates, so don't say we didn't warn you! Fear not though, because SQL has the query where you explicitly specify the sort order for the resulting rows.
In SQL, the clause allows you to reorder your results based on the data in one or more columns.
For example, imagine you had a table of medicines, and how well they sold at CVS Pharmacy. Running would result in this un-ordered mess:
product_id | drug | units_sold |
---|---|---|
156 | Acyclovir | 89514 |
41 | Clarithromycin | 189925 |
63 | Pepcid AC Acid Reducer | 93513 |
148 | Motrin | 104637 |
9 | Zyprexa | 37410 |
54 | Diclofenac Sodium | 68593 |
We could sort this alphabetically based on the drug's name, using :
This query would yield the following alphabetically sorted result:
product_id | drug | units_sold |
---|---|---|
156 | Acyclovir | 89514 |
41 | Clarithromycin | 189925 |
54 | Diclofenac Sodium | 68593 |
148 | Motrin | 104637 |
63 | Pepcid AC Acid Reducer | 93513 |
9 | Zyprexa | 37410 |
Try this out yourself, by running some sorting queries on this dataset from a real CVS Pharmacy Interview question:
We can also sort in the reverse direction, which we'll cover in the next section.
By default, in SQL sorts the resulting rows in ascending () order. For text data, this means alphabetically, from A to Z. For numerical data, it goes from smallest (or most negative) numbers first, with the biggest number last. In the next section, we'll show you how to change that!
To reverse the default ascending sort order for , you can use the SQL keyword:
Adding to the end of the clause explicitly re-orders the records in descending order, with the biggest numbers coming first, and smallest (or most negative) numbers coming last. For text data, this makes it sort in reverse-alphabetical order.
Let's dive into a real-world use case for . Suppose you worked as a Data Analyst at CVS pharmacy, and wanted to order the pharmacy sales data, so that the best-selling medicines came first. You would use keywords as follows:
This would yield the following ordered result, in descending order based on the number of units each medicine sold:
product_id | drug | units_sold |
---|---|---|
41 | Clarithromycin | 189925 |
148 | Motrin | 104637 |
63 | Pepcid AC Acid Reducer | 93513 |
156 | Acyclovir | 89514 |
54 | Diclofenac Sodium | 68593 |
9 | Zyprexa | 37410 |
Now it's your turn β try running some queries yourself on this CVS Pharmacy sales dataset.
The clause doesn't just work with one column β you can sort on two, or even multiple columns! To do multi-column sort, simply add the name of the column by which youβd like to sort records first, add a comma, and then put the name of the next column(s). Here's the SQL syntax:
Let's jump into a real-world healthcare data example to see why multi-column sort is so useful!
Suppose you were a Data Scientist working at UnitedHealth, and had access to data generated from people calling the companies telephone support number. Here's what a sample of the callers data looks like:
policy_holder_id | call_category | call_received |
---|---|---|
52481621 | NULL | 01/17/2022 19:37:00 |
51435044 | n/a | 01/18/2022 02:46:00 |
52082925 | benefits | 01/18/2022 03:01:00 |
54624612 | IT_support | 01/19/2022 00:27:00 |
54624612 | claims | 01/19/2022 06:33:00 |
54624612 | benefits | 01/19/2022 09:39:00 |
Suppose you needed to sort this data with the most recent phone calls listed first. However, you also wanted to have all phone calls from the same person (policy_holder_id) grouped together. Here's how you'd write the two column SQL query:
This would give the following results, successfully ordered by the time the phone call was received, with each policy holder's information grouped together.
Try out multi-column ordering yourself by running a query against the UnitedHealth support call data.
Pro tip: you can substitute numbers for column names in the ORDER BY clause. The numbers correspond to the columns you specify in the SELECT clause.
For example, the query below is exactly the same as the previous two-column sort SQL query example:
1 maps to the column because it's 1st in the SELECT statement, and 3 represents because it's the 3rd column named in the SELECT query.
While we're on the topic of picking how we display our output, we might as well talk about limiting output too using the SQL keyword . Here's an example:
The above query will get you the 5 most recent phone calls received. We often use LIMIT in conjunction with ORDER BY, because frequently, we're looking for the top X things, like the top 3 highest-grossing Amazon products, or the top 5 most-streamed artists on Spotify.
Additionally, we can control the results returned by specifying an offset using the SQL keyword .
In the next example, we skip the first 10 phone calls and fetch the subsequent 5 phone calls received. This means that we're disregarding the first 10 rows of the sorted result, effectively starting our selection from the 11th row onwards.
and are the last things we're teaching you in the Basic SQL tutorial for Data Science & Analytics β congrats on getting this far!
Curious about what's next? Hop on over to the Intermediate SQL Tutorial where we'll cover more advanced commands, and start to work on problem-solving strategies so you can ace FAANG SQL interview questions!
Next Lesson
INTERMEDIATE SQL π