
SQL GROUP BY Tutorial With Examples & Practice Exercises

The SQL aggregate functions from the previous lesson process an entire column to come up with an answer. But what if you want to aggregate only part of a table?

For example, instead of finding the average price of all Amazon product orders, what if you want to find the average spent by category (e.g. the average spent on appliances vs. electronics)?

For Excel users, the "pivot table" comes to mind. The SQL equivalent of that concept is the GROUP BY command, which tells the database to separate your data into groups that can then be aggregated independently.

Here's an example that groups Amazon product sales by category:
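A minimal sketch of such a query, wrapped in Python's built-in sqlite3 module so you can run it without a database server. The table name product_spend, its columns, and the sample rows are made up for illustration, so the totals it prints will differ from the real dataset's result:

```python
import sqlite3

# Hypothetical table and made-up rows, used only to illustrate GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product_spend (category TEXT, product TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO product_spend VALUES (?, ?, ?)",
    [
        ("appliance", "refrigerator", 250.0),
        ("appliance", "washing machine", 220.0),
        ("electronics", "vacuum", 190.0),
        ("electronics", "wireless headset", 250.0),
    ],
)

query = """
SELECT category, SUM(spend) AS total_spend
FROM product_spend
GROUP BY category;
"""

# GROUP BY yields one output row per distinct category,
# and SUM is computed separately inside each group.
rows = conn.execute(query).fetchall()
for category, total_spend in rows:
    print(category, total_spend)
```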


The output of that query yields this result:

| category | sum |
|---|---|
| electronics | 1007.54 |
| appliance | 1135.22 |

Try playing around with GROUP BY yourself by copy-pasting the earlier code and running it against the data in this Amazon SQL interview question: Amazon SQL Interview GROUP BY Example

GROUP BY vs. Excel Pivot Table

Both GROUP BY in SQL and Pivot Tables in Excel are tools designed to aggregate and summarize data. To demonstrate their similarities, let's first create a Pivot Table for a data set of FAANG stocks.

This pivot table finds the average open price for each stock, and the max close price for each stock: Pivot Table in Google Sheets Stock Price Example

Notice that the "rows" dialog box in the spreadsheet tool is essentially asking which column you want to group by ("ticker"), while the values box is asking which column values you want to display ("open", "close") and how they'll be aggregated (average, max).

This is equivalent to the SQL query:
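As a rough sketch, the pivot table's settings (group by "ticker", average of "open", max of "close") translate into the query below. The table name stock_prices and the sample rows are assumptions for illustration, and the query is run through Python's built-in sqlite3 module so it works standalone:

```python
import sqlite3

# Hypothetical stock_prices table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_prices (date TEXT, ticker TEXT, open REAL, close REAL)"
)
conn.executemany(
    "INSERT INTO stock_prices VALUES (?, ?, ?, ?)",
    [
        ("2023-01-31", "AAPL", 100.0, 110.0),
        ("2023-02-28", "AAPL", 120.0, 115.0),
        ("2023-01-31", "GOOG", 90.0, 95.0),
        ("2023-02-28", "GOOG", 94.0, 92.0),
    ],
)

query = """
SELECT ticker,              -- the pivot table's "rows" box
       AVG(open)  AS avg_open,   -- "values" box: average of open
       MAX(close) AS max_close   -- "values" box: max of close
FROM stock_prices
GROUP BY ticker;
"""

rows = conn.execute(query).fetchall()
for ticker, avg_open, max_close in rows:
    print(ticker, avg_open, max_close)
```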


Notice how the SQL query output matches up to the pivot table! GROUP BY versus Excel Pivot Table Example

GROUP BY Two Columns

You can GROUP BY two or more columns in SQL. To group by multiple categories, just separate the column names with commas (just like the syntax in ORDER BY!).

For an example of grouping by multiple columns, here's the average stock open price grouped by ticker symbol and by year:
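A minimal sketch of grouping by two columns, assuming a hypothetical stock_prices table with made-up rows. It runs on SQLite through Python's built-in sqlite3 module, so the year is extracted with strftime; in PostgreSQL you would typically write EXTRACT(YEAR FROM date) instead:

```python
import sqlite3

# Hypothetical stock_prices table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_prices (date TEXT, ticker TEXT, open REAL)")
conn.executemany(
    "INSERT INTO stock_prices VALUES (?, ?, ?)",
    [
        ("2022-03-31", "AAPL", 150.0),
        ("2022-09-30", "AAPL", 160.0),
        ("2023-03-31", "AAPL", 170.0),
        ("2022-03-31", "MSFT", 270.0),
        ("2022-09-30", "MSFT", 280.0),
        ("2023-03-31", "MSFT", 290.0),
        ("2023-09-30", "MSFT", 300.0),
    ],
)

query = """
SELECT ticker,
       strftime('%Y', date) AS year,   -- SQLite's way of pulling out the year
       AVG(open) AS avg_open
FROM stock_prices
GROUP BY ticker, strftime('%Y', date);
"""

# One output row per distinct (ticker, year) combination.
rows = conn.execute(query).fetchall()
for ticker, year, avg_open in rows:
    print(ticker, year, avg_open)
```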


Here's a sample of that output:

| ticker | year | avg |
|---|---|---|
| NFLX | 2023 | 364.9100000000000000 |
| META | 2023 | 220.3487500000000000 |
| AMZN | 2023 | 109.2712500000000000 |
| MSFT | 2023 | 291.9550000000000000 |
| GOOG | 2023 | 108.0987500000000000 |
| AAPL | 2023 | 165.2937500000000000 |
| MSFT | 2022 | 276.7833333333333333 |
| AAPL | 2022 | 157.6975000000000000 |
| META | 2022 | 193.0625000000000000 |
| ... | ... | ... |

And again, just like the ORDER BY syntax for multiple columns, you can use column position numbers as shorthand instead of typing out the full column names, which makes your query more concise. Here's the exact same query using position numbers:
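To make the shorthand concrete, here's a sketch that runs the named and the numbered version side by side and checks that they return the same rows. The stock_prices table and its rows are made up for illustration, and the queries run on SQLite via Python's built-in sqlite3 module (hence strftime for the year):

```python
import sqlite3

# Hypothetical stock_prices table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_prices (date TEXT, ticker TEXT, open REAL)")
conn.executemany(
    "INSERT INTO stock_prices VALUES (?, ?, ?)",
    [
        ("2022-03-31", "AAPL", 150.0),
        ("2022-09-30", "AAPL", 160.0),
        ("2023-03-31", "AAPL", 170.0),
        ("2022-03-31", "MSFT", 270.0),
        ("2023-03-31", "MSFT", 290.0),
    ],
)

# Full expressions spelled out in GROUP BY.
by_name_query = """
SELECT ticker, strftime('%Y', date) AS year, AVG(open) AS avg_open
FROM stock_prices
GROUP BY ticker, strftime('%Y', date);
"""

# 1 and 2 refer to the first and second columns in the SELECT list.
by_position_query = """
SELECT ticker, strftime('%Y', date) AS year, AVG(open) AS avg_open
FROM stock_prices
GROUP BY 1, 2;
"""

by_name = sorted(conn.execute(by_name_query).fetchall())
by_position = sorted(conn.execute(by_position_query).fetchall())
print(by_name == by_position)
```

One thing to keep in mind: the numbers refer to positions in the SELECT list, so reordering the selected columns silently changes what the query groups by.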


GROUP BY Practice Exercise #1

Given FAANG stock prices data, can you write a SQL query which uses GROUP BY to find the lowest price each stock ever opened at? Order your results by price, in descending order. Your output should look like this:

| ticker | min |
|---|---|
| NFLX | 176.49 |
| MSFT | 153.00 |
| META | 94.33 |
| AMZN | 85.46 |
| AAPL | 61.63 |
| GOOG | 56.10 |

Click below to practice this GROUP BY exercise question in the DataLemur interactive SQL editor:


Can GROUP BY be used to remove duplicates?

While GROUP BY isn't designed specifically for removing duplicates, it does collapse multiple rows that share the same values into a single row. So, in a way, it can be used to get rid of duplicates for certain columns.

To demonstrate this, imagine you were analyzing this website traffic data from a Google Analytics report:

| browser | visits |
|---|---|
| Chrome | 6 |
| Safari | 2 |
| Safari | 4 |
| Edge | 3 |
| Safari | 5 |
| Chrome | 3 |
| Chrome | 4 |

To get rid of duplicate browser data, and aggregate the visits information, you could use the following GROUP BY query:
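Here's a sketch of that query run against the exact rows from the table above. The table name browser_traffic is an assumption, and the query is wrapped in Python's built-in sqlite3 module so you can run it without a database server:

```python
import sqlite3

# The rows come from the traffic table above; the table name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE browser_traffic (browser TEXT, visits INTEGER)")
conn.executemany(
    "INSERT INTO browser_traffic VALUES (?, ?)",
    [
        ("Chrome", 6),
        ("Safari", 2),
        ("Safari", 4),
        ("Edge", 3),
        ("Safari", 5),
        ("Chrome", 3),
        ("Chrome", 4),
    ],
)

query = """
SELECT browser, SUM(visits) AS visits
FROM browser_traffic
GROUP BY browser;
"""

# Seven input rows collapse into one row per distinct browser.
rows = conn.execute(query).fetchall()
for browser, visits in rows:
    print(browser, visits)
```

Without an ORDER BY clause, the order of the output rows isn't guaranteed, so your rows may come back in a different order.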


This would eliminate the duplicate browser information, and output the following result:

| browser | visits |
|---|---|
| Chrome | 13 |
| Safari | 11 |
| Edge | 3 |

GROUP BY Practice Exercise #2

Suppose you are given a table of Data Science candidates, and their technical skills:

Sample Input:

| candidate_id | skill |
|---|---|
| 123 | Python |
| 234 | R |
| 234 | Python |
| 234 | SQL Server |
| 345 | Python |
| ... | ... |

How many candidates possess each of the different skills?

Sort your answers based on the count of candidates, from highest to lowest. Here's what the expected output looks like:

| skill | count |
|---|---|
| Python | 3 |
| R | 1 |
| SQL Server | 1 |

Click below to solve this GROUP BY practice question in the DataLemur interactive SQL editor:


What's the difference between GROUP BY and ORDER BY?

People sometimes confuse GROUP BY and ORDER BY because both commands have BY in them, but their actual functions are quite different!

ORDER BY helps you output your rows in a specific order, such as alphabetically on some text column, or from smallest to biggest on some numeric column.

GROUP BY, as you saw earlier, is all about grouping your data into categories! Because they are quite different commands, it's absolutely possible for a query to have both GROUP BY and ORDER BY, with GROUP BY coming first!

Here's a query that uses both GROUP BY and ORDER BY:
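A minimal sketch of such a query, assuming a hypothetical stock_prices table with made-up rows, run through Python's built-in sqlite3 module so it works standalone:

```python
import sqlite3

# Hypothetical stock_prices table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock_prices (date TEXT, ticker TEXT, open REAL)")
conn.executemany(
    "INSERT INTO stock_prices VALUES (?, ?, ?)",
    [
        ("2023-01-31", "AAPL", 150.0),
        ("2023-02-28", "AAPL", 170.0),
        ("2023-01-31", "GOOG", 95.0),
        ("2023-02-28", "GOOG", 105.0),
        ("2023-01-31", "MSFT", 280.0),
        ("2023-02-28", "MSFT", 290.0),
    ],
)

query = """
SELECT ticker, AVG(open) AS avg_open
FROM stock_prices
GROUP BY ticker          -- first, collapse rows into one group per ticker
ORDER BY avg_open DESC;  -- then, sort the grouped rows from highest to lowest
"""

rows = conn.execute(query).fetchall()
for ticker, avg_open in rows:
    print(ticker, avg_open)
```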


It takes the FAANG stock data and finds the average stock open price for each ticker symbol. Then, the ORDER BY clause sorts the results by average stock open price, from highest to lowest: GROUP BY vs. ORDER BY SQL Query Example

Next Lesson: GROUP BY with HAVING

You're now ready for the next lesson on HAVING, which is almost always used in conjunction with the GROUP BY clause. Curious about this dynamic duo?

Click on over to the next tutorial!

