Back to questions
This is the same question as problem #2 in the SQL Chapter of Ace the Data Science Interview!
Assume you're given the tables containing completed trade orders and user details in a Robinhood trading system.
Write a query to retrieve the top three cities that have the highest number of completed trade orders listed in descending order. Output the city name and the corresponding number of completed trade orders.
|111||San Franciscofirstname.lastname@example.org||08/03/2021 12:00:00|
|178||San Franciscoemail@example.com||01/05/2022 12:00:00|
|300||San Franciscofirstname.lastname@example.org||06/30/2022 12:00:00|
In the given dataset, San Francisco has the highest number of completed trade orders with 3 orders. Boston holds the second position with 2 orders, and Denver ranks third with 1 order.
The dataset you are querying against may have different input & output - this is just an example!
We begin by joining the and tables based on the related column . This is because the 'Completed' order is stored in the table, while the cities are stored in the table.
In the statement, we pull the field from the table and the field from the table.
Output (showing the first 5 rows only):
Next, we filter the 'Completed' orders and retrieve the number of orders for each city using the aggregate function. We group the results by the column using the statement.
Did you notice that our output is grouped by the column?
Based on the results, San Francisco has the highest number of completed orders with 4 orders. Boston has the second-highest number of completed orders with 3 orders and Denver has the third-highest number of completed orders with 2 orders.