At CME Group, SQL is typically used for analyzing trading volume patterns and predicting market trends based trade-related data. Because of this, CME Group asks SQL problems in interviews for Data Science and Data Engineering positions.
Thus, to help you practice, we've curated 10 CME Group SQL interview questions – how many can you solve?
CME Group is a global markets company providing a wide range of trading services and operates the largest options and futures exchange in the world. They have a vast number of trading clients, some of whom trade more frequently and in larger quantities than others. For this SQL question, write a query to identify the top trading clients in the last month. These are considered as the 'whale' users for CME Group.
Assume you have access to a 'trades' table that records every trade made by all clients. Find the clients who have made the most number of trades and have the highest total trading volume.
trade_id | client_id | trade_date | trade_volume |
---|---|---|---|
1012 | 890 | 06/23/2022 00:00:00 | 50 |
0245 | 567 | 06/30/2022 00:00:00 | 150 |
3412 | 890 | 07/01/2022 00:00:00 | 100 |
7653 | 123 | 07/10/2022 00:00:00 | 200 |
9801 | 567 | 07/20/2022 00:00:00 | 350 |
This query groups the trades made by each client in the last month, counts the number of trades (trade_frequency), calculates the total trading volume for each client, and sorts the clients based on the two measures. The clients with the highest number of trades and the highest total trading volume are considered 'whale' users. It then outputs the top 10 whale users.
To practice a super-customer analysis question on DataLemur's free online SQL coding environment, try this Microsoft SQL Interview problem:
Suppose there was a table of CME Group employee salary data. Write a SQL query to find the 2nd highest salary at the company.
employee_id | salary |
---|---|
1 | 2500 |
2 | 800 |
3 | 1000 |
4 | 1200 |
second_highest_salary |
---|
1200 |
Test your SQL query for this interview question and run your code right in the browser:
You can find a step-by-step solution here: 2nd Highest Salary.
First things first, figure out why the query is slow! You can use and commands in PostgreSQL to identify any performance bottlenecks. You might discover that your query is inefficient, or that there are many database writes at the same time you are doing a read, or maybe too many people are concurrently running queries on the same database server.
For Data Analyst and Data Science positions, knowing the ins-and-outs of SQL performance tuning is out-of-scope for the SQL interview round. However, knowing that joins are expensive, and indexes can speed up queries, is generally enough of an answer for CME Group SQL interviews.
CME Group is a global markets company and for them, understanding their daily trading activity can be instrumental when it comes to reporting and decision-making.
Given a table which tracks each trade placed on the CME Group platform, write a SQL query that will calculate the average trading volume per day of each product for the month of January 2023.
The table has the following columns:
Note on the "average trading volume per day": it is defined as the of for each distinct day for a product divided by the number of distinct trading days.
trade_id | product_id | trade_volume | trade_date |
---|---|---|---|
86527 | 5087 | 300 | 01/05/2023 10:30:00 |
98465 | 8974 | 500 | 01/05/2023 14:20:00 |
54698 | 5087 | 400 | 01/06/2023 11:25:00 |
37892 | 8974 | 600 | 01/06/2023 15:10:00 |
60503 | 5087 | 200 | 01/07/2023 09:45:00 |
84659 | 8974 | 700 | 01/07/2023 13:05:00 |
product_id | average_trade_vol_per_day |
---|---|
5087 | 300.00 |
8974 | 600.00 |
The inner query picks all trades in the month of January 2023, groups them by product and date (ignoring time), and sums up the trade volumes for each of these groups. The outer query then averages these sums over the unique trading days to get the average trading volume per day for each product.
Pro Tip: Window functions are a frequent SQL interview topic, so practice all the window function problems on DataLemur
While both and are used to rank rows, the key difference is in how they deal with ties.
RANK(): When there's a tie, leaves a gap in the ranking. For example, if three rows are tied for 2nd place, the RANK() function will assign a rank of 2 to the first of these rows, a rank of 3 to the 2nd row in the tie, and a rank of 4 to the the 3rd tie.
DENSE_RANK(): For ties, does not leave a gap in the ranking. Instead, it assigns the same rank to all tied rows, and then makes the next row 1 bigger. Confusing, I know, but here's an example to make it more clear: if three rows are tied for 3rd place, the function will assign a rank of 3 to all three rows, and then assign a rank of 4 to the next row.
Suppose we had data on how many deals different salespeople at CME Group closed, and wanted to rank the salespeople.
The result of this query would be:
name | deals_closed | rank | dense_rank |
---|---|---|---|
Akash | 50 | 1 | 1 |
Brittany | 50 | 2 | 1 |
Carlos | 40 | 3 | 2 |
Dave | 40 | 4 | 3 |
Eve | 30 | 5 | 3 |
Frank | 20 | 6 | 4 |
As you can see, the function assigns a rank of 1 to the first row (Akash), a rank of 2 to the second row (Brittany), and so on. In contrast, the , gives both Akash and Brittany a rank of 1, since they are tied, then moves on to rank 2 for Carlos.
CME Group is a financial organization dealing with derivative instruments, including futures. These are traded on a daily basis with varying volumes and values. Your goal is to propose a database model for storing and query this daily trading data. Specifically, you should be able to answer questions like "Which futures contract had the highest traded volume on a specific date?".
In order to solve this problem, we propose the following database tables:
Then, for the above SQL questions, you would write a PostgreSQL query to solve it which would look something like below.
This query first calculates the maximum traded volume for each day per contract. Then it joins with the contracts table to fetch the contract names. Finally, it filters by the date to get the contract with the maximum traded volume for a specific date.
Views are a lot like virtual tables, where you can take a base table and customize it (such as by hiding some data from non-admin users, or removing some random columns/rows based on business requirements).
Here's the PostgreSQL syntax for creating a view based on data in the table:
CME Group is one of the world's leading and most diverse derivatives marketplaces, handling a majority of futures trading and options trading across the globe. Suppose we have a customer transactions table. The columns are as follows:
Write an SQL query to fetch all the transactions made by customer with customer_id = 567 in the year 2021, where the transaction type was 'Buy', and the transaction price was higher than 50000.
transaction_id | customer_id | transaction_date | product_id | transaction_type | price |
---|---|---|---|---|---|
1234 | 567 | 2021-02-14 | 55555 | 'B' | 70000 |
2345 | 888 | 2021-05-11 | 44444 | 'S' | 50000 |
3456 | 567 | 2021-06-22 | 55555 | 'B' | 60000 |
4567 | 567 | 2020-10-31 | 44444 | 'S' | 80000 |
5678 | 999 | 2021-12-14 | 33333 | 'B' | 100000 |
This query first filters the transactions which were made by the customer with id 567. Then it adds a condition to only include transactions that happened in the year 2021. Next, it checks that the transaction type was 'Buy'. Finally, it filters the transactions where the price was higher than 50000.
Given the table which records every trade occurrence in CME Group, Write an SQL query to find out the average trading price of each product category in each month. Additionally, Aggregate all trades by product category and month, and calculate the average trade price during that period.
trade_id | product_category | trading_date | trader_id | trading_price |
---|---|---|---|---|
2001 | Futures | 2022-06-01 | 1001 | 220.40 |
3342 | Options | 2022-06-03 | 1202 | 340.10 |
4003 | Futures | 2022-06-29 | 1053 | 230.20 |
5678 | Futures | 2022-07-11 | 1077 | 225.50 |
6011 | Options | 2022-07-15 | 1061 | 350.20 |
mth | product_category | avg_trading_price |
---|---|---|
6 | Futures | 225.30 |
6 | Options | 340.10 |
7 | Futures | 225.50 |
7 | Options | 350.20 |
This query sorts all trades into groups based on the month of the trade and the product category. For each of these groups, the query will return the average trade price by using the function. The statement extracts the month from the and names it for further use.
There are several normal forms that define the rules for normalizing a database:
A database is in first normal form (1NF) if it meets the following criteria:
A database is in second normal form (2NF) if it meets the following criteria:
Said another way, to achieve 2NF, besides following all the rules from 1NF all the columns in a given table should be dependent only on that table's primary key.
A database is in third normal form (3NF) if it meets the following criteria:
A transitive dependency means that a piece of data in one column is derived from another column. For example, it wouldn't make sense to keep a column called "user's age" and "user's birthdate" (because age can be derived from birthdate.
While there's also a 4th and 5th normal form, it's too pedantic and hence out-of-scope to know for the CME Group SQL interview.
Also check out the CME careers page!
The best way to prepare for a CME Group SQL interview is to practice, practice, practice. Besides solving the earlier CME Group SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions which come from companies like Google, Uber, and Microsoft.
Each interview question has multiple hints, detailed solutions and best of all, there's an online SQL code editor so you can instantly run your SQL query answer and have it executed.
To prep for the CME Group SQL interview it is also useful to practice SQL questions from other stock exchange & brokerage companies like:
However, if your SQL skills are weak, forget about diving straight into solving questions – refresh your SQL knowledge with this interactive SQL tutorial.
This tutorial covers things like Subqueries and UNION vs. joins – both of which show up frequently in CME Group SQL interviews.
In addition to SQL interview questions, the other topics covered in the CME Group Data Science Interview are:
To prepare for CME Group Data Science interviews read the book Ace the Data Science Interview because it's got: