Data Analysts and Data Scientists at Cboe Global Markets write SQL queries almost every workday. They use SQL to analyze real-time financial trading data and to manage their databases of market research and transaction histories. That's why Cboe Global Markets asks SQL problems during interviews.

So, to help you ace the Cboe Global Markets SQL interview, here are 11 Cboe Global Markets SQL interview questions. Can you solve them?
Cboe Global Markets is one of the world's largest exchange holding companies, offering cutting-edge trading and investment solutions to investors around the world. For a data analyst role in such a company, one common task could be providing insights into trading volume. This SQL interview question is framed around the same task.
In a given dataset, you have trade information, including the trade date (trade_date), the traded security symbol (security_symbol), and trade volume (trade_volume). Your task is to write a SQL query using window functions to compute the average daily trading volume for each security over a rolling 7-day window, excluding the current day.
Example Input:

trade_id | trade_date | security_symbol | trade_volume |
---|---|---|---|
1001 | 2022-07-01 | AAPL | 1000 |
1002 | 2022-07-02 | AAPL | 1500 |
1003 | 2022-07-03 | AAPL | 2000 |
1004 | 2022-07-04 | AAPL | 2200 |
1005 | 2022-07-05 | AAPL | 1800 |
1006 | 2022-07-06 | AAPL | 2100 |
1007 | 2022-07-07 | AAPL | 2300 |
1008 | 2022-07-08 | AAPL | 1900 |
1009 | 2022-07-09 | AAPL | 2200 |
Example Output:

trade_date | security_symbol | avg_trade_volume_prev7days |
---|---|---|
2022-07-08 | AAPL | 1842.86 |
2022-07-09 | AAPL | 1971.43 |
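A sketch of one possible PostgreSQL solution, assuming the data lives in a table named trades:

```sql
-- The inner query computes the rolling average over the 7 prior rows
-- (excluding the current row) and numbers each security's trading days;
-- the outer query drops the first 7 days, which lack a full window.
SELECT trade_date,
       security_symbol,
       avg_trade_volume_prev7days
FROM (
  SELECT trade_date,
         security_symbol,
         ROUND(AVG(trade_volume) OVER (
           PARTITION BY security_symbol
           ORDER BY trade_date
           ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING
         ), 2) AS avg_trade_volume_prev7days,
         ROW_NUMBER() OVER (
           PARTITION BY security_symbol
           ORDER BY trade_date
         ) AS day_num
  FROM trades
) AS windowed
WHERE day_num > 7;
```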
This query computes the average trade volume over a rolling 7-day window for each security symbol. The window frame is defined as 7 rows (days) preceding the current row (excluding the current row itself) and the average is rounded off to 2 decimal places. Note that the query excludes the first 7 days of data since we wouldn't have a full 7-day window of prior data for those days.
For more window function practice, solve this Uber SQL problem within DataLemur's interactive coding environment:
Learn about Cboe Global Markets' data and access solutions!
Given a table of Cboe Global Markets employee salaries, write a SQL query to find the top 3 highest earning employees in each department.
Example Input:

employee_id | name | salary | department_id |
---|---|---|---|
1 | Emma Thompson | 3800 | 1 |
2 | Daniel Rodriguez | 2230 | 1 |
3 | Olivia Smith | 2000 | 1 |
4 | Noah Johnson | 6800 | 2 |
5 | Sophia Martinez | 1750 | 1 |
8 | William Davis | 6800 | 2 |
10 | James Anderson | 4000 | 1 |
Example Input:

department_id | department_name |
---|---|
1 | Data Analytics |
2 | Data Science |
Example Output:

department_name | name | salary |
---|---|---|
Data Analytics | James Anderson | 4000 |
Data Analytics | Emma Thompson | 3800 |
Data Analytics | Daniel Rodriguez | 2230 |
Data Science | Noah Johnson | 6800 |
Data Science | William Davis | 6800 |
Try this problem interactively on DataLemur:
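Here's one possible solution, sketched in PostgreSQL (the table names employee and department are assumed):

```sql
WITH ranked AS (
  SELECT d.department_name,
         e.name,
         e.salary,
         -- rank salaries within each department; highest salary = rank 1
         DENSE_RANK() OVER (
           PARTITION BY e.department_id
           ORDER BY e.salary DESC
         ) AS salary_rank
  FROM employee AS e
  JOIN department AS d
    ON e.department_id = d.department_id
)
SELECT department_name,
       name,
       salary
FROM ranked
WHERE salary_rank <= 3  -- keep only the top 3 ranks per department
ORDER BY department_name, salary DESC;
```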
We use the DENSE_RANK() window function to rank each employee's salary within their department, with the highest salary receiving rank 1 (ties share a rank, which is why both Data Science employees appear in the output). Then, we wrap this ranking in a CTE and keep the employees with a rank of 3 or lower.
If the code above is hard to understand, you can find a step-by-step solution here: Top 3 Department Salaries.
In database schema design, a one-to-one relationship is when each entity is associated with only one instance of the other. For instance, a US citizen's relationship with their social-security number (SSN) is one-to-one because each citizen can only have one SSN, and each SSN belongs to one person.
A one-to-many relationship, on the other hand, is when one entity can be associated with multiple instances of the other entity. An example of this is the relationship between a person and their email addresses - one person can have multiple email addresses, but each email address only belongs to one person.
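In SQL, a one-to-many relationship is typically modeled with a foreign key on the "many" side. A minimal sketch (table and column names are illustrative):

```sql
-- One person...
CREATE TABLE person (
  person_id SERIAL PRIMARY KEY,
  full_name TEXT NOT NULL
);

-- ...many email addresses: each row references exactly one person,
-- but one person can appear in many rows.
CREATE TABLE email_addresses (
  email_id  SERIAL PRIMARY KEY,
  person_id INT NOT NULL REFERENCES person (person_id),
  email     TEXT NOT NULL UNIQUE
);
```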
As an analyst at Cboe Global Markets, you have been given a dataset containing all of the trades made in the past year. The dataset includes information such as trade id, trade date, trader id, trade volume, and security id.
Cboe Global Markets wants to perform an analysis to find out the trading volume of each security on a monthly basis. Write a SQL query to filter the trading records to show the total volume of each security, grouped by the month.
Example Input:

trade_id | trade_date | trader_id | volume | security_id |
---|---|---|---|---|
7980 | 2021-12-04 00:00:00 | 932 | 10000 | 82 |
4573 | 2022-01-11 00:00:00 | 459 | 5000 | 124 |
7853 | 2021-12-18 00:00:00 | 271 | 20000 | 549 |
3621 | 2022-02-20 00:00:00 | 839 | 7000 | 82 |
6471 | 2022-01-07 00:00:00 | 452 | 15000 | 435 |
Expected Output:
mth | security | total_volume |
---|---|---|
12 | 82 | 30000 |
12 | 549 | 20000 |
1 | 124 | 5000 |
1 | 435 | 15000 |
2 | 82 | 7000 |
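A sketch of one possible PostgreSQL solution, assuming the data lives in a table named trades:

```sql
SELECT EXTRACT(MONTH FROM trade_date) AS mth,
       security_id AS security,
       SUM(volume) AS total_volume
FROM trades
GROUP BY EXTRACT(MONTH FROM trade_date), security_id
ORDER BY mth, security;
```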
This statement uses the PostgreSQL EXTRACT() function to pull the month out of trade_date. It then groups the results by the extracted month and security_id to calculate the sum of volume for each group. The ORDER BY clause orders the results by month and security for better presentation. (Note that grouping by month alone folds the same month from different years together; include the year in the grouping if you need to keep them separate.)
Stored procedures in SQL are like recipes in a cookbook. Just like a recipe tells you the ingredients and instructions for making a particular dish, a stored procedure tells the DBMS the logic/statements needed to perform a specific task. Just like you can use a recipe to make the same dish over and over again, you can use a stored procedure to repeat the same task multiple times with different input parameters (which is why stored procedures are so damn useful!).
Say you were a Data Analyst working on an HR analytics project. A common sub-task you might have to do is calculate the average salary for a given department at Cboe Global Markets, which would be perfect for a stored procedure.
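In PostgreSQL, reusable logic like this is written as a function; here's a minimal sketch, assuming a hypothetical employees table with salary and department columns:

```sql
-- Returns the average salary for the given department.
-- (The employees table and its columns are assumed for illustration.)
CREATE OR REPLACE FUNCTION avg_salary(dept_name TEXT)
RETURNS NUMERIC AS $$
  SELECT AVG(salary)
  FROM employees
  WHERE department = dept_name;
$$ LANGUAGE SQL;
```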
To call this stored procedure and find the average salary for the Data Science department you'd execute a query like this:
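```sql
-- Uses the avg_salary function sketched above.
SELECT avg_salary('Data Science');
```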
Given a database of stock trades made on the Cboe Global Markets exchange, write a SQL query to find the average daily trading volume for each stock over the past year.
Let's define the volume traded as the number of shares traded.
For an extra challenge, also find the stock with the highest average daily trading volume.
Example Input:

trade_id | trade_date | stock_id | shares_traded |
---|---|---|---|
0176 | 01/04/2022 | 1002 | 6600 |
0342 | 01/04/2022 | 1036 | 14000 |
0030 | 02/04/2022 | 1002 | 7000 |
0462 | 02/04/2022 | 1036 | 19000 |
0789 | 02/04/2022 | 1074 | 13500 |
0971 | 03/04/2022 | 1002 | 6600 |
0930 | 03/04/2022 | 1036 | 10500 |
0966 | 03/04/2022 | 1074 | 15200 |
Example Output:

stock_id | avg_daily_volume |
---|---|
1002 | 6733 |
1036 | 14500 |
1074 | 14350 |
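A possible sketch in PostgreSQL, assuming a trades table with one row per stock per trading day (if a stock can trade multiple times per day, you'd first sum shares per day before averaging):

```sql
-- Average daily trading volume per stock over the past year.
SELECT stock_id,
       ROUND(AVG(shares_traded)) AS avg_daily_volume
FROM trades
WHERE trade_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY stock_id;

-- Extra challenge: the single stock with the highest average.
SELECT stock_id,
       ROUND(AVG(shares_traded)) AS avg_daily_volume
FROM trades
WHERE trade_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY stock_id
ORDER BY avg_daily_volume DESC
LIMIT 1;
```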
This query calculates the average daily trading volume for each stock over the past year from the current date. The WHERE clause ensures we only consider trades from the previous year while the GROUP BY clause organizes the results by stock. Finally, the ORDER BY and LIMIT clauses allow us to find the stock with the highest average daily trading volume.
To practice a very similar question try this interactive Robinhood Cities With Completed Trades Question which is similar for analyzing trading data, or this Microsoft Teams Power Users Question which is similar for determining highest activity.
An index in a database is a data structure that helps to quickly find and access specific records in a table.
For example, if you had a database of Cboe Global Markets customers, you could create a primary index on the customer_id column.

Having a primary index on the customer_id column can speed up performance in several ways. For example, if you want to retrieve a specific customer record based on their customer_id, the database can use the primary index to quickly locate and retrieve the desired record. The primary index acts like a map, allowing the database to find the location of the desired record without having to scan the entire table.

Additionally, a primary index can also be used to enforce the uniqueness of the customer_id column, ensuring that no duplicate values are inserted into the table. This helps prevent errors and maintain the integrity of the data in the table.
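For instance (table and column names here are illustrative), declaring customer_id as the primary key makes PostgreSQL create that unique index automatically:

```sql
CREATE TABLE customers (
  customer_id INT PRIMARY KEY,  -- backed by an automatically created unique index
  first_name  TEXT,
  last_name   TEXT
);
```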
Cboe Global Markets is a company that provides traders with the infrastructure to manage and execute trades. As an analyst at the company, you are tasked with understanding the user behaviors on the website. Specifically, you are asked to calculate the click-through rate, which is the proportion of users that click on specific links after viewing them.
For the purposes of this question, consider two tables, which we'll call views and clicks. The views table contains a timestamp column (view_date), a user_id, and a link_id. The clicks table has click_date, user_id, and link_id.
views Example Input:

view_date | user_id | link_id |
---|---|---|
2022-05-12 08:00:00 | 123 | 456 |
2022-05-12 09:00:00 | 123 | 789 |
2022-05-12 10:00:00 | 321 | 456 |
2022-05-12 11:00:00 | 321 | 789 |
clicks Example Input:

click_date | user_id | link_id |
---|---|---|
2022-05-12 10:00:00 | 123 | 456 |
2022-05-12 11:00:00 | 123 | 789 |
The desired output is a single table with the total views for each link, the total clicks for each link, and the click-through rate for each link (click-through rate = number of clicks / number of views).
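One possible sketch, using the views and clicks table names assumed above:

```sql
WITH view_counts AS (
  SELECT link_id,
         COUNT(DISTINCT user_id) AS total_views
  FROM views
  GROUP BY link_id
),
click_counts AS (
  SELECT link_id,
         COUNT(DISTINCT user_id) AS total_clicks
  FROM clicks
  GROUP BY link_id
)
SELECT v.link_id,
       v.total_views,
       COALESCE(c.total_clicks, 0) AS total_clicks,
       -- cast to DECIMAL so the division isn't integer division
       ROUND(COALESCE(c.total_clicks, 0)::DECIMAL / v.total_views, 2) AS ct_rate
FROM view_counts AS v
LEFT JOIN click_counts AS c
  ON v.link_id = c.link_id;
```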
This code first aggregates the views and clicks into separate common table expressions (CTEs) by counting the distinct user_ids for each action on each link. It then joins the two CTEs on link_id and computes the click-through rate, casting the click count to a decimal so the division isn't truncated to an integer.
To practice a similar problem about calculating rates, try this SQL interview question from TikTok within DataLemur's interactive SQL code editor:
As a data analyst at Cboe Global Markets, you're asked by a financial investigator to find all customer transactions whose Remarks field contains the word "Derivative" (not case-sensitive). The investigator is specifically interested in transactions made in the last quarter of 2021.
Please make use of the following transactions table:
transaction_id | customer_id | transaction_date | amount | Remarks |
---|---|---|---|---|
8761 | 140 | 09/10/2021 00:00:00 | 98501 | "Derivative Contract Sold" |
9072 | 328 | 10/15/2021 00:00:00 | 104856 | "Cash Withdrawal" |
7073 | 414 | 11/26/2021 00:00:00 | 70205 | "Derivative Position Closed" |
6592 | 254 | 12/30/2021 00:00:00 | 69352 | "Dividend Paid" |
8557 | 322 | 12/31/2021 00:00:00 | 80420 | "Derivative Contract Bought" |
Write a PostgreSQL query to solve this.

You would write a query like this (assuming the table is named transactions):
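```sql
SELECT *
FROM transactions  -- assumed table name
WHERE LOWER(remarks) LIKE '%derivative%'
  AND transaction_date >= '2021-10-01'
  AND transaction_date <  '2022-01-01';
```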
This query searches for any remarks containing the term "Derivative"; the match is case-insensitive because the LOWER function normalizes the text before comparing. The date filter ensures we only select transactions that occurred from October through December 2021.
Unique indexes help ensure that there are no duplicate key values in a table, maintaining data integrity. They enforce uniqueness whenever keys are added or changed within the index.
To define a unique index in PostgreSQL, you can use the following syntax:
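```sql
-- Index, table, and column names here are illustrative.
CREATE UNIQUE INDEX idx_customers_email
ON customers (email);
```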
To define a non-unique index in PostgreSQL, you can use the following syntax:
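```sql
-- Index, table, and column names here are illustrative.
CREATE INDEX idx_trades_trade_date
ON trades (trade_date);
```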
Non-unique indexes, on the other hand, are used to improve query performance by maintaining a sorted order of frequently used data values, but they do not enforce uniqueness constraints on the associated table.
For Cboe Global Markets, find all customers who have made orders worth more than $5,000 in total. The aim is to identify high-value customers. Here, assumed data from two tables is being used: orders and customers.

The orders table keeps a record of all individual orders, along with their customer ID and total order value.

The customers table keeps customer information like customer ID, first name, and last name.
You should write a SQL query to return a list of full names of customers who've made orders worth more than $5000 in total.
orders Example Input:

order_id | customer_id | order_value |
---|---|---|
101 | 1 | 1200.00 |
102 | 2 | 3000.00 |
103 | 1 | 5000.00 |
104 | 3 | 1500.00 |
105 | 2 | 2500.00 |
customers Example Input:

customer_id | first_name | last_name |
---|---|---|
1 | John | Doe |
2 | Jane | Doe |
3 | Bob | Smith |
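A sketch of one solution in PostgreSQL, using the orders and customers table names from above:

```sql
SELECT c.first_name || ' ' || c.last_name AS full_name,
       SUM(o.order_value) AS total_order_value
FROM orders AS o
JOIN customers AS c
  ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name
HAVING SUM(o.order_value) > 5000;
```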
This query commences with the FROM clause, indicating where our data is being pulled from. The JOIN then amalgamates our data from the orders table with the customers table, matching rows based on the customer_id in both tables.

The GROUP BY clause groups our selected columns by customer_id, first_name, and last_name, so we can apply aggregate functions to each customer's group of orders.

The HAVING clause then filters out groups that don't meet the condition of having a total order value greater than $5,000.
This will yield a table containing the full names of the customers along with their total order value, but only if their order totals are over $5000.
Because joins come up routinely during SQL interviews, try this Snapchat JOIN SQL interview question:
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Besides solving the above Cboe Global Markets SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Google, Uber, and Microsoft.
Each interview question has multiple hints and full answers, and most importantly, there's an online SQL coding environment so you can run your SQL query right in the browser and have it graded.
To prep for the Cboe Global Markets SQL interview it is also useful to solve SQL problems from other stock exchange & brokerage companies like:
In case your SQL query skills are weak, don't jump right into solving questions; instead, improve your SQL foundations with this SQL interview tutorial.
This tutorial covers SQL topics like rank window functions and handling NULLs in SQL – both of which come up often during SQL interviews at Cboe Global Markets.
In addition to SQL interview questions, the other types of problems to practice for the Cboe Global Markets Data Science Interview are:
I'm sorta biased, but I believe the optimal way to prepare for Cboe Global Markets Data Science interviews is to read the book I wrote: Ace the Data Science Interview.
The book solves 201 data interview questions sourced from tech companies like Netflix, Google, & Airbnb. It also has a refresher on Python, SQL & ML. And finally, it's helped thousands of people land their dream job in data, which is why it's got over 1,000 5-star reviews on Amazon.