At Morgan Stanley, SQL is used day-to-day for analyzing financial market trends and optimizing asset management strategies. For this reason Morgan Stanley almost always evaluates jobseekers on SQL questions during interviews for Data Science, Analytics, and & Data Engineering jobs.
Thus, to help you prepare for the Morgan Stanley SQL interview, here’s 11 Morgan Stanley SQL interview questions can you solve them?
Morgan Stanley utilizes a trading platform with a backend database that retains customer trade information. The business wants to identify its most active and valuable customers - these are the clients who make frequent trades and invest large amounts of money.
In this scenario, you need to write a SQL query that analyzes the "trades" table and identifies, for each month, the top 5 customers who have the highest total trade amounts.
Here is a mockup of this scenario:
trade_id | customer_id | trade_date | trade_amount |
---|---|---|---|
1001 | 123 | 06/10/2022 | 10000 |
1002 | 456 | 06/11/2022 | 20000 |
1003 | 789 | 06/12/2022 | 30000 |
1004 | 123 | 07/10/2022 | 10000 |
1005 | 456 | 07/12/2022 | 15000 |
1006 | 789 | 07/12/2022 | 35000 |
1007 | 123 | 07/15/2022 | 30000 |
month | customer_id | total_trade_amount |
---|---|---|
6 | 789 | 30000 |
6 | 456 | 20000 |
6 | 123 | 10000 |
7 | 789 | 35000 |
7 | 123 | 40000 |
7 | 456 | 15000 |
This query first groups the trades by month and customer id, then it calculates the total trade amount for each group. It orders the result by the total trade amount in descending order to have the customers with highest total trade amount at the top. The LIMIT clause is used to restrict the output to the top 5 customers for each month.
To practice a similar customer analytics SQL question where you can code right in the browser and have your SQL query instantly graded, try this Walmart SQL Interview Question:
Given a table of Morgan Stanley employee salary data, write a SQL query to find the top 3 highest paid employees in each department.
employee_id | name | salary | department_id |
---|---|---|---|
1 | Emma Thompson | 3800 | 1 |
2 | Daniel Rodriguez | 2230 | 1 |
3 | Olivia Smith | 2000 | 1 |
4 | Noah Johnson | 6800 | 2 |
5 | Sophia Martinez | 1750 | 1 |
8 | William Davis | 6800 | 2 |
10 | James Anderson | 4000 | 1 |
department_id | department_name |
---|---|
1 | Data Analytics |
2 | Data Science |
department_name | name | salary |
---|---|---|
Data Analytics | James Anderson | 4000 |
Data Analytics | Emma Thompson | 3800 |
Data Analytics | Daniel Rodriguez | 2230 |
Data Science | Noah Johnson | 6800 |
Data Science | William Davis | 6800 |
Solve this problem interactively on DataLemur:
We use the DENSE_RANK() window function to generate unique ranks for each employee's salary within their department, with higher salaries receiving lower ranks. Then, we wrap this up in a CTE and filter the employees with a ranking of 3 or lower.
If the solution above is tough, you can find a detailed solution with hints here: Top 3 Department Salaries.
To better understand the difference between a primary key and a foreign key, let's use an example from Morgan Stanley's marketing analytics database, which holds data on Google Ads campaigns:
:
+------------+------------+------------+------------+ | ad_id | campaign_id| keyword | click_count| +------------+------------+------------+------------+ | 1 | 100 | Morgan Stanley pricing | 10 | | 2 | 100 | Morgan Stanley reviews | 15 | | 3 | 101 | Morgan Stanley alternatives | 7 | | 4 | 101 | buy Morgan Stanley | 12 | +------------+------------+------------+------------+
In this table, serves as the primary key. It uniquely identifies each ad and cannot be null.
is a foreign key that connects to the of the corresponding Google Ads campaign. This establishes a relationship between the ads and their campaigns, enabling easy querying to find which ads belong to a specific campaign or which campaigns a specific ad belongs to.
The table may also have multiple foreign keys that reference primary keys in other tables. For example, and foreign keys could be used to link each ad to its ad group and the Google Ads account that the campaigns belong to, respectively.
As a reputable multinational investment bank and financial services company, Morgan Stanley relies on data to make crucial business decisions. This SQL window function question involves analyzing a hypothetical dataset on transactions executed by traders at Morgan Stanley.
Suppose you have a table named , with details of every trader's transactions. Each row represents a transaction, and the columns are (unique ID for each trade), (unique ID for each trader), (total trade amount), and (date of the transaction).
Your task is to write a SQL query to find out the maximum trade amount for each trader for each year, along with the date on which that maximum trade occurred.
Example Input:
trade_id | trader_id | trade_amount | trade_date |
---|---|---|---|
1001 | 12501 | 100000 | 2018-06-12 |
1002 | 12501 | 200000 | 2018-07-15 |
1010 | 12502 | 50000 | 2018-06-18 |
1011 | 12502 | 45000 | 2018-08-20 |
1012 | 12501 | 250000 | 2019-06-15 |
1013 | 12502 | 80000 | 2019-05-10 |
Example Output:
trader_id | year | max_trade_amount | trade_date |
---|---|---|---|
12501 | 2018 | 200000 | 2018-07-15 |
12502 | 2018 | 50000 | 2018-06-18 |
12501 | 2019 | 250000 | 2019-06-15 |
12502 | 2019 | 80000 | 2019-05-10 |
In this question, we first created a Common Table Expression which extracts the year of transaction and groups the transactions based on and year of transaction. For each group, we found out the maximum trade value.
We then join this table with the original table to fetch the corresponding date for the maximum trade amount and to handle cases where a trader might have different transactions with the same maximum value.
To solve a similar window function interview problem which uses RANK() on DataLemur's free online SQL coding environment, try this Amazon BI Engineer interview question:
A self-join is a type of join in which a table is joined to itself. To perform a self-join, you need to specify the table name twice in the FROM clause, and give each instance of the table a different alias. You can then join the two instances of the table using a JOIN clause, and use a WHERE clause to specify the relationship between the rows.
Self-joins are the go-to technique for any data analysis that involves pairs of the same thing, like identifying pairs of products that are frequently purchased together like in this Walmart SQL interview question.
For another example, say you were doing an HR analytics project and needed to analyze how much all Morgan Stanley employees in the same department interact with each other. Here's a self-join query you could use to retrieve all pairs of Morgan Stanley employees who work in the same department:
This query returns all pairs of Morgan Stanley employees who work in the same department, and excludes pairs where the employee's id is the same (since this would represent the same Morgan Stanley employee being paired with themselves).
Morgan Stanley, being one of the world's top global financial services firms, conducts millions of trades every day. Given the scale of transactions being processed, there is a need for robust data design and management, which will allow effective transactional analysis.
Consider a scenario where you are a Data Analyst at Morgan Stanley and tasked with analyzing daily trade volume. You have two different tables: and . The table holds information about each individual trade, including the type of trade, the date it took place, and the volume. The table holds information about different types of trades.
Trades:
trade_id | trade_type_id | trade_date | volume |
---|---|---|---|
1821 | 1 | 2022-07-01 | 43000 |
4935 | 2 | 2022-07-01 | 38500 |
3492 | 1 | 2022-07-02 | 25000 |
3753 | 2 | 2022-07-02 | 28000 |
2835 | 1 | 2022-07-03 | 60000 |
Trade Type:
trade_type_id | trade_desc |
---|---|
1 | Buy |
2 | Sell |
Given this information, you are asked to create a PostgreSQL query that returns the total volume of each trade type per day. The resultant table should contain columns for the trade date, the type of trade, and the total volume of that type on that date.
Create a PostgreSQL query to solve this problem.
This query joins the Trades and TradeType tables together on the column, and it groups the resulting rows by the date and the trade description (Buy or Sell). For each combination of date and trade type, it sums up the total trade volumes. Finally, it orders the resulting rows by date and total volume in descending order.
Read about Morgan Stanley's Diversity and Inclusion statement!
Unique indexes help ensure that there are no duplicate key values in a table, maintaining data integrity. They enforce uniqueness whenever keys are added or changed within the index.
Non-unique indexes on the other hand, are used to improve query performance by maintaining a sorted order of frequently used data values, but they do not enforce constraints on the associated table.
Unique Indexes are blazing fast. Non unique indexes can improve query performance, but they are often slower because of their non-unique nature.
Morgan Stanley has been running several digital ads across various platforms. The Marketing team is interested in understanding the click-through rates (CTR) for these ads to evaluate the effectiveness of their campaigns. As a part of this, they want to find out the total number of impressions, clicks, and the click-through rate (CTR) for each ad on each platform for the month of March.
You have two tables - and .
The table has the following structure:
click_id | ad_id | platform | click_date |
---|---|---|---|
101 | 1 | 03/01/2022 00:00:00 | |
102 | 2 | Google Ads | 03/01/2022 00:00:00 |
103 | 1 | 03/01/2022 00:00:00 | |
104 | 3 | 03/02/2022 00:00:00 | |
105 | 1 | 03/03/2022 00:00:00 |
The table has the following structure:
impression_id | ad_id | platform | impression_date |
---|---|---|---|
201 | 1 | 03/01/2022 00:00:00 | |
202 | 2 | Google Ads | 03/01/2022 00:00:00 |
203 | 3 | 03/01/2022 00:00:00 | |
204 | 1 | 03/01/2022 00:00:00 | |
205 | 2 | Google Ads | 03/01/2022 00:00:00 |
This SQL query first calculates the number of clicks and impressions for each ad on each platform for the month of March using CTE's and . Then, it joins these tables on ad_id and platform to calculate the CTR as the ratio of clicks to impressions.
Please note that we multiply by 1.0 to convert it into a float for accurate division (floating point arithmetic).
To practice another question about calculating rates, try this SQL interview question from TikTok within DataLemur's interactive SQL code editor:
As an analyst at Morgan Stanley, your team is interested in understanding the trading behavior of different stocks. Given a table with details of each trade that happened, your task is to write an SQL query that calculates the average traded volume of each stock grouped by the trading day and the stock ticker.
trade_id | trading_day | ticker | traded_volume |
---|---|---|---|
1 | 2022-06-08 | AAPL | 100000 |
2 | 2022-06-08 | AAPL | 150000 |
3 | 2022-06-08 | GOOG | 50000 |
4 | 2022-06-09 | AAPL | 120000 |
5 | 2022-06-09 | GOOG | 60000 |
6 | 2022-06-09 | GOOG | 70000 |
trading_day | ticker | avg_traded_volume |
---|---|---|
2022-06-08 | AAPL | 125000 |
2022-06-08 | GOOG | 50000 |
2022-06-09 | AAPL | 120000 |
2022-06-09 | GOOG | 65000 |
The SQL query makes use of the clause to group the trades by each trading day and ticker. Then, the aggregate function is used to calculate the average trading volume for each combination of trading day and ticker from the table.
A database index is a data structure that improves the speed of data retrieval operations on a database table.
There are few different types of indexes that can be used in a database:
For a concrete example, say you had a table of Morgan Stanley customer payments with the following columns:
Here's what a clustered index on the column would look like:
A clustered index on the column would determine the physical order of the records in the table based on the . This means that the records with the earliest values would be stored together physically in the table, followed by records with later payment_date values. This speeds up queries that filter or sort the data based on the , as the records are already physically sorted in the table. For example, if you want to retrieve all the payments made in the month of June, the database can use the clustered index to quickly locate and retrieve the desired records without having to search through the entire table.
As a financial analyst working at Morgan Stanley, you are tasked with the responsibility to find all the customers in the company's customer records database whose first names start with 'J' and reside in 'New York'. The database table 'customers' has records in the following format:
customer_id | first_name | last_name | city | state | account_balance |
---|---|---|---|---|---|
6759 | John | Smith | New York | New York | 12500.75 |
1231 | Jennifer | Lawrence | Los Angeles | California | 8750.80 |
4852 | James | Franklin | New York | New York | 10930.35 |
8654 | Janet | King | San Francisco | California | 9800.45 |
2331 | Jessica | Moore | Chicago | Illinois | 10450.60 |
7893 | Jacob | Martin | New York | New York | 11700.25 |
Using SQL, write a query to fetch these records.
This query uses the SQL keyword LIKE, followed by a pattern 'J%'. The '%' symbol is a wildcard that matches zero or more characters, so 'J%' matches any string that starts with 'J'. The AND operator is then used to specify multiple conditions – here, the city and state are both 'New York'.
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the Morgan Stanley SQL interview is to solve as many practice SQL interview questions as you can! Beyond just solving the above Morgan Stanley SQL interview questions, you should also solve the 200+ DataLemur interview questions which come from companies like Amazon, Microsoft, Meta, and smaller tech companies.
Each exercise has multiple hints, detailed solutions and crucially, there's an interactive SQL code editor so you can easily right in the browser your SQL query and have it checked.
To prep for the Morgan Stanley SQL interview it is also a great idea to practice SQL questions from other stock exchange & brokerage companies like:
However, if your SQL coding skills are weak, don't worry about going right into solving questions – go learn SQL with this SQL tutorial for Data Scientists & Analysts.
This tutorial covers SQL concepts such as INTERCEPT/EXCEPT and different types of joins – both of which come up often during Morgan Stanley SQL interviews.
Beyond writing SQL queries, the other question categories to practice for the Morgan Stanley Data Science Interview are:
I'm sort of biased, but I believe the optimal way to prep for Morgan Stanley Data Science interviews is to read the book Ace the Data Science Interview.
The book covers 201 interview questions sourced from FAANG tech companies. It also has a refresher covering Stats, SQL & ML. And finally it's helped thousands of people land their dream job in data, which is why it's got over 1000+ 5-star reviews on Amazon.