At State Street, SQL is utilized for analyzing financial data trends and for managing databases across global departments. That's why State Street typically asks SQL query questions during interviews for Data Analytics, Data Science, and Data Engineering jobs.
Thus, to help you prep, here’s 10 State Street SQL interview questions – can you solve them?
State Street is a financial services and bank holding company. A relevant dataset could be daily stock trading volumes. In the following question, we are going to calculate the average monthly trading volume for each stock.
Every day, many different stocks are traded at State Street. For every stock, the number of traded shares needs to be recorded. The schema for these trades may be stored in a table as follows:
trade_id | trade_date | stock_code | volume |
---|---|---|---|
1 | 2022-07-01 | AAPL | 2500 |
2 | 2022-07-01 | GOOG | 1500 |
3 | 2022-07-02 | AAPL | 3500 |
4 | 2022-07-03 | AMZN | 2400 |
5 | 2022-08-01 | AAPL | 4000 |
6 | 2022-08-02 | AMZN | 2100 |
7 | 2022-08-03 | GOOG | 1300 |
8 | 2022-08-04 | AAPL | 3000 |
9 | 2022-08-05 | AMZN | 2600 |
10 | 2022-08-06 | GOOG | 1800 |
In this question, you are asked to write a SQL query to calculate the average volume traded for each stock, over each month.
Expected Output
month | stock_code | avg_volume |
---|---|---|
7 | AAPL | 3000 |
7 | GOOG | 1500 |
7 | AMZN | 2400 |
8 | AAPL | 3500 |
8 | GOOG | 1550 |
8 | AMZN | 2350 |
This query uses the DATE_TRUNC function to round the trade_date down to the month. It then groups by the result and the stock_code to calculate the average (AVG) volume for each combination of month and stock. Finally, it sorts the result by month and stock_code.
For more window function practice, try this Uber SQL Interview Question on DataLemur's interactive coding environment:
Given a table of State Street employee salaries, write a SQL query to find the 2nd highest salary amongst all the employees.
employee_id | salary |
---|---|
1 | 2500 |
2 | 800 |
3 | 1000 |
4 | 1200 |
second_highest_salary |
---|
1200 |
Solve this question directly within the browser on DataLemur:
You can find a detailed solution here: 2nd Highest Salary.
One way to find duplicatesis to use a clause and then use to find groups
You could also use the operator:
State Street Corporation is an American financial services and bank holding company. It is one of the largest asset management companies in the world and provides investment management services to institutional investors such as mutual funds and retirement plans.
One key aspect of their operations is understanding the monthly performance of each investment fund they manage. Your task is to design a table which models the investment funds' data and returns, and then write a SQL query to calculate average monthly returns for each investment fund in their management.
Below is an example of how the investment data could be modeled:
fund_id | date | return_percentage |
---|---|---|
fh129 | 01/01/2022 | 1.2 |
kp908 | 01/01/2022 | 1.5 |
fg672 | 01/01/2022 | -0.5 |
fh129 | 02/01/2022 | 2.4 |
kp908 | 02/01/2022 | -1.2 |
fg672 | 02/01/2022 | 0.6 |
Your task is to determine the average monthly return for each investment fund.
month | fund_id | avg_return |
---|---|---|
1 | fh129 | 1.2 |
1 | kp908 | 1.5 |
1 | fg672 | -0.5 |
2 | fh129 | 2.4 |
2 | kp908 | -1.2 |
2 | fg672 | 0.6 |
The following SQL query could be used to solve this problem:
This query first extracts the month from the date column, then groups the return percentage by month and fund_id. It thus calculates the average return for each combination of month and fund, and orders the result by month and fund_id.
hile a correlated subquery relies on columns in the main query's FROM clause and cannot function independently, a non-correlated subquery operates as a standalone query and its results are integrated into the main query.
An example correlated sub-query:
This correlated subquery retrieves the names and salaries of State Street employees who make more than the average salary for their department. The subquery references the department column in the main query's FROM clause (e1.department) and uses it to filter the rows of the subquery's FROM clause (e2.department).
An example non-correlated sub-query:
This non-correlated subquery retrieves the names and salaries of State Street employees who make more than the average salary for the Data Science department (which honestly should be very few people since Data Scientists are awesome and deserve to be paid well).The subquery is considered independent of the main query can stand alone. Its output (the average salary for the Data Science department) is then used in the main query to filter the rows of the State Street employees table.
As a part of State Street Corp, you are given access to a client database. The database records details such as client ID, client name, and total assets. The management wants to analyze their high-value clients in the New York and Boston regions. Your task is to write a SQL query to filter clients from these regions with total assets more than $50,000,000.
client_id | client_name | region | total_assets |
---|---|---|---|
1 | Anderson Wealth Ltd. | New York | 80000000 |
2 | Bentley Holdings | Chicago | 30000000 |
3 | Carmichael & Co. | New York | 50000000 |
4 | Dawson Investments | Boston | 70000000 |
5 | Eccleston Group | Los Angeles | 40000000 |
6 | Freeman Capital | Boston | 90000000 |
client_id | client_name | region | total_assets |
---|---|---|---|
1 | Anderson Wealth Ltd. | New York | 80000000 |
3 | Carmichael & Co. | New York | 50000000 |
4 | Dawson Investments | Boston | 70000000 |
6 | Freeman Capital | Boston | 90000000 |
This query filters the clients table to only show records where the client's 'region' is either 'New York' or 'Boston' and where the 'total_assets' value exceeds $50,000,000.
A non-relational (NoSQL) database is any database that does not use the typical tabular format of rows and columns like in relational databases.
While knowing the four different types of NoSQL databases is beyond the scope of most Data Analyst & Data Science interviews, Data Engineers at State Street should vaguely refresh these concepts:
State Street is a leading financial services provider which serves institutional investors worldwide, including management of their investment portfolios. Suppose you are data engineer at State Street and your job is to provide insights about clients' transaction behaviours. Can you write a SQL query to find out the average transaction amount for each account for the year 2022?
transaction_id | account_id | transaction_date | transaction_amount |
---|---|---|---|
001 | 1001 | 02/01/2022 00:00:00 | 20000 |
002 | 1002 | 03/15/2022 00:00:00 | 30000 |
003 | 1001 | 04/30/2022 00:00:00 | 40000 |
004 | 1003 | 06/15/2022 00:00:00 | 50000 |
005 | 1002 | 07/01/2022 00:00:00 | 60000 |
account_id | avg_transaction_amount |
---|---|
1001 | 30000 |
1002 | 45000 |
1003 | 50000 |
In the SQL query above, we apply the clause on the column to find the average transaction amount () per account. We only consider transactions in the year 2022 by filtering out rows where the year part of is not 2022. This is done using the function in PostgreSQL. This gives us insight on the average transaction amount per account for the year 2022.
The 2 most similar questions to the State Street SQL question:
To practice a very similar question try this interactive Uber User's Third Transaction Question which is similar for dealing with user transactional data or the Stripe Repeated Payments Question which is similar for analyzing transaction details within a set timeframe.
Take a look at the State Street career page and see what type of roles use SQL in their day-to-day role.
Given a table containing ad impressions and clicks by users, we want to identify the click-through rate for each advertisement. For State Street, this would be particularly useful when analyzing the effectiveness of different advertisements.
The click-through rate is calculated as the total number of clicks that an ad receives divided by the number of times the ad is shown (impressions), expressed as a percentage.
Assume the following table :
ad_id | impression_date | user_id | clicked |
---|---|---|---|
1001 | 06/01/2022 | 123 | yes |
1002 | 06/01/2022 | 265 | no |
1001 | 06/02/2022 | 362 | yes |
1002 | 06/02/2022 | 192 | yes |
1001 | 06/03/2022 | 981 | no |
impression_date is the date when the ad was shown to a user, clicked indicates whether the user clicked on the ad (yes/no).
We would like to calculate click-through rate for each ad_id.
This SQL query first transforms the 'yes/no' clicked column into a 1/0 column. It then groups by ad_id to calculate the total clicks (sum of 'clicked') and total impressions (count of rows). This is then returned as a percentage to two decimal places. As a result, we could see the click-through rate for each ad.
To solve another question about calculating rates, try this TikTok SQL Interview Question within DataLemur's interactive coding environment:
In SQL, both and are used to rank rows within a result set. The key difference between the two functions is how deal with two or more rows having the same value in the ranked column (aka how the break ties).
RANK(): When there's a tie, leaves a gap in the ranking. For example, if three rows are tied for 2nd place, the RANK() function will assign a rank of 2 to the first of these rows, a rank of 3 to the second row, and a rank of 4 to the third row.
DENSE_RANK(): For ties, does not leave a gap in the ranking. Instead, it assigns the same rank to all tied rows, and then makes the next row 1 bigger. Confusing, I know, but here's an example to make it more clear: if three rows are tied for 3rd place, the function will assign a rank of 3 to all three rows, and then assign a rank of 4 to the next row.
The key to acing a State Street SQL interview is to practice, practice, and then practice some more! Beyond just solving the above State Street SQL interview questions, you should also solve the 200+ tricky sql questions on DataLemur which come from companies like FAANG tech companies and tech startups.
Each problem on DataLemur has multiple hints, detailed solutions and crucially, there is an online SQL coding environment so you can easily right in the browser your query and have it checked.
To prep for the State Street SQL interview you can also be helpful to solve SQL problems from other banking & finanacial services companies like:
But if your SQL coding skills are weak, forget about jumping right into solving questions – improve your SQL foundations with this DataLemur SQL tutorial.
This tutorial covers things like filtering on multiple conditions using AND/OR/NOT and CASE/WHEN statements – both of which pop up routinely in State Street SQL assessments.
In addition to SQL interview questions, the other question categories covered in the State Street Data Science Interview are:
To prepare for State Street Data Science interviews read the book Ace the Data Science Interview because it's got: