Data Analysts and Data Engineers at Goldman Sachs write SQL queries all the time as part of their job. They use it for extracting and analyzing financial data for reporting needs, and managing relational databases. That's why Goldman Sachs asks prospective hires SQL interview questions.
So, to help you ace the Goldman Sachs SQL interview, we'll cover 10 Goldman Sachs SQL interview questions – scroll down to start solving them!
Sure. Here's is an example of a SQL window function question that could possibly be used in a Goldman Sachs interview, this question is based on analysing sales data:
Goldman Sachs has various products and they want to analyse their sales of each product. They asked you to write a SQL query that calculates the running total of sales for each product, ordered by the month of the year.
Here's some sample data:
sale_id | product_id | sale_date | sale_amount |
---|---|---|---|
1 | 100 | 01/01/2022 | 5000 |
2 | 200 | 01/15/2022 | 3000 |
3 | 100 | 02/01/2022 | 7000 |
4 | 300 | 02/15/2022 | 1000 |
5 | 200 | 03/01/2022 | 6000 |
6 | 300 | 03/15/2022 | 4000 |
7 | 100 | 04/01/2022 | 8000 |
8 | 200 | 04/15/2022 | 9000 |
month | product | running_total |
---|---|---|
1 | 100 | 5000 |
1 | 200 | 3000 |
2 | 100 | 12000 |
2 | 300 | 1000 |
3 | 200 | 9000 |
3 | 300 | 5000 |
4 | 100 | 20000 |
4 | 200 | 18000 |
Here is a PostgreSQL query:
This query uses the window function to calculate the running total of sales for each product. It partitions the data by product_id, and then order the data by month. It then sums up all the product's sales till the current row, inclusive. The is used to get the month part of the sale_date. The ORDER BY clause at the end is simply to order the output in a readable manner.
p.s. Window functions show up pretty frequently during SQL interviews, so practice the 27+ window function questions on DataLemur
Assume you had a table of Goldman Sachs employee salaries, along with which department they belonged to. Write a query to compare the average salary of employees in each department to the company's average salary. Return the comparison result as 'higher', 'lower', or 'same' for each department. Display the department ID, payment month (in MM-YYYY format), and the comparison.
Code your solution to this question interactively on DataLemur:
The solution is LONG – 30+ lines of SQL. You can find a detailed solution here: Department Salaries.
Make sure you study up on everything Goldman Sachs before the interview, read about the Goldman Sachs products and services!
Denormalization is the process of modifying a database schema in a way that deviates from the typical rules of normalization (1NF, 2NF, 3NF, etc.).
Denormalization is often used to improve the performance of a database, particularly when it is being used for reporting and analytical purposes (rather than in an Online Transaction Processing (OLTP) manager).
By duplicating data, denormalization can reduce the number of expensive joins required to retrieve data, which can improve query performance. However, denormalization can also cause problems such as increased data redundancy and the need for more complex update and delete operations.
Goldman Sachs is interested in analyzing data regarding client transactions in order to uncover insights for their investment strategies. Specifically, they want to track the total investment amount in each asset (stocks, bonds, derivatives), from each client per month.
They have a table named with columns , , , , and .
Design a database schema suitable for this problem and write a PostgreSQL query to provide a summary of the total invested amount in each asset type, by each client, per month.
transaction_id | client_id | transaction_date | asset_id | asset_type | amount_invested |
---|---|---|---|---|---|
1001 | 50 | 02/03/2021 | 22001 | Stocks | 20000 |
1002 | 50 | 02/10/2021 | 30005 | Bonds | 50000 |
1003 | 80 | 03/20/2021 | 22001 | Stocks | 60000 |
1004 | 50 | 04/11/2021 | 45001 | Derivatives | 30000 |
month | client_id | asset_type | total_invested |
---|---|---|---|
2 | 50 | Stocks | 20000 |
2 | 50 | Bonds | 50000 |
3 | 80 | Stocks | 60000 |
4 | 50 | Derivatives | 30000 |
This query groups the transaction data by month, client id, and asset type. It then uses the aggregate function SUM to calculate the total amount invested for each group.
In SQL, both a left and right join is used to combine rows from different tables based on a shared key or set of keys, resulting in a single merged table of data. However, here's the difference:
LEFT (OUTER) JOIN: Retrieves all the records/rows from the left and the matched records/rows from the right table.
RIGHT (OUTER) JOIN: Retrieves all the records/rows from the right and the matched records/rows from the left table.
As an analyst in Goldman Sachs, you are asked to filter the customer transactions database to find out the transactions that shaped the company’s fortunes in the last fiscal quarter.
The conditions to filter the transactions are:
Using the two tables and , write an SQL query that will filter these transactions.
customer_id | join_date |
---|---|
1 | 02/02/2021 |
2 | 12/10/2020 |
3 | 07/15/2021 |
4 | 11/25/2020 |
5 | 08/19/2021 |
transaction_id | customer_id | transaction_date | transaction_value | transaction_status |
---|---|---|---|---|
101 | 1 | 09/15/2021 | 15000 | Success |
102 | 2 | 09/10/2021 | 9999 | Success |
103 | 3 | 09/18/2021 | 20000 | Failed |
104 | 4 | 09/15/2021 | 25000 | Success |
105 | 5 | 09/20/2021 | 30000 | Success |
transaction_id | customer_id | transaction_date | transaction_value |
---|---|---|---|
101 | 1 | 09/15/2021 | 15000 |
105 | 5 | 09/20/2021 | 30000 |
In the PostgreSQL query above, we are joining the table with the table on the . Our WHERE clause filters the transactions by: those greater than $10,000, made by customers who joined after January 1, 2021, and are successful. The resulting table consists of transaction id, customer id, transaction date, and transaction value in accordance with the conditions specified.
The clause works similarly to the clause, but it is used to filter the groups of rows created by the clause rather than the rows of the table themselves.
For example, say you were analyzing Goldman Sachs sales data:
This query retrieves the total sales for all products in each region, and uses the clause to only sales made after January 1, 2023. The rows are then grouped by region and the clause filters the groups to include only those with total sales greater than $500k.
Regularly, Goldman Sachs hosts various online seminars for their prospective clients. The marketing team sends email invitations to a list of potential attendees. It wants to know the click-through rate (CTR) of these invitations. The CTR is calculated as the number of clicks on the invitation's link divided by the number of emails sent out, expressed as a percentage.
Consider two tables; which contains information about each email sent, and that holds data about each click on the invitation link.
email_id | recipient | email_date | seminar_id |
---|---|---|---|
5011 | a.john@mail.com | 06/08/2022 | 2001 |
5242 | b.smith@mail.com | 06/10/2022 | 2001 |
5933 | c.williams@mail.com | 06/18/2022 | 3002 |
6412 | d.brown@mail.com | 07/26/2022 | 3002 |
7890 | e.taylore@mail.com | 07/05/2022 | 4003 |
click_id | email_id |
---|---|
8011 | 5011 |
9822 | 5242 |
9633 | 6412 |
This query joins the and tables based on the email id. For each seminar, it counts the number of clicks and the number of emails sent. It then calculates the click-through rate by dividing the number of clicks by the number of emails sent. The result is the click-through rate for each seminar.
To solve another question about calculating rates, try this SQL interview question from TikTok within DataLemur's online SQL coding environment:
As a data analyst at Goldman Sachs, one important task could be to understand the trading behavior of different segments of clients. Certain segments may trade more heavily than others, and identifying these trends can provide valuable business insights.
Suppose you have access to a table called which records every trade executed by a client. Each trade has a , , , ('Retail', 'Corporate', 'Institutional') and (in USD). Using this data, write a SQL query to calculate the average amount of trades for each client segment for every month.
trade_id | client_id | trade_date | client_segment | trade_amount |
---|---|---|---|---|
55002 | 157 | 02/16/2022 | Retail | 10500 |
89052 | 815 | 02/28/2022 | Institutional | 30000 |
92024 | 621 | 02/10/2022 | Corporate | 20000 |
74581 | 328 | 03/01/2022 | Retail | 25000 |
55033 | 415 | 03/20/2022 | Institutional | 15000 |
This SQL query groups trades by both the client segment and month, while calculating the average trade amount for each group. The data is ordered by month in descending order and within each month, by the average trade amount in descending order. This allows us to see the more recent data first, and within each month, the client segments that trade the most on average.
{#Question-10}
The UNIQUE constraint is used to ensure the uniqueness of the data in a column or set of columns in a table. It prevents the insertion of duplicate values in the specified column or columns and helps to ensure the integrity and reliability of the data in the database.
For example, say you were on the Marketing Analytics team at Goldman Sachs and were doing some automated keyword research:
Your keyword database might store SEO data like this:
In this example, the UNIQUE constraint is applied to the "keyword" field to ensure that each keyword is unique. This helps to ensure the integrity of the data in the database and prevents errors that could occur if two rows had the same keyword.
The key to acing a Goldman Sachs SQL interview is to practice, practice, and then practice some more! Beyond just solving the above Goldman Sachs SQL interview questions, you should also solve the 200+ tricky sql questions on DataLemur which come from companies like Microsoft, Google, and Facebook.
Each interview question has hints to guide you, step-by-step solutions and crucially, there is an online SQL code editor so you can right in the browser run your SQL query answer and have it checked.
To prep for the Goldman Sachs SQL interview it is also wise to practice SQL questions from other stock exchange & brokerage companies like:
In case your SQL foundations are weak, forget about going right into solving questions – go learn SQL with this interactive SQL tutorial.
This tutorial covers topics including aggregate window functions and joining a table to itself – both of which show up routinely during Goldman Sachs SQL interviews.
In addition to SQL interview questions, the other types of questions to prepare for the Goldman Sachs Data Science Interview are:
To prepare for Goldman Sachs Data Science interviews read the book Ace the Data Science Interview because it's got: