At Getty Images, SQL is crucial for analyzing customer usage patterns and managing the metadata of millions of digital assets. That's why Getty Images asks SQL problems in interviews for Data Science, Data Engineering and Data Analytics jobs.
To help you practice for the Getty Images SQL interview, we've collected 9 Getty Images SQL interview questions – able to solve them?
Getty Images, a popular stock photo agency, wants to analyze their sales data. They've collected a detailed list of all image purchases made over a period of time. Given their sales records, write a SQL query to compute the average monthly sales for each image.
Here are the tables:
purchase_id | image_id | user_id | purchase_date | price |
---|---|---|---|---|
8731 | 3051 | 123 | 06/08/2022 00:00:00 | 50 |
7345 | 4200 | 265 | 06/10/2022 00:00:00 | 60 |
9453 | 3051 | 362 | 06/18/2022 00:00:00 | 50 |
3421 | 4200 | 192 | 07/26/2022 00:00:00 | 60 |
5129 | 4950 | 981 | 07/05/2022 00:00:00 | 70 |
month | image_id | avg_price |
---|---|---|
6 | 3051 | 50 |
6 | 4200 | 60 |
7 | 4200 | 60 |
7 | 4950 | 70 |
This SQL query uses the function with a window over both and the month extracted from to compute the average price per image per month. Extracting the month from the purchase date allows us to group sales by month. Using a window function lets us compute the average price independently for each image-month pair, giving us the average monthly sales for each image.
p.s. Window functions show up super often during SQL interviews, so practice the 27+ window function questions on DataLemur
As a data analyst for Getty Images, you are tasked to analyze customer behavior with a focus on the key performance indicator, the active subscription status. Specifically, you need to filter out the records of customers who are currently subscribed to Getty images' services and are from countries where the company has a strong market share such as the United States, Canada, and the United Kingdom.
Create a SQL query that can filter out the customer data based on these conditions using the 'customers' table.
customer_id | country | subscription_status | subscription_start_date | subscription_end_date |
---|---|---|---|---|
101 | United States | Active | 01/01/2022 00:00:00 | NULL |
102 | Canada | Active | 05/12/2021 00:00:00 | NULL |
103 | India | Inactive | 03/15/2021 00:00:00 | 11/29/2022 00:00:00 |
104 | United Kingdom | Active | 10/24/2021 00:00:00 | NULL |
105 | Germany | Inactive | 09/08/2021 00:00:00 | 03/09/2022 00:00:00 |
This SQL query filters the 'customers' table and selects all records of customers who are currently 'Active' members and are from either the 'United States', 'Canada', or 'United Kingdom'. The WHERE clause is used to specify the 'subscription_status' condition, and the AND operator is used to combine it with the 'country' condition. The IN keyword in SQL allows us to test multiple values in the WHERE clause, offering an efficient way to select records based on multiple conditions.
hile a correlated subquery relies on columns in the main query's FROM clause and cannot function independently, a non-correlated subquery operates as a standalone query and its results are integrated into the main query.
An example correlated sub-query:
This correlated subquery retrieves the names and salaries of Getty Images employees who make more than the average salary for their department. The subquery references the department column in the main query's FROM clause (e1.department) and uses it to filter the rows of the subquery's FROM clause (e2.department).
An example non-correlated sub-query:
This non-correlated subquery retrieves the names and salaries of Getty Images employees who make more than the average salary for the Data Science department (which honestly should be very few people since Data Scientists are awesome and deserve to be paid well).The subquery is considered independent of the main query can stand alone. Its output (the average salary for the Data Science department) is then used in the main query to filter the rows of the Getty Images employees table.
As an analyst at Getty Images, you are asked to find the average number of views by category for images. Getty Images is a database of images grouped by different categories (e.g., nature, people, sports, etc) and every image view is logged in the system, your task is to write a SQL query that calculates the average number of views for each category.
image_id | category_id | image_name |
---|---|---|
101 | 1 | Sunset at Sea |
102 | 2 | Happy People |
103 | 1 | Mountainscape |
104 | 3 | Soccer Game |
view_id | image_id | view_date |
---|---|---|
501 | 101 | 06/08/2022 00:00:00 |
502 | 102 | 06/08/2022 00:00:00 |
503 | 103 | 06/09/2022 00:00:00 |
504 | 104 | 06/09/2022 00:00:00 |
505 | 101 | 06/09/2022 00:00:00 |
category | avg_views |
---|---|
1 | 1.5 |
2 | 1 |
3 | 1 |
This SQL query first calculates the total number of views per image in the subquery . It then calculates the average number of views by category by joining the subquery with the table and grouping by .
To practice a very similar question try this interactive Amazon Average Review Ratings Question which is similar for or this New York Times Laptop vs. Mobile Viewership Question which is similar for .
A join in SQL combines rows from two or more tables based on a shared column or set of columns.
Four types of JOINs exist in SQL. To demonstrate each one, say you had a table of Getty Images orders and Getty Images customers.
INNER JOIN: When there is a match in the shared key or keys, rows from both tables are retrieved. In this example, an between the Orders and Customers tables would retrieve rows where the in the Orders table matches the in the Customers table.
LEFT JOIN: A retrieves all rows from the left table (in this case, the Orders table) and any matching rows from the right table (the Customers table). If there is no match in the right table, NULL values will be returned for the right table's columns.
RIGHT JOIN: A retrieves all rows from the right table (in this case, the Customers table) and any matching rows from the left table (the Orders table). If there is no match in the left table, NULL values will be returned for the left table's columns.
FULL OUTER JOIN: A retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, NULL values will be returned for the columns of the non-matching table.
Consider the scenario where you are an analyst at Getty Images handling digital marketing. To assess the effectiveness of ads and the convenience of the website's user interface, you decide to evaluate the Click-Through Rate (CTR) of customers viewing an image to adding that image to the cart.
You have two tables. One table, , documents each ad click. It includes the , the which identifies which user clicked the ad, the when the ad was clicked, and the indicating which image was advertised in the ad.
Another table, , logs every add-to-cart event. It includes the , the which identifies which user added an image to their cart, the when the add-to-cart event happened, and the indicating which image was added to the cart.
Example Input:
event_id | user_id | click_timestamp | image_id |
---|---|---|---|
1001 | 575 | 06/08/2022 11:32:23 | 34560 |
1002 | 981 | 06/10/2022 15:42:12 | 23890 |
1003 | 662 | 06/18/2022 10:37:08 | 34560 |
1004 | 325 | 07/26/2022 09:15:42 | 89145 |
1005 | 981 | 07/05/2022 13:20:10 | 23890 |
Example Input:
event_id | user_id | add_timestamp | image_id |
---|---|---|---|
2001 | 575 | 06/14/2022 08:42:27 | 34560 |
2002 | 391 | 06/16/2022 18:20:00 | 23890 |
2003 | 662 | 06/20/2022 12:37:48 | 34560 |
2004 | 325 | 07/27/2022 17:15:36 | 89145 |
2005 | 981 | 07/06/2022 14:40:05 | 23890 |
Calculate the Click-Through Rate (CTR) for each image. We will assume a simple model where a click proceeded an add-to-cart event if the click happened at any previous time by the same user on the same image.
This query first summarizes the total clicks and add-to-cart events for each image. The clause then combines these two summaries together. The result is then used to calculate the CTR for each image by dividing the number of add-to-cart events by the total number of clicks. Note that we are counting only add-to-cart events that happen after a click event by the same user on the same image.
To practice a related SQL interview question on DataLemur's free interactive SQL code editor, solve this SQL interview question asked by Facebook:
A database view is a virtual table that is created based on the results of a SELECT statement, and provides you a customized, read-only version of your data that you can query just like a regular table.
Views in SQL can help you enforce data security requirements by hiding sensitive data from certain users, and can improve performance for some queries by pre-computing the results for an intermediate step and storing them in a view (which can be faster than executing the intermediate query each time). However, their read-only nature means that on any underlying update, the view has to be re-computed.
Given two tables, and , write a SQL query to determine the total spending of each customer on Getty Images products. Consider that each purchase made by a customer is recorded in the table and each customer is uniquely identified by a in the table. Also, assume that every purchase has a price.
customer_id | first_name | last_name | |
---|---|---|---|
101 | John | Doe | johndoe@gmail.com |
102 | Jane | Doe | janedoe@gmail.com |
103 | Bob | Smith | bobsmith@gmail.com |
purchase_id | customer_id | product_id | price |
---|---|---|---|
501 | 101 | 201 | 10.99 |
502 | 101 | 202 | 15.99 |
503 | 102 | 203 | 20.99 |
504 | 103 | 204 | 25.99 |
505 | 103 | 205 | 30.99 |
This query is joining the and tables together on to link each customer to their purchase information. It then groups the result by and sums up the price of each group (i.e., each customer) to yield the total spending of each customer.
Since joins come up so often during SQL interviews, try this SQL join question from Spotify:
Getty Images would like to assign a base score to each image, then adjust the score based on the number of downloads and views an image has. The final score for an image is calculated as follows:
Final Score = Base Score + SQRT(No. of Downloads) + log(No. of Views)
Write a SQL query that calculates the final score for each image in the database.
image_id | base_score | num_downloads | num_views |
---|---|---|---|
1001 | 20 | 100 | 5000 |
1002 | 25 | 200 | 10000 |
1003 | 30 | 300 | 15000 |
1004 | 35 | 400 | 20000 |
1005 | 40 | 500 | 25000 |
image_id | final_score |
---|---|
1001 | 45.76 |
1002 | 51.30 |
1003 | 57.48 |
1004 | 64.32 |
1005 | 71.81 |
In this query, SQRT(num_downloads) and LOG(num_views) utilize math functions to calculate the scaled rating influenced by the number of downloads and views respectively. The base_score is added to these values to calculate the final_score for every image. The ROUND function is used to round off the final_score to two decimal places for better readability. The absolute function could be useful if there were negative values in base_score or the computed SQRT and LOG values, but in this specific example it isn't required.
The key to acing a Getty Images SQL interview is to practice, practice, and then practice some more! In addition to solving the earlier Getty Images SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Netflix, Airbnb, and Amazon.
Each interview question has hints to guide you, detailed solutions and most importantly, there is an online SQL code editor so you can right in the browser run your query and have it graded.
To prep for the Getty Images SQL interview it is also helpful to solve SQL questions from other tech companies like:
But if your SQL query skills are weak, don't worry about going right into solving questions – strengthen your SQL foundations with this free SQL tutorial.
This tutorial covers SQL topics like GROUP BY and UNION – both of which come up routinely in SQL job interviews at Getty Images.
Besides SQL interview questions, the other types of problems covered in the Getty Images Data Science Interview are:
To prepare for Getty Images Data Science interviews read the book Ace the Data Science Interview because it's got: