At Sony, SQL is used often for analyzing customer behavior data for gaming patterns and for managing and manipulating databases of films and music. Because of this, Sony frequently asks SQL coding questions during interviews for Data Science and Data Engineering positions.
In case you're trying to prepare for the SQL Interview, here’s 11 Sony SQL interview questions to practice, which are similar to recently asked questions at Sony – can you solve them?
Sony is interested in identifying its most active users, also referred to as "power users", "VIP users", or "whale users". The company defines these users as those who purchase the most frequently. Write an SQL query to identify the top 10 users with the highest number of purchases in the last one year.
Based on the tables users
and purchases
.
users
Example Input:user_id | username | join_date |
---|---|---|
125 | user1 | 2019-01-01 |
452 | user2 | 2018-12-15 |
763 | user3 | 2020-02-20 |
984 | user4 | 2019-11-30 |
235 | user5 | 2018-06-18 |
purchases
Example Input:purchase_id | user_id | purchase_date | product_id | amount |
---|---|---|---|---|
1001 | 125 | 2021-08-17 | 105 | 2999 |
1002 | 452 | 2021-08-17 | 106 | 2999 |
1003 | 763 | 2021-08-18 | 107 | 4999 |
1004 | 984 | 2022-08-19 | 108 | 3999 |
1005 | 235 | 2021-08-19 | 109 | 1999 |
SELECT u.username, COUNT(p.purchase_id) as purchase_frequency FROM users u JOIN purchases p ON u.user_id = p.user_id WHERE p.purchase_date TO_CHAR(p.purchase_date, 'YYYY') = '2022' GROUP BY u.username ORDER BY purchase_frequency DESC LIMIT 10;
In above query, we first join users
table with the purchases
table on user_id
. We filter the data for the year 2022 using the WHERE
clause. Then, we group the result by username and count the number of purchases for each user. Finally, we order the result in descending order by purchase frequency and limit the output to top 10 users.
To practice a related customer analytics question on DataLemur's free interactive coding environment, try this Microsoft SQL Interview problem:
As a data analyst at Sony, you are tasked with analyzing the wide array of products they sell. Write a single SQL query that returns the product by product ID and month with its average stars rating.
reviews
Example Input:review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 06/08/2022 00:00:00 | 50001 | 4 |
7802 | 265 | 06/10/2022 00:00:00 | 69852 | 4 |
5293 | 362 | 06/18/2022 00:00:00 | 50001 | 3 |
6352 | 192 | 07/26/2022 00:00:00 | 69852 | 3 |
4517 | 981 | 07/05/2022 00:00:00 | 69852 | 2 |
SELECT EXTRACT(MONTH FROM submit_date) as mth, product_id as product, AVG(stars) as avg_stars FROM reviews GROUP BY mth, product ORDER BY mth ASC, avg_stars DESC;
This query groups the review data by month and product_id, and uses the AVG() function to compute the average rating per product per month. The results are then ordered in ascending order of the month, and in descending order of the average rating. Thus, you can easily see which products perform best in terms of user ratings each month.
To solve another window function question on DataLemur's free online SQL coding environment, solve this Google SQL Interview Question:
A self-join is a type of JOIN where a table is joined to itself. To execute a self-join, you must include the table name twice in the FROM clause and assign a different alias to each instance. You can then join the two copies of the table using a JOIN clause, and use a WHERE clause to specify the relationship between the rows.
For example, say you had website visitor data for Sony, exported from the company's Google Analytics account. In support of the web-dev team, you had to analyze pairs of pages for UX or navigational issues. As part of that analysis, you wanted to wanted to generate all pairs of URLs, but needed to avoid pairs where both the URLs were the same since that's not a valid pair.
The self-join query would like the following:
SELECT page1.url AS page_url, page2.url AS referred_from FROM google_analytics AS page1 JOIN google_analytics AS page2 ON page1.referrer_id = page2.id WHERE page1.id <> page2.id;
This query returns the url of each page (page1.url
) along with the url of the page that referred to it (page2.url
). The self-join is performed using the referrer_id
field, which specifies the id of the page that referred the visitor to the current page, and avoids any pages that referred themself (aka data anomalies).
Sony, a major electronics company, wants to understand its customer base better for targeted advertising. They are interested in finding all customers who have purchased at least 2 different kinds of electronics (TV, Playstation, etc.) between January 2021 and December 2021, and are located in California.
Given a purchases
table consisting of rows representing the purchase records of customers, and a customers
table providing the customer's location, can you write a SQL query to obtain the list of these customers?
Please take into account the following tables structure for the purchases
and customers
tables:
purchases
Example Input:purchase_id | customer_id | product_type | purchase_date |
---|---|---|---|
1 | 64321 | 'TV' | '2021-07-01' |
2 | 64322 | 'TV' | '2021-05-12' |
3 | 64321 | 'Playstation' | '2021-06-25' |
4 | 64323 | 'Radio' | '2021-02-20' |
5 | 64324 | 'TV' | '2021-09-15' |
6 | 64324 | 'Playstation' | '2021-08-27' |
customers
Example Input:customer_id | location |
---|---|
64321 | 'California' |
64322 | 'New York' |
64323 | 'California' |
64324 | 'California' |
SELECT DISTINCT p.customer_id FROM purchases p JOIN customers c ON p.customer_id = c.customer_id WHERE p.product_type IN ('Playstation', 'TV') AND p.purchase_date BETWEEN '2021-01-01' AND '2021-12-31' AND c.location = 'California' GROUP BY p.customer_id HAVING COUNT(DISTINCT p.product_type) >= 2;
The SQL query first joins the purchases
table to the customers
table on the customer_id
field. It then filters out records where the product_type is either 'Playstation' or 'TV', the purchase_date is between '2021-01-01' and '2021-12-31', and the customer location is 'California'. The GROUP BY
statement groups records by customer_id
, and the HAVING
clause filters out groups having less than 2 unique product_type
, representing customers who have purchased at least 2 different types of products.
Constraints are used to specify the rules concerning data in the table. It can be applied for single or multiple fields in an SQL table during the creation of the table or after creating using the ALTER TABLE command. The constraints are:
Say you were storing sales analytyics data from Sony's CRM inside a database. Here's some example constraints you could use:
PRIMARY KEY constraint: You might use a PRIMARY KEY constraint to ensure that each record in the database has a unique identifier. For example, you could use the "opportunity_id" field as the primary key in the "opportunities" table.
FOREIGN KEY constraint: You might use a FOREIGN KEY constraint to link the data in one table to the data in another table. For example, you could use a foreign key field in the "opportunities" table to reference the "account_id" field in the "accounts" table.
NOT NULL constraint: You might use a NOT NULL constraint to ensure that a field cannot contain a NULL value. For example, you could use a NOT NULL constraint on the "opportunity_name" field in the "opportunities" table to ensure that each opportunity has a name.
UNIQUE constraint: You might use a UNIQUE constraint to ensure that the data in a field is unique across the entire table. For example, you could use a UNIQUE constraint on the "email" field in the "contacts" table to ensure that each contact has a unique email address.
CHECK constraint: You might use a CHECK constraint to ensure that the data in a field meets certain conditions. For example, you could use a CHECK constraint to ensure that the "deal_probability" field in the "opportunities" table is a value between 0 and 100.
DEFAULT constraint: You might use a DEFAULT constraint to specify a default value for a field. For example, you could use a DEFAULT constraint on the "stage" field in the "opportunities" table to set the default value to "prospecting"
As a data analyst at Sony, one of your tasks is to help the sales and marketing team understand the average selling price of each category of Sony products. Using the Sony product sales dataset, can you write a SQL query that calculates the average selling price of each product category?
sony_sales
Example Input:sale_id | sale_date | product_id | product_category | sale_price |
---|---|---|---|---|
101 | 2021/01/01 | 001 | TVs | 500 |
102 | 2021/01/02 | 002 | Cameras | 200 |
103 | 2021/01/05 | 003 | Phones | 800 |
104 | 2021/01/07 | 004 | TVs | 600 |
105 | 2021/01/11 | 005 | Cameras | 300 |
product_category | avg_sale_price |
---|---|
TVs | 550.00 |
Cameras | 250.00 |
Phones | 800.00 |
In PostgreSQL, you would use the AVG function to calculate the average selling price:
SELECT product_category, AVG(sale_price) AS avg_sale_price FROM sony_sales GROUP BY product_category;
The query initializes by selecting the product_category
and sale_price
columns from the sony_sales
table. It then calculates the average sale_price
for each product_category
using the AVG function. Lastly, the query groups the data by product_category
, so the output reveals the average selling price for each distinct category of Sony products.
To practice a very similar question try this interactive Amazon Highest-Grossing Items Question which is similar for calculating category-wise metrics or this Wayfair Y-on-Y Growth Rate Question which is similar for analysis of sales data.
A primary key is a special type of column or set of columns in a database table that is used to identify each row in the table. It helps to ensure that all the rows are distinct and prevents null values. Additionally, primary keys play a crucial role in maintaining the accuracy of the data by making sure that there are no duplicate rows. They are also used to establish relationships between tables.
To create a primary key in a SQL database, you can use the PRIMARY KEY
constraint. For example, in the table sony_employees
, the EmployeeID
column is the primary key.
CREATE TABLE sony_employees ( EmployeeID INTEGER PRIMARY KEY, LastName VARCHAR(255), FirstName VARCHAR(255), Age INTEGER, Salary DECIMAL(8,2) );
Sony, a multi-national company that sells a huge range of electronic products, is running a digital marketing campaign. They want to understand the click-through conversion rate of their digital ads that drive users from viewing a product to adding a product to the cart.
You are given two tables 'Clicks' and 'Carts', which show those users who clicked to view a product and those users who then added the product to their cart.
Clicks
Example Input:click_id | user_id | click_time | product_id |
---|---|---|---|
101 | 678 | 06/08/2022 11:00:00 | 90301 |
102 | 356 | 06/10/2022 14:00:00 | 90306 |
103 | 789 | 06/18/2022 16:00:00 | 90301 |
104 | 678 | 07/26/2022 15:00:00 | 90306 |
Carts
Example Input:cart_id | user_id | cart_time | product_id |
---|---|---|---|
201 | 356 | 06/10/2022 14:05:00 | 90306 |
202 | 789 | 06/18/2022 16:10:00 | 90301 |
SELECT p.product_id, COUNT(DISTINCT c.user_id) AS click_users, COUNT(DISTINCT ca.user_id) AS cart_users, ROUND((100.0 * COUNT(DISTINCT ca.user_id)) / COUNT(DISTINCT c.user_id), 2) AS conversion_rate FROM (SELECT click_id, user_id, product_id FROM Clicks WHERE click_time BETWEEN '2022-06-01' AND '2022-06-30') c LEFT JOIN (SELECT cart_id, user_id, product_id FROM Carts WHERE cart_time BETWEEN '2022-06-01' AND '2022-06-30') ca ON c.user_id = ca.user_id AND c.product_id = ca.product_id GROUP BY p.product_id;
This SQL block helps Sony calculate the conversion rate of their digital ads for the month of June 2022. It considers only these users who clicked to view a product and those users who then added the product to their shopping carts within June. The conversion rate is then calculated as the percentage of users who, after clicking on a product, added it to their cart.
To solve a similar problem about calculating rates, solve this SQL interview question from TikTok on DataLemur's online SQL coding environment:
Consider a case where Sony wants to know the average total sales achieved for each of their products on a monthly basis. You are given a table named sales
with the following schema:
sale_id
(integer): the id of the sale.product_id
(integer): the id of the product.sale_date
(timestamp): the date of sale.quantity
(integer): the number of product units sold.price_per_unit
(decimal): the price per unit of the product.Given the above, write a SQL query that will return each product's id and its average total sales per month. The total sale for each sale_id is calculated as quantity
*price_per_unit
.
sales
Example Input:sale_id | product_id | sale_date | quantity | price_per_unit |
---|---|---|---|---|
1214 | 321 | 06/08/2022 00:00:00 | 10 | 100 |
9321 | 627 | 06/12/2022 00:00:00 | 5 | 200 |
7891 | 321 | 06/22/2022 00:00:00 | 12 | 100 |
6782 | 981 | 07/02/2022 00:00:00 | 6 | 150 |
1734 | 627 | 07/25/2022 00:00:00 | 8 | 200 |
month | product_id | avg_sales |
---|---|---|
6 | 321 | 1100.00 |
6 | 627 | 1000.00 |
7 | 981 | 900.00 |
7 | 627 | 1600.00 |
SELECT EXTRACT(MONTH FROM sale_date) AS month, product_id, AVG(quantity*price_per_unit) as avg_sales FROM sales GROUP BY month, product_id;
What does this query do?
This query first extracts the month from the sale_date
and then groups the sales by this month and product_id
. For each created group, it calculates the average of total sales for each product. The total sale for each sale entry is calculated as the product of quantity
and price_per_unit
. The AVG
function then calculates the average total sale of each product per month.
A correlated sub-query is one that relies on the outer query and cannot be run independently. It uses the outer query to filter or transform data by referencing a column from it, while the outer query uses the results of the inner query. On the other hand, a non-correlated sub-query is not linked to the outer query and can be run on its own. It does not reference any columns from the outer query and is used to retrieve additional data for the outer query.
Here is an example of a correlated sub-query:
SELECT t1.customer_id, t1.total_sales FROM sony_sales t1 WHERE t1.total_sales > ( SELECT AVG(t2.total_sales) FROM sony_sales t2 WHERE t2.customer_id = t1.customer_id );
This query selects the customer_id
and total sales of all Sony customers in the sales table whose total_sales
are greater than the average total_sales
of their own customer group. The sub-query in this case is correlated with the outer query, as it references the customer_id
column from the outer query (t1.customer_id
).
Here is an example of a non-correlated sub-query:
SELECT t1.customer_id, t1.total_sales FROM sony_sales t1 WHERE t1.total_sales > ( SELECT AVG(t2.total_sales) FROM sony_sales t2 );
The sub-query in this case is non-correlated, as it does not reference any columns from the outer query.
Correlated sub-queries are slower to execute, as they have to be re-evaluated for each row of the outer query, while non-correlated sub-queries are faster, as they only have to be executed once.
As a Sony data analyst, you are responsible for checking Sony product reviews and identifying top-rated products. You notice the star ratings are biased by the quantity of reviews a product receives. To solve this issue, you decide to implement a rating calculation method based on the IMDb Weighted Rating formula. Applying this formula requires using SQL math functions and arithmetic operators.
The IMDb weighted-rating formula is:
weighted rating (WR) = (v/(v+m))*R+(m/(v+m))*C
Here,
Using the below reviews
dataset, find the top three Sony products with the highest weighted ratings for the month of June 2022:
reviews
Example Input:review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 06/08/2022 00:00:00 | 50001 | 4 |
7802 | 265 | 06/10/2022 00:00:00 | 69852 | 4 |
5293 | 362 | 06/18/2022 00:00:00 | 50001 | 3 |
6352 | 192 | 07/26/2022 00:00:00 | 69852 | 3 |
4517 | 981 | 07/05/2022 00:00:00 | 69852 | 2 |
Consider the following PostgreSQL query:
WITH avg_reviews AS ( SELECT product_id, COUNT(review_id) as num_reviews, AVG(stars) as avg_rating, DATE_PART('month', submit_date) as month FROM reviews WHERE DATE_PART('month', submit_date) = 6 AND DATE_PART('year', submit_date) = 2022 GROUP BY product_id, month ), overall_avg AS ( SELECT AVG(stars) as total_avg_rating FROM reviews WHERE DATE_PART('month', submit_date) = 6 AND DATE_PART('year', submit_date) = 2022 ) SELECT product_id, (num_reviews / (num_reviews + 50)) * avg_rating + (50 / (num_reviews + 50)) * total_avg_rating as weighted_rating, month FROM avg_reviews, overall_avg ORDER BY weighted_rating DESC LIMIT 3;
The query calculates the weighted rating for each product for the month of June 2022. The WITH
clause defines two subqueries: avg_reviews
, which calculates the average rating and count of reviews for each product, and overall_avg
, which calculates the overall average rating for all products. The final SELECT
query calculates the weighted rating and orders the products by this rating, outputting the top three products.
To practice a very similar question try this interactive Amazon Average Review Ratings Question which is similar for calculating average rating or this Amazon Highest-Grossing Items Question which is similar for identifying top products.
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. In addition to solving the earlier Sony SQL interview questions, you should also solve the 200+ SQL Interview Questions on DataLemur which come from companies like FAANG and tech startups.
Each SQL question has multiple hints, full answers and crucially, there's an online SQL coding environment so you can instantly run your SQL query answer and have it checked.
To prep for the Sony SQL interview it is also a great idea to solve SQL problems from other tech companies like:
In case your SQL skills are weak, don't worry about going right into solving questions – improve your SQL foundations with this interactive SQL tutorial.
This tutorial covers things like aggregate window functions and GROUP BY – both of these come up frequently in Sony SQL interviews.
For the Sony Data Science Interview, beyond writing SQL queries, the other types of questions which are covered:
The best way to prepare for Sony Data Science interviews is by reading Ace the Data Science Interview. The book's got: