At Gen Digital (formerly Symantec), SQL is used across the company for analyzing customer behavior patterns and for analyzing cybersecurity data. Because of this, Gen Digital asks SQL problems during interviews for Data Science, Analytics, and & Data Engineering jobs.
As such, to help you practice for the Gen Digital SQL interview, we've curated 11 Gen Digital SQL interview questions – able to answer them all?
Gen Digital is an online marketplace for various digital products. Provided below are two tables: and . The table contains records of each registered user in Gen Digital with their respective user IDs and registration dates. The table holds records of all transactions made including user IDs, product IDs, and the dates when each transaction was made.
A power user in Gen Digital is defined as a user who makes at least 10 transactions per month. Write a SQL query to identify these power users for each month.
user_id | registration_date |
---|---|
1001 | 01/01/2022 00:00:00 |
1002 | 02/02/2022 00:00:00 |
1003 | 03/03/2022 00:00:00 |
1004 | 04/04/2022 00:00:00 |
1005 | 05/05/2022 00:00:00 |
transaction_id | user_id | transaction_date | product_id |
---|---|---|---|
2001 | 1001 | 06/05/2022 00:00:00 | 3001 |
2002 | 1002 | 06/05/2022 00:00:00 | 3002 |
2003 | 1002 | 06/06/2022 00:00:00 | 3003 |
2004 | 1002 | 06/07/2022 00:00:00 | 3004 |
2005 | 1002 | 06/08/2022 00:00:00 | 3005 |
... | ... | ... | ... |
Note: Assume there are at least 20 rows for each user per month in the transactions table.
In this query, we first use the function to extract the month from the transaction_date field. Next, we group the transactions by user per month and use the clause to filter the groups with transaction counts of 10 or more. Lastly, we sort the results by month and transaction count (in descending order) to neatly present our power users.
To practice a similar customer analytics SQL question where you can solve it right in the browser and have your SQL code instantly graded, try this Walmart Labs SQL Interview Question:
Gen Digital has a 'reviews' table where users' reviews for various digital products are stored. Each row in the table represents a single review by a user on a product. The reviews are graded in stars from 1 (worst) to 5 (best).
Your task is to create a SQL query that calculates the average review star rating per product per month.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
1 | 123 | 01/11/2022 | A | 3 |
2 | 265 | 01/12/2022 | A | 4 |
3 | 362 | 01/13/2022 | B | 3 |
4 | 192 | 02/10/2022 | A | 2 |
5 | 981 | 02/14/2022 | B | 4 |
6 | 653 | 03/01/2022 | C | 5 |
7 | 472 | 03/02/2022 | C | 4 |
month | product | avg_rating |
---|---|---|
1 | A | 3.5 |
1 | B | 3 |
2 | A | 2 |
2 | B | 4 |
3 | C | 4.5 |
Here is the PostgreSQL query:
This query uses the PostgreSQL built-in function to get the month component from the column. It then groups by the obtained month together with to get the average star rating for each product per month. The result is ordered by month and in descending order of to rank the products by average stars within each month.
p.s. Window functions show up super frequently during SQL interviews, so practice the 27+ window function questions on DataLemur
Database views are created to provide customized, read-only versions of your data that you can query just like a regular table. So why even use one if they're just like a regular table?
Views are useful for creating a simplified version of your data for specific users, or for hiding sensitive data from certain users while still allowing them to access other data.
Gen Digital is an online electronics retailer. Currently, Gen Digital sells three types of products: computers, mobile devices and accessories.
Gen Digital wants to track the monthly sales and the remaining inventory for each product type. Using the and tables, write a SQL query that returns a report with the month, the product type, the total sales amount, and the inventory value at the end of that month.
Table definitions and sample data are as follows:
sale_id | product_type | sale_date | sale_amount |
---|---|---|---|
1345 | computers | 06/05/2022 | 1200.00 |
1896 | mobile device | 06/15/2022 | 800.00 |
2357 | accessories | 06/22/2022 | 50.00 |
2783 | computers | 07/10/2022 | 1300.00 |
3246 | mobile device | 07/25/2022 | 850.00 |
inventory_id | product_type | inventory_date | inventory_value |
---|---|---|---|
2013 | computers | 06/30/2022 | 5000.00 |
2564 | mobile device | 06/30/2022 | 4000.00 |
3089 | accessories | 06/30/2022 | 500.00 |
3598 | computers | 07/31/2022 | 4800.00 |
4117 | mobile device | 07/31/2022 | 3900.00 |
The query first calculates the monthly sales for each product type using the table. Then, it calculates the inventory value at the end of the month for each product type using the table. These two intermediate results are combined in the final result, presenting the month, the product type, the total sales amount, and the inventory value at the end of that month.
A database index is a data structure that improves the speed of data retrieval operations on a database table.
There are few different types of indexes that can be used in a database:
For a concrete example, say you had a table of Gen Digital customer payments with the following columns:
Here's what a clustered index on the column would look like:
A clustered index on the column would determine the physical order of the records in the table based on the . This means that the records with the earliest values would be stored together physically in the table, followed by records with later payment_date values. This speeds up queries that filter or sort the data based on the , as the records are already physically sorted in the table. For example, if you want to retrieve all the payments made in the month of June, the database can use the clustered index to quickly locate and retrieve the desired records without having to search through the entire table.
As a data analyst at Gen Digital, you are tasked to analyze the user behavior on the website. Gen Digital is interested to understand how much time each user typically spends on the website.
Example Input:
session_id | user_id | start_time | end_time |
---|---|---|---|
1 | 1001 | 2022-07-01 08:00:00 | 2022-07-01 08:30:00 |
2 | 1001 | 2022-07-01 09:00:00 | 2022-07-01 09:20:00 |
3 | 1002 | 2022-07-01 10:00:00 | 2022-07-01 10:30:00 |
4 | 1002 | 2022-07-01 11:00:00 | 2022-07-01 11:40:00 |
The question would be to calculate the average time spent per user on the website in hours.
user_id | avg_time_spent |
---|---|
1001 | 0.333 |
1002 | 0.5 |
This SQL query works by first calculating the time spent in each session for each user (end_time - start_time) in (seconds), which is then converted to hours by dividing by 3600. The function is then used to calculate the average time spent per user.
NOTE: This question assumes the and fields are timestamp data types.
To practice a very similar question try this interactive Snapchat Sending vs. Opening Snaps Question which is similar for analyzing time spent on activities or this Amazon Server Utilization Time Question which is similar for calculating total time from time inputs.
"One creative way is to use the window function ROW_NUMBER() and partition by whichver columns you are loooking for duplicates in. If any row has a row_number() more than 1, then it's a duplicate!
You could also use COUNT(DISTINCT col1) vs. COUNT(*) and see if they return different numbers, indicating the prescence of duplicates!
Gen Digital is a company that sells digital products. We have two tables: which provides a detailed description of each product, and which provides information about each sale, including the , and . The goal of the query is to find the average monthly sales per product.
product_id | product_name | product_type |
---|---|---|
1 | Digital Book: SQL for Beginners | Book |
2 | Online Course: Mastering PostegresSQL | Course |
3 | Digital Book: Advanced SQL Techniques | Book |
sale_id | product_id | sale_date | unit_price |
---|---|---|---|
1001 | 1 | 2022-02-10 | 49.99 |
1002 | 2 | 2022-02-15 | 199.99 |
1003 | 1 | 2022-03-02 | 49.99 |
1004 | 3 | 2022-03-20 | 59.99 |
1005 | 2 | 2022-03-25 | 199.99 |
1006 | 1 | 2022-03-30 | 49.99 |
The in the table is a timestamp in 'YYYY-MM-DD' format.
This PostgreSQL query joins the and tables on the field. It then groups the rows by both the month of the and , and calculates the average for each group. The TO_CHAR function is used to convert the sale date to a string in 'YYYY-MM' format. The result is ordered by the month and the average monthly sale in descending order.
Gen Digital would like to know which products are generating the most revenue. To determine this, they would like a SQL query that returns the total revenue generated by each product. To get this result, join the Customers table and the Purchases table.
Remember, the revenue generated by a product is the product price times the quantity sold.
Consider the tables below:
customer_id | name | |
---|---|---|
101 | Alice | alice@example.com |
102 | Bob | bob@example.com |
103 | Charlie | charlie@example.com |
104 | David | david@example.com |
105 | Erin | erin@example.com |
purchase_id | customer_id | product_id | quantity | price |
---|---|---|---|---|
1 | 101 | 201 | 2 | 15.00 |
2 | 104 | 202 | 3 | 30.00 |
3 | 103 | 203 | 1 | 45.00 |
4 | 105 | 202 | 1 | 30.00 |
5 | 102 | 201 | 5 | 15.00 |
6 | 101 | 203 | 2 | 45.00 |
This query joins the Customers table and the Purchases table using the common column . The function calculates the total revenue generated by each product. The clause groups the result by . The clause sorts the result in descending order of , showing the product that generates the highest revenue first.
Because joins come up so often during SQL interviews, try an interactive Spotify JOIN SQL question:
When using , only rows that are identical in both sets will be returned.
For a concrete example, say you were a Data Analyst supporting the Sales Analytics team at Gen Digital, and data on potential sales leads lived in both Salesforce and Hubspot CRMs. To write a query to analyze leads created before 2023 started, that show up in BOTH CRMs, you would use the command:
Using the following table of circles with their radii, calculate the area of each circle. Round the area to the nearest whole number. Additionally, display the power of the rounded area and the radius, and find the remainder when this result is divided by 7.
The area of a circle can be calculated using the formula:
Area = π*(radius)^2
You can use Pi=3.14 for your computations.
circle_id | radius |
---|---|
1 | 4 |
2 | 7 |
3 | 6 |
4 | 9 |
5 | 2 |
This query calculates the area of each circle using the given formula and rounds it up. Then it takes the power of the area and the radius, gets the modulo 7 of this power (using function), and displays all these results along with the .
The most similar questions to your SQL question about calculating the area of circles are:
To practice a very similar question try this interactive Amazon Maximize Prime Item Inventory Question which is similar for its need for calculation of area and division computations or this Alibaba Compressed Mean Question which is similar for its use of mathematical computations and rounding numbers.
The best way to prepare for a Gen Digital SQL interview is to practice, practice, practice. Beyond just solving the earlier Gen Digital SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions which come from companies like Netflix, Airbnb, and Amazon.
Each DataLemur SQL question has hints to guide you, detailed solutions and best of all, there's an interactive SQL code editor so you can easily right in the browser your SQL query answer and have it executed.
To prep for the Gen Digital SQL interview it is also helpful to practice SQL problems from other tech companies like:
In case your SQL skills are weak, don't worry about going right into solving questions – improve your SQL foundations with this SQL interview tutorial.
This tutorial covers things like filtering data with WHERE and removing NULLs – both of these show up often in Gen Digital SQL interviews.
In addition to SQL query questions, the other topics to prepare for the Gen Digital Data Science Interview are:
To prepare for Gen Digital Data Science interviews read the book Ace the Data Science Interview because it's got: