At Chegg, SQL used for analyzing user behavior insights to improve services and managing the vast educational content database. Because of this, Chegg LOVES to ask SQL questions during interviews for Data Science and Data Engineering positions.
Thus, to help you study, here's 8 Chegg SQL interview questions – can you answer each one?
Chegg is interested in tracking how well their educational products are faring on a monthly basis. They're interested in looking at the performance of each product based on the average rating that it received in each month.
You have been given a "reviews" table with the following columns:
Write a SQL query to find the monthly average rating for each product. Round any averages to two digits after the decimal point.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-06-08 | 50001 | 4 |
7802 | 265 | 2022-06-10 | 69852 | 4 |
5293 | 362 | 2022-06-18 | 50001 | 3 |
6352 | 192 | 2022-07-26 | 69852 | 3 |
4517 | 981 | 2022-07-05 | 69852 | 2 |
month | product | avg_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 69852 | 2.50 |
The SQL query uses the function to get the month from the and function to find the average stars given by the users. The function is used to round off the average value to two decimal places. The result is then grouped by and .
To practice a similar window function question on DataLemur's free online SQL code editor, try this Google SQL Interview Question:
Assume you had a table of Chegg employee salaries. Write a SQL query to find the employees who earn more than their own manager.
employee_id | name | salary | department_id | manager_id |
---|---|---|---|---|
1 | Emma Thompson | 3800 | 1 | |
2 | Daniel Rodriguez | 2230 | 1 | 10 |
3 | Olivia Smith | 8000 | 1 | 8 |
4 | Noah Johnson | 6800 | 2 | 8 |
5 | Sophia Martinez | 1750 | 1 | 10 |
8 | William Davis | 7000 | 2 | NULL |
10 | James Anderson | 4000 | 1 | NULL |
employee_id | employee_name |
---|---|
3 | Olivia Smith |
This is the output because Olivia Smith earns 8,000, surpassing her manager, William Davis who earns $7,800.
Code your solution to this question and run your code right in DataLemur's online SQL environment:
First, we perform a SELF-JOIN where we treat the first table () as the managers' table and the second table () as the employees' table. Then we use a clause to filter the results, ensuring we only get employees whose salaries are higher than their manager's salary.
If the solution above is confusing, you can find a detailed solution with hints here: Employees Earning More Than Their Boss.
A cross-join, also known as a cartesian join, is like a mad scientist's laboratory experiment gone wild. It takes two tables and mixes them together to create a crazy new table with every possible combination of rows from the original tables.
Here's an example:
If you have 20 products and 10 colors, that's 200 rows right there! Cross-joins are great for generating all possible combinations, but they can also create really big tables if you're not careful. Just like a mad scientist, use your powers wisely!
For the online text-book rental company Chegg, an important part of analyzing their products (the text-books) is understanding how textbooks are reviewed by its users. For this question, you are given a table with the following columns: , , , , .
Write a SQL query to find each textbook's average star rating for each month. The output should include the month, the book (represented by the ), and the average star rating for that book for the given month.
review_id | user_id | submit_date | book_id | stars |
---|---|---|---|---|
1274 | 125 | 2019-01-05 | 70001 | 4 |
7832 | 230 | 2019-01-07 | 70001 | 5 |
9501 | 354 | 2019-01-15 | 70002 | 2 |
1250 | 195 | 2019-01-20 | 70002 | 3 |
3591 | 981 | 2019-02-04 | 70001 | 4 |
6503 | 137 | 2019-02-08 | 70002 | 3 |
9235 | 502 | 2019-02-14 | 70001 | 5 |
7200 | 824 | 2019-02-20 | 70002 | 4 |
mth | book_id | avg_stars |
---|---|---|
1 | 70001 | 4.50 |
1 | 70002 | 2.50 |
2 | 70001 | 4.50 |
2 | 70002 | 3.50 |
The above query extracts the month from the using the function and groups the result by this month and the book id. For each grouping (i.e., for each month and book), it calculates the average rating using the function rounded to 2 decimal places. And the result is sorted by month in ascending order and average stars in the descending order which means the textbooks with the higher ratings will appear in the front for each month.
A database index is a data structure that improves the speed of data retrieval operations on a database table.
There are few different types of indexes that can be used in a database:
For a concrete example, say you had a table of Chegg customer payments with the following columns:
Here's what a clustered index on the column would look like:
A clustered index on the column would determine the physical order of the records in the table based on the . This means that the records with the earliest values would be stored together physically in the table, followed by records with later payment_date values. This speeds up queries that filter or sort the data based on the , as the records are already physically sorted in the table. For example, if you want to retrieve all the payments made in the month of June, the database can use the clustered index to quickly locate and retrieve the desired records without having to search through the entire table.
As a Data Analyst at Chegg, you are asked to gather data on users who have signed up to the website. Specifically, the Product Team wants to know how many users have an email from a specific domain, like 'gmail.com'. The users table has a column 'user_email' containing the user's email address.
You may assume table structure as follows:
user_id | username | sign_up_date | user_email |
---|---|---|---|
101 | JohnD23 | 04/03/2020 | john.doe@gmail.com |
102 | JaneD33 | 05/10/2021 | jane_doe@yahoo.com |
103 | TomW12 | 02/23/2020 | twilliams@hotmail.com |
104 | SaraJ21 | 06/14/2022 | sjohnson@gmail.com |
105 | MikeG45 | 11/18/2021 | gmichael@yahoo.com |
Write a SQL query to find out the total number of users signed up with the email domain 'gmail.com'.
This query filters the 'users' table where the 'user_email' contains 'gmail.com', it counts the total number of records that satisfy this criterion. The '%' sign is a wildcard character that matches any sequence of characters in the email. Before 'gmail.com', it ensures that any email containing 'gmail.com' regardless of what comes before it, will be selected.
"One creative way is to use the window function ROW_NUMBER() and partition by whichver columns you are loooking for duplicates in. If any row has a row_number() more than 1, then it's a duplicate!
You could also use COUNT(DISTINCT col1) vs. COUNT(*) and see if they return different numbers, indicating the prescence of duplicates!
Consider two tables in Chegg's database, and . The table contains the information about the customers including their ids, names, and location. The table contains all the information related to customer orders, including OrderID, CustomerID, ProductName, Quantity, and OrderDate.
Write a SQL query to find the average monthly expenditure of the customers by joining the Customers and Orders tables. The output should give the Customer's name, Month, and their average monthly expenditure.
CustomerID | Name | Location |
---|---|---|
1 | John Doe | California |
2 | Jane Smith | Texas |
3 | Mike Davis | New York |
OrderID | CustomerID | ProductName | Quantity | Price | OrderDate |
---|---|---|---|---|---|
111 | 1 | TestBook1 | 2 | 10.00 | 06/10/2022 |
222 | 2 | TestBook2 | 1 | 15.00 | 06/12/2022 |
333 | 1 | TestBook3 | 1 | 25.00 | 07/15/2022 |
444 | 3 | TestBook4 | 3 | 20.00 | 07/25/2022 |
555 | 2 | TestBook3 | 2 | 25.00 | 08/05/2022 |
This SQL query joins the Orders and Customers tables on the CustomerID. It then groups the data by the customer name and month (extracted from the Order date). Finally, it calculates the average monthly expenditure by multiplying the quantity of products ordered with their price, and then taking the average of this total cost.
Because joins come up routinely during SQL interviews, practice an interactive Spotify JOIN SQL question:
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Beyond just solving the above Chegg SQL interview questions, you should also solve the 200+ DataLemur interview questions which come from companies like Microsoft, Google, and Facebook.
Each DataLemur SQL question has multiple hints, detailed solutions and crucially, there's an interactive coding environment so you can instantly run your SQL query answer and have it executed.
To prep for the Chegg SQL interview it is also a great idea to solve SQL questions from other tech companies like:
But if your SQL query skills are weak, forget about going right into solving questions – go learn SQL with this SQL tutorial for Data Scientists & Analysts.
This tutorial covers SQL concepts such as window functions and handling missing data (NULLs) – both of which show up often during Chegg SQL assessments.
Besides SQL interview questions, the other topics to prepare for the Chegg Data Science Interview include:
I'm a tad biased, but I think the best way to study for Chegg Data Science interviews is to read the book I wrote: Ace the Data Science Interview.
The book solves 201 interview questions sourced from FAANG, tech startups, and Wall Street. It also has a refresher on SQL, Product-Sense & ML. And finally it's helped thousands of people land their dream job in data, which is why it's got over 1000+ 5-star reviews on Amazon.