At Rumble, SQL is often for querying and manipulating large datasets for analysis and developing and optimizing complex database schemas to improve system performance. So, it shouldn't surprise you that Rumble almost always asks SQL questions in interviews for Data Analytics, Data Science, and Data Engineering jobs.
Thus, to help you prep, we've collected 11 Rumble SQL interview questions – can you answer each one?
Rumble is an online platform that sells various digital products. The company categorizes the users who make purchases more than 50 times in a month as 'VIP users'. Write a SQL query to identify these 'VIP users' by analyzing the customer database for the month of August 2022.
Here's some sample data:
purchase_id | user_id | purchase_date | product_id | quantity |
---|---|---|---|---|
101 | 123 | 08/01/2022 00:00:00 | 50001 | 5 |
102 | 265 | 08/02/2022 00:00:00 | 69852 | 2 |
103 | 362 | 08/05/2022 00:00:00 | 50001 | 1 |
104 | 485 | 08/07/2022 00:00:00 | 69852 | 6 |
105 | 123 | 08/08/2022 00:00:00 | 50001 | 2 |
106 | 198 | 08/08/2022 00:00:00 | 69852 | 3 |
107 | 123 | 08/10/2022 00:00:00 | 50001 | 3 |
... | ... | ... | ... | ... |
Notice that this table has one row for each purchase.
The provided query uses the function to filter the purchases made in August 2022. The function is used in conjunction with the clause to count the purchases for each user and get the users that made more than 50 purchases. The needed details are returned by the statement.
To work on another SQL customer analytics question where you can code right in the browser and have your SQL solution automatically checked, try this Walmart SQL Interview Question:
Imagine you had a table of Rumble employee salary data. Write a SQL query to find the 2nd highest salary among all employees.
employee_id | salary |
---|---|
1 | 2500 |
2 | 800 |
3 | 1000 |
4 | 1200 |
second_highest_salary |
---|
1200 |
Write a SQL query for this interview question and run your code right in DataLemur's online SQL environment:
You can find a step-by-step solution here: 2nd Highest Salary.
Database views are created to provide customized, read-only versions of your data that you can query just like a regular table. So why even use one if they're just like a regular table?
Views are useful for creating a simplified version of your data for specific users, or for hiding sensitive data from certain users while still allowing them to access other data.
Rumble's product team wants a breakdown of each product's average rating per month for better understanding the user feedback trends. They want this to be reported in descending order of the average ratings.
Your task is to write a SQL query using PostgreSQL to find the average rating of each product for each month.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-06-08 00:00:00 | 50001 | 4 |
7802 | 265 | 2022-06-10 00:00:00 | 69852 | 4 |
5293 | 362 | 2022-06-18 00:00:00 | 50001 | 3 |
6352 | 192 | 2022-07-26 00:00:00 | 69852 | 3 |
4517 | 981 | 2022-07-05 00:00:00 | 69852 | 2 |
mth | product | avg_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 69852 | 2.50 |
This query first extracts the month from the column using the function for grouping. It then calculates the average rating for each group of month and product using the function, and then it orders the result in descending order of average rating using the clause.
To solve another window function question on DataLemur's free online SQL coding environment, try this Amazon SQL question asked in a BI Engineer interview:
A non-relational (NoSQL) database is any database that does not use the typical tabular format of rows and columns like in relational databases.
While knowing the four different types of NoSQL databases is beyond the scope of most Data Analyst & Data Science interviews, Data Engineers at Rumble should vaguely refresh these concepts:
Assume that you work for Rumble, a social sharing platform where users can share, like, and comment on posts. The posts can be of different types like text, image, or video. The company wishes to analyze user activities viz. number of shares, likes, and comments on posts by each user.
Design the database schema including what tables would your database have and what columns would belong to which tables.
Assuming that we would need 3 tables as follows, but you can craft a different schema as well if you feel appropriate.
user_id | username | join_date |
---|---|---|
1 | RumbleFan | 2022-06-01 |
2 | RumblePro | 2022-04-05 |
3 | RumbleStar | 2022-05-17 |
post_id | content_type | content | user_id | post_date |
---|---|---|---|---|
101 | text | Hello, World! | 1 | 2022-06-15 |
201 | image | my_pic.png | 2 | 2022-06-18 |
301 | video | my_video.mp4 | 3 | 2022-06-20 |
action_id | user_id | post_id | action_type | action_date |
---|---|---|---|---|
1001 | 2 | 101 | like | 2022-06-16 |
3001 | 3 | 201 | like | 2022-06-19 |
2001 | 1 | 301 | share | 2022-06-21 |
To solve, you can write the below PostgreSQL query:
This query will give an output similar to
username | total_actions | total_likes | total_shares |
---|---|---|---|
RumbleFan | 26 | 12 | 14 |
RumblePro | 45 | 39 | 6 |
RumbleStar | 33 | 25 | 8 |
The output can be used by the company to analyze individual user's activities.
A join in SQL allows you to retrieve data from multiple tables and combine it into a single set of results.
In SQL, there are four different types of JOINs. To demonstrate each kind, imagine you had two database tables, an advertising campaigns table which had information on Google Ads keywords and how much was bid for each keyword, and a sales table, which has data on how many products were sold and which Google Ads keyword drove that sale.
: An INNER JOIN retrieves rows from both tables where there is a match in the shared key or keys. For example, an INNER JOIN between the table and the table could be performed using the keyword column as the shared key. This would retrieve only the rows where the keyword in the table matches the keyword in the table.
: A LEFT JOIN retrieves all rows from the left table (in this case, the table) and any matching rows from the right table (the Sales table). If there is no match in the right table, values will be returned for the right table's columns.
: A RIGHT JOIN retrieves all rows from the right table (in this case, the Sales table) and any matching rows from the left table (the table). If there is no match in the left table, values will be returned for the left table's columns.
: A FULL OUTER JOIN retrieves all rows from both tables, regardless of whether there is a match in the shared key or keys. If there is no match, values will be returned for the columns of the non-matching table.
Given the tables and , write a SQL query to retrieve all customers who live in "New York" and have made a purchase of over $50 within the last month (including today). The table has a column which is in the format 'YYYY-MM-DD'.
customer_id | name | location |
---|---|---|
10001 | John Smith | New York |
10002 | Jane Doe | New York |
10003 | Tom Johnson | Los Angeles |
10004 | Emily Davis | New York |
10005 | Robert Brown | Chicago |
order_id | customer_id | purchase | date |
---|---|---|---|
20001 | 10001 | 70 | 2022-06-26 |
20002 | 10002 | 40 | 2022-06-25 |
20003 | 10002 | 30 | 2022-06-20 |
20004 | 10003 | 120 | 2022-04-15 |
20005 | 10004 | 55 | 2022-06-17 |
20006 | 10002 | 60 | 2022-06-30 |
The above SQL query first filters down the customers who live in New York. Then, it also checks if the customer_id is in the list of customer_ids who have made a purchase of over $50 in the last month. The subquery used creates a list of all such customer_ids from the table. By using the PostgreSQL "INTERVAL" keyword, we're able to easily get dates that are one month from the current date.
Rumble is a video-sharing platform. As an analyst for Rumble, you are tasked to find the average number of views for each video on the site.
To do this, use the and tables. The table contains a video_id column and a title column. Each record in the table has an id, a video_id, and a viewed_at date time column.
video_id | title |
---|---|
1 | "Cute Puppies Playing" |
2 | "Funny Cats Compilation" |
3 | "Amazing Nature Footage" |
id | video_id | viewed_at |
---|---|---|
1 | 1 | "2021-07-01 00:00:00" |
2 | 1 | "2021-07-01 00:01:00" |
3 | 1 | "2021-07-01 00:02:00" |
4 | 2 | "2021-07-01 00:00:00" |
5 | 2 | "2021-07-01 00:10:00" |
6 | 3 | "2021-07-01 00:00:00" |
This query joins the and tables on , and then uses the AVG function to calculate the average number of views per video. The result is grouped by and , giving you the average number of views for each video on the site.
To practice a very similar question try this interactive Facebook App Click-through Rate (CTR) Question which is similar for handling viewer-based calculations or this New York Times Laptop vs. Mobile Viewership Question which is similar for dealing with asset views from different sources.
A database index is a way to optimize the performance of a database by reducing the amount of data that needs to be searched to retrieve a record.
There are several types of indexes:
Rumble's platform hosts various online games. Given data about user ratings for those games, can you write a query that groups the games by their ID and calculates the average user rating per game?
To facilitate this, we've provided sample data in markdown-formatted tables:
rating_id | user_id | game_id | rating | submitted_date |
---|---|---|---|---|
7353 | 124 | 30001 | 3 | 2022-01-10 |
7804 | 266 | 30001 | 4 | 2022-01-20 |
5294 | 363 | 30001 | 4 | 2022-01-30 |
7952 | 193 | 30001 | 5 | 2022-01-31 |
4518 | 982 | 30002 | 4 | 2022-01-25 |
8157 | 148 | 30002 | 1 | 2022-01-25 |
8724 | 245 | 30002 | 3 | 2022-01-30 |
game_id | avg_rating |
---|---|
30001 | 4.0 |
30002 | 2.67 |
This PostgreSQL query uses the clause to partition the table by . This creates separate groups for each game. It then uses the function, an aggregate function, to calculate the arithmetic mean of for each group. The result is a table with each game's ID and its average user rating. The output table is rounded to two decimal places for readability.
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews.
Beyond just solving the earlier Rumble SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Netflix, Airbnb, and Amazon.
Each SQL question has hints to guide you, full answers and most importantly, there is an interactive SQL code editor so you can easily right in the browser your SQL query and have it graded.
To prep for the Rumble SQL interview you can also be helpful to solve interview questions from other tech companies like:
However, if your SQL foundations are weak, don't worry about diving straight into solving questions – refresh your SQL knowledge with this free SQL for Data Analytics course.
This tutorial covers SQL concepts such as aggregate functions and removing NULLs – both of these come up often during Rumble SQL interviews.
In addition to SQL query questions, the other question categories to practice for the Rumble Data Science Interview are:
I'm a bit biased, but I think the optimal way to prep for Rumble Data Science interviews is to read the book Ace the Data Science Interview.
The book has 201 data interview questions sourced from tech companies like Netflix, Google, & Airbnb. It also has a refresher on SQL, Product-Sense & ML. And finally it's helped thousands of people land their dream job in data, which is why it's earned 1000+ reviews on Amazon and a 4.5-star rating.