Docebo employees use SQL often for querying and managing educational data to personalize learning experiences, and for analyzing user engagement metrics to optimize the platform. That's why Docebo almost always evaluates jobseekers on SQL query questions in interviews for Data Science, Data Engineering and Data Analytics jobs.
Thus, to help you practice, here are 10 Docebo SQL interview questions – can you answer each one?
Docebo is a software company that offers a cloud-based Learning Management System (LMS) for e-learning. Consider that they have two tables: `users`, which contains information about their users, and `activities`, which records when users log into their LMS and the duration they spend on each session. A session is regarded as very important if its duration is above 120 minutes.
The structure of the tables is as follows:
user_id | user_name | country | email |
---|---|---|---|
100 | Mike | USA | mike@example.com |
101 | Jane | Canada | jane@example.com |
102 | Ahmed | Egypt | ahmed@example.com |
103 | Maria | Spain | maria@example.com |
104 | Ming | China | ming@example.com |
activity_id | user_id | activity_date | duration |
---|---|---|---|
201 | 100 | 2022-07-01 | 128 |
202 | 100 | 2022-07-08 | 75 |
203 | 101 | 2022-07-02 | 138 |
204 | 103 | 2022-07-03 | 121 |
205 | 104 | 2022-07-04 | 149 |
Write a query to find the `user_id`, `user_name` and `country` of the power users, considering a power user as a user who performed a very important activity more than 3 times in the last 30 days.
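Here's one possible answer in PostgreSQL (the table names `users` and `activities` are assumed from the sample data above):

```sql
SELECT u.user_id,
       u.user_name,
       u.country
FROM users u
JOIN (
    -- find users with more than 3 very important activities
    -- (duration above 120 minutes) in the last 30 days
    SELECT user_id
    FROM activities
    WHERE duration > 120
      AND activity_date >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY user_id
    HAVING COUNT(*) > 3
) power_users ON power_users.user_id = u.user_id;
```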
In this query, we first create a subquery to filter the `activities` table for the activities that are considered important (duration > 120) and occurred in the last 30 days, grouping by `user_id` and counting the number of such important activities. Then we join this result with the `users` table to get the `user_name` and `country`.
Note that the `CURRENT_DATE - INTERVAL '30 days'` syntax is specific to PostgreSQL. Other SQL databases may vary slightly in their syntax for date arithmetic. The query assumes that there is a system date we can refer to.
Also, remember to clarify the assumptions you made (like the definition of important activity and power user, the timeframe for the last 30 days) with the interviewer.
To solve a similar power-user data analysis problem on DataLemur's free interactive coding environment, try this recently asked Microsoft SQL interview question.
Imagine you had a table of Docebo employee salaries. Write a SQL query to find the employees who earn more than their direct manager.
employee_id | name | salary | department_id | manager_id |
---|---|---|---|---|
1 | Emma Thompson | 3800 | 1 | NULL |
2 | Daniel Rodriguez | 2230 | 1 | 10 |
3 | Olivia Smith | 8000 | 1 | 8 |
4 | Noah Johnson | 6800 | 2 | 8 |
5 | Sophia Martinez | 1750 | 1 | 10 |
8 | William Davis | 7000 | 2 | NULL |
10 | James Anderson | 4000 | 1 | NULL |
employee_id | employee_name |
---|---|
3 | Olivia Smith |
This is the output because Olivia Smith earns 8,000, which is more than her manager William Davis, who earns 7,000.
Solve this question interactively on DataLemur.
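Here's a sketch of the self-join in PostgreSQL, assuming the table is named `employee`:

```sql
SELECT emp.employee_id,
       emp.name AS employee_name
FROM employee emp                  -- employees' copy of the table
JOIN employee mgr                  -- managers' copy of the table
  ON emp.manager_id = mgr.employee_id
WHERE emp.salary > mgr.salary;     -- keep only employees out-earning their manager
```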
First, we perform a SELF-JOIN where we treat one copy of the `employee` table (aliased `mgr`) as the managers' table and the other copy (aliased `emp`) as the employees' table. Then we use a WHERE clause to filter the results, ensuring we only get employees whose salaries are higher than their manager's salary.
If the solution above is hard to understand, you can find a detailed solution with hints here: Employee Salaries Higher Than Their Manager.
A database index is a data structure that provides a quick lookup of data in a column or columns of a table.
There are several types of indexes that can be used in a database:

- Unique indexes, which speed up lookups and also enforce that no two rows share the same value
- Non-unique indexes, which exist purely to speed up queries
- Clustered indexes, which determine the physical order of the rows in the table
- Non-clustered indexes, which store a separate sorted structure that points back to the rows
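For example, here's what creating an index looks like in PostgreSQL (the table and column names are just for illustration):

```sql
-- a plain (non-unique) index to speed up lookups by user
CREATE INDEX idx_activities_user_id ON activities (user_id);

-- a unique index, which also enforces that no two rows share the same email
CREATE UNIQUE INDEX idx_users_email ON users (email);
```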
At Docebo, you're frequently required to analyze course engagement and performance. An important metric that we track is the average course rating given by our users on a monthly basis.
Suppose you have a reviews table that records every review submitted by users for different courses. For this question, you are asked to write a SQL query that calculates the average rating given to each course per month. The reviews table has the following schema:
review_id | user_id | submit_date | course_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-06-08 | 50001 | 4 |
7802 | 265 | 2022-06-10 | 69852 | 4 |
5293 | 362 | 2022-06-18 | 50001 | 3 |
6352 | 192 | 2022-07-26 | 69852 | 3 |
4517 | 981 | 2022-07-05 | 69852 | 2 |
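Here's one way to write the query in PostgreSQL:

```sql
SELECT DATE_TRUNC('month', submit_date) AS month,
       course_id,
       ROUND(AVG(stars), 2) AS avg_stars  -- average rating, two decimal places
FROM reviews
GROUP BY DATE_TRUNC('month', submit_date), course_id
ORDER BY month, course_id;
```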
This query works by grouping the reviews by month and course. For each group, it calculates the average star rating, rounding to two decimal places, and lists it along with the month and the course ID in the result set. The final result set is ordered by month and course in ascending order.
The displayed measurement is the average rating of each course per month, allowing us to track the performance and reception of our online courses over time.
Pro Tip: Window functions are a frequent SQL interview topic, so practice all the window function problems on DataLemur
Cross join and natural join are like two sides of a coin in the world of SQL.
A cross join is like the wild and reckless cousin who creates a giant new table by combining every row from table A with every row from table B, no questions asked, no common key needed.
A natural join is like the more refined and selective cousin who only combines rows from multiple tables if they have something in common (i.e., common columns/keys).
While a cross join doesn't discriminate and will create a massive table if given the chance, a natural join is more selective and only returns a table with as many rows as there are matches in the input tables. So, choose your JOIN wisely!
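As a quick illustration, with two hypothetical tables `table_a` and `table_b`:

```sql
-- CROSS JOIN: every row of table_a paired with every row of table_b
SELECT * FROM table_a CROSS JOIN table_b;

-- NATURAL JOIN: rows combined only where the identically named columns match
SELECT * FROM table_a NATURAL JOIN table_b;
```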
Docebo is a learning management system company providing a platform for training needs. Let's assume you are given two tables: `users`, which details the company's global users, and `course_access`, which records when a user accessed a course. Your task is to determine the most accessed courses for each month.
The `users` table is structured as follows:
user_id | country | signup_date |
---|---|---|
1001 | Canada | 2022-01-01 |
1002 | USA | 2021-12-01 |
1003 | UK | 2022-03-01 |
1004 | Canada | 2022-01-15 |
1005 | USA | 2021-12-15 |
The `course_access` table includes these columns:
access_id | user_id | access_date | course_id |
---|---|---|---|
8001 | 1001 | 2022-07-01 | 4001 |
8002 | 1002 | 2022-06-01 | 4001 |
8003 | 1003 | 2022-07-02 | 4002 |
8004 | 1001 | 2022-07-03 | 4001 |
8005 | 1002 | 2022-07-04 | 4003 |
You can find the most accessed course for each month from the `course_access` table.
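One way to write the query in PostgreSQL (again, the table name `course_access` is assumed from the sample data):

```sql
SELECT TO_CHAR(access_date, 'YYYY-MM') AS month,
       course_id,
       COUNT(*) AS access_count
FROM course_access
GROUP BY TO_CHAR(access_date, 'YYYY-MM'), course_id
ORDER BY month, access_count DESC;  -- most accessed course first within each month
```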
This query first groups the access records by month and course_id, and then counts the number of accesses for each unique (month, course_id) pair. The ORDER BY clause then lists the results by month and, within each month, in descending order of access_count, so the most accessed course appears first.
The expected output would then be:
month | course_id | access_count |
---|---|---|
2022-06 | 4001 | 1 |
2022-07 | 4001 | 2 |
2022-07 | 4002 | 1 |
2022-07 | 4003 | 1 |
Keep in mind that this output assumes the example input from the `course_access` table above. Your actual output will depend on the data in your tables.
A transaction is one or more SQL commands that are executed as a single unit: if the transaction commits, all of its commands take effect, and if the transaction aborts, none of them do.
For transactions, a DBMS is supposed to enforce the following ACID properties: Atomicity, Consistency, Isolation, & Durability.
Here's what each one means:

- Atomicity: the transaction is all-or-nothing; either every command in it succeeds, or none of them take effect
- Consistency: the transaction moves the database from one valid state to another, never violating its integrity constraints
- Isolation: concurrent transactions don't interfere with each other; each behaves as if it ran alone
- Durability: once a transaction commits, its changes survive crashes and power failures
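For example, a classic money-transfer transaction (the `accounts` table here is hypothetical) either applies both updates or neither:

```sql
BEGIN;  -- start the transaction

UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

COMMIT;  -- make both updates permanent; a ROLLBACK here would undo both
```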
As you can see, it's pretty important for the multiple databases where Docebo stores its data to be ACID-compliant!
Consider the marketing team at Docebo, an online learning platform. They have been running a series of digital advertisements and would like to understand the click-through rate (CTR) for each ad campaign. The CTR is calculated as the number of users who click on the ad divided by the number of users who view the ad.
For this scenario, you have the following two tables, one recording ad views and one recording ad clicks:
view_id | user_id | ad_id | view_date |
---|---|---|---|
1234 | 567 | 101 | 06/01/2022 |
2345 | 678 | 101 | 06/01/2022 |
3456 | 789 | 102 | 06/01/2022 |
4567 | 890 | 102 | 06/02/2022 |
5678 | 901 | 103 | 06/02/2022 |
click_id | user_id | ad_id | click_date |
---|---|---|---|
4321 | 567 | 101 | 06/01/2022 |
5432 | 678 | 101 | 06/01/2022 |
6543 | 901 | 103 | 06/02/2022 |
Write a PostgreSQL query to determine the CTR for each ad.
Here is one way to do it:
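This sketch assumes the views and clicks tables are named `ad_views` and `ad_clicks`:

```sql
WITH ad_views_count AS (
    SELECT ad_id, COUNT(*) AS total_views
    FROM ad_views
    GROUP BY ad_id
),
ad_clicks_count AS (
    SELECT ad_id, COUNT(*) AS total_clicks
    FROM ad_clicks
    GROUP BY ad_id
)
SELECT v.ad_id,
       -- clicks divided by views; COALESCE turns NULL (no clicks) into 0
       COALESCE(c.total_clicks::DECIMAL / v.total_views, 0) AS click_through_rate
FROM ad_views_count v
LEFT JOIN ad_clicks_count c
  ON v.ad_id = c.ad_id;
```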
This script first aggregates total views and clicks for each ad in the `ad_views_count` and `ad_clicks_count` CTEs respectively. By then joining these two result sets, we can calculate the click-through rate. If an ad was seen but never clicked, the LEFT JOIN leaves its click count as NULL, so its click_through_rate would read as NULL; COALESCE is used to display such rates as 0 instead.
To practice a related SQL interview question on DataLemur's free online SQL code editor, try this SQL interview question asked by Facebook:
Given a database of customer records, write a SQL query to find all customers whose last names begin with 'M' and live in 'Toronto'. Use the LIKE keyword to filter the customer names.
customer_id | first_name | last_name | city | country | join_date |
---|---|---|---|---|---|
101 | Alex | Morgan | Toronto | Canada | 2022-01-01 |
102 | Bruce | Miller | New York | USA | 2022-01-03 |
103 | Charles | Muller | Berlin | Germany | 2022-01-05 |
104 | David | Macleod | Toronto | Canada | 2022-01-07 |
105 | Elizabeth | Morris | Vancouver | Canada | 2022-01-09 |
customer_id | first_name | last_name | city | country | join_date |
---|---|---|---|---|---|
101 | Alex | Morgan | Toronto | Canada | 2022-01-01 |
104 | David | Macleod | Toronto | Canada | 2022-01-07 |
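One way to write it, assuming the table is named `customers`:

```sql
SELECT *
FROM customers
WHERE last_name LIKE 'M%'   -- last names beginning with 'M'
  AND city = 'Toronto';
```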
This SQL query filters the `customers` table for entries where the `last_name` begins with 'M' (achieved by using 'M%' in the LIKE statement) and the `city` is equal to 'Toronto'. It returns the full records of the matching customers.
No, in almost all cases, and for all practical purposes, a JOIN and a UNION do NOT produce the same result.
While both are similar, in that they combine two tables, you can think of joins as increasing the width of the resulting table (you'll have more columns in the result set for a left/inner/right join), whereas a union is used to combine rows which increases the height of the result set but keeps the column count the same.
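A quick sketch of the difference, using hypothetical tables:

```sql
-- JOIN: widens the result set by matching rows across two tables
SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

-- UNION: stacks rows from two queries that share the same column layout
SELECT customer_id FROM us_customers
UNION
SELECT customer_id FROM eu_customers;
```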
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the Docebo SQL interview is to solve as many practice SQL interview questions as you can! In addition to solving the above Docebo SQL interview questions, you should also solve the 200+ DataLemur interview questions which come from companies like Google, Facebook, Microsoft and Amazon.
Each exercise has multiple hints, fully explained answers, and a discussion board to see how others solved it. Best of all, there's an interactive coding environment so you can run your SQL query right in the browser and have it graded.
To prep for the Docebo SQL interview, it can also be helpful to practice interview questions from other tech companies like:
But if your SQL query skills are weak, don't jump right into solving questions – go learn SQL with this free SQL tutorial.
This tutorial covers SQL concepts such as CASE/WHEN statements and handling NULLs in SQL – both of which show up often in SQL interviews at Docebo.
Besides SQL interview questions, the other types of questions covered in the Docebo Data Science Interview include:
I'm a bit biased, but I think the optimal way to study for Docebo Data Science interviews is to read the book Ace the Data Science Interview.
The book contains 201 data interview questions sourced from Facebook, Google & startups. It also has a crash course on Product Analytics, SQL & ML. And finally, it's vouched for by the data community, which is why it has over 1,000 5-star reviews on Amazon.