At Instructure, SQL is often used for analyzing educational data trends, and managing the datasets within their cloud-based learning management systems. That's why Instructure frequently asks SQL coding questions during interviews for Data Science and Data Engineering positions.
To help you ace the Instructure SQL interview, we've curated 10 Instructure SQL interview questions – can you answer each one?
Instructure is a technology company that specializes in learning management systems, and their flagship product is Canvas. In Canvas, students can enroll in courses, and courses have various enrollments statuses like "active", "completed", "dropped", etc.
Suppose you are given a dataset with the following schema:
course_id | course_name |
---|---|
101 | Math 101 |
102 | Physics 101 |
103 | Chemistry 101 |
enrollment_id | student_id | course_id | enroll_date | status |
---|---|---|---|---|
1 | 101 | 101 | 2022-01-01 | active |
2 | 102 | 102 | 2022-01-02 | active |
3 | 103 | 101 | 2022-02-01 | completed |
4 | 104 | 103 | 2022-03-01 | active |
5 | 105 | 101 | 2022-03-02 | active |
The problem is to write a PostgreSQL query to find the total enrollments for each course for each month. The month should be based on the and the count should include all enrollment statuses.
In this query, we are using a Window function to count the enrollments per course per month. The Window function is . This function counts the number of rows that have the same course_id and enroll_date (formatted as YYYY-MM for monthly granularity). The function is used to format the enroll_date to a monthly granularity. We then join the courses table to get the course names.
For more window function practice, solve this Uber SQL Interview Question on DataLemur's online SQL code editor:
Instructure is a technology company that specializes in developing learning management systems (LMS). Assume that you are given a database of their users, memberships, and courses. The tables , , and look like this:
user_id | user_name | created_at |
---|---|---|
001 | UserA | 2021-01-05 12:00:00 |
002 | UserB | 2021-05-12 08:00:00 |
003 | UserC | 2022-02-22 13:00:00 |
004 | UserD | 2022-07-07 09:00:00 |
005 | UserE | 2022-08-11 15:00:00 |
user_id | membership_type |
---|---|
001 | Free |
002 | Premium |
003 | Premium |
004 | Free |
005 | Premium |
course_id | course_name | user_id | completion_date |
---|---|---|---|
1001 | Math101 | 001 | 2021-06-15 12:00:00 |
1002 | History101 | 002 | 2021-08-12 18:00:00 |
1003 | Science101 | 003 | 2022-03-09 14:00:00 |
1004 | English101 | 004 | 2022-09-07 13:00:00 |
1005 | Music101 | 005 | Still in Progress |
Your task is to retrieve the details of the "Premium" users who have completed at least one course in 2022. Arrange the output starting from the user who has most recently completed a course.
This SQL query will return the user_id, user_name, course_name, and completion_date of all Premium users from the , , and tables who have completed at least one course in 2022. The results are sorted in descending order based on the completion_date, showing the most recent course completions first. If a user completed more than one course in 2022, they would appear in the output for each course completed.
In SQL, the operator combines the result of two or more SELECT statements into a single result set. Note that each statement within the UNION must have the same number of columns and the columns must have similar data types. The operator removes duplicates from the final result set. If you want to include duplicates in the final result set, you can use the operator instead.
Here's an example of using the UNION operator to combine the results of two SELECT statements that retrieve data from tables of Instructure's Facebook ads and their Google ads:
This statement would return a combined result set of both Facebook and Google ads that have a spend greater than $500.
Instructure is a company that offers a learning management system, called "Canvas." Each course on Canvas can be rated by students on a 1-5 scale. As a data scientist at Instructure, you are asked to find the average rating of all the courses.
Suppose we have a table and a table. The table contains , , and . The table contains , , , and . Write a SQL query to calculate the average score for each course.
course_id | name | creation_date |
---|---|---|
101 | Introduction to SQL | 2022-01-12 |
102 | Data Science with Python | 2022-01-20 |
103 | Machine Learning Basics | 2022-02-02 |
rating_id | course_id | student_id | score |
---|---|---|---|
1 | 101 | 1 | 5 |
2 | 101 | 2 | 4 |
3 | 102 | 1 | 3 |
4 | 102 | 2 | 4 |
5 | 102 | 3 | 4 |
6 | 103 | 1 | 4 |
7 | 103 | 2 | 5 |
This query first joins the and tables on . Then, for each and , it calculates the average rating score. The result will be a table where each row contains a , , and the corresponding .
A foreign key is a column or group of columns in a table that refers to the primary key in another table. The foreign key constraint helps maintain referential integrity between the two tables.
For example, let's look at the Instructure sales database:
instructure_sales:
+------------+------------+------------+------------+ | order_id | product_id | customer_id| quantity | +------------+------------+------------+------------+ | 1 | 222 | 1 | 2 | | 2 | 333 | 1 | 1 | | 3 | 444 | 2 | 3 | | 4 | 555 | 3 | 1 | +------------+------------+------------+------------+
In this table, and could both be foreign keys. They reference the primary keys of other tables, such as a Products table and a Customers table, respectively. This establishes a relationship between the table and the other tables, such that each row in the sales database corresponds to a specific product and a specific customer.
"
Instructure, an educational technology company, offers several products for which they track user interactions. One key metric for Instructure is the click-through to purchase conversion rate, i.e., the percentage of users who add a product to the cart after viewing it, and then proceed to make a purchase.
You are given two tables, and . contains information about all the views on their products by unique users, including the and the of the viewer. The table then lists all of the purchases that have been made, again including the and the of the purchaser.
Calculate the percentage conversion rate for each product from to .
view_id | user_id | view_date | product_id |
---|---|---|---|
101 | 113 | 06/08/2022 00:00:00 | 50001 |
102 | 265 | 06/10/2022 00:00:00 | 69852 |
103 | 265 | 06/18/2022 00:00:00 | 50001 |
104 | 192 | 07/26/2022 00:00:00 | 69852 |
105 | 981 | 07/05/2022 00:00:00 | 69852 |
purchase_id | user_id | purchase_date | product_id |
---|---|---|---|
201 | 113 | 06/08/2022 00:30:00 | 50001 |
202 | 362 | 06/20/2022 10:00:00 | 50001 |
203 | 192 | 07/26/2022 02:00:00 | 69852 |
204 | 981 | 07/05/2022 14:00:00 | 69852 |
205 | 123 | 07/10/2022 18:00:00 | 50001 |
This query first connects the and tables where the and match in both tables. It then counts the unique number of in both tables for each . Finally, it calculates the conversion rate as the percentage of unique purchaser to unique viewer for each in Instructure.
To practice another question about calculating rates, solve this SQL interview question from TikTok on DataLemur's interactive SQL code editor:
While both types of databases are used to store data (obviously), there's some key differences in how they store and organize data.
Relational databases try to represent the world into neat little tables, with rows and columns. Non-relational (NoSQL) databases use a variety of data models to represent data, including document, key-value, columnar, and graph storage formats.
While the exact types of NoSQL databases is beyond the scope of a Data Analyst and Data Scientist SQL interview at Instructure, it's good to know that companies generally choose to use NoSQL databases:
Instructure is a company that specializes in learning management systems, such as "Canvas", which are often used in educational settings. Each course in a university department can have many students, and students can leave ratings for these courses. For this question, assume that there are two tables, (which includes , , ) and (which includes , , , ).
Your task is to write an SQL query that retrieves the average rating per department. Your output should include the and the .
Please use the following sample data to test your query:
course_id | department_id | course_name |
---|---|---|
1 | 1 | History 101 |
2 | 1 | History 102 |
3 | 2 | Mathematics 101 |
4 | 2 | Mathematics 102 |
5 | 3 | Philosophy 101 |
rating_id | course_id | student_id | rating |
---|---|---|---|
1 | 1 | 23 | 4 |
2 | 1 | 34 | 3 |
3 | 2 | 45 | 5 |
4 | 3 | 56 | 2 |
5 | 4 | 67 | 3 |
6 | 5 | 78 | 5 |
This query first joins the table with the table based on the . Then it groups the result by the and calculates the average for each department.
Consider the company 'Instructure'. The company wishes to understand more about its revenue per user by incorporating user purchasing data. Given tables , and , you are assigned to find the average revenue per user in 2021.
Presence of data is as follows:
user_id | username | created_at |
---|---|---|
1 | alice | 2020-12-30 |
2 | bob | 2021-03-23 |
3 | charlie | 2021-06-11 |
4 | david | 2021-09-02 |
order_id | user_id | order_date |
---|---|---|
100 | 1 | 2021-02-23 |
101 | 2 | 2021-04-20 |
102 | 1 | 2021-08-01 |
103 | 3 | 2021-07-11 |
order_id | item_price | quantity |
---|---|---|
100 | 10 | 2 |
101 | 15 | 1 |
102 | 20 | 3 |
103 | 25 | 4 |
The following SQL query can be used to calculate the average revenue per user in 2021:
The above query does an inner join on , and tables by linking in table with in table and then linking in table with in table. The clause filters out the orders that are not from the year 2021. The clause groups the result by and the function computes the average revenue per user. We multiple and to get the total item revenue in each order.
Because joins come up routinely during SQL interviews, take a stab at this SQL join question from Spotify:
A DBMS (database management system), in order to ensure transactions are relaible and don't ruin the integrity of the data, tries to mantain the following ACID properties: Atomicity, Consistency, Isolation, and Durability.
To make this concept more concrete, here is what each of the ACID properties would mean in the context of banking transactions:
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the Instructure SQL interview is to solve as many practice SQL interview questions as you can! Beyond just solving the earlier Instructure SQL interview questions, you should also solve the 200+ SQL questions from real Data Science & Analytics interviews which come from companies like Google, Microsoft and Silicon Valley startups.
Each DataLemur SQL question has multiple hints, full answers and best of all, there's an interactive coding environment so you can instantly run your query and have it executed.
To prep for the Instructure SQL interview you can also be wise to practice interview questions from other tech companies like:
In case your SQL foundations are weak, don't worry about jumping right into solving questions – refresh your SQL knowledge with this SQL interview tutorial.
This tutorial covers SQL concepts such as filtering strings based on patterns and sorting data with ORDER BY – both of these come up routinely during Instructure interviews.
In addition to SQL interview questions, the other types of problems covered in the Instructure Data Science Interview are:
To prepare for Instructure Data Science interviews read the book Ace the Data Science Interview because it's got: