At GitLab, SQL is used frequently to analyze product usage data and improve source-control features. Because GitLab is an open-source company, they put out neat SQL style docs for the world to learn from. So when it comes to the technical interview process, it shouldn't surprise you that GitLab LOVES to ask SQL coding questions during interviews for Data Science, Data Engineering, and Data Analytics jobs.
Thus, to help you ace the GitLab SQL interview, here are 10 GitLab SQL interview questions – can you answer each one?
Assume GitLab keeps track of its users' activities in an activity table. Each time a user performs an important activity (e.g. commit code, create merge request, or push code), a new row is added to the table. Write a SQL query to calculate the number of monthly active users (MAU), where an active user in a given month is defined as a user who has performed at least one activity in that month.
Here is some sample data for this activity table:
activity_id | user_id | activity_date | activity_type |
---|---|---|---|
101 | 123 | 2022-08-06 00:00:00 | commit |
102 | 265 | 2022-10-06 00:00:00 | merge_request |
103 | 362 | 2022-11-18 00:00:00 | commit |
104 | 192 | 2023-02-26 00:00:00 | push |
105 | 981 | 2023-05-05 00:00:00 | merge_request |
The output should have one row for each month (with the month formatted as yyyy-mm), and include the total count of active users for that month.
month | active_users |
---|---|
2022-08 | 1 |
2022-10 | 1 |
2022-11 | 1 |
2023-02 | 1 |
2023-05 | 1 |
This query first creates a month column (formatted as 'yyyy-mm') for each activity using TO_CHAR. It then groups by this month column and, within each group, counts the unique users with COUNT(DISTINCT user_id). The result is ordered by month.
p.s. Window functions show up super frequently during SQL interviews, so practice the 27+ window function questions on DataLemur
As an analyst at GitLab, one of your tasks is to monitor user activity on the platform. Write an SQL query that filters for GitLab users who have logged in within the last 30 days but have not used any GitLab features within that same period.
Please consider the following markdown-formatted tables for this task:
user_id | username | login_date |
---|---|---|
6171 | user123 | 08/28/2022 |
7802 | user265 | 09/08/2022 |
5293 | user362 | 07/20/2022 |
6352 | user192 | 08/30/2022 |
4517 | user981 | 07/12/2022 |
activity_id | user_id | feature_used | date |
---|---|---|---|
8521 | 6171 | Repository Creation | 08/15/2022 |
3102 | 7802 | Merge Requests | 09/08/2022 |
4209 | 5293 | CI/CD Pipeline | 05/18/2022 |
7305 | 6352 | Issue Tracker | 07/26/2022 |
9701 | 4517 | Repository Creation | 07/05/2022 |
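Here's a sketch of one approach. The table names logins and feature_usage are assumptions (only the sample data survived), and CURRENT_DATE is used as the reference point:

```sql
SELECT l.user_id, l.username, l.login_date
FROM logins l
LEFT JOIN feature_usage a
  ON  a.user_id = l.user_id
  AND a.date >= CURRENT_DATE - INTERVAL '30 days'        -- only consider recent feature usage
WHERE l.login_date >= CURRENT_DATE - INTERVAL '30 days'  -- logged in within the last 30 days
  AND a.activity_id IS NULL;                             -- ...but no matching recent activity
```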
This query combines the login data with the feature-usage data using a LEFT JOIN that only matches activity rows from the last 30 days. It then filters for users who have logged in within the last 30 days and keeps only those whose join produced no match (the joined activity columns come back NULL), indicating no feature usage from the user in that window.
Denormalization is the practice of altering a database schema in a way that breaks the normalization rules (1NF, 2NF, 3NF, etc.).
This is typically done to improve the performance of the database when it is being used for online analytics processing (OLAP), rather than online transaction processing (OLTP) use cases.
Denormalization can improve query performance by reducing the number of expensive joins required to retrieve data, but it comes with its own drawbacks too. By adding redundant columns, you incur more data storage costs, and you need more complex update and delete logic to maintain data integrity across all the duplicated data. Thus, it's important to carefully consider the trade-offs involved before implementing denormalization.
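As a hypothetical illustration (the table and column names below are made up), denormalizing a repositories table might mean copying the owner's name onto each repository row so reads no longer need a join:

```sql
-- Normalized: the owner's name lives only in users, so reads need a join
SELECT r.repo_id, r.repo_name, u.user_name
FROM repositories r
JOIN users u ON u.user_id = r.owner_id;

-- Denormalized: copy owner_name onto repositories to skip the join on reads,
-- at the cost of keeping the duplicated value in sync on every update
ALTER TABLE repositories ADD COLUMN owner_name TEXT;

UPDATE repositories r
SET    owner_name = u.user_name
FROM   users u
WHERE  u.user_id = r.owner_id;
```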
GitLab uses the concept of "merge requests" for proposing changes from one branch to another. In a typical setup, developers make changes in their local branches and then create a merge request to propose those changes into a shared branch (like 'master' or 'main').
Assuming we have a table that records each merge request, including its id, the requester_id (the developer who proposed the change), the creation_date (the date and time when the merge request was created), the merge_date (the date and time when the merge request was accepted and merged), and the branch_id (the id of the branch into which the changes are proposed), can you write a SQL query to find the average time (in hours) it takes for a merge request to be accepted, for each branch?
id | requester_id | creation_date | merge_date | branch_id |
---|---|---|---|---|
1 | 101 | 2022-06-01 10:00:00 | 2022-06-01 12:00:00 | 5001 |
2 | 102 | 2022-06-02 09:00:00 | 2022-06-02 11:30:00 | 5001 |
3 | 101 | 2022-07-01 14:00:00 | 2022-07-02 10:00:00 | 5002 |
4 | 103 | 2022-07-02 09:00:00 | 2022-07-03 09:30:00 | 5002 |
5 | 104 | 2022-07-03 11:00:00 | NULL | 5002 |
branch_id | avg_merge_time_in_hours |
---|---|
5001 | 2.25 |
5002 | 22.25 |
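Here's a PostgreSQL sketch; the table name merge_requests is an assumption since it wasn't preserved above:

```sql
SELECT
  branch_id,
  -- EXTRACT(EPOCH FROM ...) gives the duration in seconds; divide by 3600 for hours
  AVG(EXTRACT(EPOCH FROM (merge_date - creation_date)) / 3600) AS avg_merge_time_in_hours
FROM merge_requests               -- table name is an assumption
WHERE merge_date IS NOT NULL      -- ignore merge requests that were never accepted
GROUP BY branch_id;
```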
The above PostgreSQL query first calculates the duration between creation_date and merge_date in hours using EXTRACT(EPOCH FROM ...) / 3600 for each row where merge_date is not null. It then averages these durations grouped by branch_id to get the average merge time in hours for each branch. The WHERE clause excludes merge requests that have not been accepted yet (i.e., merge_date is null), so they are not included in the average merge time calculation. The GROUP BY clause is required because we want the average for each individual branch, not the overall average.
To practice a very similar question, try this interactive Amazon Server Utilization Time Question, which is similar for calculating total time, or this Twitter Tweets' Rolling Averages Question, which is similar for using window functions.
The BETWEEN operator is used to select rows that fall within a certain range of values, while the IN operator is used to select rows that match values in a specified list.
For example, suppose you are a data analyst at GitLab and have a table of advertising campaign data. To find campaigns with a spend within a certain range, say between $1k and $5k, you could use BETWEEN:
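A minimal sketch, assuming a hypothetical marketing_campaigns table with a spend column:

```sql
SELECT campaign_id, campaign_name, spend
FROM marketing_campaigns
WHERE spend BETWEEN 1000 AND 5000;   -- inclusive range: $1k through $5k
```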
To find advertising campaigns that were video or image based (as opposed to text or billboard ads), you could use the IN operator:
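A sketch of the IN version, assuming a hypothetical ad_type column on the same table:

```sql
SELECT campaign_id, campaign_name, ad_type
FROM marketing_campaigns
WHERE ad_type IN ('video', 'image');   -- match any value in the list
```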
In GitLab, there are two important touchpoints users go through. The first is viewing the home page (recorded as page views), and the second is clicking on a particular repository (recorded as repository clicks).
The click-through rate (CTR) is calculated as the number of repository clicks divided by the number of page views. Your task is to write an SQL query to determine the CTR for each day.
view_id | user_id | view_date | page_id |
---|---|---|---|
12345 | 567 | 2022/06/08 | 0101 |
12346 | 568 | 2022/06/08 | 0101 |
12347 | 569 | 2022/06/09 | 0101 |
12348 | 570 | 2022/06/09 | 0101 |
12349 | 571 | 2022/06/10 | 0101 |
click_id | user_id | click_date | repo_id |
---|---|---|---|
56789 | 567 | 2022/06/08 | 0202 |
56790 | 568 | 2022/06/08 | 0101 |
56791 | 569 | 2022/06/09 | 0101 |
56792 | 560 | 2022/06/10 | 0202 |
56793 | 561 | 2022/06/10 | 0101 |
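Here's a sketch; the table names page_views and repo_clicks are assumptions based on the sample data:

```sql
WITH daily_views AS (
  SELECT view_date AS day, COUNT(*) AS total_views    -- total page views per day
  FROM page_views
  GROUP BY view_date
),
daily_clicks AS (
  SELECT click_date AS day, COUNT(*) AS total_clicks  -- total repository clicks per day
  FROM repo_clicks
  GROUP BY click_date
)
SELECT
  v.day,
  -- clicks divided by views; NULLIF guards against division by zero
  COALESCE(c.total_clicks, 0)::DECIMAL / NULLIF(v.total_views, 0) AS ctr
FROM daily_views v
LEFT JOIN daily_clicks c ON c.day = v.day
ORDER BY v.day;
```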
In this answer, we first define two Common Table Expressions (CTEs): one aggregating page views per day and one aggregating repository clicks per day; both perform a GROUP BY on the date.
Next, we join the two CTEs on the date and divide the number of clicks by the number of page views to calculate the CTR. The NULLIF function avoids division-by-zero errors if there were no page views on a particular date. The result is a table with each date and its CTR.
To practice a similar SQL problem on DataLemur's free online SQL code editor, attempt this Facebook SQL Interview question:
Cross joins and natural joins are two types of JOIN operations in SQL that are used to combine data from multiple tables. A cross join creates a new table by combining each row from the first table with every row from the second table, and is also known as a cartesian join. On the other hand, a natural join combines rows from two or more tables based on their common columns, forming a new table. One key difference between these types of JOINs is that cross joins do not require common columns between the tables being joined, while natural joins do.
Here's an example of a cross join:
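A sketch using hypothetical products and colors tables:

```sql
-- Every product paired with every color: rows(products) x rows(colors) combinations
SELECT p.product_name, c.color_name
FROM products p
CROSS JOIN colors c;
```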
If you have 20 products and 10 colors, that's 200 rows right there!
Here's a natural join example using two tables, GitLab employees and GitLab managers:
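A sketch, assuming both tables share a common column such as employee_id (the real column name wasn't preserved):

```sql
-- NATURAL JOIN matches on every column the two tables share by name
-- (assumed here to be just employee_id)
SELECT *
FROM gitlab_employees
NATURAL JOIN gitlab_managers;
```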
This natural join returns the rows where the two tables have matching values in their shared column (employee_id in the sketch above).
GitLab is a popular online service that allows developers to collaborate on projects. Suppose GitLab has a users database, and you're given a task to find all users who are using a specific company email domain (for example, 'gitlab.com') for their account.
Please filter the data from the users table to return only those records that have an email address ending in '@gitlab.com'. Additionally, we are interested in the 'username', 'email', 'confirmed_at', and 'last_sign_in_at' fields.
user_id | username | email | confirmed_at | last_sign_in_at |
---|---|---|---|---|
1001 | user1 | user1@company.com | 2020-01-01 | 2021-12-30 |
1002 | user2 | user2@company.com | 2020-02-02 | 2022-01-15 |
1003 | user3 | user3@gitlab.com | 2020-03-03 | 2022-01-20 |
1004 | user4 | user4@gitlab.com | 2020-04-04 | 2021-12-25 |
1005 | user5 | user5@anothercompany.com | 2020-05-05 | 2022-01-25 |
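Here's a sketch, assuming the table is simply named users:

```sql
SELECT username, email, confirmed_at, last_sign_in_at
FROM users
WHERE email LIKE '%@gitlab.com';   -- % matches any characters before the domain
```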
This SQL query uses the LIKE operator with the % wildcard to match any email address that ends with '@gitlab.com'. It returns the 'username', 'email', 'confirmed_at', and 'last_sign_in_at' fields for the matching records in the users table.
username | email | confirmed_at | last_sign_in_at |
---|---|---|---|
user3 | user3@gitlab.com | 2020-03-03 | 2022-01-20 |
user4 | user4@gitlab.com | 2020-04-04 | 2021-12-25 |
As a data analyst at GitLab, one thing you might be interested in is the engagement level of the different projects happening within the company. You are given two tables: a users table, which contains each user's name and user ID, and a commits table, which records each project's commit history (the user ID of the contributor, the project ID they worked on, and the date they made the commit).
Your task is to write an SQL query to find the total number of commits each user has made per project, join that with the users' table, and order the result by the project ID in ascending order and then the total number of commits in descending order.
user_id | user_name |
---|---|
101 | Alice |
102 | Bob |
103 | Charlie |
104 | David |
user_id | project_id | commit_date |
---|---|---|
101 | 1001 | 2022-01-02 |
101 | 1001 | 2022-01-05 |
102 | 1001 | 2022-01-07 |
101 | 1002 | 2022-01-08 |
103 | 1002 | 2022-01-11 |
103 | 1002 | 2022-01-14 |
104 | 1003 | 2022-01-22 |
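Here's a sketch, assuming the tables are named users and commits:

```sql
WITH commit_counts AS (
  -- total commits per user per project
  SELECT user_id, project_id, COUNT(*) AS total_commits
  FROM commits
  GROUP BY user_id, project_id
)
SELECT
  cc.project_id,
  u.user_name,
  cc.total_commits
FROM commit_counts cc
JOIN users u ON u.user_id = cc.user_id          -- attach user names
ORDER BY cc.project_id ASC, cc.total_commits DESC;
```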
In the above query, we first find the total number of commits each user has made per project using the GROUP BY clause and the COUNT function. Then, we join this result with the users table to get the names of the users. Lastly, we order the result by project ID in ascending order and total commits in descending order using the ORDER BY clause.
Because joins come up frequently during SQL interviews, take a stab at an interactive Spotify JOIN SQL question:
Database views provide customized, virtual versions of your data that you can query just like a regular table. So why even use one if they behave just like a regular table?
Views are useful for creating a simplified version of your data for specific users, or for hiding sensitive data from certain users while still allowing them to access other data.
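For instance, a hypothetical view that exposes only the non-sensitive columns of a users table might look like this:

```sql
-- The view stores no data itself; it re-runs this query each time it's used
CREATE VIEW public_users AS
SELECT user_id, username, confirmed_at
FROM users;

-- Query the view just like a regular table
SELECT *
FROM public_users
WHERE confirmed_at IS NOT NULL;
```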
The key to acing a GitLab SQL interview is to practice, practice, and then practice some more! Beyond just solving the above GitLab SQL interview questions, you should also solve the 200+ SQL questions on DataLemur which come from companies like Netflix, Google, and Amazon.
Each interview question has hints to guide you, fully explained answers, and a discussion board to see how others solved it. Best of all, there's an interactive coding environment so you can write and run your SQL query right in the browser and have it checked.
To prep for the GitLab SQL interview, it is also useful to practice SQL interview questions from other tech companies.
In case your SQL skills are weak, don't worry about jumping right into solving questions – improve your SQL foundations with this interactive SQL tutorial.
This tutorial covers SQL topics like filtering on multiple conditions using AND/OR/NOT and aggregate functions like MIN()/MAX() – both of which come up routinely in SQL interviews at GitLab.
In addition to SQL interview questions, the GitLab Data Science Interview tests several other question categories as well.
To prepare for GitLab Data Science interviews, it also helps to read the book Ace the Data Science Interview.