At Atlassian, SQL is crucial for extracting and analyzing user behavior data to enhance software tools, and for managing the databases that store and serve product information. Unsurprisingly, this is why Atlassian asks SQL problems during interviews for Data Science, Data Engineering, and Data Analytics jobs.
So, if you're studying for a SQL Interview, we've curated 9 Atlassian SQL interview questions to practice, which are similar to recently asked questions at Atlassian – how many can you solve?
Atlassian provides products like Jira that help teams keep track of tasks, bugs, and issues, among other features. Assume you have two tables, `teams` and `bugs`. The `teams` table contains information about each team and its team_id, and the `bugs` table contains information about each bug: its bug_id, the team that resolved it, and its creation and resolution timestamps.
The objective is to write a SQL query to calculate the average resolution time for bugs for each team.
`teams` Example Input:

| team_id | team_name |
|---|---|
| 1 | Team Alpha |
| 2 | Team Bravo |
| 3 | Team Charlie |
`bugs` Example Input:

| bug_id | team_id | created_at | resolved_at |
|---|---|---|---|
| 1 | 1 | 2018-06-09 00:00:00 | 2018-06-12 00:00:00 |
| 2 | 1 | 2018-06-10 00:00:00 | 2018-06-15 00:00:00 |
| 3 | 2 | 2018-06-11 00:00:00 | 2018-06-13 00:00:00 |
| 4 | 2 | 2018-06-12 00:00:00 | 2018-06-16 00:00:00 |
| 5 | 3 | 2018-06-13 00:00:00 | 2018-06-15 00:00:00 |
The SQL query will perform a JOIN to connect the two tables on `team_id`, calculate the difference between `resolved_at` and `created_at` for each bug, then group by team to calculate the average resolution time.
Here's a possible PostgreSQL query to solve this:
```sql
SELECT
  t.team_name,
  AVG(EXTRACT(EPOCH FROM (b.resolved_at - b.created_at))) / 3600 AS average_resolution_time_hours
FROM bugs b
JOIN teams t ON b.team_id = t.team_id
GROUP BY t.team_name
ORDER BY average_resolution_time_hours;
```
This query first measures each bug's resolution time by subtracting `created_at` from `resolved_at` and converting the resulting interval to seconds with `EXTRACT(EPOCH FROM interval)`. It then averages these resolution times by team and divides by 3600 to convert seconds to hours. The output is a list of teams and their average resolution time in hours; to show the time in days or weeks instead, adjust the divisor accordingly.
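If you want to sanity-check the logic without a PostgreSQL instance, here's a minimal sketch using Python's built-in sqlite3 module (an illustrative assumption: SQLite lacks `EXTRACT(EPOCH ...)`, so `julianday()` differences multiplied by 24 play the same role). The data mirrors the example input above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teams (team_id INTEGER, team_name TEXT);
CREATE TABLE bugs (bug_id INTEGER, team_id INTEGER, created_at TEXT, resolved_at TEXT);
INSERT INTO teams VALUES (1,'Team Alpha'),(2,'Team Bravo'),(3,'Team Charlie');
INSERT INTO bugs VALUES
  (1, 1, '2018-06-09 00:00:00', '2018-06-12 00:00:00'),
  (2, 1, '2018-06-10 00:00:00', '2018-06-15 00:00:00'),
  (3, 2, '2018-06-11 00:00:00', '2018-06-13 00:00:00'),
  (4, 2, '2018-06-12 00:00:00', '2018-06-16 00:00:00'),
  (5, 3, '2018-06-13 00:00:00', '2018-06-15 00:00:00');
""")

# julianday() returns fractional days, so (resolved - created) * 24 gives hours,
# mirroring EXTRACT(EPOCH FROM ...) / 3600 in the PostgreSQL answer.
rows = conn.execute("""
    SELECT t.team_name,
           AVG((julianday(b.resolved_at) - julianday(b.created_at)) * 24) AS avg_hours
    FROM bugs b
    JOIN teams t ON b.team_id = t.team_id
    GROUP BY t.team_name
    ORDER BY avg_hours
""").fetchall()
print(rows)  # [('Team Charlie', 48.0), ('Team Bravo', 72.0), ('Team Alpha', 96.0)]
```

Team Alpha's two bugs took 3 and 5 days, so its average is 96 hours, matching a hand calculation.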
To practice a similar problem about calculating rates, solve this TikTok SQL question within DataLemur's online SQL code editor:
Atlassian provides a range of software development tools. Suppose we have a database that tracks the usage of each tool by different users. We are interested in finding out which tool was most frequently used over the last month.
To track this usage, we will use a simple structure: a 'User' table that stores the user data and a 'Usage' table that stores each occurrence of a user interacting with a software tool.
`users` Sample Input:

| user_id | user_name |
|---|---|
| 1 | John |
| 2 | Paul |
| 3 | George |
| 4 | Ringo |
`usage` Sample Input:

| usage_id | user_id | product_name | date |
|---|---|---|---|
| 1 | 1 | Jira | 06/22/2022 |
| 2 | 2 | Jira | 06/15/2022 |
| 3 | 1 | Confluence | 06/02/2022 |
| 4 | 3 | Bitbucket | 06/15/2022 |
| 5 | 4 | Trello | 06/25/2022 |
| 6 | 3 | Bitbucket | 06/30/2022 |
| 7 | 1 | Jira | 06/19/2022 |
| 8 | 2 | Jira | 06/15/2022 |
| 9 | 3 | Bitbucket | 06/15/2022 |
| 10 | 1 | Confluence | 06/23/2022 |
Your task is to write a query that will return the name of the software tool that was most frequently used over the last month and the total number of usages.
```sql
SELECT product_name, COUNT(*) AS usage_count
FROM usage
WHERE date >= DATE_TRUNC('month', NOW())
  AND date < DATE_TRUNC('month', NOW()) + INTERVAL '1 month'
GROUP BY product_name
ORDER BY usage_count DESC
LIMIT 1;
```
This query groups the records in the `usage` table by `product_name`, counts the occurrences in each group, and keeps only the row with the highest count. The WHERE clause restricts the rows to the current calendar month: `DATE_TRUNC('month', NOW())` returns the first instant of the current month, so the two range comparisons select exactly that month's rows. Note that matching only the month number with EXTRACT would accidentally include the same month from earlier years, which is why the range filter is safer. If there are ties for the most frequent usage, this query returns one of them; remove the LIMIT clause (or switch to a window function like RANK) if all tied products should be returned.
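Because the date filtering is the subtle part, here's a hedged sketch of the same group-count-limit pattern using Python's sqlite3 (dates re-encoded as ISO strings, and `strftime('%Y-%m', ...)` standing in for the DATE_TRUNC window; both are assumptions made for SQLite):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usage (usage_id INTEGER, user_id INTEGER, product_name TEXT, date TEXT);
INSERT INTO usage VALUES
  (1, 1, 'Jira',       '2022-06-22'), (2, 2, 'Jira',       '2022-06-15'),
  (3, 1, 'Confluence', '2022-06-02'), (4, 3, 'Bitbucket',  '2022-06-15'),
  (5, 4, 'Trello',     '2022-06-25'), (6, 3, 'Bitbucket',  '2022-06-30'),
  (7, 1, 'Jira',       '2022-06-19'), (8, 2, 'Jira',       '2022-06-15'),
  (9, 3, 'Bitbucket',  '2022-06-15'), (10, 1, 'Confluence', '2022-06-23');
""")

# Filter to one calendar month, count per product, keep only the top row.
# Comparing year AND month avoids matching June of some other year.
top = conn.execute("""
    SELECT product_name, COUNT(*) AS usage_count
    FROM usage
    WHERE strftime('%Y-%m', date) = '2022-06'
    GROUP BY product_name
    ORDER BY usage_count DESC
    LIMIT 1
""").fetchone()
print(top)  # ('Jira', 4) -- Jira appears 4 times in June 2022
```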
To solve a similar problem on DataLemur's free online SQL code editor, try this SQL interview question asked by Facebook:
What does the UNIQUE constraint do? {#Question-3}
The UNIQUE constraint makes sure that all values in a column are distinct. It is often paired with other constraints, like NOT NULL, to ensure that the data follows certain rules.
For example, say you were an analyst on the marketing team at Atlassian, and had access to a database on marketing campaigns:
```sql
CREATE TABLE atlassian_campaigns (
  campaign_id INTEGER PRIMARY KEY,
  campaign_name VARCHAR(255) NOT NULL UNIQUE,
  start_date DATE NOT NULL,
  end_date DATE NOT NULL,
  budget DECIMAL(10,2) NOT NULL
);
```
In this example, the UNIQUE constraint is applied to the `campaign_name` column to ensure that each campaign has a unique name. This helps protect the integrity of the data in the `atlassian_campaigns` table and prevents the errors that could occur if two campaigns had the same name.
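To see the constraint do its job, here's a minimal sqlite3 sketch (the campaign name and the trimmed-down schema are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE atlassian_campaigns (
        campaign_id   INTEGER PRIMARY KEY,
        campaign_name TEXT NOT NULL UNIQUE,
        budget        REAL NOT NULL
    )
""")
conn.execute("INSERT INTO atlassian_campaigns VALUES (1, 'Summer Launch', 5000.00)")

# A second campaign with the same name violates the UNIQUE constraint,
# so the database rejects the insert instead of storing a duplicate.
try:
    conn.execute("INSERT INTO atlassian_campaigns VALUES (2, 'Summer Launch', 7500.00)")
    violated = False
except sqlite3.IntegrityError:
    violated = True
print(violated)  # True -- the duplicate name was rejected
```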
Atlassian, being a software development company, often uses issue tracking systems like Jira. Let's assume your role would involve analyzing patterns across resolved issues. The question is: "What is the average duration in days it takes for each project to resolve an issue?"
`issues` Example Input:

| issue_id | project_id | status | reported_date | resolved_date |
|---|---|---|---|---|
| 151 | 789 | resolved | 01/02/2022 | 03/04/2022 |
| 262 | 456 | resolved | 01/05/2022 | 02/06/2022 |
| 734 | 789 | open | 02/03/2022 | null |
| 456 | 123 | resolved | 06/01/2022 | 06/20/2022 |
| 789 | 456 | resolved | 08/01/2022 | 08/30/2022 |
| 102 | 123 | resolved | 05/23/2022 | 07/14/2022 |
Example Output:

| project_id | avg_duration |
|---|---|
| 789 | 60.00 |
| 456 | 64.00 |
| 123 | 78.50 |
```sql
SELECT project_id, AVG(resolved_date - reported_date) AS avg_duration
FROM issues
WHERE status = 'resolved'
GROUP BY project_id;
```
This query first filters on status, excluding any issues that aren't resolved (which also discards the NULL `resolved_date` on open issues). It then subtracts the reported date from the resolved date to get the time it took to resolve each issue, groups by `project_id`, and averages the durations for each group (i.e., each project). In PostgreSQL, subtracting one DATE from another yields an integer number of days, so AVG returns the average duration in days.
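As a quick illustration of why the status filter matters, here's a hedged sqlite3 sketch (dates re-encoded as ISO strings, and `julianday()` differences standing in for PostgreSQL date subtraction; the exact averages depend on that encoding, so treat the data as illustrative). The open issue's NULL `resolved_date` never reaches the subtraction, because the WHERE clause removes the row first:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issues (issue_id INTEGER, project_id INTEGER, status TEXT,
                     reported_date TEXT, resolved_date TEXT);
INSERT INTO issues VALUES
  (151, 789, 'resolved', '2022-01-02', '2022-03-04'),
  (262, 456, 'resolved', '2022-01-05', '2022-02-06'),
  (734, 789, 'open',     '2022-02-03', NULL),
  (456, 123, 'resolved', '2022-06-01', '2022-06-20'),
  (789, 456, 'resolved', '2022-08-01', '2022-08-30'),
  (102, 123, 'resolved', '2022-05-23', '2022-07-14');
""")

# The status filter drops the open issue (and its NULL resolved_date)
# before any date arithmetic happens; julianday() differences give days.
result = dict(conn.execute("""
    SELECT project_id,
           AVG(julianday(resolved_date) - julianday(reported_date)) AS avg_duration
    FROM issues
    WHERE status = 'resolved'
    GROUP BY project_id
""").fetchall())
print(result)
```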
To practice a related SQL problem on DataLemur's free online SQL coding environment, solve this Facebook SQL Interview question:
What is a database index? {#Question-5}
A database index is a way to optimize the performance of a database by reducing the amount of data that needs to be searched to retrieve a record.
There are several types of indexes that can be used in a database, including clustered, non-clustered (secondary), and unique indexes.
For a concrete example, say you had a table of Atlassian customer payments with the following columns: payment_id, customer_id, payment_amount, and payment_date.
Here's what a clustered index on the `payment_date` column would look like (note this is SQL Server syntax; PostgreSQL has no CREATE CLUSTERED INDEX, though you can create a regular index and physically reorder the table with the CLUSTER command):

```sql
CREATE CLUSTERED INDEX payment_date_index
ON atlassian_customer_payments (payment_date);
```
A clustered index on the `payment_date` column determines the physical order of the records in the table based on `payment_date`: records with the earliest `payment_date` values are stored together physically, followed by records with later values.
Having a clustered index on the `payment_date` column can speed up queries that filter or sort on `payment_date`, since the records are already physically sorted. For example, to retrieve all the payments made in January, the database can use the clustered index to locate the desired records directly instead of scanning the entire table.
Atlassian sells a variety of software development products. As a data analyst at Atlassian, you are asked to find out the average monthly sales per product. To answer this, you'll need to group the data by month and product, and then calculate the average sales using the AVG function.
`sales` Example Input:

| sale_id | product_id | sale_date | unit_price | quantity |
|---|---|---|---|---|
| 1 | Jira | 06/08/2022 00:00:00 | 100 | 5 |
| 2 | Confluence | 06/10/2022 00:00:00 | 200 | 3 |
| 3 | Bitbucket | 06/18/2022 00:00:00 | 50 | 10 |
| 4 | Jira | 07/12/2022 00:00:00 | 100 | 8 |
| 5 | Bitbucket | 07/22/2022 00:00:00 | 50 | 4 |
Example Output:

| month | product | avg_sales |
|---|---|---|
| 6 | Jira | 500 |
| 6 | Confluence | 600 |
| 6 | Bitbucket | 500 |
| 7 | Jira | 800 |
| 7 | Bitbucket | 200 |
```sql
SELECT
  EXTRACT(MONTH FROM sale_date) AS month,
  product_id AS product,
  AVG(unit_price * quantity) AS avg_sales
FROM sales
GROUP BY month, product
ORDER BY month, product;
```
This query functions by first extracting the month from each sale date using the EXTRACT function. It then groups the data by month and product id, and finally calculates the average sales for each group using PostgreSQL's AVG function. The ORDER BY clause is used to order the results first by month, then by product for easier readability.
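To see the month/product grouping work end to end, here's a hedged sqlite3 sketch, where `CAST(strftime('%m', ...) AS INTEGER)` stands in for `EXTRACT(MONTH FROM sale_date)` and the sale dates are re-encoded as ISO strings (both assumptions made for SQLite):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sale_id INTEGER, product_id TEXT, sale_date TEXT,
                    unit_price INTEGER, quantity INTEGER);
INSERT INTO sales VALUES
  (1, 'Jira',       '2022-06-08', 100, 5),
  (2, 'Confluence', '2022-06-10', 200, 3),
  (3, 'Bitbucket',  '2022-06-18', 50, 10),
  (4, 'Jira',       '2022-07-12', 100, 8),
  (5, 'Bitbucket',  '2022-07-22', 50, 4);
""")

# Extract the month number, then average revenue (price * quantity) per group.
rows = conn.execute("""
    SELECT CAST(strftime('%m', sale_date) AS INTEGER) AS month,
           product_id AS product,
           AVG(unit_price * quantity) AS avg_sales
    FROM sales
    GROUP BY month, product
    ORDER BY month, product
""").fetchall()
print(rows)  # matches the example output above (June: 500/600/500, July: 800/200)
```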
To practice another question about calculating rates, try this SQL interview question from TikTok within DataLemur's online SQL code editor:
A cross join is a type of JOIN operation in SQL that creates a new table by combining each row from the first table with every row from the second table. It is also known as a cartesian join.
For example, say you worked on the Marketing Analytics team at Atlassian, and needed to understand what advertising copy (text) goes best with what advertising creative (the photo that gets posted with the advertisement copy).
Here is an example of a cross-join between a table of ad_copy and a table of ad_creative:
```sql
SELECT
  ad_copy.copy AS ad_copy,
  ad_creative.image_url AS ad_creative
FROM ad_copy
CROSS JOIN ad_creative;
```
A natural join, on the other hand, is a type of JOIN that combines rows from two or more tables based on their common columns. It is called a "natural" join because it is based on the natural relationship that exists between the common columns in the tables being joined.
For an example, say you had sales data exported from Atlassian's Salesforce CRM stored in a data warehouse with two tables: `sales` and `atlassian_customers`.
An INNER JOIN combines the two tables on the common customer id field:

```sql
SELECT *
FROM sales
INNER JOIN atlassian_customers
  ON sales.customer_id = atlassian_customers.id;
```
This query returns only the rows from the `sales` and `atlassian_customers` tables that have matching customer id values; rows without a match in the other table are excluded from the results.
One main difference between cross joins and natural joins is that cross joins do not require any common columns between the tables being joined, while natural joins do. Another difference is that cross joins can create very large tables if the input tables have a large number of rows, while natural joins will only return a table with the number of rows equal to the number of matching rows in the input tables.
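A tiny runnable illustration of the cartesian blow-up, again in sqlite3 (the sample copy lines and image paths are made up for this sketch):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ad_copy (copy TEXT);
CREATE TABLE ad_creative (image_url TEXT);
INSERT INTO ad_copy VALUES ('Ship faster'), ('Plan better'), ('Track everything');
INSERT INTO ad_creative VALUES ('img/rocket.png'), ('img/board.png');
""")

# CROSS JOIN pairs every row of ad_copy with every row of ad_creative:
# 3 copies x 2 creatives = 6 combinations, with no join condition needed.
pairs = conn.execute("""
    SELECT ad_copy.copy, ad_creative.image_url
    FROM ad_copy CROSS JOIN ad_creative
""").fetchall()
print(len(pairs))  # 6
```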
You are given two tables, `customers` and `products`. The `customers` table holds customer_id, region, and signup_time. The `products` table contains product_id, customer_id, and purchase_time.
Write a SQL query to find out the number of customers from each region who made their first purchase within one month of signing up.
`customers` Example Input:

| customer_id | region | signup_time |
|---|---|---|
| 1 | APAC | 2022-01-01 00:00:00 |
| 2 | AMER | 2022-02-01 00:00:00 |
| 3 | EMEA | 2022-03-10 00:00:00 |
| 4 | AMER | 2022-04-15 00:00:00 |
| 5 | APAC | 2022-05-20 00:00:00 |
`products` Example Input:

| product_id | customer_id | purchase_time |
|---|---|---|
| 1001 | 1 | 2022-01-20 00:00:00 |
| 2002 | 2 | 2022-03-05 00:00:00 |
| 3003 | 3 | 2022-03-15 00:00:00 |
| 4004 | 4 | 2022-04-02 00:00:00 |
| 5005 | 2 | 2022-04-01 00:00:00 |
```sql
SELECT c.region, COUNT(DISTINCT p.customer_id) AS num_customers
FROM customers c
JOIN products p ON c.customer_id = p.customer_id
WHERE p.purchase_time BETWEEN c.signup_time
                          AND c.signup_time + INTERVAL '1 month'
GROUP BY c.region;
```
This query joins the customers and products tables on customer_id, then filters to rows where the purchase_time falls within one month of the signup_time. Finally, it groups the data by region and counts the distinct customer_ids in each group. Since a customer can't purchase before signing up, having any purchase in that window is equivalent to having a first purchase in it, so this gives the number of customers from each region who made their first purchase within one month of signing up.
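Here's a hedged sqlite3 sketch of the same logic, with `date(signup_time, '+1 month')` standing in for PostgreSQL's `signup_time + INTERVAL '1 month'` and timestamps simplified to ISO date strings (both assumptions for SQLite):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, region TEXT, signup_time TEXT);
CREATE TABLE products (product_id INTEGER, customer_id INTEGER, purchase_time TEXT);
INSERT INTO customers VALUES
  (1, 'APAC', '2022-01-01'), (2, 'AMER', '2022-02-01'), (3, 'EMEA', '2022-03-10'),
  (4, 'AMER', '2022-04-15'), (5, 'APAC', '2022-05-20');
INSERT INTO products VALUES
  (1001, 1, '2022-01-20'), (2002, 2, '2022-03-05'), (3003, 3, '2022-03-15'),
  (4004, 4, '2022-04-02'), (5005, 2, '2022-04-01');
""")

# Count distinct customers whose purchase falls in [signup, signup + 1 month];
# ISO date strings compare correctly as text in SQLite.
result = dict(conn.execute("""
    SELECT c.region, COUNT(DISTINCT p.customer_id) AS num_customers
    FROM customers c
    JOIN products p ON c.customer_id = p.customer_id
    WHERE p.purchase_time BETWEEN c.signup_time
                              AND date(c.signup_time, '+1 month')
    GROUP BY c.region
""").fetchall())
print(result)  # {'APAC': 1, 'EMEA': 1}
```

Only customers 1 and 3 purchased within a month of signup; customer 2's purchases came later, and customer 4's sample purchase date precedes the signup, so BETWEEN excludes it.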
To solve another question about calculating rates, solve this SQL interview question from TikTok on DataLemur's online SQL code editor:
Atlassian is a software company that offers several popular products like Jira and Confluence which are used by development teams to track issues and manage documents respectively. They would be interested in understanding how users are utilizing their tools over time.
Suppose you are given a table called `activity_logs` containing a log entry each time a user creates an issue in Jira or edits a document in Confluence. The table includes columns for `log_id` (a unique identifier for the log entry), `user_id` (identifying the user who made the entry), `product` (the name of the product that was used, either 'Jira' or 'Confluence'), `timestamp` (the date and time of the activity), and `duration` (the time in minutes the user spent on the activity).
Write a SQL query to find the average duration of user activity for each product for each month of the year 2022.
`activity_logs` Example Input:

| log_id | user_id | product | timestamp | duration |
|---|---|---|---|---|
| 4001 | 56 | Jira | 01/15/2022 10:00:00 | 15 |
| 7896 | 78 | Confluence | 02/05/2022 11:30:00 | 30 |
| 6445 | 98 | Jira | 03/10/2022 16:00:00 | 45 |
| 2109 | 56 | Confluence | 04/12/2022 09:00:00 | 60 |
| 5207 | 67 | Jira | 05/17/2022 14:00:00 | 120 |
| 3728 | 82 | Confluence | 06/26/2022 15:00:00 | 10 |
| 1246 | 98 | Jira | 07/02/2022 13:30:00 | 15 |
| 5894 | 67 | Confluence | 08/31/2022 10:00:00 | 20 |
Example Output:

| month | product | avg_duration |
|---|---|---|
| 1 | Jira | 15.00 |
| 2 | Confluence | 30.00 |
| 3 | Jira | 45.00 |
| 4 | Confluence | 60.00 |
| 5 | Jira | 120.00 |
| 6 | Confluence | 10.00 |
| 7 | Jira | 15.00 |
| 8 | Confluence | 20.00 |
```sql
SELECT
  EXTRACT(MONTH FROM timestamp) AS month,
  product,
  AVG(duration) AS avg_duration
FROM activity_logs
WHERE EXTRACT(YEAR FROM timestamp) = 2022
GROUP BY month, product
ORDER BY month, product;
```
This query first filters out activities from 2022. It then groups the data by the month and product, and calculates the average duration for each group. The result is ordered by month and then product.
To solve a related problem on DataLemur's free online SQL coding environment, attempt this Facebook SQL Interview question:
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews.
Beyond just solving the earlier Atlassian SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Microsoft, Google, and Facebook.
Each SQL question has hints to guide you, full answers, and best of all, an interactive coding environment so you can run your query right in the browser and have it graded.
To prep for the Atlassian SQL interview, it can also be helpful to practice SQL problems from other tech companies like:
If your SQL coding skills are weak, don't dive straight into solving questions – first improve your SQL foundations with this free SQL for Data Analytics course.
This tutorial covers topics including CASE/WHEN statements and different types of joins – both of these show up frequently in Atlassian SQL interviews.
For the Atlassian Data Science Interview, in addition to SQL query questions, the other types of questions to prepare for are:
The best way to prepare for Atlassian Data Science interviews is by reading Ace the Data Science Interview. The book's got: