At Atlassian, SQL is crucial for extracting and analyzing user-behavior data to enhance its software tools, and for managing the databases that store product information for efficient retrieval and updates. Unsurprisingly, this is why Atlassian asks SQL problems during interviews for Data Science, Data Engineering, and Data Analytics jobs.
So, if you're studying for a SQL interview, we've curated 9 Atlassian SQL interview questions to practice, similar to questions recently asked at Atlassian – how many can you solve?
Atlassian provides products like Jira that help teams keep track of tasks, bugs, and issues, among other features. Assume you have two tables, teams and bugs (names inferred from the sample data below). The teams table contains information about each team and its team_id, and the bugs table contains information about each bug: its bug_id, the team that resolved it, and the bug's creation and resolution dates.
The objective is to write a SQL query to calculate the average resolution time for bugs for each team.
| team_id | team_name |
|---|---|
| 1 | Team Alpha |
| 2 | Team Bravo |
| 3 | Team Charlie |
| bug_id | team_id | created_at | resolved_at |
|---|---|---|---|
| 1 | 1 | 2018-06-09 00:00:00 | 2018-06-12 00:00:00 |
| 2 | 1 | 2018-06-10 00:00:00 | 2018-06-15 00:00:00 |
| 3 | 2 | 2018-06-11 00:00:00 | 2018-06-13 00:00:00 |
| 4 | 2 | 2018-06-12 00:00:00 | 2018-06-16 00:00:00 |
| 5 | 3 | 2018-06-13 00:00:00 | 2018-06-15 00:00:00 |
The SQL query will perform a JOIN to connect the two tables on team_id, calculate the difference between resolved_at and created_at for each bug, then group by team to calculate the average resolution time.
Here's a possible PostgreSQL query to solve this:
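One possible sketch, assuming the two tables are named teams and bugs (both names are assumptions based on the sample data above):

```sql
SELECT t.team_name,
       -- subtract timestamps, convert the interval to seconds, then to hours
       AVG(EXTRACT(EPOCH FROM (b.resolved_at - b.created_at)) / 3600) AS avg_resolution_hours
FROM teams t
JOIN bugs b
  ON b.team_id = t.team_id
GROUP BY t.team_name;
```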
This query first measures the resolution time in seconds for each bug by subtracting created_at from resolved_at and converting the interval to seconds using the EXTRACT(EPOCH FROM ...) function. It then averages these resolution times by team. The output is a list of teams and their average resolution time in hours. Note that this query reports the time in hours; to show the time in days or weeks, adjust the divisor accordingly.
To practice a similar problem about calculating rates, solve this TikTok SQL question within DataLemur's online SQL code editor:
Atlassian provides a range of software development tools. Suppose we have a database that tracks the usage of each tool by different users. We are interested in finding out which tool was most frequently used over the last month.
To track this usage, we will use a simple structure: a 'User' table that stores the user data and a 'Usage' table that stores each occurrence of a user interacting with a software tool.
| user_id | user_name |
|---|---|
| 1 | John |
| 2 | Paul |
| 3 | George |
| 4 | Ringo |
| usage_id | user_id | product_name | date |
|---|---|---|---|
| 1 | 1 | Jira | 06/22/2022 |
| 2 | 2 | Jira | 06/15/2022 |
| 3 | 1 | Confluence | 06/02/2022 |
| 4 | 3 | Bitbucket | 06/15/2022 |
| 5 | 4 | Trello | 06/25/2022 |
| 6 | 3 | Bitbucket | 06/30/2022 |
| 7 | 1 | Jira | 06/19/2022 |
| 8 | 2 | Jira | 06/15/2022 |
| 9 | 3 | Bitbucket | 06/15/2022 |
| 10 | 1 | Confluence | 06/23/2022 |
Your task is to write a query that will return the name of the software tool that was most frequently used over the last month and the total number of usages.
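A possible PostgreSQL sketch, using the usage table and columns shown in the sample data above:

```sql
SELECT product_name,
       COUNT(*) AS usage_count
FROM usage
-- DATE_TRUNC('month', CURRENT_DATE) is the first day of the current month
WHERE date >= DATE_TRUNC('month', CURRENT_DATE)
GROUP BY product_name
ORDER BY usage_count DESC
LIMIT 1;
```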
This query groups the records in the usage table by product_name, counts the number of occurrences in each group, and limits the result to the product with the highest count. The WHERE clause ensures that only records from the current month are considered: the DATE_TRUNC function returns the first day of the current month, so any date on or after it falls in this month. If there are ties for the most frequent usage, this query returns only one of them; if all tied products should be returned, remove the LIMIT clause.
To solve a similar problem on DataLemur's free online SQL code editor, try this SQL interview question asked by Facebook:
The UNIQUE constraint makes sure that all values in a column are distinct. It is often paired with other constraints, like NOT NULL, to ensure that the data follows certain rules.
For example, say you were an analyst on the marketing team at Atlassian, and had access to a database on marketing campaigns:
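The table might be defined like this (the table name and all columns other than campaign_name are assumptions for illustration):

```sql
CREATE TABLE marketing_campaigns (
    campaign_id   SERIAL PRIMARY KEY,
    -- UNIQUE paired with NOT NULL: every campaign must have a distinct name
    campaign_name VARCHAR(255) UNIQUE NOT NULL,
    start_date    DATE,
    end_date      DATE
);
```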
In this example, the UNIQUE constraint is applied to the "campaign_name" field to ensure that each campaign has a unique name. This helps to ensure the integrity of the data in the table and prevents errors that could occur if two campaigns had the same name.
Atlassian, being a software development company, builds issue-tracking systems like Jira. Let's assume your role involves analyzing patterns across resolved issues. The question is: "What is the average duration in days it takes for each project to resolve an issue?"
| issue_id | project_id | status | reported_date | resolved_date |
|---|---|---|---|---|
| 151 | 789 | resolved | 01/02/2022 | 03/04/2022 |
| 262 | 456 | resolved | 01/05/2022 | 02/06/2022 |
| 734 | 789 | open | 02/03/2022 | null |
| 456 | 123 | resolved | 06/01/2022 | 06/20/2022 |
| 789 | 456 | resolved | 08/01/2022 | 08/30/2022 |
| 102 | 123 | resolved | 05/23/2022 | 07/14/2022 |
| project_id | avg_duration |
|---|---|
| 789 | 60.00 |
| 456 | 64.00 |
| 123 | 78.50 |
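A possible query, assuming the table is named issues (a name inferred from the problem statement):

```sql
SELECT project_id,
       -- subtracting two DATE values in PostgreSQL yields an integer number of days
       AVG(resolved_date - reported_date) AS avg_duration
FROM issues
WHERE status = 'resolved'
GROUP BY project_id;
```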
This query first filters by the status of the issues, excluding any issues that aren't resolved. It then subtracts the reported date from the resolved date to get the duration it took to resolve each issue, groups by project_id, and finds the average duration for each group (i.e., each project). Because subtracting two DATE values in PostgreSQL yields an integer number of days, the AVG function directly produces the average duration in days.
To practice a related SQL problem on DataLemur's free online SQL coding environment, solve this Facebook SQL Interview question:
A database index is a way to optimize the performance of a database by reducing the amount of data that needs to be searched to retrieve a record.
There are several types of indexes that can be used in a database, including clustered indexes, non-clustered (secondary) indexes, unique indexes, and composite indexes.
For a concrete example, say you had a table of Atlassian customer payments with the following columns: payment_id, customer_id, payment_amount, and payment_date.
Here's what a clustered index on the payment_date column would look like:
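PostgreSQL has no CREATE CLUSTERED INDEX statement; instead, you create a regular index and then physically reorder the table with the CLUSTER command. A sketch, with illustrative index and table names:

```sql
-- create an index on payment_date
CREATE INDEX idx_payments_payment_date
    ON payments (payment_date);

-- physically reorder the payments table according to that index
CLUSTER payments USING idx_payments_payment_date;
```

Note that CLUSTER is a one-time operation in PostgreSQL: the table is not kept sorted as new rows arrive, so it must be re-run periodically if the ordering matters.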
A clustered index on the payment_date column would determine the physical order of the records in the table based on payment_date. This means that the records with the earliest payment_date values would be stored together physically in the table, followed by records with later payment_date values.

Having a clustered index on the payment_date column can speed up queries that filter or sort the data based on payment_date, as the records are already physically sorted in the table. For example, if you want to retrieve all the payments made in the month of January, the database can use the clustered index to quickly locate and retrieve the desired records without having to search through the entire table.
Atlassian sells a variety of software development products. As a data analyst at Atlassian, you are asked to find out the average monthly sales per product. To answer this, you'll need to group the data by month and product, and then calculate the average sales using the AVG function.
| sale_id | product_id | sale_date | unit_price | quantity |
|---|---|---|---|---|
| 1 | Jira | 06/08/2022 00:00:00 | 100 | 5 |
| 2 | Confluence | 06/10/2022 00:00:00 | 200 | 3 |
| 3 | Bitbucket | 06/18/2022 00:00:00 | 50 | 10 |
| 4 | Jira | 07/12/2022 00:00:00 | 100 | 8 |
| 5 | Bitbucket | 07/22/2022 00:00:00 | 50 | 4 |
| month | product | avg_sales |
|---|---|---|
| 6 | Jira | 500 |
| 6 | Confluence | 600 |
| 6 | Bitbucket | 500 |
| 7 | Jira | 800 |
| 7 | Bitbucket | 200 |
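A query producing this output, assuming the table is named sales (a name inferred from the problem statement) and that a sale's total is unit_price times quantity:

```sql
SELECT EXTRACT(MONTH FROM sale_date) AS month,
       product_id AS product,
       -- average revenue per sale within each month/product group
       AVG(unit_price * quantity) AS avg_sales
FROM sales
GROUP BY EXTRACT(MONTH FROM sale_date), product_id
ORDER BY month, product;
```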
This query first extracts the month from each sale date using the EXTRACT function. It then groups the data by month and product id, and finally calculates the average sales for each group using PostgreSQL's AVG function. The ORDER BY clause orders the results first by month, then by product, for easier readability.
To practice another question about calculating rates, try this SQL interview question from TikTok within DataLemur's online SQL code editor:
A cross join is a type of JOIN operation in SQL that creates a new table by combining each row from the first table with every row from the second table. It is also known as a cartesian join.
For example, say you worked on the Marketing Analytics team at Atlassian, and needed to understand what advertising copy (text) goes best with what advertising creative (the photo that gets posted with the advertisement copy).
Here is an example of a cross-join between a table of ad_copy and a table of ad_creative:
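A sketch of that cross join, using the ad_copy and ad_creative table names from above:

```sql
-- every row of ad_copy paired with every row of ad_creative
SELECT *
FROM ad_copy
CROSS JOIN ad_creative;
```

If ad_copy has m rows and ad_creative has n rows, the result has m × n rows, one per copy/creative combination.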
A natural join, on the other hand, is a type of JOIN that combines rows from two or more tables based on their common columns. It is called a "natural" join because it is based on the natural relationship that exists between the common columns in the tables being joined.
For an example of each one, say you had sales data exported from Atlassian's Salesforce CRM stored in a data warehouse, which had two tables: sales and customers (names inferred from the surrounding text).

An inner join (which is a type of natural join) combines the two tables on the common customer_id column:

This query will return rows from the sales and customers tables that have matching customer_id values. Only rows with matching values will be included in the results.
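A sketch of such a join, assuming the tables are named sales and customers and share a customer_id column:

```sql
SELECT *
FROM sales
-- keep only rows where the customer_id exists in both tables
INNER JOIN customers
  ON sales.customer_id = customers.customer_id;
```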
One main difference between cross joins and natural joins is that cross joins do not require any common columns between the tables being joined, while natural joins do. Another difference is that cross joins can create very large tables if the input tables have a large number of rows, while natural joins will only return a table with the number of rows equal to the number of matching rows in the input tables.
You are given two tables, customers and products (names inferred from the explanation below). The customers table holds customer_id, region, and signup_time. The products table contains product_id, customer_id, and purchase_time.
Write a SQL query to find out the number of customers from each region who made their first purchase within one month of signing up.
| customer_id | region | signup_time |
|---|---|---|
| 1 | APAC | 2022-01-01 00:00:00 |
| 2 | AMER | 2022-02-01 00:00:00 |
| 3 | EMEA | 2022-03-10 00:00:00 |
| 4 | AMER | 2022-04-15 00:00:00 |
| 5 | APAC | 2022-05-20 00:00:00 |
| product_id | customer_id | purchase_time |
|---|---|---|
| 1001 | 1 | 2022-01-20 00:00:00 |
| 2002 | 2 | 2022-03-05 00:00:00 |
| 3003 | 3 | 2022-03-15 00:00:00 |
| 4004 | 4 | 2022-04-02 00:00:00 |
| 5005 | 2 | 2022-04-01 00:00:00 |
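A possible PostgreSQL sketch, assuming the customers and products tables described in the problem statement:

```sql
SELECT c.region,
       COUNT(DISTINCT c.customer_id) AS num_customers
FROM customers c
JOIN products p
  ON p.customer_id = c.customer_id
-- keep purchases made within one month of signup
WHERE p.purchase_time <= c.signup_time + INTERVAL '1 month'
GROUP BY c.region;
```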
This query joins the customers and products tables on customer_id. It then filters for rows where the purchase_time is within one month of the signup_time, groups the data by region, and counts the distinct customer_ids in each group. This gives the number of customers from each region who made their first purchase within one month of signing up (if any of a customer's purchases falls within the first month, their first purchase necessarily does too).
To solve another question about calculating rates, solve this SQL interview question from TikTok on DataLemur's online SQL code editor:
Atlassian is a software company that offers several popular products like Jira and Confluence which are used by development teams to track issues and manage documents respectively. They would be interested in understanding how users are utilizing their tools over time.
Suppose you are given a log table (call it activity_log; the source doesn't name it) containing an entry each time a user creates an issue in Jira or edits a document in Confluence. The table includes columns for log_id (a unique identifier for the log entry), user_id (identifying the user who made the entry), product (the name of the product that was used, either 'Jira' or 'Confluence'), timestamp (the date and time of the activity), and duration (the time in minutes the user spent on the activity).
Write a SQL query to find the average duration of user activity for each product for each month of the year 2022.
| log_id | user_id | product | timestamp | duration |
|---|---|---|---|---|
| 4001 | 56 | Jira | 01/15/2022 10:00:00 | 15 |
| 7896 | 78 | Confluence | 02/05/2022 11:30:00 | 30 |
| 6445 | 98 | Jira | 03/10/2022 16:00:00 | 45 |
| 2109 | 56 | Confluence | 04/12/2022 09:00:00 | 60 |
| 5207 | 67 | Jira | 05/17/2022 14:00:00 | 120 |
| 3728 | 82 | Confluence | 06/26/2022 15:00:00 | 10 |
| 1246 | 98 | Jira | 07/02/2022 13:30:00 | 15 |
| 5894 | 67 | Confluence | 08/31/2022 10:00:00 | 20 |
| month | product | avg_duration |
|---|---|---|
| 1 | Jira | 15.00 |
| 2 | Confluence | 30.00 |
| 3 | Jira | 45.00 |
| 4 | Confluence | 60.00 |
| 5 | Jira | 120.00 |
| 6 | Confluence | 10.00 |
| 7 | Jira | 15.00 |
| 8 | Confluence | 20.00 |
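One possible query; the table name activity_log is an assumption, since the source doesn't name the table:

```sql
SELECT EXTRACT(MONTH FROM timestamp) AS month,
       product,
       AVG(duration) AS avg_duration
FROM activity_log  -- assumed table name
WHERE EXTRACT(YEAR FROM timestamp) = 2022
GROUP BY EXTRACT(MONTH FROM timestamp), product
ORDER BY month, product;
```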
This query first keeps only activities from 2022. It then groups the data by month and product, and calculates the average duration for each group. The result is ordered by month and then by product.
To solve a related problem on DataLemur's free online SQL coding environment, attempt this Facebook SQL Interview question:
The best way to prepare for a SQL interview, besides making sure you have strong SQL fundamentals, is to practice a ton of real SQL questions that were asked in recent job interviews. Beyond just solving the earlier Atlassian SQL interview questions, you should also solve the 200+ FAANG SQL Questions on DataLemur which come from companies like Microsoft, Google, and Facebook.
Each SQL question has hints to guide you, full answers, and best of all, there's an interactive coding environment so you can run your query right in the browser and have it graded.

To prep for the Atlassian SQL interview, it can also be helpful to practice SQL problems from other tech companies like:
In case your SQL coding skills are weak, forget about diving straight into solving questions – improve your SQL foundations with this free SQL for Data Analytics course.
This tutorial covers topics including CASE/WHEN statements and different types of joins – both of these show up frequently in Atlassian SQL interviews.
For the Atlassian Data Science Interview, in addition to SQL query questions, the other types of questions to prepare for are:
The best way to prepare for Atlassian Data Science interviews is by reading Ace the Data Science Interview. The book's got: