Semrush employees use SQL daily for extracting actionable insights from large SEO datasets and optimizing internal data processes for efficient digital marketing strategies. That's why Semrush often tests SQL coding questions during interviews for Data Science, Data Engineering and Data Analytics jobs.
Thus, to help you prep, we've curated 9 Semrush SQL interview questions – how many can you solve?
As a Data Analyst at Semrush, you are tasked to analyze website traffic data for our clients. Specifically, you are asked to determine the total and average number of daily page views per user for each website over a given time period.
The following two tables represent a subset of the data that Semrush stores:
id | user_id | website_id | visit_date | page_views |
---|---|---|---|---|
1 | 101 | 5001 | 2022-06-01 | 5 |
2 | 101 | 5001 | 2022-06-02 | 8 |
3 | 102 | 5002 | 2022-06-01 | 3 |
4 | 102 | 5002 | 2022-06-03 | 2 |
5 | 101 | 5001 | 2022-06-03 | 7 |
website_id | website_name |
---|---|
5001 | example.com |
5002 | test.com |
Write a PostgreSQL query that returns the following columns (order of rows doesn't matter):
This SQL statement uses window functions to calculate the daily total and average page views per website. The clause is used to group the data by website and visit date. The function adds up all the page views for each group, and the function calculates the average page views per user for each group. The clause is used to combine the and tables based on their common column, . The output gives the website_name, visit_date, total_daily_page_views, and average_daily_page_views_per_user.
To solve a related window function SQL problem on DataLemur's free interactive SQL code editor, solve this Google SQL Interview Question:
Given a table of Semrush employee salary information, write a SQL query to find employees who earn more money than their own boss.
employee_id | name | salary | department_id | manager_id |
---|---|---|---|---|
1 | Emma Thompson | 3800 | 1 | |
2 | Daniel Rodriguez | 2230 | 1 | 10 |
3 | Olivia Smith | 8000 | 1 | 8 |
4 | Noah Johnson | 6800 | 2 | 8 |
5 | Sophia Martinez | 1750 | 1 | 10 |
8 | William Davis | 7000 | 2 | NULL |
10 | James Anderson | 4000 | 1 | NULL |
employee_id | employee_name |
---|---|
3 | Olivia Smith |
This is the output because Olivia Smith earns 8,000 (which is more than her manager, William Davis who earns 7,800).
Write a SQL query for this interview question interactively on DataLemur:
First, we perform a SELF-JOIN where we treat the first table () as the managers' table and the second table () as the employees' table. Then we use a clause to filter the results, ensuring we only get employees whose salaries are higher than their manager's salary.
If the code above is hard to understand, you can find a step-by-step solution here: Well Paid Employees.
Denormalization is the practice of altering a database schema in a way that breaks the normalization rules (1st, 2nd, 3rd normal forms).
For example, in a database that stores Semrush sales analytics data, you might have separate tables for "customers," "orders," and "products," with foreign key constraints linking the tables together. This helps to ensure the integrity of the data and reduces redundancy, but it can also make queries that involve multiple tables more complex and slower to execute.
By de-normalizing the database, you can combine some of the data from separate tables into a single table, which can reduce the number of joins that are required to retrieve the data you need. This can speed up queries and improve the performance of your database.
However, it's important to carefully consider the trade-offs of de-normalization before making any changes to your database. De-normalization can make it more difficult to maintain the integrity and reliability of your data, and can also increase the risk of data redundancy. It's generally best to use de-normalization as a performance optimization technique only when necessary, and to carefully evaluate the benefits and drawbacks in the context of your specific database and workload.
Semrush is a platform that provides data about the performance of domains on Search Engines. They have multiple modules like website traffic analysis, keyword research, backlink analysis etc. Let's consider that we have two tables: and . The table contains information for domains, and the table captures the information of keywords for each domain.
Could you please design a query that calculates the total number of keywords for each domain and also ranks the domains based on the total keyword count in descending order?
domain_id | domain_name | submit_date |
---|---|---|
101 | www.semrush.com | 6/20/2022 |
102 | www.google.com | 6/18/2022 |
103 | www.microsoft.com | 6/22/2022 |
104 | www.amazon.com | 6/20/2022 |
keyword_id | domain_id | keyword |
---|---|---|
201 | 101 | seo tools |
202 | 101 | keyword research |
203 | 102 | search engine |
204 | 102 | google search |
205 | 103 | microsoft office |
206 | 104 | amazon shopping |
The join operation allows us to relate the two tables using the domain_id, then we count the number of keywords per domain. Finally, the results are ordered by the keyword count in descending order. This gives us the total number of keywords associated with each domain, ranked from highest to lowest. The join operation ensures that only domains with at least one keyword are included in the result.
To discover records in one table that are not present in another, you can utilize a and filter out any values in the right-side table.
For example, say you had a table of Semrush customers and a 2nd table of all purchases made with Semrush. To find all customers who did not make a purchase, you'd use the following
This query returns all rows from the customers table, along with any matching rows from the purchases table. If there is no matching row in the purchases table, values will be returned for all of the right table's columns. The clause then filters out any rows where the purchases.id column is , leaving only customers who have not made a purchase.
Let's assume a simplified table that captures the daily traffic data for different websites. The goal is to write a SQL query that filters the websites which have been receiving higher than average traffic in the last 7 days.
date | website_id | visits |
---|---|---|
06/08/2022 | 101 | 2000 |
06/09/2022 | 101 | 2500 |
06/10/2022 | 101 | 3000 |
06/10/2022 | 102 | 4000 |
06/11/2022 | 103 | 2500 |
06/12/2022 | 101 | 2200 |
06/12/2022 | 102 | 4200 |
06/13/2022 | 101 | 2100 |
06/13/2022 | 102 | 4300 |
06/14/2022 | 101 | 2300 |
06/14/2022 | 102 | 3900 |
06/14/2022 | 103 | 3000 |
This query calculates the average traffic for all the websites in the last 7 days. Then it compares this average with the individual visit numbers of each website in the same period. The websites with visits higher than the overall average are then selected. This can be very useful for finding trending websites or those that successfully run a marketing campaign.
Both the and window functions are used to access a row at a specific offset from the current row.
However, the function retrieves a value from a row that follows the current row, whereas the function retrieves a value from a row that precedes the current row.
Say you had a table of salary data for Semrush employees:
name | salary |
---|---|
Amanda | 130000 |
Brandon | 90000 |
Carlita | 80000 |
You could use the function to output the salary of each employee, along with the next highest-paid employee:
This would yield the following output:
name | salary | next_salary |
---|---|---|
Amanda | 130000 | 90000 |
Brandon | 90000 | 80000 |
Carlita | 80000 | NULL |
Swapping for would get you the salary of the person who made just more than you:
name | salary | next_salary |
---|---|---|
Amanda | 130000 | NULL |
Brandon | 90000 | 130000 |
Carlita | 80000 | 90000 |
Semrush is a company that provides digital marketers with insights related to their SEO performance, advertisement strategies, content, and social media. They gather vast amounts of data on website traffic, rankings, and other metrics of interest.
Imagine you are a data analyst at Semrush and you've been assigned a task to figure out the most visited websites for every month. Your task is to write a SQL query retrieving the top 3 most visited websites for each month based on the view count.
view_id | website_id | visit_date | view_count |
---|---|---|---|
1 | 101 | 2022-07-01 | 3000 |
2 | 102 | 2022-07-03 | 5000 |
3 | 103 | 2022-07-10 | 2000 |
4 | 101 | 2022-08-16 | 2500 |
5 | 102 | 2022-08-17 | 10000 |
6 | 103 | 2022-08-21 | 6000 |
7 | 104 | 2022-08-28 | 12000 |
8 | 101 | 2022-09-07 | 4000 |
9 | 105 | 2022-09-10 | 25000 |
10 | 102 | 2022-09-11 | 15000 |
month | website | total_views |
---|---|---|
7 | 102 | 5000 |
7 | 101 | 3000 |
7 | 103 | 2000 |
8 | 104 | 12000 |
8 | 102 | 10000 |
8 | 103 | 6000 |
9 | 105 | 25000 |
9 | 102 | 15000 |
9 | 101 | 4000 |
Here is a simple PostgreSQL query to solve this:
The query first aggregates the total views per month for each website. It then ranks the websites by the total views within each month. The final query selects the top 3 websites for each month according to the ranking.
As a data analyst at Semrush, one of your tasks is to get insights on customer's usage patterns to improve the product functionalities. You have two tables, the table that contains details about the customers and the table that records the time spent by these customers on various applications on a particular day.
Write a SQL query to join these two tables and calculate the total and average time spent by customers on each application. Also, return the results sorted by the application name.
The table is structured as follows:
customer_id | first_name | last_name | |
---|---|---|---|
101 | Mark | Smith | mark.smith@example.com |
102 | Sarah | Lee | sarah.lee@example.com |
103 | John | Doe | john.doe@example.com |
The table is structured as follows:
record_id | customer_id | application | usage_time | date |
---|---|---|---|---|
2001 | 101 | App1 | 120 | 2023-08-02 |
2002 | 101 | App2 | 60 | 2023-08-02 |
2003 | 102 | App1 | 90 | 2023-08-03 |
2004 | 103 | App2 | 130 | 2023-08-04 |
2005 | 103 | App1 | 70 | 2023-08-04 |
This query first joins the and table using an on the column. Then it groups the results by the and calculates two aggregations: the total and average spent by customers on each application. Finally, it sorts the result in ascending order of name.
Since join questions come up frequently during SQL interviews, take a stab at this Spotify JOIN SQL question:
Firstly, understand how marketers use SQL – you'll be tested on this! Next, solve as many QL interview questions as you can! Beyond just solving the above Semrush SQL interview questions, you should also solve the 200+ SQL exercises on DataLemur which come from companies like Netflix, Google, and Amazon.
Each DataLemur SQL question has multiple hints, full answers and best of all, there's an online SQL code editor so you can easily right in the browser your SQL query and have it executed.
To prep for the Semrush SQL interview it is also useful to solve SQL problems from other tech companies like:
In case your SQL skills are weak, don't worry about jumping right into solving questions – improve your SQL foundations with this SQL interview tutorial.
This tutorial covers topics including CASE/WHEN statements and grouping by multiple columns – both of which show up routinely in SQL interviews at Semrush.
Beyond writing SQL queries, the other topics to practice for the Semrush Data Science Interview are:
Firstly, understand the data SEMRush has, and what competitor tools like Ahrefs also provides. Next, keep these products and datasets in mind as you read the book Ace the Data Science Interview, which has tons of commonly asked data interview questions.