At Freshworks, SQL is used day-to-day to generate product insights for Freshworks' Customer Service product suite. So, it shouldn't surprise you that Freshworks almost always asks SQL problems in interviews for Data Science, Analytics, and Data Engineering jobs.
Thus, to help you study for the Freshworks SQL interview, we'll cover 10 Freshworks SQL interview questions. Can you answer them all?
Given a table of user reviews for different products (call it reviews), write a SQL query to find the highest-rated product for each week. Assume the table comprises review_id, user_id, submit_date, product_id, and stars, where stars are given on a scale of 1-5.
If there's a tie in rating, choose the product with more reviews for that week. If there's still a tie, choose any one of them.
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 07/08/2022 00:00:00 | 50001 | 4 |
7802 | 265 | 07/15/2022 00:00:00 | 69852 | 4 |
5293 | 362 | 07/22/2022 00:00:00 | 50001 | 3 |
6352 | 192 | 07/29/2022 00:00:00 | 69852 | 3 |
4517 | 981 | 08/05/2022 00:00:00 | 69852 | 2 |
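The solution isn't reproduced here, so below is a PostgreSQL sketch that matches the explanation that follows (column names come from the sample data; reviews is an assumed table name):

```sql
WITH weekly_ratings AS (
  -- Aggregate reviews by week and product
  SELECT
    DATE_TRUNC('week', submit_date) AS review_week,
    product_id,
    AVG(stars) AS avg_stars,
    COUNT(*)  AS review_count
  FROM reviews
  GROUP BY DATE_TRUNC('week', submit_date), product_id
),
ranked AS (
  -- Rank products within each week: rating first, then review count
  SELECT
    review_week,
    product_id,
    avg_stars,
    ROW_NUMBER() OVER (
      PARTITION BY review_week
      ORDER BY avg_stars DESC, review_count DESC
    ) AS rn
  FROM weekly_ratings
)
SELECT review_week, product_id, avg_stars
FROM ranked
WHERE rn = 1
ORDER BY review_week;
```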
This query first aggregates reviews by week and product, computing average ratings and count of reviews. It then uses a window function to assign a row number within each partition, ordered by average rating and count of reviews in descending order. Finally, it selects the first row in each partition (i.e., the highest-rated product in each week).
To practice a related window function SQL problem on DataLemur's free online SQL coding environment, solve this Amazon SQL question asked in a BI Engineer interview:
Freshworks is a customer support platform that allows users to generate trouble tickets when they have an issue, which are handled by assigned customer support agents. Imagine we have two tables:
A table named tickets with columns: 'ticket_id' (unique for each ticket), 'user_id' (id of the user who generated the ticket), 'assignee_id' (id of the agent to whom the ticket is assigned), 'generate_date' (date when the ticket was generated, in 'MM/DD/YYYY HH24:MI:SS' format), 'solve_date' (date when the ticket was solved), and 'status' (status of the ticket - could be "solved", "in progress", or "new")
And another table named agents with columns: 'agent_id' (unique id for each agent), 'first_name', and 'last_name'.
Write a SQL query that returns the name of the agent, along with the number of tickets they have solved between June and August 2022 (inclusive).
ticket_id | user_id | assignee_id | generate_date | solve_date | status |
---|---|---|---|---|---|
1001 | 75 | 4 | 06/01/2022 09:00:00 | 06/06/2022 14:00:00 | solved |
1002 | 85 | 4 | 06/02/2022 10:00:00 | 06/07/2022 15:00:00 | solved |
1003 | 65 | 3 | 06/02/2022 11:00:00 | | in progress |
1004 | 97 | 5 | 07/20/2022 19:00:00 | 07/21/2022 16:00:00 | solved |
1005 | 32 | 2 | 08/15/2022 08:00:00 | | new |
agent_id | first_name | last_name |
---|---|---|
1 | John | Doe |
2 | Jane | Smith |
3 | Max | Brown |
4 | Emma | Hill |
5 | Oliver | Johnson |
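The solution isn't shown here; here's a sketch matching the explanation below, assuming solve_date is stored as a timestamp:

```sql
SELECT
  a.first_name,
  a.last_name,
  COUNT(t.ticket_id) AS solved_tickets
FROM agents a
JOIN tickets t
  ON t.assignee_id = a.agent_id
WHERE t.status = 'solved'
  -- June through August 2022, inclusive
  AND t.solve_date >= '2022-06-01'
  AND t.solve_date <  '2022-09-01'
GROUP BY a.agent_id, a.first_name, a.last_name;
```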
This query first joins the "agents" and "tickets" tables on the 'agent_id' and 'assignee_id' fields. Then it filters for tickets which are marked as 'solved' and were solved between June and August 2022. Finally, it groups the results by the agent's id and counts the number of solved tickets per agent. The result is a list of agent names along with the count of tickets they each solved in the given period.
As the name implies, the RANK() window function ranks each row within your window/partition. If multiple rows are tied, they all receive the same rank, and the next number in the ranking will be the previous rank plus the number of duplicates. For example, if you've got three records tied for 5th place, they'll all get the rank 5, and the next row after this 3-way tie will have a rank of 8.
The DENSE_RANK() function assigns a distinct rank to each row within a partition based on the provided column value, with no gaps. It will assign the same rank to rows that have the same value and give the next row the next consecutive rank. To make this more concrete, imagine you had three records tied at rank 5: the next rank would be 6.
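A minimal illustration of the difference, using a throwaway inline table:

```sql
SELECT
  score,
  RANK()       OVER (ORDER BY score DESC) AS rnk,       -- gaps after ties: 1, 2, 2, 4
  DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk  -- no gaps:         1, 2, 2, 3
FROM (VALUES (95), (90), (90), (85)) AS scores(score);
```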
In the context of Freshworks CRM, imagine you are analyzing a sales database. You want to identify customers who have been with the company for more than two years but have not made a purchase in over a month. Your task is to write a SQL query that finds such customers.
Here are two example tables containing relevant data:
customer_id | joining_date |
---|---|
101 | 02/15/2018 |
102 | 06/23/2019 |
103 | 05/12/2021 |
104 | 03/01/2020 |
105 | 01/08/2018 |
order_id | customer_id | purchase_date |
---|---|---|
1001 | 101 | 07/15/2022 |
1002 | 105 | 05/23/2022 |
1003 | 102 | 06/03/2022 |
1004 | 103 | 07/18/2022 |
1005 | 104 | 06/02/2022 |
Now, write a SQL query to get all customers who have joined more than two years ago but have not made any purchases in the past month.
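The solution query isn't shown; a PostgreSQL sketch matching the explanation below, assuming the tables are named customers and orders:

```sql
SELECT
  c.customer_id,
  c.joining_date
FROM customers c
LEFT JOIN orders o
  ON o.customer_id = c.customer_id
WHERE c.joining_date <= CURRENT_DATE - INTERVAL '2 years'
GROUP BY c.customer_id, c.joining_date
-- Keep customers with no purchases at all, or whose latest purchase
-- is more than a month old
HAVING MAX(o.purchase_date) IS NULL
    OR MAX(o.purchase_date) < CURRENT_DATE - INTERVAL '1 month';
```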
This query retrieves customers from the customers table who joined over two years ago. The LEFT JOIN ensures that all customers are considered, even those who haven't made any purchases (their purchase_date would be NULL). We then require that the customer's most recent purchase date, if one exists, is over a month ago. The result is a list of customer IDs and joining dates for customers who have not been active recently despite being long-time customers.
There are several advantages to normalizing a database, including less redundancy, more flexibility, and better performance.
Less Redundancy: Normalization reduces redundancy by breaking down a larger, more general table into smaller, more specific tables. This reduces the amount of data that needs to be accessed for queries.
More Flexibility: Normalization makes it easier to modify the structure of the database: with less redundancy, you can make changes to one table without affecting others. This makes it easier to adapt the database to changing business needs (a very real scenario at Freshworks!).
Better Performance: Normalization can improve the performance of the database by reducing the amount of data that needs to be stored and accessed in a single table. This can result in faster query times and better overall performance.
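To make this concrete, here's a hypothetical sketch (tables invented for illustration) of splitting one redundant table into normalized ones:

```sql
-- Before: customer name/email repeated on every order row
CREATE TABLE orders_wide (
  order_id       INT PRIMARY KEY,
  customer_name  TEXT,
  customer_email TEXT,
  amount         NUMERIC
);

-- After: customer data stored once, referenced by key
CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  name        TEXT,
  email       TEXT
);

CREATE TABLE orders (
  order_id    INT PRIMARY KEY,
  customer_id INT REFERENCES customers(customer_id),
  amount      NUMERIC
);
```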
As a database manager at Freshworks, a company that provides customer service software, you are asked to determine the average duration of support tickets in your system. The duration of a support ticket is defined as the number of days between the day the ticket is created and the day it is closed.
The sample table 'tickets' includes the relevant data.
ticket_id | created_at | closed_at |
---|---|---|
1 | 01/01/2022 | 01/03/2022 |
2 | 01/02/2022 | 01/09/2022 |
3 | 01/10/2022 | 01/15/2022 |
4 | 01/15/2022 | 01/20/2022 |
5 | 01/20/2022 | 01/25/2022 |
You are required to write a PostgreSQL query to calculate the average duration of support tickets.
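A sketch of such a query, assuming created_at and closed_at are timestamp columns (if they're plain dates, AVG(closed_at - created_at) alone would do):

```sql
SELECT
  -- Subtracting timestamps yields an interval; EXTRACT pulls out the days
  AVG(EXTRACT(DAY FROM closed_at - created_at)) AS avg_duration_days
FROM tickets;
```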
This query first calculates the duration of each ticket by subtracting the created date from the closed date, then pulls the number of days out of that duration using EXTRACT, and finally averages those durations with the AVG() function. This gives you the average duration of a support ticket in the 'tickets' table.
To practice a very similar question, try this interactive Facebook Average Post Hiatus (Part 1) Question, which is similar for calculating days between two events, or this Amazon Server Utilization Time Question, which is similar for calculating the duration of an event.
Here's an example of a clustered index on a column of a Freshworks payments table (the payment_date column below is a hypothetical stand-in):
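```sql
-- SQL Server syntax; payments / payment_date are assumed names
CREATE CLUSTERED INDEX ix_payments_payment_date
ON payments (payment_date);
```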
This creates a clustered index on that column, which will determine the physical order of the data rows in the table.
Here is an example of a non-clustered index on another column of the same table (again, the column name is hypothetical):
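```sql
-- SQL Server syntax; the column name is assumed for illustration
CREATE NONCLUSTERED INDEX ix_payments_customer_id
ON payments (customer_id);
```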
This will create a non-clustered index on that column, which will not affect the physical order of the data rows in the table.
In terms of query performance, a clustered index is usually faster for searches that return a large number of records, while a non-clustered index is faster for searches that return a small number of records. However, updates to a clustered index are slower, as they require the data rows to be physically rearranged, while updates to a non-clustered index are faster, as they only require the index data structure to be updated.
As a data analyst at Freshworks, your team aims to better understand the performance of their digital marketing campaigns.
They track two types of user interactions: ad impressions (each row records whether the user clicked on the ad) and product views (each row records whether the user added the product to their cart).
Your task is to write a query to compute the Click-Through-Rate (CTR) and Conversion Rate for their ads and products. The Click-Through-Rate (CTR) for an ad is defined as the number of users who clicked on the ad divided by the number of times the ad was displayed. Conversion Rate is defined as the number of users who add a product to the cart divided by the number of users who viewed the product.
ad_id | user_id | registered_click | timestamp |
---|---|---|---|
12345 | 101 | true | 2022-06-08 00:00:00 |
12345 | 102 | false | 2022-06-08 00:05:00 |
12345 | 103 | true | 2022-06-08 00:10:00 |
67890 | 101 | false | 2022-06-08 00:15:00 |
67890 | 104 | true | 2022-06-08 00:20:00 |
product_id | user_id | add_to_cart | timestamp |
---|---|---|---|
9393 | 101 | true | 2022-06-08 00:00:00 |
9393 | 105 | false | 2022-06-08 00:05:00 |
4567 | 110 | false | 2022-06-08 00:10:00 |
4567 | 101 | true | 2022-06-08 00:15:00 |
4567 | 111 | true | 2022-06-08 00:20:00 |
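The two solution queries aren't shown; here's a PostgreSQL sketch, assuming the tables are named ad_clicks and product_views (hypothetical names) and that registered_click and add_to_cart are booleans:

```sql
-- Click-Through-Rate per ad: clicks / impressions
SELECT
  ad_id,
  COUNT(*) FILTER (WHERE registered_click) * 1.0 / COUNT(*) AS ctr
FROM ad_clicks
GROUP BY ad_id;

-- Conversion Rate per product: add-to-cart actions / views
SELECT
  product_id,
  COUNT(*) FILTER (WHERE add_to_cart) * 1.0 / COUNT(*) AS conversion_rate
FROM product_views
GROUP BY product_id;
```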
The first query calculates the Click-Through-Rate (CTR) for each ad by dividing the number of clicks (registered_click = true) by the total number of views (total row count) for each ad.
The second query calculates the Conversion Rate for each product by dividing the number of successful add-to-cart actions (add_to_cart = true) by the total number of product views for each product.
To solve a similar SQL interview question on DataLemur's free interactive coding environment, try this SQL interview question asked by Facebook:
As a Customer Relationship Manager at Freshworks, you need to communicate with your customers regularly. You want to thank all customers whose first names start with a particular pattern. For this task, you wish to fetch the details of all customers whose first names start with 'J'.
Write an SQL query that will return the full names of all customers where their first name begins with 'J', according to our 'customers' table.
customer_id | first_name | last_name | email | phone | join_date |
---|---|---|---|---|---|
1 | John | Doe | johndoe@gmail.com | 123-456-7890 | 2022-07-01 |
2 | Jennifer | Smith | jennifersmith@gmail.com | 123-456-7891 | 2022-06-15 |
3 | Sam | Lee | samlee@gmail.com | 123-456-7892 | 2022-05-25 |
4 | Jules | Verne | julesverne@gmail.com | 123-456-7893 | 2022-06-30 |
5 | Mike | Johnson | mikejohnson@gmail.com | 123-456-7894 | 2022-07-10 |
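A query matching the explanation below (the table is named customers, per the question):

```sql
SELECT first_name, last_name
FROM customers
WHERE first_name LIKE 'J%';
```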
This query uses the SQL LIKE keyword along with the wildcard character '%' to filter for all customer records whose first name starts with the letter 'J'. The '%' wildcard in SQL LIKE matches zero, one, or multiple characters.
first_name | last_name |
---|---|
John | Doe |
Jennifer | Smith |
Jules | Verne |
Denormalization is the process of modifying a database schema in a way that deviates from the typical rules of normalization (1NF, 2NF, 3NF, etc.). There are a few reasons to denormalize a database:
Improved performance: Joins are slow AF, especially when dealing with the massive datasets that are typically used at Freshworks. Denormalization can improve the performance of a database by reducing the number of joins that are required to retrieve data. This can be particularly useful when the database is being used for querying and reporting purposes, as joins can be expensive and slow.
Scalability: By reducing the amount of data that needs to be read and processed to execute a query, denormalization can enhance the scalability of a database. This can be useful when the database is anticipated to handle a large number of read-only queries (such as in OLAP use cases).
Ease of use: Denormalization can also make it easier for users to work with a database by providing them with a more intuitive and straightforward data model.
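For instance, a hypothetical denormalized reporting table might copy the customer's name onto every ticket row, trading redundancy for join-free reads:

```sql
-- Denormalized reporting table: customer_name is duplicated per ticket
-- so reports can skip the join back to the customers table
CREATE TABLE ticket_report (
  ticket_id     INT PRIMARY KEY,
  customer_id   INT,
  customer_name TEXT,  -- redundant copy of customers.name
  status        TEXT
);
```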
Because denormalization can create more complex update and delete operations, and pose potential data integrity issues, consider denormalization only if joins are causing performance bottlenecks.
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the Freshworks SQL interview is to solve as many practice SQL interview questions as you can! Besides solving the earlier Freshworks SQL interview questions, you should also solve the 200+ SQL questions on DataLemur which come from companies like Facebook, Google and unicorn tech startups.
Each SQL question has hints to guide you, detailed solutions and, crucially, an online SQL coding environment so you can instantly run your SQL query and have it checked.
To prep for the Freshworks SQL interview, it can also be useful to practice interview questions from other tech companies like:
But if your SQL coding skills are weak, forget about going right into solving questions – go learn SQL with this DataLemur SQL tutorial.
This tutorial covers SQL concepts such as UNION vs. joins and filtering data with WHERE – both of these show up routinely in SQL job interviews at Freshworks.
In addition to SQL query questions, the other topics to practice for the Freshworks Data Science Interview are:
To prepare for Freshworks Data Science interviews read the book Ace the Data Science Interview because it's got: