At Doximity, SQL is often used for extracting healthcare data for insights and for maintaining the database systems that manage that data. Because of this, Doximity frequently asks SQL problems in interviews for Data Analytics, Data Science, and Data Engineering roles.
As such, to help you practice for the Doximity SQL interview, we've collected 8 Doximity SQL interview questions – how many can you solve?
Doximity, as a professional medical network, values users who consistently engage in activities such as posting articles, asking questions, answering questions, or interacting with other professionals. These frequent users become the platform's "whale" users. The challenge in this SQL problem is to identify them: a user is considered a "whale" if they have posted more than 5 articles or more than 10 questions in the last month.
user_id | name | join_date |
---|---|---|
101 | John Doe | 01/10/2020 |
102 | Jane Smith | 05/05/2020 |
103 | Bob Johnson | 06/18/2021 |
post_id | user_id | post_date | post_type |
---|---|---|---|
2001 | 101 | 03/15/2022 | article |
2002 | 101 | 03/18/2022 | article |
2003 | 102 | 03/20/2022 | article |
2004 | 101 | 03/25/2022 | question |
2005 | 101 | 03/26/2022 | question |
2006 | 102 | 03/30/2022 | question |
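Based on the sample data above, one way to write this in PostgreSQL — assuming the tables are named `users` and `posts` — is:

```sql
SELECT u.user_id,
       u.name
FROM users u
JOIN posts p
  ON p.user_id = u.user_id
WHERE p.post_date >= CURRENT_DATE - INTERVAL '1 month'
GROUP BY u.user_id, u.name
HAVING COUNT(*) FILTER (WHERE p.post_type = 'article') > 5
    OR COUNT(*) FILTER (WHERE p.post_type = 'question') > 10;
```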
This query first selects users from the user table. It then counts the number of posts each user has made for each type: 'article' or 'question', within the last month. If the count of posts exceeds the requisite number (greater than 5 for articles or greater than 10 for questions), that user is considered a "whale" user and is selected by the query.
Please note that the `CURRENT_DATE - INTERVAL '1 month'` syntax is specific to PostgreSQL and might need to be adjusted if you are using a different SQL dialect. For example, in MySQL the equivalent expression would be `DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH)`.
To practice a related customer analytics question on DataLemur's free online SQL coding environment, try this Microsoft SQL Interview problem:
As a data analyst at Doximity, your task is to analyze the level of user engagement based on the number of articles read by each user per day. Engaged users are individuals who have read more than 3 articles in a day. Write a SQL query to find out the percentage of engaged users for each day.
Consider the following table:
activity_id | user_id | read_date | article_id |
---|---|---|---|
1 | 1001 | 2022-01-05 | 5001 |
2 | 1001 | 2022-01-05 | 5002 |
3 | 1002 | 2022-01-05 | 5003 |
4 | 1003 | 2022-01-05 | 5004 |
5 | 1001 | 2022-01-06 | 5005 |
6 | 1002 | 2022-01-06 | 5006 |
7 | 1004 | 2022-01-06 | 5007 |
8 | 1005 | 2022-01-06 | 5008 |
9 | 1004 | 2022-01-06 | 5009 |
10 | 1004 | 2022-01-06 | 5010 |
11 | 1004 | 2022-01-06 | 5011 |
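One possible PostgreSQL solution — assuming the table is named `activity` — is:

```sql
WITH daily_reads AS (
  SELECT user_id,
         read_date,
         COUNT(article_id) AS articles_read
  FROM activity
  GROUP BY user_id, read_date
)
SELECT read_date,
       ROUND(
         COUNT(*) FILTER (WHERE articles_read > 3)::DECIMAL
           / COUNT(*) * 100, 2) AS engaged_user_pct
FROM daily_reads
GROUP BY read_date
ORDER BY read_date;
```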
This query first calculates the number of articles read by each user in each day, and then determines how many of those users are considered engaged (more than 3 articles read per day). The percentage of engaged users is calculated by dividing the number of engaged users by the total number of users per day. Note that PostgreSQL syntax is used here to express the data types involved.
For more window function practice, solve this Uber SQL Interview Question on DataLemur's online SQL code editor:
Similar to the UNION and EXCEPT operators, the PostgreSQL INTERSECT operator combines the result sets of two or more SELECT statements into a single result set. However, INTERSECT only returns the rows that appear in BOTH SELECT statements.
For a concrete example, say you were on the Sales Analytics team at Doximity, and had data on sales leads exported from both HubSpot and Salesforce CRMs in two different tables. To write a query to analyze leads created after 2023 started, that show up in both CRMs, you would use the command:
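A sketch of that query — the table and column names (`hubspot_leads`, `salesforce_leads`, `lead_name`, `created_date`) are assumptions for illustration — might look like:

```sql
SELECT lead_name, email
FROM hubspot_leads
WHERE created_date >= '2023-01-01'
INTERSECT
SELECT lead_name, email
FROM salesforce_leads
WHERE created_date >= '2023-01-01';
```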
Doximity is a social networking platform specifically for doctors and health care professionals. To improve and analyze the network, they want to find out the average number of connections each doctor has. As a database manager, you've been asked to write a PostgreSQL query that finds the average number of connections per doctor on the platform.
connection_id | doctor_id | connected_doctor_id | connection_date |
---|---|---|---|
1 | 001 | 002 | 06/08/2022 00:00:00 |
2 | 001 | 003 | 06/10/2022 00:00:00 |
3 | 002 | 001 | 06/18/2022 00:00:00 |
4 | 003 | 001 | 07/26/2022 00:00:00 |
5 | 003 | 002 | 07/05/2022 00:00:00 |
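One way to write this in PostgreSQL — assuming the table is named `connections` — is:

```sql
SELECT AVG(connection_count) AS avg_connections_per_doctor
FROM (
  SELECT doctor_id,
         COUNT(*) AS connection_count
  FROM connections
  GROUP BY doctor_id
) per_doctor;
```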
This query first finds the number of connections for each doctor by grouping on doctor_id and counting the rows in each group. It then takes the average of those counts to find the average number of connections per doctor.
To practice a very similar question try this interactive Twitter Histogram of Tweets Question which is similar for calculating aggregated user activities or this Microsoft Teams Power Users Question which is similar for finding top users based on interactions.
A foreign key is a field in a database table that links to the primary key of another table, establishing a connection between the two tables. For example, let's analyze Doximity's Google Ads campaigns data:
ad_id | campaign_id | keyword | click_count |
---|---|---|---|
1 | 201 | Doximity reviews | 120 |
2 | 202 | Doximity pricing | 150 |
3 | 101 | buy Doximity | 65 |
4 | 101 | Doximity alternatives | 135 |
Here, campaign_id is a foreign key that connects each ad to the primary key of the corresponding Google Ads campaign. This establishes a relationship between the ads and their campaigns, enabling easy querying to find which ads belong to a specific campaign or which campaign a specific ad belongs to.
The table may also have multiple foreign keys that reference primary keys in other tables. For example, foreign keys such as ad_group_id and account_id could be used to link each ad to its ad group and to the Google Ads account that the campaigns belong to, respectively.
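As a sketch, the relationship could be declared like this — the table names `google_ads` and `google_ad_campaigns` are assumptions based on the example above:

```sql
CREATE TABLE google_ad_campaigns (
  campaign_id   INTEGER PRIMARY KEY,
  campaign_name TEXT
);

CREATE TABLE google_ads (
  ad_id       INTEGER PRIMARY KEY,
  campaign_id INTEGER REFERENCES google_ad_campaigns (campaign_id), -- foreign key
  keyword     TEXT,
  click_count INTEGER
);
```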
Doximity is a social network for physicians and advanced practice clinicians. As a data analyst for Doximity, your task is to analyze the click-through rate (CTR) from viewing doctor's profile pages to clicking the 'contact doctor' button.
A click-through rate is calculated as the number of clicks a link receives divided by the number of times the link is shown, multiplied by 100 to get a percentage. In this task, we want to understand how likely users are to click the 'contact doctor' button after viewing a doctor's profile page.
Given the following tables:
view_id | user_id | view_date | doctor_id |
---|---|---|---|
8142 | 121 | 06/08/2022 00:00:00 | 30115 |
9821 | 254 | 06/08/2022 00:00:00 | 30115 |
6923 | 351 | 06/10/2022 00:00:00 | 61490 |
7552 | 481 | 06/11/2022 00:00:00 | 33366 |
4417 | 894 | 06/15/2022 00:00:00 | 33366 |
click_id | user_id | click_date | doctor_id |
---|---|---|---|
1701 | 121 | 06/08/2022 00:00:00 | 30115 |
2802 | 254 | 06/08/2022 00:00:00 | 30115 |
1903 | 351 | 06/10/2022 00:00:00 | 61490 |
2502 | 481 | 06/13/2022 00:00:00 | 33366 |
The task is to write a SQL query to calculate the click-through rate for each doctor's profile.
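One possible PostgreSQL solution — assuming the tables are named `profile_views` and `contact_clicks` — is:

```sql
WITH views AS (
  SELECT doctor_id, COUNT(*) AS view_count
  FROM profile_views
  GROUP BY doctor_id
),
clicks AS (
  SELECT doctor_id, COUNT(*) AS click_count
  FROM contact_clicks
  GROUP BY doctor_id
)
SELECT v.doctor_id,
       ROUND(COALESCE(c.click_count, 0)::DECIMAL
         / v.view_count * 100, 2) AS ctr_pct
FROM views v
LEFT JOIN clicks c ON c.doctor_id = v.doctor_id
ORDER BY ctr_pct DESC;
```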
The above SQL statement first defines two CTEs, which count the total number of profile views and 'contact' button clicks for each doctor, respectively. It then calculates the click-through rate (CTR) for each doctor by dividing the number of clicks by the number of views and multiplying by 100 to get the percentage. Doctors with a higher CTR are listed first because the results are sorted by CTR in descending order.
To solve another question about calculating rates, solve this TikTok SQL question on DataLemur's online SQL coding environment:
The DISTINCT clause in SQL allows you to select records that are unique, eliminating duplicates.
For a tangible example, say you had a table of Doximity employees:
first_name | job_title |
---|---|
Akash | Data Analyst |
Brittany | Data Scientist |
Carlos | Data Engineer |
Diego | Data Engineer |
Eva | Data Analyst |
If you were doing an HR Analytics project and you wanted to get all the unique job titles currently held at the company, you would write the following SQL query:
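Assuming the table is named `employees`, the query would be:

```sql
SELECT DISTINCT job_title
FROM employees;
```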
The output would give you 3 distinct job titles at Doximity:
job_title |
---|
Data Analyst |
Data Scientist |
Data Engineer |
Doximity is a networking service that caters to medical professionals. Imagine a scenario where we want to calculate a weighted overall rating for each doctor on Doximity, where the weight is the number of patients the doctor has attended. A doctor who has attended more patients can have a more reliable rating than a doctor who has attended fewer patients.
For this question, I will create two tables, doctors and reviews. The doctors table contains the id, name, and total number of patients each doctor has attended, while the reviews table contains the reviews submitted by patients for each doctor, with ratings on a scale of 1 to 5.
Here's an example of how you might go about creating these tables:
doctor_id | name | total_patients |
---|---|---|
1 | Dr. John | 50 |
2 | Dr. Alice | 30 |
3 | Dr. Bob | 20 |
review_id | doctor_id | rating |
---|---|---|
1 | 1 | 4 |
2 | 1 | 5 |
3 | 2 | 4 |
4 | 2 | 3 |
5 | 3 | 2 |
6 | 3 | 3 |
We want to output a table that shows the Doctor ID, doctor name, and their weighted rating (rounded to 2 decimal places).
doctor_id | name | weighted_rating |
---|---|---|
1 | Dr. John | 4.50 |
2 | Dr. Alice | 3.50 |
3 | Dr. Bob | 2.50 |
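One way to express the log-weighted rating described below — assuming tables named `doctors` and `reviews` — is the following sketch. (Note that PostgreSQL's `LOG()` is base 10, and that a plain average, as in the sample output above, would omit the `LOG` factor.)

```sql
SELECT d.doctor_id,
       d.name,
       ROUND(AVG(r.rating) * LOG(d.total_patients), 2) AS weighted_rating
FROM doctors d
JOIN reviews r
  ON r.doctor_id = d.doctor_id
GROUP BY d.doctor_id, d.name;
```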
This query first joins the doctors and reviews tables on doctor_id. Then it groups the results by doctor_id and doctor name. For each group (i.e., for each doctor), the query calculates the average rating multiplied by the logarithm of the total patients attended by the doctor (LOG(total_patients)) and rounds the result to two decimal places.
The reason we use the logarithm of total_patients as the weight factor is to dampen the influence of doctors with extreme numbers of patients: the ratings for doctors attending thousands of patients won't overwhelmingly dwarf the ratings for doctors attending a few hundred. Please note, you can modify the weight factor as per your own weighting strategy.
Assuming that you've already got basic SQL skills, the next best tip we have to prepare for the Doximity SQL interview is to solve as many practice SQL interview questions as you can! Besides solving the above Doximity SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions which come from companies like Facebook, Google, and VC-backed startups.
Each SQL question has multiple hints, fully explained answers along with a discussion board to see how others solved it and crucially, there is an online SQL coding environment so you can instantly run your SQL query and have it graded.
To prep for the Doximity SQL interview, it can also be helpful to practice SQL problems from other tech companies like:
But if your SQL foundations are weak, forget about jumping right into solving questions – improve your SQL foundations with this SQL interview tutorial.
This tutorial covers SQL concepts such as math functions like ROUND()/CEIL() and CASE/WHEN statements – both of which show up frequently in SQL job interviews at Doximity.
In addition to SQL interview questions, the other types of problems to prepare for the Doximity Data Science Interview are:
The best way to prepare for Doximity Data Science interviews is by reading Ace the Data Science Interview. The book's got: