11 DiDi SQL Interview Questions (Updated 2024)

Updated on February 29, 2024

At DiDi (aka the Chinese version of Uber), SQL is used quite frequently for analyzing traffic data for route optimization and analyzing customer data for personalized marketing campaigns. Because of this, DiDi asks SQL problems in interviews for Data Analyst, Data Science, and BI jobs.

To help you prepare for the DiDi SQL interview, this blog covers 11 DiDi SQL interview questions. Can you answer them all?

11 DiDi SQL Interview Questions

SQL Question 1: Identify and Analyze DiDi's Power Users

The ride-hailing company DiDi has a database which tracks rides taken by its users. Your task is to identify the "power users". For this exercise, a "power user" is defined as a user who has completed over 20 rides in a month. Write a SQL query to identify these users and calculate the total kilometres and average cost of their rides for every month in 2022.

Your database has two tables: users and rides. The users table has a one-to-many relationship with the rides table (each user can have multiple rides, but each ride belongs to only one user).

Sample data for the users table:


Sample data for the rides table:


Answer:

This problem can be solved by using a combination of aggregation, filtering and joining operations on the users and rides tables. Here's one possible way to write the SQL query in PostgreSQL format:
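One way this could look, as a sketch under assumed names: the rides table is taken to have user_id, ride_date, distance_km, and cost columns, none of which are confirmed by the text above.

```sql
-- Assumed schema: rides(ride_id, user_id, ride_date, distance_km, cost).
SELECT
    r.user_id,
    EXTRACT(MONTH FROM r.ride_date)   AS ride_month,
    COUNT(*)                          AS rides_taken,
    SUM(r.distance_km)                AS total_km,
    ROUND(AVG(r.cost)::numeric, 2)    AS avg_cost
FROM rides AS r
-- Keep only rides that took place in 2022.
WHERE r.ride_date >= DATE '2022-01-01'
  AND r.ride_date <  DATE '2023-01-01'
GROUP BY r.user_id, EXTRACT(MONTH FROM r.ride_date)
-- A power user has completed more than 20 rides in the month.
HAVING COUNT(*) > 20
ORDER BY r.user_id, ride_month;
```

Because every column needed here lives in rides, this sketch skips the join; joining to users on user_id would only be needed to pull in user attributes such as a name.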


This query first filters the table to include only rides that took place in 2022. It then groups the results by user and month, and for each user and month, it calculates the number of rides taken, the total kilometres traveled and the average ride cost. Finally, it filters these groups to return only those where the number of rides taken is more than 20, and orders the results by user and month.

To practice a similar customer analytics SQL question where you can solve it interactively and have your SQL query automatically checked, try this Walmart SQL Interview Question.

SQL Question 2: Analyzing Daily User Ride Duration

Consider a ride-sharing company like DiDi that keeps a record of completed trips. Your task is to write a SQL query that calculates the seven-day running average of ride durations for each rider_id.

The company stores trip durations in a table with the following columns:

Example Input:
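| trip_id | rider_id | trip_date | duration_minutes |
|---|---|---|---|
| 1 | 001 | 2022-07-01 | 15 |
| 2 | 002 | 2022-07-01 | 20 |
| 3 | 001 | 2022-07-02 | 30 |
| 4 | 002 | 2022-07-02 | 25 |
| 5 | 001 | 2022-07-03 | 20 |
| 6 | 002 | 2022-07-03 | 30 |
| 7 | 001 | 2022-07-04 | 25 |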

The output columns should include rider_id, trip_date, and 7_days_avg_duration.

Example Output:
| rider_id | trip_date | 7_days_avg_duration |
|---|---|---|
| 001 | 2022-07-01 | 15.00 |
| 002 | 2022-07-01 | 20.00 |
| 001 | 2022-07-02 | 22.50 |
| 002 | 2022-07-02 | 22.50 |
| 001 | 2022-07-03 | 21.67 |
| 002 | 2022-07-03 | 25.00 |
| 001 | 2022-07-04 | 22.50 |

Answer:

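You can use the AVG() window function with a frame that covers the current row and the six preceding rows, using ROWS BETWEEN 6 PRECEDING AND CURRENT ROW. Because the sample data has one trip per rider per day, this row-based frame corresponds to a trailing seven-day window and gives the seven-day running average.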
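A minimal sketch of that approach, assuming the table is named trips (the name is not given above) and that duration_minutes is an integer or numeric column:

```sql
-- Assumed table name: trips(trip_id, rider_id, trip_date, duration_minutes).
SELECT
    rider_id,
    trip_date,
    ROUND(
        AVG(duration_minutes) OVER (
            PARTITION BY rider_id          -- one running average per rider
            ORDER BY trip_date             -- oldest trip first
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ),
        2
    ) AS "7_days_avg_duration"             -- quoted: the name starts with a digit
FROM trips
ORDER BY trip_date, rider_id;
```

If a rider could take several trips on the same day, or skip days, a row-based frame would no longer line up with calendar days, and the data would first need to be aggregated per rider per day.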


This query first partitions the data by rider_id and then sorts it by trip_date within each partition. It evaluates the average for each row using a window frame that includes the current row and the six preceding rows. For the first six rows of each rider_id, the average is computed only over the rows available so far.

To practice a related window function SQL problem on DataLemur's free interactive SQL code editor, try this Google SQL Interview Question.

SQL Question 3: How can you identify duplicates in a table? Do you have any other approaches?

One way to find duplicate records in a table is by using GROUP BY, and then seeing which groups have more than one occurrence:
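As a sketch, assuming a hypothetical users table in which email is the column that should be unique:

```sql
-- Hypothetical table: users(user_id, email).
-- Lists each email that appears more than once, with its count.
SELECT
    email,
    COUNT(*) AS occurrences
FROM users
GROUP BY email
HAVING COUNT(*) > 1;
```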


Another way is by using a correlated subquery with an operator such as EXISTS:
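One operator that fits this pattern is EXISTS. The sketch below again assumes a hypothetical users table, with user_id as a unique key and email as the column being checked:

```sql
-- Hypothetical table: users(user_id, email), with user_id unique.
-- Returns every row whose email also appears on a different row.
SELECT u1.*
FROM users AS u1
WHERE EXISTS (
    SELECT 1
    FROM users AS u2
    WHERE u2.email = u1.email
      AND u2.user_id <> u1.user_id
)
ORDER BY u1.email, u1.user_id;
```

Unlike the grouped version, this returns the full duplicate rows rather than one summary row per duplicated value.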


SQL Question 4: Analyzing Click-through and Conversion Rates

DiDi, a ride-hailing service, has launched a new promotional campaign. They want to measure the campaign's performance by analyzing the Click-through and Conversion rates. A user first views a promotional ad (Click-through) and then books a ride (Conversion). Here are the sample datasets.

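Example Input:

| ad_id | user_id | click_date |
|---|---|---|
| 101 | 123 | 06/08/2022 |
| 102 | 265 | 06/10/2022 |
| 101 | 362 | 06/18/2022 |
| 103 | 192 | 07/26/2022 |
| 102 | 981 | 07/05/2022 |

Example Input:

| booking_id | user_id | ad_id | booking_date |
|---|---|---|---|
| 8001 | 123 | 101 | 06/09/2022 |
| 8050 | 265 | 102 | 06/11/2022 |
| 8502 | 362 | 101 | 06/19/2022 |
| 8259 | 192 | null | 07/26/2022 |
| 8901 | 981 | 102 | 07/06/2022 |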

Answer:
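A sketch of one possible query, assuming the two input tables are named ad_clicks and bookings (the names are not given above):

```sql
-- Assumed table names: ad_clicks(ad_id, user_id, click_date)
--                      bookings(booking_id, user_id, ad_id, booking_date)
SELECT
    a.ad_id,
    COUNT(DISTINCT b.user_id)::numeric
        / NULLIF(COUNT(DISTINCT a.user_id), 0) AS conversion_rate
FROM ad_clicks AS a
LEFT JOIN bookings AS b
       ON b.user_id = a.user_id
      AND b.ad_id   = a.ad_id     -- a booking only counts for the ad it is tied to
GROUP BY a.ad_id
ORDER BY a.ad_id;
```

With the sample rows, ads 101 and 102 get a rate of 1 (every user who clicked also booked), while ad 103 gets 0 because the only matching booking has a null ad_id.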


This query performs a left join of the clicks table (a) and the bookings table (b) on both the user_id and ad_id columns. Then, for each ad (ad_id), it calculates the conversion rate as the ratio of distinct users who booked a ride to distinct users who clicked the ad. If no users clicked the ad (the denominator is 0), NULLIF prevents division by zero, and the result is NULL.

To practice a similar problem about calculating rates, try this TikTok SQL question within DataLemur's online SQL code editor.

SQL Question 5: What does adding 'DISTINCT' to a SQL query do?

If you want to return records with no duplicates, you can use the DISTINCT keyword in your SELECT statement.

For example, if you had a table of DiDi employees and wanted to see all the unique job titles currently held at the company, you could write the following query:
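A minimal version, assuming the table is simply named employees:

```sql
-- Assumed table name: employees(f_name, job_title).
-- DISTINCT collapses repeated job titles into a single row each.
SELECT DISTINCT job_title
FROM employees;
```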


If the employees table had the following data:

| f_name | job_title |
|---|---|
| Akash | Data Analyst |
| Brittany | Data Scientist |
| Carlos | Data Engineer |
| Diego | Data Engineer |
| Eva | Data Analyst |

Then the output from the query would be:

| job_title |
|---|
| Data Analyst |
| Data Scientist |
| Data Engineer |

SQL Question 6: Average Ride Duration for Each Car Type in DiDi

Imagine you work at DiDi, the company that provides ride-hailing services. You are asked to analyze the ride durations for each car type to understand product usage and customer experience. Given the rides data, how would you find the average ride duration for each car type?

Here's an example of how the rides table is structured:

Example Input:
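| ride_id | user_id | ride_date | car_type | duration_minutes |
|---|---|---|---|---|
| 2001 | 123 | 06/08/2022 | Luxury | 30 |
| 2002 | 265 | 06/10/2022 | Standard | 45 |
| 2003 | 362 | 06/18/2022 | Luxury | 60 |
| 2004 | 192 | 07/26/2022 | Standard | 20 |
| 2005 | 981 | 07/05/2022 | Luxury | 45 |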

Write a SQL query that will return the average ride duration for each car type.

Example Output:
| car_type | avg_duration_minutes |
|---|---|
| Luxury | 45.00 |
| Standard | 32.50 |

Answer:
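One way to write this, assuming the table is named rides and duration_minutes is an integer or numeric column:

```sql
-- Assumed table name: rides(ride_id, user_id, ride_date, car_type, duration_minutes).
SELECT
    car_type,
    ROUND(AVG(duration_minutes), 2) AS avg_duration_minutes
FROM rides
GROUP BY car_type
ORDER BY car_type;
```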


The SQL query uses the GROUP BY clause coupled with the AVG() function to find the average duration of rides for each car type. It pulls data from the duration_minutes column and groups the rows by the values in the car_type column. It then calculates the average duration for each group.

SQL Question 7: What does it mean to normalize a database?

Normalization involves breaking up your tables into smaller, more specialized ones and using primary and foreign keys to define relationships between them. Not only does this make your database more flexible and scalable, it also makes it easier to maintain. Plus, normalization helps to keep your data accurate by reducing the chance of inconsistencies and errors.

The main downside is that your queries will now involve more joins, which can be slow and are often a database performance bottleneck.
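As a rough illustration with hypothetical tables (none of these names come from DiDi's actual schema), rider details live in one table and each ride references its rider by key, so reading a ride together with the rider's name takes a join:

```sql
-- Hypothetical normalized schema: rider details are stored once.
CREATE TABLE riders (
    rider_id   INTEGER PRIMARY KEY,
    full_name  VARCHAR(100) NOT NULL,
    email      VARCHAR(255) NOT NULL UNIQUE
);

-- Each ride points at its rider through a foreign key instead of
-- repeating the rider's name and email on every row.
CREATE TABLE rides (
    ride_id          INTEGER PRIMARY KEY,
    rider_id         INTEGER NOT NULL REFERENCES riders (rider_id),
    ride_date        DATE NOT NULL,
    duration_minutes INTEGER NOT NULL
);

-- The cost of normalization: combining the two tables needs a join.
SELECT r.ride_id, r.ride_date, rd.full_name
FROM rides AS r
JOIN riders AS rd
  ON rd.rider_id = r.rider_id;
```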

SQL Question 8: Filter Customer Records Based on Specific Pattern

DiDi is a major player in the ride-hailing market. They have a huge customer database consisting of customer names, email addresses and their ride details. There is a requirement to find the details of customers whose email addresses belong to a particular domain, say 'gmail.com'.

We need to filter the records in DiDi's customers table to those that have 'gmail.com' in their email addresses.

Example Input:

| customer_id | name | email | ride_count | registration_date |
|---|---|---|---|---|
| 101 | John Doe | john.doe@gmail.com | 15 | 01/10/2019 |
| 102 | Jane Smith | jane.smith@yahoo.com | 22 | 15/04/2017 |
| 103 | Alicia Brown | aliciabrown@gmail.com | 30 | 20/11/2018 |
| 104 | Michael Black | michaelblack@hotmail.com | 45 | 01/01/2020 |
| 105 | Samuel Green | samuel.green@yahoo.com | 35 | 05/06/2018 |
| 106 | Anna White | annawhite@gmail.com | 18 | 01/12/2019 |

Example Output:

| customer_id | name | email | ride_count | registration_date |
|---|---|---|---|---|
| 101 | John Doe | john.doe@gmail.com | 15 | 01/10/2019 |
| 103 | Alicia Brown | aliciabrown@gmail.com | 30 | 20/11/2018 |
| 106 | Anna White | annawhite@gmail.com | 18 | 01/12/2019 |

Answer:
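A query along these lines would do it, using the customers table and email column from the example above:

```sql
-- % matches any sequence of characters, so this keeps every row
-- whose email contains 'gmail.com' anywhere in the string.
SELECT *
FROM customers
WHERE email LIKE '%gmail.com%';
```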


In this PostgreSQL query, we use the SQL LIKE keyword to filter rows from the customers table. The "%" before and after 'gmail.com' in the LIKE clause are wildcards that match any sequence of characters, so the query finds every row whose email column contains 'gmail.com'. For the sample data, this returns the customers whose addresses end with 'gmail.com'; a pattern of '%@gmail.com' would match the domain more strictly.

SQL Question 9: Analyzing Customer Ride Data and Payment Methods

As a data analyst at DiDi, you are tasked to analyze the customer ride data. You need to find out which payment method is most popular among customers for each ride type. Write a SQL query to solve this problem.

You are provided with two tables:

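rides table:

| ride_id | customer_id | ride_type | start_time |
|---|---|---|---|
| 1 | 2 | Economy | 2021-05-09 08:00:00 |
| 2 | 3 | Luxury | 2021-05-09 09:00:00 |
| 3 | 2 | Economy | 2021-05-10 10:00:00 |
| 4 | 1 | Luxury | 2021-05-11 11:00:00 |

payments table:

| payment_id | ride_id | payment_type | payment_amount |
|---|---|---|---|
| 1 | 1 | Credit Card | 30.00 |
| 2 | 2 | WeChat | 150.00 |
| 3 | 4 | AliPay | 200.00 |
| 4 | 3 | WeChat | 35.00 |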

Answer:

In PostgreSQL, you could use a subquery to first find out the count of each payment type per ride type. The 'rank' function can then be used to select the most popular payment method for each ride type.
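A sketch of that idea, assuming the two tables are named rides and payments:

```sql
-- Assumed table names: rides(ride_id, customer_id, ride_type, start_time)
--                      payments(payment_id, ride_id, payment_type, payment_amount)
SELECT ride_type, payment_type, payment_count
FROM (
    SELECT
        r.ride_type,
        p.payment_type,
        COUNT(*) AS payment_count,
        -- Rank payment types within each ride type, most used first.
        RANK() OVER (
            PARTITION BY r.ride_type
            ORDER BY COUNT(*) DESC
        ) AS popularity_rank
    FROM rides AS r
    JOIN payments AS p
      ON p.ride_id = r.ride_id
    GROUP BY r.ride_type, p.payment_type
) AS ranked
WHERE popularity_rank = 1
ORDER BY ride_type, payment_type;
```

RANK() gives tied payment types the same rank, so with the sample data (where each payment type is used once per ride type) both payment types for Economy and both for Luxury come back.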


This query will return the most popular payment method for each type of ride based on the number of times each payment method was used for each ride type. If more than one payment method shares the top spot for a certain ride type, all will be included in the output.

Since joins come up frequently during SQL interviews, try this interactive Snapchat JOIN SQL interview question.

SQL Question 10: Can you describe the meaning of a constraint in SQL in layman's terms?

SQL constraints are used to specify rules for the data in a table. They can be applied to single or multiple fields in a table when the table is created, or after the table has been created using the ALTER TABLE command.

For example, say you had a database that stores ad campaign data from DiDi's Google Analytics account.

Here's what some constraints could look like:
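The table and column names below are hypothetical and chosen only to illustrate the common constraint types:

```sql
-- Hypothetical table for ad campaign data.
CREATE TABLE ad_campaigns (
    ad_id          INTEGER PRIMARY KEY,                 -- unique, non-null identifier
    ad_name        VARCHAR(255) NOT NULL,               -- a value is required
    start_date     DATE NOT NULL,
    end_date       DATE NOT NULL,
    budget         NUMERIC(12, 2) NOT NULL CHECK (budget > 0),
    cost_per_click NUMERIC(8, 2) NOT NULL CHECK (cost_per_click >= 0),
    -- Table-level rule spanning two columns.
    CONSTRAINT valid_date_range CHECK (end_date >= start_date)
);

-- Constraints can also be added after the table exists.
ALTER TABLE ad_campaigns
    ADD CONSTRAINT unique_ad_name UNIQUE (ad_name);
```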


SQL Question 11: Mathematical Trip Data Analysis

Based on the trip data from the DiDi app, calculate the average trip distance from each starting location, rounded to 2 decimal places, and find the absolute difference between the maximum and minimum of those averages across all starting locations. Also, find the square root of the total number of trips made from each starting location. Assume that the starting location is stored as a string.

Example Input:

| trip_id | start_location | end_location | trip_distance (km) |
|---|---|---|---|
| 1 | 'New York' | 'Los Angeles' | 500 |
| 2 | 'New York' | 'San Francisco' | 480 |
| 3 | 'San Francisco' | 'Los Angeles' | 385 |
| 4 | 'San Francisco' | 'New York' | 520 |
| 5 | 'Los Angeles' | 'New York' | 515 |

Example Output:

| start_location | average_distance | distance_difference | sqrt_of_total_trips |
|---|---|---|---|
| 'New York' | 490.00 | 62.50 | 1.41 |
| 'San Francisco' | 452.50 | 62.50 | 1.41 |
| 'Los Angeles' | 515.00 | 62.50 | 1.00 |

Answer:
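A sketch of one possible solution, assuming the table is named trips with a numeric trip_distance column, and reading the difference as the gap between the largest and smallest per-location averages:

```sql
-- Assumed table: trips(trip_id, start_location, end_location, trip_distance).
WITH location_stats AS (
    -- Average distance and trip count per starting location.
    SELECT
        start_location,
        ROUND(AVG(trip_distance)::numeric, 2) AS average_distance,
        COUNT(*) AS total_trips
    FROM trips
    GROUP BY start_location
),
distance_spread AS (
    -- Single-row result: gap between the largest and smallest average.
    SELECT ABS(MAX(average_distance) - MIN(average_distance)) AS distance_difference
    FROM location_stats
)
SELECT
    ls.start_location,
    ls.average_distance,
    ds.distance_difference,
    ROUND(SQRT(ls.total_trips)::numeric, 2) AS sqrt_of_total_trips
FROM location_stats AS ls
CROSS JOIN distance_spread AS ds
ORDER BY ls.start_location;
```

The CROSS JOIN attaches the single distance_difference value to every location row.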


In this solution, one common table expression (CTE) calculates the average trip distance and the total number of trips, grouped by start_location. Another CTE calculates the absolute difference between the maximum and minimum of those averages across all starting locations.

Finally, the necessary information is selected from these CTEs and the square root of the total number of trips is also calculated.

To practice a very similar question try this interactive Amazon Highest-Grossing Items Question which is similar for requiring calculations with aggregate functions or this Alibaba Compressed Mean Question which is similar for providing statistics and rounding values to a specific decimal place.

Preparing For The DiDi SQL Interview

The key to acing a DiDi SQL interview is to practice, practice, and then practice some more! Beyond just solving the earlier DiDi SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions which come from companies like Microsoft, Google, and Facebook.

Each interview question has multiple hints and full answers, and most importantly, there's an interactive coding environment so you can code up your SQL query right online and have it graded.

To prep for the DiDi SQL interview it is also helpful to practice SQL problems from other tech companies.

However, if your SQL coding skills are weak, forget about jumping right into solving questions – improve your SQL foundations with this SQL interview tutorial.

This tutorial covers topics including window functions and removing NULLs – both of these pop up often during DiDi SQL assessments.

DiDi Data Science Interview Tips

What Do DiDi Data Science Interviews Cover?

In addition to SQL interview questions, the other question categories covered in the DiDi Data Science Interview are:

  • Probability & Stats Questions
  • Coding Questions in Python or R
  • Analytics and Product-Metrics Questions
  • Machine Learning and Predictive Modeling Questions
  • Behavioral Based Interview Questions

How To Prepare for DiDi Data Science Interviews?

The best way to prepare for DiDi Data Science interviews is by reading Ace the Data Science Interview. The book's got:

  • 201 Interview Questions from Facebook, Google, & Amazon
  • A Refresher on Stats, SQL & ML
  • Great Reviews (900+ 5-star reviews on Amazon)
