Google almost always asks SQL problems during interviews for Data Analytics, Data Science, and Data Engineering jobs. To help you land your dream job in data at Google, practice these 13 REAL Google SQL interview questions which we've curated and solved for you.
While these problems are a grind to practice, preparing for Google is SO worth it, and I'm speaking from experience – I interned as a Google Data Engineer in 2016 and it was a BLAST:
Google's marketing team is making a Superbowl commercial and needs a simple statistic to put on their TV ad: the median number of searches a person made last year.
However, at Google scale, querying all 2 trillion searches is too costly. Luckily, you have access to a summary table that tells you, for each number of searches made last year, how many Google users fall into that bucket.
Write a query to report the median of searches made by a user. Round the median to one decimal point.
Column Name | Type |
---|---|
searches | integer |
num_users | integer |
searches | num_users |
---|---|
1 | 2 |
2 | 2 |
3 | 3 |
4 | 1 |
median |
---|
2.5 |
To practice this question on our interactive coding site, try this Google SQL Interview Question.
Assume you're given a table with measurement values obtained from a Google sensor over multiple days, with measurements taken multiple times per day.
Write a query to calculate the sum of odd-numbered and even-numbered measurements separately for each day, and display the results in two different columns.
measurement_id | measurement_value | measurement_time |
---|---|---|
131233 | 1109.51 | 07/10/2024 09:00:00 |
135211 | 1662.74 | 07/10/2024 11:00:00 |
523542 | 1246.24 | 07/10/2024 13:15:00 |
143562 | 1124.50 | 07/11/2024 15:00:00 |
346462 | 1234.14 | 07/11/2024 16:45:00 |
measurement_day | odd_sum | even_sum |
---|---|---|
07/10/2024 00:00:00 | 2355.75 | 1662.74 |
07/11/2024 00:00:00 | 1124.50 | 1234.14 |
Based on the results, on 07/10/2024 the odd-numbered measurements (1st and 3rd) sum to 2355.75 while the even-numbered measurement (2nd) totals 1662.74, and on 07/11/2024 the odd-numbered measurement totals 1124.50 and the even-numbered measurement totals 1234.14.
Before we reveal the solution, try this Google SQL Interview Question yourself on DataLemur:
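Here's one way to tackle it, shown as a rough sketch in PostgreSQL – the table name `measurements` is an assumption, since the schema above isn't named:

```sql
-- Sketch only (PostgreSQL); the table name "measurements" is an assumption.
-- Number each day's readings in time order, then sum odd- and even-numbered rows separately.
WITH ranked AS (
  SELECT
    DATE_TRUNC('day', measurement_time) AS measurement_day,
    measurement_value,
    ROW_NUMBER() OVER (
      PARTITION BY DATE_TRUNC('day', measurement_time)
      ORDER BY measurement_time
    ) AS measurement_num
  FROM measurements
)
SELECT
  measurement_day,
  SUM(measurement_value) FILTER (WHERE measurement_num % 2 = 1) AS odd_sum,
  SUM(measurement_value) FILTER (WHERE measurement_num % 2 = 0) AS even_sum
FROM ranked
GROUP BY measurement_day
ORDER BY measurement_day;
```

ROW_NUMBER() assigns 1, 2, 3, … to each day's readings in time order, so the modulo check splits them into odd and even positions.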
As a Data Analyst on the Google Maps User Generated Content team, you and your Product Manager are investigating user-generated content (UGC) – photos and reviews that independent users upload to Google Maps.
Write a query to determine which type of place (place_category) attracts the most UGC tagged as "off-topic". In the case of a tie, show the output in ascending order of place_category.
place_id | place_name | place_category |
---|---|---|
1 | Baar Baar | Restaurant |
2 | Rubirosa | Restaurant |
3 | Mr. Purple | Bar |
4 | La Caverna | Bar |
content_id | place_id | content_tag |
---|---|---|
101 | 1 | Off-topic |
110 | 2 | Misinformation |
153 | 2 | Off-topic |
176 | 3 | Harassment |
190 | 3 | Off-topic |
off_topic_places |
---|
Restaurant |
The restaurants (Baar Baar and Rubirosa) have a total of 2 UGC posts tagged as "off-topic", while the bars only have 1. Restaurant is shown in the output because it's the type of place with the most UGC tagged as "off-topic".
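Here's a partial sketch of one approach in PostgreSQL – the table names `places` and `place_content` are hypothetical, since the schemas above aren't named:

```sql
-- Partial sketch only (PostgreSQL); the table names "places" and "place_content" are assumptions.
-- Step 1: count the "Off-topic" UGC per place category.
SELECT
  p.place_category,
  COUNT(c.content_id) AS off_topic_count
FROM places AS p
JOIN place_content AS c
  ON p.place_id = c.place_id
WHERE c.content_tag = 'Off-topic'
GROUP BY p.place_category;
-- Step 2 (left as an exercise): keep only the category (or tied categories) with the
-- highest off_topic_count, ordered by place_category ascending.
```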
The above is a partial solution – DataLemur Premium users can unlock multiple hints and the full solution to this Google SQL Interview problem here:
For this scenario, assume that Google wants to analyze the top searched categories on their platform to optimize their search results. We have two tables: one with information about each search, and another that maps every category ID to a category name.
The first table, which logs searches, has the following structure:
search_id | user_id | search_date | category_id | query |
---|---|---|---|---|
1001 | 7654 | 06/01/2024 00:00:00 | 3001 | "chicken recipe" |
1002 | 2346 | 06/02/2024 00:00:00 | 3001 | "vegan meal prep" |
1003 | 8765 | 06/03/2024 00:00:00 | 2001 | "google stocks" |
1004 | 9871 | 07/01/2024 00:00:00 | 1001 | "python tutorial" |
1005 | 8760 | 07/02/2024 00:00:00 | 2001 | "tesla stocks" |
The second table, which maps category IDs to category names, has the following structure:
category_id | category_name |
---|---|
1001 | "Programming Tutorials" |
2001 | "Stock Market" |
3001 | "Recipes" |
4001 | "Sports News" |
The question is: Can you write a SQL query that gives the total count of searches made in each category by month for the available data in the year 2024?
Expected output:
category_name | month | total_searches |
---|---|---|
"Programming Tutorials" | 07 | 1 |
"Stock Market" | 06 | 1 |
"Stock Market" | 07 | 1 |
"Recipes" | 06 | 2 |
You can solve this by joining the two tables and aggregating the searches by category and month:
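Here's a rough sketch in PostgreSQL – the table names `searches` and `categories` are assumptions, since the schemas above aren't named:

```sql
-- Sketch only (PostgreSQL); the table names "searches" and "categories" are assumptions.
SELECT
  c.category_name,
  TO_CHAR(s.search_date, 'MM') AS month,
  COUNT(s.search_id) AS total_searches
FROM searches AS s
JOIN categories AS c
  ON s.category_id = c.category_id
WHERE EXTRACT(YEAR FROM s.search_date) = 2024
GROUP BY c.category_name, TO_CHAR(s.search_date, 'MM')
ORDER BY total_searches DESC;
```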
This query first joins the two tables, then groups the results by category name and the month of the search date. The COUNT function tallies the number of searches in each group, and the WHERE filter ensures only data from the year 2024 is used.
Finally, we order the results by total_searches in descending order to get the categories with the most searches by month at the top.
Denormalization is the process of modifying a database schema in a way that deviates from the typical rules of normalization (1NF, 2NF, 3NF, etc.).
Denormalization is often used to improve the performance of a database, particularly when it is being used for reporting and analytical purposes rather than for Online Transaction Processing (OLTP) workloads.
By duplicating data, denormalization can reduce the number of expensive joins required to retrieve data, which can improve query performance. However, denormalization can also cause problems such as increased data redundancy and the need for more complex update and delete operations.
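As a quick illustration (hypothetical tables, not from any real Google schema), a denormalized reporting table might copy the customer's name onto every order row so that reports can skip the join:

```sql
-- Hypothetical example of denormalization: customer_name is duplicated from a
-- separate customers table so reporting queries can avoid the join.
CREATE TABLE orders_reporting (
  order_id      INT PRIMARY KEY,
  customer_id   INT,
  customer_name TEXT,            -- redundant copy of customers.customer_name
  order_total   NUMERIC(10, 2),
  order_date    DATE
);

-- Reporting query with no join required:
SELECT customer_name, SUM(order_total) AS total_spent
FROM orders_reporting
GROUP BY customer_name;
```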
As a data analyst at Google, you are tasked with examining the Google Ads data for better ad placement and customer targeting. You are asked to retrieve all records of ads from the database that meet the following conditions: the ad's status is 'active', it has more than 500,000 impressions, and it was last updated in a given year (2024 in this example).
Your task is to write a query that filters the records down according to these conditions.
ad_id | name | status | impressions | last_updated |
---|---|---|---|---|
1234 | Google Phone | active | 600000 | 06/25/2024 12:00:00 |
5678 | Google Laptop | inactive | 800000 | 05/18/2024 12:00:00 |
9012 | Google App | active | 300000 | 04/02/2024 12:00:00 |
3456 | Google Cloud | active | 700000 | 08/12/2024 12:00:00 |
7890 | Google Mail | inactive | 550000 | 09/03/2024 12:00:00 |
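A rough sketch in PostgreSQL – the table name `ads` and the 2024 cutoff are assumptions based on the conditions above:

```sql
-- Sketch only (PostgreSQL); the table name "ads" and the 2024 filter are assumptions.
SELECT *
FROM ads
WHERE status = 'active'
  AND impressions > 500000
  AND EXTRACT(YEAR FROM last_updated) = 2024;
```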
The query first identifies rows where the status value is 'active', then further filters the result to rows with more than 500,000 impressions, and finally filters on the year of the last_updated column.
Stored procedures are a lot like functions in programming. They're used to encapsulate and organize business logic into one unit of code, and they can accept multiple input parameters and return multiple output values.
For example, if you were a Data Analyst at Google working on a HR analytics project, you might create a stored procedure to calculate the average salary for a given department:
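Here's a hedged sketch – in PostgreSQL this is written as a function, and the `employees` table with `salary` and `department` columns is an assumption:

```sql
-- Sketch only (PostgreSQL); the "employees" table with "salary" and "department"
-- columns is an assumption. In PostgreSQL, this is written as a function.
CREATE OR REPLACE FUNCTION avg_salary(dept_name TEXT)
RETURNS NUMERIC AS $$
BEGIN
  RETURN (
    SELECT AVG(salary)
    FROM employees
    WHERE department = dept_name
  );
END;
$$ LANGUAGE plpgsql;
```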
To call this stored procedure and find the average salary for the Data Analytics department you'd write the following query:
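Sticking with the hypothetical `avg_salary` function sketched above:

```sql
-- Hypothetical call, assuming the avg_salary function defined above.
SELECT avg_salary('Data Analytics') AS avg_salary;
```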
Google's marketing team is making a Superbowl commercial and needs a simple statistic to put on their TV ad: the median number of searches a person made last year.
However, at Google scale, querying all 2 trillion searches is too costly. Luckily, you have access to a summary table that tells you, for each number of searches made last year, how many Google users fall into that bucket.
Write a query to report the median of searches made by a user. Round the median to one decimal point.
Before we reveal the answer, please try coding this up yourself on DataLemur's SQL Interview platform:
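Here's one way to approach it, as a rough PostgreSQL sketch – the summary table name `search_frequency` is an assumption:

```sql
-- Sketch only (PostgreSQL); the summary table name "search_frequency" is an assumption.
-- Expand each bucket back into one row per user, then take the continuous median.
WITH expanded AS (
  SELECT searches
  FROM search_frequency
  CROSS JOIN GENERATE_SERIES(1, num_users)
)
SELECT ROUND(
  PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY searches)::NUMERIC, 1) AS median
FROM expanded;
```

The trick is to expand the summary table back into one row per user with GENERATE_SERIES, after which PERCENTILE_CONT(0.5) gives the continuous median directly.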
The EXCEPT / MINUS operator is used to return all rows from the first SELECT statement that are not returned by the second SELECT statement.
Note that EXCEPT is available in PostgreSQL and SQL Server, while MINUS is available in Oracle (but don't stress about knowing which DBMS supports what exact commands, since the interviewers at Google should be lenient!).
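For instance, a hypothetical query (the table names `june_searches` and `july_searches` are made up for illustration) that returns users who searched in June but not in July:

```sql
-- Hypothetical example: users present in june_searches but absent from july_searches.
SELECT user_id FROM june_searches
EXCEPT
SELECT user_id FROM july_searches;
```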
As a data analyst on Google Shopping, one of your tasks is to monitor the efficiency of various Google Shopping ads. Specifically, you are interested in the click-through rate (CTR) and the conversion rate (the share of clicks that result in placing an item into the shopping cart).
Given the two data tables below, write a SQL query to assess the click-through rate (CTR) and conversion rate for each ad.
ad_id | user_id | click_date |
---|---|---|
1001 | 123 | 06/08/2024 |
1002 | 265 | 06/10/2024 |
1001 | 362 | 06/18/2024 |
1003 | 192 | 07/26/2024 |
1002 | 981 | 07/05/2024 |
ad_id | user_id | cart_date |
---|---|---|
1001 | 123 | 06/08/2024 |
1003 | 192 | 07/26/2024 |
1002 | 265 | 06/11/2024 |
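A rough sketch in PostgreSQL – the table names `ad_clicks` and `ad_cart_adds` are assumptions, since the schemas above aren't named:

```sql
-- Sketch only (PostgreSQL); the table names "ad_clicks" and "ad_cart_adds" are assumptions.
-- Note: a true CTR would also need impression counts, which aren't in the tables shown above.
SELECT
  c.ad_id,
  COUNT(DISTINCT c.user_id) AS clicks,
  COUNT(DISTINCT a.user_id) AS conversions,
  ROUND(COUNT(DISTINCT a.user_id)::DECIMAL
        / COUNT(DISTINCT c.user_id), 2) AS conversion_rate
FROM ad_clicks AS c
LEFT JOIN ad_cart_adds AS a
  ON c.ad_id = a.ad_id
 AND c.user_id = a.user_id
GROUP BY c.ad_id;
```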
Above, we join the clicks and add-to-cart tables on ad_id and user_id, group by ad_id, and compute the count of distinct user_ids from each table.
The conversion rate is then calculated as the ratio of conversions to clicks!
In case this marketing problem was interesting, check out this blog on how marketing analysts use SQL.
As a data analyst on the advertiser solutions team at Google, your task is to analyze the performance of various ad campaigns running on Google AdWords for an F500 client. You were asked to find the average cost per click (CPC) for each campaign and each ad group within those campaigns for the previous month. CPC is calculated as the total cost of all clicks divided by the number of clicks.
For this task, you have been given access to a table that stores data about each click on the ads.
click_id | date | campaign_id | ad_group_id | clicks | cost |
---|---|---|---|---|---|
4325 | 06/08/2024 | 1302 | 2001 | 50 | 100.00 |
4637 | 06/10/2024 | 1403 | 2002 | 65 | 130.00 |
4876 | 06/18/2024 | 1302 | 2001 | 70 | 140.00 |
4531 | 07/05/2024 | 1604 | 3001 | 80 | 200.00 |
4749 | 07/05/2024 | 1604 | 2002 | 75 | 180.00 |
You'd like to return an output table in the following format:
campaign_id | ad_group_id | avg_CPC |
---|---|---|
1302 | 2001 | 2.00 |
1403 | 2002 | 2.00 |
1604 | 3001 | 2.50 |
1604 | 2002 | 2.40 |
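A rough sketch in PostgreSQL – the table name `ad_campaign_clicks` is an assumption, and a date filter for the previous month could be added to a WHERE clause:

```sql
-- Sketch only (PostgreSQL); the table name "ad_campaign_clicks" is an assumption.
SELECT
  campaign_id,
  ad_group_id,
  ROUND(SUM(cost) / SUM(clicks), 2) AS avg_CPC
FROM ad_campaign_clicks
-- A filter on the "date" column could restrict this to the previous month.
GROUP BY campaign_id, ad_group_id;
```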
This query returns the average cost per click (avg_CPC) for each campaign and ad group by grouping the table by campaign_id and ad_group_id, then calculating the average CPC as per the given formula, i.e., total cost divided by total number of clicks.
p.s. If you have DataLemur Premium, you'll also get access to a similar Google Ad Campaign ROAS problem.
A foreign key is a column or group of columns in a table that refers to the primary key in another table. The foreign key constraint helps maintain referential integrity between the two tables.
Let's examine employee data from Google's HR database:
employee_id | first_name | last_name | manager_id |
---|---|---|---|
1 | Aubrey | Graham | 3 |
2 | Marshal | Mathers | 3 |
3 | Dwayne | Carter | 4 |
4 | Shawn | Carter | |
In this table, employee_id serves as the primary key and manager_id functions as a foreign key because it links to the employee_id of the employee's manager. This establishes a relationship between Google employees and their managers, allowing for easy querying to find an employee's manager or to see which employees report to a specific manager.
The table may also have multiple foreign keys that reference primary keys in other tables. For example, department_id and location_id foreign keys could be used to connect each employee to their respective department and location.
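As a hypothetical illustration, the self-referencing foreign key described above could be declared like this:

```sql
-- Hypothetical DDL illustrating the self-referencing foreign key described above.
CREATE TABLE employees (
  employee_id INT PRIMARY KEY,
  first_name  TEXT,
  last_name   TEXT,
  manager_id  INT REFERENCES employees (employee_id)  -- foreign key to the manager's row
);
```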
As a data analyst at Google on the Android PlayStore team, you are tasked with providing insights into in-app purchases made via the PlayStore.
Write a SQL query to get a list of customers along with their last purchase. The result should contain the customer_id, first name, last name, app, and latest purchase date.
customer_id | first_name | last_name | app |
---|---|---|---|
1 | John | Doe | Tinder |
2 | Jane | Smith | CandyCrush |
3 | Jack | Brown | Fortnite |
4 | Emily | Johnson | Uber |
5 | Jake | Kenny | Google Music |
purchase_id | customer_id | price | date |
---|---|---|---|
101 | 1 | 79.99 | 2024-02-23 |
102 | 2 | 49.99 | 2024-03-18 |
103 | 3 | 89.99 | 2024-06-08 |
104 | 4 | 119.99 | 2024-07-05 |
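A rough sketch in PostgreSQL – the table names `customers` and `purchases` are assumptions, since the schemas above aren't named:

```sql
-- Sketch only (PostgreSQL); the table names "customers" and "purchases" are assumptions.
SELECT
  c.customer_id,
  c.first_name,
  c.last_name,
  c.app,
  MAX(p.date) AS latest_purchase_date
FROM customers AS c
JOIN purchases AS p
  ON c.customer_id = p.customer_id
GROUP BY c.customer_id, c.first_name, c.last_name, c.app;
```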
The query joins the customers and PlayStore purchases tables on the customer_id field. It then groups by the customer's details to show one row per customer. In addition, the MAX function is used to find the latest (most recent) purchase date for each customer.
Because joins come up frequently during SQL interviews, try this interactive Snapchat JOIN SQL interview question:
The Google Data Analytics Certificate isn't enough to land a job at Google. The key to crushing Google SQL interview questions is to practice, practice, and then practice some more! Besides solving the earlier Google SQL interview questions, you should also solve the 200+ DataLemur SQL Interview Questions, which come from FAANG tech companies and tech startups.
Each SQL question has hints to guide you and step-by-step solutions, and crucially, there's an interactive coding environment so you can code up your query right in the browser and have it checked. Please, for the love of god, don't just read the questions & solutions – actually DO the damn problems, otherwise when the Google interviewer asks you a question you'll be like:
To prep for the Google SQL interview it's also wise to solve interview questions from other FAANG tech companies like:
But if your SQL coding skills are weak, forget about going right into solving questions – improve your SQL foundations with this SQL tutorial for Data Analytics.
This tutorial covers SQL topics like grouping by multiple columns and WHERE with AND/OR/NOT – both of these come up frequently in Google interviews.
For the Google Data Science Interview, in addition to SQL query questions, the other types of questions to prepare for are:
If this sounds like a ton of topics... it is! Google can afford to be very thorough, which is why candidates often have to do 5 or more interviews to land a Google Data Science job.
To prepare for Google Data Science interviews read the book Ace the Data Science Interview because it's got:
Completing Data Science Bootcamps and Certifications is another great way to show employers that you're all about learning. Read about how they can help you Ace your next Data Science Interview.
Google looks at more than just your SQL skills. Stay up to date and learn about the top 10 programming languages Data Scientists are using in 2024.