At Naver, SQL does the heavy lifting for extracting and analyzing the user behavior data used to improve its search engine, which serves roughly 75% of the South Korean market (about 45 million users). Because of this, Naver frequently asks SQL coding questions in interviews for Data Science and Data Engineering positions.
So, to help you prepare for the Naver SQL interview, we've curated 9 Naver SQL interview questions – how many can you solve?
As a data analyst at Naver, you have been given access to the reviews data that contains the reviews submitted by users for various products on the platform. Your task is to write a SQL query that calculates the average rating for each product on a monthly basis. Product ratings range from 1 to 5 (5 being the highest).
review_id | user_id | submit_date | product_id | stars |
---|---|---|---|---|
6171 | 123 | 2022-06-08 | 50001 | 4 |
7802 | 265 | 2022-06-10 | 69852 | 4 |
5293 | 362 | 2022-06-18 | 50001 | 3 |
6352 | 192 | 2022-07-26 | 69852 | 3 |
4517 | 981 | 2022-07-05 | 69852 | 2 |
mth | product_id | avg_stars |
---|---|---|
6 | 50001 | 3.50 |
6 | 69852 | 4.00 |
7 | 69852 | 2.50 |
Here is one PostgreSQL query that achieves the desired result (a sketch, assuming the data lives in a table named reviews):
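```sql
-- Monthly average star rating per product (table name "reviews" is assumed)
SELECT
  EXTRACT(MONTH FROM submit_date) AS mth,
  product_id,
  ROUND(AVG(stars), 2) AS avg_stars
FROM reviews
GROUP BY EXTRACT(MONTH FROM submit_date), product_id
ORDER BY mth, product_id;
```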
This query uses the EXTRACT function to get the month from submit_date. It then groups the data by this month and product_id (so there is one row per product per month) and calculates the average stars using the AVG function. The result is sorted by month and product_id for easier readability.
p.s. Window functions show up super often during SQL interviews, so practice the 27+ window function questions on DataLemur
Naver is a multinational technology company which offers various services, including a blogging platform. The company is interested in gauging user activity on the blogs. Suppose that Naver records each time a user writes a blog post or a comment in an activity table. The data fields include a unique activity_id, the user_id, the activity_type ('Blog Post' or 'Comment'), the activity_date, the blog_id, and the comment_id. The blog_id refers to the identifier of a blog post the user wrote, and the comment_id refers to a blog post on which the user has commented.
A business question arises: "Who are the top 3 most active users, in terms of posting blogs and comments, for every month during 2022?" Active is defined as having the highest total number of posts and comments. If there's a tie, earlier-registered users get preference.
Provide a SQL query that solves this problem, assuming a PostgreSQL database.
activity_id | user_id | activity_type | activity_date | blog_id | comment_id |
---|---|---|---|---|---|
456 | 301 | Blog Post | 01/02/2022 00:00:00 | 78 | null |
457 | 301 | Comment | 02/10/2022 00:00:00 | null | 60 |
458 | 302 | Blog Post | 03/15/2022 00:00:00 | 81 | null |
459 | 303 | Blog Post | 03/18/2022 00:00:00 | 82 | null |
460 | 303 | Comment | 04/18/2022 00:00:00 | null | 61 |
461 | 304 | Blog Post | 02/20/2022 00:00:00 | 83 | null |
462 | 304 | Comment | 02/23/2022 00:00:00 | null | 62 |
463 | 305 | Blog Post | 05/11/2022 00:00:00 | 84 | null |
464 | 303 | Comment | 06/01/2022 00:00:00 | null | 63 |
465 | 305 | Blog Post | 07/17/2022 00:00:00 | 85 | null |
466 | 305 | Comment | 08/19/2022 00:00:00 | null | 64 |
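One possible PostgreSQL query, sketched below; the table name user_activity and the CTE names are illustrative:

```sql
WITH monthly_activity AS (
  -- Total posts + comments per user per month in 2022
  SELECT
    DATE_TRUNC('month', activity_date) AS activity_month,
    user_id,
    COUNT(*) AS total_activities
  FROM user_activity
  WHERE EXTRACT(YEAR FROM activity_date) = 2022
  GROUP BY DATE_TRUNC('month', activity_date), user_id
),
ranked_users AS (
  -- Rank users within each month; ties go to the lower (earlier-registered) user_id
  SELECT
    activity_month,
    user_id,
    total_activities,
    ROW_NUMBER() OVER (
      PARTITION BY activity_month
      ORDER BY total_activities DESC, user_id ASC
    ) AS activity_rank
  FROM monthly_activity
)
SELECT
  activity_month,
  user_id,
  total_activities
FROM ranked_users
WHERE activity_rank <= 3
ORDER BY activity_month, activity_rank;
```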
This SQL query uses common table expressions (CTEs) and a window function to determine the top 3 users with the most activities (blog posts plus comments) for each month in 2022. The monthly_activity CTE groups the activities by user and month, while the ranked_users CTE ranks these users by their total activities per month. In case of ties, it gives preference to users with lower IDs (assuming lower IDs represent earlier-registered users). The final SELECT keeps only the top 3 users for each month.
A foreign key is a column or set of columns in a table that references the primary key of another table. It is used to link the two tables together and to ensure that the data in the foreign key column is valid.
The FOREIGN KEY constraint helps to ensure the integrity of the data in the database by preventing the insertion of rows in the foreign key table that do not have corresponding entries in the primary key table. It also helps to enforce the relationship between the two tables, and can be used to ensure that data is not deleted from the primary key table if there are still references to it in the foreign key table.
For example, if you have a table of Naver customers and an orders table, the customer_id column in the orders table could be a foreign key that references the id column (the primary key) in the Naver customers table.
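For instance, a sketch of how this relationship could be declared (table and column names are illustrative):

```sql
-- orders.customer_id is a foreign key referencing customers.id
CREATE TABLE customers (
  id   INTEGER PRIMARY KEY,
  name TEXT
);

CREATE TABLE orders (
  order_id    INTEGER PRIMARY KEY,
  customer_id INTEGER REFERENCES customers (id),
  order_date  DATE
);
```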
Naver, a South Korean online platform operated by Naver Corporation, provides various services such as search, e-commerce, and news. For this question, assume that you are a data analyst at Naver with access to a table that tracks user activity on various Naver services every day. This table has the following structure:
activity_id | user_id | activity_date | service_id | usage_time (in minutes) |
---|---|---|---|---|
9171 | 123 | 06/08/2022 | 5001 | 44 |
8902 | 265 | 06/10/2022 | 9852 | 34 |
8293 | 362 | 06/18/2022 | 5001 | 53 |
9352 | 192 | 07/26/2022 | 9852 | 23 |
8517 | 981 | 07/05/2022 | 9852 | 42 |
You have been asked to write an SQL query to find out the average usage time per day for each service in the month of June 2022.
mth | service_id | avg_usage_time |
---|---|---|
6 | 5001 | 48.50 |
6 | 9852 | 34.00 |
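One possible PostgreSQL query, assuming the table is named daily_activity:

```sql
-- Average usage time per service for June 2022 (table name "daily_activity" is assumed)
SELECT
  EXTRACT(MONTH FROM activity_date) AS mth,
  service_id,
  ROUND(AVG(usage_time), 2) AS avg_usage_time
FROM daily_activity
WHERE activity_date >= DATE '2022-06-01'
  AND activity_date <  DATE '2022-07-01'
GROUP BY EXTRACT(MONTH FROM activity_date), service_id
ORDER BY service_id;
```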
This query utilizes the GROUP BY clause to produce a result set grouped by the month (extracted from activity_date) and the service_id. The AVG function is then applied within each group to calculate the average usage_time, providing the average usage time for each service in June 2022.
NULLs are NOT the same as zero or blank spaces in SQL. NULLs are used to represent a missing value or the absence of a value, whereas zero and blank spaces are legitimate values.
It's important to handle NULLs carefully, because they can easily mess up your analysis. For example, if you compare a NULL value using the = operator, the result will always be NULL (because nothing can be compared to NULL, not even another NULL). That's why many SQL analyses start by handling NULLs, for example with the COALESCE() function or an IS NULL / IS NOT NULL filter.
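For example, a quick sketch (the orders table and its columns are hypothetical):

```sql
-- "orders", "shipped_date", and "order_date" are hypothetical names
SELECT * FROM orders WHERE shipped_date = NULL;   -- always returns zero rows
SELECT * FROM orders WHERE shipped_date IS NULL;  -- correctly finds the missing values

-- COALESCE() substitutes a fallback value wherever shipped_date is NULL
SELECT COALESCE(shipped_date, order_date) AS effective_date
FROM orders;
```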
At Naver, it is often important to find specific customer records for communication and marketing purposes. Assume that you're given a customer database and are asked to find all customers whose last name starts with 'Kim' and who live in 'Seoul'.
customer_id | first_name | last_name | city |
---|---|---|---|
385 | Ha-Jin | Kim | Seoul |
971 | Sun-Yeong | Park | Busan |
110 | Sing-Hwa | Lee | Gwangju |
501 | Kye-Sook | Choi | Seoul |
608 | Seo-Hyeon | Kim | Seoul |
492 | Hae-Won | Park | Seoul |
721 | Chan-Ho | Kim | Daegu |
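A minimal sketch, using the customers table from the example data:

```sql
-- Customers whose last name starts with 'Kim' and who live in Seoul
SELECT *
FROM customers
WHERE last_name LIKE 'Kim%'
  AND city = 'Seoul';
```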
To answer this question, we use PostgreSQL's LIKE keyword to filter records based on a string pattern. We select all records (*) from the customers table where the last_name column matches the pattern 'Kim%' and the city column equals 'Seoul'. The '%' symbol is a wildcard that matches any sequence of characters, so 'Kim%' matches every last name that starts with 'Kim'.
No, in almost all cases, and for all practical purposes, a JOIN and a UNION do NOT produce the same result.
While both combine data from two tables, you can think of a join as increasing the width of the result set (you'll have more columns in the result for a left/inner/right join), whereas a union combines rows, increasing the height of the result set while keeping the column count the same.
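For example, a quick sketch with hypothetical employees, departments, and contractors tables:

```sql
-- JOIN: widens the result by matching rows across tables
SELECT e.name, d.department_name
FROM employees e
JOIN departments d ON e.department_id = d.department_id;

-- UNION: stacks rows from two queries that share the same column layout
SELECT name FROM employees
UNION
SELECT name FROM contractors;
```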
Given the transactions table, which records the amount of every transaction on Naver's e-commerce platform, write a SQL query that calculates the average amount spent by each customer across all of their transactions, rounds that average to zero decimal places, and adds a column that marks a customer as "POWER_Spender" when the average exceeds 90000, using a conditional expression and the POWER() function.
Here is the example content of the transactions table:
transaction_id | customer_id | transaction_date | amount_spent |
---|---|---|---|
1 | 987 | 06/08/2022 00:00:00 | 95000 |
2 | 123 | 06/10/2022 00:00:00 | 98000 |
3 | 987 | 06/18/2022 00:00:00 | 93000 |
4 | 123 | 07/26/2022 00:00:00 | 91000 |
5 | 192 | 07/05/2022 00:00:00 | 96000 |
customer_id | average_spent | spender_type |
---|---|---|
123 | 94500 | POWER_Spender |
192 | 96000 | POWER_Spender |
987 | 94000 | POWER_Spender |
Here is one possible PostgreSQL answer, sketched against the transactions table from the example data:
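```sql
-- Average spend per customer, labeled via a CASE expression and a POWER()-based check
SELECT
  customer_id,
  ROUND(AVG(amount_spent), 0) AS average_spent,
  CASE
    WHEN POWER(AVG(amount_spent), 2) > POWER(90000, 2) THEN 'POWER_Spender'
    ELSE 'Regular_Spender'
  END AS spender_type
FROM transactions
GROUP BY customer_id
ORDER BY customer_id;
```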
This query calculates the average spend for each customer and rounds it to zero decimal places. It then uses the POWER() function to check whether the square of the average amount spent is greater than 90000 squared, indicating a high-spending customer, labeling them 'POWER_Spender' and everyone else 'Regular_Spender'. We use the GROUP BY clause to group all records with the same customer_id together.
The two most similar questions to this Naver e-commerce SQL question, based on the requirements of calculating averages, rounding, and categorizing/grouping users, are:
"Average Review Ratings" by Amazon: This problem also deals with calculating averages (of review ratings), and aggregates data on a category level (each product per month).
"User's Third Transaction" by Uber: This problem deals with transactions and might provide useful insight on handling transaction data, even though it uses a RANK function instead of average function.
To practice a very similar question try this interactive Amazon Average Review Ratings Question which is similar for average calculation and categorization or this Uber User's Third Transaction Question which is similar for transaction data handling.
Naver, one of the largest Internet companies in South Korea, would like to analyze the monthly traffic to one of its websites. Page views for users who visit the site are recorded in a table of page clicks. Each row in the table corresponds to a user's click on the website on a specific day.
You are expected to write a SQL query that calculates the number of page views for this specific website on a monthly basis, from the beginning of the year until the present month (assume the current month is August 2022).
click_id | click_date | user_id | page_id |
---|---|---|---|
3125 | 02/02/2022 | 9231 | 1089 |
5930 | 02/05/2022 | 8620 | 1089 |
6349 | 03/06/2022 | 5276 | 1089 |
8496 | 03/10/2022 | 3982 | 1089 |
2793 | 04/25/2022 | 7863 | 1089 |
Note: A single user can click the page several times in a month, and each click will be recorded as a separate row.
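One possible PostgreSQL query, assuming the table is named page_clicks and following the monthly-count interpretation described in the explanation below:

```sql
-- Page views per month for page 1089, January through August 2022
-- (table name "page_clicks" is assumed)
SELECT
  TO_CHAR(click_date, 'YYYY-MM') AS mth,
  COUNT(*) AS monthly_page_views
FROM page_clicks
WHERE page_id = 1089
  AND click_date >= DATE '2022-01-01'
  AND click_date <  DATE '2022-09-01'
GROUP BY TO_CHAR(click_date, 'YYYY-MM')
ORDER BY mth;
```

If the interviewer wants a single figure, these monthly counts could be averaged with an outer AVG over this result.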
This query fetches and counts the click events for the page with id 1089 between January and August 2022. It then groups the views by month, using the TO_CHAR function to format the click_date as 'YYYY-MM', giving the monthly page-view totals.
The key to acing a Naver SQL interview is to practice, practice, and then practice some more!
Beyond just solving the above Naver SQL interview questions, you should also solve the 200+ SQL questions from real Data Science & Analytics interviews which come from companies like Microsoft, Google, and Facebook.
Each SQL question has multiple hints, detailed solutions, and, best of all, an online SQL coding environment so you can run your SQL queries right in the browser and have them executed instantly.
To prep for the Naver SQL interview it is also a great idea to practice interview questions from other tech companies like:
But if your SQL skills are weak, forget about going right into solving questions – improve your SQL foundations with this free SQL for Data Analytics course.
This tutorial covers SQL concepts such as filtering groups with HAVING and filtering strings using LIKE – both of these come up often during SQL job interviews at Naver.
In addition to SQL query questions, the other question categories to prepare for the Naver Data Science Interview are:
The best way to prepare for Naver Data Science interviews is by reading Ace the Data Science Interview. The book's got: