Back to questions
When accessing Accenture's retailer client's database, you observe that the category column in products table contains null values.
Write a query that returns the updated product table with all the category values filled in, taking into consideration the assumption that the first product in each category will always have a defined category value.
Assumptions:
Effective April 16, 2023, the question, assumptions, solution and hints were revised.
Column Name | Type |
---|---|
product_id | integer |
category | varchar |
name | varchar |
product_id | category | name |
---|---|---|
1 | Shoes | Sperry Boat Shoe |
2 | Adidas Stan Smith | |
3 | Vans Authentic | |
4 | Jeans | Levi 511 |
5 | Wrangler Straight Fit | |
6 | Shirts | Lacoste Classic Polo |
7 | Nautica Linen Shirt |
product_id | category | name |
---|---|---|
1 | Shoes | Sperry Boat Shoe |
2 | Shoes | Adidas Stan Smith |
3 | Shoes | Vans Authentic |
4 | Jeans | Levi 511 |
5 | Jeans | Wrangler Straight Fit |
6 | Shirts | Lacoste Classic Polo |
7 | Shirts | Nautica Linen Shirt |
Shoes will replace all values below the product Sperry Boat Shoe until Jeans appears. Similarly, Jeans will replace s for the product Wrangler Straight Fit, and so on.
The dataset you are querying against may have different input & output - this is just an example!
Step 1: Assigning Numbers to Categories using Window Function
We can start by creating an additional column to update the categories for products that have null values.
To achieve this, we can use the function as a window function, which assigns a numeric value to each category type. The function calculates the number of rows with non-null values in the column and the window function allows us to apply it to the entire result set.
The output of the query generates a numbered category for each row in the column.
For example, the first 5 rows of the output will look like this:
product_id | category | name | numbered_category |
---|---|---|---|
1 | Shoes | Sperry Boat Shoe | 1 |
2 | NULL | Adidas Stan Smith | 1 |
3 | NULL | Vans Authentic | 1 |
4 | Jeans | Levi 511 | 2 |
5 | NULL | Wrangler Straight Fit | 2 |
Product ID 1 is categorized as 'Shoes' and assigned the value of 1 in the column. For Product IDs 2 and 3, which have no category assigned, the counter remains at 1 in the column. However, Product ID 4 is labeled as 'Jeans', which is a non-null value, and thus the counter gets increased to 2 for this product.
Step 2: Filling Missing Categories using COALESCE() Function
Next, we fill the missing categories using the function.
How function works with two expressions:
For example, product IDs 1, 2, and 3 are assigned the value of 1 in the column. Since the maximum value within this partition is also 1, the function assigns the corresponding category value to product IDs 2 and 3, filling in the missing values based on the category of product ID 1.
Here's the final results:
product_id | category | name |
---|---|---|
1 | Shoes | Sperry Boat Shoe |
2 | Shoes | Adidas Stan Smith |
3 | Shoes | Vans Authentic |
4 | Jeans | Levi 511 |
5 | Jeans | Wrangler Straight Fit |
You may find this Stack Overflow post helpful as it provides a concise explanation of a similar question.
Solution #2: Using subquery