Ranking Products by Popularity and Category
This challenge focuses on using SQL window functions, specifically ROW_NUMBER() and RANK(), to analyze and rank product data. You'll be tasked with assigning ranks to products based on their sales volume, first overall and then within their respective categories. This is a common requirement in business intelligence and data analysis for identifying top performers and understanding relative performance.
Problem Description
You are provided with a table named Products containing information about individual products. The table has the following columns:
product_id(INTEGER): Unique identifier for each product.product_name(VARCHAR): The name of the product.category(VARCHAR): The category to which the product belongs.sales_volume(INTEGER): The total number of units sold for the product.
Your goal is to produce a report that includes:
- Overall Rank: Assign a rank to each product based on its
sales_volumein descending order (highest sales volume gets rank 1). UseROW_NUMBER()for this, as it assigns a unique sequential integer to each row within its partition, ensuring no ties in rank. - Rank within Category: Assign a rank to each product based on its
sales_volumewithin itscategory, also in descending order. UseRANK()for this, as it allows for ties in rank if products within the same category have the samesales_volume.
The output should contain the product_id, product_name, category, sales_volume, the calculated overall_rank, and the calculated category_rank.
Key Requirements:
- Use the
ROW_NUMBER()window function for the overall ranking. - Use the
RANK()window function for the category-specific ranking. - Order the rankings by
sales_volumein descending order. - The
ROW_NUMBER()function should not have a partition. - The
RANK()function should be partitioned bycategory.
Expected Behavior:
- Products with higher
sales_volumeshould receive a lower (better) rank. - In the
overall_rank, each product will have a distinct rank. - In the
category_rank, products within the same category that have the samesales_volumewill share the same rank, and the next rank will be skipped accordingly (e.g., 1, 2, 2, 4).
Edge Cases:
- Products with zero sales volume: These should be ranked accordingly, likely at the bottom of the overall and category rankings.
- Multiple products with identical sales volume:
ROW_NUMBER()will assign distinct sequential numbers.RANK()will assign them the same rank.
Examples
Example 1:
Input Products table:
| product_id | product_name | category | sales_volume |
|---|---|---|---|
| 101 | Laptop | Electronics | 500 |
| 102 | Keyboard | Electronics | 300 |
| 103 | Mouse | Electronics | 300 |
| 104 | T-Shirt | Apparel | 700 |
| 105 | Jeans | Apparel | 400 |
| 106 | Hat | Apparel | 700 |
Output:
| product_id | product_name | category | sales_volume | overall_rank | category_rank |
|---|---|---|---|---|---|
| 104 | T-Shirt | Apparel | 700 | 1 | 1 |
| 106 | Hat | Apparel | 700 | 2 | 1 |
| 101 | Laptop | Electronics | 500 | 3 | 2 |
| 105 | Jeans | Apparel | 400 | 4 | 3 |
| 102 | Keyboard | Electronics | 300 | 5 | 4 |
| 103 | Mouse | Electronics | 300 | 6 | 4 |
Explanation:
overall_rank: Products are ranked solely bysales_volumedescending. T-Shirt and Hat are tied in sales but get consecutive ranks due toROW_NUMBER().category_rank:- In 'Apparel', T-Shirt and Hat both have 700 sales, so they both get rank 1. Jeans has 400 sales, so it gets rank 3 (since ranks 1 and 2 are accounted for by the top two items).
- In 'Electronics', Laptop has 500 sales (rank 2 within category), Keyboard and Mouse both have 300 sales (rank 4 within category, as rank 3 is skipped because of the tie).
Example 2:
Input Products table:
| product_id | product_name | category | sales_volume |
|---|---|---|---|
| 201 | Book A | Books | 150 |
| 202 | Book B | Books | 250 |
| 203 | Book C | Books | 150 |
| 301 | Coffee Mug | Home Goods | 400 |
| 302 | Picture Frame | Home Goods | 400 |
| 303 | Candle | Home Goods | 350 |
Output:
| product_id | product_name | category | sales_volume | overall_rank | category_rank |
|---|---|---|---|---|---|
| 301 | Coffee Mug | Home Goods | 400 | 1 | 1 |
| 302 | Picture Frame | Home Goods | 400 | 2 | 1 |
| 303 | Candle | Home Goods | 350 | 3 | 3 |
| 202 | Book B | Books | 250 | 4 | 1 |
| 201 | Book A | Books | 150 | 5 | 2 |
| 203 | Book C | Books | 150 | 6 | 2 |
Explanation:
overall_rank: Coffee Mug and Picture Frame are tied for highest sales (400) but get distinct overall ranks.category_rank:- In 'Home Goods', Coffee Mug and Picture Frame share rank 1. Candle, with 350 sales, gets rank 3.
- In 'Books', Book B has 250 sales (rank 1). Book A and Book C, with 150 sales, share rank 2.
Constraints
product_idwill be unique for each product.sales_volumewill be a non-negative integer.- The number of products can range from 1 to 10,000.
- The number of unique categories can range from 1 to 100.
- The SQL query should execute efficiently, aiming for a performance that can handle the given constraints.
Notes
- Consider the order of operations for window functions. The
ORDER BYclause within theOVER()clause is crucial for determining the ranking. - Pay close attention to the difference between
ROW_NUMBER()andRANK(), particularly how they handle ties. - You will need to select all the original columns plus the two new rank columns.
- The final output does not need to be sorted in any specific order, but it's often helpful to display it in a logical order for verification.