Hone logo
Hone
Problems

Analyzing Sales Trends with Advanced Window Frames in SQL

This challenge focuses on leveraging advanced window frame specifications in SQL to analyze sales data and identify trends over time. Understanding how to use window frames allows for powerful calculations like moving averages, cumulative sums, and percentile analysis without requiring self-joins or complex subqueries, enabling efficient and insightful data exploration. You will be provided with a sales dataset and asked to calculate various metrics using window frame clauses.

Problem Description

You are given a table named sales with the following columns:

  • sale_id (INTEGER): Unique identifier for each sale.
  • product_id (INTEGER): Identifier for the product sold.
  • sale_date (DATE): Date of the sale.
  • sale_amount (DECIMAL): Amount of the sale.

Your task is to write SQL queries to calculate the following metrics using window frame specifications:

  1. 3-Month Moving Average: Calculate the 3-month moving average of sale_amount for each product. The moving average should be calculated over the preceding 3 months, including the current month.
  2. Cumulative Sales: Calculate the cumulative sum of sale_amount for each product up to each sale_date.
  3. Percentile Sales (75th): Calculate the 75th percentile of sale_amount for each product within each month.
  4. Rank Sales by Amount: Rank each sale within each product based on sale_amount in descending order.

The queries should be efficient and utilize window frame clauses effectively. Consider how the ORDER BY, PARTITION BY, and ROWS BETWEEN clauses interact to achieve the desired results.

Examples

Example 1:

Input:
sales table:
sale_id | product_id | sale_date  | sale_amount
------- | ---------- | ---------- | -----------
1       | 101        | 2023-01-15 | 100.00
2       | 101        | 2023-02-20 | 120.00
3       | 101        | 2023-03-10 | 150.00
4       | 101        | 2023-04-05 | 130.00
5       | 101        | 2023-05-12 | 160.00
6       | 102        | 2023-01-25 | 80.00
7       | 102        | 2023-02-10 | 90.00
8       | 102        | 2023-03-05 | 110.00

Output (3-Month Moving Average for product 101):
sale_date  | moving_average
---------- | --------------
2023-01-15 | 100.00
2023-02-20 | 110.00
2023-03-10 | 123.33
2023-04-05 | 133.33
2023-05-12 | 146.67

Explanation: The moving average is calculated by averaging the sale amounts for the current month and the two preceding months.

Example 2:

Input:
sales table:
sale_id | product_id | sale_date  | sale_amount
------- | ---------- | ---------- | -----------
1       | 101        | 2023-01-15 | 100.00
2       | 101        | 2023-02-20 | 120.00
3       | 101        | 2023-03-10 | 150.00
4       | 102        | 2023-01-25 | 80.00
5       | 102        | 2023-02-10 | 90.00

Output (Cumulative Sales for product 101):
sale_date  | cumulative_sales
---------- | ----------------
2023-01-15 | 100.00
2023-02-20 | 220.00
2023-03-10 | 370.00

Explanation: The cumulative sales are calculated by summing the sale amounts up to each sale date for each product.

Example 3:

Input:
sales table:
sale_id | product_id | sale_date  | sale_amount
------- | ---------- | ---------- | -----------
1       | 101        | 2023-01-15 | 100.00
2       | 101        | 2023-01-20 | 120.00
3       | 101        | 2023-02-10 | 150.00
4       | 101        | 2023-02-15 | 130.00

Output (Percentile Sales (75th) for product 101 in January 2023):
sale_date  | percentile_75
---------- | --------------
2023-01-15 | 110.00
2023-01-20 | 110.00

Explanation: The 75th percentile is calculated for each month within each product.  In January 2023 for product 101, the sale amounts are 100 and 120. The 75th percentile is 110.

Constraints

  • The sales table will contain at least 10 rows.
  • sale_date will be a valid date format.
  • sale_amount will be a non-negative decimal number.
  • Queries should be optimized for performance, avoiding unnecessary self-joins or subqueries.
  • The database system is assumed to be a standard SQL implementation (e.g., PostgreSQL, MySQL, SQL Server).

Notes

  • Pay close attention to the ORDER BY clause within the window frame specification. The order of the data significantly impacts the calculated metrics.
  • The PARTITION BY clause is crucial for calculating metrics separately for each product.
  • Consider using the ROWS BETWEEN clause to define the window frame precisely (e.g., preceding 2 months and the current month for the moving average).
  • Percentile calculations might require database-specific functions (e.g., PERCENTILE_CONT in PostgreSQL). Adapt your query accordingly.
  • For ranking, consider the use of DENSE_RANK() or RANK() depending on whether you want to handle ties differently.
  • Test your queries thoroughly with various input scenarios, including edge cases like missing data or unusual sales patterns. Pseudocode for the 3-Month Moving Average:
// Assuming a SQL database
SELECT
    product_id,
    sale_date,
    AVG(sale_amount) OVER (PARTITION BY product_id ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_average
FROM
    sales;

Pseudocode for Cumulative Sales:

// Assuming a SQL database
SELECT
    product_id,
    sale_date,
    SUM(sale_amount) OVER (PARTITION BY product_id ORDER BY sale_date) AS cumulative_sales
FROM
    sales;

Pseudocode for Percentile Sales (75th):

// Assuming a SQL database and a percentile function like PERCENTILE_CONT
SELECT
    product_id,
    sale_date,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY sale_amount) OVER (PARTITION BY product_id, DATE_TRUNC('month', sale_date)) AS percentile_75
FROM
    sales;

Pseudocode for Ranking Sales by Amount:

// Assuming a SQL database
SELECT
    sale_id,
    product_id,
    sale_date,
    sale_amount,
    RANK() OVER (PARTITION BY product_id ORDER BY sale_amount DESC) AS sale_rank
FROM
    sales;
Loading editor...
plaintext