Hone logo
Hone
Problems

Hierarchical Data Traversal with Recursive CTEs

Many datasets represent hierarchical relationships, such as organizational charts, file system structures, or product categories. This challenge asks you to write a SQL query utilizing a recursive Common Table Expression (CTE) to traverse and extract data from such a hierarchy. Successfully solving this problem demonstrates proficiency in handling complex data structures within SQL.

Problem Description

You are given a table named Categories representing a hierarchical category structure. The table has the following columns:

  • CategoryID (INT): Unique identifier for each category.
  • CategoryName (VARCHAR): Name of the category.
  • ParentCategoryID (INT): The CategoryID of the parent category. A value of NULL indicates the category is a root category (no parent).

Your task is to write a recursive CTE that retrieves all categories within a specified root category, along with their depth (distance from the root). The query should return the CategoryID, CategoryName, and Depth for each category reachable from the given root.

Key Requirements:

  • The query must use a recursive CTE.
  • The query must correctly handle root categories (those with ParentCategoryID as NULL).
  • The query must calculate the depth of each category relative to the root category. The root category should have a depth of 0.
  • The query must return all descendants of the root category, not just immediate children.

Expected Behavior:

Given a root CategoryID, the query should return a result set containing all categories that are descendants of that root, ordered by depth (ascending) and then by CategoryID (ascending) within each depth level.

Edge Cases to Consider:

  • The root category might not exist in the table. In this case, return an empty result set.
  • The table might be empty. In this case, return an empty result set.
  • Circular dependencies (e.g., Category A is a parent of Category B, and Category B is a parent of Category A) – the CTE should terminate gracefully without infinite recursion. Most SQL implementations have a recursion limit to prevent this.

Examples

Example 1:

Categories Table:
CategoryID | CategoryName | ParentCategoryID
-----------|--------------|-----------------
1          | Electronics  | NULL
2          | TVs          | 1
3          | Smartphones  | 1
4          | Samsung TVs  | 2
5          | Apple iPhones | 3

Input: RootCategoryID = 1
Output:
CategoryID | CategoryName | Depth
-----------|--------------|-------
1          | Electronics  | 0
2          | TVs          | 1
3          | Smartphones  | 1
4          | Samsung TVs  | 2
5          | Apple iPhones | 2

Explanation: The query starts at Electronics (CategoryID 1) and recursively finds all its descendants, calculating their depth.

Example 2:

Categories Table:
CategoryID | CategoryName | ParentCategoryID
-----------|--------------|-----------------
1          | Electronics  | NULL
2          | TVs          | 1
3          | Smartphones  | 1

Input: RootCategoryID = 4 (does not exist)
Output:
(Empty Result Set)

Explanation: Since CategoryID 4 does not exist, the query returns an empty result set.

Example 3: (Circular Dependency - handled gracefully)

Categories Table:
CategoryID | CategoryName | ParentCategoryID
-----------|--------------|-----------------
1          | A            | NULL
2          | B            | 1
3          | C            | 2
4          | A            | 3  -- Circular dependency: A -> B -> C -> A

Input: RootCategoryID = 1
Output:
CategoryID | CategoryName | Depth
-----------|--------------|-------
1          | A            | 0
2          | B            | 1
3          | C            | 2

Explanation: The query will traverse the hierarchy until it hits the recursion limit, preventing an infinite loop. It will return the categories reachable before the limit is reached.

Constraints

  • The CategoryID is always a positive integer.
  • The CategoryName is a non-empty string.
  • The ParentCategoryID can be NULL or a valid CategoryID.
  • The maximum depth of the hierarchy is 10. (This is a practical constraint to prevent excessive recursion and potential performance issues. Your query should be designed to handle this depth.)
  • The table Categories will contain at most 1000 rows.
  • The query should execute within 5 seconds on a standard database server.

Notes

  • Consider using a UNION ALL to combine the base case (root category) and the recursive case.
  • The Depth calculation is crucial. Ensure it accurately reflects the distance from the root.
  • Pay close attention to the order of operations within the recursive CTE.
  • Test your query thoroughly with various root categories, including root categories that do not exist and cases with circular dependencies.
  • The specific SQL syntax might vary slightly depending on the database system (e.g., MySQL, PostgreSQL, SQL Server), but the core concept of a recursive CTE remains the same. Assume a standard SQL implementation.
  • Pseudocode for the recursive CTE structure:
WITH RECURSIVE CategoryHierarchy AS (
    -- Base Case: Select the root category
    SELECT CategoryID, CategoryName, 0 AS Depth
    FROM Categories
    WHERE CategoryID = <RootCategoryID>

    UNION ALL

    -- Recursive Case: Select children of categories already in the CTE
    SELECT c.CategoryID, c.CategoryName, ch.Depth + 1 AS Depth
    FROM Categories c
    JOIN CategoryHierarchy ch ON c.ParentCategoryID = ch.CategoryID
)
SELECT CategoryID, CategoryName, Depth
FROM CategoryHierarchy
ORDER BY Depth, CategoryID;
Loading editor...
plaintext