Hone logo
Hone
Problems

Building a Suffix Tree in JavaScript

Suffix trees are powerful data structures with numerous applications in string processing, such as pattern matching, finding the longest common substring, and bioinformatics. This challenge will guide you through implementing a suffix tree from scratch in JavaScript.

Problem Description

Your task is to implement a suffix tree data structure in JavaScript. A suffix tree for a string $S$ is a compressed trie of all suffixes of $S$. Each edge in the tree is labeled with a substring of $S$, and each leaf node represents a unique suffix.

Key Requirements:

  • Node Structure: Design a node structure for your suffix tree. Each node should be able to store references to its children and potentially information about the substring represented by the edge leading to it.
  • Insertion: Implement a method to insert a string into the suffix tree. This involves traversing the tree, splitting edges, and creating new nodes as needed.
  • Construction: Implement a function that takes a string and constructs its corresponding suffix tree.
  • Suffix Representation: Ensure that each leaf node uniquely identifies a suffix of the original string (e.g., by storing the starting index of the suffix).

Expected Behavior:

The SuffixTree class should have a method like build(text) which takes a string and populates the tree with all its suffixes. The internal representation should correctly reflect the compressed trie structure.

Edge Cases:

  • Empty string input.
  • String with repeating characters.
  • String with only one character.

Examples

Example 1:

Input: "banana"

Output: A constructed suffix tree representing all suffixes of "banana".

  • Suffixes:
    • "banana" (index 0)
    • "anana" (index 1)
    • "nana" (index 2)
    • "ana" (index 3)
    • "na" (index 4)
    • "a" (index 5)

A visual representation of the suffix tree would show:

The root with children. For instance, an edge labeled "a" leading to a node, which might further branch to "na" and then "na". Another edge from the root labeled "b" leading to "anana". And so on.

Example 2:

Input: "abcabxabc"

Output: A constructed suffix tree representing all suffixes of "abcabxabc".

  • Suffixes:
    • "abcabxabc" (index 0)
    • "bcabxabc" (index 1)
    • "cabxabc" (index 2)
    • "abxabc" (index 3)
    • "bxabc" (index 4)
    • "xabc" (index 5)
    • "abc" (index 6)
    • "bc" (index 7)
    • "c" (index 8)

Example 3: Edge Case - Empty String

Input: ""

Output: An empty suffix tree (e.g., a root node with no children).

Constraints

  • The input string will consist of lowercase English alphabet characters.
  • The maximum length of the input string is 1000 characters.
  • The implementation should be reasonably efficient. While linear time construction algorithms exist (like Ukkonen's), a simpler $O(N^2)$ construction is acceptable for this challenge if it demonstrates understanding of the core concepts.

Notes

  • Consider how to represent the substrings on the edges. Storing start and end indices of the substring within the original text is often more efficient than storing the actual substring.
  • A common approach to suffix tree construction is Ukkonen's algorithm, which achieves linear time complexity. However, understanding the basic insertion logic for each suffix is a good starting point. You might consider a naive $O(N^2)$ insertion approach first, where you insert each suffix one by one.
  • Think about how to handle shared prefixes between suffixes. This is where edge splitting and new node creation become crucial.
  • The goal is to build the suffix tree itself, not necessarily to implement specific algorithms that use the suffix tree (like pattern matching).
Loading editor...
javascript