Hone

Implement Text Summarization with Jest Testing

This challenge focuses on developing a robust text summarization function in TypeScript and then rigorously testing it using Jest. Text summarization is a crucial NLP task, enabling users to quickly grasp the main points of lengthy documents, improving information access and efficiency.

Problem Description

You need to implement a function summarizeText that takes a long string of text as input and returns a concise summary of that text. The summary should aim to capture the most important sentences from the original text. You will also be responsible for writing comprehensive unit tests for this function using Jest.

Key Requirements:

  1. Summarization Logic: Implement a basic text summarization algorithm. For this challenge, a simple extractive approach, which selects sentences from the original text rather than generating new ones, will suffice. This could involve:
    • Tokenizing the text into sentences.
    • Scoring sentences based on a chosen metric (e.g., presence of keywords, frequency of words, position in the text).
    • Selecting the top N sentences to form the summary.
  2. summarizeText Function: Create a TypeScript function named summarizeText that accepts two arguments and returns a string containing the summary:
    • text: A string representing the input document.
    • numSentences (optional): A number representing the desired number of sentences in the summary. If it is not provided, a default value (e.g., 3) should be used.
  3. Jest Unit Tests: Write a suite of Jest tests to cover various scenarios for your summarizeText function. This should include:
    • Tests for basic functionality with typical input.
    • Tests for edge cases such as empty input, very short input, and requests for more sentences than available.
    • Tests to verify the number of sentences in the output.
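The three steps above (split, score, select) can be sketched as a single function. This is a minimal sketch under stated assumptions, not a reference solution: it scores sentences by word frequency, uses a small illustrative stop-word list (STOP_WORDS), assumes a default of 3 (DEFAULT_NUM_SENTENCES), and introduces hypothetical helpers named splitSentences and tokenize. Its picks will not necessarily match the sample outputs under Examples, since those depend on the scoring you choose.

```typescript
// Assumed stop-word list (illustrative only; extend it for real use).
const STOP_WORDS = new Set<string>([
  "a", "an", "the", "is", "are", "was", "of", "and", "or", "to",
  "in", "on", "for", "by", "it", "this", "that", "we", "with",
]);

// Assumed default; the challenge only suggests "e.g., 3".
const DEFAULT_NUM_SENTENCES = 3;

// Split after '.', '!' or '?', keeping each delimiter with its sentence.
// A trailing fragment with no delimiter is kept as a final sentence.
function splitSentences(text: string): string[] {
  const parts: string[] = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Lowercase word tokens (ASCII letters, digits and apostrophes only).
function tokenize(sentence: string): string[] {
  return sentence.toLowerCase().match(/[a-z0-9']+/g) ?? [];
}

function summarizeText(text: string, numSentences: number = DEFAULT_NUM_SENTENCES): string {
  const sentences = splitSentences(text);
  // Clamp the request to [0, number of available sentences].
  const n = Math.min(Math.max(0, Math.floor(numSentences)), sentences.length);
  if (n === 0) return "";

  // Count how often each content (non-stop) word appears in the whole document.
  const freq = new Map<string, number>();
  for (const sentence of sentences) {
    for (const word of tokenize(sentence)) {
      if (!STOP_WORDS.has(word)) freq.set(word, (freq.get(word) ?? 0) + 1);
    }
  }

  // Score each sentence by the average document frequency of its content words.
  const scored = sentences.map((sentence, index) => {
    const words = tokenize(sentence).filter((w) => !STOP_WORDS.has(w));
    const total = words.reduce((sum, w) => sum + (freq.get(w) ?? 0), 0);
    return { index, score: words.length > 0 ? total / words.length : 0 };
  });

  // Highest score first; ties go to the earlier sentence.
  scored.sort((a, b) => b.score - a.score || a.index - b.index);

  // Keep the top n, then restore the original document order.
  return scored
    .slice(0, n)
    .map((s) => s.index)
    .sort((a, b) => a - b)
    .map((i) => sentences[i])
    .join(" ");
}
```

Averaging rather than summing the frequencies keeps long sentences from winning purely on length. In a real project you would likely export summarizeText (and possibly splitSentences) from its own module so the Jest suite can import it and check, for example, that splitting the summary yields the requested number of sentences.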

Expected Behavior:

  • The summarizeText function should return a coherent summary that reflects the main themes of the input text.
  • The number of sentences in the returned summary should match the numSentences argument, unless the input text has fewer sentences.
  • The function should handle different sentence delimiters (e.g., '.', '!', '?').

Edge Cases to Consider:

  • Empty Input: What happens when an empty string is provided?
  • Single Sentence Input: What if the input has only one sentence?
  • No Sentence Delimiters: What if the input is a long string without standard punctuation?
  • Requesting Zero Sentences: What if numSentences is 0?
  • Requesting More Sentences Than Available: The summary should not exceed the total number of sentences in the input.
  • Input with Mixed Punctuation: Sentences ending with '!' or '?' (not only '.') should be split correctly.
  • Abbreviations: Consider how abbreviations like "Mr." or "Dr." might affect sentence splitting. For this challenge, you can assume standard sentence splitting is sufficient and not worry about advanced abbreviation handling.
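The count-related edge cases above reduce to one small rule: use the default when numSentences is missing, return nothing for zero, and never return more sentences than the input contains. A possible helper is sketched below; the name resolveSentenceCount, the default of 3, and the treatment of negative or fractional values are assumptions, since the challenge does not specify them.

```typescript
// Assumed default; the challenge only suggests "e.g., 3".
const DEFAULT_NUM_SENTENCES = 3;

// Hypothetical helper: decide how many sentences the summary should contain.
function resolveSentenceCount(
  requested: number | undefined,
  available: number,
  fallback: number = DEFAULT_NUM_SENTENCES,
): number {
  // Missing argument: fall back to the default.
  const wanted = requested === undefined ? fallback : requested;
  // Assumed policy: zero, negative or non-finite requests give an empty summary.
  if (!Number.isFinite(wanted) || wanted <= 0) return 0;
  // Round fractional requests down and cap at the sentences actually available.
  return Math.min(Math.floor(wanted), available);
}
```

Keeping this rule in its own function makes it easy to cover each edge case with a separate Jest test, independent of how sentences are split or scored.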

Examples

Example 1:

Input text: "The quick brown fox jumps over the lazy dog. This is a classic sentence used for testing. It contains all the letters of the alphabet. The dog remains lazy, unbothered by the fox. We are testing summarization here. This is the end of the text."
Input numSentences: 2

Output: "The quick brown fox jumps over the lazy dog. We are testing summarization here."
Explanation: The algorithm identified the first sentence and one near the end as the most representative.

Example 2:

Input text: "Artificial intelligence is transforming industries. Machine learning is a subset of AI. Deep learning is a subset of machine learning. These technologies are rapidly evolving. The future is exciting."
Input numSentences: 3

Output: "Artificial intelligence is transforming industries. Machine learning is a subset of AI. Deep learning is a subset of machine learning."
Explanation: The algorithm picked the first three sentences as they introduce the core concepts.

Example 3: Edge Case (Short Input)

Input text: "Hello world."
Input numSentences: 5

Output: "Hello world."
Explanation: When requesting more sentences than available, the entire text is returned.

Example 4: Edge Case (Empty Input)

Input text: ""
Input numSentences: 2

Output: ""
Explanation: An empty input string results in an empty output string.

Constraints

  • The input text will be a string.
  • numSentences will be an integer or undefined.
  • The input text length will not exceed 10,000 characters.
  • The number of sentences requested (numSentences) will not exceed 100.
  • Your solution should be implemented in TypeScript.
  • Jest will be used for testing.

Notes

  • Sentence Splitting: A good starting point for sentence splitting is to split by '.', '!', and '?'. Be mindful of potential issues with these delimiters appearing mid-sentence (e.g., in abbreviations), though for this challenge, a simple split is acceptable.
  • Sentence Scoring: You can implement a basic scoring mechanism. For instance, sentences containing more of the document's frequently occurring words (excluding very common English stop words such as "the", "a", and "is") could be considered more important. Alternatively, you could give higher scores to sentences that appear earlier in the text.
  • Default numSentences: Ensure your function gracefully handles cases where numSentences is not provided, using a sensible default.
  • Testing Approach: Think about different test cases that would ensure the robustness of your summarizeText function. Consider how to isolate the summarization logic from the sentence splitting if you decide to separate those concerns.
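The scoring and testing notes can be combined: if the splitter and the scorer are passed in as parameters, a test can substitute a stub splitter and check sentence selection on its own. The sketch below uses hypothetical names (summarizeWith, Splitter, Scorer, defaultSplit, positionScore) and the position-based scoring mentioned in the notes, where earlier sentences score higher.

```typescript
// Pluggable pieces: each can be replaced or unit-tested on its own.
type Splitter = (text: string) => string[];
type Scorer = (sentence: string, index: number, all: string[]) => number;

// Default splitter: break after '.', '!' or '?' (no abbreviation handling).
const defaultSplit: Splitter = (text) => {
  const parts: string[] = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  return parts.map((s) => s.trim()).filter((s) => s.length > 0);
};

// Position-based scorer: the earlier the sentence, the higher its score.
const positionScore: Scorer = (_sentence, index, all) => all.length - index;

function summarizeWith(
  text: string,
  numSentences: number,
  split: Splitter = defaultSplit,
  score: Scorer = positionScore,
): string {
  const sentences = split(text);
  // Clamp the request to [0, number of available sentences].
  const n = Math.max(0, Math.min(Math.floor(numSentences), sentences.length));
  return sentences
    .map((s, i) => ({ i, score: score(s, i, sentences) }))
    // Highest score first; ties go to the earlier sentence.
    .sort((a, b) => b.score - a.score || a.i - b.i)
    .slice(0, n)
    // Restore original document order before joining.
    .sort((a, b) => a.i - b.i)
    .map(({ i }) => sentences[i])
    .join(" ");
}
```

For example, in a Jest test, passing () => ["First.", "Second.", "Third."] as the splitter with numSentences set to 2 should yield "First. Second." regardless of how defaultSplit behaves, and a different scorer (for instance one based on word frequency) can be swapped in without touching the splitting tests.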