Text Summarization with Jest Testing

Text summarization is a crucial task in natural language processing, allowing us to condense large amounts of text into shorter, more manageable summaries. This challenge asks you to implement a basic text summarization function and then thoroughly test it using Jest, ensuring its correctness and robustness. You'll focus on a simple extractive summarization approach, selecting sentences based on their length.

Problem Description

You need to implement a TypeScript function called summarizeText that takes a string of text as input and returns a summarized version of the text. The summarization should be extractive, meaning it selects existing sentences from the original text rather than generating new ones. The function should select the top 3 longest sentences from the input text to form the summary.

Key Requirements:

  • Sentence Splitting: The function must accurately split the input text into individual sentences. Assume sentences are delimited by periods ('.').
  • Length-Based Selection: The function should identify the 3 longest sentences based on character count.
  • Order Preservation: The summary should maintain the original order of the selected sentences within the input text.
  • Handling Fewer Than 3 Sentences: If the input text contains fewer than 3 sentences, the function should return all sentences in their original order.
  • Empty Input: If the input text is empty, the function should return an empty string.
  • Whitespace Handling: Trim leading/trailing whitespace from each sentence before calculating length and including it in the summary.
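One way to read these requirements is sketched below. It is a minimal sketch rather than a reference solution, and it makes three assumptions the problem statement does not spell out: empty fragments left by trailing or repeated periods are discarded, the selected sentences are rejoined with ". " plus a final period, and the cutoff is exposed as a hypothetical maxSentences parameter that defaults to 3.

```typescript
/**
 * Sketch of an extractive summarizer: keeps the `maxSentences` longest
 * sentences (by trimmed character count) in their original order.
 * `maxSentences` is a hypothetical parameter; the challenge fixes it at 3.
 */
function summarizeText(text: string, maxSentences = 3): string {
  // Split on periods, trim each fragment, and drop empty fragments
  // (assumption: leftovers from trailing or repeated periods are not sentences).
  const sentences = text
    .split(".")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  if (sentences.length === 0) return "";

  // Rank longest first; on equal length the earlier sentence ranks higher.
  const ranked = sentences
    .map((sentence, index) => ({ sentence, index }))
    .sort((a, b) => b.sentence.length - a.sentence.length || a.index - b.index);

  // Keep the top entries, then restore the original order.
  const selected = ranked
    .slice(0, maxSentences)
    .sort((a, b) => a.index - b.index);

  // Assumption: rejoin with ". " and close the summary with a period.
  return selected.map((s) => s.sentence).join(". ") + ".";
}
```

The explicit index tie-break in the comparator makes the "first encountered wins" rule visible in the code instead of relying on the sort being stable.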

Expected Behavior:

The summarizeText function should return a string containing the top 3 longest sentences from the input text in their original order, each terminated by a period and separated by a single space, as in Examples 1 and 2.

Edge Cases to Consider:

  • Text with no periods.
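  • Text with multiple periods in a single sentence (for example, an ellipsis or a decimal such as 1.5), which a plain split on '.' breaks into fragments.
  • Sentences of equal length. (When a tie falls on the top-3 cutoff, the sentences encountered first should be selected.)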
  • Input text containing only whitespace.
  • Very long sentences.
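To see what several of these edge cases do to a plain period split, the sketch below uses a hypothetical splitSentences helper. Discarding empty fragments is an assumption; the problem statement only says that sentences are delimited by periods.

```typescript
// Hypothetical helper: split on periods, trim, and discard empty fragments.
function splitSentences(text: string): string[] {
  return text
    .split(".")
    .map((fragment) => fragment.trim())
    .filter((fragment) => fragment.length > 0);
}

// No periods: the whole text is a single sentence.
const noPeriods = splitSentences("No periods at all"); // ["No periods at all"]

// An ellipsis produces empty fragments, which the filter removes.
const ellipsis = splitSentences("Wait... what. Done."); // ["Wait", "what", "Done"]

// Whitespace only: nothing survives trimming.
const blank = splitSentences("   "); // []

// A decimal point is treated as a sentence boundary, which is a known
// limitation of the period-only rule.
const decimal = splitSentences("Version 1.5 shipped."); // ["Version 1", "5 shipped"]

console.log(noPeriods, ellipsis, blank, decimal);
```

The last case shows why the tests should pin down how periods inside a sentence are expected to behave, whichever interpretation is chosen.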

Examples

Example 1:

Input: "This is the first sentence. This is the second sentence, which is a bit longer. And this is the third sentence. This is the fourth sentence, and it's the longest one."
Output: "This is the second sentence, which is a bit longer. And this is the third sentence. This is the fourth sentence, and it's the longest one."
Explanation: The first sentence is the shortest (26 characters, against 50, 30, and 53), so the summary keeps the second, third, and fourth sentences in their original order.

Example 2:

Input: "Short sentence. Another short sentence."
Output: "Short sentence. Another short sentence."
Explanation: The input contains only two sentences, so both are returned.

Example 3:

Input: ""
Output: ""
Explanation: The input is an empty string, so an empty string is returned.

Example 4:

Input: "This is a sentence.  This is another.  This is a third sentence. "
Output: "This is a sentence. This is another. This is a third sentence."
Explanation: Each sentence is trimmed, so the extra spaces between sentences and the trailing space are dropped. The input has exactly three sentences, so all of them are returned in their original order.

Constraints

  • Input Text Length: The input text can be up to 10,000 characters long.
  • Sentence Length: Individual sentences can be up to 2,000 characters long.
  • Performance: The function should complete within 100 milliseconds for typical input texts.
  • Input Format: The input will always be a string.

Notes

  • Consider using regular expressions for sentence splitting, but be mindful of potential edge cases.
  • You can use built-in array methods such as sort to find the longest sentences. Array.prototype.sort is stable as of ES2019, so sentences of equal length keep their relative order.
  • Focus on writing clean, readable, and well-documented code.
  • Your Jest tests should cover all the scenarios described in the "Key Requirements", "Expected Behavior", and "Edge Cases to Consider" sections. Aim for high test coverage.
  • Remember to handle potential errors gracefully.
  • The summarization is extractive, so you are not generating new sentences. You are selecting existing ones.
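One way to organize the Jest tests is a table of cases, sketched below. SummaryCase and summaryCases are hypothetical names, and the expected strings assume that sentences are trimmed, that empty fragments are dropped, and that the summary is rejoined with ". " plus a final period. The trailing comment shows how such a table could be passed to Jest's test.each, assuming summarizeText is exported from a module named ./summarizeText.

```typescript
// One row per scenario; names are illustrative, not part of the challenge.
interface SummaryCase {
  name: string;
  input: string;
  expected: string;
}

const summaryCases: SummaryCase[] = [
  { name: "empty input", input: "", expected: "" },
  { name: "whitespace only", input: "   ", expected: "" },
  {
    name: "fewer than three sentences",
    input: "Short sentence. Another short sentence.",
    expected: "Short sentence. Another short sentence.",
  },
  {
    name: "trims whitespace around sentences",
    input: "This is a sentence.  This is another.  This is a third sentence. ",
    expected: "This is a sentence. This is another. This is a third sentence.",
  },
  {
    name: "equal lengths keep the first three",
    input: "Aaa. Bbb. Ccc. Ddd.",
    expected: "Aaa. Bbb. Ccc.",
  },
  {
    name: "preserves original order",
    input: "The longest sentence of them all. Tiny. Mid length one. A much longer sentence here.",
    expected: "The longest sentence of them all. Mid length one. A much longer sentence here.",
  },
];

// In a Jest test file the table could be wired up like this
// (assumes: import { summarizeText } from "./summarizeText"):
//
//   test.each(summaryCases)("$name", ({ input, expected }) => {
//     expect(summarizeText(input)).toBe(expected);
//   });
```

Keeping the scenarios in data makes it easy to add a row per edge case without duplicating test bodies.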