Polonius: A Simple Text Analyzer in Rust

Polonius, counselor to King Claudius in Hamlet, is known for his lengthy and often convoluted speeches. This challenge asks you to implement a simplified version of Polonius's analytical abilities: a program that analyzes a given text and reports basic statistics, namely the word count, the character count (excluding spaces), and the frequency of each word. It is a useful exercise in string manipulation, data structures (specifically, a hash map), and basic Rust programming.

Problem Description

You are to write a Rust program that takes a string as input and performs the following analysis:

  1. Word Count: Determine the total number of words in the input string. Words are separated by spaces.
  2. Character Count (excluding spaces): Calculate the number of characters in the input string, excluding spaces.
  3. Word Frequency: Create a hash map (using std::collections::HashMap) that stores the frequency of each word in the input string. The keys of the hash map should be the words (lowercase), and the values should be the number of times each word appears.

The program should then print the results in a clear and formatted manner.

Key Requirements:

  • The input string should be converted to lowercase before analysis to ensure case-insensitive word counting.
  • Punctuation should be ignored when counting words and calculating frequencies. Only letters and numbers should be considered part of a word.
  • The program should handle empty input strings gracefully.
  • The word frequencies should be sorted alphabetically by word before printing. A HashMap has no defined iteration order, so its entries must be sorted explicitly.
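
A minimal sketch of the three analyses under the requirements above. The function names normalize, count_words, and count_chars are hypothetical, and two readings are assumed rather than specified: a token that consists only of punctuation is not counted as a word, and the character count includes punctuation while excluding all whitespace (so a trailing newline from standard input is not counted).

```rust
use std::collections::HashMap;

/// Lowercases a raw token and keeps only ASCII letters and digits.
fn normalize(token: &str) -> String {
    token
        .chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .map(|c| c.to_ascii_lowercase())
        .collect()
}

/// Returns the total word count and the per-word frequencies for `text`.
fn count_words(text: &str) -> (usize, HashMap<String, usize>) {
    let mut freq: HashMap<String, usize> = HashMap::new();
    let mut total = 0;
    for token in text.split_whitespace() {
        let word = normalize(token);
        // Assumption: a token made only of punctuation is not a word.
        if word.is_empty() {
            continue;
        }
        total += 1;
        *freq.entry(word).or_insert(0) += 1;
    }
    (total, freq)
}

/// Counts every character that is not whitespace.
/// Assumption: punctuation counts, because only spaces are excluded.
fn count_chars(text: &str) -> usize {
    text.chars().filter(|c| !c.is_whitespace()).count()
}
```

An empty input needs no special case here: split_whitespace yields no tokens, so the word count is 0 and the map stays empty.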

Expected Behavior:

The program should read a string from standard input (using std::io::stdin()), perform the analysis, and print the results to standard output in the following format:

Word Count: [word_count]
Character Count (excluding spaces): [char_count]
Word Frequency:
  [word1]: [frequency1]
  [word2]: [frequency2]
  ...
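
One way to produce this layout is sketched below. The function name format_report is hypothetical, and it assumes the counts and the frequency map have already been computed. Building a String instead of printing directly is a design choice that makes the output easy to check.

```rust
use std::collections::HashMap;

/// Builds the report text in the layout shown above.
fn format_report(
    word_count: usize,
    char_count: usize,
    freq: &HashMap<String, usize>,
) -> String {
    // A HashMap iterates in arbitrary order, so copy the entries
    // into a Vec and sort them by word before formatting.
    let mut entries: Vec<(&String, &usize)> = freq.iter().collect();
    entries.sort_by(|a, b| a.0.cmp(b.0));

    let mut out = String::new();
    out.push_str(&format!("Word Count: {}\n", word_count));
    out.push_str(&format!(
        "Character Count (excluding spaces): {}\n",
        char_count
    ));
    out.push_str("Word Frequency:\n");
    for (word, count) in entries {
        // Each entry is indented by two spaces, as in the format above.
        out.push_str(&format!("  {}: {}\n", word, count));
    }
    out
}
```

Passing the result to print! writes the report to standard output.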

Examples

Example 1:

Input: "To be or not to be, that is the question."
Output:
Word Count: 10
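Character Count (excluding spaces): 32
Word Frequency:
  be: 2
  is: 1
  not: 1
  or: 1
  question: 1
  that: 1
  the: 1
  to: 2

Explanation: The input is converted to lowercase and punctuation is stripped from the words, so "to" and "be" each appear twice while the other words appear once. The character count is 32: 30 letters plus the comma and the period, since only spaces are excluded.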

Example 2:

Input: "Hello world! Hello, Rust."
Output:
Word Count: 4
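Character Count (excluding spaces): 22
Word Frequency:
  hello: 2
  rust: 1
  world: 1

Explanation: Punctuation is stripped from the words, so "hello" appears twice. The character count is 22: 19 letters plus the three punctuation marks, since only spaces are excluded.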

Example 3: (Edge Case)

Input: ""
Output:
Word Count: 0
Character Count (excluding spaces): 0
Word Frequency:

Explanation: An empty string results in zero word count, zero character count, and an empty word frequency map.

Constraints

  • The input string will have a maximum length of 10,000 characters.
  • The input string will consist of ASCII characters.
  • The program should execute within 1 second for all valid inputs.
  • The word frequencies must be printed in alphabetical order by word.

Notes

  • Consider using the split_whitespace() method (or split(' ')) to separate the string into words.
  • The chars() iterator can be helpful for iterating over the characters of the string.
  • Regular expressions could be used to remove punctuation, but they require the external regex crate; a character-by-character check with the standard library is simpler and sufficient for this problem.
  • Remember to handle potential errors when reading from standard input.
  • The HashMap in Rust requires you to specify the type of key and value. Use String for the key (the word) and usize for the value (the frequency).
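  • HashMap has no sort_keys() method and no defined iteration order. To print in alphabetical order, collect the entries into a Vec and sort it by key (or use a BTreeMap, which keeps its keys sorted).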
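
The note on input errors can be sketched as follows. The function name read_input is hypothetical, and it assumes the whole of standard input is treated as one string, with surrounding whitespace (such as a trailing newline) trimmed. It accepts any Read implementation, so it can be exercised without a terminal.

```rust
use std::io::{self, Read};

/// Reads all of `source` into a String and trims surrounding whitespace.
/// Returns an error if reading fails or the bytes are not valid UTF-8.
fn read_input<R: Read>(mut source: R) -> io::Result<String> {
    let mut buffer = String::new();
    source.read_to_string(&mut buffer)?;
    Ok(buffer.trim().to_string())
}
```

A main function would call read_input(std::io::stdin()) and, on Err, report the problem with eprintln! rather than panicking.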