
Rustic File Operations: Word Counter

This challenge will guide you through implementing fundamental file input and output operations in Rust. You'll build a program that reads the content of a text file, counts the occurrences of each word, and then writes the results to another file. This is a common task in data processing and text analysis.

Problem Description

Your task is to create a Rust program that performs the following actions:

  1. Read from a specified input file: The program should accept a file path as input and read its entire content into memory.
  2. Count word occurrences: After reading the file, you need to process the text to count how many times each unique word appears.
    • Word matching should be case-insensitive. For example, "The" and "the" should be counted as the same word.
    • Punctuation attached to words (e.g., "word.", "word,", "word!") should be removed so that only the alphanumeric characters of the word are considered.
  3. Write to a specified output file: The program should then write the word counts to a new file. The output file should list each unique word along with its count, one word-count pair per line, separated by a colon and a space. The output should be sorted alphabetically by word.
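The counting step (step 2) can be sketched as follows. This is a minimal illustration, not a reference solution: the function names `clean_word` and `count_words` are hypothetical, and it assumes that "removing punctuation" means dropping every non-alphanumeric character from a whitespace-separated token.

```rust
use std::collections::HashMap;

/// Lowercases a token and keeps only its alphanumeric characters.
/// (Assumption: all non-alphanumeric characters are dropped.)
fn clean_word(token: &str) -> String {
    token
        .chars()
        .filter(|c| c.is_alphanumeric())
        .flat_map(|c| c.to_lowercase())
        .collect()
}

/// Counts cleaned words in `text`.
/// Tokens that are empty after cleaning (e.g. "...") are skipped.
fn count_words(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for token in text.split_whitespace() {
        let word = clean_word(token);
        if !word.is_empty() {
            *counts.entry(word).or_insert(0) += 1;
        }
    }
    counts
}
```

Because `split_whitespace` treats any run of whitespace as a single separator, repeated spaces and blank lines do not produce empty tokens.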

Key Requirements:

  • Use Rust's standard library for file operations (std::fs and std::io).
  • Handle potential errors during file reading and writing gracefully.
  • Implement case-insensitive word comparison.
  • Strip common punctuation from the ends of words.
  • Sort the output alphabetically by word.
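One possible shape for the file I/O requirement, using only `std::fs` and `std::io`, is sketched below. The helper names `read_text` and `write_counts` are assumptions made for illustration; both return `io::Result` so the caller decides how to report a failure.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Reads the whole file into a String.
/// Fails if the file is missing, unreadable, or not valid UTF-8.
fn read_text(path: &Path) -> io::Result<String> {
    fs::read_to_string(path)
}

/// Writes one "word: count" line per pair, in the order given.
/// The caller is expected to pass the pairs already sorted.
fn write_counts(path: &Path, pairs: &[(String, usize)]) -> io::Result<()> {
    let mut out = BufWriter::new(File::create(path)?);
    for (word, count) in pairs {
        writeln!(out, "{}: {}", word, count)?;
    }
    // Flush explicitly so a write error is returned instead of being lost on drop.
    out.flush()
}
```

The `?` operator propagates any I/O error to the caller, which keeps the helpers short while still avoiding `unwrap` on fallible operations.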

Expected Behavior:

Given an input file input.txt with the following content:

This is a sample text.
This text is for testing.
Sample text, sample file.

The program should produce an output file output.txt with the following content:

a: 1
file: 1
for: 1
is: 2
sample: 3
testing: 1
text: 3
this: 2

Edge Cases to Consider:

  • Empty input file: If the input file is empty, the output file should also be empty.
  • File not found: The program should handle cases where the input file does not exist.
  • Special characters/multiple spaces: Consider how your word splitting and cleaning logic handles multiple spaces between words or other non-alphanumeric characters within the text.
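The missing-file case can be distinguished from other read failures by inspecting the error kind. The sketch below is one way to do that; the function name `load_input` and the wording of the messages are hypothetical, and the challenge does not prescribe a particular error format.

```rust
use std::fs;
use std::io::ErrorKind;

/// Reads the input file, turning I/O failures into readable messages.
fn load_input(path: &str) -> Result<String, String> {
    match fs::read_to_string(path) {
        // An empty file is not an error: it simply yields an empty String.
        Ok(text) => Ok(text),
        Err(e) if e.kind() == ErrorKind::NotFound => {
            Err(format!("input file not found: {}", path))
        }
        Err(e) => Err(format!("could not read {}: {}", path, e)),
    }
}
```

A caller might print the message to standard error and exit with a non-zero status rather than panicking.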

Examples

Example 1:

Input File (input.txt):

Rust is fun. Fun with Rust!

Output File (output.txt):

fun: 2
is: 1
rust: 2
with: 1

Explanation: The program reads the text, splits it into words, converts them to lowercase, removes punctuation ("." and "!"), counts occurrences, and writes the results sorted alphabetically.

Example 2:

Input File (input.txt):

One, two, three.
One! two?

Output File (output.txt):

one: 2
three: 1
two: 2

Explanation: Punctuation like commas, periods, exclamation marks, and question marks are removed before counting.
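The problem statement describes punctuation both as something stripped from the ends of words and as anything that is not alphanumeric. For the examples here the two readings give the same result, but they differ on tokens with inner punctuation. The sketch below contrasts them; both function names are hypothetical.

```rust
/// Removes non-alphanumeric characters from both ends of the token only.
fn trim_punctuation(token: &str) -> &str {
    token.trim_matches(|c: char| !c.is_alphanumeric())
}

/// Removes every non-alphanumeric character, wherever it appears.
fn keep_alphanumeric(token: &str) -> String {
    token.chars().filter(|c| c.is_alphanumeric()).collect()
}
```

With these definitions, `trim_punctuation("don't")` leaves the apostrophe in place, while `keep_alphanumeric("don't")` yields `dont`. Whichever reading you choose, apply it consistently before counting.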

Example 3:

Input File (input.txt):

(empty)

Output File (output.txt):

(empty)

Explanation: An empty input file results in an empty output file.

Constraints

  • The input text file will be a standard UTF-8 encoded text file.
  • A word is a sequence of alphanumeric characters.
  • The maximum file size for testing will not exceed 10MB.
  • The program should complete within a reasonable time for files of this size.

Notes

  • You'll likely want to use a HashMap from std::collections to store and update word counts efficiently.
  • Consider using iterators and methods like split_whitespace, to_lowercase, and filter to process the text.
  • For removing punctuation, you can iterate through characters of a word and keep only alphanumeric ones.
  • Remember to handle Result types returned by I/O operations.
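A `HashMap` has no defined iteration order, so the counts need an explicit sort before they are written. The sketch below shows one way to produce the sorted "word: count" lines; the function name `format_counts` is hypothetical. It sorts by Rust's default string ordering, which matches alphabetical order for lowercase ASCII words but is byte-wise for other characters.

```rust
use std::collections::HashMap;

/// Renders counts as "word: count" lines, sorted by word.
fn format_counts(counts: &HashMap<String, usize>) -> String {
    let mut pairs: Vec<(&String, &usize)> = counts.iter().collect();
    // Tuples compare by their first field; words are unique map keys,
    // so the count never acts as a tie-breaker.
    pairs.sort();
    let mut out = String::new();
    for (word, count) in pairs {
        out.push_str(&format!("{}: {}\n", word, count));
    }
    out
}
```

An alternative is to accumulate counts in a `BTreeMap` from `std::collections`, which iterates in key order and makes the separate sort unnecessary.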