Hone logo
Hone
Problems

Run-Length Encoding (RLE) Compression

Run-Length Encoding (RLE) is a simple form of lossless data compression, particularly effective for data containing long runs of the same value. This challenge asks you to implement an RLE encoder and decoder in Python. Understanding and implementing RLE is a foundational step in grasping more complex compression algorithms.

Problem Description

You are tasked with creating a Python program that can both encode and decode strings using Run-Length Encoding. The encoding process should replace consecutive occurrences of the same character with the character and the number of times it repeats. The decoding process should reverse this, reconstructing the original string from the encoded representation.

What needs to be achieved:

  • Encoding: Given a string, produce its RLE encoded string.
  • Decoding: Given an RLE encoded string, produce the original string.

Key Requirements:

  • The encoder should handle strings containing any characters (letters, numbers, symbols, spaces).
  • The decoder should correctly reconstruct the original string from the encoded string.
  • The encoder should not add any extra characters or spaces.
  • The decoder should handle invalid encoded strings gracefully (see Edge Cases).

Expected Behavior:

The encoder should produce a string where each run of identical characters is represented by the character followed by the count of its repetitions. The decoder should accurately reverse this process.

Edge Cases to Consider:

  • Empty input string: Both encoder and decoder should handle this gracefully (returning an empty string).
  • String with no repeating characters: The encoder should return the original string unchanged.
  • Invalid encoded string: The decoder should handle cases where the encoded string contains non-numeric characters after a character (e.g., "A2B3C"). Return an appropriate error message or raise an exception.
  • Encoded string with a count of zero: This should be treated as an invalid encoded string.

Examples

Example 1:

Input: "AAABBBCCDAA"
Output: "A3B3C2D1A2"
Explanation: The input string has runs of 'A' (3 times), 'B' (3 times), 'C' (2 times), 'D' (1 time), and 'A' (2 times).

Example 2:

Input: "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWB"
Output: "W12B1W12B3W24B1"
Explanation:  Long runs of 'W' and 'B' are compressed.

Example 3: (Edge Case)

Input: "A2B3C"
Output: "Invalid encoded string"
Explanation: The encoded string "A2B3C" is invalid because 'C' is not followed by a number.

Example 4: (Edge Case)

Input: ""
Output: ""
Explanation: Empty string input results in an empty string output for both encoder and decoder.

Constraints

  • Input String Length: The input string can be up to 1000 characters long.
  • Character Set: The input string can contain any ASCII characters.
  • Encoded String Length: The encoded string's length will be proportional to the input string's length, but potentially shorter due to compression.
  • Performance: The encoding and decoding functions should complete within a reasonable time (e.g., less than 1 second) for the given input length.

Notes

  • Consider using a loop to iterate through the input string and identify runs of identical characters.
  • For encoding, you'll need to keep track of the current character and the count of its repetitions.
  • For decoding, you'll need to parse the encoded string, extracting the character and its repetition count.
  • Error handling is crucial for the decoder to handle invalid input gracefully. Think about how to signal an error to the user.
  • You can implement the encoder and decoder as separate functions.
  • Focus on clarity and readability in your code. Good variable names and comments are appreciated.
Loading editor...
python