Run-Length Encoding Compression
This challenge involves implementing a simple compression algorithm called Run-Length Encoding (RLE) in Python. RLE is a form of lossless data compression that works by replacing sequences of the same data value with a count of the value and the value itself. This technique is useful for data that contains many repeating characters or values, such as simple image formats or text files with repetitive patterns.
Problem Description
Your task is to create a Python function rle_encode(data) that takes a string as input and returns its run-length encoded representation. The function should identify consecutive sequences of identical characters and replace them with the count of repetitions followed by the character itself.
Key Requirements:
- The function must accept a single string argument
data. - The function must return a string representing the RLE compressed data.
- Consecutive identical characters should be encoded as
countcharacter. - If a character appears only once consecutively, it should be represented as
1character. - The output should be case-sensitive.
Expected Behavior:
- An empty input string should result in an empty output string.
- A string with no repeating characters should be encoded with a '1' preceding each character.
Edge Cases to Consider:
- Empty input string.
- Input string with all unique characters.
- Input string with long sequences of repeated characters.
- Input string containing spaces or special characters.
Examples
Example 1:
Input: "AAABBBCCDAA"
Output: "3A3B2C1D2A"
Explanation: 'AAA' becomes '3A', 'BBB' becomes '3B', 'CC' becomes '2C', 'D' becomes '1D', and 'AA' becomes '2A'.
Example 2:
Input: "XYZ"
Output: "1X1Y1Z"
Explanation: Each character appears once consecutively.
Example 3:
Input: ""
Output: ""
Explanation: An empty input string results in an empty output string.
Example 4:
Input: "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"
Output: "12W1B12W3B24W1B14W"
Explanation: Demonstrates handling of longer sequences and interspersed single characters.
Constraints
- The input string
datawill contain only ASCII characters (letters, numbers, spaces, and common symbols). - The length of the input string
datawill be between 0 and 10,000 characters, inclusive. - The number of repetitions for any character will not exceed 9. If a sequence has 10 or more repetitions, it should be handled correctly by concatenating counts (e.g., 10 'A's would be '10A', not '10A' or a different representation). Correction: The example constraints imply that counts will always be single digits. For this challenge, assume counts will not exceed 9. This simplifies the problem. If the count exceeds 9, treat it as multiple single-digit counts (e.g., 12 'A's becomes '9A3A'). Re-correction for clarity: The prompt intends to simplify for a learning exercise. For this challenge, assume the count will not exceed 9, meaning any sequence of identical characters will be 9 or less. If a character repeats more than 9 times, it should be broken down (e.g., 12 'A's would be '9A3A').
- The output string should not contain any leading or trailing whitespace.
- Performance should be reasonable, aiming for a solution that is roughly O(N) where N is the length of the input string.
Notes
- Consider how you will iterate through the string and keep track of the current character and its consecutive count.
- You'll need a way to append the count and the character to your result string.
- Think about how to handle the last sequence of characters in the input string, as it might not be followed by a different character.