Implementing Data Compression in Go
In today's digital world, efficiently storing and transmitting data is crucial. Compression algorithms play a vital role in reducing the size of data, saving storage space and bandwidth. This challenge asks you to implement a basic compression and decompression mechanism in Go, allowing you to understand the core principles behind reducing data redundancy.
Problem Description
Your task is to create two Go functions: Compress and Decompress.
The Compress function should take a slice of bytes ([]byte) as input and return a compressed slice of bytes.
The Decompress function should take a compressed slice of bytes as input and return the original, uncompressed slice of bytes.
For this challenge, you will implement a simple run-length encoding (RLE) compression algorithm. RLE works by replacing consecutive occurrences of the same data value with a count and the data value itself.
Key Requirements:
-
Compress(data []byte) []byte:- Iterate through the input
data. - Identify consecutive runs of identical bytes.
- Encode each run as a pair:
[count byte, data byte]. - The
count byteshould represent the number of consecutive occurrences. If a byte appears more than 255 times consecutively, it should be split into multiple encoded pairs. For example, 300 'A's should be encoded as[255, 'A'], [45, 'A']. - Return the resulting compressed byte slice.
- Iterate through the input
-
Decompress(compressedData []byte) []byte:- Iterate through the
compressedData. - Each pair of bytes represents
[count byte, data byte]. - Replicate the
data bytecount bytetimes. - Append the replicated bytes to the output slice.
- Return the resulting decompressed byte slice.
- Iterate through the
Edge Cases to Consider:
- Empty input: What happens if the input
datais empty? - Single byte input: How is a single byte handled?
- Runs exceeding 255: Ensure that runs longer than 255 are handled correctly by splitting them into multiple
[count, data]pairs. - Corrupted input (for decompression): While not strictly required for this basic implementation, consider what might happen if
compressedDatahas an odd number of bytes or invalid count values (though for this challenge, we assume valid input).
Examples
Example 1: Basic Compression
Input: []byte("AAABBBCCCDDDDE")
Output: []byte{0x03, 0x41, 0x03, 0x42, 0x03, 0x43, 0x04, 0x44, 0x01, 0x45}
Explanation:
'A' appears 3 times -> [3, 'A']
'B' appears 3 times -> [3, 'B']
'C' appears 3 times -> [3, 'C']
'D' appears 4 times -> [4, 'D']
'E' appears 1 time -> [1, 'E']
(0x41 is ASCII for 'A', 0x42 for 'B', etc.)
Example 2: Decompression
Input: []byte{0x03, 0x41, 0x03, 0x42, 0x03, 0x43, 0x04, 0x44, 0x01, 0x45}
Output: []byte("AAABBBCCCDDDDE")
Explanation:
[3, 'A'] -> "AAA"
[3, 'B'] -> "BBB"
[3, 'C'] -> "CCC"
[4, 'D'] -> "DDDD"
[1, 'E'] -> "E"
Concatenated: "AAABBBCCCDDDDE"
Example 3: Run exceeding 255
Input: []byte("AAAA...A") // 300 'A's
Output: []byte{0xff, 0x41, 0x2d, 0x41} // 255 'A's, then 45 'A's
Explanation:
The first 255 'A's are encoded as [255, 'A'].
The remaining 45 'A's are encoded as [45, 'A'].
(0x2d is the decimal value 45)
Constraints
- The input data for
Compresswill be a slice of bytes ([]byte). - The input data for
Decompresswill be a valid compressed byte slice generated by yourCompressfunction, with the exception of testing edge cases like an empty input forCompress. - The maximum count for a single run in the compressed output will not exceed 255. If a run is longer, it must be split into multiple
[count, data]pairs. - The performance of your compression and decompression algorithms should be reasonably efficient. Avoid excessive memory reallocations if possible.
Notes
- This implementation uses a simple RLE. Real-world compression algorithms are significantly more complex.
- Consider how you will manage the growing size of your output byte slices in both
CompressandDecompress. Usingappendis a common Go idiom, but be mindful of potential reallocations. - The
bytetype in Go is an alias foruint8, meaning it can hold values from 0 to 255. This is perfect for our count and data bytes. - To test the "runs exceeding 255" scenario, you might need to construct a large byte slice with many repeating characters.