Hone logo
Hone
Problems

Implementing Data Compression in Go

In today's digital world, efficiently storing and transmitting data is crucial. Compression algorithms play a vital role in reducing the size of data, saving storage space and bandwidth. This challenge asks you to implement a basic compression and decompression mechanism in Go, allowing you to understand the core principles behind reducing data redundancy.

Problem Description

Your task is to create two Go functions: Compress and Decompress.

The Compress function should take a slice of bytes ([]byte) as input and return a compressed slice of bytes. The Decompress function should take a compressed slice of bytes as input and return the original, uncompressed slice of bytes.

For this challenge, you will implement a simple run-length encoding (RLE) compression algorithm. RLE works by replacing consecutive occurrences of the same data value with a count and the data value itself.

Key Requirements:

  1. Compress(data []byte) []byte:

    • Iterate through the input data.
    • Identify consecutive runs of identical bytes.
    • Encode each run as a pair: [count byte, data byte].
    • The count byte should represent the number of consecutive occurrences. If a byte appears more than 255 times consecutively, it should be split into multiple encoded pairs. For example, 300 'A's should be encoded as [255, 'A'], [45, 'A'].
    • Return the resulting compressed byte slice.
  2. Decompress(compressedData []byte) []byte:

    • Iterate through the compressedData.
    • Each pair of bytes represents [count byte, data byte].
    • Replicate the data byte count byte times.
    • Append the replicated bytes to the output slice.
    • Return the resulting decompressed byte slice.

Edge Cases to Consider:

  • Empty input: What happens if the input data is empty?
  • Single byte input: How is a single byte handled?
  • Runs exceeding 255: Ensure that runs longer than 255 are handled correctly by splitting them into multiple [count, data] pairs.
  • Corrupted input (for decompression): While not strictly required for this basic implementation, consider what might happen if compressedData has an odd number of bytes or invalid count values (though for this challenge, we assume valid input).

Examples

Example 1: Basic Compression

Input: []byte("AAABBBCCCDDDDE")
Output: []byte{0x03, 0x41, 0x03, 0x42, 0x03, 0x43, 0x04, 0x44, 0x01, 0x45}
Explanation:
'A' appears 3 times -> [3, 'A']
'B' appears 3 times -> [3, 'B']
'C' appears 3 times -> [3, 'C']
'D' appears 4 times -> [4, 'D']
'E' appears 1 time  -> [1, 'E']
(0x41 is ASCII for 'A', 0x42 for 'B', etc.)

Example 2: Decompression

Input: []byte{0x03, 0x41, 0x03, 0x42, 0x03, 0x43, 0x04, 0x44, 0x01, 0x45}
Output: []byte("AAABBBCCCDDDDE")
Explanation:
[3, 'A'] -> "AAA"
[3, 'B'] -> "BBB"
[3, 'C'] -> "CCC"
[4, 'D'] -> "DDDD"
[1, 'E'] -> "E"
Concatenated: "AAABBBCCCDDDDE"

Example 3: Run exceeding 255

Input: []byte("AAAA...A") // 300 'A's
Output: []byte{0xff, 0x41, 0x2d, 0x41} // 255 'A's, then 45 'A's
Explanation:
The first 255 'A's are encoded as [255, 'A'].
The remaining 45 'A's are encoded as [45, 'A'].
(0x2d is the decimal value 45)

Constraints

  • The input data for Compress will be a slice of bytes ([]byte).
  • The input data for Decompress will be a valid compressed byte slice generated by your Compress function, with the exception of testing edge cases like an empty input for Compress.
  • The maximum count for a single run in the compressed output will not exceed 255. If a run is longer, it must be split into multiple [count, data] pairs.
  • The performance of your compression and decompression algorithms should be reasonably efficient. Avoid excessive memory reallocations if possible.

Notes

  • This implementation uses a simple RLE. Real-world compression algorithms are significantly more complex.
  • Consider how you will manage the growing size of your output byte slices in both Compress and Decompress. Using append is a common Go idiom, but be mindful of potential reallocations.
  • The byte type in Go is an alias for uint8, meaning it can hold values from 0 to 255. This is perfect for our count and data bytes.
  • To test the "runs exceeding 255" scenario, you might need to construct a large byte slice with many repeating characters.
Loading editor...
go