Implementing a Basic String Type in Rust

This challenge explores how a fundamental data structure works under the hood in Rust: you will build a simplified version of the standard library's String type. Along the way you'll gain practical experience with memory management, UTF-8 encoding, and common string operations.

Problem Description

Your task is to implement a custom MyString struct in Rust that mimics some of the core functionalities of the standard library's String. This MyString should be capable of storing UTF-8 encoded text and support basic operations like creation, appending, and retrieving its length.

Key Requirements:

  1. Storage: MyString should internally store its data as a dynamically allocated buffer of bytes (Vec<u8>).
  2. UTF-8 Validity: The internal Vec<u8> must always represent valid UTF-8 encoded data.
  3. Capacity and Length: MyString should track its allocated capacity and the current number of bytes it contains (its length). Both values are available from the underlying Vec<u8>.
  4. Operations:
    • new(): Create an empty MyString.
    • from_str(s: &str): Create a MyString from a string slice.
    • push_str(&mut self, s: &str): Append a string slice to the end of MyString. This operation should handle reallocation if the internal buffer is not large enough.
    • len(&self): Return the number of bytes in the MyString.
    • as_str(&self): Return a string slice (&str) view of the MyString's contents. This should panic if the internal buffer is not valid UTF-8.
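
The requirements above map onto a fairly small type. The following is a minimal sketch rather than a reference solution: the field name `bytes` is an assumption made for illustration, and push_str is left out so that the growth policy can be considered separately. Because Vec<u8> already records its own length and capacity, this sketch needs no extra fields.

```rust
/// A simplified, growable UTF-8 string backed by a byte vector.
pub struct MyString {
    // Assumed field name; the challenge only requires that storage is a Vec<u8>.
    bytes: Vec<u8>,
}

impl MyString {
    /// Creates an empty string. `Vec::new` does not allocate, so capacity starts at 0.
    pub fn new() -> Self {
        MyString { bytes: Vec::new() }
    }

    /// Copies the bytes of `s`. A `&str` is always valid UTF-8, so the buffer is too.
    pub fn from_str(s: &str) -> Self {
        MyString { bytes: s.as_bytes().to_vec() }
    }

    /// Length in bytes, not in characters.
    pub fn len(&self) -> usize {
        self.bytes.len()
    }

    /// Borrows the contents as `&str`, panicking if the buffer is not valid UTF-8.
    pub fn as_str(&self) -> &str {
        std::str::from_utf8(&self.bytes).expect("MyString buffer must be valid UTF-8")
    }
}
```

Keeping the field private means callers cannot write arbitrary bytes into the buffer, which is what lets as_str rely on the UTF-8 invariant.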

Expected Behavior:

  • MyString::new() should result in an empty string with zero capacity and zero length.
  • MyString::from_str("hello") should create a MyString containing the bytes for "hello" and have appropriate capacity.
  • Appending a string that fits within the current capacity should not cause reallocation.
  • Appending a string that exceeds the current capacity should trigger a reallocation. The new capacity is found by doubling the current one (or starting from a small minimum if the capacity is zero) until the combined contents fit.
  • len() should accurately report the byte count.
  • as_str() should return a valid &str when the contents are valid UTF-8.
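
The capacity expectations above can be checked directly against Vec<u8>. The sketch below is illustrative only; both helper names are hypothetical. One helper tests whether an append fits in the spare capacity, and the other reports the capacity before and after an append into a pre-sized buffer.

```rust
/// Returns true if `s` fits in the spare capacity of `buf`,
/// including the case where it fills the remaining space exactly.
fn fits_without_realloc(buf: &Vec<u8>, s: &str) -> bool {
    buf.capacity() - buf.len() >= s.len()
}

/// Appends `extra` to a buffer created with room for at least `reserved`
/// bytes and returns the capacity before and after the append.
fn capacity_before_and_after(reserved: usize, extra: &str) -> (usize, usize) {
    let mut buf: Vec<u8> = Vec::with_capacity(reserved);
    let before = buf.capacity();
    buf.extend_from_slice(extra.as_bytes());
    (before, buf.capacity())
}
```

When the appended text fits, the two capacities returned by capacity_before_and_after are equal, which is the "no reallocation" case described above.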

Edge Cases to Consider:

  • Creating an empty string.
  • Appending an empty string.
  • Appending a string that exactly fills the remaining capacity.
  • Appending a string so long that a single doubling of the capacity is not enough.
  • Ensuring UTF-8 validity is maintained after appends.
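
On the last point: every &str is valid UTF-8, and concatenating two valid UTF-8 byte sequences always yields a valid sequence, so appending whole string slices cannot break the invariant. Validity is only at risk when a multi-byte character is cut in the middle. The following sketch, with hypothetical helper names, shows both situations.

```rust
/// Returns true if `bytes` form valid UTF-8.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}

/// Concatenates the bytes of two string slices into a new buffer.
fn concat_bytes(a: &str, b: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(a.len() + b.len());
    out.extend_from_slice(a.as_bytes());
    out.extend_from_slice(b.as_bytes());
    out
}
```

For instance, is_valid_utf8(&concat_bytes("你好", "世界")) holds, whereas the first two bytes of "你" alone (a three-byte character) do not form valid UTF-8.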

Examples

Example 1:

let mut my_string = MyString::new();
my_string.push_str("Hello");
my_string.push_str(" ");
my_string.push_str("World!");

println!("Length: {}", my_string.len()); // Expected: 12
println!("Content: {}", my_string.as_str()); // Expected: Hello World!

Explanation: We start with an empty string, then append "Hello", a space, and "World!". The total length is the sum of the byte lengths of these strings (5 + 1 + 6 = 12). as_str() returns the full concatenated contents.

Example 2:

let my_string = MyString::from_str("Rust");
println!("Length: {}", my_string.len()); // Expected: 4
println!("Content: {}", my_string.as_str()); // Expected: Rust

Explanation: Creating a string from a string slice initializes MyString with the provided content.

Example 3 (UTF-8):

let mut my_string = MyString::from_str("你好"); // "Hello" in Chinese (2 characters, 6 bytes)
my_string.push_str("世界"); // "World" in Chinese (2 characters, 6 bytes)

println!("Length: {}", my_string.len()); // Expected: 12
println!("Content: {}", my_string.as_str()); // Expected: 你好世界

Explanation: This demonstrates handling of multi-byte UTF-8 characters. The length returned is the byte length, not the character count.
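
The difference between byte length and character count can be seen with the standard &str methods alone. This small sketch uses a hypothetical helper name and is independent of MyString.

```rust
/// Returns (byte length, character count) for a string slice.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    // `len` counts bytes; `chars().count()` counts Unicode scalar values.
    (s.len(), s.chars().count())
}
```

For "你好世界" this yields (12, 4), since each of the four characters occupies three bytes in UTF-8; for "Rust" it yields (4, 4).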

Constraints

  • The internal storage for MyString must be a Vec<u8>.
  • All string operations must preserve UTF-8 validity.
  • Reallocation strategy: When capacity is insufficient, double the current capacity, repeating until the new contents fit. If the current capacity is 0, start from a small initial value (e.g., 8 bytes).
  • as_str() should use std::str::from_utf8 internally.
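
One way to realise the reallocation strategy is to separate the capacity calculation from the append itself. The sketch below works on a bare Vec<u8> standing in for the buffer inside MyString; the names MIN_CAPACITY, next_capacity and push_str_bytes are hypothetical, and the minimum of 8 is taken from the suggestion above. It uses Vec::reserve_exact, which guarantees at least the requested capacity but may provide more.

```rust
/// Assumed minimum capacity, taken from the "e.g., 8 bytes" suggestion.
const MIN_CAPACITY: usize = 8;

/// Computes the capacity to grow to so that `required` bytes fit.
/// Returns `current` unchanged when it is already large enough.
fn next_capacity(current: usize, required: usize) -> usize {
    if required <= current {
        return current;
    }
    let mut cap = if current == 0 { MIN_CAPACITY } else { current };
    while cap < required {
        // saturating_mul avoids overflow for very large capacities.
        cap = cap.saturating_mul(2);
    }
    cap
}

/// Appends `s` to `buf`, growing the buffer by the policy above when needed.
fn push_str_bytes(buf: &mut Vec<u8>, s: &str) {
    let required = buf.len() + s.len();
    let target = next_capacity(buf.capacity(), required);
    if target > buf.capacity() {
        // reserve_exact takes the number of bytes beyond the current length;
        // the allocator may still hand back more than requested.
        buf.reserve_exact(target - buf.len());
    }
    buf.extend_from_slice(s.as_bytes());
}
```

Keeping next_capacity as a pure function makes the policy easy to test on its own: under these assumptions, a capacity of 8 with 40 bytes required grows to 64, while 8 bytes required leaves it at 8.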

Notes

  • Consider the implications of Rust's ownership and borrowing rules for your MyString implementation.
  • The push_str method needs to carefully handle appending bytes and ensuring the resulting Vec<u8> remains valid UTF-8.
  • You will need to implement new, from_str, push_str, len, and as_str as public methods on your MyString struct.
  • You might find it helpful to think about how Rust's String manages its capacity and performs reallocations.
  • This challenge intentionally omits some advanced String features (like capacity(), push(), pop(), insert(), slicing, etc.) to keep the scope manageable. Focus on the core requirements listed above.