Hone logo
Hone
Problems

SIMD-Accelerated Vector Addition in Rust

This challenge focuses on leveraging Rust's SIMD (Single Instruction, Multiple Data) capabilities to significantly speed up a common numerical operation: vector addition. You will implement a function that adds two vectors of floating-point numbers, utilizing SIMD instructions to perform multiple additions in parallel, leading to substantial performance gains over traditional scalar processing.

Problem Description

Your task is to create a Rust function simd_vector_add that takes two slices of f32 (single-precision floating-point numbers) as input and returns a new Vec<f32> containing the element-wise sum of the input slices. The primary requirement is to use Rust's portable SIMD API (available through the std::simd module or the packed_simd_2 crate if targeting older Rust versions or requiring more explicit control) to perform the addition.

Key Requirements:

  1. SIMD Implementation: The core of the vector addition must be implemented using SIMD instructions. This means processing multiple elements of the input vectors simultaneously.
  2. Scalar Fallback: For any remaining elements that do not form a full SIMD vector, a standard scalar addition loop should be used.
  3. Input Handling: The function should gracefully handle input slices of different lengths. The output vector's length should be the minimum of the two input vector lengths.
  4. Return Value: The function should return a Vec<f32> containing the resulting summed elements.
  5. No Panics: The function should not panic. Handle unequal lengths by truncating the result to the shortest input length.

Expected Behavior:

Given two input slices a and b, the output c should satisfy c[i] = a[i] + b[i] for 0 <= i < min(a.len(), b.len()).

Edge Cases:

  • Empty Slices: If either input slice is empty, the output should be an empty Vec<f32>.
  • Unequal Length Slices: The addition should only proceed up to the length of the shorter slice. The excess elements of the longer slice are ignored.
  • Slice Lengths not divisible by SIMD width: The scalar fallback mechanism is crucial here.

Examples

Example 1:

Input:
a = &[1.0, 2.0, 3.0, 4.0, 5.0]
b = &[0.5, 1.5, 2.5, 3.5, 4.5]
Output:
[1.5, 3.5, 5.5, 7.5, 9.5]
Explanation:
Each element from `a` is added to the corresponding element from `b`. For instance, `1.0 + 0.5 = 1.5`.

Example 2:

Input:
a = &[10.0, 20.0, 30.0]
b = &[1.0, 2.0, 3.0, 4.0, 5.0]
Output:
[11.0, 22.0, 33.0]
Explanation:
The addition stops after 3 elements because slice `a` is shorter.

Example 3:

Input:
a = &[]
b = &[1.0, 2.0]
Output:
[]
Explanation:
One of the input slices is empty, so the output is an empty vector.

Constraints

  • Input slices will contain f32 values.
  • The length of the input slices can range from 0 up to usize::MAX.
  • Performance is a key consideration. The SIMD implementation should be demonstrably faster than a naive scalar loop for sufficiently large input sizes (e.g., lengths > 1024).

Notes

  • Consider using std::simd for modern Rust versions. If you are using an older version or require more explicit control, the packed_simd_2 crate is a good alternative, though it requires adding it as a dependency.
  • The width of SIMD operations (e.g., how many f32 elements can be processed at once) depends on the target architecture. Your code should be general enough to work across different architectures without hardcoding specific SIMD widths. The std::simd API provides abstractions for this.
  • Think about how to chunk the input slices into segments that match the SIMD vector width.
  • The scalar fallback loop is essential for correctness when the slice length is not a perfect multiple of the SIMD vector width.
Loading editor...
rust