Accelerate Numerical Computations with SIMD in Go
This challenge focuses on using Single Instruction, Multiple Data (SIMD) operations within Go to speed up common numerical computations. You will implement a function that performs element-wise addition on two slices of floating-point numbers, using the vector package named under Key Requirements to achieve performance gains over a traditional loop.
Problem Description
Your task is to create a Go function that takes two slices of float32 and returns a new slice containing the element-wise sum of the input slices. The core requirement is to implement this operation with the vector package listed under Key Requirements, which this challenge designates for SIMD acceleration. Your SIMD-accelerated version should be measurably faster than a naive loop-based implementation.
Key Requirements:
- Implement a function SIMDAddFloat32(a, b []float32) []float32.
- This function must use the github.com/golang-collections/go-vector/vector package for SIMD acceleration.
- The function should perform element-wise addition: result[i] = a[i] + b[i].
- The output slice result should have the same length as the input slices when their lengths match.
- Handle input slices of different lengths: the operation should only proceed up to the length of the shorter slice.
Expected Behavior:
The function should correctly compute the element-wise sum for all elements up to the minimum length of the two input slices.
Edge Cases:
- Empty Slices: If either or both input slices are empty, the function should return an empty slice.
- Mismatched Lengths: The function should correctly handle cases where len(a) is not equal to len(b). The output slice's length should be min(len(a), len(b)).
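The expected behavior and edge cases above can be pinned down with a plain loop. This is a minimal baseline sketch, not the required SIMD version; the name AddFloat32Naive is hypothetical and does not appear in the challenge.

```go
package main

// AddFloat32Naive is a hypothetical baseline (the name is an assumption).
// It adds a and b element by element, up to the length of the shorter slice.
func AddFloat32Naive(a, b []float32) []float32 {
	// Use min(len(a), len(b)) so mismatched lengths are handled.
	n := len(a)
	if len(b) < n {
		n = len(b)
	}

	// make with n == 0 yields an empty, non-nil slice, which covers
	// the empty-input edge case without a special branch.
	result := make([]float32, n)
	for i := 0; i < n; i++ {
		result[i] = a[i] + b[i]
	}
	return result
}
```

Because the length is computed once up front, the empty-slice and mismatched-length cases need no separate code paths.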
Examples
Example 1:
Input:
a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
Output:
[6.0, 8.0, 10.0, 12.0]
Explanation:
Each element is added to its corresponding element:
1.0 + 5.0 = 6.0
2.0 + 6.0 = 8.0
3.0 + 7.0 = 10.0
4.0 + 8.0 = 12.0
Example 2:
Input:
a = [1.1, 2.2, 3.3]
b = [4.4, 5.5, 6.6, 7.7, 8.8]
Output:
[5.5, 7.7, 9.9]
Explanation:
The operation stops at the length of the shorter slice (len(a) = 3).
1.1 + 4.4 = 5.5
2.2 + 5.5 = 7.7
3.3 + 6.6 = 9.9
Example 3:
Input:
a = []
b = [1.0, 2.0]
Output:
[]
Explanation:
One of the input slices is empty, so the result is an empty slice.
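Decimal values such as 1.1 or 9.9 are not exactly representable in float32, so checking an implementation against the examples above is safer with a small tolerance than with exact equality. The helper below is a sketch for that purpose; the name almostEqual and the choice of tolerance are assumptions, not part of the challenge.

```go
package main

// almostEqual is a hypothetical test helper (not part of the challenge).
// It reports whether got and want have the same length and every pair of
// elements differs by at most tol.
func almostEqual(got, want []float32, tol float32) bool {
	if len(got) != len(want) {
		return false
	}
	for i := range got {
		d := got[i] - want[i]
		if d < 0 {
			d = -d
		}
		// Written as !(d <= tol) so that a NaN difference is rejected.
		if !(d <= tol) {
			return false
		}
	}
	return true
}
```

For the values in these examples, a tolerance around 1e-5 is one reasonable choice; larger magnitudes would need a larger or relative tolerance.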
Constraints
- Input slices will contain float32 values.
- The maximum length of input slices will be 1,000,000.
- The SIMD-accelerated implementation is expected to be at least 10% faster than a naive loop-based implementation for sufficiently large input slices (e.g., > 10,000 elements).
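One way to check the speed constraint is to time both implementations with testing.Benchmark from the standard library, which can run outside of go test. The sketch below assumes "10% faster" means the candidate takes at most 90% of the baseline's time per call; that reading, and the names addFunc, addNaive, nsPerOp, and meetsTarget, are assumptions for illustration.

```go
package main

import "testing"

// addFunc is the shape shared by the baseline and the candidate.
type addFunc func(a, b []float32) []float32

// addNaive is a hypothetical loop-based baseline.
func addNaive(a, b []float32) []float32 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	result := make([]float32, n)
	for i := 0; i < n; i++ {
		result[i] = a[i] + b[i]
	}
	return result
}

// nsPerOp measures the average time in nanoseconds of one call to fn
// on two slices of length n.
func nsPerOp(fn addFunc, n int) float64 {
	a := make([]float32, n)
	b := make([]float32, n)
	for i := range a {
		a[i] = float32(i)
		b[i] = float32(2 * i)
	}
	res := testing.Benchmark(func(bm *testing.B) {
		for i := 0; i < bm.N; i++ {
			fn(a, b)
		}
	})
	return float64(res.T.Nanoseconds()) / float64(res.N)
}

// meetsTarget applies the assumed reading of the constraint:
// the candidate needs at most 90% of the baseline time.
func meetsTarget(baselineNs, candidateNs float64) bool {
	return candidateNs <= 0.9*baselineNs
}
```

A comparison would then look like meetsTarget(nsPerOp(addNaive, 100000), nsPerOp(SIMDAddFloat32, 100000)). Both measurements include the cost of allocating the result slice, so very small inputs may be dominated by allocation rather than arithmetic.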
Notes
- You will need to go get github.com/golang-collections/go-vector/vector.
- Familiarize yourself with how to use the vector package to perform operations on slices of floating-point numbers. Consider how to unroll the loops and process data in chunks that align with vector register sizes.
- For performance comparison, implement a simple, unoptimized loop-based addition function to benchmark against your SIMD implementation.
- Ensure your SIMD implementation gracefully handles the remainder of elements if the slice length is not a perfect multiple of the vector width.
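The chunk-plus-remainder structure described in these notes can be sketched in plain Go. This sketch does not call the vector package, since its API is not shown here, and the Go compiler is not guaranteed to turn the unrolled loop into SIMD instructions. The name AddFloat32Chunked and the width of 4 (four float32 values in one 128-bit register) are assumptions for illustration.

```go
package main

// lanes is an assumed vector width: four float32 values per 128-bit register.
const lanes = 4

// AddFloat32Chunked is a hypothetical illustration of chunked processing.
// It handles full chunks of `lanes` elements in an unrolled loop, then
// finishes the leftover elements one at a time.
func AddFloat32Chunked(a, b []float32) []float32 {
	n := len(a)
	if len(b) < n {
		n = len(b)
	}
	result := make([]float32, n)
	a, b = a[:n], b[:n]

	i := 0
	// Main loop: runs only while a full chunk remains.
	for ; i+lanes <= n; i += lanes {
		// Fixed-length subslices make the four accesses below provably in range.
		x, y, r := a[i:i+lanes], b[i:i+lanes], result[i:i+lanes]
		r[0] = x[0] + y[0]
		r[1] = x[1] + y[1]
		r[2] = x[2] + y[2]
		r[3] = x[3] + y[3]
	}

	// Remainder loop: at most lanes-1 elements when n is not a multiple of lanes.
	for ; i < n; i++ {
		result[i] = a[i] + b[i]
	}
	return result
}
```

With 7 elements, for example, the main loop handles indices 0 through 3 and the remainder loop handles indices 4 through 6.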