Hone logo
Hone
Problems

Concurrent Sweeping of Data

This challenge focuses on implementing concurrent sweeping in Go to efficiently process a large dataset. Imagine you have a massive collection of items, and for each item, you need to perform a potentially time-consuming operation. To speed this up, you want to process these items concurrently using multiple workers. The goal is to design a system that can distribute the sweeping task across several goroutines and collect the results reliably.

Problem Description

You are tasked with creating a Go program that simulates a concurrent sweeping process. You will be given a slice of "items" (represented by integers in this challenge) and a function that simulates a time-consuming operation on each item. Your program should:

  1. Distribute Work: Distribute the processing of items across a configurable number of worker goroutines.
  2. Process Items: Each worker goroutine will take an item, perform the given processing function on it, and produce a "result."
  3. Collect Results: Collect all the processed results from the workers. The order of results doesn't strictly matter, but all results from the input items must be accounted for.
  4. Handle Errors (Optional but Recommended): The processing function might return an error. Your sweeping mechanism should be able to handle and report these errors.

Key Requirements:

  • Use goroutines for concurrent processing.
  • Employ channels for communication between the main goroutine and worker goroutines (for sending items and receiving results/errors).
  • Implement a mechanism to signal when all items have been processed and all results have been collected.
  • The number of workers should be configurable.

Expected Behavior:

The Sweep function should take the input data, the number of workers, and the processing function as arguments. It should return a slice of processed results and any errors encountered. The function should return only after all input items have been processed.

Edge Cases:

  • An empty input slice.
  • A very large input slice.
  • A processing function that takes a significant amount of time.
  • A processing function that consistently returns errors.
  • Zero or one worker.

Examples

Example 1:

Input:
items = [1, 2, 3, 4, 5]
numWorkers = 2
processingFunc = func(item int) (int, error) {
    time.Sleep(100 * time.Millisecond) // Simulate work
    return item * 2, nil
}

Output:
results = [2, 4, 6, 8, 10] (order may vary)
err = nil
Explanation:
With 2 workers, items are distributed. For example, worker 1 might process 1, 3, 5 and worker 2 might process 2, 4. Each item is doubled, and no errors occur. The final results are collected.

Example 2:

Input:
items = [10, 20, 30]
numWorkers = 3
processingFunc = func(item int) (int, error) {
    time.Sleep(50 * time.Millisecond) // Simulate work
    if item == 20 {
        return 0, fmt.Errorf("failed to process item %d", item)
    }
    return item / 10, nil
}

Output:
results = [1, 3] (order may vary)
err = fmt.Errorf("multiple errors occurred, e.g., failed to process item 20")
Explanation:
Item 20 fails to process and returns an error. Items 10 and 30 are processed successfully. The `Sweep` function should return the successful results and indicate that errors occurred. The error returned by `Sweep` can be a single aggregated error or a collection of errors.

Example 3:

Input:
items = []
numWorkers = 4
processingFunc = func(item int) (int, error) {
    return item * 10, nil
}

Output:
results = []
err = nil
Explanation:
An empty input slice should result in an empty output slice and no errors.

Constraints

  • The input items slice can contain up to 1,000,000 integers.
  • The numWorkers will be between 0 and 100. If numWorkers is 0, it should default to 1.
  • The processingFunc might take between 10 milliseconds and 1 second to complete for each item.
  • The total execution time should be significantly less than processing items sequentially, especially for a large number of items.

Notes

  • Consider using sync.WaitGroup to coordinate the completion of all worker goroutines.
  • Think about how to gracefully shut down workers when all tasks are done.
  • For error handling, you might decide to:
    • Return the first error encountered.
    • Collect all errors and return them as a single error (e.g., using errors.Join or a custom error type).
    • Return a list of errors along with the successful results. For this challenge, returning a slice of successful results and a single aggregated error (or nil if no errors) is a good starting point.
  • The processingFunc signature is func(item int) (result int, err error). You will need to adapt this if your chosen data type for results differs, but for this problem, int is sufficient for both input and output of the processing function.
  • Test your solution with varying numbers of workers to see the performance impact.
Loading editor...
go