Go Job Queue System

This challenge involves building a robust job queue system in Go. A job queue is a fundamental component in many distributed systems, allowing for asynchronous task processing, load balancing, and fault tolerance. You will implement a basic, in-memory job queue that can handle multiple workers processing tasks concurrently.

Problem Description

Your task is to design and implement a job queue in Go that can:

Accept jobs: A mechanism to add new tasks (jobs) to the queue.
Process jobs: A set of worker goroutines that continuously pull jobs from the queue and execute them.
Handle concurrency: Ensure that jobs are processed concurrently by multiple workers without race conditions or data corruption.
Graceful shutdown: Allow the queue to be shut down gracefully, ensuring all pending jobs are either completed or explicitly handled as per a shutdown strategy.

Key Requirements:

Job Definition: A Job should be a type that can represent a unit of work. For simplicity, let's define a Job as a struct with an ID (string) and a Payload (interface{}).
Queue Interface: Define an interface for the job queue that includes methods for adding jobs, starting workers, and stopping the queue.
Worker Logic: Workers should repeatedly fetch jobs from the queue, execute a provided handler function, and acknowledge completion or report errors.
Concurrency Control: Use Go's concurrency primitives (channels, mutexes) to manage access to the queue and job distribution.
Error Handling: Implement a basic strategy for handling errors during job processing. For this challenge, if a job fails, it should be retried a few times (configurable) before being discarded.
Shutdown: The Stop method should signal workers to finish their current job and then exit. It should also wait for all active workers to finish before returning.

Expected Behavior:

When StartWorkers is called, a specified number of worker goroutines should begin processing jobs.
When AddJob is called, the job should be placed into the queue.
Workers should pick up jobs from the queue in a fair manner.
If a job handler returns an error, the job should be retried up to a defined limit.
When Stop is called, all workers should stop accepting new jobs, finish their current job, and then terminate. The Stop method should block until all workers have exited.

Edge Cases:

Adding jobs when the queue is empty.
Adding jobs when the queue is full (if a bounded queue is implemented, though not strictly required for this basic version).
A worker crashing or blocking indefinitely during job processing (for this challenge, assume workers don't crash unexpectedly).
Stopping the queue when no jobs are being processed.
Stopping the queue when jobs are still being processed.

Examples

Example 1: Basic Job Submission and Processing

Let's assume we have a simple JobHandler function that simulates work by sleeping for a short duration and printing messages.

Scenario:

Create a job queue.
Start 2 workers.
Add 5 jobs to the queue.
Stop the queue after a short delay.

Conceptual Input:

Queue: NewJobQueue()
Workers: queue.StartWorkers(2)
Jobs: queue.AddJob(...) for 5 different jobs
Stop: queue.Stop() after 2 seconds

Conceptual Output (order of worker messages might vary):

Worker 1: Starting job: job-1
Worker 1: Processing job: job-1...
Worker 1: Finished job: job-1
Worker 2: Starting job: job-2
Worker 2: Processing job: job-2...
Worker 2: Finished job: job-2
Worker 1: Starting job: job-3
Worker 1: Processing job: job-3...
Worker 1: Finished job: job-3
Worker 2: Starting job: job-4
Worker 2: Processing job: job-4...
Worker 2: Finished job: job-4
Worker 1: Starting job: job-5
Worker 1: Processing job: job-5...
Worker 1: Finished job: job-5
Worker 1: Shutting down...
Worker 2: Shutting down...
Queue stopped gracefully.

Explanation: Two workers pick up jobs concurrently. As one finishes, it picks up another. After all jobs are processed and the Stop signal is received, workers shut down.

Example 2: Job Failure and Retries

Consider a job that randomly fails.

Scenario:

Create a job queue.
Start 1 worker.
Add a job that has a 50% chance of failing. Set retry count to 3.
Stop the queue.

Conceptual Input:

Queue: NewJobQueue()
Workers: queue.StartWorkers(1)
Job: AddJob with a handler that might return an error.
Stop: queue.Stop()

Conceptual Output (if job fails initially, then succeeds):

Worker 1: Starting job: job-retry-1
Worker 1: Processing job: job-retry-1... (Attempt 1 of 3)
Worker 1: Job job-retry-1 failed, retrying...
Worker 1: Processing job: job-retry-1... (Attempt 2 of 3)
Worker 1: Job job-retry-1 failed, retrying...
Worker 1: Processing job: job-retry-1... (Attempt 3 of 3)
Worker 1: Job job-retry-1 failed, retrying...
Worker 1: Job job-retry-1 failed after multiple retries, discarding.
Worker 1: Shutting down...
Queue stopped gracefully.

Conceptual Output (if job succeeds on first attempt):

Worker 1: Starting job: job-retry-1
Worker 1: Processing job: job-retry-1... (Attempt 1 of 3)
Worker 1: Finished job: job-retry-1
Worker 1: Shutting down...
Queue stopped gracefully.

Explanation: The worker attempts to process the job. If the handler returns an error, the job is put back into a retry mechanism. The worker will attempt to re-process it up to the retry limit. If it still fails after the last attempt, it's discarded.

Constraints

The job queue should be implemented as an in-memory structure.
The number of workers can be specified.
The maximum number of retries for a failed job should be configurable (e.g., default to 3).
The Payload of a Job can be any Go type (interface{}).
The JobHandler function will accept a Job and return an error.
The Stop method should be idempotent (calling it multiple times has no adverse effect after the first call).

Notes

Consider using channels for communication between the queue and workers, and for signaling.
A sync.WaitGroup will be useful for ensuring all workers have finished during shutdown.
Think about how to handle the case where a job is added after Stop has been called. For this challenge, assume jobs added after Stop might be dropped or queued if the queue is still accepting. A robust implementation would likely reject jobs after Stop is initiated.
The JobHandler function can be simulated using time.Sleep and random error generation for testing retry logic.
Consider a basic retry strategy: if a job fails, it's added back to the front or back of the queue for retry. For simplicity, adding to the back is acceptable.

Go Job Queue System

Problem Description

Your task is to design and implement a job queue in Go that can:

Accept jobs: A mechanism to add new tasks (jobs) to the queue.
Process jobs: A set of worker goroutines that continuously pull jobs from the queue and execute them.
Handle concurrency: Ensure that jobs are processed concurrently by multiple workers without race conditions or data corruption.
Graceful shutdown: Allow the queue to be shut down gracefully, ensuring all pending jobs are either completed or explicitly handled as per a shutdown strategy.

Key Requirements:

Job Definition: A Job should be a type that can represent a unit of work. For simplicity, let's define a Job as a struct with an ID (string) and a Payload (interface{}).
Queue Interface: Define an interface for the job queue that includes methods for adding jobs, starting workers, and stopping the queue.
Worker Logic: Workers should repeatedly fetch jobs from the queue, execute a provided handler function, and acknowledge completion or report errors.
Concurrency Control: Use Go's concurrency primitives (channels, mutexes) to manage access to the queue and job distribution.
Error Handling: Implement a basic strategy for handling errors during job processing. For this challenge, if a job fails, it should be retried a few times (configurable) before being discarded.
Shutdown: The Stop method should signal workers to finish their current job and then exit. It should also wait for all active workers to finish before returning.

Expected Behavior:

When StartWorkers is called, a specified number of worker goroutines should begin processing jobs.
When AddJob is called, the job should be placed into the queue.
Workers should pick up jobs from the queue in a fair manner.
If a job handler returns an error, the job should be retried up to a defined limit.
When Stop is called, all workers should stop accepting new jobs, finish their current job, and then terminate. The Stop method should block until all workers have exited.

Edge Cases:

Adding jobs when the queue is empty.
Adding jobs when the queue is full (if a bounded queue is implemented, though not strictly required for this basic version).
A worker crashing or blocking indefinitely during job processing (for this challenge, assume workers don't crash unexpectedly).
Stopping the queue when no jobs are being processed.
Stopping the queue when jobs are still being processed.

Examples

Example 1: Basic Job Submission and Processing

Let's assume we have a simple JobHandler function that simulates work by sleeping for a short duration and printing messages.

Scenario:

Create a job queue.
Start 2 workers.
Add 5 jobs to the queue.
Stop the queue after a short delay.

Conceptual Input:

Queue: NewJobQueue()
Workers: queue.StartWorkers(2)
Jobs: queue.AddJob(...) for 5 different jobs
Stop: queue.Stop() after 2 seconds

Conceptual Output (order of worker messages might vary):

Worker 1: Starting job: job-1
Worker 1: Processing job: job-1...
Worker 1: Finished job: job-1
Worker 2: Starting job: job-2
Worker 2: Processing job: job-2...
Worker 2: Finished job: job-2
Worker 1: Starting job: job-3
Worker 1: Processing job: job-3...
Worker 1: Finished job: job-3
Worker 2: Starting job: job-4
Worker 2: Processing job: job-4...
Worker 2: Finished job: job-4
Worker 1: Starting job: job-5
Worker 1: Processing job: job-5...
Worker 1: Finished job: job-5
Worker 1: Shutting down...
Worker 2: Shutting down...
Queue stopped gracefully.

Explanation: Two workers pick up jobs concurrently. As one finishes, it picks up another. After all jobs are processed and the Stop signal is received, workers shut down.

Example 2: Job Failure and Retries

Consider a job that randomly fails.

Scenario:

Create a job queue.
Start 1 worker.
Add a job that has a 50% chance of failing. Set retry count to 3.
Stop the queue.

Conceptual Input:

Queue: NewJobQueue()
Workers: queue.StartWorkers(1)
Job: AddJob with a handler that might return an error.
Stop: queue.Stop()

Conceptual Output (if job fails initially, then succeeds):

Worker 1: Starting job: job-retry-1
Worker 1: Processing job: job-retry-1... (Attempt 1 of 3)
Worker 1: Job job-retry-1 failed, retrying...
Worker 1: Processing job: job-retry-1... (Attempt 2 of 3)
Worker 1: Job job-retry-1 failed, retrying...
Worker 1: Processing job: job-retry-1... (Attempt 3 of 3)
Worker 1: Job job-retry-1 failed, retrying...
Worker 1: Job job-retry-1 failed after multiple retries, discarding.
Worker 1: Shutting down...
Queue stopped gracefully.

Conceptual Output (if job succeeds on first attempt):

Worker 1: Starting job: job-retry-1
Worker 1: Processing job: job-retry-1... (Attempt 1 of 3)
Worker 1: Finished job: job-retry-1
Worker 1: Shutting down...
Queue stopped gracefully.

Constraints

The job queue should be implemented as an in-memory structure.
The number of workers can be specified.
The maximum number of retries for a failed job should be configurable (e.g., default to 3).
The Payload of a Job can be any Go type (interface{}).
The JobHandler function will accept a Job and return an error.
The Stop method should be idempotent (calling it multiple times has no adverse effect after the first call).

Notes

Consider using channels for communication between the queue and workers, and for signaling.
A sync.WaitGroup will be useful for ensuring all workers have finished during shutdown.
Think about how to handle the case where a job is added after Stop has been called. For this challenge, assume jobs added after Stop might be dropped or queued if the queue is still accepting. A robust implementation would likely reject jobs after Stop is initiated.
The JobHandler function can be simulated using time.Sleep and random error generation for testing retry logic.
Consider a basic retry strategy: if a job fails, it's added back to the front or back of the queue for retry. For simplicity, adding to the back is acceptable.