Implement Distributed Tracing in Go
In modern microservice architectures, understanding the flow of requests across different services is crucial for debugging, performance monitoring, and identifying bottlenecks. Distributed tracing provides a way to track a request as it propagates through a system. This challenge asks you to implement a basic distributed tracing system in Go.
Problem Description
Your task is to implement a simplified distributed tracing mechanism. You will create a Tracer interface and concrete implementations that allow you to:
- Start a span: A span represents a single operation or unit of work within a trace. Each span should have a unique ID, a parent ID (if it's not the root span), a name, and start and end timestamps.
- End a span: This marks the completion of an operation and records the end timestamp.
- Record span details: Spans should be collected and made available for analysis.
You need to implement this tracing system to work across multiple Go functions, simulating a distributed environment where a single request might trigger calls to different functions or potentially different (simulated) services.
Key Requirements:
- Define a Tracer interface.
- Implement at least one concrete Tracer that collects spans in memory.
- Each span must have:
  - TraceID (unique identifier for the entire trace)
  - SpanID (unique identifier for this specific span)
  - ParentID (ID of the parent span, 0 or empty for root spans)
  - Name (descriptive name of the operation)
  - StartTime (timestamp when the span started)
  - EndTime (timestamp when the span ended)
  - Duration (calculated as EndTime - StartTime)
- The Tracer should manage the generation of unique IDs for traces and spans.
- The tracing context (specifically TraceID, SpanID, and ParentID) must be propagated correctly across function calls.
- The implemented Tracer should provide a way to retrieve all recorded spans.
Expected Behavior:
When a trace is initiated, a root span is created. Subsequent operations that are traced will create child spans, linking them to their parent span via ParentID. All spans belonging to the same trace will share the same TraceID.
Edge Cases:
- Handling the root span (no parent).
- Ensuring unique IDs are generated.
- Correctly propagating context when no tracing is active in a called function.
Examples
Example 1: Simple Sequential Tracing
// Assume a Tracer implementation named 'myTracer' is initialized.
func main() {
	ctx := context.Background()
	tracedCtx, span := myTracer.StartSpan(ctx, "main_operation")
	defer span.End()
	// Simulate work
	time.Sleep(50 * time.Millisecond)
	// Both children start from tracedCtx, so each has main_operation as its parent.
	// Each child is ended as soon as its own work is done.
	_, childSpan := myTracer.StartSpan(tracedCtx, "sub_operation_1")
	time.Sleep(30 * time.Millisecond)
	childSpan.End()
	_, anotherChildSpan := myTracer.StartSpan(tracedCtx, "sub_operation_2")
	time.Sleep(20 * time.Millisecond)
	anotherChildSpan.End()
}
// Expected output (simplified representation of collected spans):
// [
// {TraceID: "...", SpanID: "...", ParentID: "", Name: "main_operation", StartTime: ..., EndTime: ..., Duration: ...},
// {TraceID: "...", SpanID: "...", ParentID: "...", Name: "sub_operation_1", StartTime: ..., EndTime: ..., Duration: ...},
// {TraceID: "...", SpanID: "...", ParentID: "...", Name: "sub_operation_2", StartTime: ..., EndTime: ..., Duration: ...}
// ]
// Note: TraceID will be the same for all spans. SpanIDs will be unique. ParentID of sub_operation_1 and sub_operation_2 will be the SpanID of main_operation.
Example 2: Nested Tracing
// Assume a Tracer implementation named 'myTracer' is initialized.
func processData(ctx context.Context) {
	ctx, span := myTracer.StartSpan(ctx, "process_data")
	defer span.End()
	time.Sleep(40 * time.Millisecond)
	fetchRecords(ctx)
}

func fetchRecords(ctx context.Context) {
	ctx, span := myTracer.StartSpan(ctx, "fetch_records")
	defer span.End()
	time.Sleep(25 * time.Millisecond)
}

func main() {
	ctx := context.Background()
	ctx, rootSpan := myTracer.StartSpan(ctx, "root_request")
	defer rootSpan.End()
	processData(ctx)
}
// Expected output (simplified representation of collected spans):
// [
// {TraceID: "...", SpanID: "...", ParentID: "", Name: "root_request", StartTime: ..., EndTime: ..., Duration: ...},
// {TraceID: "...", SpanID: "...", ParentID: "...", Name: "process_data", StartTime: ..., EndTime: ..., Duration: ...},
// {TraceID: "...", SpanID: "...", ParentID: "...", Name: "fetch_records", StartTime: ..., EndTime: ..., Duration: ...}
// ]
// Note: TraceID is consistent. ParentID of 'process_data' is the SpanID of 'root_request'. ParentID of 'fetch_records' is the SpanID of 'process_data'.
Example 3: Un-traced Function Call
// Assume a Tracer implementation named 'myTracer' is initialized.
func externalApiCall(ctx context.Context) {
	// This function is NOT using myTracer.StartSpan
	fmt.Println("Simulating an external API call...")
	time.Sleep(15 * time.Millisecond)
}

func main() {
	ctx := context.Background()
	ctx, span := myTracer.StartSpan(ctx, "main_task")
	defer span.End()
	time.Sleep(30 * time.Millisecond)
	externalApiCall(ctx) // This call does not start a new traced span
	time.Sleep(10 * time.Millisecond)
}
// Expected output (simplified representation of collected spans):
// [
// {TraceID: "...", SpanID: "...", ParentID: "", Name: "main_task", StartTime: ..., EndTime: ..., Duration: ...}
// ]
// Explanation: Even though externalApiCall is called, it doesn't initiate a new span. The context is passed, but no new tracing operation begins within it.
Constraints
- Span and Trace IDs can be represented as strings.
- Timestamps should be time.Time objects.
- The in-memory storage for spans should be safe for concurrent access (though not strictly required for basic implementations, it's a good practice to consider).
- The number of spans collected in a single trace should not exceed 1000.
- The depth of nested spans should not exceed 50.
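For the concurrent-access constraint, one common approach is to guard the span storage with a sync.Mutex. The sketch below also enforces the 1000-span bound, which is an assumption: the constraint may only describe input size rather than something the tracer must reject. The spanStore type, its methods, and errTraceFull are hypothetical, and spans are reduced to their names to keep the example short.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// maxSpansPerTrace mirrors the limit stated in the constraints.
const maxSpansPerTrace = 1000

var errTraceFull = errors.New("trace already holds the maximum number of spans")

// spanStore is a hypothetical mutex-guarded store keyed by trace ID.
type spanStore struct {
	mu      sync.Mutex
	byTrace map[string][]string // trace ID -> span names (simplified)
}

func newSpanStore() *spanStore {
	return &spanStore{byTrace: make(map[string][]string)}
}

// add records a span name, or returns errTraceFull once the trace is at the limit.
func (s *spanStore) add(traceID, name string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.byTrace[traceID]) >= maxSpansPerTrace {
		return errTraceFull
	}
	s.byTrace[traceID] = append(s.byTrace[traceID], name)
	return nil
}

// count returns how many spans are stored for traceID.
func (s *spanStore) count(traceID string) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	return len(s.byTrace[traceID])
}

func main() {
	store := newSpanStore()
	var wg sync.WaitGroup
	for i := 0; i < 1200; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			_ = store.add("t1", fmt.Sprintf("op-%d", i)) // extra adds are rejected
		}(i)
	}
	wg.Wait()
	fmt.Println(store.count("t1")) // prints 1000
}
```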
Notes
- Consider using context.Context to propagate tracing information. You'll need to store and retrieve trace/span context from the context.
- For generating unique IDs, you can use libraries like github.com/google/uuid or simpler timestamp-based approaches if you ensure uniqueness.
- The Span struct should be immutable once it's ended.
- Think about how to handle the absence of tracing information in the context when StartSpan is called.
- For calculating duration, EndTime.Sub(StartTime) will be useful.
- When retrieving spans, you should get a copy or a read-only view to prevent external modification.
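As an alternative to the ID options mentioned in the notes, random hex strings from the standard library's crypto/rand also work without an external dependency. This is a sketch of that swapped-in technique, not the approach the notes prescribe; newID and the 16-byte and 8-byte sizes are assumptions chosen for illustration.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newID returns nBytes of cryptographically random data as a hex string.
// (Hypothetical helper; collisions are possible in principle but
// vanishingly unlikely at these sizes.)
func newID(nBytes int) (string, error) {
	b := make([]byte, nBytes)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

func main() {
	traceID, err := newID(16)
	if err != nil {
		panic(err)
	}
	spanID, err := newID(8)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(traceID), len(spanID)) // prints "32 16"
}
```

Timestamp-based IDs can collide when two spans start within the same clock tick, which is why a random or counter-based source is often easier to reason about.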