Hone logo
Hone
Problems

Implementing a Concurrent Mark-and-Sweep Garbage Collector in Go

Go's runtime includes a sophisticated garbage collector that automatically reclaims memory that is no longer in use. This challenge involves building a simplified, concurrent mark-and-sweep garbage collector from scratch within a Go program. This exercise will help you understand the fundamental principles of garbage collection and the challenges of concurrency in managing memory.

Problem Description

Your task is to implement a basic mark-and-sweep garbage collector for a simulated heap in Go. This collector should operate concurrently with the "mutator" (the part of your program that allocates and accesses objects). The goal is to reclaim memory without significantly blocking the mutator threads.

Key Requirements:

  1. Simulated Heap: You will manage a collection of simulated objects. Each object will have a unique identifier, a data payload (e.g., a byte slice), and a list of references to other objects (represented by their IDs).
  2. Object Allocation: Implement a function Allocate(size int) that creates a new object on the simulated heap and returns its ID.
  3. Object Referencing: Implement a function AddReference(fromID, toID int) to establish a reference from one object to another.
  4. Root Set: The GC needs a way to identify the "root set" – objects that are directly accessible and should never be collected. For this challenge, assume a fixed set of root object IDs.
  5. Mark-and-Sweep Algorithm:
    • Mark Phase: Starting from the root set, traverse all reachable objects and "mark" them. This phase should be concurrent, meaning multiple goroutines can participate in marking.
    • Sweep Phase: After the mark phase, iterate through all objects in the heap. Any object that was not marked is considered garbage and should be reclaimed (removed from the heap). This phase can also be concurrent.
  6. Concurrency: The mutator (allocation and referencing operations) and the garbage collector must be able to run concurrently. The GC should pause the mutator for the shortest possible time during its operation (minimizing "stop-the-world" pauses).

Expected Behavior:

  • The Allocate function should return a unique ID for each new object.
  • AddReference should correctly record the reference.
  • The GC should run periodically (you'll need to decide on a trigger, e.g., after a certain number of allocations or a time interval).
  • Upon GC completion, only objects reachable from the root set should remain in the heap.
  • The GC should handle cycles of references correctly.

Edge Cases:

  • Allocating an object and then making it unreachable.
  • Creating circular references.
  • The root set being empty.
  • The heap being empty.

Examples

Example 1:

  • Setup:
    • Heap initially empty.
    • Root set: [1]
    • Allocate object 1 (size 10).
    • Allocate object 2 (size 20).
    • Add reference: 1 -> 2.
    • Allocate object 3 (size 30).
    • Add reference: 2 -> 3.
    • Make object 2 unreachable (e.g., by removing any references to it, though for this simplified model, assume it becomes unreachable if not explicitly kept alive by a root or another reachable object). In this example, object 2 is reachable from 1.
  • Trigger GC
  • Output:
    • Heap contains objects 1, 2, 3.
    • Explanation: Objects 1, 2, and 3 are all reachable starting from the root set [1]. Object 1 is a root. Object 2 is reachable from 1. Object 3 is reachable from 2.
Heap after GC:
Object ID: 1, Size: 10, References: [2]
Object ID: 2, Size: 20, References: [3]
Object ID: 3, Size: 30, References: []

Example 2:

  • Setup:
    • Heap initially empty.
    • Root set: [1]
    • Allocate object 1 (size 10).
    • Allocate object 2 (size 20).
    • Add reference: 1 -> 2.
    • Allocate object 3 (size 30).
    • Add reference: 2 -> 3.
    • Allocate object 4 (size 40).
    • Add reference: 1 -> 4.
    • (Implicitly, object 3 becomes unreachable if nothing points to it and it's not a root. In this case, object 2 points to 3, and 1 points to 2. So 3 is reachable.)
    • Let's modify this for a GC scenario:
    • Allocate object 1 (size 10).
    • Allocate object 2 (size 20).
    • Add reference: 1 -> 2.
    • Allocate object 3 (size 30).
    • // No reference from 2 to 3 in this specific scenario for demonstration.
    • Allocate object 4 (size 40).
    • Add reference: 1 -> 4.
    • Set root set to [1]
  • Trigger GC
  • Output:
    • Heap contains objects 1, 2, 4. Object 3 is collected.
    • Explanation: Object 1 is a root. Objects 2 and 4 are reachable from 1. Object 3 was allocated but no object in the reachable set (including roots) refers to it, so it's garbage.
Heap after GC:
Object ID: 1, Size: 10, References: [2, 4]
Object ID: 2, Size: 20, References: []
Object ID: 4, Size: 40, References: []

Example 3: Cyclic References

  • Setup:
    • Heap initially empty.
    • Root set: [1]
    • Allocate object 1 (size 10).
    • Allocate object 2 (size 20).
    • Allocate object 3 (size 30).
    • Add reference: 1 -> 2.
    • Add reference: 2 -> 3.
    • Add reference: 3 -> 2. // Cycle: 2 <-> 3
    • Add reference: 1 -> 3. // Explicitly add 3 to reachability from root
  • Trigger GC
  • Output:
    • Heap contains objects 1, 2, 3.
    • Explanation: Object 1 is the root. Object 2 is reachable from 1. Object 3 is reachable from 1 and 2. The cycle between 2 and 3 is correctly handled; neither is collected because they are both reachable.
Heap after GC:
Object ID: 1, Size: 10, References: [2, 3]
Object ID: 2, Size: 20, References: [3]
Object ID: 3, Size: 30, References: [2]

Constraints

  • The simulated heap can contain up to 10,000 objects.
  • Object sizes will range from 1 to 1024 bytes.
  • The number of references from a single object will not exceed 100.
  • The GC should aim to complete a full cycle (mark and sweep) within 100 milliseconds under typical load.
  • The mutator (allocation and referencing) should experience an average pause time of no more than 5 milliseconds per GC cycle.
  • Object IDs will be non-negative integers.

Notes

  • You'll need to design your heap data structure carefully to support concurrent access and efficient traversal. Consider using mutexes or other synchronization primitives.
  • For the "Mark" phase, think about how to distribute the marking work among multiple goroutines. A work-stealing queue or a simple channel-based approach could be effective.
  • For the "Sweep" phase, parallelizing the iteration and reclamation is key to performance.
  • Simulate the "mutator" by having separate goroutines that call Allocate and AddReference concurrently.
  • You'll need a mechanism to trigger the GC. For simplicity, you could start a goroutine that calls your GC function every X seconds or after Y allocations.
  • Consider how to represent the "marked" state of an object without significantly impacting its primary data. A separate map or a flag within the object structure could work.
  • The primary challenge is minimizing the time the mutator is stopped while the GC runs. This is where the concurrency aspect truly shines. Think about how to overlap GC work with mutator work where possible, or how to reduce the critical "stop-the-world" window.
Loading editor...
go