Hone logo
Hone
Problems

Implementing a Generational Garbage Collector in Go

Building a robust garbage collector (GC) is a cornerstone of efficient memory management in programming languages. This challenge focuses on implementing a simplified generational garbage collector in Go. Understanding generational GC is crucial for optimizing performance, as it leverages the observation that most objects are short-lived.

Problem Description

Your task is to implement a basic generational garbage collector for a custom memory allocator in Go. This GC will divide the heap into two generations: a young generation and an old generation. New objects will be allocated in the young generation. When the young generation fills up, a minor garbage collection will occur, promoting surviving objects to the old generation and reclaiming memory in the young generation. Periodically, a major garbage collection will run on the old generation.

Key Requirements:

  1. Two Generations: The heap must be divided into at least two generations: young and old.
  2. Allocation: All new objects should be allocated in the young generation.
  3. Minor GC:
    • Triggered when the young generation is full.
    • Scans objects in the young generation.
    • Identifies live objects (reachable from roots).
    • "Promotes" live objects from the young generation to the old generation.
    • Reclaims memory in the young generation occupied by dead objects.
  4. Major GC:
    • Triggered less frequently (e.g., after a certain number of minor GCs, or when the old generation reaches a threshold).
    • Scans objects in the old generation.
    • Identifies live objects.
    • Reclaims memory in the old generation occupied by dead objects.
  5. Roots: The GC must be able to identify roots (e.g., global variables, stack frames). For this challenge, we'll simplify roots to be a predefined list of pointers.
  6. Object Representation: You'll need a way to represent objects in memory, including their size, whether they are marked as live, and importantly, pointers to other objects.

Expected Behavior:

  • The GC should correctly identify and reclaim unreachable objects.
  • The promotion of live objects from young to old generation should be handled accurately.
  • The system should not crash or exhibit memory leaks due to GC failures.

Edge Cases:

  • Circular References: The GC should handle circular references correctly, identifying them as reachable if they are part of a larger live structure.
  • Large Objects: Consider how large objects might be handled. For simplicity, you might initially allocate them directly in the old generation or have a strategy for them.
  • Pointer Updates: Assume pointers within objects can be updated. The GC needs to be aware of these pointers.

Examples

Let's define a simple object structure for demonstration.

// Assume this represents an object in our custom heap.
// It has a size, a flag to indicate if it's a pointer type,
// and a slice of pointers to other objects.
type HeapObject struct {
    ID         int
    Size       int
    IsPointer  bool
    References []*HeapObject // Pointers to other HeapObjects
    Data       []byte        // Example data
}

Example 1: Simple Allocation and Minor GC

Input:

A sequence of object allocations.

  • Allocate objA (small, pointer type)
  • Allocate objB (small, pointer type)
  • objA references objB
  • Allocate objC (small, pointer type)
  • objC references objA
  • Young generation is now full. Trigger Minor GC.
  • Global roots point to objC.

Output:

After Minor GC:

  • objC is live (reachable from roots).
  • objA is live (referenced by objC).
  • objB is live (referenced by objA).
  • All objects (objA, objB, objC) are promoted to the old generation.
  • Young generation is reclaimed.

Explanation: The minor GC starts from the roots (objC). It marks objC as live. Then, it traverses objC's references and marks objA as live. Finally, it traverses objA's references and marks objB as live. Since all objects are live, they are promoted to the old generation.

Example 2: Object Becoming Unreachable

Input:

  • Allocate objX (pointer type)
  • Allocate objY (pointer type)
  • objX references objY
  • Global roots point to objX.
  • Young generation is now full. Trigger Minor GC. (objX, objY promoted to old)
  • Now, objX is no longer referenced by any roots, and objX is no longer a root itself.
  • Allocate objZ (pointer type)
  • Young generation is now full. Trigger Minor GC.

Output:

After the second Minor GC:

  • objZ is the only root.
  • objX is now unreachable.
  • objY is also unreachable (as it was only referenced by objX).
  • objZ is marked live and promoted to the old generation.
  • objX and objY are garbage and their memory in the young generation is reclaimed.

Explanation: The first minor GC promotes objX and objY. In the second phase, objX is no longer a root. The GC starts from the new root (objZ). objX and objY are not reachable from objZ and thus are collected.

Example 3: Circular Reference Handling

Input:

  • Allocate objP (pointer type)
  • Allocate objQ (pointer type)
  • objP references objQ
  • objQ references objP
  • Global roots point to objP.
  • Young generation is now full. Trigger Minor GC.

Output:

After Minor GC:

  • objP is live (from roots).
  • objQ is live (referenced by objP).
  • objP is also live (referenced by objQ - this is the cycle).
  • objP and objQ are promoted to the old generation.

Explanation: The minor GC starts from the root (objP), marking it live. It then follows the reference from objP to objQ, marking objQ as live. When it processes objQ's references, it finds a reference back to objP. Since objP is already marked live, this cycle is correctly identified as part of the live set.

Constraints

  • Heap Size: The total heap size (young + old) will not exceed 10MB.
  • Object Size: Individual object sizes (including data and overhead) will not exceed 1KB.
  • Number of Objects: The total number of concurrently allocated objects will not exceed 10,000.
  • Pointer Density: The number of references per object will not exceed 10.
  • Root Set Size: The number of initial roots will not exceed 50.
  • Performance: The GC should aim to complete a minor GC within 50ms and a major GC within 200ms for the given constraints.

Notes

  • You will need to design a custom memory allocator that manages blocks of memory for the heap.
  • Consider how you will represent the "generations" within your heap. This could be contiguous memory regions or separate allocations.
  • Think about how to represent "liveness" or "marked" status for objects during GC sweeps. A simple boolean flag per object might suffice.
  • For this challenge, you can assume that object layouts are fixed once allocated, and you don't need to handle complex object resizing or in-place updates that might invalidate GC assumptions.
  • Focus on the core GC logic: marking reachable objects and reclaiming unreachable ones. The exact details of memory allocation and deallocation can be abstracted to a degree.
  • You will need to define a clear interface for your garbage collector, allowing for object allocation and triggering GC cycles.
Loading editor...
go