Home › Developer Docs › Garbage Collection Design

Garbage Collection Design

Problem Statement
Solution: Reference Counted Values
Implementation Timeline
Risks & Mitigation
Success Criteria
References

Status: ✅ Complete Priority: Critical (fixed 135 GB memory leak) Approach: Reference Counting with Rc

Problem Statement

Without GC, the interpreter leaks memory catastrophically:

String operations create new allocations every iteration
Arrays cloned unnecessarily
No memory reclamation until process exits
Result: 135 GB memory usage in tests

Solution: Reference Counted Values

Phase 1: Rc-based Reference Counting

Use Rust’s Rc<T> (Reference Counted pointer) to share values:

// Before (current - memory leak)
pub enum Value {
    String(String),        // Owned - cloned on every use
    Array(Vec<Value>),     // Owned - entire array cloned
}

// After (with GC)
pub enum Value {
    String(Rc<String>),    // Shared - only pointer cloned
    Array(Rc<Vec<Value>>), // Shared - cheap to clone
}

How Reference Counting Works

let s1 = Value::String(Rc::new("hello".to_string())); // RC = 1
let s2 = s1.clone(); // RC = 2 (cheap - only pointer cloned)
// ... s1 dropped → RC = 1
// ... s2 dropped → RC = 0 → String freed automatically

Benefits:

✅ Automatic memory reclamation when RC hits 0
✅ No manual tracking needed
✅ Zero-cost when values aren’t shared
✅ Immediate impact on memory usage

Limitations:

⚠️ Can’t handle reference cycles (rare in Aether)
⚠️ Slight overhead for RC bookkeeping
⚠️ Thread-local only (fine for single-threaded interpreter)

Implementation Steps

Update Value enum - Wrap String/Array in Rc
Update Value operations - Clone Rc instead of data
Update Display/PartialEq - Dereference Rc
Update interpreter - Handle Rc in evaluation
Test - Verify memory usage drops dramatically

Code Changes

value.rs

use std::rc::Rc;

pub enum Value {
    Int(i64),
    Float(f64),
    String(Rc<String>),        // Rc — cheap clone, freed when RC = 0
    Bool(bool),
    Null,
    Array(Rc<Vec<Value>>),     // Rc — cheap clone
    Dict(Rc<Vec<(Value, Value)>>),  // Rc
    Set(Rc<HashSet<Value>>),        // Rc
    Function { params: Vec<String>, body: Rc<Stmt>, closure: Rc<Environment> },
    AsyncFunction { params: Vec<String>, body: Rc<Stmt>, closure: Rc<Environment> },
    BuiltinFn { name: String, arity: usize, func: BuiltinFn },
    Module { name: String, members: Rc<HashMap<String, Value>> },
    StructDef { name: String, fields: Vec<String>, methods: MethodMap },
    Instance { type_name: String, fields: Rc<RefCell<HashMap<String, Value>>>, methods: MethodMap },
    Iterator(Rc<RefCell<IteratorState>>),
    Promise(Rc<RefCell<PromiseState>>),
    ErrorVal { message: String, stack_trace: String },
    FileLines(Rc<RefCell<FileIterState>>),
}

Creating values

// String literal
Value::String(Rc::new("hello".to_string()))

// Array literal
Value::Array(Rc::new(vec![Value::Int(1), Value::Int(2)]))

// String concatenation (still creates new string, but old one freed)
let new_str = format!("{}{}", s1, s2);
Value::String(Rc::new(new_str))  // Old strings freed if RC = 0

Accessing values

match &value {
    Value::String(s) => {
        let str_ref: &str = s.as_ref();  // Dereference Rc
        println!("{}", str_ref);
    }
    Value::Array(arr) => {
        let vec_ref: &Vec<Value> = arr.as_ref();
        for item in vec_ref {
            // ...
        }
    }
}

Expected Memory Impact

Before GC:

String reverse("hello"):
- 5 intermediate strings allocated
- All kept in memory
- 135 GB for test suite

Test time: 60+ seconds

After GC:

String reverse("hello"):
- 5 intermediate strings allocated
- Old ones freed immediately (RC = 0)
- Expected: < 100 MB for test suite

Test time: Should be similar or faster

Estimated savings: 99%+ memory reduction

Testing Strategy

Unit tests - Verify values work correctly with Rc
Memory tests - Monitor memory usage during string ops
Integration tests - Run full test suite and measure
Benchmarks - Ensure no performance regression

Future Improvements (Phase 5+)

Cycle Detection

Reference counting can’t handle cycles:

let a = []
let b = []
a.push(b)  // a → b
b.push(a)  // b → a (cycle!)
// Both RC > 0 forever, never freed

Solution: Add cycle detector or upgrade to mark-and-sweep

Arena Allocator

For short-lived values, use arena:

struct ValueArena {
    values: Vec<Value>,
}

// Allocate many values, drop all at once
impl Drop for ValueArena {
    fn drop(&mut self) {
        // All values freed together
    }
}

Generational GC

Separate young/old generations:

struct Heap {
    young: Vec<Value>,  // Collected frequently
    old: Vec<Value>,    // Collected rarely
}

Concurrent GC

For multi-threaded future:

use std::sync::Arc;  // Thread-safe RC

Implementation Timeline

Phase 1 (Now): Reference Counting

Time: 1-2 hours
Impact: Fix memory leak
Tests: All 230 should still pass

Phase 2 (Phase 5): Optimizations

Arena allocators
Cycle detection
Profiling and tuning

Phase 3 (Future): Advanced GC

Mark-and-sweep
Generational GC
Concurrent GC

Risks & Mitigation

Risk 1: Breaking existing tests

Mitigation: Run tests after each change, fix immediately

Risk 2: Performance regression

Mitigation: Benchmark before/after, Rc overhead is minimal

Risk 3: Reference cycles leak memory

Mitigation: Document limitation, add cycle detector later

Risk 4: Complex refactor

Mitigation: Incremental changes, commit frequently

Success Criteria

✅ All ~693 tests still pass
✅ Memory usage < 100 MB (vs 135 GB)
✅ Test time ≤ 60 seconds (no regression)
✅ 0 clippy warnings
✅ Clean architecture maintained

References

Rust Rc docs: https://doc.rust-lang.org/std/rc/
“Crafting Interpreters” Chapter 26 (Garbage Collection)
Python’s reference counting implementation
Ruby’s GC evolution

Last Updated: April 17, 2026 Phase: 5 Complete (base) Status: Rc-based GC implemented and working, 99%+ memory reduction achieved

← Interpreter Backlog →

Aether Programming Language

A general-purpose, dynamically typed programming language with automatic memory management