Home › Developer Docs › Garbage Collection Design
Garbage Collection Design
Table of Contents
- Problem Statement
- Solution: Reference Counted Values
- Implementation Timeline
- Risks & Mitigation
- Success Criteria
- References
Status: ✅ Complete
Priority: Critical (fixed 135 GB memory leak)
Approach: Reference Counting with Rc
Problem Statement
Without GC, the interpreter leaks memory catastrophically:
- String operations create new allocations every iteration
- Arrays cloned unnecessarily
- No memory reclamation until process exits
- Result: 135 GB memory usage in tests
Solution: Reference Counted Values
Phase 1: Rc-based Reference Counting
Use Rust’s Rc<T> (Reference Counted pointer) to share values:
// Before (current - memory leak)
pub enum Value {
String(String), // Owned - cloned on every use
Array(Vec<Value>), // Owned - entire array cloned
}
// After (with GC)
pub enum Value {
String(Rc<String>), // Shared - only pointer cloned
Array(Rc<Vec<Value>>), // Shared - cheap to clone
}
How Reference Counting Works
let s1 = Value::String(Rc::new("hello".to_string())); // RC = 1
let s2 = s1.clone(); // RC = 2 (cheap - only pointer cloned)
// ... s1 dropped → RC = 1
// ... s2 dropped → RC = 0 → String freed automatically
Benefits:
- ✅ Automatic memory reclamation when RC hits 0
- ✅ No manual tracking needed
- ✅ Zero-cost when values aren’t shared
- ✅ Immediate impact on memory usage
Limitations:
- ⚠️ Can’t handle reference cycles (rare in Aether)
- ⚠️ Slight overhead for RC bookkeeping
- ⚠️ Thread-local only (fine for single-threaded interpreter)
Implementation Steps
- Update Value enum - Wrap String/Array in Rc
- Update Value operations - Clone Rc instead of data
- Update Display/PartialEq - Dereference Rc
- Update interpreter - Handle Rc in evaluation
- Test - Verify memory usage drops dramatically
Code Changes
value.rs
use std::rc::Rc;
pub enum Value {
Int(i64),
Float(f64),
String(Rc<String>), // Rc — cheap clone, freed when RC = 0
Bool(bool),
Null,
Array(Rc<Vec<Value>>), // Rc — cheap clone
Dict(Rc<Vec<(Value, Value)>>), // Rc
Set(Rc<HashSet<Value>>), // Rc
Function { params: Vec<String>, body: Rc<Stmt>, closure: Rc<Environment> },
AsyncFunction { params: Vec<String>, body: Rc<Stmt>, closure: Rc<Environment> },
BuiltinFn { name: String, arity: usize, func: BuiltinFn },
Module { name: String, members: Rc<HashMap<String, Value>> },
StructDef { name: String, fields: Vec<String>, methods: MethodMap },
Instance { type_name: String, fields: Rc<RefCell<HashMap<String, Value>>>, methods: MethodMap },
Iterator(Rc<RefCell<IteratorState>>),
Promise(Rc<RefCell<PromiseState>>),
ErrorVal { message: String, stack_trace: String },
FileLines(Rc<RefCell<FileIterState>>),
}
Creating values
// String literal
Value::String(Rc::new("hello".to_string()))
// Array literal
Value::Array(Rc::new(vec![Value::Int(1), Value::Int(2)]))
// String concatenation (still creates new string, but old one freed)
let new_str = format!("{}{}", s1, s2);
Value::String(Rc::new(new_str)) // Old strings freed if RC = 0
Accessing values
match &value {
Value::String(s) => {
let str_ref: &str = s.as_ref(); // Dereference Rc
println!("{}", str_ref);
}
Value::Array(arr) => {
let vec_ref: &Vec<Value> = arr.as_ref();
for item in vec_ref {
// ...
}
}
}
Expected Memory Impact
Before GC:
String reverse("hello"):
- 5 intermediate strings allocated
- All kept in memory
- 135 GB for test suite
Test time: 60+ seconds
After GC:
String reverse("hello"):
- 5 intermediate strings allocated
- Old ones freed immediately (RC = 0)
- Expected: < 100 MB for test suite
Test time: Should be similar or faster
Estimated savings: 99%+ memory reduction
Testing Strategy
- Unit tests - Verify values work correctly with Rc
- Memory tests - Monitor memory usage during string ops
- Integration tests - Run full test suite and measure
- Benchmarks - Ensure no performance regression
Future Improvements (Phase 5+)
Cycle Detection
Reference counting can’t handle cycles:
let a = []
let b = []
a.push(b) // a → b
b.push(a) // b → a (cycle!)
// Both RC > 0 forever, never freed
Solution: Add cycle detector or upgrade to mark-and-sweep
Arena Allocator
For short-lived values, use arena:
struct ValueArena {
values: Vec<Value>,
}
// Allocate many values, drop all at once
impl Drop for ValueArena {
fn drop(&mut self) {
// All values freed together
}
}
Generational GC
Separate young/old generations:
struct Heap {
young: Vec<Value>, // Collected frequently
old: Vec<Value>, // Collected rarely
}
Concurrent GC
For multi-threaded future:
use std::sync::Arc; // Thread-safe RC
Implementation Timeline
Phase 1 (Now): Reference Counting
- Time: 1-2 hours
- Impact: Fix memory leak
- Tests: All 230 should still pass
Phase 2 (Phase 5): Optimizations
- Arena allocators
- Cycle detection
- Profiling and tuning
Phase 3 (Future): Advanced GC
- Mark-and-sweep
- Generational GC
- Concurrent GC
Risks & Mitigation
Risk 1: Breaking existing tests
- Mitigation: Run tests after each change, fix immediately
Risk 2: Performance regression
- Mitigation: Benchmark before/after, Rc overhead is minimal
Risk 3: Reference cycles leak memory
- Mitigation: Document limitation, add cycle detector later
Risk 4: Complex refactor
- Mitigation: Incremental changes, commit frequently
Success Criteria
- ✅ All ~693 tests still pass
- ✅ Memory usage < 100 MB (vs 135 GB)
- ✅ Test time ≤ 60 seconds (no regression)
- ✅ 0 clippy warnings
- ✅ Clean architecture maintained
References
- Rust Rc docs: https://doc.rust-lang.org/std/rc/
- “Crafting Interpreters” Chapter 26 (Garbage Collection)
- Python’s reference counting implementation
- Ruby’s GC evolution
Last Updated: April 17, 2026 Phase: 5 Complete (base) Status: Rc-based GC implemented and working, 99%+ memory reduction achieved