
Introduction

Hebi is a work-in-progress dynamically-typed programming language, heavily inspired by other languages in the same category: Lua, Python, and JavaScript.

The source code is available at github.com/jprochazk/hebi4.

Language guide

Hebi’s syntax is inspired by, and borrows from:

  • Rust
  • JavaScript
  • Lua
  • Python
  • Gleam

It’s in the same family of curly-bracketed programming languages as Rust (the BCPL family), with some ML influences, such as implicit returns.

Note: Hebi’s implementation is still heavily in flux, and syntax may still change at any point!

// disclaimer: this module doesn't actually exist :)
import {service, response, header, body} from "std/http/server"

fn fib(n) {
  if n < 2 { n }
  else { fib(n-1) + fib(n-2) }
}

service({
  "/fib/:num": fn(req) {
    let num = parse_int(req.params.num)
    if num > 40 {
      return response(400)
    }

    let result = fib(num)

    response(200)
      |> header("Content-Type", "text/plain")
      |> body(to_str(result))
  },
})

Comments

It’s often useful and important to annotate your source code with extra information for yourself and other people. The only form of comment Hebi supports is the line comment:

// A line comment.

There are no block comments or special doc comments.

Statements

The base unit of syntax in Hebi is a statement. Every program consists of statements. Each statement does something, and produces no value.

Variables

let v = nil

Variables must have both a name and a value. A let without a value is a syntax error:

$ hebi4 -e 'let v'

expected '=', found '<eof>'
1 |  let v
  |       ^

All variables in Hebi are mutable, meaning you can change their value by assigning to them:

let v = nil
print(v) // prints "nil"
v = 10
print(v) // prints "10"

There are two kinds of variables:

  • Module variables (module-scoped globals)
  • Local variables (block-scoped)

Variables scoped to a module can be accessed by other modules. They are also shared by all functions in the same module.

Block-scoped variables are only visible to other statements in the same scope. They can also be captured by functions, which is discussed in the next section.
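
For example, here is a small sketch of both kinds (functions are covered in the next section):

let m = "module" // module variable: shared by every function in this module

fn f() {
  let l = "local" // local variable: visible only inside this block
  print(m) // ok: module variables are visible here
  print(l) // ok: l is still in scope
}

f()
// print(l) out here would be an error: l is out of scope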

Functions

fn f() {

}
fn f(a, b, c) {}

Functions may also be nested.

fn outer() {
  fn inner() {}
}

All functions are also closures. Functions in the module scope close over the entire module, while functions in inner blocks close over only local variables and functions visible to them.

fn f() {
  print(v)
}

let v = "hi"

You can of course call functions, if they’re in scope:

f() // prints "hi"

Functions may also call each other recursively:

fn is_even(n) {
  if n == 1 { false }
  else { is_odd(n-1) }
}

fn is_odd(n) {
  if n == 1 { true }
  else { is_even(n-1) }
}

print(is_even(10)) // prints "true"

When a local variable is captured, its value is copied into the function’s closure environment. This is different from JavaScript, Lua, and Python, where closures capture variables by reference.

fn counter() {
  let v = 0

  return {
    get: fn() { v }
    set: fn(n) { v = n }
  }
}

let c = counter()

print(c.get()) // 0
c.set(10)
print(c.get()) // still 0

If you want a shared mutable closure environment, you can wrap your data in a table:

fn counter() {
  let s = { v: 0 }

  return {
    get: fn() { s.v }
    set: fn(n) { s.v = n }
  }
}

let c = counter()

print(c.get()) // 0
c.set(10)
print(c.get()) // 10

Imports

Modules may import other modules.

import {a, b, c} from "foo"

They are evaluated dynamically, wherever the import appears. To import something lazily, you can wrap it in a function:

fn f() {
  import {a, b, c} from "foo"

  a()
}

You can either import parts of a module, or the entire module:

import {a,b,c} from "foo" // import only parts of the module
import {a,b,c} from foo // same thing, but the module specifier is an identifier

import foo from "foo" // import entire module
import foo from foo // same as above
import foo // shorthand for `foo from foo`

Expressions

Statements operate on values, which are produced by expressions.

Literals

Literals (also known as constants) are values which are taken literally from the source code.

Nil

The “nothing” value:

v = nil

Booleans

A binary true or false value.

v = true
v = false

Used in control flow.

Numbers

There are two kinds of numbers:

  • Floating point numbers, and
  • Integers

They are distinct types, but are generally interchangeable:

let int = 2
let float = 3.14

print(int * float)

If you want to convert between them explicitly, you can use the built-in to_int and to_float.

Calling to_int on a floating-point number has the same effect as rounding it down.
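
For example (the exact output formatting is illustrative):

print(to_int(3.9)) // prints "3"
print(to_float(2)) // prints "2.0"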

Strings

A string is a sequence of characters. Strings in Hebi use UTF-8 encoding. They are also immutable.

let v = "hi!"

Strings also support escape sequences, which have special meaning:

escape   meaning
\a       alert
\b       non-destructive backspace
\v       vertical tab
\f       form feed
\n       new line
\r       carriage return
\t       horizontal tab
\'       single quote
\"       double quote
\\       backslash
\e       escape
\x       hex code
\u       unicode byte sequence

\x and \u can be used to directly embed byte sequences in strings:

  • \x0A is equivalent to \n
  • \u{1F602} is an emoji (😂)

Note that they must still result in valid UTF-8.
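
For example:

print("\x48\x69") // prints "Hi"
print("\u{1F602}") // prints "😂"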

Many of these escapes are archaic and rarely useful anymore. An example is \a (alert), which was used to make the terminal produce a sound when it appeared in standard output.

Use these wisely!

Containers

Hebi has two built-in container types: arrays and tables.

Array

An array is a dynamically-sized sequence of values.

let v = [0,1,2]

You can access the values in an array using an index:

v[0] + v[1]

Only integers are valid indices.

Tables

A table is a collection of key-value pairs. Tables are sometimes called hash maps or associative arrays.

let v = {a:0, b:1, c:2}

Tables are indexed by strings.

print(v["a"]) // 0

Other types must be explicitly converted to strings before being used as keys:

let k = to_str(0)
v[k] = "hi"

print(v[k])

You can also use the field syntax to index tables:

print(v.a) // 0

Blocks

let v = do { 1 + 1 }

You can use blocks to limit the scope of variables. The final expression in a block is its value. This is known as an implicit return.

Implicit returns

In Hebi, the final expression in any block of code is its return value, just like in Rust.

fn foo() {
  "hi"
}

print(foo()) // prints "hi"

The same applies to do blocks:

print(do { "hi" }) // prints "hi"

Control flow

Hebi is a Turing-complete language, offering loops and conditional expressions:

let i = 0
loop {
  print(i)
  if i == 10 {
    print("end")
    break
  }
  i += 1
}

If expressions

fn fizzbuzz(n) {
  if n % 3 == 0 and n % 5 == 0 {
    "fizzbuzz"
  } else if n % 3 == 0 {
    "fizz"
  } else if n % 5 == 0 {
    "buzz"
  }
}

let i = 0
loop {
  if i > 100 { break }
  printf("{i} {v}", {i, v:fizzbuzz(i)})
  i += 1
}

if expressions also support implicit returns:

fn test(number) {
  if number > 10 {
    "That's more than I can count."
  } else {
    to_str(number)
  }
}

print(test(1)) // prints "1"
print(test(100)) // prints "That's more than I can count."

Loops

Currently, the only loop kind is an infinite loop:

loop {
  // ...
}

You can break out of it, or use continue to jump back to the start of the loop.
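
For example:

let i = 0
loop {
  i += 1
  if i > 5 { break }
  if i % 2 == 0 { continue }
  print(i) // prints 1, then 3, then 5
}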

That’s it!

As you can tell, some stuff is missing. Hebi is quite minimal right now, and while it will stay minimal, it shouldn’t stay this minimal.

Compiler Architecture

Compiler

We use a two-pass compiler:

  • Source code -> AST
  • AST -> Bytecode

Parser

A relatively simple lexer and LL(1) recursive-descent parser.

The lexer is generated using Logos, with a small wrapper around it. The parser is handwritten, using a few helper functions, and outputs an AST. The AST representation is more interesting: see the AST Representation chapter below.

Code generation

This pass behaves like a tree-walking interpreter, mixing what would traditionally be separate passes:

  • Variable resolution
  • Constant folding
  • Register allocation
  • Dead code elimination

The resulting bytecode is fixed-width at 32 bits per instruction, and entirely register-based. Some instructions emit an additional data slot, which is used to store extra operands, such as the inline cache for table operations.
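
The exact field layout isn’t shown here, but as an illustrative sketch (the bit positions are an assumption; accessor names of this shape appear in the dispatch examples later), a 32-bit instruction might pack an 8-bit opcode with either three 8-bit operands or one 8-bit and one 24-bit operand:

fn opcode(insn: u32) -> u8 {
    (insn & 0xFF) as u8
}

fn operand_a8(insn: u32) -> u8 {
    ((insn >> 8) & 0xFF) as u8
}

fn operand_b8(insn: u32) -> u8 {
    ((insn >> 16) & 0xFF) as u8
}

fn operand_a24(insn: u32) -> u32 {
    insn >> 8 // the remaining 24 bits as one wide operand
}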

Variable resolution

Whenever we come across a declaration, we add it to the list of variables. There are no “globals”, so if a name can’t be resolved to a symbol later down the line, we know it doesn’t exist, and produce an error message.

TODO: This is not entirely true; there are (will be) globals in REPL mode.

For functions to be able to recursively call each other, we do a pre-traversal of all statement lists, where we collect all function declarations.
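
A hypothetical sketch of that pre-traversal (the types and names here are assumptions, not the real compiler’s):

struct FnDecl { name: String }
enum Stmt { Fn(FnDecl), Other }

struct Scope { names: Vec<String> }

impl Scope {
    fn declare(&mut self, name: &str) {
        self.names.push(name.to_string());
    }
}

// Declare every `fn` in a statement list before compiling any bodies,
// so sibling functions can resolve each other's names.
fn predeclare_functions(scope: &mut Scope, stmts: &[Stmt]) {
    for stmt in stmts {
        if let Stmt::Fn(decl) = stmt {
            scope.declare(&decl.name);
        }
    }
}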

Constant folding and register allocation

The central type for this is Value, which can hold a few different types, including a special dynamic type, which holds a register.

All expression evaluation yields Values. If an expression can’t be evaluated immediately, the code generator instead emits bytecode for evaluating that expression at runtime, into some register, which it then passes back up to its caller as a dynamic.

If an evaluation requires that a constant value exist in a register, the value may be materialized into one. For example, the expression return 1 will initially produce a constant value 1, which the return expression then materializes, emitting lint r0, 1; ret.

Expressions may also be partially evaluated. For example, in v == 0, we can immediately evaluate the 0, and emit a type-specialized instruction which takes a numeric constant for its rhs. That instruction then skips the type check for rhs.
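
As an illustration, v == 0 might then compile to a single instruction carrying the constant inline (the mnemonic here is hypothetical):

eqcn r2, r1, 0   # r2 = (r1 == 0); the rhs is an inline numeric constant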

Register allocation essentially just allocates fixed stack slots for instruction operands ahead of time. There are at most 256 registers (u8), one of which (r0) is always used as the return value slot.

When a register is needed, the allocator increments the current register by 1, and returns it. When a register is no longer needed, it is freed, and the allocator decrements the current register by 1.
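
A minimal sketch of that bump-style allocator (the names are assumptions):

struct RegAlloc {
    current: u8,
}

impl RegAlloc {
    fn alloc(&mut self) -> u8 {
        let r = self.current;
        self.current += 1; // the real compiler errors out past 255
        r
    }

    fn free(&mut self, r: u8) {
        debug_assert_eq!(r + 1, self.current, "frees must be LIFO");
        self.current -= 1;
    }
}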

Some care is taken to ensure register re-use as much as possible. For example, when assigning to a variable, we want the destination operand to re-use the variable’s register, instead of allocating a temporary one, and issuing a mov to the variable later.

Function calls are a bit special. We don’t want the VM to have to copy arguments to a new call frame, instead, the arguments should already be in the right place at the time of the call. To do this, the code generator allocates contiguous registers for the return slot and call arguments.
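
For example, a call like f(a, b) might be laid out like this (the mnemonics and encoding are hypothetical):

# hypothetical encoding for f(a, b), with f in r1, a in r2, b in r3;
# r4 is the return slot, and arguments follow it contiguously
mov r5, r2      # argument 0
mov r6, r3      # argument 1
call r1, r4, 2  # result lands in r4; no copying into a new frame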

The “holy grail” is that something like:

let table = {}
let value = 100
table["constant_key"] = value

produces the following bytecode:

ltable r1, 0      # empty table into `r1`
lint r2, 100      # store `100` into `r2`
skeyc r1, l0, r2  # store `r2` in `r1` at key `l0`

Dead code elimination

We keep track of basic block boundaries during compilation. A basic block entry is either the beginning of a function, or a jump target. A basic block exit is any control flow instruction, such as jmp or ret.

When inside a basic block, we don’t do anything special. But when encountering a BB exit, we mark any subsequent generated code as “unreachable”. Unreachable code is never even emitted, though we still traverse the AST as usual.
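
For example:

fn f() {
  return 1
  print("unreachable") // traversed as usual, but never emitted
}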

Instruction specialization

Some instructions have variants which perform fewer checks or allow for using some sort of shortcut to achieve the same outcome.

For example, there is a generic call instruction, with variants:

  • fastcall, which directly dispatches a function by ID and performs no type or arity checking.
  • hostcall, which dispatches a host function from the language’s core library.

In the VM, values may be stored in:

  • Registers
  • Upvalues
  • Module variables
  • Literal slots

Some instructions are specialized to accept these as operands directly, others require materializing these into registers first.

Errors

Currently, the entire compilation pipeline exits immediately upon encountering the first error.

Errors store a message and a span. Given the source code, they can be rendered into some nice output:

invalid number of arguments, expected 3 but got 1
12 |  f(0)
   |   ^^^

Spans are carried from the lexer all the way to the VM.

AST Representation

AST

A traditional AST would use a “tree of pointers” representation.

For example, let’s consider a simple binary expression:

1 + (1 * 2)

The AST for it would look like:

Binary {
    op: Add,
    lhs: Int(1),
    rhs: Binary {
        op: Mul,
        lhs: Int(1),
        rhs: Int(2)
    }
}

Using a basic tree structure like:

// size = 16 bytes, alignment = 8 bytes
enum Expr {
    Binary(Box<Binary>),
    Int(Box<Int>),
    // other nodes
}

// size = 40 bytes, alignment = 8 bytes
struct Binary {
    op: BinaryOp,
    lhs: Expr,
    rhs: Expr,
}

// size = 1 byte, alignment = 1 byte
enum BinaryOp { Add, Sub, Mul, Div }

// size = 8 bytes, alignment = 8 bytes
struct Int {
    value: u64,
}

The total memory footprint for our example AST would be:

Expr::Binary(box Binary { // 16 bytes
    op: Add, // 8 bytes (due to alignment)
    lhs: Expr::Int(box Int(1)), // 16 bytes + 8 bytes
    rhs: Expr::Binary(box Binary { // 16 bytes
        op: Mul, // 8 bytes (due to alignment)
        lhs: Expr::Int(box Int(1)), // 16 bytes + 8 bytes
        rhs: Expr::Int(box Int(2)), // 16 bytes + 8 bytes
    }),
})

// 16*5 + 8*5 = 120 bytes!

Flat layout

A common method to reduce the size of ASTs is to store nodes in an array, and use 32-bit indices instead of “pointers”. Some people call this a “flat” layout:

type ExprArray = Vec<Expr>;
struct ExprId(u32);

// size = 32 bytes, alignment = 8 bytes
enum Expr {
    Binary(Binary), // size = 12 bytes, alignment = 4 bytes
    Int(Int), // size = 8 bytes, alignment = 8 bytes
    // other nodes, one of which brings size to 32 bytes
}

// size = 12 bytes, alignment = 4 bytes
struct Binary {
    op: BinaryOp, // 4 bytes (rounded to 4)
    lhs: ExprId, // 4 bytes
    rhs: ExprId, // 4 bytes
}

// size = 1 byte, alignment = 1 byte
enum BinaryOp { Add, Sub, Mul, Div }

// size = 8 bytes, alignment = 8 bytes
struct Int {
    value: u64,
}

Note that Expr above would grow/shrink depending on what’s stored inside it, so for the sake of this argument let’s imagine we add some nodes, and it is exactly 32 bytes.

The memory footprint of our AST is now:

[
  0: Expr::Int(1) // 32 bytes
  1: Expr::Int(1) // 32 bytes
  2: Expr::Int(2) // 32 bytes
  3: Expr::Binary(Mul, ExprId(1), ExprId(2)) // 32 bytes
  4: Expr::Binary(Add, ExprId(0), ExprId(3)) // 32 bytes
]

// 32*5 = 160 bytes

Looks like that made things worse… Storing all variants inline in the enum means the enum is as large as its largest variant, which is not great. The alternative is to Box all variants, as was done with the basic AST:

// size = 16 bytes, alignment = 8 bytes
enum Expr {
    Binary(Box<Binary>), // size = 8 bytes, alignment = 8 bytes
    Int(Int), // size = 8 bytes, alignment = 8 bytes
    // other nodes
}

Let’s avoid boxing the Int variant, since it’s already small enough.

The memory footprint of our AST is now:

[
  0: Expr::Int(1) // 16 bytes
  1: Expr::Int(1) // 16 bytes
  2: Expr::Int(2) // 16 bytes
  3: Expr::Binary(box {Mul, ExprId(1), ExprId(2)}) // 16 bytes + 12 bytes
  4: Expr::Binary(box {Add, ExprId(0), ExprId(3)}) // 16 bytes + 12 bytes
]

// 16*5 + 12*2 = 104 bytes

Okay, that’s slightly better. The downside is that each variant now takes up space in the array, but also separately somewhere on the heap. We’ve traded memory locality for a lower memory footprint.

Super-flat layout

A few things bother me about the above layout:

  • For a Binary expression, we need to store two “pointers” of some kind to the two sub-nodes.
  • If we want to reduce the size of Expr, we must wrap every large node in Box.

This gets much worse if we need a dynamic number of sub-nodes. Each Vec adds 24 bytes to a syntax node! Then account for optional nodes, and storing spans.

Surely we can do better than that. We can take advantage of the fact that our syntax tree nodes have two distinct “shapes”:

  • A node with a fixed number of sub-nodes, and optionally some small inline data
  • A node with no sub-nodes, and only some inline data

If we force the sub-nodes of a node to be contiguous, we only need to store the index of its first sub-node. The index of the 2nd (and beyond) can be inferred by knowing their relative order, e.g. for Binary, lhs comes before rhs, so lhs == nodes[index+0] and rhs == nodes[index+1].

To store nodes with a dynamic number of sub-nodes, we’ll also need a length field. We can combine the two in three different ways:

  • Only fixed number of sub-nodes
  • Only dynamic number of sub-nodes
  • A “prefix” of some fixed number of sub-nodes, and a “tail” of a dynamic number of sub-nodes, with the tail starting after the prefix.

For now, we’ll only support u32 indices to nodes, and u24 lengths. To know what kind of node we have, we’ll still need a tag. There aren’t many kinds of nodes, so a u8 tag will be enough.

Under this rule set, almost all the structural information is implicit, and we still have plenty of flexibility to store any kind of syntax node, not just those with a fixed number of sub-nodes, all in just 64 bits.

A compiler needs more than just structural information. To store additional data, we’ll use secondary storage, and store indices in the packed nodes.

Spans are stored separately, with each node getting its own span. The index of a node is also its index into the span array.
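
A sketch of that arrangement (the type names are assumptions):

struct Span { start: u32, end: u32 }
struct Node(u64); // the packed node described below

// Parallel arrays: spans[i] is the span of nodes[i].
struct Ast {
    nodes: Vec<Node>,
    spans: Vec<Span>,
}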

Here’s the full in-memory layout of our syntax Node (in pseudo-Rust):

struct Node {
    tag: Tag,
    repr: Repr,
}

#[repr(u8)]
enum Tag {
    Int,
    Binary,
    // other nodes
}

union Repr {
    empty: { _padding: u56 },
    fixed: { next: u32, val: u24 },
    variable: { next: u32, len: u24 },
    mixed: { next: u32, tail_len: u24, },
    inline: { val: u56 },
}
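
Rust has no u24 or u56, so in practice a Node like the above boils down to bit-twiddling on a single u64. A minimal sketch (the exact bit positions are an assumption):

#[derive(Clone, Copy)]
struct Node(u64);

impl Node {
    fn tag(self) -> u8 {
        (self.0 & 0xFF) as u8 // low 8 bits
    }

    fn next(self) -> u32 {
        ((self.0 >> 8) & 0xFFFF_FFFF) as u32 // 32-bit index of the first sub-node
    }

    fn len(self) -> u32 {
        (self.0 >> 40) as u32 // the u24 length/value field
    }
}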

To represent an optional node, we can use an empty None node, and treat anything other than None as Some. This means that Option<Expr> in our representation can still be stored in just 8 bytes.

Okay, now that we have our layout, what is the total memory footprint for 1 + (1 * 2)?

[
  0: Node(Int, 1) // 8 bytes
  1: Node(Int, 2) // 8 bytes
  2: Node(Int, 1) // 8 bytes
  3: Node(Binary, NodeId(0), Mul) // 8 bytes
  4: Node(Binary, NodeId(2), Add) // 8 bytes
]

// 5*8 = 40 bytes

That made a huge difference. The memory footprint of this “super-flat” AST is less than 40% of a flat AST.

This has a compounding effect. It gets even better on more complex ASTs, and larger syntax trees.

And that’s not all of it. We’ve cut our memory usage in half, but we’ve also completely removed all boxing, so the entire AST is in one contiguous buffer. This buffer can double its size every time it needs to grow, reducing the number of allocations during parsing to nearly zero compared to a tree of pointers.

Finally, this kind of AST is great for memory locality:

  • Each node fits exactly within a 64-bit register
  • Accessing different parts of it involves just a few bit shifts
  • We can store 8 nodes in a single cache line!
  • Every node is in the same array, and all related nodes are co-located in memory.

One major downside is that using this kind of AST introduces a lot of complexity.

In Hebi, I’ve resorted to generating all the code required to construct and traverse the AST. For the few nodes I have, it’s generating nearly 3000 lines of code, which is a massive increase over the simple “tree of pointers” AST.

Conclusion

I don’t think I invented this; I just haven’t been able to find any information about similar approaches elsewhere.

Unsafe Code Philosophy

Unsafe code

This project uses a lot of unsafe code.

Care is taken to ensure soundness and most importantly correctness of the compiler and VM. Every corner of the implementation is tested. Every test must pass miri. Unsafe code is paired with SAFETY comments as appropriate. Code exposed to users must not be able to cause UB unless it is marked unsafe.

The ultimate goal here is to match or exceed the performance of programming languages in the same category. That’s difficult to do: most of those languages are written in C or C++, inherently unsafe languages whose implementations rely on a lot of UB for their performance characteristics.

To give a concrete example, the “traditional” way to write a bytecode interpreter in Rust is to use a loop with a large match statement in it:

fn interpreter(bytecode: &[Instruction], stack: &mut [Value]) {
  let mut pc = 0;
  loop {
    match bytecode[pc] {
      Instruction::Nop => {}
      Instruction::Mov { src, dst } => {
        stack[dst] = stack[src];
      }
      Instruction::Jmp { offset } => {
        pc += offset;
        continue;
      }
      Instruction::Halt => return, // stop execution
    }
    pc += 1;
  }
}

This is perfectly functional, and it’s fully safe. Unfortunately, it’s also hopelessly slow:

  • Stack accesses require bounds checks. Our bytecode compiler knows how large the stack must be at the time it generates bytecode. So why are we doing any bounds checks?
  • Reading an instruction also requires a bounds check.

The worst part is that the loop+match pattern (probably) won’t be recognized by LLVM, and optimized to “threaded” dispatch, where each instruction handler directly jumps to the next one. Instead, they’ll jump to the start of the loop, and then jump to the next instruction handler.

In C, you can do something like this:

void interpreter(Instruction* bytecode, Value* stack) {
  static const void* jump_table[] = {&&nop, &&mov, &&jmp, &&halt};

  Instruction insn = *bytecode;
  goto *jump_table[opcode(insn)];

  nop: {
    insn = *++bytecode; // advance first, then fetch the next instruction
    goto *jump_table[opcode(insn)];
  };

  mov: {
    Reg src = operand_a8(insn);
    Reg dst = operand_b8(insn);
    stack[dst] = stack[src];

    insn = *++bytecode;
    goto *jump_table[opcode(insn)];
  };

  jmp: {
    Offset offset = operand_a24(insn);

    bytecode += offset;
    insn = *bytecode;
    goto *jump_table[opcode(insn)];
  };

  halt: {
    return;
  };
}

There are a few ways in which this is faster:

  1. Each instruction handler directly jumps to the next one using a computed goto.
  2. There are no bounds checks, not when reading an instruction nor when accessing the stack.

Now, the problem with writing all your code in C is that all your code is in C! I like Rust. It has a lot of nice features, and most of the compiler can use safe code. Even a lot of the VM implementation can use safe code. It feels bad to have to give that up, just so that the instruction dispatch method used can be “optimal”.

We can mostly have our cake and eat it too, using tail calls.

enum Control {
  Stop,
  Error,
}

const JUMP_TABLE: [fn(Ip, Sp, Instruction) -> Control; 256] = [
  nop,
  mov,
  jmp,
  halt,
  // ...
];

fn dispatch_current(ip: Ip, sp: Sp) -> Control {
  let insn = ip.read();
  JUMP_TABLE[opcode(insn) as usize](ip, sp, insn)
}

fn dispatch_next(ip: Ip, sp: Sp) -> Control {
  let ip = ip.next();
  dispatch_current(ip, sp)
}

fn nop(ip: Ip, sp: Sp, insn: Instruction) -> Control {
  dispatch_next(ip, sp)
}

fn mov(ip: Ip, sp: Sp, insn: Instruction) -> Control {
  let src = operand_a8(insn);
  let dst = operand_b8(insn);

  let tmp = sp.read(src);
  sp.write(dst, tmp);

  dispatch_next(ip, sp)
}

fn jmp(ip: Ip, sp: Sp, insn: Instruction) -> Control {
  let offset = operand_a24(insn);

  let ip = ip.add(offset);

  dispatch_current(ip, sp)
}

fn halt(ip: Ip, sp: Sp, insn: Instruction) -> Control {
  Control::Stop
}

The above code optimizes to something very close to computed goto. Every instruction handler tail-calls the next one. The downside here is that each handler is still a full function, prologue and all. It needs to save and restore callee-saved registers, and sometimes allocate stack space for additional intermediates.

With this, we still can’t hope to match the performance of an interpreter written in hand-written assembly, but we can get pretty close to, or even beat, one written in C!

VM

In release mode, the VM is written to take advantage of tail call optimization. It falls back to loop+match in debug mode.

The VM is pretty much entirely unsafe code. It relies on invariants from the compiler and semantics of the language to maintain a high standard of quality for the generated assembly. We aim to avoid:

  • panic branches, even if they never run
  • calls to drop glue
  • bounds checks
  • stack spills of VM state
  • etc.

and similar. There are definitely ways to reduce the surface area of unsafe code, but doing so often requires introducing complex abstractions.

A concrete example: the VM assumes that register operands are always in bounds. Constructing operands like Reg is unsafe, via the new_unchecked constructor, which adds unsafe surface area to the compiler. The VM then relies on these Reg operands being in bounds for the current call frame. There are other approaches to this problem, but every alternative I’ve come up with carries some downside, like a practically unreachable panic path which just sits there, staring menacingly.
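
A sketch of that operand type (new_unchecked comes from the text above; the rest is an assumption):

#[derive(Clone, Copy)]
struct Reg(u8);

impl Reg {
    /// SAFETY: `index` must be in bounds for the call frame this operand
    /// will execute in; the VM indexes the stack without bounds checks.
    unsafe fn new_unchecked(index: u8) -> Reg {
        Reg(index)
    }

    fn index(self) -> usize {
        self.0 as usize
    }
}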

Another thing is that the instruction pointer is just a pointer. We don’t carry around its length, and we don’t perform a bounds check when reading from it. This means we assume that the compiler will:

  • Properly terminate all branches (with a ret or stop instruction)
  • Generate all jmp instructions with in-bounds targets

Once again, big assumptions! But we need to make those assumptions in order to reach our performance targets.

Garbage collection

The language implemented here is garbage collected.

It is possible to implement certain kinds of garbage collectors entirely within safe code (see safe-gc), but we don’t want to accept the overheads associated with a fully safe implementation.

The approach used by the VM is not novel at all; it is inspired by existing designs.

The GC API outlined below supports precise, incremental, and generational garbage collection. Currently, it is only precise, and stop-the-world.

Before being accessed, objects must be allocated in a Heap, and then rooted on the stack. StackRoots are linked into a per-heap stack-allocated list. They are constructed in a macro which doesn’t leak the root’s identifier. The stack root is then wrapped in a GcRoot, which holds a pinned mutable reference to the StackRoot. As a result, stack roots are pinned to the stack and cannot be moved or referred to directly.

A GcRoot allows for safe access to the underlying object. It serves as proof that the object is rooted and is guaranteed to survive the next collection cycle, which makes it safe to dereference. To actually access the object, one of two methods must be called:

  • as_ref, which requires a &Heap reference
  • as_mut, which requires a &mut Heap reference

This is where the Heap comes in. It acts as a kind of token, of which there is only a single one active at a time in any given thread. This token is passed around by reference, and must be used in order to dereference roots.

  • If you have a shared reference to the heap, you can dereference as many GcRoots as you want to obtain shared access to them.
  • If you have a unique reference to the heap, you can dereference exactly one GcRoot to obtain a unique mutable reference to it.

Additionally, to trigger a collection, you need a &mut Heap. This means collections may not happen while Rust &T or &mut T are held on the stack - they must not exist across a call to collect, and the borrow checker enforces this for us.
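
Putting it together, a hypothetical usage sketch (as_ref, as_mut, and collect follow the text above; the rooting macro’s name and exact signatures are assumptions):

let mut heap = Heap::new();

// Allocate a table and root it on the stack. The macro hides the
// StackRoot binding, so the root itself can't be moved or named.
let_root!(heap, table: Table);

{
    let t = table.as_ref(&heap); // shared access via &Heap
    // ... read from t ...
}

{
    let mut t = table.as_mut(&mut heap); // unique access via &mut Heap
    // ... mutate t ...
}

// No GcRef or GcRefMut is live here, so the borrow checker allows this.
heap.collect();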

The consequence is that Rust’s guarantees for shared xor mutable access are never violated, and we can have &mut T to GC’d objects, which means we don’t need any kind of interior mutability scheme! But we may only have one mutable reference at a time. That’s actually a problem, which we have a workaround for:

  • as_ref returns GcRef<'heap, T> instead of &'heap T
  • as_mut returns GcRefMut<'heap, T> instead of &'heap mut T

There is no other way to construct a GcRef or GcRefMut. This guarantees that if you have a GcRef or a GcRefMut, then the underlying object is rooted. Any value stored in a rooted object becomes transitively rooted, because it’s reachable through its rooted parent.

Methods which read the object are implemented on GcRef<'heap, T>, and methods which mutate it are implemented on GcRefMut<'heap, T>. We can do this because GcRef, GcRefMut, and all the object types are declared within the same crate, so we aren’t stopped by orphan rules. This is one place where the GC not being generic is important.

And finally: Mutating methods which accept an object and store it within the receiver always accept GcRoots instead of GcRefs. Internally, they may unwrap the GcRoot and store the raw GcPtr<T>. This is always safe, because the container is guaranteed to be rooted.

That’s it! The resulting GC and object API is actually pretty nice, and it is fully safe to use. Compiler-enforced safety is vital here: not only because GC bugs can be some of the worst heisenbugs in existence, and I wouldn’t wish them upon anybody, but also because this API doubles as the embedding API. If it’s exposed to users, it must be safe code.

Contributing