Custom Allocators in Wasm
How Wasm Memory Allocation Works
WebAssembly has a single linear memory — a contiguous, growable byte array. There is no built-in heap, no garbage collector, and no malloc. Your Rust program must bring its own allocator.
Wasm Linear Memory
┌──────────────────────────────────────────────────┐
│ Stack │ Static Data │ Heap → │
│ (grows│ (.data, │ (managed by allocator) │
│ down)│ .rodata) │ │
└──────────────────────────────────────────────────┘
0x0000 memory.size
(pages × 64KB)Key facts:
- Memory is measured in pages of 64 KB each
memory.grow(n)addsnpages; it can fail if the host imposes limits- There is no memory protection — any offset within bounds is accessible
- The allocator is linked into your
.wasmbinary, adding to its size
Default Allocator: dlmalloc
By default, Rust's wasm32-unknown-unknown target uses a port of dlmalloc (Doug Lea's malloc). It is a general-purpose allocator optimized for a mix of allocation sizes.
dlmalloc internals (simplified)
┌────────────────────────────────────────────┐
│ Free list bins (by size class) │
│ ┌─────┐ ┌─────┐ ┌──────┐ ┌───────────┐ │
│ │ 8B │ │ 16B │ │ 64B │ │ 256B+ │ │
│ │ free│ │ free│ │ free │ │ free list │ │
│ │ list│ │ list│ │ list │ │ (tree) │ │
│ └─────┘ └─────┘ └──────┘ └───────────┘ │
│ │
│ On alloc: find best-fit bin, split chunk │
│ On free: coalesce neighbors, return bin │
└────────────────────────────────────────────┘| Property | dlmalloc |
|---|---|
| Binary overhead | ~10 KB |
| Alloc complexity | O(1) for small, O(log n) large |
| Fragmentation | Low (coalescing) |
| Thread-safe | Not needed (Wasm is single-threaded) |
wee_alloc: The Tiny Alternative
wee_alloc was designed specifically for Wasm — it trades allocation speed and fragmentation resistance for a ~1 KB binary footprint.
#[global_allocator]
static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;It works by maintaining a linked free list with minimal bookkeeping:
wee_alloc free list
┌──────┐ ┌──────┐ ┌──────┐
│ cell │───>│ cell │───>│ cell │───> null
│ 32B │ │ 16B │ │ 64B │
└──────┘ └──────┘ └──────┘
Alloc: walk list, find first fit
Free: prepend to list (no coalescing!)Caveat:
wee_allocdoes not coalesce adjacent free blocks. Over time, memory can fragment badly. It is no longer maintained — considerlol_allocortalcfor new projects.
Bump Allocators
A bump allocator is the simplest and fastest possible allocator. It maintains a single pointer that moves forward on each allocation. Individual frees are impossible — all memory is reclaimed at once.
Bump Allocator State
After 3 allocations:
┌───────┬────────┬──────────────┬─────────────────────┐
│ obj A │ obj B │ obj C │ free space │
└───────┴────────┴──────────────┴─────────────────────┘
^ ^ ^
base offset end
Alloc: offset += size (after alignment)
Free: no-op
Reset: offset = base (reclaims everything)When to Use Bump Allocation
| Use case | Good fit? |
|---|---|
| Per-frame game allocations | Yes |
| Request-scoped web handler data | Yes |
| Short-lived computation scratch | Yes |
| Long-running with varied lifetimes | No |
| Need to free individual objects | No |
Alignment
The allocator must respect each type's alignment requirement. On Wasm, u32 needs 4-byte alignment, u64 needs 8-byte alignment:
Before alignment: offset = 5
Requested align: 8
aligned = (5 + 8 - 1) & !(8 - 1)
= 12 & !7
= 12 & 0xFFFF_FFF8
= 8
Memory:
┌─┬─┬─┬─┬─┬─┬─┬─┬─────────────────┐
│0│1│2│3│4│ │ │ │8 9 10 11 12 ... │
└─┴─┴─┴─┴─┴─┴─┴─┴─────────────────┘
^ ^
wasted aligned offset
(padding)Arena Allocation
An arena is a bump allocator that grows by allocating new chunks when the current one fills up:
struct Arena {
chunks: Vec<Vec<u8>>,
current: *mut u8,
remaining: usize,
}
impl Arena {
fn new() -> Self {
let chunk = vec![0u8; 4096];
let ptr = chunk.as_ptr() as *mut u8;
Arena {
chunks: vec![chunk],
current: ptr,
remaining: 4096,
}
}
fn alloc(&mut self, layout: Layout) -> *mut u8 {
// Align
let align = layout.align();
let padding = self.current.align_offset(align);
let needed = padding + layout.size();
if needed > self.remaining {
// Allocate a new chunk
let size = needed.max(4096);
let chunk = vec![0u8; size];
let ptr = chunk.as_ptr() as *mut u8;
self.chunks.push(chunk);
self.current = ptr;
self.remaining = size;
return self.alloc(layout); // retry
}
unsafe {
let result = self.current.add(padding);
self.current = self.current.add(needed);
self.remaining -= needed;
result
}
}
fn reset(&mut self) {
// Keep only the first chunk
self.chunks.truncate(1);
let chunk = &self.chunks[0];
self.current = chunk.as_ptr() as *mut u8;
self.remaining = chunk.len();
}
} Arena with multiple chunks:
Chunk 0 (full) Chunk 1 (full) Chunk 2 (active)
┌──────────────┐ ┌──────────────┐ ┌───────┬────────┐
│██████████████│ │██████████████│ │███████│ free │
└──────────────┘ └──────────────┘ └───────┴────────┘
^currentThe GlobalAlloc Trait
Rust's GlobalAlloc trait is the interface every allocator must implement:
use std::alloc::{GlobalAlloc, Layout};
unsafe trait GlobalAlloc {
unsafe fn alloc(&self, layout: Layout) -> *mut u8;
unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);
// Optional — default implementations call alloc + copy
unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 { ... }
unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 { ... }
}Layout
Layout describes the size and alignment of a memory request:
let layout = Layout::new::<u32>();
assert_eq!(layout.size(), 4);
assert_eq!(layout.align(), 4);
let layout = Layout::array::<f64>(10).unwrap();
assert_eq!(layout.size(), 80); // 10 * 8
assert_eq!(layout.align(), 8);
let layout = Layout::from_size_align(256, 16).unwrap();Registering Your Allocator
#[global_allocator]
static ALLOC: MyAllocator = MyAllocator::new();This replaces the default allocator for all heap allocations in your binary — Box, Vec, String, HashMap, everything.
Allocator Comparison
| Allocator | Binary Size | Alloc Speed | Free Speed | Fragmentation | Best For |
|---|---|---|---|---|---|
| dlmalloc | ~10 KB | Fast | Fast | Low | General purpose |
| wee_alloc | ~1 KB | Medium | Fast | High | Tiny Wasm binaries |
| Bump | ~0.1 KB | Fastest | N/A | None* | Per-frame / scoped |
| Arena | ~0.5 KB | Very fast | N/A | None* | Batch processing |
| lol_alloc | ~0.2 KB | Medium | Slow | Medium | Minimal Wasm |
| talc | ~5 KB | Fast | Fast | Low | General (modern) |
*No fragmentation because individual frees don't happen.
Pool Allocator Pattern
For types that are allocated and freed frequently (e.g., game entities, network packets), a pool allocator pre-allocates a fixed number of slots:
struct Pool<T> {
storage: Vec<Option<T>>,
free_indices: Vec<usize>,
}
impl<T> Pool<T> {
fn new(capacity: usize) -> Self {
Pool {
storage: (0..capacity).map(|_| None).collect(),
free_indices: (0..capacity).rev().collect(),
}
}
fn alloc(&mut self, value: T) -> Option<usize> {
let idx = self.free_indices.pop()?;
self.storage[idx] = Some(value);
Some(idx)
}
fn free(&mut self, idx: usize) {
self.storage[idx] = None;
self.free_indices.push(idx);
}
fn get(&self, idx: usize) -> Option<&T> {
self.storage.get(idx)?.as_ref()
}
} Pool<Entity> with capacity 8:
┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
│ E │ │ E │ E │ │ E │ │ E │
│ [0] │ [1] │ [2] │ [3] │ [4] │ [5] │ [6] │ [7] │
└─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
used free used used free used free used
free_indices = [6, 4, 1] (stack order)Summary
Wasm's linear memory model means you control allocation entirely. The default dlmalloc works well for most applications. For size-critical builds, swap in a tiny allocator. For performance-critical workloads with predictable lifetimes, bump and arena allocators give unbeatable speed by deferring all frees to a single reset.