Advanced data-structures

Custom Allocators in Wasm

How Wasm Memory Allocation Works

WebAssembly has a single linear memory — a contiguous, growable byte array. There is no built-in heap, no garbage collector, and no malloc. Your Rust program must bring its own allocator.

  Wasm Linear Memory
  ┌──────────────────────────────────────────────────┐
  │ Stack │ Static Data │         Heap →             │
  │ (grows│ (.data,     │   (managed by allocator)   │
  │  down)│  .rodata)   │                            │
  └──────────────────────────────────────────────────┘
  0x0000                                      memory.size
                                              (pages × 64KB)

Key facts:

Memory is measured in pages of 64 KB each
memory.grow(n) adds n pages; it can fail if the host imposes limits
There is no memory protection — any offset within bounds is accessible
The allocator is linked into your .wasm binary, adding to its size

Default Allocator: dlmalloc

By default, Rust's wasm32-unknown-unknown target uses a port of dlmalloc (Doug Lea's malloc). It is a general-purpose allocator optimized for a mix of allocation sizes.

  dlmalloc internals (simplified)
  ┌────────────────────────────────────────────┐
  │  Free list bins (by size class)            │
  │  ┌─────┐ ┌─────┐ ┌──────┐ ┌───────────┐  │
  │  │ 8B  │ │ 16B │ │ 64B  │ │ 256B+     │  │
  │  │ free│ │ free│ │ free │ │ free list │  │
  │  │ list│ │ list│ │ list │ │ (tree)    │  │
  │  └─────┘ └─────┘ └──────┘ └───────────┘  │
  │                                            │
  │  On alloc: find best-fit bin, split chunk  │
  │  On free:  coalesce neighbors, return bin  │
  └────────────────────────────────────────────┘

Property	dlmalloc
Binary overhead	~10 KB
Alloc complexity	O(1) for small, O(log n) large
Fragmentation	Low (coalescing)
Thread-safe	Not needed (Wasm is single-threaded)

wee_alloc: The Tiny Alternative

wee_alloc was designed specifically for Wasm — it trades allocation speed and fragmentation resistance for a ~1 KB binary footprint.

#[global_allocator]
static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;

It works by maintaining a linked free list with minimal bookkeeping:

  wee_alloc free list
  ┌──────┐    ┌──────┐    ┌──────┐
  │ cell │───>│ cell │───>│ cell │───> null
  │ 32B  │    │ 16B  │    │ 64B  │
  └──────┘    └──────┘    └──────┘

  Alloc: walk list, find first fit
  Free:  prepend to list (no coalescing!)

Caveat: wee_alloc does not coalesce adjacent free blocks. Over time, memory can fragment badly. It is no longer maintained — consider lol_alloc or talc for new projects.

Bump Allocators

A bump allocator is the simplest and fastest possible allocator. It maintains a single pointer that moves forward on each allocation. Individual frees are impossible — all memory is reclaimed at once.

  Bump Allocator State

  After 3 allocations:
  ┌───────┬────────┬──────────────┬─────────────────────┐
  │ obj A │ obj B  │   obj C      │     free space       │
  └───────┴────────┴──────────────┴─────────────────────┘
  ^                                ^                     ^
  base                           offset                 end

  Alloc:  offset += size (after alignment)
  Free:   no-op
  Reset:  offset = base (reclaims everything)

When to Use Bump Allocation

Use case	Good fit?
Per-frame game allocations	Yes
Request-scoped web handler data	Yes
Short-lived computation scratch	Yes
Long-running with varied lifetimes	No
Need to free individual objects	No

Alignment

The allocator must respect each type's alignment requirement. On Wasm, u32 needs 4-byte alignment, u64 needs 8-byte alignment:

  Before alignment:  offset = 5
  Requested align:   8

  aligned = (5 + 8 - 1) & !(8 - 1)
          = 12 & !7
          = 12 & 0xFFFF_FFF8
          = 8

  Memory:
  ┌─┬─┬─┬─┬─┬─┬─┬─┬─────────────────┐
  │0│1│2│3│4│ │ │ │8  9 10 11 12 ... │
  └─┴─┴─┴─┴─┴─┴─┴─┴─────────────────┘
                ^   ^
            wasted  aligned offset
           (padding)

Arena Allocation

An arena is a bump allocator that grows by allocating new chunks when the current one fills up:

struct Arena {
    chunks: Vec<Vec<u8>>,
    current: *mut u8,
    remaining: usize,
}

impl Arena {
    fn new() -> Self {
        let chunk = vec![0u8; 4096];
        let ptr = chunk.as_ptr() as *mut u8;
        Arena {
            chunks: vec![chunk],
            current: ptr,
            remaining: 4096,
        }
    }

    fn alloc(&mut self, layout: Layout) -> *mut u8 {
        // Align
        let align = layout.align();
        let padding = self.current.align_offset(align);
        let needed = padding + layout.size();

        if needed > self.remaining {
            // Allocate a new chunk
            let size = needed.max(4096);
            let chunk = vec![0u8; size];
            let ptr = chunk.as_ptr() as *mut u8;
            self.chunks.push(chunk);
            self.current = ptr;
            self.remaining = size;
            return self.alloc(layout); // retry
        }

        unsafe {
            let result = self.current.add(padding);
            self.current = self.current.add(needed);
            self.remaining -= needed;
            result
        }
    }

    fn reset(&mut self) {
        // Keep only the first chunk
        self.chunks.truncate(1);
        let chunk = &self.chunks[0];
        self.current = chunk.as_ptr() as *mut u8;
        self.remaining = chunk.len();
    }
}

  Arena with multiple chunks:

  Chunk 0 (full)        Chunk 1 (full)        Chunk 2 (active)
  ┌──────────────┐     ┌──────────────┐     ┌───────┬────────┐
  │██████████████│     │██████████████│     │███████│  free  │
  └──────────────┘     └──────────────┘     └───────┴────────┘
                                              ^current

The GlobalAlloc Trait

Rust's GlobalAlloc trait is the interface every allocator must implement:

use std::alloc::{GlobalAlloc, Layout};

unsafe trait GlobalAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8;
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);

    // Optional — default implementations call alloc + copy
    unsafe fn alloc_zeroed(&self, layout: Layout) -> *mut u8 { ... }
    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 { ... }
}

Layout

Layout describes the size and alignment of a memory request:

let layout = Layout::new::<u32>();
assert_eq!(layout.size(), 4);
assert_eq!(layout.align(), 4);

let layout = Layout::array::<f64>(10).unwrap();
assert_eq!(layout.size(), 80);    // 10 * 8
assert_eq!(layout.align(), 8);

let layout = Layout::from_size_align(256, 16).unwrap();

Registering Your Allocator

#[global_allocator]
static ALLOC: MyAllocator = MyAllocator::new();

This replaces the default allocator for all heap allocations in your binary — Box, Vec, String, HashMap, everything.

Allocator Comparison

Allocator	Binary Size	Alloc Speed	Free Speed	Fragmentation	Best For
dlmalloc	~10 KB	Fast	Fast	Low	General purpose
wee_alloc	~1 KB	Medium	Fast	High	Tiny Wasm binaries
Bump	~0.1 KB	Fastest	N/A	None*	Per-frame / scoped
Arena	~0.5 KB	Very fast	N/A	None*	Batch processing
lol_alloc	~0.2 KB	Medium	Slow	Medium	Minimal Wasm
talc	~5 KB	Fast	Fast	Low	General (modern)

*No fragmentation because individual frees don't happen.

Pool Allocator Pattern

For types that are allocated and freed frequently (e.g., game entities, network packets), a pool allocator pre-allocates a fixed number of slots:

struct Pool<T> {
    storage: Vec<Option<T>>,
    free_indices: Vec<usize>,
}

impl<T> Pool<T> {
    fn new(capacity: usize) -> Self {
        Pool {
            storage: (0..capacity).map(|_| None).collect(),
            free_indices: (0..capacity).rev().collect(),
        }
    }

    fn alloc(&mut self, value: T) -> Option<usize> {
        let idx = self.free_indices.pop()?;
        self.storage[idx] = Some(value);
        Some(idx)
    }

    fn free(&mut self, idx: usize) {
        self.storage[idx] = None;
        self.free_indices.push(idx);
    }

    fn get(&self, idx: usize) -> Option<&T> {
        self.storage.get(idx)?.as_ref()
    }
}

  Pool<Entity> with capacity 8:

  ┌─────┬─────┬─────┬─────┬─────┬─────┬─────┬─────┐
  │  E  │     │  E  │  E  │     │  E  │     │  E  │
  │ [0] │ [1] │ [2] │ [3] │ [4] │ [5] │ [6] │ [7] │
  └─────┴─────┴─────┴─────┴─────┴─────┴─────┴─────┘
    used  free  used  used  free  used  free  used

  free_indices = [6, 4, 1]  (stack order)

Summary

Wasm's linear memory model means you control allocation entirely. The default dlmalloc works well for most applications. For size-critical builds, swap in a tiny allocator. For performance-critical workloads with predictable lifetimes, bump and arena allocators give unbeatable speed by deferring all frees to a single reset.

Try It

main.rs

Report an issue

Wasm Size Optimization

Wasm + WebRTC