Wasm Size Optimization
Why Binary Size Matters
Every byte of your .wasm file must be downloaded, parsed, and compiled by the browser before your application starts. Unlike JavaScript, Wasm is a compact binary format — but a naive Rust build can still produce multi-megabyte files.
Download Parse Compile
┌──────┐ ┌──────────┐ ┌─────────┐ ┌─────────┐
│ .wasm│──────>│ Network │───>│ Decoder │───>│ JIT/AOT │──> Run
│ file │ │ transfer │ │ │ │ compile │
└──────┘ └──────────┘ └─────────┘ └─────────┘
^ ^
│ │
Smaller file = faster startup everywhere| File Size | 3G (2 Mbps) | 4G (20 Mbps) | Wi-Fi (50 Mbps) |
|---|---|---|---|
| 100 KB | 0.4 s | 0.04 s | 0.02 s |
| 500 KB | 2.0 s | 0.2 s | 0.08 s |
| 2 MB | 8.0 s | 0.8 s | 0.32 s |
| 10 MB | 40.0 s | 4.0 s | 1.6 s |
Cargo.toml Release Profile
The single most impactful change is tuning your [profile.release] section:
[profile.release]
opt-level = "z" # Optimize for size (s = size, z = even smaller)
lto = true # Link-Time Optimization — whole-program analysis
codegen-units = 1 # Single codegen unit = better inlining / dead code
strip = true # Remove debug symbols
panic = "abort" # No unwinding machinery (saves ~10-20 KB)What Each Setting Does
opt-level = "z"
┌──────────────────────────────────────────────────────┐
│ "0" — no optimization (debug builds) │
│ "1" — basic optimization │
│ "2" — standard optimization (default release) │
│ "3" — aggressive speed optimization │
│ "s" — optimize for binary size │
│ "z" — optimize even harder for size │
└──────────────────────────────────────────────────────┘
lto = true
┌────────────────────────────────────────┐
│ Without LTO: │
│ crate_a.o ──┐ │
│ crate_b.o ──┼── link ── binary │
│ crate_c.o ──┘ (dead code remains) │
│ │
│ With LTO: │
│ crate_a.o ──┐ │
│ crate_b.o ──┼── analyze ── optimize │
│ crate_c.o ──┘ all at ── binary │
│ once (smaller!) │
└────────────────────────────────────────┘
codegen-units = 1
┌─────────────────────────────────────────────────┐
│ Default: 16 units — faster compile, less opt. │
│ Setting 1: single unit — slower compile, but │
│ the compiler sees everything and can inline │
│ and eliminate more dead code. │
└─────────────────────────────────────────────────┘wasm-opt: The Post-Build Optimizer
Binaryen's wasm-opt performs Wasm-specific optimizations the Rust compiler cannot:
# Install
cargo install wasm-opt
# or: npm install -g binaryen
# Optimize for size (-Oz) or speed (-O3)
wasm-opt -Oz -o output.wasm input.wasmTypical savings from wasm-opt -Oz:
| Stage | Size |
|---|---|
After cargo build |
180 KB |
After wasm-opt |
120 KB |
| After gzip | 45 KB |
| After brotli | 38 KB |
Tree Shaking and Dead Code Elimination
Rust's monomorphization can create many copies of generic functions. Each instantiation of Vec<T> for a different T generates separate machine code.
Your code uses: Monomorphized into:
Vec<u8> ──────> Vec_u8_push, Vec_u8_pop, Vec_u8_len ...
Vec<String> ──────> Vec_String_push, Vec_String_pop ...
Vec<(i32,i32)> ──────> Vec_tuple_push, Vec_tuple_pop ...Strategies to reduce monomorphization bloat:
- Use trait objects instead of generics where performance is not critical:
// Generates code per type:
fn process<T: Display>(items: &[T]) { ... }
// Single generated function:
fn process(items: &[&dyn Display]) { ... }- Factor non-generic code out of generic functions:
// BAD — entire function duplicated per T
fn insert<T: Ord>(vec: &mut Vec<T>, item: T) {
let idx = vec.binary_search(&item).unwrap_or_else(|i| i);
vec.insert(idx, item);
log_size(vec.len()); // This line doesn't use T
}
// BETTER — log_size is compiled once
fn log_size(n: usize) { /* ... */ }
fn insert<T: Ord>(vec: &mut Vec<T>, item: T) {
let idx = vec.binary_search(&item).unwrap_or_else(|i| i);
vec.insert(idx, item);
log_size(vec.len());
}Avoiding Hidden Bloat
The format! and panic! Problem
Every format!(), panic!(), unwrap(), and expect() call pulls in Rust's formatting machinery — roughly 10–20 KB of code.
// BAD — pulls in Display + formatting for the panic string
fn get(idx: usize) -> u8 {
DATA[idx] // panics with full formatting on OOB
}
// BETTER — use get() + a tiny abort
fn get(idx: usize) -> u8 {
match DATA.get(idx) {
Some(&v) => v,
None => abort_with_code(1),
}
}String Bloat Checklist
| Pattern | Adds to binary? | Alternative |
|---|---|---|
format!("x = {}", x) |
Yes — formatting infra | Manual conversion or &str |
panic!("bad index {}", i) |
Yes — panic + format | unreachable_unchecked()* |
.unwrap() |
Yes — panic strings | .unwrap_or(default) |
.expect("msg") |
Yes — panic + string | Match + abort |
#[derive(Debug)] on types |
Yes — Debug impls | Only derive when needed |
to_string() on numbers |
Yes — Display trait | itoa crate or manual buf |
*unreachable_unchecked is unsafe and should only be used when you can guarantee the path is unreachable.
wee_alloc: A Tiny Allocator
The default Rust allocator (dlmalloc in Wasm) adds 10 KB. 1 KB):wee_alloc trades performance for size (
// In your lib.rs
#[global_allocator]
static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;[dependencies]
wee_alloc = "0.4"Trade-offs:
| Property | dlmalloc | wee_alloc |
|---|---|---|
| Binary overhead | ~10 KB | ~1 KB |
| Alloc speed | Fast | Slower |
| Fragmentation | Good | Poor |
| Free reclaims? | Yes | Partially |
| Best for | General use | Small Wasm |
Note:
wee_allocis no longer actively maintained. For production use, consider thelol_allocortalccrates, or write your own bump allocator (see Lesson 35).
Measuring with twiggy
twiggy analyzes your .wasm binary to show what takes up space:
# Install
cargo install twiggy
# Show the top 20 largest items
twiggy top -n 20 my_app_bg.wasm
# Show a call graph of what retains what
twiggy dominators my_app_bg.wasm
# Diff two builds
twiggy diff old.wasm new.wasmExample twiggy top output:
Shallow Bytes │ Shallow % │ Item
───────────────┼───────────┼────────────────────────────
12480 │ 8.31% │ data[0]
6204 │ 4.13% │ "function names" subsection
3120 │ 2.08% │ core::fmt::write
2800 │ 1.86% │ dlmalloc::dlmalloc::Dlmalloc::malloc
2476 │ 1.65% │ core::fmt::Formatter::padCompression: The Final Step
After all optimizations, apply HTTP compression. Most servers support gzip and Brotli:
Raw .wasm gzip (-9) Brotli (-11)
┌─────────┐ ┌─────────┐ ┌─────────┐
│ 120 KB │ ────────> │ 45 KB │ ────────> │ 38 KB │
└─────────┘ ~62% └─────────┘ ~15% └─────────┘
saving moreConfigure your web server:
# Nginx
location ~ \.wasm$ {
gzip on;
gzip_types application/wasm;
# Or serve pre-compressed files:
gzip_static on;
brotli_static on;
}Complete Optimization Pipeline
cargo build --release ← Cargo.toml profile settings
│
▼
wasm-opt -Oz ← Binaryen post-processing
│
▼
wasm-strip ← Remove names section (optional)
│
▼
brotli / gzip ← HTTP transfer compression
│
▼
✓ Minimal .wasm served to browserQuick Reference: Impact by Technique
| Technique | Typical Savings |
|---|---|
opt-level = "z" |
10–25% |
lto = true |
10–20% |
codegen-units = 1 |
5–10% |
panic = "abort" |
5–15% |
strip = true |
5–15% |
wasm-opt -Oz |
15–30% |
Replace wee_alloc |
~10 KB flat |
Remove format! / Debug derives |
varies widely |
| Brotli compression | 60–70% |
Summary
Binary size optimization is a spectrum — you don't need every technique for every project. Start with Cargo.toml settings and wasm-opt (the 80/20 wins), then use twiggy to find remaining bloat. For sub-50 KB targets, you'll need to avoid the formatting infrastructure and consider a custom allocator.