Advanced concurrency

Multithreading with Wasm

How Multithreading Works in WebAssembly

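JavaScript and Wasm on a page run on a single main thread by default. To run Wasm code in parallel, each thread lives in a Web Worker, and the platform provides two building blocks for sharing state between them, SharedArrayBuffer and Atomics: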

  Main Thread                 Web Workers (Thread Pool)
  ┌─────────────────┐        ┌─────────────────┐
  │  JavaScript      │        │  Worker 1        │
  │  + Wasm instance │        │  Wasm instance   │
  │                  │  spawn │  (shared memory) │
  │  postMessage() ──┼───────>│                  │
  │                  │        └─────────────────┘
  │                  │        ┌─────────────────┐
  │                  │        │  Worker 2        │
  │  SharedArray  ───┼───────>│  Wasm instance   │
  │  Buffer          │        │  (shared memory) │
  │                  │        └─────────────────┘
  │                  │        ┌─────────────────┐
  │                  │        │  Worker 3        │
  │  Atomics.load()  │        │  Wasm instance   │
  │  Atomics.notify()│<──────>│  Atomics ops     │
  └─────────────────┘        └─────────────────┘

SharedArrayBuffer

SharedArrayBuffer is the key primitive. Unlike ArrayBuffer, it can be shared between the main thread and workers without copying. Wasm's linear memory can be backed by a SharedArrayBuffer, giving all threads access to the same memory:

// Creating shared Wasm memory
const memory = new WebAssembly.Memory({
  initial: 256,   // 256 pages (16 MB)
  maximum: 4096,  // 4096 pages (256 MB)
  shared: true     // ← This enables SharedArrayBuffer backing
});
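
To see what shared backing means in practice, the sketch below creates a small shared memory and writes through one typed-array view while reading through another. The page counts and variable names are illustrative assumptions; in a real app the second view would typically live in a worker that received the memory via postMessage().

```javascript
// Minimal sketch: a shared Wasm memory exposes a SharedArrayBuffer.
// The page counts are small illustrative values, not recommendations.
const sharedMemory = new WebAssembly.Memory({
  initial: 1,   // 1 page = 64 KiB
  maximum: 2,   // shared memories must declare a maximum
  shared: true,
});

// Two independent views over the same underlying buffer.
const viewA = new Int32Array(sharedMemory.buffer);
const viewB = new Int32Array(sharedMemory.buffer);

viewA[0] = 42;  // write through one view

const isShared = sharedMemory.buffer instanceof SharedArrayBuffer;
const seenByB = viewB[0];  // same memory, so the write is visible here

console.log(isShared, seenByB);
```

Because nothing is copied, plain reads and writes like these can race once several threads are involved, which is why the Atomics API exists.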

Atomics

The Atomics API provides low-level synchronization primitives that correspond closely to the atomic instructions of the Wasm threads proposal (shown here for 32-bit values):

Atomics Method             Purpose                                  Wasm Equivalent
Atomics.load()             Read shared value                        i32.atomic.load
Atomics.store()            Write shared value                       i32.atomic.store
Atomics.add()              Atomic add (returns the old value)       i32.atomic.rmw.add
Atomics.compareExchange()  CAS operation                            i32.atomic.rmw.cmpxchg
Atomics.wait()             Block until notified (futex); in         memory.atomic.wait32
                           browsers, not allowed on the main thread
Atomics.notify()           Wake waiting threads                     memory.atomic.notify
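
The calls in the table can be tried on a single thread before any workers are involved. The following is a minimal sketch: the slot layout (COUNTER, LOCK) and the tryLock/unlock helpers are hypothetical names invented for this example, not part of any library.

```javascript
// Sketch of the Atomics calls from the table, run on one thread for clarity.
const sab = new SharedArrayBuffer(8);   // room for two Int32 slots
const cells = new Int32Array(sab);
const COUNTER = 0;  // hypothetical slot layout chosen for this example
const LOCK = 1;

Atomics.store(cells, COUNTER, 10);
const before = Atomics.add(cells, COUNTER, 5);  // returns the value before the add
const after = Atomics.load(cells, COUNTER);

// compareExchange used as a try-lock: 0 = unlocked, 1 = locked.
function tryLock(view, index) {
  // Swap 0 -> 1 only if the slot currently holds 0; the old value is returned.
  return Atomics.compareExchange(view, index, 0, 1) === 0;
}

function unlock(view, index) {
  Atomics.store(view, index, 0);
  Atomics.notify(view, index, 1);  // wake at most one waiter, if there is one
}

const first = tryLock(cells, LOCK);   // acquires the lock
const second = tryLock(cells, LOCK);  // fails because the lock is already held
unlock(cells, LOCK);
const third = tryLock(cells, LOCK);   // succeeds again after unlock

console.log({ before, after, first, second, third });
```

In a worker, a failed tryLock could be followed by Atomics.wait() on the lock slot instead of spinning; on the browser main thread that call is not permitted.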

wasm-bindgen-rayon: Parallel Iterators in Wasm

The rayon crate provides data-parallel iterators in Rust. The wasm-bindgen-rayon adapter bridges rayon's thread pool to Web Workers:

// Cargo.toml
// [dependencies]
// rayon = "1.8"
// wasm-bindgen-rayon = "1.2"

use rayon::prelude::*;
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn parallel_sum(data: &[f64]) -> f64 {
    data.par_iter().sum()
}

#[wasm_bindgen]
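pub fn parallel_mandelbrot(width: u32, height: u32) -> Vec<u8> {
    // compute_pixel(x, y, width, height) -> u8 is assumed to be defined elsewhere in the crate.
    (0..height)
        .into_par_iter()  // ← rows are processed in parallel
        // flat_map_iter accepts a sequential inner iterator; plain flat_map
        // would require the closure to return a parallel iterator.
        .flat_map_iter(|y| (0..width).map(move |x| compute_pixel(x, y, width, height)))
        .collect()
}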

Setup

// lib.rs: re-export init_thread_pool so JavaScript can call initThreadPool() before any rayon code runs
pub use wasm_bindgen_rayon::init_thread_pool;

// JavaScript side
import init, { initThreadPool, parallel_sum } from './pkg/my_crate';

async function main() {
    await init();
    await initThreadPool(navigator.hardwareConcurrency);
    // Now rayon's par_iter() uses Web Workers!
    const result = parallel_sum(new Float64Array([1, 2, 3, 4, 5]));
}
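
navigator.hardwareConcurrency can be missing or very large, so it may be worth sanitizing the value before handing it to initThreadPool(). The helper below is a hypothetical sketch; its name and the default cap of 8 are assumptions made for illustration, not values recommended by wasm-bindgen-rayon.

```javascript
// Hypothetical helper: pick a worker count to pass to initThreadPool().
// The cap of 8 is an arbitrary illustrative default.
function pickThreadCount(reported, cap = 8) {
  // Fall back to a single thread when the browser reports nothing usable.
  const n = Number.isInteger(reported) && reported > 0 ? reported : 1;
  return Math.min(n, cap);
}

console.log(pickThreadCount(16), pickThreadCount(4), pickThreadCount(undefined));
```

It would be used as `await initThreadPool(pickThreadCount(navigator.hardwareConcurrency));` in the main() function above.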

Build Configuration

# .cargo/config.toml (build-std below is unstable and requires a nightly toolchain)
[target.wasm32-unknown-unknown]
rustflags = ["-C", "target-feature=+atomics,+bulk-memory,+mutable-globals"]

[unstable]
build-std = ["panic_abort", "std"]

COOP/COEP Headers (Required!)

SharedArrayBuffer is gated behind Cross-Origin Isolation. Your server must send these headers:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

  Browser Security Model:

  Without headers:              With headers:
  ┌──────────────────┐         ┌──────────────────┐
  │ SharedArrayBuffer │         │ SharedArrayBuffer │
  │     BLOCKED       │         │     ALLOWED       │
  │                  │         │                  │
  │ "ReferenceError: │         │ Cross-origin     │
  │  SharedArray     │         │ isolated = true  │
  │  Buffer is not   │         │                  │
  │  defined"        │         │ COOP: same-origin│
  └──────────────────┘         │ COEP: require-corp│
                               └──────────────────┘

Server configuration examples

# Nginx
add_header Cross-Origin-Opener-Policy same-origin;
add_header Cross-Origin-Embedder-Policy require-corp;

# Apache (.htaccess)
Header set Cross-Origin-Opener-Policy "same-origin"
Header set Cross-Origin-Embedder-Policy "require-corp"

# Vite (vite.config.ts)
export default {
  server: {
    headers: {
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'require-corp',
    },
  },
};
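
For a hand-rolled Node.js server, the same two headers can be set in the request handler. This is a minimal sketch, assuming a plain node:http server; ISOLATION_HEADERS and applyIsolationHeaders are hypothetical names introduced here.

```javascript
// Sketch of a Node.js helper that adds the cross-origin isolation headers.
// The names below are illustrative, not part of any framework.
const ISOLATION_HEADERS = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'require-corp',
};

function applyIsolationHeaders(res) {
  // res is expected to expose setHeader(name, value), as http.ServerResponse does.
  for (const [name, value] of Object.entries(ISOLATION_HEADERS)) {
    res.setHeader(name, value);
  }
  return res;
}

// With node:http this could be wired up roughly as:
//   http.createServer((req, res) => {
//     applyIsolationHeaders(res);
//     /* ...serve index.html, the JS glue, and the .wasm file... */
//   }).listen(8080);
```

Whichever server is used, checking `crossOriginIsolated` in the browser console is a quick way to confirm the headers took effect.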

When Threading Helps vs Hurts

Threading adds overhead (worker creation, synchronization, message passing). It only helps when the work is large enough to amortize that cost:

  Speedup
  │
  │          ●  ideal (linear)
  │        ●╱
  │      ●╱    ●── real (Amdahl's law)
  │    ●╱    ●
  │  ●╱   ●
  │●╱  ●         ●── diminishing returns
  │╱●
  │●
  ├───────────────────── Number of threads
  1   2   4   8  16

  Amdahl's Law: Speedup = 1 / (S + P/N)
    S = serial fraction
    P = parallel fraction (S + P = 1)
    N = number of threads
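
The formula is easy to evaluate directly. The sketch below uses an assumed parallel fraction of 0.9 purely as an example; amdahlSpeedup is a name chosen for this illustration.

```javascript
// Amdahl's law: speedup = 1 / (S + P / N), with S = 1 - P.
function amdahlSpeedup(parallelFraction, threads) {
  const serialFraction = 1 - parallelFraction;
  return 1 / (serialFraction + parallelFraction / threads);
}

// Example: 90% of the work parallelizes (an assumed figure, not a measurement).
const speedups = [1, 2, 4, 8, 16].map((n) => ({
  threads: n,
  speedup: Number(amdahlSpeedup(0.9, n).toFixed(2)),
}));

console.table(speedups);
```

With P = 0.9, doubling from 8 to 16 threads moves the speedup only from roughly 4.7x to 6.4x, and no thread count can exceed 1 / S = 10x.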

Workload                           Threads Help?  Why
Mandelbrot rendering (1024x1024)   Yes            Embarrassingly parallel, CPU-heavy
Image filter (large image)         Yes            Each pixel independent
Sorting 1000 items                 No             Too small, overhead > savings
JSON parsing                       No             Mostly sequential
Matrix multiplication (large)      Yes            Divide rows across threads
DOM manipulation                   No             Must happen on main thread
Physics simulation (1000+ bodies)  Yes            Each body update is independent
SHA-256 of a small string          No             Sequential algorithm, tiny input

Rule of thumb: if the sequential work takes less than ~5ms, threading overhead will eat any gains.
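
One way to apply the rule of thumb is to time the sequential version first and only then decide. This is a rough sketch: timeMs and worthParallelizing are hypothetical helpers, and the 5 ms default simply mirrors the figure above.

```javascript
// Hypothetical helpers for applying the ~5 ms rule of thumb.
function timeMs(fn) {
  const start = performance.now();
  const result = fn();
  return { result, ms: performance.now() - start };
}

function worthParallelizing(sequentialMs, thresholdMs = 5) {
  // Below the threshold, threading overhead is likely to dominate.
  return sequentialMs >= thresholdMs;
}

// Example workload: summing a million numbers on one thread.
const data = Float64Array.from({ length: 1_000_000 }, (_, i) => i);
const { result, ms } = timeMs(() => data.reduce((acc, v) => acc + v, 0));

console.log(result, worthParallelizing(ms));
```

A single timing is noisy, so in practice the measurement would be repeated a few times on representative input before committing to a threaded path.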

Thread Pool Architecture

  ┌─────────────────────────────────────────────────────────┐
  │  Main Thread                                            │
  │                                                         │
  │  1. initThreadPool(4)                                   │
  │     ├── spawn Worker 1 ──┐                              │
  │     ├── spawn Worker 2 ──┤  Workers load same .wasm     │
  │     ├── spawn Worker 3 ──┤  with shared memory          │
  │     └── spawn Worker 4 ──┘                              │
  │                                                         │
  │  2. parallel_sum(data)                                  │
  │     │                                                   │
  │     ▼                                                   │
  │  ┌──────────────────────────────────────┐               │
  │  │ rayon work-stealing scheduler        │               │
  │  │                                      │               │
  │  │  Task queue: [chunk1][chunk2]...     │               │
  │  │                                      │               │
  │  │  W1 ← steal ← W2 ← steal ← W3     │               │
  │  │                                      │               │
  │  │  Each worker processes chunks from   │               │
  │  │  shared memory via atomic load/store │               │
  │  └──────────────────────────────────────┘               │
  │                                                         │
  │  3. Result returned to main thread                      │
  └─────────────────────────────────────────────────────────┘
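
The chunking step in the diagram can be approximated with a static split. Note that this is a deliberate simplification: rayon divides work recursively and rebalances it through work stealing, whereas the hypothetical splitIntoRanges helper below just computes fixed index ranges up front.

```javascript
// Simplified static partitioning (not rayon's actual scheduler):
// divide `length` items into at most `workers` contiguous [start, end) ranges.
function splitIntoRanges(length, workers) {
  const ranges = [];
  const base = Math.floor(length / workers);
  const extra = length % workers;  // the first `extra` ranges get one more item
  let start = 0;
  for (let i = 0; i < workers; i++) {
    const size = base + (i < extra ? 1 : 0);
    if (size === 0) break;  // fewer items than workers
    ranges.push([start, start + size]);
    start += size;
  }
  return ranges;
}

console.log(splitIntoRanges(10, 4));
```

A static split works well when every item costs about the same; work stealing pays off when some chunks take much longer than others, as with Mandelbrot rows near the set boundary.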

Browser Support

Browser       SharedArrayBuffer  Wasm Threads  Status
Chrome 91+    Yes                Yes           Full support
Firefox 79+   Yes                Yes           Full support
Safari 15.2+  Yes                Yes           Full support
Edge 91+      Yes                Yes           Full support (Chromium)
Node.js 16+   Yes                Yes           Full support (no flag needed; threads via worker_threads)
Deno 1.9+     Yes                Yes           Full support

Feature detection

function supportsWasmThreads() {
    try {
        // Check SharedArrayBuffer
        new SharedArrayBuffer(1);

        // Check Atomics
        if (typeof Atomics === 'undefined') return false;

        // Check cross-origin isolation
        if (!crossOriginIsolated) return false;

        // Check Wasm threads support
        const mem = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });
        return mem.buffer instanceof SharedArrayBuffer;
    } catch {
        return false;
    }
}
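
The result of a check like supportsWasmThreads() is typically used to choose between a threaded and a single-threaded build. The sketch below assumes two separately built packages; the selectBundle name and both paths are hypothetical and would need to match the actual build output.

```javascript
// Hypothetical bundle selection based on a feature-detection result.
// Both paths are placeholders for illustration only.
function selectBundle(threadsSupported) {
  return threadsSupported
    ? { url: './pkg-threads/my_crate.js', threaded: true }
    : { url: './pkg-single/my_crate.js', threaded: false };
}

// Possible usage in the browser:
//   const bundle = selectBundle(supportsWasmThreads());
//   const mod = await import(bundle.url);
//   await mod.default();
//   if (bundle.threaded) await mod.initThreadPool(navigator.hardwareConcurrency);

console.log(selectBundle(false).url);
```

Shipping a single-threaded fallback keeps the page working where cross-origin isolation cannot be enabled, for example when a third-party embed does not send the headers it needs.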

Practical Example: Parallel Image Processing

use rayon::prelude::*;
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn blur_image(pixels: &mut [u8], width: usize, height: usize, radius: usize) {
    let input = pixels.to_vec();

    // Process rows in parallel
    pixels
        .par_chunks_mut(width * 4)
        .enumerate()
        .for_each(|(y, row)| {
            for x in 0..width {
                let (mut r, mut g, mut b, mut count) = (0u32, 0u32, 0u32, 0u32);

                for dy in -(radius as i32)..=(radius as i32) {
                    for dx in -(radius as i32)..=(radius as i32) {
                        let ny = (y as i32 + dy).clamp(0, height as i32 - 1) as usize;
                        let nx = (x as i32 + dx).clamp(0, width as i32 - 1) as usize;
                        let idx = (ny * width + nx) * 4;
                        r += input[idx] as u32;
                        g += input[idx + 1] as u32;
                        b += input[idx + 2] as u32;
                        count += 1;
                    }
                }

                let idx = x * 4;
                row[idx] = (r / count) as u8;
                row[idx + 1] = (g / count) as u8;
                row[idx + 2] = (b / count) as u8;
                // Alpha unchanged
            }
        });
}
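
When testing a parallel kernel like this, a slow sequential reference can help confirm that the output is unchanged. The following is a sketch of the same box blur in plain JavaScript, assuming RGBA byte data; blurReference is a name invented here and the function is meant for checking small images, not for production use.

```javascript
// Sequential reference for the box blur above (RGBA, alpha left unchanged).
// Intended only for comparing against the Wasm output on small test images.
function blurReference(pixels, width, height, radius) {
  const out = new Uint8Array(pixels);  // copy, so alpha carries over
  const clamp = (v, lo, hi) => Math.min(Math.max(v, lo), hi);

  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      let r = 0, g = 0, b = 0, count = 0;
      for (let dy = -radius; dy <= radius; dy++) {
        for (let dx = -radius; dx <= radius; dx++) {
          // Edge pixels are repeated by clamping, as in the Rust version.
          const ny = clamp(y + dy, 0, height - 1);
          const nx = clamp(x + dx, 0, width - 1);
          const idx = (ny * width + nx) * 4;
          r += pixels[idx];
          g += pixels[idx + 1];
          b += pixels[idx + 2];
          count += 1;
        }
      }
      const o = (y * width + x) * 4;
      out[o] = Math.floor(r / count);      // integer division, like the u32 math in Rust
      out[o + 1] = Math.floor(g / count);
      out[o + 2] = Math.floor(b / count);
    }
  }
  return out;
}

// Tiny 2x1 test image: one black pixel next to one gray pixel.
const tiny = new Uint8Array([0, 0, 0, 255, 90, 90, 90, 255]);
const blurred = blurReference(tiny, 2, 1, 1);
console.log(Array.from(blurred));
```

Running both implementations on the same input and comparing the byte arrays should give identical results, since each output pixel depends only on the untouched input copy.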

Summary

Wasm multithreading uses SharedArrayBuffer for shared memory and Web Workers as the thread pool. The wasm-bindgen-rayon crate makes it feel like writing normal Rust parallel code — just use par_iter() instead of iter(). Remember to set COOP/COEP headers, build with +atomics, and only parallelize workloads that are large enough to overcome the threading overhead. Most modern browsers fully support Wasm threads as of 2024.
