Advanced concurrency

Multithreading with Wasm

How Multithreading Works in WebAssembly

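JavaScript in the browser runs on a single main thread by default. To run Wasm code in parallel, the platform combines Web Workers (the threads) with two building blocks for sharing and synchronizing memory, SharedArrayBuffer and Atomics: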

  Main Thread                 Web Workers (Thread Pool)
  ┌──────────────────┐        ┌──────────────────┐
  │  JavaScript      │        │  Worker 1        │
  │  + Wasm instance │        │  Wasm instance   │
  │                  │  spawn │  (shared memory) │
  │  postMessage() ──┼───────>│                  │
  │                  │        └──────────────────┘
  │                  │        ┌──────────────────┐
  │                  │        │  Worker 2        │
  │  SharedArray  ───┼───────>│  Wasm instance   │
  │  Buffer          │        │  (shared memory) │
  │                  │        └──────────────────┘
  │                  │        ┌──────────────────┐
  │                  │        │  Worker 3        │
  │  Atomics.wait()  │        │  Wasm instance   │
  │  Atomics.notify()│<──────>│  Atomics ops     │
  └──────────────────┘        └──────────────────┘

SharedArrayBuffer

SharedArrayBuffer is the key primitive. Unlike ArrayBuffer, it can be shared between the main thread and workers without copying. Wasm's linear memory can be backed by a SharedArrayBuffer, giving all threads access to the same memory:

// Creating shared Wasm memory
const memory = new WebAssembly.Memory({
  initial: 256,   // 256 pages of 64 KiB = 16 MiB
  maximum: 4096,  // 4096 pages = 256 MiB; required when shared is true
  shared: true    // ← This enables SharedArrayBuffer backing
});
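To make "the same memory" concrete, here is a minimal sketch that creates a small shared memory and looks at it through two typed-array views. The variable names are illustrative and not part of any API; in a browser this assumes the page is cross-origin isolated.

```javascript
// A small shared memory: 1 page = 64 KiB. `maximum` is mandatory when `shared` is true.
const sharedMemory = new WebAssembly.Memory({ initial: 1, maximum: 4, shared: true });

// For a shared memory, `buffer` is a SharedArrayBuffer rather than an ArrayBuffer.
const isShared = sharedMemory.buffer instanceof SharedArrayBuffer;

// Two independent views over the same underlying bytes. A worker that receives
// `sharedMemory` through postMessage() would build its views the same way.
const viewA = new Int32Array(sharedMemory.buffer);
const viewB = new Int32Array(sharedMemory.buffer);

// A write through one view is visible through the other: nothing was copied.
viewA[0] = 42;
const seenThroughB = viewB[0];

console.log(isShared, seenThroughB);
```

Plain indexed writes like `viewA[0] = 42` are fine in a single thread; once several threads touch the same slot, the Atomics operations should be used instead.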

Atomics

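The Atomics API provides low-level synchronization primitives. Each one corresponds to an atomic instruction from the Wasm threads proposal, as listed below. Atomics.wait() blocks the calling thread, so browsers allow it only in workers; on the main thread it throws a TypeError.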

Atomics Method	Purpose	Wasm Equivalent
`Atomics.load()`	Read shared value	`i32.atomic.load`
`Atomics.store()`	Write shared value	`i32.atomic.store`
`Atomics.add()`	Atomic increment	`i32.atomic.rmw.add`
`Atomics.compareExchange()`	CAS operation	`i32.atomic.rmw.cmpxchg`
`Atomics.wait()`	Block until notified (futex)	`memory.atomic.wait32`
`Atomics.notify()`	Wake waiting threads	`memory.atomic.notify`
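The sketch below exercises several of these methods on a small SharedArrayBuffer in a single thread, mainly to show their return values. The `tryLock` helper is a hypothetical name used for illustration, not part of the Atomics API.

```javascript
// Four 32-bit slots of shared memory. A plain SharedArrayBuffer behaves the
// same way as the buffer of a shared WebAssembly.Memory.
const slots = new Int32Array(new SharedArrayBuffer(16));

// Atomic store followed by an atomic load of the same slot.
Atomics.store(slots, 0, 10);
const loaded = Atomics.load(slots, 0);

// Atomics.add returns the value *before* the addition; slot 0 now holds 15.
const before = Atomics.add(slots, 0, 5);

// compareExchange writes only if the current value equals the expected one,
// and always returns the value it found. That is enough for a simple try-lock.
function tryLock(view, index) {
  const UNLOCKED = 0;
  const LOCKED = 1;
  return Atomics.compareExchange(view, index, UNLOCKED, LOCKED) === UNLOCKED;
}

const firstAttempt = tryLock(slots, 1);  // succeeds: slot 1 goes 0 -> 1
const secondAttempt = tryLock(slots, 1); // fails: slot 1 is already 1

// notify returns how many waiters were woken; nobody is waiting here.
const woken = Atomics.notify(slots, 1, 1);

console.log(loaded, before, firstAttempt, secondAttempt, woken);
```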

wasm-bindgen-rayon: Parallel Iterators in Wasm

The rayon crate provides data-parallel iterators in Rust. The wasm-bindgen-rayon adapter bridges rayon's thread pool to Web Workers:

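// Cargo.toml
// [dependencies]
// rayon = "1.8"
// wasm-bindgen = "0.2"
// wasm-bindgen-rayon = "1.2"

use rayon::prelude::*;
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn parallel_sum(data: &[f64]) -> f64 {
    data.par_iter().sum()
}

// Illustrative helper (not part of any crate): escape-time Mandelbrot
// iteration count for one pixel, used directly as a grayscale value.
fn compute_pixel(x: u32, y: u32, width: u32, height: u32) -> u8 {
    // Map the pixel onto the region [-2, 1] x [-1.5, 1.5] of the complex plane.
    let cr = x as f64 / width as f64 * 3.0 - 2.0;
    let ci = y as f64 / height as f64 * 3.0 - 1.5;
    let (mut zr, mut zi) = (0.0_f64, 0.0_f64);
    let mut iterations = 0u8;
    while iterations < u8::MAX && zr * zr + zi * zi <= 4.0 {
        (zr, zi) = (zr * zr - zi * zi + cr, 2.0 * zr * zi + ci);
        iterations += 1;
    }
    iterations
}

#[wasm_bindgen]
pub fn parallel_mandelbrot(width: u32, height: u32) -> Vec<u8> {
    (0..height)
        .into_par_iter()  // ← rows are processed in parallel
        // flat_map_iter accepts a sequential inner iterator; plain flat_map
        // would require the closure to return a parallel iterator.
        .flat_map_iter(|y| (0..width).map(move |x| compute_pixel(x, y, width, height)))
        .collect()
}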

Setup

// lib.rs — must call this before using rayon
pub use wasm_bindgen_rayon::init_thread_pool;

// JavaScript side
import init, { initThreadPool, parallel_sum } from './pkg/my_crate';

async function main() {
    await init();
    await initThreadPool(navigator.hardwareConcurrency);
    // Now rayon's par_iter() uses Web Workers!
    const result = parallel_sum(new Float64Array([1, 2, 3, 4, 5]));
}
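navigator.hardwareConcurrency is only a hint: it can be missing in some environments and very large on many-core machines. A small helper like the one below can sanitize it before it reaches initThreadPool. This is a sketch; `pickThreadCount` and the default cap of 8 are assumptions made for illustration, not part of wasm-bindgen-rayon.

```javascript
// Pick a worker count from a concurrency hint.
// Falls back to 1 when the hint is missing or invalid, and caps the result so
// that a many-core machine does not spawn more workers than the app needs.
function pickThreadCount(hint, maxThreads = 8) {
  const usable = Number.isInteger(hint) && hint > 0 ? hint : 1;
  return Math.min(usable, maxThreads);
}

// Intended use in the browser (not executed here):
//   await initThreadPool(pickThreadCount(navigator.hardwareConcurrency));
console.log(pickThreadCount(undefined), pickThreadCount(4), pickThreadCount(32));
```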

Build Configuration

# .cargo/config.toml
[target.wasm32-unknown-unknown]
rustflags = ["-C", "target-feature=+atomics,+bulk-memory,+mutable-globals"]

# Rebuilds the standard library with the flags above; requires a nightly toolchain.
[unstable]
build-std = ["panic_abort", "std"]

COOP/COEP Headers (Required!)

SharedArrayBuffer is gated behind Cross-Origin Isolation. Your server must send these headers:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

  Browser Security Model:

  Without headers:              With headers:
  ┌───────────────────┐         ┌───────────────────┐
  │ SharedArrayBuffer │         │ SharedArrayBuffer │
  │      BLOCKED      │         │      ALLOWED      │
  │                   │         │                   │
  │ "ReferenceError:  │         │ Cross-origin      │
  │  SharedArrayBuffer│         │ isolated = true   │
  │  is not defined"  │         │                   │
  │                   │         │ COOP: same-origin │
  └───────────────────┘         │ COEP: require-corp│
                                └───────────────────┘

Server configuration examples

# Nginx
add_header Cross-Origin-Opener-Policy same-origin;
add_header Cross-Origin-Embedder-Policy require-corp;

# Apache (.htaccess)
Header set Cross-Origin-Opener-Policy "same-origin"
Header set Cross-Origin-Embedder-Policy "require-corp"

# Vite (vite.config.ts)
export default {
  server: {
    headers: {
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'require-corp',
    },
  },
};

When Threading Helps vs Hurts

Threading adds overhead (worker creation, synchronization, message passing). It only helps when the work is large enough to amortize that cost:

  Speedup
  │
  │          ●  ideal (linear)
  │        ●╱
  │      ●╱    ●── real (Amdahl's law)
  │    ●╱    ●
  │  ●╱   ●
  │●╱  ●         ●── diminishing returns
  │╱●
  │●
  ├───────────────────── Number of threads
  1   2   4   8  16

  Amdahl's Law: Speedup = 1 / (S + P/N)
    S = serial fraction
    P = parallel fraction (S + P = 1)
    N = number of threads
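As a worked example of the formula, the sketch below evaluates it for a workload that is 90% parallel (an assumed figure chosen for illustration). With S = 0.1, no number of threads can push the speedup past 1/S = 10.

```javascript
// Amdahl's law: speedup = 1 / (S + P / N), with S = 1 - P.
function amdahlSpeedup(parallelFraction, threads) {
  if (parallelFraction < 0 || parallelFraction > 1) {
    throw new RangeError("parallelFraction must be between 0 and 1");
  }
  if (threads < 1) {
    throw new RangeError("threads must be at least 1");
  }
  const serialFraction = 1 - parallelFraction;
  return 1 / (serialFraction + parallelFraction / threads);
}

// Speedup for the thread counts on the chart's x-axis, assuming P = 0.9.
for (const n of [1, 2, 4, 8, 16]) {
  console.log(`${n} threads: ${amdahlSpeedup(0.9, n).toFixed(2)}x`);
}
```

Going from 8 to 16 threads in this example raises the speedup only from about 4.7x to 6.4x, which is the flattening shown in the chart.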

Workload	Threads Help?	Why
Mandelbrot rendering (1024x1024)	Yes	Embarrassingly parallel, CPU-heavy
Image filter (large image)	Yes	Each pixel independent
Sorting 1000 items	No	Too small, overhead > savings
JSON parsing	No	Mostly sequential
Matrix multiplication (large)	Yes	Divide rows across threads
DOM manipulation	No	Must happen on main thread
Physics simulation (1000+ bodies)	Yes	Each body update is independent
SHA-256 of a small string	No	Sequential algorithm, tiny input

Rule of thumb: if the sequential work takes less than ~5ms, threading overhead will eat any gains.
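One way to apply this rule is to time the sequential version once and pick the code path from the measurement. The sketch below assumes the ~5 ms threshold from the rule of thumb; `timeOnce` and `chooseStrategy` are hypothetical helper names, and a single measurement is noisy, so real code would likely average several runs.

```javascript
// Threshold taken from the rule of thumb above; tune it for your own workload.
const MIN_PARALLEL_MS = 5;

// Run `fn` once and return the elapsed wall-clock time in milliseconds.
function timeOnce(fn) {
  const start = performance.now();
  fn();
  return performance.now() - start;
}

// Decide which code path to use from a measured sequential time.
function chooseStrategy(sequentialMs, thresholdMs = MIN_PARALLEL_MS) {
  return sequentialMs < thresholdMs ? "sequential" : "parallel";
}

// Example: time a small sequential sum and report the suggested strategy.
const sample = new Float64Array(1000).fill(1);
const elapsedMs = timeOnce(() => sample.reduce((a, b) => a + b, 0));
console.log(elapsedMs, chooseStrategy(elapsedMs));
```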

Thread Pool Architecture

  ┌─────────────────────────────────────────────────────────┐
  │  Main Thread                                            │
  │                                                         │
  │  1. initThreadPool(4)                                   │
  │     ├── spawn Worker 1 ──┐                              │
  │     ├── spawn Worker 2 ──┤  Workers load same .wasm     │
  │     ├── spawn Worker 3 ──┤  with shared memory          │
  │     └── spawn Worker 4 ──┘                              │
  │                                                         │
  │  2. parallel_sum(data)                                  │
  │     │                                                   │
  │     ▼                                                   │
  │  ┌──────────────────────────────────────┐               │
  │  │ rayon work-stealing scheduler        │               │
  │  │                                      │               │
  │  │  Task queue: [chunk1][chunk2]...     │               │
  │  │                                      │               │
  │  │  W1 ← steal ← W2 ← steal ← W3        │               │
  │  │                                      │               │
  │  │  Each worker processes chunks from   │               │
  │  │  shared memory via atomic load/store │               │
  │  └──────────────────────────────────────┘               │
  │                                                         │
  │  3. Result returned to main thread                      │
  └─────────────────────────────────────────────────────────┘
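Rayon's real scheduler uses per-worker queues with work stealing. The sketch below is a deliberately simpler stand-in: it only illustrates workers pulling chunks of one shared task through an atomic cursor until nothing is left. It runs in a single thread with simulated workers taking turns, and every name in it (`claimChunk`, `CHUNK`, `partialSums`) is hypothetical.

```javascript
const CHUNK = 4; // items per chunk (illustrative)
const data = Float64Array.from({ length: 10 }, (_, i) => i + 1); // 1, 2, ..., 10

// Shared cursor marking the start of the next unclaimed chunk.
const nextIndex = new Int32Array(new SharedArrayBuffer(4));

// Atomically claim the next chunk as [start, end); null once data is exhausted.
function claimChunk() {
  const start = Atomics.add(nextIndex, 0, CHUNK); // returns the value before the add
  if (start >= data.length) return null;
  return [start, Math.min(start + CHUNK, data.length)];
}

// Three simulated workers take turns claiming chunks and summing them.
const partialSums = [0, 0, 0];
let anyWorkDone = true;
while (anyWorkDone) {
  anyWorkDone = false;
  for (let w = 0; w < partialSums.length; w++) {
    const range = claimChunk();
    if (range === null) continue;
    anyWorkDone = true;
    for (let i = range[0]; i < range[1]; i++) partialSums[w] += data[i];
  }
}

// Step 3 in the diagram: combine the partial results on the caller's side.
const total = partialSums.reduce((a, b) => a + b, 0);
console.log(partialSums, total);
```

Because each chunk is claimed with a single Atomics.add, no two workers can receive the same range, which is the property a real multi-threaded version would rely on.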

Browser Support

Browser	SharedArrayBuffer	Wasm Threads	Status
Chrome 91+	Yes	Yes	Full support
Firefox 79+	Yes	Yes	Full support
Safari 15.2+	Yes	Yes	Full support
Edge 91+	Yes	Yes	Full support (Chromium)
Node.js 16+	Yes	Yes	Full support (no flag required)
Deno 1.9+	Yes	Yes	Full support

Feature detection

function supportsWasmThreads() {
    try {
        // Check SharedArrayBuffer
        new SharedArrayBuffer(1);

        // Check Atomics
        if (typeof Atomics === 'undefined') return false;

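        // Check cross-origin isolation (the global exists only in browsers;
        // referencing it directly would throw in Node.js or Deno)
        if (typeof crossOriginIsolated !== 'undefined' && !crossOriginIsolated) return false;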

        // Check Wasm threads support
        const mem = new WebAssembly.Memory({ initial: 1, maximum: 1, shared: true });
        return mem.buffer instanceof SharedArrayBuffer;
    } catch {
        return false;
    }
}

Practical Example: Parallel Image Processing

use rayon::prelude::*;
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
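pub fn blur_image(pixels: &mut [u8], width: usize, height: usize, radius: usize) {
    // Expect tightly packed RGBA; return early rather than panic on a size mismatch.
    if width == 0 || height == 0 || pixels.len() != width * height * 4 {
        return;
    }

    // Snapshot the source so every row reads unblurred neighbors.
    let input = pixels.to_vec();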

    // Process rows in parallel
    pixels
        .par_chunks_mut(width * 4)
        .enumerate()
        .for_each(|(y, row)| {
            for x in 0..width {
                let (mut r, mut g, mut b, mut count) = (0u32, 0u32, 0u32, 0u32);

                for dy in -(radius as i32)..=(radius as i32) {
                    for dx in -(radius as i32)..=(radius as i32) {
                        let ny = (y as i32 + dy).clamp(0, height as i32 - 1) as usize;
                        let nx = (x as i32 + dx).clamp(0, width as i32 - 1) as usize;
                        let idx = (ny * width + nx) * 4;
                        r += input[idx] as u32;
                        g += input[idx + 1] as u32;
                        b += input[idx + 2] as u32;
                        count += 1;
                    }
                }

                let idx = x * 4;
                row[idx] = (r / count) as u8;
                row[idx + 1] = (g / count) as u8;
                row[idx + 2] = (b / count) as u8;
                // Alpha unchanged
            }
        });
}

Summary

Wasm multithreading uses SharedArrayBuffer for shared memory, Atomics for synchronization, and Web Workers as the thread pool. The wasm-bindgen-rayon crate makes it feel like writing normal Rust parallel code: use par_iter() instead of iter(). Remember to set the COOP/COEP headers, build with +atomics, and parallelize only workloads large enough to overcome the threading overhead. As of 2024, all major browsers support Wasm threads on cross-origin isolated pages.
