String Handling Deep Dive

The Encoding Mismatch

Rust and JavaScript store strings in different encodings, which is one of the most common performance gotchas in Wasm:

                         Rust / Wasm   JavaScript
─────────────────────────────────────────────────
Encoding                 UTF-8         UTF-16
Bytes per ASCII char     1             2
Bytes per Japanese char  3             2
Bytes per emoji          4             4
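
The sizes in the table can be checked from the Rust side with the standard library alone. This is a minimal sketch; the helper name `encoded_sizes` is made up for illustration.

```rust
/// Returns (UTF-8 byte count, UTF-16 byte count) for a string.
/// `encoded_sizes` is an illustrative helper, not part of any library.
fn encoded_sizes(s: &str) -> (usize, usize) {
    // str::len counts UTF-8 bytes, not characters.
    let utf8_bytes = s.len();
    // Each UTF-16 code unit occupies 2 bytes.
    let utf16_bytes = s.encode_utf16().count() * 2;
    (utf8_bytes, utf16_bytes)
}
```

Calling it with a single ASCII letter, a single kanji, and a single emoji such as U+1F600 should reproduce the three rows above. Emoji built from several code points will be larger on both sides.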

Every time a string crosses the Wasm boundary, it must be re-encoded:

JS string (UTF-16) → encode to UTF-8 → copy into Wasm memory → Rust &str
Rust String (UTF-8) → decode from UTF-8 → create JS string (UTF-16)

This re-encoding and copy happen on every crossing. With wasm-bindgen's &str and String types there is currently no way to avoid them.
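
The two directions can be modelled in plain Rust, treating a JS string as a slice of UTF-16 code units. This is only a sketch of the conversion work involved, not what wasm-bindgen literally runs; the function names are hypothetical. The lossy conversion is chosen on the assumption that unpaired surrogates get replaced with U+FFFD, as TextEncoder does.

```rust
/// Models JS → Rust: UTF-16 code units in, a newly allocated UTF-8 String out.
/// Unpaired surrogates become U+FFFD (assumed to mirror TextEncoder).
fn utf16_to_utf8(units: &[u16]) -> String {
    String::from_utf16_lossy(units)
}

/// Models Rust → JS: UTF-8 in, a newly allocated buffer of UTF-16 code units out.
fn utf8_to_utf16(s: &str) -> Vec<u16> {
    s.encode_utf16().collect()
}
```

Both functions allocate a fresh buffer, which is the cost the diagram above describes: nothing is shared between the two representations.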

How wasm-bindgen Handles Strings

JS → Rust (&str)

#[wasm_bindgen]
pub fn process(input: &str) -> String {
    // input is already UTF-8 in Wasm memory
    // wasm-bindgen encoded it from JS UTF-16 before calling
    format!("Processed: {}", input)
    // Return String → wasm-bindgen decodes UTF-8 → JS gets UTF-16 string
}

Behind the scenes:

  1. The generated JS glue uses TextEncoder (encode() or, where available, encodeInto()) to convert UTF-16 → UTF-8
  2. Copies UTF-8 bytes into Wasm linear memory
  3. Passes pointer + length to Rust
  4. Rust sees a valid &str
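
Steps 3 and 4 can be sketched as follows. This is a simplified model under stated assumptions, not wasm-bindgen's generated code: `str_from_parts` is a hypothetical name, and this version validates the bytes, whereas real glue may skip validation because TextEncoder output is always valid UTF-8.

```rust
use std::str::Utf8Error;

/// Rebuilds a &str from a pointer and a length in linear memory.
///
/// # Safety
/// `ptr` must point to `len` readable bytes that stay alive and
/// unmodified for the lifetime `'a`.
unsafe fn str_from_parts<'a>(ptr: *const u8, len: usize) -> Result<&'a str, Utf8Error> {
    // View the raw memory as a byte slice; no copy happens here.
    let bytes: &'a [u8] = unsafe { std::slice::from_raw_parts(ptr, len) };
    // Check that the bytes are valid UTF-8 before handing out a &str.
    std::str::from_utf8(bytes)
}
```

The copy already happened on the JS side; on the Rust side the string is just a borrowed view over those bytes.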

Rust → JS (String)

  1. Rust writes UTF-8 bytes to linear memory
  2. wasm-bindgen calls TextDecoder.decode() to convert UTF-8 → UTF-16
  3. JS gets a native string
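
The return path needs the opposite hand-off: Rust gives up ownership of the bytes so the JS side can read them, then frees them afterwards. The sketch below shows one way to do that by hand. `export_string` and `free_string` are hypothetical names, and wasm-bindgen's real mechanism may differ in detail.

```rust
/// Leaks a String as (pointer, length) so the bytes outlive this call.
fn export_string(s: String) -> (*mut u8, usize) {
    let boxed: Box<[u8]> = s.into_bytes().into_boxed_slice();
    let len = boxed.len();
    // Box::into_raw transfers ownership to the caller.
    let ptr = Box::into_raw(boxed) as *mut u8;
    (ptr, len)
}

/// Reclaims and drops bytes previously returned by `export_string`.
///
/// # Safety
/// `ptr` and `len` must come from exactly one earlier `export_string` call
/// and must not be used again afterwards.
unsafe fn free_string(ptr: *mut u8, len: usize) {
    let slice = std::ptr::slice_from_raw_parts_mut(ptr, len);
    drop(unsafe { Box::from_raw(slice) });
}
```

Between the two calls the JS side would decode the bytes into its own UTF-16 string, so the Rust buffer can be released as soon as decoding finishes.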

Performance Cost

Operation               Time (1MB string)
─────────────────────────────────────────
JS → Rust (&str)        ~2ms (encode + copy)
Rust → JS (String)      ~2ms (decode + copy)
Rust internal (no copy) ~0ms (just a pointer)

Rule: Minimize string boundary crossings. Do all string work on one side.

Optimization Strategies

1. Batch operations — don't cross per-item

// BAD: crosses boundary N times
#[wasm_bindgen]
pub fn process_one(item: &str) -> String { /* ... */ }

// GOOD: crosses boundary once
#[wasm_bindgen]
pub fn process_all(items_json: &str) -> String {
    // Parse all items, process, return all results
    // Only 2 boundary crossings total
}
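
A filled-in version of the batched function might look like the sketch below. Two assumptions are made to keep it dependency-free: items arrive newline-separated rather than as JSON, and "processing" is a placeholder (uppercasing). In a real crate the function would also carry #[wasm_bindgen].

```rust
/// Processes every item in one call: one string in, one string out.
/// Input and output are newline-separated (an assumption of this sketch).
pub fn process_all(items: &str) -> String {
    let mut out = String::with_capacity(items.len());
    for (i, item) in items.lines().enumerate() {
        if i > 0 {
            out.push('\n');
        }
        // Placeholder for the real per-item work.
        out.push_str(&item.trim().to_uppercase());
    }
    out
}
```

On the JS side this would pair with something like joining an array before the call and splitting the result after it, so the per-item loop never touches the boundary.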

2. Use numeric IDs instead of strings

// BAD: passing strings back and forth
pub fn get_user_name(name: &str) -> String { /* ... */ }

// GOOD: pass ID, keep strings in Rust
pub fn create_user(name: &str) -> u32 { /* returns ID */ }
pub fn get_user_name(id: u32) -> String { /* lookup by ID */ }
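
One possible shape for the ID-based version is a small store that owns the strings. `UserStore` is a hypothetical type for this sketch; an exported version would likely return an owned String rather than a borrowed &str.

```rust
/// Keeps user names inside Rust; callers only hold a numeric ID.
#[derive(Default)]
pub struct UserStore {
    names: Vec<String>,
}

impl UserStore {
    /// Stores the name once and returns its index as the ID.
    pub fn create_user(&mut self, name: &str) -> u32 {
        self.names.push(name.to_owned());
        (self.names.len() - 1) as u32
    }

    /// Looks up a name by ID; returns None for unknown IDs.
    pub fn get_user_name(&self, id: u32) -> Option<&str> {
        self.names.get(id as usize).map(String::as_str)
    }
}
```

The name crosses the boundary once at creation. Every later call passes a u32, which needs no encoding or copying.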

3. Use &[u8] for binary data

// DON'T encode binary data as strings
pub fn process_base64(data: &str) -> String { /* ... */ }

// DO pass raw bytes
pub fn process_bytes(data: &[u8]) -> Vec<u8> { /* ... */ }
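
Part of the reason is size: padded base64 turns every 3 bytes into 4 characters before the string encoding cost is even paid. The sketch below computes that overhead and shows a byte-level function with a placeholder transform (bitwise inversion, chosen only for illustration).

```rust
/// Length of standard padded base64 output for `n` input bytes.
fn base64_len(n: usize) -> usize {
    (n + 2) / 3 * 4
}

/// Works on raw bytes directly; the inversion stands in for real processing.
pub fn process_bytes(data: &[u8]) -> Vec<u8> {
    data.iter().map(|&b| !b).collect()
}
```

For a 1,000,000-byte payload, `base64_len` gives 1,333,336 characters, roughly a third more data to move than the raw bytes.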

4. Pre-allocate with capacity

// BAD: many small allocations
let mut result = String::new();
for item in items {
    result.push_str(&format!("{},", item));
}

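// GOOD: reserve once, then write straight into the buffer
use std::fmt::Write;

let mut result = String::with_capacity(items.len() * 20); // ~20 bytes per item is an estimate
for item in items {
    // write! appends in place; format! would allocate a temporary String per item
    write!(result, "{},", item).unwrap();
}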
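
When the items are already strings, the capacity can be computed exactly instead of estimated. This is a minimal sketch with a made-up function name, `join_with_commas`.

```rust
/// Joins items with a trailing comma after each, reserving the exact size up front.
pub fn join_with_commas(items: &[&str]) -> String {
    // Exact byte count: every item plus one comma each.
    let total: usize = items.iter().map(|s| s.len() + 1).sum();
    let mut result = String::with_capacity(total);
    for item in items {
        result.push_str(item); // appends in place, no temporary String
        result.push(',');
    }
    result
}
```

Because the buffer is sized before the loop, the pushes inside it should never trigger a reallocation.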

Common String Patterns in Wasm

CSV/JSON processing

#[wasm_bindgen]
pub fn parse_csv(input: &str) -> JsValue {
    let rows: Vec<Vec<&str>> = input
        .lines()
        .map(|line| line.split(',').collect())
        .collect();
    serde_wasm_bindgen::to_value(&rows).unwrap()
}
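
The parsing step can be tried on its own without the JS conversion. The sketch below is a dependency-free variant that also trims cells and skips blank lines. `parse_csv_rows` is a hypothetical name, and like the version above it splits on every comma, so quoted fields containing commas are not handled.

```rust
/// Splits CSV text into rows of trimmed cells, skipping blank lines.
/// Naive: a comma inside a quoted field is still treated as a separator.
pub fn parse_csv_rows(input: &str) -> Vec<Vec<&str>> {
    input
        .lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| line.split(',').map(str::trim).collect())
        .collect()
}
```

The cells borrow from the input, so no per-cell allocation happens until the rows are serialized for JS. Input with quoted fields would need a real CSV parser.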

Template rendering

#[wasm_bindgen]
pub fn render_template(template: &str, name: &str, count: u32) -> String {
    template
        .replace("{{name}}", name)
        .replace("{{count}}", &count.to_string())
}
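
Chained replace calls rescan text that was just substituted, so a name containing "{{count}}" would itself be replaced by the second call. Where that matters, a single-pass renderer avoids it. The sketch below is one possible approach with a hypothetical name, `render_single_pass`; unknown or unterminated placeholders are left as written.

```rust
/// Renders {{name}} and {{count}} in one left-to-right pass.
/// Substituted values are never scanned for further placeholders.
pub fn render_single_pass(template: &str, name: &str, count: u32) -> String {
    let mut out = String::with_capacity(template.len() + name.len());
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                match &after[..end] {
                    "name" => out.push_str(name),
                    "count" => out.push_str(&count.to_string()),
                    other => {
                        // Unknown placeholder: keep it unchanged.
                        out.push_str("{{");
                        out.push_str(other);
                        out.push_str("}}");
                    }
                }
                rest = &after[end + 2..];
            }
            None => {
                // Unterminated "{{": copy the remainder verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

Either version crosses the boundary the same number of times; the difference is only in how substituted text is treated.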

Try It

Run the example to see UTF-8 encoding in action: byte lengths, multi-byte characters, and string building. Notice how Japanese characters take 3 bytes in UTF-8 but only 2 bytes in JavaScript's UTF-16.
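
For readers without the interactive runner, a per-character breakdown can be produced with the standard library. This is a small sketch; `char_breakdown` is an illustrative name.

```rust
/// For each character: (char, byte offset in the UTF-8 string,
/// bytes in UTF-8, bytes in UTF-16).
fn char_breakdown(s: &str) -> Vec<(char, usize, usize, usize)> {
    s.char_indices()
        .map(|(offset, c)| (c, offset, c.len_utf8(), c.len_utf16() * 2))
        .collect()
}
```

The byte offsets show why indexing a Rust string by character position is not a constant-time operation: each character starts wherever the previous one's bytes end.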
