Intermediate data-structures
String Handling Deep Dive
The Encoding Mismatch
String encoding is one of the most common performance gotchas in Wasm, because the two sides store text differently:
| | Rust / Wasm | JavaScript |
|---|---|---|
| Encoding | UTF-8 | UTF-16 |
| Bytes per ASCII char | 1 | 2 |
| Bytes per Japanese char | 3 | 2 |
| Bytes per emoji | 4 | 4 |
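The figures in the table can be checked with Rust's standard library alone. The sketch below is illustrative: the helper names `utf8_bytes` and `utf16_bytes` are made up for this example, and the UTF-16 size is computed by counting code units rather than by asking a real JavaScript engine.

```rust
/// Size of `s` in bytes when stored as UTF-8 (how Rust stores it).
fn utf8_bytes(s: &str) -> usize {
    s.len()
}

/// Size of `s` in bytes when stored as UTF-16 (2 bytes per code unit).
fn utf16_bytes(s: &str) -> usize {
    s.encode_utf16().count() * 2
}

fn main() {
    // One ASCII letter, one Japanese character, one emoji.
    for sample in ["a", "日", "😀"] {
        println!(
            "{sample}: {} bytes as UTF-8, {} bytes as UTF-16",
            utf8_bytes(sample),
            utf16_bytes(sample)
        );
    }
}
```

Note that `str::len` counts bytes, not characters, which is why the Japanese sample reports 3 on the Rust side.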
Every time a string crosses the Wasm boundary, it must be re-encoded:
JS string (UTF-16) → encode to UTF-8 → copy into Wasm memory → Rust &str
Rust String (UTF-8) → decode from UTF-8 → create JS string (UTF-16)
This copy happens every time; there's no way to avoid it with current Wasm.
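The two conversions can be emulated in plain Rust, which makes the extra work visible. This is only a sketch of the idea: a `Vec<u16>` stands in for a JavaScript string, and the function names `js_to_rust` and `rust_to_js` are hypothetical rather than part of wasm-bindgen.

```rust
/// Emulates JS → Rust: transcode UTF-16 code units into a newly allocated UTF-8 String.
fn js_to_rust(js_string: &[u16]) -> String {
    // Lossy conversion: unpaired surrogates become U+FFFD instead of failing.
    String::from_utf16_lossy(js_string)
}

/// Emulates Rust → JS: transcode UTF-8 into a newly allocated UTF-16 buffer.
fn rust_to_js(rust_string: &str) -> Vec<u16> {
    rust_string.encode_utf16().collect()
}

fn main() {
    let js_side: Vec<u16> = "hi 日本".encode_utf16().collect();
    let rust_side = js_to_rust(&js_side);
    let back_in_js = rust_to_js(&rust_side);
    // Both directions allocate and copy; nothing is shared between the two sides.
    println!(
        "{} UTF-16 units -> {} UTF-8 bytes -> {} UTF-16 units",
        js_side.len(),
        rust_side.len(),
        back_in_js.len()
    );
}
```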
How wasm-bindgen Handles Strings
JS → Rust (&str)
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn process(input: &str) -> String {
    // input is already UTF-8 in Wasm memory;
    // wasm-bindgen encoded it from JS UTF-16 before calling
    format!("Processed: {}", input)
    // Returning String: wasm-bindgen decodes UTF-8 and JS gets a UTF-16 string
}
Behind the scenes:
- JS calls TextEncoder.encode() to convert UTF-16 → UTF-8
- Copies UTF-8 bytes into Wasm linear memory
- Passes pointer + length to Rust
- Rust sees a valid &str
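The last two steps can be sketched in plain Rust. This is a simplified illustration, not the code wasm-bindgen generates: `str_from_parts` is a hypothetical helper, and an ordinary byte vector stands in for Wasm linear memory.

```rust
use std::{slice, str};

/// Rebuilds a `&str` from a pointer and a byte length.
///
/// # Safety
/// `ptr` must point to `len` initialized bytes that stay alive and unmodified for `'a`.
unsafe fn str_from_parts<'a>(ptr: *const u8, len: usize) -> &'a str {
    let bytes = unsafe { slice::from_raw_parts(ptr, len) };
    // Checked conversion; panics if the bytes are not valid UTF-8.
    str::from_utf8(bytes).expect("bytes must be valid UTF-8")
}

fn main() {
    // Stand-in for linear memory after the UTF-8 bytes have been copied in.
    let memory: Vec<u8> = "Processed: 日本".as_bytes().to_vec();
    let text = unsafe { str_from_parts(memory.as_ptr(), memory.len()) };
    println!("{text}");
}
```

No bytes are copied in this step: the `&str` is just a view over memory that was already filled in, which is why string work that stays inside Rust is cheap.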
Rust → JS (String)
- Rust writes UTF-8 bytes to linear memory
- wasm-bindgen calls TextDecoder.decode() to convert UTF-8 → UTF-16
- JS gets a native string
Performance Cost
| Operation | Time (1MB string) | Notes |
|---|---|---|
| JS → Rust (&str) | ~2ms | encode + copy |
| Rust → JS (String) | ~2ms | decode + copy |
| Rust internal (no copy) | ~0ms | just a pointer |

Rule: Minimize string boundary crossings. Do all string work on one side.
Optimization Strategies
1. Batch operations — don't cross per-item
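As a concrete sketch of batching, the function below takes every item in a single string and returns every result in a single string. To stay dependency-free it assumes newline-separated items rather than JSON, and the per-item work (`to_uppercase`) is a placeholder. The name `process_all_lines` is hypothetical; in a real module the function would carry `#[wasm_bindgen]`.

```rust
/// Processes all items in one call: one string in, one string out.
/// Assumption: items are separated by '\n' and contain no newlines themselves.
fn process_all_lines(items: &str) -> String {
    // The output is usually about the size of the input, so reserve once up front.
    let mut out = String::with_capacity(items.len());
    for (i, item) in items.lines().enumerate() {
        if i > 0 {
            out.push('\n');
        }
        // Placeholder for the real per-item work.
        out.push_str(&item.to_uppercase());
    }
    out
}

fn main() {
    let results = process_all_lines("alpha\nbeta\ngamma");
    println!("{results}");
}
```

However many items the input holds, the caller pays for one encode on the way in and one decode on the way out.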
// BAD: crosses boundary N times
#[wasm_bindgen]
pub fn process_one(item: &str) -> String { /* ... */ }

// GOOD: crosses boundary once
#[wasm_bindgen]
pub fn process_all(items_json: &str) -> String {
    // Parse all items, process, return all results
    // Only 2 boundary crossings total
}
2. Use numeric IDs instead of strings
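One way to realize this idea, sketched under assumptions: names live in a Rust-side table and JavaScript only holds a `u32` index. The `UserStore` type and its methods are hypothetical, and how the store is owned (a global, or a `#[wasm_bindgen]` struct) is left out.

```rust
/// Owns every user name on the Rust side; callers refer to users by index.
#[derive(Default)]
struct UserStore {
    names: Vec<String>,
}

impl UserStore {
    /// Stores the name once and returns a numeric ID (one string crossing).
    fn create_user(&mut self, name: &str) -> u32 {
        self.names.push(name.to_owned());
        (self.names.len() - 1) as u32
    }

    /// Looks a name up by ID; only needed when the text itself is required.
    fn get_user_name(&self, id: u32) -> Option<&str> {
        self.names.get(id as usize).map(String::as_str)
    }
}

fn main() {
    let mut store = UserStore::default();
    let id = store.create_user("Aiko");
    println!("{id} -> {:?}", store.get_user_name(id));
}
```

Passing a `u32` across the boundary involves no encoding or copying, so sorting, filtering, or comparing users by ID never touches string conversion.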
// BAD: passing strings back and forth
pub fn get_user_name(name: &str) -> String { /* ... */ }

// GOOD: pass ID, keep strings in Rust
pub fn create_user(name: &str) -> u32 { /* returns ID */ }
pub fn get_user_name(id: u32) -> String { /* lookup by ID */ }
3. Use &[u8] for binary data
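A small sketch of byte-oriented processing: the function takes a slice and returns a new vector, so no text encoding is involved at all. The XOR transform and the name `xor_bytes` are placeholders chosen for illustration.

```rust
/// Returns a copy of `data` with every byte XORed with `key`.
fn xor_bytes(data: &[u8], key: u8) -> Vec<u8> {
    data.iter().map(|byte| byte ^ key).collect()
}

fn main() {
    // Includes bytes (0xFF, 0x80) that are not valid UTF-8 on their own.
    let input: [u8; 4] = [0x00, 0xFF, 0x80, 0x7F];
    let output = xor_bytes(&input, 0xFF);
    println!("{output:02X?}");
}
```

The bytes are still copied into and out of linear memory, but they are not transcoded, and arbitrary binary values survive the trip unchanged.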
// DON'T encode binary data as strings
pub fn process_base64(data: &str) -> String { /* ... */ }

// DO pass raw bytes
pub fn process_bytes(data: &[u8]) -> Vec<u8> { /* ... */ }
4. Pre-allocate with capacity
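When the inputs are already in hand, the output size can be computed exactly instead of guessed. A minimal sketch, with `join_with_commas` as a hypothetical helper:

```rust
/// Joins items as "a,b,c," using a single, exactly sized allocation.
fn join_with_commas(items: &[&str]) -> String {
    // Each item contributes its own bytes plus one byte for the trailing comma.
    let total: usize = items.iter().map(|item| item.len() + 1).sum();
    let mut result = String::with_capacity(total);
    for item in items {
        result.push_str(item);
        result.push(',');
    }
    result
}

fn main() {
    let joined = join_with_commas(&["alpha", "beta", "gamma"]);
    println!("{joined} (len {}, capacity {})", joined.len(), joined.capacity());
}
```

Appending with `push_str` and `push` writes straight into the reserved buffer, so the loop itself performs no further allocations.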
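// BAD: many small allocations
let mut result = String::new();
for item in items {
    result.push_str(&format!("{},", item));
}

// GOOD: reserve once, then write directly into the buffer
use std::fmt::Write; // lets write! target a String
let mut result = String::with_capacity(items.len() * 20); // ~20 bytes per item is an estimate
for item in items {
    // write! appends in place; format! would allocate a temporary String per item
    write!(result, "{},", item).unwrap();
}
Common String Patterns in Wasm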
CSV/JSON processing
#[wasm_bindgen]
pub fn parse_csv(input: &str) -> JsValue {
    // Naive split: does not handle quoted fields or escaped commas
    let rows: Vec<Vec<&str>> = input
        .lines()
        .map(|line| line.split(',').collect())
        .collect();
    // Panics if serialization fails
    serde_wasm_bindgen::to_value(&rows).unwrap()
}
Template rendering
#[wasm_bindgen]
pub fn render_template(template: &str, name: &str, count: u32) -> String {
    template
        .replace("{{name}}", name)
        .replace("{{count}}", &count.to_string())
}
Try It
Click Run to see UTF-8 encoding in action — byte lengths, multi-byte characters, and string building. Notice how Japanese characters take 3 bytes in UTF-8 but only 2 in JavaScript's UTF-16.