Wasm + PDF Generation
Introduction
Generating PDFs on the client side with Rust/Wasm eliminates the need to send sensitive data to a server for document creation. Invoices, reports, certificates, and receipts can all be generated directly in the browser. This lesson builds a minimal but valid PDF generator from scratch and explains the PDF file format internals.
Why client-side PDF generation?
| Approach | Latency | Privacy | Offline | Server cost |
|---|---|---|---|---|
| Server-side (wkhtmltopdf) | 200-500ms + network | Data sent to server | No | High |
| Server-side (Puppeteer) | 500-2000ms + network | Data sent to server | No | Very high |
| Client JS (jsPDF) | 50-200ms | Data stays local | Yes | None |
| Client Wasm (Rust) | 10-50ms | Data stays local | Yes | None |
Wasm is 3-5x faster than JavaScript for PDF generation because it avoids GC pressure when building large byte arrays and performs efficient string formatting.
PDF file format anatomy
A PDF file has four main sections:
┌──────────────────────────────┐
│ Header │ %PDF-1.4
├──────────────────────────────┤
│ Body │ Objects (pages, fonts,
│ (numbered objects) │ images, content streams)
├──────────────────────────────┤
│ Cross-Reference Table │ Byte offsets of each object
│ (xref) │ for random access
├──────────────────────────────┤
│ Trailer │ Points to root object
│ │ and xref table
└──────────────────────────────┘Object hierarchy
Every PDF has this tree structure:
Catalog (root)
└── Pages (collection)
├── Page 1
│ ├── MediaBox [0 0 612 792] (US Letter size in points)
│ ├── Resources
│ │ └── Font /F1 → Helvetica
│ └── Contents → Stream object
│ └── "BT /F1 12 Tf (Hello) Tj ET"
├── Page 2
│ └── ...
└── Page NPDF objects
Objects are numbered (1 0 obj, 2 0 obj, etc.) and can be dictionaries, arrays, streams, or primitive values:
% Dictionary object
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
% Stream object (contains drawing commands)
3 0 obj
<< /Length 44 >>
stream
BT
/F1 12 Tf
1 0 0 1 50 750 Tm
(Hello World) Tj
ET
endstream
endobjThe R in 2 0 R means "reference to object 2, generation 0" -- this is how objects link to each other.
Content stream commands
PDF content streams use a stack-based drawing language similar to PostScript:
| Command | Meaning | Example |
|---|---|---|
BT |
Begin text block | BT |
ET |
End text block | ET |
Tf |
Set font and size | /F1 12 Tf |
Tm |
Set text matrix (position) | 1 0 0 1 50 750 Tm |
Tj |
Show text string | (Hello) Tj |
Td |
Move text position | 0 -14 Td |
m |
Move to point | 50 700 m |
l |
Line to point | 550 700 l |
S |
Stroke path | S |
re |
Rectangle | 50 50 500 700 re |
rg |
Set fill color (RGB) | 1 0 0 rg (red) |
The coordinate system starts at the bottom-left corner. US Letter is 612 x 792 points (1 point = 1/72 inch).
Built-in fonts
PDF defines 14 standard fonts that every PDF reader must support -- no embedding required:
Helvetica Helvetica-Bold
Helvetica-Oblique Helvetica-BoldOblique
Times-Roman Times-Bold
Times-Italic Times-BoldItalic
Courier Courier-Bold
Courier-Oblique Courier-BoldOblique
Symbol ZapfDingbatsUsing these fonts keeps the PDF small. Custom fonts require embedding the font file, which can add hundreds of kilobytes.
The cross-reference table
The xref table enables random access to objects without reading the entire file. Each entry records the byte offset of an object from the start of the file:
xref
0 5
0000000000 65535 f ← free object (always first)
0000000009 00000 n ← object 1 at byte 9
0000000058 00000 n ← object 2 at byte 58
0000000115 00000 n ← object 3 at byte 115
0000000258 00000 n ← object 4 at byte 258This is critical for large PDFs -- a reader can jump directly to page 500 without parsing pages 1-499.
Adding images to PDFs
Images in PDF are stored as stream objects with specific filters:
% Image object (conceptual)
5 0 obj
<< /Type /XObject
/Subtype /Image
/Width 200
/Height 150
/ColorSpace /DeviceRGB
/BitsPerComponent 8
/Length 90000
/Filter /DCTDecode ← This means JPEG data
>>
stream
[raw JPEG bytes here]
endstream
endobjFor Wasm, you can pass image bytes from JavaScript to Rust and embed them directly as JPEG (DCTDecode) or PNG (FlateDecode) streams.
Table generation
Tables in PDF are drawn manually using line and text commands:
Content stream for a simple table:
% Draw grid lines
0.5 w % line width
50 700 m 550 700 l S % top border
50 680 m 550 680 l S % header separator
50 660 m 550 660 l S % row 1
50 700 m 50 660 l S % left border
300 700 m 300 660 l S % column separator
550 700 m 550 660 l S % right border
% Header text
BT /F1 10 Tf
1 0 0 1 55 685 Tm (Name) Tj
1 0 0 1 305 685 Tm (Value) Tj
ETThis is why PDF libraries are valuable -- calculating cell positions, handling text wrapping, and managing page breaks is tedious to do manually.
Production crates: printpdf and genpdf
For real applications, these Rust crates compile to Wasm:
printpdf -- Low-level PDF generation:
// Requires printpdf crate
use printpdf::*;
let (doc, page1, layer1) = PdfDocument::new(
"My Document", Mm(210.0), Mm(297.0), "Layer 1"
);
let font = doc.add_builtin_font(BuiltinFont::Helvetica).unwrap();
let layer = doc.get_page(page1).get_layer(layer1);
layer.use_text("Hello from Rust!", 14.0, Mm(10.0), Mm(280.0), &font);genpdf -- Higher-level with layout engine:
// Requires genpdf crate
let font = genpdf::fonts::from_files("./fonts", "Helvetica", None).unwrap();
let mut doc = genpdf::Document::new(font);
doc.push(genpdf::elements::Paragraph::new("Hello from Rust!"));
doc.push(genpdf::elements::Break::new(1));
doc.push(genpdf::elements::Paragraph::new("Second paragraph."));Try it
Extend the PDF generator to:
- Add drawing commands for horizontal rules (lines across the page)
- Support bold text by switching to
/F2(Helvetica-Bold) mid-stream - Add page numbers at the bottom of each page
- Generate a table with grid lines and cell text
- Add color to text using the
rg(fill color) command