Snapshot Testing

This page explains why Holt includes snapshot testing and how the system works under the hood.

Why Snapshot Testing?

Unit tests verify behavior. Type systems catch interface mismatches. But neither catches when a CSS change accidentally hides a button, or when a dependency update subtly shifts your layout.

Snapshot testing fills this gap by comparing screenshots of your components against known-good baselines. If anything looks different, you find out immediately—before users do.

For a component library like Holt, snapshot testing is essential because:

Styling is the product: Components exist to look right
CSS is global and fragile: One change can cascade unexpectedly
Cross-browser rendering varies: What works in Chrome may shift in Firefox

How Screenshot Comparison Works

Holt uses byte-level comparison: if the PNG files are identical byte-for-byte, the test passes. This approach is:

Fast: No image decoding or pixel analysis needed
Deterministic: Either it matches or it doesn't
Sensitive: Any change, even a single pixel, triggers a diff

The tradeoff is sensitivity. Identical visual output can produce different bytes due to:

PNG encoder differences
Font rendering variations between OS versions
Timing differences in animations

That's why Holt captures all baselines with the same browser (Firefox) and recommends pinning browser versions in CI.

Local vs CI Workflows

Snapshot testing behaves differently depending on context.

Local Development

When you run holt snapshot locally:

Firefox opens visibly (not headless)
Screenshots are captured at 1280x720
Differences trigger a GUI comparison window
You decide whether to accept or reject each change
Orphaned baselines (from deleted stories) are cleaned up

The GUI makes it easy to review changes interactively. Toggle between baseline and new screenshot to spot differences, then accept or reject with a click.

CI Environment

When the CI environment variable is set:

Firefox runs headless (no display needed)
Screenshots are captured at the same resolution
Differences cause the test to fail
New screenshots are saved for artifact upload
Orphan cleanup is skipped (baselines shouldn't change in CI)

This workflow lets you:

Run tests to detect regressions
Download artifacts when tests fail
Review the new screenshots locally
Update baselines and commit if the changes are intentional

GUI vs Terminal Approval

The comparison GUI requires a display. On systems without one (SSH sessions, some CI environments, Linux without X11), Holt falls back to terminal mode:

Both images are saved to temp files
Your OS's default image viewer opens them (if available)
You're prompted to accept or reject via stdin

Terminal mode is functional but less convenient. The GUI is preferred when available.

Orphan Cleanup

When you delete a story or rename a variant, its baseline becomes orphaned—it no longer corresponds to anything in your storybook. Holt detects and removes these automatically during local runs.

Orphan cleanup only runs locally (not in CI) because:

CI shouldn't modify committed baselines
Detecting orphans requires knowing all current variants
Accidental deletions are easier to recover locally

If you see "Cleaning up N orphaned baseline(s)" in output, those files will be deleted. Review the list to make sure they're actually obsolete.

Limitations and Tradeoffs

Byte Comparison is Strict

Any rendering difference fails the test. This catches real regressions but can cause false positives from:

OS font rendering changes
Browser updates
Timing issues with async content

Mitigation: Pin browser versions in CI and ensure components are fully rendered before capture.

Firefox Only

Holt uses Firefox via geckodriver. This simplifies setup (one browser to install) but means you're not testing Chrome or Safari rendering.

For cross-browser snapshot testing, consider additional tooling or a service like Percy or Chromatic that tests multiple browsers.

No Pixel Tolerance

Some snapshot testing tools allow "fuzzy" matching that ignores small differences. Holt's byte comparison doesn't. This is intentional—if pixels changed, you should know.

If you need tolerance, you could implement perceptual hashing or diff percentages, but that adds complexity and can mask real issues.

Design Decisions

Why geckodriver/Firefox?

Open source with stable WebDriver support
Consistent rendering across platforms
Headless mode works reliably
No licensing concerns

Why Store Baselines in the Repo?

Baselines in version control mean:

CI can compare without external dependencies
Changes are code-reviewed along with the code that caused them
History shows exactly when visual changes occurred
Anyone can run tests without downloading baselines separately

The cost is repository size, but PNG screenshots of components are typically small (10-50KB each).

Why a GUI for Review?

Terminal-based approval (y/n prompts) works but makes it hard to actually see the difference. The GUI lets you toggle between images instantly, making review faster and more confident.

The fallback to terminal mode ensures the tool works everywhere, even if less conveniently.

Why Snapshot Testing?​

How Screenshot Comparison Works​

Local vs CI Workflows​

Local Development​

CI Environment​

GUI vs Terminal Approval​

Orphan Cleanup​

Limitations and Tradeoffs​

Byte Comparison is Strict​

Firefox Only​

No Pixel Tolerance​

Design Decisions​

Why geckodriver/Firefox?​

Why Store Baselines in the Repo?​

Why a GUI for Review?​