From flaky to reliable: making your visual regression suite trustworthy
Visual regression testing is the most-rage-quit kind of automation. It's also the most useful — when you set it up so the team actually trusts it.
Visual regression testing has a reputation. Every team has a story about a suite that turned every PR red because of a 1-pixel anti-alias difference, and nobody's used it since.
The reputation is earned. The technique is also genuinely the highest-leverage UI test you can write — if you set it up so people trust it.
Where visual regression goes wrong
Three failure modes account for almost every dead suite:
- Pixel-perfect diffing. A small font-rendering shift between OS versions can break 100 tests at once. Solve it with a tolerance threshold (typically 0.1–0.5% per region) and an anti-aliasing-aware diff algorithm.
- No approval workflow. When the diff is real (an intentional design change), the team has no fast path to update the baseline, so they ignore the failure. Build a one-click "approve as new baseline" action, ideally one that the person who made the change can trigger.
- Snapshot per page, not per component. When you snapshot the whole page, every change anywhere on that page triggers a diff. Snapshot the component you care about; the rest is noise.
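To make the tolerance idea concrete, here is a minimal sketch of a per-region check. It is an illustration only, not how Lepta QA or any particular diff library works: the names `diffRatio`, `regionMatches`, `channelThreshold`, and `maxDiffRatio` are hypothetical, and the per-channel threshold is a crude stand-in for a real anti-aliasing-aware algorithm. It assumes both images are same-sized RGBA byte buffers.

```typescript
// Hypothetical helpers for illustration; not part of any library.
interface DiffOptions {
  channelThreshold: number; // largest per-channel delta (0-255) still treated as "same"
  maxDiffRatio: number; // fraction of pixels allowed to differ, e.g. 0.002 = 0.2%
}

// Returns the fraction of pixels whose largest channel delta exceeds the threshold.
function diffRatio(
  baseline: Uint8Array,
  candidate: Uint8Array,
  channelThreshold: number
): number {
  if (baseline.length !== candidate.length || baseline.length % 4 !== 0) {
    throw new Error("buffers must be RGBA and the same size");
  }
  const pixelCount = baseline.length / 4;
  if (pixelCount === 0) return 0;

  let differing = 0;
  for (let i = 0; i < baseline.length; i += 4) {
    let maxDelta = 0;
    for (let c = 0; c < 4; c++) {
      maxDelta = Math.max(maxDelta, Math.abs(baseline[i + c] - candidate[i + c]));
    }
    if (maxDelta > channelThreshold) differing++;
  }
  return differing / pixelCount;
}

// A region passes when the share of differing pixels stays within tolerance.
function regionMatches(
  baseline: Uint8Array,
  candidate: Uint8Array,
  options: DiffOptions
): boolean {
  return diffRatio(baseline, candidate, options.channelThreshold) <= options.maxDiffRatio;
}
```

With a 0.2% tolerance, a 1,000-pixel region tolerates up to two visibly different pixels; a third fails the region. Small uniform shifts below `channelThreshold` never count as differences at all.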
How we structure it in Lepta QA
When you create a visual test in Lepta QA, the default is:
- One screenshot per component-on-page tuple.
- 0.2% per-region tolerance (configurable).
- Auto-baseline on the first run of the default branch — not on PRs.
- Approval permission lives with whoever owns the component (we look at git blame to suggest reviewers).
```typescript
import { visual } from "@lepta/visual";

// Snapshot only the summary component, not the whole checkout page.
visual.test("checkout summary", async ({ page }) => {
  await page.goto("/checkout");
  await visual.expect(page.locator("[data-testid=summary]")).toMatchBaseline();
});
```
That's it. No screenshot directory hand-management, no Photoshop diffing.
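The auto-baseline default can be read as a small decision rule. The sketch below is a hypothetical model of that rule, not Lepta QA's actual implementation or API: `RunContext`, `decideBaselineAction`, and the `"needs-approval"` outcome for a PR with no baseline yet are all assumptions made for illustration.

```typescript
// Hypothetical model of the "auto-baseline on the default branch, not on PRs" rule.
type BaselineAction = "compare" | "create-baseline" | "needs-approval";

interface RunContext {
  branch: string;
  defaultBranch: string;
  isPullRequest: boolean;
  hasBaseline: boolean;
}

function decideBaselineAction(ctx: RunContext): BaselineAction {
  // An existing baseline is always compared against, never silently replaced.
  if (ctx.hasBaseline) return "compare";

  // Only a non-PR run on the default branch may create the first baseline.
  const onDefaultBranch = !ctx.isPullRequest && ctx.branch === ctx.defaultBranch;

  // Assumption: a PR with no baseline asks a human rather than self-approving.
  return onDefaultBranch ? "create-baseline" : "needs-approval";
}
```

The point of the rule is that a PR can never bless its own screenshots: the first trusted image always comes from code that has already merged.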
The rule that fixes most flaky-VRT problems
If a visual test fails twice on the same PR with no intentional change, mute the test, file a bug, and pull a developer into the failure. Do not retry. Do not ignore. The whole value proposition of visual regression is that it surfaces real-world rendering bugs you can't write a unit test for — every time you retry-and-pass, you erode the team's belief that the suite is honest.
A visual regression suite is exactly as useful as the team's belief that its red badges mean something.
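The two-failure rule is simple enough to encode as a policy. This is a minimal sketch under stated assumptions: `FailureTracker` and its step names are hypothetical, and treating the first unexplained failure as "investigate" is our reading of the rule, not something a tool enforces for you.

```typescript
// Hypothetical policy object; note that no outcome is ever "retry".
type NextStep = "approve-new-baseline" | "investigate" | "mute-and-file-bug";

class FailureTracker {
  // Unexplained failure counts, keyed by PR and test.
  private counts = new Map<string, number>();

  recordFailure(prId: string, testId: string, intentionalChange: boolean): NextStep {
    // An intentional design change goes through the approval workflow instead.
    if (intentionalChange) return "approve-new-baseline";

    const key = `${prId}::${testId}`;
    const count = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, count);

    // Second unexplained failure on the same PR: mute, file a bug, pull in a developer.
    return count >= 2 ? "mute-and-file-bug" : "investigate";
  }
}
```

Counting per PR and per test keeps one noisy test from muting unrelated ones, and leaving "retry" out of the type makes the policy hard to bypass by accident.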
What to baseline first
When you adopt VRT, the temptation is to snapshot every page. Don't. Start with:
- Top three highest-trafficked pages (almost always: home, dashboard, checkout-equivalent).
- Top three components that show up everywhere (header/nav, buttons, input states).
- Empty / loading / error states — these regress the most because nobody manually clicks through them.
That's nine snapshots: three pages, three components, and three states. You'll catch roughly 70% of meaningful UI rot with that footprint.
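One way to keep the starter set honest is to write it down as data. The manifest below is only a sketch: `SnapshotTarget`, `starterTargets`, and `countByKind` are hypothetical names, and the entries simply restate the list above rather than any real project's pages.

```typescript
// Hypothetical starter manifest mirroring the list above.
type SnapshotKind = "page" | "component" | "state";

interface SnapshotTarget {
  kind: SnapshotKind;
  name: string;
}

const starterTargets: SnapshotTarget[] = [
  { kind: "page", name: "home" },
  { kind: "page", name: "dashboard" },
  { kind: "page", name: "checkout" },
  { kind: "component", name: "header/nav" },
  { kind: "component", name: "buttons" },
  { kind: "component", name: "input states" },
  { kind: "state", name: "empty" },
  { kind: "state", name: "loading" },
  { kind: "state", name: "error" },
];

// Counts targets of one kind, handy for a guard that flags the suite growing too fast.
function countByKind(targets: SnapshotTarget[], kind: SnapshotKind): number {
  return targets.filter((t) => t.kind === kind).length;
}
```

A list like this makes growth a deliberate, reviewable change instead of something that happens one screenshot at a time.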
And then?
Then you wait. The first time a designer changes spacing on the global card component and your suite catches a regression nobody else noticed — that's when the team buys in. Until then, keep the surface small, the tolerance generous, and the approval workflow boringly easy.