Lighthouse CI & WebPageTest Integration
A performance gate is only as trustworthy as the determinism of the runs behind it. Bolting an audit onto a pipeline as an advisory comment changes nothing; turning that audit into a required status check that exits non-zero on a real regression changes everything. This reference treats Lighthouse CI and WebPageTest as two halves of one enforcement contract: Lighthouse CI supplies fast, reproducible lab metrics that gate every pull request, and WebPageTest supplies the deep network-waterfall and filmstrip diagnostics that explain why a metric moved. Wired together behind a single assertion layer, they convert subjective "the site feels slow" debates into quantified thresholds that block suboptimal code before it merges.
The integration splits into five engineering concerns: how synthetic runs are collected deterministically, what metrics and thresholds define a breach, how the gate is wired into CI/CD, how regressions are caught in production through observability, and what happens when the gate itself fails. This page is the top-level spec for all five. It links down to the detailed guides for each surface — the collection and storage layer in Lighthouse CI Configuration & Storage, field telemetry in Custom Performance Beacons & RUM, parallel coverage in GitHub Actions Performance Matrices, isolated network agents in WebPageTest Private Instance Setup, and tool selection in Comparing Performance Testing Tools.
Architecture Overview
A pull request fans out to two synthetic engines in parallel. Lighthouse CI runs each URL several times and emits a JSON report per run; a WebPageTest agent runs the same URLs through a controlled connection profile and returns its own metrics. Both result sets feed a single assertion step that compares values against the budget. A pass releases the merge through a status check; a breach exits non-zero and blocks it. Either way, artifacts upload to a storage backend that powers the trend dashboard and the regression baseline.
The key design rule is that both engines are judged against one budget. The thresholds are defined once, so when Lighthouse CI and WebPageTest disagree on a verdict, the cause is different measurement conditions, never different rules. The engines differ only in depth. Lighthouse CI is the fast verdict on every commit. WebPageTest is the slower forensic engine you reach for when a verdict needs explaining, or when you need real network control that simulated throttling cannot reproduce.
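A minimal sketch of that single assertion layer is below. It assumes the Lighthouse JSON report (LHR) shape and the WebPageTest median first-view fields that wpt-gate.js already reads. Each engine's output is normalized to the same four keys and then checked by one function against one budget object. The function names (fromLighthouse, fromWebPageTest, assertBudget) and the sample report are hypothetical. The resource-summary lookup also assumes a Lighthouse version that still emits that audit.

```javascript
// Hypothetical shared budget: defined once, consumed by both engines.
const BUDGET = { lcp: 2500, cls: 0.1, tbt: 200, bytesJs: 200000 };

// Map a Lighthouse JSON report (LHR) onto the shared metric keys.
function fromLighthouse(lhr) {
  const audits = lhr.audits;
  const items = audits["resource-summary"]?.details?.items ?? [];
  const script = items.find((item) => item.resourceType === "script");
  return {
    lcp: audits["largest-contentful-paint"]?.numericValue,
    cls: audits["cumulative-layout-shift"]?.numericValue,
    tbt: audits["total-blocking-time"]?.numericValue,
    bytesJs: script?.transferSize,
  };
}

// Map a WebPageTest median first-view object onto the same keys.
function fromWebPageTest(firstView) {
  return {
    lcp: firstView["chromeUserTiming.LargestContentfulPaint"],
    cls: firstView["chromeUserTiming.CumulativeLayoutShift"],
    tbt: firstView.TotalBlockingTime,
    bytesJs: firstView.breakdown?.js?.bytes,
  };
}

// One assertion function for both engines; returns the list of breaches.
function assertBudget(metrics, budget = BUDGET) {
  const breaches = [];
  for (const [key, limit] of Object.entries(budget)) {
    const value = metrics[key];
    if (typeof value !== "number") {
      breaches.push({ key, value, limit, reason: "missing" }); // never pass a metric silently
    } else if (value > limit) {
      breaches.push({ key, value, limit, reason: "over budget" });
    }
  }
  return breaches;
}

// Hand-written report fragment for illustration (not a real measurement).
const lhr = {
  audits: {
    "largest-contentful-paint": { numericValue: 2100 },
    "cumulative-layout-shift": { numericValue: 0.02 },
    "total-blocking-time": { numericValue: 260 },
    "resource-summary": { details: { items: [{ resourceType: "script", transferSize: 180000 }] } },
  },
};
console.log(assertBudget(fromLighthouse(lhr)));
```

Keeping the normalizers separate from assertBudget means a new engine only needs a mapping function. The thresholds themselves never get copied.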
Metric Selection & Threshold Matrix
Gate on a small set of primary metrics and keep the rest as diagnostic signals. Primary metrics block the merge; diagnostic signals annotate the PR but never fail it. Map every lab threshold back to a field percentile — set the lab assertion 10–15% tighter than your P75 field value to absorb the lab-to-field gap, and always pin the device class and connection profile the number applies to.
| Metric | Role | Desktop / Cable (P75) | High-end mobile / 4G (P75) | Mid-range mobile / Fast 3G (P75) |
|---|---|---|---|---|
| LCP | Primary gate | 2000 ms | 2500 ms | 3500 ms |
| INP | Primary, field-enforced (cannot be measured in lab) | 150 ms | 200 ms | 300 ms |
| CLS | Primary gate | 0.10 | 0.10 | 0.10 |
| TBT | Primary lab gate (proxy for INP) | 150 ms | 200 ms | 350 ms |
| Script transfer | Primary gate | 200 KB | 170 KB | 150 KB |
| Speed Index | Diagnostic | 2300 ms | 3000 ms | 4200 ms |
INP is a field-only metric, so the lab gate uses TBT as its proxy and the real INP ceiling is enforced through field telemetry. The methodology for deriving these numbers from your own traffic lives in the threshold calibration reference; this matrix is a representative starting point, not a copy-paste constant.
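The derivation rule from this section can be written as a few lines of code. This is a minimal sketch that assumes you already have raw field samples for one metric, one route and one device class. It takes the P75 of those samples and tightens it by 10 to 15 percent. The function names and sample values are hypothetical. Nearest-rank is only one percentile definition, and your RUM backend may interpolate instead.

```javascript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Lab ceiling = field P75 tightened by a fraction (0.10 to 0.15 per the guidance above).
function labCeiling(fieldSamples, tighten = 0.1) {
  return percentile(fieldSamples, 75) * (1 - tighten);
}

// Hypothetical field LCP samples (ms) for one route on one device class.
const fieldLcp = [1800, 2100, 2300, 2400, 2600, 2700, 2900, 3100];
console.log(percentile(fieldLcp, 75)); // 2700
console.log(Math.round(labCeiling(fieldLcp, 0.1))); // 2430
```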
Asset & Tool Configuration
Both engines read from version-controlled config so runs are reproducible across machines. Lighthouse CI is configured through lighthouserc.json; the canonical annotated version of that file, including collection determinism and storage targets, is the subject of Lighthouse CI Configuration & Storage. The collect and assert blocks that matter for the gate:
{
  "ci": {
    "collect": {
      "url": ["https://staging.example.com/", "https://staging.example.com/checkout"],
      "numberOfRuns": 3,
      "settings": {
        "preset": "desktop",
        "throttlingMethod": "simulate"
      }
    },
    "assert": {
      "assertions": {
        "categories:performance": ["error", { "minScore": 0.9 }],
        "largest-contentful-paint": ["error", { "maxNumericValue": 2500 }],
        "cumulative-layout-shift": ["error", { "maxNumericValue": 0.1 }],
        "total-blocking-time": ["error", { "maxNumericValue": 200 }],
        "resource-summary:script:size": ["error", { "maxNumericValue": 200000 }]
      }
    }
  }
}
WebPageTest runs from a script that submits the same URLs to an agent and polls for the result, then maps the returned metrics back to the same budget. Provisioning the dedicated agent is covered in WebPageTest Private Instance Setup; the comparison harness reads the WebPageTest JSON and exits non-zero on a breach:
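// wpt-gate.js: submit URLs to a WebPageTest agent and assert the result
const WebPageTest = require("webpagetest");

const wpt = new WebPageTest(process.env.WPT_SERVER, process.env.WPT_API_KEY);
const BUDGET = { lcp: 2500, cls: 0.1, tbt: 200, bytesJs: 200000 };
const URLS = ["https://staging.example.com/", "https://staging.example.com/checkout"];

// Run one test, poll until it completes, and resolve with the median first-view metrics.
function run(url) {
  return new Promise((resolve, reject) => {
    wpt.runTest(
      url,
      // pollResults and timeout are both in seconds
      { connectivity: "Cable", runs: 3, location: "ci-agent:Chrome", pollResults: 5, timeout: 300 },
      (err, result) => {
        if (err) return reject(err);
        const firstView = result?.data?.median?.firstView;
        // No median usually means the test errored on the agent
        return firstView ? resolve(firstView) : reject(new Error(`no median result for ${url}`));
      }
    );
  });
}

(async () => {
  let failed = false;
  for (const url of URLS) {
    const m = await run(url);
    const checks = {
      lcp: m["chromeUserTiming.LargestContentfulPaint"],
      cls: m["chromeUserTiming.CumulativeLayoutShift"],
      tbt: m.TotalBlockingTime,
      bytesJs: m.breakdown?.js?.bytes,
    };
    for (const [k, v] of Object.entries(checks)) {
      if (typeof v !== "number") {
        // A metric the agent did not report must fail, not pass silently
        console.error(`✗ ${url} ${k} missing from result`);
        failed = true;
      } else if (v > BUDGET[k]) {
        console.error(`✗ ${url} ${k}=${v} exceeds budget ${BUDGET[k]}`);
        failed = true;
      } else {
        console.log(`✓ ${url} ${k}=${v} within ${BUDGET[k]}`);
      }
    }
  }
  process.exit(failed ? 1 : 0);
})().catch((err) => {
  // Agent unreachable or test timed out: exit 2 so infrastructure failures are distinguishable from breaches
  console.error(`WebPageTest run failed: ${err.message}`);
  process.exit(2);
});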
The two budgets — assertions in lighthouserc.json and BUDGET in wpt-gate.js — are the same numbers expressed twice. Keep them in one shared JSON file imported by both if you run both engines as required checks.
CI/CD Gating Integration
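The gate lives in a GitHub Actions workflow triggered on pull_request. The Lighthouse CI job runs lhci autorun, which collects, asserts, and uploads in one command; a non-zero exit blocks the merge once the check is required in branch protection. The collect URLs must serve the PR's own build. Running npm run build does not change what staging.example.com serves, so either deploy the build there (or to a per-PR preview URL) before the job runs, or point collect.staticDistDir at the local build output. The per-PR comment and status-check pattern is detailed in Running Lighthouse CI on Every Pull Request, and fanning the same run across viewports and routes is covered in GitHub Actions Performance Matrices.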
name: Performance Gating
on:
  pull_request:
    branches: [main]
jobs:
  lighthouse-ci:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    concurrency:
      group: lhci-${{ github.ref }}
      cancel-in-progress: true
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: "npm"
      - run: npm ci
      - run: npm run build
      - name: Run Lighthouse CI
        # assumes @lhci/cli is a devDependency installed by npm ci
        run: npx lhci autorun
        env:
          LHCI_TOKEN: ${{ secrets.LHCI_TOKEN }}
          LHCI_SERVER_BASE_URL: ${{ secrets.LHCI_SERVER_BASE_URL }}
  webpagetest:
    runs-on: ubuntu-latest
    timeout-minutes: 20
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: "20", cache: "npm" }
      - run: npm ci
      - name: WebPageTest gate
        run: node wpt-gate.js
        env:
          WPT_SERVER: ${{ secrets.WPT_SERVER }}
          WPT_API_KEY: ${{ secrets.WPT_API_KEY }}
Run Lighthouse CI as a required check on every PR because it is fast; run WebPageTest as a required check only on routes where network-level control matters, or on a nightly schedule, because its turnaround is measured in minutes rather than seconds. Require the lighthouse-ci check in branch protection so a breach is unmergeable.
Observability & Regression Detection
Lab gates catch regressions a developer introduces; they cannot catch regressions that only appear under real device and network diversity. Close that gap with two feeds. First, a scheduled synthetic run against production on a fixed cadence, comparing each run to a rolling baseline so slow drift surfaces before it crosses a hard threshold. Second, real-user telemetry through Custom Performance Beacons & RUM, which captures field LCP, INP, and CLS at P75 and P99 and feeds those percentiles back into the threshold matrix so lab budgets stay anchored to reality.
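// minimal field beacon for the metrics the lab gate proxies
import { onLCP, onINP, onCLS } from "web-vitals";

function send(metric) {
  const body = JSON.stringify({ name: metric.name, value: metric.value, id: metric.id, path: location.pathname });
  // sendBeacon survives page unload; fall back to a keepalive fetch if it is unavailable or refuses the payload
  (navigator.sendBeacon && navigator.sendBeacon("/rum", body)) ||
    fetch("/rum", { body, method: "POST", keepalive: true });
}

onLCP(send);
onINP(send);
onCLS(send);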
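The first feed, a scheduled production run compared against a rolling baseline, comes down to a small classification step. This is a minimal sketch that assumes each scheduled run yields one number per metric. The baseline is the median of the last few runs. A run counts as drift when it exceeds that baseline by a relative margin, even if it is still under the hard threshold. The window size, the 10 percent margin and the function name are hypothetical starting points, not values prescribed by this reference.

```javascript
// Median of an array of numbers (mean of the two middle values for even lengths).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Classify the latest scheduled run against a rolling median of earlier runs.
// history: previous values, oldest first; windowSize: how many recent runs form the baseline.
function checkDrift(history, latest, { windowSize = 7, margin = 0.1, hardLimit } = {}) {
  if (typeof hardLimit === "number" && latest > hardLimit) return "breach";
  const recent = history.slice(-windowSize);
  if (recent.length === 0) return "no-baseline";
  const baseline = median(recent);
  return latest > baseline * (1 + margin) ? "drift" : "ok";
}

// Hypothetical nightly LCP values (ms) checked against a 2500 ms hard limit.
const nightlyLcp = [2000, 2050, 1980, 2020, 2010, 2040, 1990];
console.log(checkDrift(nightlyLcp, 2300, { hardLimit: 2500 })); // drift
console.log(checkDrift(nightlyLcp, 2080, { hardLimit: 2500 })); // ok
```

A median baseline is used here rather than a mean so that one bad night does not drag the baseline up and hide the next regression.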
When the field P75 for a route drifts above the lab ceiling, the lab number is stale and should be retightened; when the field is comfortably under but the lab keeps failing, the lab environment is noisier than production and the runner needs attention, not the budget.
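That decision can be expressed as one function. This is a sketch that assumes you already have a route's field P75 from the RUM pipeline and the recent lab pass/fail history for the same route. The "comfortably under" margin (comfortMargin) and the failure-rate cutoff are hypothetical parameters to tune, not values from this reference.

```javascript
// Decide whether a route's lab budget or its lab runner needs attention.
// fieldP75: field P75 for the route; labCeiling: current lab assertion value;
// recentLabResults: booleans, true = the lab run passed.
function reconcile({ fieldP75, labCeiling, recentLabResults, comfortMargin = 0.8, failRateCutoff = 0.3 }) {
  // Real users are slower than the lab allows: the lab number is stale and should be retightened
  if (fieldP75 > labCeiling) return "retighten-lab-budget";

  const runs = recentLabResults.length;
  const failRate = runs ? recentLabResults.filter((passed) => !passed).length / runs : 0;
  const fieldComfortable = fieldP75 <= labCeiling * comfortMargin;

  // Users are fine but the lab keeps failing: fix the runner, not the budget
  if (fieldComfortable && failRate >= failRateCutoff) return "fix-runner";

  // Otherwise the budget stands and lab failures are treated as real regressions
  return "keep-budget";
}

console.log(reconcile({ fieldP75: 2700, labCeiling: 2500, recentLabResults: [true, true, true] })); // retighten-lab-budget
console.log(reconcile({ fieldP75: 1800, labCeiling: 2500, recentLabResults: [false, true, false, true] })); // fix-runner
```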
Failure Modes & Escalation
- Flaky gate, no real regression → variance from a noisy runner or real-network throttling. Raise numberOfRuns to 5, switch to throttlingMethod: simulate, and confirm the metric swings under 10% across runs before tightening anything.
- Lighthouse passes, WebPageTest fails → the two engines saw different network conditions. Lighthouse simulates the connection; WebPageTest shaped a real one. Treat WebPageTest as authoritative for connection-sensitive metrics and reconcile the two using Comparing Performance Testing Tools.
- Third-party drift breaks the build → a vendor shipped a heavier tag. Pin vendor versions or widen the tolerance, and track the budget against Third-Party Script Constraints.
- Token or agent unreachable → an expired LHCI_TOKEN fails the upload, not the audit, while an unreachable WebPageTest agent fails the run before anything is measured. Run lhci upload as its own step with continue-on-error: true (rather than inside lhci autorun) so a storage outage never blocks a clean merge, and alert the owning team.
- Escalation path → the first breach annotates the PR; a breach that recurs on main opens a tracked regression ticket with the WebPageTest filmstrip and Lighthouse report attached, owned by the team that merged the change.
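The "under 10% across runs" check from the first bullet can be made concrete. This is a sketch that assumes you record one value per run for a single metric and defines swing as (max - min) / median. That definition and the function names are assumptions, and the coefficient of variation is a reasonable alternative.

```javascript
// Relative swing of one metric across repeated runs: (max - min) / median.
function relativeSwing(values) {
  if (values.length < 2) throw new Error("need at least two runs to measure variance");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const median = sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return (sorted[sorted.length - 1] - sorted[0]) / median;
}

// Only consider tightening a threshold once the swing is under the bar (10% by default).
function stableEnough(values, maxSwing = 0.1) {
  return relativeSwing(values) < maxSwing;
}

// Hypothetical TBT values (ms) from five runs on the same commit.
const quietRunner = [180, 185, 190, 188, 182];
const noisyRunner = [150, 240, 190, 175, 210];
console.log(stableEnough(quietRunner)); // true
console.log(stableEnough(noisyRunner)); // false
```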
Frequently Asked Questions
Do I need both Lighthouse CI and WebPageTest, or is one enough?
Most teams start with Lighthouse CI alone because it gates every PR in seconds. Add WebPageTest when you need real connection control, multi-location runs, or filmstrip and waterfall diagnostics that explain a regression Lighthouse only flags. They share one budget, so adding the second engine does not mean a second set of thresholds. See Comparing Performance Testing Tools for the decision.
Which metrics should actually block a merge?
Block on LCP, CLS, a script-byte budget, and TBT as the lab proxy for INP. Keep Speed Index and other signals as non-blocking diagnostics. Always derive each ceiling from your own P75 field data for a named device class and connection profile rather than copying defaults.
How do I keep lab budgets honest over time?
Feed real-user percentiles from Custom Performance Beacons & RUM back into your threshold matrix. When field P75 drifts above the lab ceiling the lab number is stale; when field is fine but the lab keeps failing, the runner is noisy and the environment needs fixing, not the budget.