Device & Network Emulation Weighting
A Lighthouse run on a fast desktop with no throttling tells you almost nothing about the user on a mid-range Android phone over Fast 3G, yet that user may be the majority of your traffic. Emulation weighting closes that gap: it maps each synthetic device-and-network profile to its real share of traffic, then collapses the per-profile metrics into a single weighted figure the gate can compare. This guide is part of the Threshold Calibration & Baseline Management reference, and it owns the first stage of that pipeline — making sure the raw runs that feed calibration represent the users you actually have.
The discipline has three coupled parts: choosing presets that match your audience, throttling each profile correctly so the timings are deterministic, and assigning weights that sum to one so high-traffic profiles dominate the verdict. Get the throttling wrong and every downstream percentile is built on noise; get the weights wrong and the gate optimizes for a user who barely exists.
Architecture of the Weighting Matrix
Construct a deterministic profile matrix that maps emulation presets to observed traffic, with weights normalized to sum to exactly 1.0. The matrix is the contract between your real-user distribution and your lab gate: each cell is a (device × network) profile, and its weight is that profile's share of traffic. The diagram shows how the matrix routes parallel runs into a single weighted aggregate.
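In symbols, the aggregation is a weighted mean. With $w_i$ the traffic share of profile $i$ and $M_i$ that profile's denoised metric (for example its median LCP), the gate evaluates a single figure:

$$\bar{M} = \sum_{i=1}^{n} w_i M_i, \qquad w_i \ge 0, \qquad \sum_{i=1}^{n} w_i = 1$$

Because the weights are non-negative and sum to one, $\bar{M}$ always lies between the smallest and largest $M_i$, and a profile with weight 0.45 moves $\bar{M}$ by 0.45 ms for every millisecond its own LCP changes.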
To build the matrix: extract RUM traffic distribution by device class (mobile, tablet, desktop) and connection type, map each bucket to a standard emulation preset (moto-g4, desktop-chrome, 3G-fast, 4G-good), normalize every weight so the column sums to exactly 1.0, and validate that the resulting concurrent-run count fits your CI runner capacity. Store it as committed JSON so the weights are versioned alongside the code.
{
"profiles": [
{ "id": "mobile-3g", "device": "moto-g4", "network": "3G-fast", "weight": 0.45 },
{ "id": "mobile-4g", "device": "moto-g4", "network": "4G-good", "weight": 0.25 },
{ "id": "desktop-4g", "device": "desktop-chrome", "network": "4G-good", "weight": 0.20 },
{ "id": "desktop-eth","device": "desktop-chrome", "network": "desktop", "weight": 0.10 }
],
"metadata": { "version": "1.2.0", "normalized_sum": 1.0, "source": "rum_traffic_q3" }
}
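A minimal sketch of the normalization step, assuming the RUM export has already been bucketed into one session count per (device, network) profile. `buildMatrix` and its `maxConcurrentRuns` option are hypothetical names, not part of Lighthouse or LHCI, and treating the profile count as the concurrent-run count assumes one CI matrix leg per profile. Rounding to two decimals and assigning the rounding residue to the largest bucket is one reasonable choice among several.

```javascript
// Hypothetical helper (not a Lighthouse or LHCI API): turn RUM session counts
// per (device, network) bucket into weights rounded to two decimals that still
// sum to exactly 1.0.
function buildMatrix(buckets, { maxConcurrentRuns = 4 } = {}) {
  // Assumption: one CI matrix leg per profile, so profile count = concurrent runs.
  if (buckets.length > maxConcurrentRuns) {
    throw new Error(`${buckets.length} profiles exceed runner capacity (${maxConcurrentRuns})`);
  }
  const total = buckets.reduce((sum, b) => sum + b.sessions, 0);
  if (total <= 0) throw new Error("RUM export contains no sessions");

  // Work in integer hundredths so the rounding residue is easy to correct.
  const hundredths = buckets.map((b) => Math.round((b.sessions / total) * 100));
  const residue = 100 - hundredths.reduce((sum, h) => sum + h, 0);

  // Give any residue to the highest-traffic bucket, where its relative effect is smallest.
  let largest = 0;
  buckets.forEach((b, i) => {
    if (b.sessions > buckets[largest].sessions) largest = i;
  });
  hundredths[largest] += residue;

  const profiles = buckets.map((b, i) => ({
    id: b.id,
    device: b.device,
    network: b.network,
    weight: hundredths[i] / 100,
  }));
  const normalizedSum = hundredths.reduce((sum, h) => sum + h, 0) / 100;
  return { profiles, metadata: { normalized_sum: normalizedSum } };
}
```

With session counts of 4500, 2500, 2000 and 1000 for the four profiles above, the sketch returns weights 0.45, 0.25, 0.2 and 0.1; three equal buckets come out as 0.34, 0.33 and 0.33 so the sum stays exactly 1.0.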
Prerequisites & Environment
Weighting depends on deterministic emulation, which depends on a stable runner. Before wiring weights, confirm the following are in place:
- @lhci/cli ≥ 0.13 or the Lighthouse CLI ≥ 11, with Chrome ≥ 120 available on the runner. Pin exact versions so throttling semantics do not drift between minor releases.
- RUM or CrUX traffic shares for at least the last full quarter, broken down by device class and connection type. Without this, the weights are guesses.
- A pinned-CPU runner. Variable runner hardware silently changes the effective CPU throttle, which is the single most common source of emulation jitter — the fix is in Calibrating CPU Throttling for CI Runners.
- throttlingMethod: simulate as the default. Simulated throttling models the network in software, so timings do not depend on the runner's real bandwidth.
Configuration Reference
Generate the Lighthouse throttling config from the weighting matrix rather than hand-writing per-profile settings, so the network parameters and the weights can never disagree. Every field below is load-bearing: rttMs and throughputKbps define the simulated network, and cpuSlowdownMultiplier sets how many times slower than its native speed the runner's CPU is modeled to be.
// lighthouse-emulation-config.js: derive throttling from the weighting matrix.
// Lighthouse loads one config object per run, so this module builds the config
// for the profile whose id is set in the LH_PROFILE environment variable.
const matrix = require("./weighting-matrix.json");

// Simulated network parameters for each network preset in the matrix.
const NETWORKS = {
  "3G-fast": { rttMs: 150, throughputKbps: 1600 },
  "4G-good": { rttMs: 40, throughputKbps: 9000 },
  desktop: { rttMs: 0, throughputKbps: 10000000 },
};

const profile = matrix.profiles.find((p) => p.id === process.env.LH_PROFILE);
if (!profile) throw new Error(`Unknown LH_PROFILE: ${process.env.LH_PROFILE}`);
const isDesktop = profile.device.includes("desktop");

module.exports = {
  extends: "lighthouse:default",
  settings: {
    // Chrome flags are CLI options, not config settings; pass them with --chrome-flags.
    formFactor: isDesktop ? "desktop" : "mobile",
    // Lighthouse rejects a config whose screenEmulation.mobile disagrees with formFactor.
    screenEmulation: isDesktop
      ? { mobile: false, width: 1350, height: 940, deviceScaleFactor: 1, disabled: false }
      : { mobile: true, width: 412, height: 823, deviceScaleFactor: 1.75, disabled: false },
    throttlingMethod: "simulate",
    throttling: {
      // RTT and throughput model the network in software (deterministic).
      ...NETWORKS[profile.network],
      // Mobile presets emulate a slow CPU; desktop runs unthrottled CPU.
      cpuSlowdownMultiplier: isDesktop ? 1 : 4,
    },
  },
};
cpuSlowdownMultiplier: 4 is the Lighthouse mobile default, chosen to make a fast runner behave like a mid-range phone, but "4" is only correct relative to a specific runner speed, which is exactly why it needs per-runner calibration.
Step-by-Step Implementation
- Run profiles in parallel using a CI matrix so all four collect concurrently. Expected output per profile is an lh-<id> artifact containing <id>.json, uploaded for the aggregation job.
name: Weighted Performance Gate
on: [pull_request]
jobs:
  emulate:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        profile: [mobile-3g, mobile-4g, desktop-4g, desktop-eth]
      fail-fast: false
    steps:
      - uses: actions/checkout@v4
      - name: Run Lighthouse
        env:
          # Profile id read by lighthouse-emulation-config.js.
          LH_PROFILE: ${{ matrix.profile }}
        run: |
          mkdir -p results
          npx lighthouse https://staging.example.com \
            --config-path=./lighthouse-emulation-config.js \
            --chrome-flags="--headless --no-sandbox --disable-dev-shm-usage" \
            --output=json \
            --output-path=./results/${{ matrix.profile }}.json
      - uses: actions/upload-artifact@v4
        with:
          name: lh-${{ matrix.profile }}
          path: ./results/${{ matrix.profile }}.json
- Denoise each profile before weighting. Run three iterations per profile, take the median of each metric, and retry any single run more than 15% off the median. This keeps a flaky run from contaminating the weighted aggregate, following the Statistical Noise & Flakiness Reduction protocol.
- Aggregate with weights in a job that depends on the matrix. Expected output is three printed weighted metrics and a non-zero exit on breach. Navigation-mode Lighthouse reports contain no INP audit, so the script reads Total Blocking Time as the lab proxy for responsiveness.
// aggregate-weights.js: collapse per-profile (median) reports into one weighted metric.
const fs = require("fs");
const path = require("path");
const matrix = require("../weighting-matrix.json");

let weightedLCP = 0, weightedTBT = 0, weightedCLS = 0;

matrix.profiles.forEach((p) => {
  // download-artifact@v4 without a name extracts each artifact into ./results/<artifact-name>/.
  const file = path.join("./results", `lh-${p.id}`, `${p.id}.json`);
  const { audits } = JSON.parse(fs.readFileSync(file, "utf8"));
  weightedLCP += audits["largest-contentful-paint"].numericValue * p.weight;
  weightedTBT += audits["total-blocking-time"].numericValue * p.weight;
  weightedCLS += audits["cumulative-layout-shift"].numericValue * p.weight;
});

console.log(`Weighted LCP: ${weightedLCP.toFixed(2)}ms`);
console.log(`Weighted TBT: ${weightedTBT.toFixed(2)}ms`);
console.log(`Weighted CLS: ${weightedCLS.toFixed(4)}`);

// Fail the job when any weighted metric crosses its ceiling.
if (weightedLCP > 3000 || weightedTBT > 200 || weightedCLS > 0.1) process.exit(1);
Expected output on a passing build: three lines (Weighted LCP: 2607.50ms, Weighted TBT: <value>ms, Weighted CLS: 0.0925) and exit code 0.
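A minimal sketch of the denoise step for one profile, assuming the three (or five) runs have already been parsed as Lighthouse JSON reports. `denoiseProfile` is a hypothetical helper, not an LHCI API. It takes the median of each metric independently and flags runs for retry on LCP alone; applying the 15% rule to LCP is an assumption, since the step does not name a metric. Navigation-mode reports contain no INP audit, so Total Blocking Time stands in for responsiveness.

```javascript
// Hypothetical denoise helper: per-metric medians for one profile, plus the
// runs whose LCP is more than 15% away from the median LCP (retry candidates).
const AUDITS = {
  lcp: "largest-contentful-paint",
  tbt: "total-blocking-time",
  cls: "cumulative-layout-shift",
};
const TOLERANCE = 0.15;

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function denoiseProfile(reports) {
  if (reports.length === 0) throw new Error("denoiseProfile: no runs supplied");
  const metric = (report, key) => report.audits[AUDITS[key]].numericValue;

  // Median of each metric, computed independently across runs.
  const medians = {};
  for (const key of Object.keys(AUDITS)) {
    medians[key] = median(reports.map((r) => metric(r, key)));
  }

  // Assumption: the 15% retry rule is applied to LCP only.
  const retry = reports.filter(
    (r) => Math.abs(metric(r, "lcp") - medians.lcp) / medians.lcp > TOLERANCE
  );
  return { medians, retry };
}
```

For runs with LCP values of 2400, 2500 and 3200 ms, the median LCP is 2500 ms and only the 3200 ms run (28% off) is flagged for a retry.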
Threshold Calibration
The throttling numbers are not arbitrary: each profile maps to a measured device and link. The table below is the calibration reference: it pairs each preset with its CPU multiplier and the RTT and throughput that define its simulated network, plus a representative per-profile P75 LCP target. Re-derive the weights from your own traffic, and verify the CPU multiplier against your runner per Calibrating CPU Throttling for CI Runners.
| Profile | Device preset | CPU multiplier | RTT | Throughput | LCP target (P75) |
|---|---|---|---|---|---|
| mobile-3g | moto-g4 | 4× | 150 ms | 1600 kbps | ≤ 3000 ms |
| mobile-4g | moto-g4 | 4× | 40 ms | 9000 kbps | ≤ 2500 ms |
| desktop-4g | desktop-chrome | 1× | 40 ms | 9000 kbps | ≤ 1800 ms |
| desktop-eth | desktop-chrome | 1× | 0 ms | unthrottled | ≤ 1500 ms |
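The per-profile P75 targets in the table can also serve as warn-level floors beside the weighted gate, so one slow profile is reported even when the weighted figure passes. A minimal sketch, assuming per-profile LCP values (in ms) have already been extracted from each profile's report; `perProfileWarnings` and `LCP_TARGETS_MS` are hypothetical names, and the function only returns messages rather than failing the build.

```javascript
// Hypothetical warn-level floor: compare each profile's LCP with the table's
// P75 targets and collect warnings instead of exiting non-zero.
const LCP_TARGETS_MS = {
  "mobile-3g": 3000,
  "mobile-4g": 2500,
  "desktop-4g": 1800,
  "desktop-eth": 1500,
};

function perProfileWarnings(lcpByProfile, targets = LCP_TARGETS_MS) {
  const warnings = [];
  for (const [id, lcp] of Object.entries(lcpByProfile)) {
    const target = targets[id];
    if (target === undefined) {
      // A profile missing from the table is itself worth surfacing.
      warnings.push(`${id}: no LCP target defined`);
    } else if (lcp > target) {
      warnings.push(`${id}: LCP ${lcp.toFixed(0)}ms exceeds ${target}ms target`);
    }
  }
  return warnings;
}
```

For example, `perProfileWarnings({ "mobile-3g": 3400, "desktop-eth": 1200 })` returns a single warning for mobile-3g, which the CI job can print while the weighted gate alone decides pass or fail.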
The mobile profiles carry the heaviest weights and the loosest ceilings on purpose: they represent the slowest realistic hardware most of your users hold, so the weighted figure leans toward their experience. How those mobile and desktop ceilings diverge in the budget itself is covered in Mobile vs Desktop Budget Divergence.
CI Enforcement Snippet
Gate on the weighted aggregate, not on any single profile, and require the check in branch protection. This step runs after the matrix and the denoise step, failing the build only when the weighted metric crosses its calibrated ceiling.
aggregate:
needs: emulate
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/download-artifact@v4
with:
path: ./results
- name: Validate weight normalization
run: node -e "const m=require('./weighting-matrix.json');const s=m.profiles.reduce((a,p)=>a+p.weight,0);if(Math.abs(s-1)>0.001){console.error('weights sum to '+s);process.exit(1)}"
- name: Weighted aggregation gate
run: node ./scripts/aggregate-weights.js
The normalization check fails fast if the matrix has been edited to sum to anything other than 1.0, which would silently skew every weighted metric. For network isolation beyond simulated throttling, route the slowest profile through a dedicated agent per WebPageTest Private Instance Setup.
Troubleshooting & Edge Cases
- Weighted metric jitters between builds → the CPU throttle is drifting with runner hardware; pin the runner and calibrate the multiplier per Calibrating CPU Throttling for CI Runners.
- Weights no longer match traffic → inject a fresh matrix via a TRAFFIC_MATRIX_OVERRIDE environment variable and fall back to the committed JSON on parse failure, so a malformed override never blocks merges.
- One profile is consistently flaky → it is likely the 3G profile saturating the simulated link; raise its iteration count to five and median-filter before weighting.
- Aggregate passes but a single mobile profile is terrible → weighted gating can mask a profile-level failure; add a per-profile warn floor alongside the weighted error gate so regressions in a minority profile are still visible.
- Concurrent runs exceed runner capacity → reduce matrix parallelism or stagger profiles; oversubscribed runners inflate CPU contention and corrupt the very emulation you are trying to control.
- Desktop and mobile share a ceiling → they should not; split the budgets so the unthrottled desktop run is held to a tighter LCP than the throttled mobile run.
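The stale-weights item above can be sketched as a loader that prefers the override and falls back to the committed file. `loadMatrix` and `isNormalized` are hypothetical names. Rejecting an override whose weights do not sum to 1.0 (within the same 0.001 tolerance as the CI normalization check) goes beyond the parse-failure fallback and is an added assumption.

```javascript
// Hypothetical matrix loader: use TRAFFIC_MATRIX_OVERRIDE when it parses and is
// normalized; otherwise fall back to the committed weighting-matrix.json.
const fs = require("fs");

function isNormalized(matrix) {
  if (!matrix || !Array.isArray(matrix.profiles) || matrix.profiles.length === 0) return false;
  const sum = matrix.profiles.reduce((s, p) => s + Number(p.weight), 0);
  // NaN weights make the sum NaN, which fails this comparison.
  return Math.abs(sum - 1) <= 0.001;
}

function loadMatrix(committedPath = "./weighting-matrix.json", env = process.env) {
  const raw = env.TRAFFIC_MATRIX_OVERRIDE;
  if (raw) {
    try {
      const override = JSON.parse(raw);
      if (isNormalized(override)) return { matrix: override, source: "override" };
      console.warn("TRAFFIC_MATRIX_OVERRIDE ignored: weights do not sum to 1.0");
    } catch (err) {
      console.warn(`TRAFFIC_MATRIX_OVERRIDE ignored: ${err.message}`);
    }
  }
  const committed = JSON.parse(fs.readFileSync(committedPath, "utf8"));
  return { matrix: committed, source: "committed" };
}
```

Returning the source alongside the matrix lets the aggregation job log which weights produced the verdict.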
Frequently Asked Questions
Where do the emulation weights actually come from?
From your real traffic, not a default. Pull the share of sessions by device class and connection type from RUM or CrUX over at least one full quarter, map each bucket to an emulation preset, and normalize the weights to sum to exactly 1.0. Commit the matrix as JSON so it is versioned with the code and re-derive it quarterly as traffic shifts.
Why simulated throttling instead of applied or "provided" throttling?
Simulated throttling models the network in software, so the timings do not depend on the runner's real bandwidth and stay deterministic across machines. Applied or provided throttling on a shared CI runner introduces real-network variance, which is the classic cause of a flapping gate. Keep throttlingMethod: simulate unless you are running on a dedicated agent.
Should the weighted gate replace per-profile assertions?
No — run both. The weighted aggregate is the merge gate because it reflects overall user impact, but a single slow profile can hide inside a passing weighted figure. Keep a warn-level per-profile floor so a regression in a minority profile, such as the slowest mobile link, is still surfaced in the report.