Calibrating CPU Throttling for CI Runners

A cpuSlowdownMultiplier of 4 is correct on exactly one machine: the one Lighthouse's default was tuned against. On a faster runner, a 4× multiplier under-throttles and your TBT looks artificially good; on a slower or noisier runner, the same 4× over-throttles and the gate flaps red on builds that are actually fine. The multiplier is a ratio, and a fixed ratio applied to variable hardware produces a moving emulation target. This guide — part of the Device & Network Emulation Weighting reference — replaces the constant with a multiplier computed at runtime from the runner's measured CPU speed, so every run emulates the same mid-range device regardless of which machine GitHub hands you.

The fix hinges on one number Lighthouse already reports: BenchmarkIndex, a unitless score of how fast the host CPU executed a fixed workload during the run. Higher means faster. Calibrate once against a target device's BenchmarkIndex, then scale the multiplier on every run to hit that target.

Runner-vs-Device CPU Benchmark Reference

The table maps representative hardware to its typical BenchmarkIndex and shows the multiplier needed to emulate a mid-range phone (target BenchmarkIndex ≈ 700, roughly a Moto G4). The multiplier is simply the runner's index divided by the target.

| Host | Typical BenchmarkIndex | Multiplier for target 700 | Effect of fixed 4× |
| --- | --- | --- | --- |
| GitHub `ubuntu-latest` (4 vCPU) | ~1300 | 1.86× | over-throttles (too slow) |
| GitHub `ubuntu-latest` (2 vCPU) | ~850 | 1.21× | over-throttles |
| Self-hosted high-end (8 vCPU) | ~2600 | 3.71× | near-correct by luck |
| Noisy shared runner (loaded) | ~500 | 0.71× | over-throttles badly |
| Target: mid-range phone | ~700 | 1.00× (reference) | n/a |

The rightmost column is the point: a fixed 4× is wrong almost everywhere, and how wrong depends on load, so it drifts build to build. A multiplier derived from the live BenchmarkIndex holds the emulated device constant even as the runner's real speed varies.
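The division behind the table can be sketched in a few lines. The host figures and the target of 700 come from the table; the function names and the 10% tolerance used to label a fixed 4× as "near-correct" are illustrative assumptions, not Lighthouse behavior.

```javascript
// Sketch: reproduce the table's multiplier column from BenchmarkIndex values.
// TARGET_INDEX and the host indexes are taken from the table above.
const TARGET_INDEX = 700;
const FIXED_MULTIPLIER = 4;

// Required multiplier = runner index / target index.
function requiredMultiplier(benchmarkIndex, targetIndex = TARGET_INDEX) {
  return benchmarkIndex / targetIndex;
}

// Classify what a fixed multiplier does on a given runner.
// The 10% tolerance is an assumption chosen for illustration.
function effectOfFixed(benchmarkIndex, fixed = FIXED_MULTIPLIER, tolerance = 0.1) {
  const needed = requiredMultiplier(benchmarkIndex);
  const relativeError = (fixed - needed) / needed;
  if (Math.abs(relativeError) <= tolerance) return "near-correct";
  return relativeError > 0 ? "over-throttles" : "under-throttles";
}

const hosts = {
  "ubuntu-latest (4 vCPU)": 1300,
  "ubuntu-latest (2 vCPU)": 850,
  "self-hosted (8 vCPU)": 2600,
  "noisy shared runner": 500,
};

for (const [name, index] of Object.entries(hosts)) {
  console.log(
    `${name}: needs ${requiredMultiplier(index).toFixed(2)}x, fixed 4x ${effectOfFixed(index)}`
  );
}
```

Only a runner with an index near 2800 (4 × 700) makes the fixed 4× correct, which is why the self-hosted row is close and the others are not.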

Diagnostic Steps

Read the BenchmarkIndex from any existing report. It lives in the run's environment block.

```
npx lighthouse https://staging.example.com --only-categories=performance \
  --output=json --output-path=./run.json --quiet
node -e "console.log('BenchmarkIndex:', require('./run.json').environment.benchmarkIndex)"
```

Example output: BenchmarkIndex: 1287 — a fast 4-vCPU hosted runner, far above the 700 target.

Sample it across several runs to see how much the runner's speed varies under CI load. A spread wider than ±15% means the runner is contended and needs pinning before calibration is meaningful.
```
for i in 1 2 3; do
  npx lighthouse https://staging.example.com --only-categories=performance \
    --output=json --output-path=./b$i.json --quiet
  node -e "console.log(require('./b$i.json').environment.benchmarkIndex)"
done
```
Example output: 1290, 1305, 1180. That is a spread of about 10%, which is acceptable; cap outliers per the variance guidance referenced under Verification.
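A small helper can turn the sampled values into a pass/fail signal. The text does not say exactly how the spread is measured, so this sketch assumes (max − min) / mean, which gives roughly 10% for the example values; the function names are hypothetical.

```javascript
// Sketch: relative spread of BenchmarkIndex samples.
// Assumption: spread = (max - min) / mean. The guide does not define it.
function relativeSpread(samples) {
  if (samples.length === 0) throw new Error("need at least one sample");
  const mean = samples.reduce((sum, x) => sum + x, 0) / samples.length;
  return (Math.max(...samples) - Math.min(...samples)) / mean;
}

// 0.15 mirrors the 15% threshold mentioned in the text.
function isContended(samples, maxSpread = 0.15) {
  return relativeSpread(samples) > maxSpread;
}

const samples = [1290, 1305, 1180];
console.log(
  `spread ${(relativeSpread(samples) * 100).toFixed(1)}%, contended: ${isContended(samples)}`
);
```

If a stricter reading of the threshold is wanted (for example, maximum deviation from the mean), swap the body of `relativeSpread` and keep the same check.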

Implementation

Compute the multiplier dynamically from a calibration run's BenchmarkIndex, then feed it into the throttling config. The target index defines the device you are emulating; clamp the result so a wildly off runner cannot produce an absurd multiplier.

```
// dynamic-cpu-throttle.js: derive cpuSlowdownMultiplier from runner speed.
const { execSync } = require("child_process");
const fs = require("fs");

const TARGET_BENCHMARK_INDEX = 700; // mid-range phone (≈ Moto G4)
// Named calibrationUrl so it does not shadow the global URL class.
const calibrationUrl = process.env.CALIBRATION_URL || "https://staging.example.com";

// One quick calibration run to read this runner's CPU speed.
execSync(
  `npx lighthouse ${calibrationUrl} --only-categories=performance ` +
  `--output=json --output-path=./calib.json --quiet ` +
  `--chrome-flags="--headless --no-sandbox --disable-dev-shm-usage"`,
  { stdio: "inherit" }
);

const { benchmarkIndex } = JSON.parse(
  fs.readFileSync("./calib.json", "utf8")).environment;

// Fail early if the report has no usable index; otherwise the resulting NaN
// would be serialized as null in cpu-throttle.json.
if (!Number.isFinite(benchmarkIndex) || benchmarkIndex <= 0) {
  throw new Error(`Unexpected benchmarkIndex in calib.json: ${benchmarkIndex}`);
}

// Faster runner (higher index) needs a larger multiplier to feel as slow
// as the target device. Clamp to a sane range.
const raw = benchmarkIndex / TARGET_BENCHMARK_INDEX;
const multiplier = Math.min(Math.max(raw, 1), 8);

console.log(`BenchmarkIndex ${benchmarkIndex} → cpuSlowdownMultiplier ${multiplier.toFixed(2)}`);
fs.writeFileSync("./cpu-throttle.json",
  JSON.stringify({ cpuSlowdownMultiplier: Number(multiplier.toFixed(2)) }, null, 2));
```

Run this as the first step of the performance job; the emitted cpu-throttle.json is then merged into the Lighthouse throttling settings so the gated run uses the calibrated multiplier instead of the static 4×.
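One way to picture the merge is as a pure function over a Lighthouse CI config object. This is a sketch only: the helper name is hypothetical, and the `ci.collect.settings.throttling` path is assumed to mirror the `--collect.settings.throttling.cpuSlowdownMultiplier` flag used in the workflow step in the CI Gating Assertion section.

```javascript
// Sketch: return a copy of a Lighthouse CI config with the calibrated
// multiplier merged in. Existing collect options and throttling keys are kept.
function withCalibratedThrottle(config, cpuSlowdownMultiplier) {
  const ci = config.ci || {};
  const collect = ci.collect || {};
  const settings = collect.settings || {};
  return {
    ...config,
    ci: {
      ...ci,
      collect: {
        ...collect,
        settings: {
          ...settings,
          throttling: { ...(settings.throttling || {}), cpuSlowdownMultiplier },
        },
      },
    },
  };
}

// Example base config; numberOfRuns is only here to show it survives the merge.
const baseConfig = { ci: { collect: { numberOfRuns: 3 } } };
console.log(JSON.stringify(withCalibratedThrottle(baseConfig, 1.86), null, 2));
```

Because the function does not mutate its input, the static config stays in version control untouched and the calibrated copy can be written to a temporary file per run.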

CI Gating Assertion

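Wire the calibration step ahead of collection, and check the BenchmarkIndex itself so a runner that is too slow or too contended to emulate reliably fails loudly rather than silently skewing the gate. The workflow steps and lighthouserc below gate TBT under the calibrated multiplier. None of the assertions shown reads BenchmarkIndex, so the out-of-band check belongs in the calibration script, which should exit non-zero so the failure is reported as an environment failure, not a code failure.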

```
- name: Calibrate CPU throttle
  run: node ./scripts/dynamic-cpu-throttle.js
  env:
    CALIBRATION_URL: https://staging.example.com

- name: Run Lighthouse CI with calibrated throttle
  run: npx lhci autorun --collect.settings.throttling.cpuSlowdownMultiplier=$(node -e "console.log(require('./cpu-throttle.json').cpuSlowdownMultiplier)")
```

```
{
  "ci": {
    "assert": {
      "preset": "lighthouse:no-pwa",
      "assertions": {
        "uses-text-compression": "off",
        "total-blocking-time": ["error", { "maxNumericValue": 350 }],
        "diagnostics": ["warn", { "maxNumericValue": 0 }]
      }
    }
  }
}
```

With the script's clamp of 1 to 8 and a target of 700, the derived multiplier hits a clamp boundary when the runner's BenchmarkIndex falls below 700 or above 5600. Treat that as a signal to pin the runner rather than trust the result.
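A band check of this kind could sit in the calibration script, before the multiplier is written. The sketch below assumes the script's own values (target 700, clamp 1 to 8), under which the raw multiplier is clamped for an index below 700 or above 5600. The function name and return shape are hypothetical.

```javascript
// Sketch: detect when the raw multiplier would be clamped, so CI can treat
// the run as an environment problem instead of trusting the result.
function checkRunnerBand(benchmarkIndex, { target = 700, min = 1, max = 8 } = {}) {
  const raw = benchmarkIndex / target;
  if (raw < min) {
    return { ok: false, reason: `runner too slow: raw multiplier ${raw.toFixed(2)} is below ${min}` };
  }
  if (raw > max) {
    return { ok: false, reason: `runner too fast: raw multiplier ${raw.toFixed(2)} is above ${max}` };
  }
  return { ok: true, multiplier: raw };
}

const bandResult = checkRunnerBand(500); // the loaded shared runner from the table
if (!bandResult.ok) {
  console.error(`Environment failure: ${bandResult.reason}`);
  // In CI, uncomment so the job fails on the runner rather than on the code:
  // process.exitCode = 1;
}
```

Returning a reason string rather than throwing keeps the decision (fail, retry, or warn) with the caller.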

Verification

After calibration, the emulated device should be stable even though the raw runner speed is not. Confirm two things:

1. The multiplier tracks the runner. Print it from cpu-throttle.json on each run; on a hosted runner it should land near 1.2–1.9×, not 4×. If it reads exactly 4×, the static default is still in effect and calibration did not run.
2. TBT variance shrinks. Compare the standard deviation of TBT across five builds before and after calibration. Passing output is a tighter spread (for example, σ dropping from ~80 ms to under ~30 ms) because the emulated CPU is now constant. Persistent wide variance after calibration means the runner is contended; pin it and apply the denoising in Reducing Lighthouse CI Variance in Staging.
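The before/after comparison needs only a standard deviation over the collected TBT values. The sketch below uses the sample standard deviation (n − 1 denominator), which is a choice, not something the guide specifies. The TBT readings are made-up illustrative numbers, not measurements.

```javascript
// Sketch: sample standard deviation of TBT values in milliseconds.
function sampleStdDev(values) {
  if (values.length < 2) throw new Error("need at least two values");
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / (values.length - 1);
  return Math.sqrt(variance);
}

// Hypothetical TBT readings (ms) from five builds each; replace with real data.
const tbtBefore = [310, 420, 250, 460, 330];
const tbtAfter = [335, 350, 320, 365, 340];

console.log(`σ before: ${sampleStdDev(tbtBefore).toFixed(1)} ms`);
console.log(`σ after:  ${sampleStdDev(tbtAfter).toFixed(1)} ms`);
```

With only five builds the estimate is noisy, so treat a modest change in σ as inconclusive and collect more runs before drawing a conclusion.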

Frequently Asked Questions

What exactly is BenchmarkIndex and where do I find it?

It is a unitless score Lighthouse computes during every run by timing a fixed CPU workload on the host; higher means a faster machine. It lives in the JSON report at environment.benchmarkIndex. Because it is measured live, it captures both the runner's hardware and its current load, which is exactly what you need to keep the emulated device constant.

Why not just pin the runner and hardcode a multiplier?

Pinning hardware removes hardware variance but not load variance — a shared runner under a heavy job is slower than the same runner idle, so a hardcoded multiplier still drifts. Deriving the multiplier from the live BenchmarkIndex corrects for both. Pin the runner and calibrate for the most stable result.

Does a dynamic multiplier make results non-reproducible?

It makes the emulated device reproducible, which is the goal. The multiplier changes so that the throttled CPU speed stays fixed; a constant multiplier on variable hardware is what actually breaks reproducibility. Log the BenchmarkIndex and derived multiplier on every run so the calibration is auditable.
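An auditable record can be as simple as one JSON line per run. This is a minimal sketch; the function name and field names are hypothetical, and where the line is stored (job log, artifact, metrics store) is left open.

```javascript
// Sketch: build one JSON log line capturing the calibration inputs and output.
function calibrationRecord(benchmarkIndex, targetIndex, multiplier, now = new Date()) {
  return JSON.stringify({
    timestamp: now.toISOString(),
    benchmarkIndex,
    targetIndex,
    cpuSlowdownMultiplier: Number(multiplier.toFixed(2)),
  });
}

// Example using the BenchmarkIndex of 1287 from the diagnostic step.
console.log(calibrationRecord(1287, 700, 1287 / 700));
```

Keeping the raw index next to the derived multiplier makes it possible to re-derive the multiplier later if the target index changes.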
