Synthetic Monitoring vs RUM: Trade-offs for Budget Gating

Q: Why does my lab INP look fine but field INP fails budget?

Lab INP uses a scripted interaction on a fast emulated CPU, while field INP captures real taps on slow devices with contended main threads. Gate synthetic INP as a warning and set the real target from field P75.

Teams reach for one number to gate a deploy and discover the two sources disagree: a synthetic Lighthouse run says LCP is 2.1s while field data reports a P75 of 3.4s. Neither is wrong — they measure different populations under different conditions. This page, part of the Comparing Performance Testing Tools reference, breaks down exactly where synthetic (lab) monitoring and real-user monitoring (RUM) diverge, and which signal belongs on which side of your gate. Generic "use both" advice fails because it never says which one blocks the merge — and gating on the wrong one either ships regressions or fails builds on noise you cannot reproduce.

Where the Two Signals Diverge

Synthetic monitoring runs a controlled test (fixed device, fixed network, warm or cold cache) in CI before deploy. It is deterministic and reproducible, which is exactly what a merge gate needs, but it samples a single environment that may not match your users. RUM collects performance beacons (see Custom Performance Beacons & RUM) from actual sessions across every device, network, and geography. That makes it the ground truth, but it is noisy, arrives after deploy, and cannot block a pull request that has not shipped yet.

Dimension	Synthetic (lab)	RUM (field)
Determinism	High — fixed device/network, repeatable	Low — every device, network, locale
When available	Pre-deploy, in CI	Post-deploy, after real traffic
Population	One emulated profile	True P75/P90/P99 of all users
Best metric fidelity	LCP, TBT, byte budgets	INP, CLS (need real interaction)
Gating role	Blocks the merge	Confirms the fix, catches drift
Noise floor (typical CV)	3–6% with simulate throttling	10–25% raw, needs large samples
Cost	CI minutes	Beacon ingestion + storage

The practical split: gate the merge on synthetic, validate in production with RUM. Synthetic owns pre-deploy enforcement because it is the only signal that exists before code ships and the only one deterministic enough to fail a build fairly. RUM owns post-deploy verification and long-term drift detection because it is the only signal that reflects what users actually experienced. INP and CLS are the exception worth calling out: they depend on real interaction and layout over a full session, so lab values are directional at best — treat a synthetic INP regression as a warning, and confirm the true P75 against field data calibrated with Percentile-Based Threshold Tuning.

Diagnostic Steps

Quantify the gap between your own lab and field numbers before deciding how tightly to gate. First, pull a synthetic LCP from a Lighthouse run. A single run is one sample, so repeat it and take the median when you need a steadier baseline:

npx lighthouse https://www.example.com/ \
  --only-categories=performance \
  --output=json --output-path=./lab.json --quiet
node -e "const r=require('./lab.json').audits; \
  console.log('lab LCP', r['largest-contentful-paint'].numericValue|0, 'ms');"
# → lab LCP 2120 ms
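If a single run is too jumpy to serve as the baseline, the median of several runs is the usual fix. The sketch below assumes you have already parsed several reports shaped like lab.json; `median`, `medianLabLcp`, and the five sample values are illustrative names and made-up numbers, not part of Lighthouse.

```javascript
// Sketch: median LCP across several parsed Lighthouse reports.
// Each report is assumed to have the same shape as the lab.json written above.

// Median of a numeric array; averages the two middle values for even lengths.
function median(values) {
  if (values.length === 0) throw new Error('median of empty array');
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Pull the LCP numericValue (ms) from each report and return the rounded median.
function medianLabLcp(reports) {
  return Math.round(
    median(reports.map((r) => r.audits['largest-contentful-paint'].numericValue)),
  );
}

// Illustrative stand-ins for five parsed reports (values are made up).
const sampleReports = [2120, 2310, 2050, 2480, 2190].map((ms) => ({
  audits: { 'largest-contentful-paint': { numericValue: ms } },
}));

console.log('lab LCP median', medianLabLcp(sampleReports), 'ms');
// → lab LCP median 2190 ms
```

In practice you would replace `sampleReports` with reports loaded from disk, one per run.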

Then compute the field P75 for the same metric and URL from your beacon store, and compare:

# lab-vs-field gap: positive means the field is slower than the lab
node -e "const lab=2120, fieldP75=3400; \
  console.log('gap', (((fieldP75-lab)/lab)*100).toFixed(0)+'%');"
# → gap 60%

A gap above roughly 30–40% means your synthetic profile is too optimistic. Make the lab throttling harsher, toward a mid-range device on Fast 3G, so the gate is honest, rather than tightening the threshold and failing every build.
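The field P75 in the comparison above has to come from raw beacon values. Here is a minimal sketch of that step using the nearest-rank percentile method. The row shape (`url`, `metric`, `value`) and the sample numbers are assumptions for illustration; your beacon store will have its own schema and may compute percentiles differently.

```javascript
// Sketch: field P75 for one URL and metric from raw beacon rows.

// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100)); // 1-based rank
  return sorted[rank - 1];
}

// Filter rows to one URL and metric, then take the 75th percentile.
function fieldP75(rows, url, metric) {
  const values = rows
    .filter((r) => r.url === url && r.metric === metric)
    .map((r) => r.value);
  return percentile(values, 75);
}

// Hypothetical beacon rows (values in ms, made up for the example).
const beacons = [
  ...[1800, 2100, 2400, 2900, 3100, 3400, 3900, 5200].map((value) => ({
    url: '/', metric: 'lcp', value,
  })),
  { url: '/', metric: 'inp', value: 240 },     // other metric, ignored
  { url: '/pricing', metric: 'lcp', value: 900 }, // other route, ignored
];

console.log('field P75 LCP', fieldP75(beacons, '/', 'lcp'), 'ms');
// → field P75 LCP 3400 ms
```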

Implementation

Make synthetic the blocking gate and RUM the post-deploy check, with the lab ceiling derived from the field P75 scaled down by a calibration margin. The script below turns a measured field P75 into the lab budget the gate should enforce. Its 0.85 margin assumes a lab profile already calibrated to run about 15% faster than the field P75:

// derive-lab-budget.js — set the synthetic gate from field reality
const FIELD_P75 = { lcp: 3400, inp: 240 }; // ms, from your RUM store
const LAB_MARGIN = 0.85; // lab runs ~15% faster than field P75

const labBudget = Object.fromEntries(
  Object.entries(FIELD_P75).map(([metric, p75]) => [
    metric,
    Math.round(p75 * LAB_MARGIN),
  ]),
);

console.log(JSON.stringify(labBudget));
// → {"lcp":2890,"inp":204}
// LCP gates as an error (lab-faithful); INP stays a warning (needs real input)

CI Gating Assertion

Gate the merge on the synthetic LCP (deterministic) and keep synthetic INP as a non-blocking warning, since its real value only emerges from field interaction. Add this to lighthouserc.json:

{
  "ci": {
    "assert": {
      "assertions": {
        "largest-contentful-paint": ["error", { "maxNumericValue": 2890 }],
        "interaction-to-next-paint": ["warn", { "maxNumericValue": 204 }],
        "cumulative-layout-shift": ["warn", { "maxNumericValue": 0.1 }]
      }
    }
  }
}
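To keep the config in step with the derived budgets rather than copying numbers by hand, you could generate it. The sketch below assumes the budget object printed by derive-lab-budget.js and Lighthouse's audit IDs `largest-contentful-paint`, `interaction-to-next-paint`, and `cumulative-layout-shift`. `buildLighthouseRc` is a hypothetical helper, not part of Lighthouse CI.

```javascript
// Sketch: build the lighthouserc.json content from derived lab budgets.
const labBudget = { lcp: 2890, inp: 204 }; // ms, as printed by derive-lab-budget.js

function buildLighthouseRc(budget) {
  return {
    ci: {
      assert: {
        assertions: {
          // LCP blocks the merge: it is the lab-faithful metric.
          'largest-contentful-paint': ['error', { maxNumericValue: budget.lcp }],
          // INP and CLS only warn: their real values come from field data.
          'interaction-to-next-paint': ['warn', { maxNumericValue: budget.inp }],
          'cumulative-layout-shift': ['warn', { maxNumericValue: 0.1 }],
        },
      },
    },
  };
}

console.log(JSON.stringify(buildLighthouseRc(labBudget), null, 2));
```

Redirecting the output into lighthouserc.json lets you refresh the gate whenever the field P75 is re-measured.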

Verification

Confirm the split works end to end. Pre-deploy, the synthetic LCP assertion should fail a deliberately regressed branch: push an oversized hero image and watch the gate exit non-zero with a failed LCP assertion. Post-deploy, the field P75 LCP for that route should move in the same direction within a traffic window of a few hours; if the lab passed but the field P75 climbs past budget, your synthetic profile is still too optimistic and needs recalibration. A healthy setup shows the lab-to-field gap holding steady below ~30% release over release. Track it on the same dashboard you build in Visualizing Budget Trends with Grafana.

Frequently Asked Questions

Should I ever block a deploy on RUM data?

Not the pre-merge gate — RUM only exists after traffic hits the new code, so it cannot fail a pull request. You can, however, wire a post-deploy RUM check that auto-rolls-back when the field P75 regresses past budget for two consecutive windows. That is a safety net, not the merge gate.
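The two-consecutive-window rule can be sketched as a small pure function. This is an illustration only: `shouldRollBack`, its parameters, and the sample values are hypothetical, and the wiring to your deploy tooling is left out.

```javascript
// Sketch: roll back only when the most recent N windows all exceed the budget.
// windowP75s is ordered oldest to newest, one field P75 (ms) per traffic window.
function shouldRollBack(windowP75s, budgetMs, consecutive = 2) {
  const recent = windowP75s.slice(-consecutive);
  // Not enough post-deploy windows yet: keep waiting rather than act on one reading.
  if (recent.length < consecutive) return false;
  return recent.every((p75) => p75 > budgetMs);
}

console.log(shouldRollBack([3300, 3550, 3620], 3400)); // → true  (last two over budget)
console.log(shouldRollBack([3300, 3550, 3380], 3400)); // → false (latest window recovered)
console.log(shouldRollBack([3550], 3400));             // → false (only one window so far)
```

Requiring consecutive windows trades a slower reaction for fewer rollbacks triggered by a single noisy window.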

Why does my lab INP look fine but field INP fails budget?

Lab INP is driven by a scripted interaction on a fast emulated CPU, while field INP captures every real tap and click on slow devices with contended main threads. Treat synthetic INP as directional, gate it as a warn, and set the real target from field P75 using Percentile-Based Threshold Tuning.

How large a RUM sample do I need to trust the field P75?

Enough sessions per route per window that the P75 is stable run to run — typically a few thousand. Below that, use the sampling and confidence guidance in RUM sampling strategies before acting on the number.
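One way to check whether a P75 is stable enough is a bootstrap confidence interval: resample the window's values with replacement, recompute the P75 each time, and see how wide the spread is. The sketch below is a generic statistical illustration, not the method from the sampling guide. The function names and the 10% width threshold are assumptions.

```javascript
// Sketch: bootstrap interval for a field P75 to judge whether the sample is large enough.

// Nearest-rank percentile over a numeric array.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

// Resample with replacement, recompute P75 each time, and report the
// 2.5th to 97.5th percentile range of those estimates (roughly a 95% interval).
function bootstrapP75Interval(samples, resamples = 1000, random = Math.random) {
  const estimates = [];
  for (let i = 0; i < resamples; i++) {
    const resample = Array.from(
      { length: samples.length },
      () => samples[Math.floor(random() * samples.length)],
    );
    estimates.push(percentile(resample, 75));
  }
  return { low: percentile(estimates, 2.5), high: percentile(estimates, 97.5) };
}

// Assumed rule of thumb: treat the P75 as stable when the interval is no wider
// than 10% of the P75 itself. Tune the threshold to your own tolerance.
function isStable(samples, maxRelativeWidth = 0.1) {
  const { low, high } = bootstrapP75Interval(samples);
  return (high - low) / percentile(samples, 75) <= maxRelativeWidth;
}
```

Calling `isStable(lcpValuesForRouteAndWindow)` before acting on a P75 gives a quick signal: a wide interval suggests waiting for more sessions or widening the window.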
