Synthetic Monitoring vs RUM: Trade-offs for Budget Gating
Teams reach for one number to gate a deploy and discover the two sources disagree: a synthetic Lighthouse run says LCP is 2.1s while field data reports a P75 of 3.4s. Neither is wrong — they measure different populations under different conditions. This page, part of the Comparing Performance Testing Tools reference, breaks down exactly where synthetic (lab) monitoring and real-user monitoring (RUM) diverge, and which signal belongs on which side of your gate. Generic "use both" advice fails because it never says which one blocks the merge — and gating on the wrong one either ships regressions or fails builds on noise you cannot reproduce.
Where the Two Signals Diverge
Synthetic monitoring runs a controlled test (fixed device, fixed network, warm or cold cache) in CI before deploy. It is deterministic and reproducible, which is exactly what a merge gate needs, but it samples a single environment that may not match your users. RUM collects performance beacons (see Custom Performance Beacons & RUM) from actual sessions across every device, network, and geography. That makes it the ground truth, but it is noisy, arrives after deploy, and cannot block a pull request that has not shipped yet.
| Dimension | Synthetic (lab) | RUM (field) |
|---|---|---|
| Determinism | High — fixed device/network, repeatable | Low — every device, network, locale |
| When available | Pre-deploy, in CI | Post-deploy, after real traffic |
| Population | One emulated profile | True P75/P90/P99 of all users |
| Best metric fidelity | LCP, TBT, byte budgets | INP, CLS (need real interaction) |
| Gating role | Blocks the merge | Confirms the fix, catches drift |
| Noise floor (typical CV) | 3–6% with simulated throttling | 10–25% raw, needs large samples |
| Cost | CI minutes | Beacon ingestion + storage |
The practical split: gate the merge on synthetic, validate in production with RUM. Synthetic owns pre-deploy enforcement because it is the only signal that exists before code ships and the only one deterministic enough to fail a build fairly. RUM owns post-deploy verification and long-term drift detection because it is the only signal that reflects what users actually experienced. INP and CLS are the exception worth calling out: they depend on real interaction and layout over a full session, so lab values are directional at best — treat a synthetic INP regression as a warning, and confirm the true P75 against field data calibrated with Percentile-Based Threshold Tuning.
Diagnostic Steps
Quantify the gap between your own lab and field numbers before deciding how tightly to gate. First, pull the lab LCP from a Lighthouse run (a single run is shown here; for a true median, repeat the run several times and take the middle value):
npx lighthouse https://www.example.com/ \
  --only-categories=performance \
  --output=json --output-path=./lab.json --quiet
node -e "const r=require('./lab.json').audits; \
  console.log('lab LCP', r['largest-contentful-paint'].numericValue|0, 'ms');"
# → lab LCP 2120 ms
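A single Lighthouse run is one sample, not a median. The sketch below shows one way to take the median across several runs, assuming you have already saved and parsed multiple JSON reports. `medianLcp` and `fakeReport` are illustrative names, not part of Lighthouse, and the run values are made up.

```javascript
// median-lcp.js: illustrative helper, not part of Lighthouse.
// Takes parsed Lighthouse JSON reports and returns the median LCP in ms.
function medianLcp(reports) {
  const values = reports
    .map((r) => r.audits['largest-contentful-paint'].numericValue)
    .sort((a, b) => a - b);
  if (values.length === 0) throw new Error('no reports');
  const mid = Math.floor(values.length / 2);
  // Odd count: middle value. Even count: mean of the two middle values.
  return values.length % 2 === 1
    ? values[mid]
    : (values[mid - 1] + values[mid]) / 2;
}

// Stand-in reports with made-up values, shaped like Lighthouse output.
const fakeReport = (ms) => ({
  audits: { 'largest-contentful-paint': { numericValue: ms } },
});
const runs = [2240, 2080, 2120, 2310, 2095].map(fakeReport);
console.log('median lab LCP', Math.round(medianLcp(runs)), 'ms');
// → median lab LCP 2120 ms
```

An odd number of runs (three or five) keeps the median equal to an actual measured run rather than an average of two.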
Then compute the field P75 for the same metric and URL from your beacon store, and compare:
# lab-vs-field gap: positive means the field is slower than the lab
node -e "const lab=2120, fieldP75=3400; \
  console.log('gap', (((fieldP75-lab)/lab)*100).toFixed(0)+'%');"
# → gap 60%
A gap above roughly 30–40% means your synthetic profile is too optimistic — tighten the lab throttling toward a mid-range device on Fast 3G so the gate is honest, rather than tightening the threshold and failing every build.
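For the field side, here is a minimal sketch of computing the P75 from raw beacon values and checking the gap against the 30–40% band. It assumes your beacon store can hand you an array of LCP samples in milliseconds for one route. The sample values, the nearest-rank percentile method, and the `RECALIBRATE_ABOVE` constant are illustrative choices, not something your RUM vendor necessarily uses.

```javascript
// field-gap.js: illustrative sketch; sample values are made up.
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values, p) {
  if (values.length === 0) throw new Error('no samples');
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// Relative gap between field and lab in percent (positive: field is slower).
function labFieldGap(labMs, fieldMs) {
  return ((fieldMs - labMs) / labMs) * 100;
}

const RECALIBRATE_ABOVE = 40; // assumption: upper end of the 30-40% band
const fieldLcpSamples = [1800, 2100, 2600, 2900, 3100, 3400, 3900, 5200]; // ms
const fieldP75 = percentile(fieldLcpSamples, 75);
const gap = labFieldGap(2120, fieldP75);
console.log('field P75', fieldP75, 'ms, gap', gap.toFixed(0) + '%');
// → field P75 3400 ms, gap 60%
console.log(gap > RECALIBRATE_ABOVE ? 'recalibrate lab profile' : 'gap acceptable');
// → recalibrate lab profile
```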
Implementation
Make synthetic the blocking gate and RUM the post-deploy check, with the lab ceiling derived from the field P75 minus a calibration margin. The script below turns a measured field P75 into the lab budget the gate should enforce:
// derive-lab-budget.js: set the synthetic gate from field reality
const FIELD_P75 = { lcp: 3400, inp: 240 }; // ms, from your RUM store
const LAB_MARGIN = 0.85; // lab runs ~15% faster than field P75
const labBudget = Object.fromEntries(
  Object.entries(FIELD_P75).map(([metric, p75]) => [
    metric,
    Math.round(p75 * LAB_MARGIN),
  ]),
);
console.log(JSON.stringify(labBudget));
// → {"lcp":2890,"inp":204}
// LCP gates as an error (lab-faithful); INP stays a warning (needs real input)
CI Gating Assertion
Gate the merge on the synthetic LCP (deterministic) and keep synthetic INP as a non-blocking warning, since its real value only emerges from field interaction. Add this to lighthouserc.json:
{
  "ci": {
    "assert": {
      "assertions": {
        "largest-contentful-paint": ["error", { "maxNumericValue": 2890 }],
        "interaction-to-next-paint": ["warn", { "maxNumericValue": 204 }],
        "cumulative-layout-shift": ["warn", { "maxNumericValue": 0.1 }]
      }
    }
  }
}
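To keep the assertion values in sync with the derived budget instead of copying numbers by hand, a small generator can help. This is a sketch under assumptions: `buildAssertions` and the `AUDITS` mapping are hypothetical names, LCP is keyed by the `largest-contentful-paint` audit ID (the same key read from the report in the diagnostic step), and the output is printed rather than written to disk.

```javascript
// build-assertions.js: illustrative sketch that turns a lab budget into
// the "assertions" object for lighthouserc.json.
const labBudget = { lcp: 2890, inp: 204 }; // ms, as printed by derive-lab-budget.js

// Map each budget key to a Lighthouse audit ID and a severity level.
// LCP blocks the merge; INP only warns because its real value comes from the field.
const AUDITS = {
  lcp: { id: 'largest-contentful-paint', level: 'error' },
  inp: { id: 'interaction-to-next-paint', level: 'warn' },
};

function buildAssertions(budget) {
  const assertions = {};
  for (const [metric, maxNumericValue] of Object.entries(budget)) {
    const audit = AUDITS[metric];
    if (!audit) throw new Error(`no audit mapping for ${metric}`);
    assertions[audit.id] = [audit.level, { maxNumericValue }];
  }
  return assertions;
}

const config = { ci: { assert: { assertions: buildAssertions(labBudget) } } };
console.log(JSON.stringify(config, null, 2));
```

Throwing on an unmapped metric is deliberate: a budget key that silently produces no assertion would leave that metric ungated.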
Verification
Confirm the split works end to end. Pre-deploy, the synthetic LCP assertion should fail a deliberately regressed branch: push an oversized hero image and watch the gate exit non-zero on the LCP assertion. Post-deploy, the field P75 LCP for that route should move in the same direction within a traffic window of a few hours; if the lab passed but the field P75 climbs past budget, your synthetic profile is still too optimistic and needs recalibration. A healthy setup shows the lab-to-field gap holding steady below ~30% release over release. Track it on the same dashboard you build in Visualizing Budget Trends with Grafana.
Frequently Asked Questions
Should I ever block a deploy on RUM data?
Not the pre-merge gate — RUM only exists after traffic hits the new code, so it cannot fail a pull request. You can, however, wire a post-deploy RUM check that auto-rolls-back when the field P75 regresses past budget for two consecutive windows. That is a safety net, not the merge gate.
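The two-consecutive-windows rule can be sketched as a small predicate. This assumes you already compute one field P75 per time window, ordered oldest to newest. `shouldRollBack`, the budget value, and the window values are hypothetical, and the actual rollback trigger depends on your deploy tooling.

```javascript
// rum-rollback-check.js: illustrative sketch; window values are made up.
// Returns true only when the most recent `consecutive` windows all exceed the budget.
function shouldRollBack(windowP75s, budgetMs, consecutive = 2) {
  if (windowP75s.length < consecutive) return false; // not enough data yet
  return windowP75s.slice(-consecutive).every((p75) => p75 > budgetMs);
}

const FIELD_BUDGET_MS = 3400; // hypothetical field LCP budget
console.log(shouldRollBack([3300, 3350, 3600], FIELD_BUDGET_MS));
// → false (only the latest window is over budget)
console.log(shouldRollBack([3300, 3550, 3600], FIELD_BUDGET_MS));
// → true (two windows in a row are over budget)
```

Requiring consecutive breaches trades reaction time for resistance to a single noisy window.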
Why does my lab INP look fine but field INP fails budget?
Lab INP is driven by a scripted interaction on a fast emulated CPU, while field INP captures every real tap and click on slow devices with contended main threads. Treat synthetic INP as directional, assert it at the warn level, and set the real target from field P75 using Percentile-Based Threshold Tuning.
How large a RUM sample do I need to trust the field P75?
Enough sessions per route per window that the P75 is stable run to run — typically a few thousand. Below that, use the sampling and confidence guidance in RUM sampling strategies before acting on the number.
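One rough way to judge whether a sample is large enough is to put an approximate confidence interval around the P75 and look at its width. The sketch below uses order statistics with a normal approximation to the binomial, which is an assumption of this example and not necessarily what your RUM tooling does. `percentileInterval` and `relativeWidth` are hypothetical helpers, and the evenly spread input is made up purely to show how the interval narrows as the sample grows.

```javascript
// p75-stability.js: illustrative sketch, not a substitute for proper sampling guidance.
// Approximate 95% confidence interval for a percentile from order statistics,
// using the normal approximation to the binomial distribution of ranks.
function percentileInterval(values, p = 0.75, z = 1.96) {
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  if (n === 0) throw new Error('no samples');
  const spread = z * Math.sqrt(n * p * (1 - p));
  const clamp = (rank) => Math.min(Math.max(rank, 1), n); // keep 1-based rank in range
  return {
    lower: sorted[clamp(Math.floor(n * p - spread)) - 1],
    estimate: sorted[clamp(Math.ceil(n * p)) - 1],
    upper: sorted[clamp(Math.ceil(n * p + spread)) - 1],
  };
}

// Relative width of the interval: smaller means a more stable P75.
function relativeWidth({ lower, estimate, upper }) {
  return (upper - lower) / estimate;
}

// Made-up, evenly spread samples (1..n) to compare a small and a large window.
const ramp = (n) => Array.from({ length: n }, (_, i) => i + 1);
for (const n of [100, 4000]) {
  const ci = percentileInterval(ramp(n));
  console.log(n, 'samples:', (relativeWidth(ci) * 100).toFixed(1) + '% wide');
}
```

Real timing data is skewed rather than evenly spread, so actual widths will differ; the point is only that the interval around the P75 shrinks roughly with the square root of the sample size.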