Choosing Between P75 and P90 Budget Targets

P75 is the default everyone reaches for, but it is not always the right contract: a checkout flow where the slowest tenth of users abandons the cart needs P90, and an enterprise SLA that names a tail figure needs P95 or P99. Picking the percentile is a deliberate trade between coverage — how many users the budget protects — and cost — how many samples it takes to estimate and how often it flaps. This guide is part of the Percentile-Based Threshold Tuning reference, and it gives you a decision framework, a spread diagnostic, and a configurable assertion so the choice is data-driven rather than habitual.

The rule is simple to state and harder to apply: raise the percentile when the business cost of a slow tail is high, and lower it when your sample size or noise budget cannot support a stable estimate. P75 covers three in four users cheaply and stably; each step up to P90, P95, and P99 buys more coverage but demands disproportionately more samples (see the floors in the table below) and tolerates less noise.

Coverage vs Cost by Percentile

| Percentile | Users covered | Min samples for stability | Noise sensitivity | Use when |
|------------|---------------|---------------------------|-------------------|----------|
| P75 | 75% | ~20 lab / ~1k field | Low | Default for typical timing metrics (LCP, INP) and content routes |
| P90 | 90% | ~50 lab / ~3k field | Medium | Rare-but-severe metrics (CLS), revenue-adjacent flows |
| P95 | 95% | ~200 lab / ~10k field | High | Checkout, payment, sign-up: tail abandonment is costly |
| P99 | 99% | field only / ~50k | Very high | Contractual SLAs that name a tail latency figure |

Two forces set the floor in column three: estimating a higher percentile means fewer observations sit beyond it, so each one moves the estimate more, and the slow tail is exactly where outliers cluster. That is why a P99 belongs to large field datasets, never to a handful of CI runs.
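
To make the first force concrete, the sketch below counts how many observations are expected to sit beyond a percentile for a given sample size. The helper name expectedTailCount is hypothetical, and the arithmetic is simply n × (1 − p/100); it illustrates the intuition and does not derive the floors in the table.

```javascript
// Hypothetical helper: expected number of observations above the p-th percentile
// in a sample of size n. Illustrative arithmetic only, not a stability proof.
function expectedTailCount(n, p) {
  return n * (1 - p / 100);
}

// Sample sizes taken from the table above, plus a handful of CI runs for contrast.
const cases = [
  { n: 20, p: 75 },
  { n: 50, p: 90 },
  { n: 200, p: 95 },
  { n: 50000, p: 99 },
  { n: 9, p: 99 }, // nine CI runs: less than one observation lands in the tail
];

for (const { n, p } of cases) {
  const tail = expectedTailCount(n, p).toFixed(1);
  console.log(`n=${n} P${p}: ~${tail} observations beyond the percentile`);
}
```

At the table's lab floors, P75 with 20 samples and P90 with 50 samples each leave about five observations beyond the percentile, while nine CI runs leave well under one beyond P99.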

Diagnostic Steps

Let your own data tell you whether stepping up the percentile is worth it. If P75 and P90 are close, the extra coverage is nearly free; if they diverge sharply, the higher percentile is both more meaningful and more fragile.

  1. Compute the spread for one metric. Pull the raw values and print P75, P90, and P95 side by side.

    curl -s "$RUM_API/lcp?route=/checkout&window=14d" \
      | jq '[.sessions[].lcp] | sort | {p75: .[(length*0.75)|floor],
             p90: .[(length*0.90)|floor], p95: .[(length*0.95)|floor]}'

    Example output: { "p75": 2380, "p90": 3120, "p95": 4010 }. Here P90 sits about 31% above P75, since (3120 - 2380) / 2380 ≈ 0.31; a wide P75-to-P90 gap signals a heavy tail worth gating at P90.

  2. Quantify the gap. Measuring the spread relative to P75, that is (P90 - P75) / P75, a spread under ~20% means P75 already represents the population well; over ~40% means a meaningful slow cohort hides above P75 and a stricter percentile is justified for a high-stakes route.

  3. Check sample sufficiency. Confirm the session count clears the floor in the table for the percentile you are considering; if it does not, the higher percentile will flap and you should stay at P75.
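
The three steps above can be sketched as one small function. Everything here is illustrative: recommendPercentile, its input shape, and its reason strings are hypothetical names, the spread is assumed to be measured relative to P75, and the session floor is the field figure for P90 from the table. Spreads between the two thresholds are left undecided because the guidance above does not settle them.

```javascript
// Assumption: field-session floor for P90, taken from the table (~3k field).
const FIELD_FLOOR_P90 = 3000;

// Hypothetical helper combining the spread check and the sample-sufficiency check.
// Assumption: spread is (P90 - P75) / P75.
function recommendPercentile({ p75, p90, sessions, highStakes }) {
  const spread = (p90 - p75) / p75;
  if (sessions < FIELD_FLOOR_P90) {
    // Step 3: too few sessions, so a P90 estimate would flap.
    return { spread, target: 75, reason: "too-few-sessions-for-p90" };
  }
  if (spread < 0.2) {
    return { spread, target: 75, reason: "p75-representative" };
  }
  if (spread > 0.4) {
    return highStakes
      ? { spread, target: 90, reason: "heavy-tail-high-stakes" }
      : { spread, target: 75, reason: "heavy-tail-low-stakes" };
  }
  // Between ~20% and ~40%: left as a judgment call.
  return { spread, target: null, reason: "between-thresholds" };
}

// Values from the example output in step 1; the session count is made up.
const example = recommendPercentile({ p75: 2380, p90: 3120, sessions: 5000, highStakes: true });
console.log(example.reason, example.spread.toFixed(2));
```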

Implementation

Make the percentile a per-route, per-metric parameter so the choice lives in config and the evaluator stays generic.

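// scripts/percentile-budget.js

// Linear-interpolation percentile. p is in [0, 100]; values must be a non-empty array of numbers.
function percentile(values, p) {
  const v = [...values].sort((a, b) => a - b); // sort a copy so the caller's array is untouched
  const rank = (p / 100) * (v.length - 1);     // fractional index into the sorted copy
  const lo = Math.floor(rank), hi = Math.ceil(rank);
  return lo === hi ? v[lo] : v[lo] + (rank - lo) * (v[hi] - v[lo]);
}

// Minimum sample counts per percentile (lab floors from the table; P99 is field-only).
const MIN_SAMPLES = { 75: 20, 90: 50, 95: 200, 99: 50000 };

// Evaluate one budget entry against raw metric values.
// Only level "error" can produce FAIL; any other level downgrades a breach to WARN.
function evaluate({ metric, percentile: p, max, level }, values) {
  // Percentiles without a listed floor fall back to the P75 floor of 20.
  if (values.length < (MIN_SAMPLES[p] || 20)) {
    return { metric, percentile: p, status: "WARN", reason: "insufficient-samples" };
  }
  const value = Math.round(percentile(values, p) * 1000) / 1000; // round to 3 decimals
  const breached = value > max;
  return { metric, percentile: p, value, max,
    status: breached ? (level === "error" ? "FAIL" : "WARN") : "PASS" };
}

// Note: the CLI wiring that reads --reports and --config is not shown in this excerpt.
module.exports = { percentile, evaluate };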

config/route-percentiles.json:
{
  "/checkout": {
    "lcp": { "metric": "lcp", "percentile": 90, "max": 3000, "level": "error" },
    "inp": { "metric": "inp", "percentile": 90, "max": 250,  "level": "error" }
  },
  "/blog": {
    "lcp": { "metric": "lcp", "percentile": 75, "max": 2800, "level": "error" }
  }
}

The checkout route gates at P90 because the spread diagnostic showed a heavy tail and the route is revenue-critical; the blog stays at P75 where coverage is cheap and the tail is harmless.

CI Gating Assertion

The job evaluates each route at its configured percentile and fails only on an error-level breach.

# .github/workflows/percentile-choice-gate.yml
name: Percentile Choice Gate
on:
  pull_request:
    branches: [main]
jobs:
  percentile-gate:
    runs-on: ubuntu-latest
    timeout-minutes: 18
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: "20", cache: "npm" }
      - run: npm ci && npm run build
      - name: Collect runs
        run: npx lhci collect --numberOfRuns=9 --url=http://localhost:8080/checkout
      - name: Evaluate at configured percentile
        run: >-
          node ./scripts/percentile-budget.js
          --reports .lighthouseci --config ./config/route-percentiles.json

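Note numberOfRuns=9: a P90 needs more runs than a P75 does. Nine values are still below the MIN_SAMPLES floor of 50 that the evaluator above applies to P90, so unless the script pools values across more runs it returns WARN with insufficient-samples rather than a verdict; raise numberOfRuns or change the floor deliberately instead of assuming nine is enough. Wire the same evaluator into Automated Regression Detection so a route trending toward its chosen percentile ceiling alerts before it breaches.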
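
The "fails only on an error-level breach" rule can be sketched as a mapping from verdicts to a process exit code. The helper name exitCodeFor is hypothetical, and the verdict shape is assumed to match what evaluate returns; a non-zero exit code is what makes a GitHub Actions step fail.

```javascript
// Hypothetical helper: a single FAIL verdict fails the job; WARN and PASS do not.
function exitCodeFor(verdicts) {
  return verdicts.some((v) => v.status === "FAIL") ? 1 : 0;
}

// Illustrative verdicts in the shape the evaluator returns.
const verdicts = [
  { metric: "lcp", percentile: 90, value: 2910, max: 3000, status: "PASS" },
  { metric: "inp", status: "WARN", reason: "insufficient-samples" },
];

for (const v of verdicts) console.log(v); // one verdict per metric in the job log
process.exitCode = exitCodeFor(verdicts); // stays 0 here: no FAIL verdict is present
```

Setting process.exitCode instead of calling process.exit lets pending log output flush before the process ends.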

Verification

  • The job log prints one verdict per metric, e.g. { metric: 'lcp', percentile: 90, value: 2910, max: 3000, status: 'PASS' } — confirm the percentile field matches the route's intended choice, not a default 75.
  • Routes with too few collected runs report status: 'WARN', reason: 'insufficient-samples' instead of a number; raise numberOfRuns until they produce a value.
  • Re-run the spread diagnostic monthly; if a P75 route's P75-to-P90 gap widens past 40%, promote it to P90 and re-baseline.
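
The first check above can be automated with a small comparison between the route config and the emitted verdicts. This is a sketch under assumptions: findPercentileMismatches is a hypothetical name, and each verdict is assumed to carry a route field, which the evaluator shown earlier does not add by itself.

```javascript
// Hypothetical check: return verdicts whose percentile differs from the configured one.
// Assumes verdicts look like { route, metric, percentile, ... }.
function findPercentileMismatches(config, verdicts) {
  return verdicts.filter((v) => {
    const expected = config[v.route]?.[v.metric]?.percentile;
    return expected !== undefined && v.percentile !== undefined && v.percentile !== expected;
  });
}

// Config entry copied from the route configuration; the verdict is illustrative.
const routeConfig = {
  "/checkout": { lcp: { metric: "lcp", percentile: 90, max: 3000, level: "error" } },
};
const sampleVerdicts = [
  { route: "/checkout", metric: "lcp", percentile: 75, value: 2400, max: 3000, status: "PASS" },
];

for (const v of findPercentileMismatches(routeConfig, sampleVerdicts)) {
  console.log(`${v.route} ${v.metric}: evaluated at P${v.percentile}, config differs`);
}
```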

Frequently Asked Questions

When is P90 worth the extra cost over P75?

When the slow tail carries real business risk and your sample supports it. If the route is revenue-critical (checkout, sign-up) or the metric is rare-but-severe (CLS), and the P75-to-P90 spread in your data exceeds ~40%, P90 protects a meaningful cohort that P75 ignores. If the spread is small or you lack the samples, the extra coverage is not worth the added flakiness.

Why not just gate everything at P99 for maximum coverage?

Because P99 is dominated by outliers and needs tens of thousands of field sessions to estimate stably — it will flap constantly on CI-scale samples and fail builds on background-tab stalls nobody felt. Reserve P99 for contractual SLAs measured against large field datasets, not lab gates.