Using 75th Percentile for Real-World INP Targets

Interaction to Next Paint is the metric teams most often gate on the wrong statistic: a median INP looks healthy while the slowest quarter of taps feel sluggish, and a P99 fails the build on background-tab outliers nobody felt. The 75th percentile is the sweet spot Core Web Vitals itself uses, and this guide — part of the Percentile-Based Threshold Tuning reference — shows exactly why P75 is the standard and how to set, compute, and enforce a real-world P75 INP budget by device class.

Core Web Vitals scores a URL at the 75th percentile of field sessions, segmented by device, because it captures the upper-quartile experience without letting the worst 1–2% of sessions — driven by background-tab suspension, transient network stalls, or aggressive thermal throttling — dictate the verdict. A P75 INP at or below 200 ms earns the "Good" tier; the CI fail line sits a buffer above the target to absorb measurement variance and synthetic-to-field drift.

P75 INP Targets by Device Class

INP is dominated by main-thread availability, so the realistic P75 target diverges sharply by device tier. Gate each tier against its own P75; never pool device classes into one distribution.

Device class       | Connection    | Field target (P75) | CI fail line | Min sessions / window
Desktop            | Cable / Fiber | ≤ 150 ms           | 180 ms       | 1,000 / 7-day
High-end mobile    | 4G / LTE      | ≤ 200 ms           | 240 ms       | 1,000 / 7-day
Mid-range mobile   | Fast 3G       | ≤ 200 ms           | 260 ms       | 500 / 7-day
Low-end Android    | Slow 4G       | ≤ 300 ms           | 350 ms       | 500 / 14-day

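The 200 ms "Good" ceiling is the same for every device in Core Web Vitals. The desktop target sits below it because desktop main threads have more headroom, while the 300 ms low-end Android target falls in the "Needs Improvement" band (200–500 ms): it is a realistic goal for that tier, not a "Good" score. The fail line widens on slower tiers because their distributions are wider and noisier; a tight fail line on low-end Android only generates false failures.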
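To see why pooling device classes misleads, the sketch below groups beacons by device class before taking the P75. It is a minimal illustration in plain Node.js; the beacon shape `{ device, inp }`, the helper `p75ByDevice`, and the toy numbers are assumptions for illustration, not part of any RUM product's API.

```javascript
// Sketch: P75 INP per device class versus one pooled distribution.
// Assumes beacons shaped like { device: "desktop", inp: 96 } (hypothetical field names).

// Linear-interpolated 75th percentile of an array of durations (ms).
function p75(values) {
  const v = [...values].sort((a, b) => a - b);
  if (v.length === 0) return NaN;
  const rank = 0.75 * (v.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return v[lo] + (rank - lo) * (v[hi] - v[lo]);
}

// Group INP values by device class, then take the P75 within each group.
function p75ByDevice(beacons) {
  const groups = new Map();
  for (const { device, inp } of beacons) {
    if (!groups.has(device)) groups.set(device, []);
    groups.get(device).push(inp);
  }
  const result = {};
  for (const [device, values] of groups) result[device] = p75(values);
  return result;
}

// Toy data: six fast desktop sessions and two low-end Android sessions.
const beacons = [
  ...[80, 90, 100, 110, 120, 130].map((inp) => ({ device: "desktop", inp })),
  ...[250, 320].map((inp) => ({ device: "low-end-android", inp })),
];
console.log(p75ByDevice(beacons)); // desktop 117.5, low-end-android 302.5
console.log(p75(beacons.map((b) => b.inp))); // pooled: 160
```

With these made-up numbers the pooled P75 (160 ms) would pass a 200 ms gate, even though the low-end Android tier sits at 302.5 ms, above its 300 ms target.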

Diagnostic Steps

Confirm you have enough field data and a real P75 before you gate on it.

  1. Check sample density. A P75 from too few sessions is unstable. Count sessions in the window per device class.

    curl -s "$RUM_API/inp?device=high-end-mobile&window=7d" \
      | jq '.sessions | length'

    Expected: a number at or above the floor in the table (≥ 1,000 here). Below it, widen the window to 14 days before trusting the percentile.

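  2. Compute the field P75. Pull the raw INP durations, apply the same 50–5,000 ms filter as the evaluator script, and interpolate the 75th percentile between the two nearest ranks.

    curl -s "$RUM_API/inp?device=high-end-mobile&window=7d" \
      | jq '[.sessions[].inp | select(. > 50 and . < 5000)] | sort
          | (0.75 * (length - 1)) as $r | ($r | floor) as $lo | ($r | ceil) as $hi
          | .[$lo] + ($r - $lo) * (.[$hi] - .[$lo]) | round'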

    Expected output: a value in milliseconds, e.g. 192 — the P75 INP for high-end mobile over the window.

  3. Check day-to-day stability. If the P75 swings more than ±15 ms between consecutive days, extend the window to 14 days or stratify by connection type before setting the budget.
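Step 3's stability check is easy to automate once you have one P75 per day. A minimal sketch, assuming the daily values are already computed and ordered oldest first; `maxDailySwing` and `stabilityAdvice` are hypothetical helpers, and the 15 ms default comes from the rule above.

```javascript
// Largest absolute change between consecutive daily P75 values (ms).
function maxDailySwing(dailyP75s) {
  let max = 0;
  for (let i = 1; i < dailyP75s.length; i++) {
    max = Math.max(max, Math.abs(dailyP75s[i] - dailyP75s[i - 1]));
  }
  return max;
}

// Turn the swing into advice using the ±15 ms stability rule.
function stabilityAdvice(dailyP75s, toleranceMs = 15) {
  const swing = maxDailySwing(dailyP75s);
  return swing > toleranceMs
    ? `unstable (max swing ${swing} ms): extend to 14 days or stratify by connection`
    : `stable (max swing ${swing} ms): OK to set the budget`;
}

console.log(stabilityAdvice([188, 195, 191, 197, 190, 193, 192])); // stable (max swing 7 ms)
console.log(stabilityAdvice([176, 204, 181, 199, 210, 183, 195])); // unstable (max swing 28 ms)
```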

Implementation

Compute the P75 from filtered field beacons and emit a single budget verdict. Discard sub-50 ms input noise and over-5,000 ms idle/background events, which otherwise distort the tail.

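// scripts/p75-inp-budget.js
// Budgets (ms) and minimum sessions per window, one entry per device class in the table.
const TARGET = { "desktop": 150, "high-end-mobile": 200, "mid-range-mobile": 200, "low-end-android": 300 };
const FAIL_LINE = { "desktop": 180, "high-end-mobile": 240, "mid-range-mobile": 260, "low-end-android": 350 };
const MIN_SESSIONS = { "desktop": 1000, "high-end-mobile": 1000, "mid-range-mobile": 500, "low-end-android": 500 };
// Keep 50-5,000 ms events: drops sub-50 ms input noise and idle/background outliers.
const clean = (values) => values.filter((d) => d > 50 && d < 5000).sort((a, b) => a - b);

// Linear-interpolated 75th percentile of the cleaned durations (NaN if none survive).
function p75(values) {
  const v = clean(values);
  const rank = 0.75 * (v.length - 1);
  const lo = Math.floor(rank), hi = Math.ceil(rank);
  return v[lo] + (rank - lo) * (v[hi] - v[lo]);
}

function evaluate(device, durations, minSamples = MIN_SESSIONS[device]) {
  if (!(device in TARGET)) throw new Error(`unknown device class: ${device}`);
  const samples = clean(durations).length; // counted after filtering, not before
  if (samples < minSamples) return { device, status: "WARN", reason: "low-sample", samples };
  const value = Math.round(p75(durations));
  const status = value > FAIL_LINE[device] ? "FAIL"
    : value > TARGET[device] ? "WARN" : "PASS";
  return { device, value, target: TARGET[device], status };
}

module.exports = { p75, evaluate };
// CLI for CI: node scripts/p75-inp-budget.js --reports <dir> --device <class>
if (require.main === module) {
  const fs = require("fs"), path = require("path");
  const arg = (name) => process.argv[process.argv.indexOf(name) + 1];
  const dir = arg("--reports");
  // Assumes lhr-*.json reports carry Lighthouse's "interaction-to-next-paint" audit; others are skipped.
  const durations = fs.readdirSync(dir).filter((f) => /^lhr-.*\.json$/.test(f))
    .map((f) => JSON.parse(fs.readFileSync(path.join(dir, f), "utf8")))
    .map((lhr) => lhr.audits?.["interaction-to-next-paint"]?.numericValue)
    .filter((d) => typeof d === "number");
  const result = evaluate(arg("--device"), durations, 1); // lab runs: floor of one value
  console.log(result);
  if (result.status === "FAIL") process.exitCode = 1;
}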

Field beacons feed this evaluator from your RUM stream; for the collection side — the PerformanceObserver wiring and payload design — see the Custom Performance Beacons & RUM reference.

CI Gating Assertion

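Synthetic INP in CI complements the field gate. Capture it with the interaction feature enabled and fail the build when the lab P75 crosses the device fail line. Be aware that a default Lighthouse navigation run performs no user interactions and therefore reports no INP; the reports only carry an INP value when an interaction is scripted inside a Lighthouse timespan (user flow).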

# .github/workflows/inp-p75-gate.yml
name: INP P75 Gate
on:
  pull_request:
    branches: [main]
jobs:
  inp-gate:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: "20", cache: "npm" }
      - run: npm ci && npm run build
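      - name: Collect with INP enabled
        # Assumes an npm "start" script that serves the build on port 8080.
        run: >-
          npx lhci collect --numberOfRuns=5
          --startServerCommand="npm run start"
          --settings.chromeFlags="--enable-features=InteractionToNextPaint"
          --url=http://localhost:8080/checkout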
      - name: Assert P75 INP
        run: node ./scripts/p75-inp-budget.js --reports .lighthouseci --device high-end-mobile

Verification

After merging, confirm the gate measures what you intend:

  • The job log prints a line such as { device: 'high-end-mobile', value: 192, target: 200, status: 'PASS' } — a numeric P75 below the target, not WARN: low-sample.
  • A deliberate slow-down (add a synthetic 250 ms blocking task to a click handler on a branch) flips the status to FAIL and exits non-zero. If it does not, the interaction feature flag is missing or events are being filtered out.
  • Field P75 in your dashboard tracks within roughly 10–15% of the lab P75; a persistent larger gap means the lab fail line is set too loosely relative to the field target.
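The lab-versus-field check in the last bullet is simple arithmetic. A minimal sketch, assuming both P75 values are in milliseconds and that the gap is measured relative to the lab P75; `labFieldDrift` is a hypothetical helper name, and the 15% default mirrors the upper end of the range above.

```javascript
// Relative gap between field and lab P75, measured against the lab value.
function labFieldDrift(fieldP75, labP75, tolerance = 0.15) {
  const gap = Math.abs(fieldP75 - labP75) / labP75;
  return { gapPct: Math.round(gap * 100), withinTolerance: gap <= tolerance };
}

console.log(labFieldDrift(210, 192)); // { gapPct: 9, withinTolerance: true }
console.log(labFieldDrift(260, 192)); // { gapPct: 35, withinTolerance: false }
```

A result that stays outside tolerance across several windows is the "persistent larger gap" worth acting on; a single noisy day is not.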

Frequently Asked Questions

Why does Core Web Vitals use P75 for INP and not P90 or the median?

A passing P75 means at least three-quarters of real sessions had an INP at or below the threshold, so most users have a good experience, while the verdict stays insensitive to the extreme tail, such as the worst 1–2% of sessions caused by background-tab suspension or transient stalls. The median would hide a degraded slowest quarter; P90 and above become unstable in field data and over-penalise rare events. P75 is the balance point, which is why Google scores on it.
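A small worked example makes the trade-off concrete. The 20 INP samples below are invented for illustration, not field data: most sessions are fast, the slowest quarter sits around 260 to 300 ms, and one 4,200 ms value stands in for a background-tab outlier. The `percentile` helper is a sketch that interpolates linearly between ranks, the same method the evaluator script uses for P75.

```javascript
// Illustrative only: 20 made-up INP samples (ms), sorted ascending.
const samples = [90, 95, 100, 105, 110, 115, 120, 125, 130, 135,
                 140, 145, 150, 160, 260, 270, 280, 290, 300, 4200];

// Linear-interpolated percentile, q in [0, 1].
function percentile(values, q) {
  const v = [...values].sort((a, b) => a - b);
  const rank = q * (v.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return v[lo] + (rank - lo) * (v[hi] - v[lo]);
}

for (const [label, q] of [["P50", 0.5], ["P75", 0.75], ["P99", 0.99]]) {
  console.log(label, Math.round(percentile(samples, q)));
}
// P50 138, P75 263, P99 3459
```

The median (138 ms) looks comfortably "Good", the P75 (263 ms) exposes the slow quarter, and the P99 is set almost entirely by the single outlier.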

Should the CI fail line equal the 200 ms target?

No. Set the fail line a buffer above the target — roughly 240 ms on high-end mobile — so normal measurement variance and the synthetic-to-field gap do not flap the build. The 200 ms target is the field goal you steer toward; the fail line is the hard gate that blocks merges.

What if I don't have enough field sessions for a stable P75?

Widen the aggregation window (7 days to 14) or stratify by connection type rather than gating on a thin sample. Until the session count clears the per-device floor, downgrade the assertion to a warning so an under-sampled P75 never blocks a merge.
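The fallback in this answer can be expressed as a small decision function. A sketch, assuming a hypothetical `countSessions(window)` callback backed by your RUM store; the window labels `"7d"` and `"14d"` are illustrative, and stratifying by connection type is not covered here.

```javascript
// Prefer a 7-day window, widen to 14 days, otherwise downgrade the gate to a warning.
function chooseWindow(countSessions, floor) {
  for (const win of ["7d", "14d"]) {
    const n = countSessions(win);
    if (n >= floor) return { window: win, sessions: n, gate: "enforce" };
  }
  return { window: "14d", sessions: countSessions("14d"), gate: "warn-only" };
}

const counts = { "7d": 640, "14d": 1180 };
console.log(chooseWindow((w) => counts[w], 1000)); // widens to 14d and enforces
console.log(chooseWindow((w) => counts[w], 1500)); // still short of the floor: warn-only
```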