Building P75/P99 Aggregation Pipelines

A raw beacon stream is millions of individual numbers; a budget gate needs one number — the P75 — and one tripwire — the P99. The pipeline between them must compute percentiles that are accurate at the tail, cheap to update incrementally, and mergeable across time windows. This guide is part of the Custom Performance Beacons & RUM reference and covers the percentile-method tradeoff, rolling windows, storage, and the final CI assertion.

The naive approach — store every value and sort on read — is exact but does not scale and cannot be rolled up. Production pipelines use a mergeable sketch (t-digest or fixed-bucket histograms) so each ingestion window produces a small structure that can be combined into any larger window without re-reading raw events.
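
The rollup problem can be made concrete with a small sketch. It shows that per-window percentiles cannot simply be averaged into a longer window, which is why a mergeable structure is needed. The helper `exactPercentile` and the sample values are hypothetical, chosen only for illustration; the index rule matches the floor(n * p) lookup used in the spot-check later in this guide.

```javascript
// Hypothetical helper: exact percentile by sorting, using the floor(n * p) index.
function exactPercentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor(sorted.length * p));
  return sorted[idx];
}

// Two illustrative windows of LCP values (ms) with very different volumes.
const quietHour = [1000, 1100, 1200, 1300];
const busyHour = [2000, 2100, 2200, 2300, 2400, 2500,
                  2600, 2700, 2800, 2900, 3000, 3100];

const p75Quiet = exactPercentile(quietHour, 0.75);
const p75Busy = exactPercentile(busyHour, 0.75);

// Averaging the two per-window P75s ignores how many sessions each window had.
const averaged = (p75Quiet + p75Busy) / 2;

// The true P75 over both windows needs every raw value again.
const combined = exactPercentile([...quietHour, ...busyHour], 0.75);

console.log({ p75Quiet, p75Busy, averaged, combined });
```

The averaged figure lands well below the combined one because the busy window contributes three times as many sessions. A sketch that carries counts alongside values avoids this without re-reading the raw events.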

Percentile Method Comparison

The right method depends on tail accuracy, mergeability, and storage. The table compares the three you will choose between.

Method                 | P99 accuracy            | Mergeable?          | Storage/window  | Best for
Exact sort             | Exact                   | No                  | All raw values  | Small volume, ad-hoc
Fixed-bucket histogram | Bounded by bucket width | Yes (add counts)    | ~dozens of ints | Prometheus-style, known range
t-digest sketch        | High, adaptive at tail  | Yes (merge digests) | ~1–5 KB         | High volume, accurate P99

For Core Web Vitals, t-digest is the default: it spends its resolution where the percentiles you gate on live (the upper tail), stays around a few KB per window regardless of session count, and merges cleanly so a daily digest rolls up into a 28-day reading without touching raw events.
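
For comparison, the fixed-bucket row of the table can be sketched in a few lines. This is a minimal illustration, not a production histogram: the bucket bounds in `BOUNDS` and the function names are assumptions, and the estimate returns the upper bound of the bucket holding the requested rank, so its error is limited by the bucket width.

```javascript
// Assumed bucket upper bounds in ms; the last bucket catches everything above 6000.
const BOUNDS = [500, 1000, 1500, 2000, 2500, 3000, 4000, 6000, Infinity];

// Count each value into the first bucket whose upper bound covers it.
function buildHistogram(values) {
  const counts = new Array(BOUNDS.length).fill(0);
  for (const v of values) {
    if (!Number.isFinite(v)) continue; // skip NaN and infinite readings
    counts[BOUNDS.findIndex((upper) => v <= upper)] += 1;
  }
  return counts;
}

// Merging two windows is element-wise addition of counts.
function mergeHistograms(a, b) {
  return a.map((count, i) => count + b[i]);
}

// Estimate: upper bound of the bucket that contains the ceil(total * p)-th value.
function histogramPercentile(counts, p) {
  const total = counts.reduce((sum, c) => sum + c, 0);
  if (total === 0) return NaN;
  const targetRank = Math.ceil(total * p);
  let seen = 0;
  for (let i = 0; i < counts.length; i++) {
    seen += counts[i];
    if (seen >= targetRank) return BOUNDS[i];
  }
  return BOUNDS[BOUNDS.length - 1];
}

// Illustrative data: two daily windows merged without touching raw values again.
const day1 = [400, 900, 1400, 2400];
const day2 = [1800, 2600, 3500, 5000];
const merged = mergeHistograms(buildHistogram(day1), buildHistogram(day2));
console.log(histogramPercentile(merged, 0.75)); // bucket bound, not an exact value
```

The merged counts are identical to a histogram built from both days at once, which is the mergeability property. The cost is resolution: any value between 2500 and 3000 reports as 3000, where a t-digest would place centroids adaptively.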

Diagnostic Steps

Confirm the aggregation is faithful before you gate on it: a histogram whose buckets are too wide, or a digest fed malformed values, produces a confident wrong number.

  1. Spot-check a computed percentile against a brute-force sort on one window of raw data:

    curl -fsSL "$RUM_QUERY_URL?window=1d&metric=LCP&raw=true" | \
      jq -s 'sort_by(.v) | .[(length*0.75)|floor].v'
    # 2380  -> compare against the pipeline's reported P75 for the same window

  2. Verify the digest is mergeable by checking that the 7-day rollup equals the merge of the seven daily digests, not a re-sort:

    curl -fsSL "$RUM_QUERY_URL?window=7d&metric=LCP&pct=99"   # rolled-up
    # { "LCP": 4480 }
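
The spot-check in step 1 can also be scripted so the comparison is not done by eye. The sketch below is one possible shape, assuming the raw values and the pipeline's reported percentile have already been fetched; `bruteForcePercentile`, `checkSketch`, and the 3% default tolerance are hypothetical choices, not part of any pipeline API.

```javascript
// Exact percentile by full sort, using the same floor(n * p) index as the jq check.
function bruteForcePercentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor(sorted.length * p));
  return sorted[idx];
}

// Relative difference between the sketch's answer and the exact answer.
function relativeError(exact, reported) {
  if (exact === 0) return reported === 0 ? 0 : Infinity;
  return Math.abs(reported - exact) / Math.abs(exact);
}

// Compare the pipeline's reported percentile against the brute-force value.
// The tolerance default (3%) is an assumption; tune it to your own data.
function checkSketch(rawValues, reported, p, tolerance = 0.03) {
  const exact = bruteForcePercentile(rawValues, p);
  const relError = relativeError(exact, reported);
  return { exact, relError, ok: relError <= tolerance };
}

// Illustrative values only.
const raw = [1800, 2000, 2200, 2380, 2600];
console.log(checkSketch(raw, 2400, 0.75)); // small disagreement
console.log(checkSketch(raw, 2600, 0.75)); // large disagreement
```

A failing check points at the digest build or merge rather than at the site, so it is worth running before any budget decision is made from the rolled-up numbers.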

Implementation

This pipeline ingests a window of raw beacons into a t-digest per metric and segment, persists the serialized digest, and answers percentile queries by merging the digests covering the requested window. It uses a t-digest library so the sketch math is correct and mergeable.

// aggregate.js — run per ingestion window (e.g. hourly) per metric+segment
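// The tdigest package is CommonJS and exposes TDigest as a named property.
import tdigest from "tdigest";
const { TDigest } = tdigest;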

// 1. Build a digest from this window's raw beacon values.
export function buildDigest(values) {
  const td = new TDigest();
  for (const v of values) td.push(v);
  td.compress();
  return td.toArray();          // serialized centroids, ~1-5 KB
}

// 2. Answer a percentile query by merging the digests in the window.
export function percentile(digestRows, p) {
  const merged = new TDigest();
  for (const row of digestRows) merged.push_centroid(row); // mergeable rollup
  merged.compress();
  return Math.round(merged.percentile(p)); // p in [0,1], e.g. 0.75 or 0.99
}

-- rollup query: read the digests covering the gate window, not raw events.
-- (Engines like ClickHouse expose quantileTDigest natively; this is the shape.)
SELECT metric,
       device_class,
       route,
       quantileTDigestMerge(0.75)(digest) AS p75,
       quantileTDigestMerge(0.99)(digest) AS p99
FROM rum_digests
WHERE ts >= now() - INTERVAL 28 DAY
GROUP BY metric, device_class, route;

Window by ingestion time and roll up on read so a single slow hour does not permanently skew the gate, and so you can answer 1-day, 7-day, and 28-day questions from the same stored digests. Retain raw beacons for a short window (for example, 7 days) for the spot-check above, then keep only the digests long-term — they are orders of magnitude smaller.
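
Rolling up on read amounts to picking the stored digest rows whose ingestion timestamp falls inside the requested window and handing them to the merge step. A minimal sketch, assuming each row carries a `ts` field in epoch milliseconds (the row shape and the name `rowsInWindow` are hypothetical):

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Select the digest rows that cover the last `days` days, relative to `nowMs`.
// Assumes row.ts is the ingestion timestamp of the window in epoch milliseconds.
function rowsInWindow(rows, nowMs, days) {
  const cutoff = nowMs - days * DAY_MS;
  return rows.filter((row) => row.ts >= cutoff && row.ts <= nowMs);
}

// Illustrative rows: the digest payloads are placeholders, not real centroids.
const now = Date.UTC(2024, 0, 31);
const rows = [
  { ts: now - 0.5 * DAY_MS, digest: "d0" },
  { ts: now - 3 * DAY_MS, digest: "d1" },
  { ts: now - 10 * DAY_MS, digest: "d2" },
  { ts: now - 30 * DAY_MS, digest: "d3" },
];

// The same stored rows answer 1-day, 7-day, and 28-day questions.
for (const days of [1, 7, 28]) {
  console.log(days, rowsInWindow(rows, now, days).map((r) => r.digest));
}
```

Because the window is chosen at query time, a slow hour ages out of the 1-day reading after a day and out of the 28-day reading after four weeks, with no rewrite of stored data.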

CI Gating Assertion

The gate reads the rolled-up P75 and P99 straight from the pipeline and fails the build when a P75 breaches its budget. P75 is the hard gate; P99 is a tail tripwire that is surfaced in the log but does not fail the build.

# .github/workflows/percentile-gate.yml
name: P75/P99 Budget Gate
on:
  schedule:
    - cron: "0 6 * * *"
  workflow_dispatch:
jobs:
  percentile-gate:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - name: Assert P75/P99 from aggregation pipeline
        env:
          RUM_QUERY_URL: ${{ secrets.RUM_QUERY_URL }}
        run: |
          curl -fsSL "$RUM_QUERY_URL?window=28d&pct=75,99&metrics=LCP,INP,CLS" -o pct.json
          node -e '
            const d = require("./pct.json");
            // Units: LCP and INP in ms; CLS values are assumed to be scaled x1000 (0.1 -> 100).
            const p75Budget = { LCP: 2500, INP: 200, CLS: 100 };
            const p99Budget = { LCP: 4500, INP: 500, CLS: 250 };
            let failed = false;
            for (const m of Object.keys(p75Budget)) {
              const okP75 = d[m].p75 <= p75Budget[m];
              const okP99 = d[m].p99 <= p99Budget[m];
              console.log(`[PctGate] ${m} P75=${d[m].p75}/${p75Budget[m]} ${okP75?"PASS":"FAIL"} | P99=${d[m].p99}/${p99Budget[m]} ${okP99?"OK":"WATCH"}`);
              if (!okP75) failed = true; // P75 gates; P99 only warns
            }
            process.exit(failed ? 1 : 0);
          '

Verification

The passing signal is every metric under its P75 budget, with P99 shown for tail awareness:

[PctGate] LCP P75=2310/2500 PASS | P99=4280/4500 OK
[PctGate] INP P75=180/200 PASS | P99=470/500 OK
[PctGate] CLS P75=80/100 PASS | P99=240/250 OK

A WATCH on P99 with a passing P75 is the early signal of a tail regression; investigate before it pulls the P75 up and trips the gate. If the pipeline's P75 disagrees with the brute-force sort from the diagnostic step by more than a few percent, the digest compression or merge is wrong; rebuild from raw before trusting the gate. Calibrate the budgets themselves against Percentile-Based Threshold Tuning.
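
The PASS/FAIL and OK/WATCH rules are easier to test when the gate logic lives in a pure function rather than an inline script. The sketch below mirrors the log format shown above. The name `evaluateGate` is hypothetical, and treating a metric that is missing from the response as a failure is an added assumption, not something the workflow above does explicitly.

```javascript
// Evaluate readings of the shape { LCP: { p75, p99 }, ... } against budgets.
// Returns the log lines and whether the build should fail.
function evaluateGate(readings, p75Budget, p99Budget) {
  const lines = [];
  let failed = false;
  for (const metric of Object.keys(p75Budget)) {
    const r = readings[metric];
    // Assumption: a missing or non-numeric reading fails the gate.
    if (!r || !Number.isFinite(r.p75) || !Number.isFinite(r.p99)) {
      lines.push(`[PctGate] ${metric} MISSING`);
      failed = true;
      continue;
    }
    const okP75 = r.p75 <= p75Budget[metric];
    const okP99 = r.p99 <= p99Budget[metric];
    lines.push(
      `[PctGate] ${metric} P75=${r.p75}/${p75Budget[metric]} ${okP75 ? "PASS" : "FAIL"}` +
      ` | P99=${r.p99}/${p99Budget[metric]} ${okP99 ? "OK" : "WATCH"}`
    );
    if (!okP75) failed = true; // P75 gates; P99 only warns
  }
  return { failed, lines };
}

// Illustrative readings: CLS has a passing P75 but a P99 over its budget.
const result = evaluateGate(
  { LCP: { p75: 2310, p99: 4280 }, INP: { p75: 180, p99: 470 }, CLS: { p75: 80, p99: 260 } },
  { LCP: 2500, INP: 200, CLS: 100 },
  { LCP: 4500, INP: 500, CLS: 250 }
);
result.lines.forEach((line) => console.log(line));
```

In this example the CLS line reports WATCH while `failed` stays false, which is the tail-regression case described above.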

Frequently Asked Questions

Why t-digest instead of just storing every value and sorting?

Exact sorting is correct but does not scale and cannot be rolled up: a 28-day P75 would require re-reading every raw beacon for the month. A t-digest is a small mergeable sketch — a few KB per window — that stays accurate at the tail where your P99 lives, and merges so daily digests combine into any longer window without touching raw events. Keep raw data only briefly to spot-check the sketch, as shown in Custom Performance Beacons & RUM.

How long a window should the gate read?

A 28-day rolling window matches the Core Web Vitals assessment period and smooths day-of-week traffic shifts, so it is the right default for the gate. Keep shorter windows — 1-day and 7-day — available from the same digests for faster regression detection, but block merges on the 28-day reading so a single noisy day does not fail the build. Tune the thresholds per Percentile-Based Threshold Tuning.