RUM Sampling Strategies for High-Traffic Sites

At a million sessions a day, beaconing every page view is wasteful and expensive — but sampling carelessly biases your percentiles and starves the tail you most need to watch. The goal is to send the fewest beacons that still let you assert P75 and P99 with confidence. This guide is part of the Custom Performance Beacons & RUM reference and covers head-based versus tail-based sampling, the sample-rate-versus-confidence tradeoff, deterministic sticky sampling, and cost.

The core decision is when you decide to keep a session. Head-based sampling decides at page load, before any metric exists, and is cheap and unbiased. Tail-based sampling decides after metrics are known, letting you over-sample slow sessions — powerful for the P99 but heavier to operate.

Sample Rate vs Confidence

How low you can drop the rate depends on traffic and which percentile you gate on. The deeper the percentile, the more raw sessions you need so enough land in the tail. The table shows the approximate kept-session volume needed for a stable daily reading.

Daily sessions	Sample rate	Kept/day	Stable P75?	Stable P99?
10,000	100%	10,000	Yes	Marginal
100,000	25%	25,000	Yes	Yes
1,000,000	5%	50,000	Yes	Yes
10,000,000	1%	100,000	Yes	Yes

A useful rule: aim for at least ~1,000 kept sessions per segment per window for a trustworthy P75, and ~10,000 for a P99 you can gate on. Only ~1% of kept sessions fall beyond the P99, so 10,000 kept sessions leave roughly 100 observations to define the tail. Segment here means a device-class and route combination, not the whole site; sampling that looks fine in aggregate can starve a low-traffic route. For the confidence math behind acting on these numbers, see Statistical Significance Testing for Noisy CI.
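The rule of thumb can be turned into a per-segment calculation. The sketch below is illustrative only: `minSampleRate`, `expectedTailCount`, and the traffic figures are hypothetical and not part of any RUM library. It assumes you know each segment's raw session count for the gate window.

```javascript
// Hypothetical helper: smallest sample rate that keeps a segment above a floor.
// sessionsPerWindow: raw (unsampled) sessions the segment sees in one gate window.
// floor: required kept sessions (1,000 for P75, 10,000 for P99 per the rule of thumb).
function minSampleRate(sessionsPerWindow, floor) {
  if (!(sessionsPerWindow > 0)) return 1; // no traffic data: keep everything
  return Math.min(1, floor / sessionsPerWindow);
}

// Expected number of kept sessions that land beyond a given percentile.
function expectedTailCount(keptSessions, percentile) {
  return keptSessions * (1 - percentile / 100);
}

const P75_FLOOR = 1000;
const P99_FLOOR = 10000;

// Made-up traffic figures, for illustration only.
const segments = [
  { name: "desktop/home", sessionsPerWindow: 400000 },
  { name: "mobile/checkout", sessionsPerWindow: 20000 },
  { name: "tablet/settings", sessionsPerWindow: 600 },
];

for (const s of segments) {
  const p75 = minSampleRate(s.sessionsPerWindow, P75_FLOOR);
  const p99 = minSampleRate(s.sessionsPerWindow, P99_FLOOR);
  console.log(
    `${s.name}: P75 needs >= ${(p75 * 100).toFixed(2)}%, P99 needs >= ${(p99 * 100).toFixed(2)}%`
  );
}
```

A result of 100% means the segment cannot reach the floor by sampling alone in that window. In that case, widen the window or gate the segment on P75 only.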

Diagnostic Steps

Before lowering the rate, confirm you have enough tail volume to keep gating on P99.

Count kept sessions per segment over your gate window to check none are starved:

curl -fsSL "$RUM_QUERY_URL?window=28d&group=device,route&select=count" | \
  jq '.[] | select(.count < 1000)'
# No output means every segment clears the floor; any rows printed are under-sampled

Check how many sessions actually populate the P99 bucket — too few and the percentile is noise:

curl -fsSL "$RUM_QUERY_URL?window=28d&metric=LCP&select=count" | jq '.count * 0.01'
# 503  -> ~503 sessions define the P99; comfortably above the ~100 floor
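To see why the tail count matters, one option is a distribution-free confidence interval for a percentile. The sketch below uses the standard normal approximation to the binomial to find which ranks in the sorted sample bracket the true percentile. It is a generic statistical approximation, not necessarily what your RUM backend computes, and the function names are hypothetical.

```javascript
// Approximate 95% confidence interval for a percentile, expressed as 1-based
// ranks in the sorted sample (normal approximation to the binomial).
function percentileRankInterval(n, percentile, z = 1.96) {
  const p = percentile / 100;
  const center = n * p;
  const halfWidth = z * Math.sqrt(n * p * (1 - p));
  return {
    lowerRank: Math.max(1, Math.floor(center - halfWidth)),
    upperRank: Math.min(n, Math.ceil(center + halfWidth)),
  };
}

// Read the interval off an ascending array of metric values (for example LCP in ms).
function percentileInterval(sortedValues, percentile) {
  const { lowerRank, upperRank } = percentileRankInterval(sortedValues.length, percentile);
  return { low: sortedValues[lowerRank - 1], high: sortedValues[upperRank - 1] };
}

// With 1,000 kept sessions the P99 is bracketed by ranks far apart in relative terms;
// with 50,300 the bracket is much tighter relative to the sample size.
console.log(percentileRankInterval(1000, 99));
console.log(percentileRankInterval(50300, 99));
```

With only 1,000 kept sessions, the interval spans roughly the 98.3rd to 99.7th percentile of the sample, which in a long-tailed metric like LCP can be a wide range of values.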

Implementation

Sampling must be deterministic and sticky: every page view in a session makes the same keep/drop decision, so a session is never half-recorded. Hash the session id to a stable number in [0,1) and compare against the rate — no randomness per page, no server round-trip.

// sampling.js — runs before the beacon library loads
const SAMPLE_RATE = Number(window.__RUM_SAMPLE_RATE ?? 0.05); // 5%

// Stable per-session decision: same session id -> same hash -> same outcome.
async function shouldSample(sessionId) {
  const data = new TextEncoder().encode(sessionId);
  const digest = await crypto.subtle.digest("SHA-256", data);
  // Use the first 4 bytes as an unsigned int, normalize to [0,1).
  // Divide by 2^32, not 0xffffffff, so the result stays strictly below 1.
  const n = new DataView(digest).getUint32(0) / 2 ** 32;
  return n < SAMPLE_RATE;
}

// Persist the session id so the decision is sticky across page views.
let sid = sessionStorage.getItem("rum_sid");
if (!sid) { sid = crypto.randomUUID(); sessionStorage.setItem("rum_sid", sid); }

shouldSample(sid).then((keep) => {
  if (keep) import("./rum-beacon.js"); // only sampled sessions load the lib
});

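For tail-based capture, layer a small always-on rule on top: beacon any session whose LCP or INP exceeds a "poor" threshold regardless of the head decision, so slow outliers are never sampled away. Cap that override (for example, the first 5,000 poor sessions per window) to keep cost bounded. Tag override beacons so they can be excluded or down-weighted when computing P75, because they over-represent slow sessions and would otherwise skew the head-sampled distribution.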
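A minimal sketch of that override rule follows. It assumes the Core Web Vitals "poor" thresholds (LCP above 4,000 ms, INP above 500 ms) and a hypothetical `overrideCount` that tracks how many override sessions have already been accepted in the current window. Where that counter lives (most likely at the collector) is an assumption of this sketch, as is the `decideKeep` name.

```javascript
// "Poor" thresholds per the Core Web Vitals definitions.
const POOR_LCP_MS = 4000;
const POOR_INP_MS = 500;
const OVERRIDE_CAP = 5000; // example cap per window

// Hypothetical decision function combining the head decision with the tail override.
// headKeep: result of the deterministic hash-based sampling decision.
// overrideCount: override sessions already accepted this window (assumed to be tracked elsewhere).
function decideKeep({ headKeep, lcpMs, inpMs, overrideCount }) {
  if (headKeep) return { keep: true, reason: "head" };
  const poor = lcpMs > POOR_LCP_MS || inpMs > POOR_INP_MS;
  if (poor && overrideCount < OVERRIDE_CAP) {
    return { keep: true, reason: "tail-override" };
  }
  return { keep: false, reason: "dropped" };
}

console.log(decideKeep({ headKeep: false, lcpMs: 5200, inpMs: 120, overrideCount: 10 }));
```

Carrying the `reason` field on each beacon lets the query layer compute P75 from `head` sessions only while still using `tail-override` sessions for slow-session detail.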

CI Gating Assertion

Sampling that quietly collapses — a deploy that breaks sessionStorage, a rate set to zero by mistake — leaves the gate reading from too few sessions and silently passing. This job asserts that kept volume is sufficient before any percentile gate runs, so an under-sampled window fails loudly instead of giving a false green.

# .github/workflows/sample-volume-gate.yml
name: RUM Sample Volume Gate
on:
  schedule:
    - cron: "0 6 * * *"
  workflow_dispatch:
jobs:
  sample-volume:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - name: Assert sample volume sufficiency
        env:
          RUM_QUERY_URL: ${{ secrets.RUM_QUERY_URL }}
        run: |
          curl -fsSL "$RUM_QUERY_URL?window=28d&group=device,route&select=count" -o vol.json
          node -e '
            const rows = require("./vol.json");
            const P75_FLOOR = 1000, P99_FLOOR = 10000;
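            // An empty result means sampling collapsed; fail instead of passing vacuously.
            let failed = !Array.isArray(rows) || rows.length === 0;
            if (failed) console.log("[SampleGate] no segments returned; kept volume is zero or the query failed");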
            for (const r of rows) {
              const okP75 = r.count >= P75_FLOOR;
              const okP99 = r.count >= P99_FLOOR;
              console.log(`[SampleGate] ${r.device}/${r.route} kept=${r.count} P75:${okP75?"OK":"LOW"} P99:${okP99?"OK":"LOW"}`);
              if (!okP75) failed = true;
            }
            process.exit(failed ? 1 : 0);
          '

Verification

The passing signal is every segment clearing the P75 floor. The P99 status is reported per segment for information and does not fail the job:

[SampleGate] mobile/checkout kept=4120 P75:OK P99:LOW
[SampleGate] desktop/home kept=18800 P75:OK P99:OK

A P99:LOW segment means you may gate that route on P75 but should not block on its P99 yet — either raise the sample rate for that route or widen the window. If a segment shows P75:LOW, the build fails; confirm the sticky session id is persisting and the rate is not accidentally zero.
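One way to act on these statuses is to derive, per segment, which percentile gates are allowed to run. The sketch below assumes the same row shape the volume query returns in this guide (`device`, `route`, `count`); `gatePlan` is a hypothetical helper, not part of any tool.

```javascript
// Hypothetical helper: decide which percentile gates each segment may enforce.
function gatePlan(rows, p75Floor = 1000, p99Floor = 10000) {
  return rows.map((r) => {
    const gates = [];
    if (r.count >= p75Floor) gates.push("P75");
    if (r.count >= p99Floor) gates.push("P99");
    return { segment: `${r.device}/${r.route}`, gates };
  });
}

// Rows matching the sample gate output shown in this section.
const plan = gatePlan([
  { device: "mobile", route: "checkout", count: 4120 },
  { device: "desktop", route: "home", count: 18800 },
]);
console.log(JSON.stringify(plan));
```

A downstream percentile gate could then skip the P99 assertion for any segment whose `gates` list does not include it, rather than blocking on a noisy value.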

Frequently Asked Questions

Head-based or tail-based sampling for budget gating?

Use head-based sampling as the foundation: it is cheap, decides at page load before metrics exist, and gives an unbiased distribution you can compute P75 from directly. Add a narrow tail-based override that always keeps "poor" outliers so your P99 is not sampled away. Pure tail-based sampling biases the median and is harder to operate, so reserve it for capturing slow-session detail, not for the primary gate.

Won't a 5% sample make my percentiles unreliable?

Not at high traffic. What matters is the absolute count of kept sessions per segment, not the percentage — 5% of a million sessions is 50,000, far more than enough for a stable P75 and a usable P99. The risk is low-traffic segments, where 5% may starve the tail; gate those on P75 only, or raise their rate. The statistical reasoning is detailed in Statistical Significance Testing for Noisy CI.
