Dashboarding & Team Adoption

A performance budget that no one can see is a budget that no one defends. The gate may be green in CI, but the moment a regression slips through a tolerance window, or a vendor tag doubles in size, the only people who notice are the ones already running synthetic audits by hand. Dashboarding closes that gap: it turns the raw stream of lab and field measurements into a trend a frontend lead, an engineering manager, and an on-call engineer can each read at their own altitude. Adoption is the harder half — a dashboard nobody opens and an alert everyone mutes are worse than nothing, because they manufacture the illusion of coverage. This reference treats both as one engineering contract: the data pipeline that feeds the charts, the charts themselves, the alerts that fire on regression, and the team rituals that make the whole apparatus load-bearing.

The work divides into three coupled responsibilities. First, visualization: getting CI and real-user data into a store and onto panels that answer a specific question for a specific audience. Second, infrastructure: standing up the server that retains every run and the dashboards that query it. Third, adoption: ownership, review rituals, and a blameless triage process that keeps the gate trusted rather than resented. Each child reference below owns one of these; this page is the architecture that connects them.

Architecture Overview

Two data sources feed every performance dashboard. Synthetic runs from CI produce deterministic lab metrics on a fixed cadence, and Real User Monitoring (RUM) beacons produce field metrics at whatever percentile your traffic generates. Both land in durable storage, both are queried by a visualization layer, and both can trigger alerts that route into the rituals a team actually runs. The diagram traces that flow end to end.

Synthetic and field data converge in storage, surface as dashboards, and escalate as alerts into the rituals — with a named owner — that keep the budget defended.

The visualization tier is built in Visualizing Budget Trends with Grafana, the storage and server tier in Self-Hosting the Lighthouse CI Server, and the human tier — ownership and rituals — in Driving Team Performance Budget Adoption. The RUM half of the data flow originates from Custom Performance Beacons & RUM, and the synthetic half from the collection settings documented in Lighthouse CI Configuration & Storage.

Metric Selection & Visualization Matrix

Different audiences read different charts. An engineering manager wants a single trend line they can put in a quarterly review; an on-call engineer wants a per-route breakdown that pinpoints the regression; a frontend lead wants the lab-versus-field delta that tells them whether the budget is even calibrated to reality. Charting every metric for every audience produces dashboards no one reads. Match the metric, the percentile, and the visualization to the question being asked.

Audience	Question answered	Metric & percentile	Visualization	Cadence
Engineering manager	Are we trending toward or away from budget?	Performance score, P75 LCP	Single sparkline + budget line	Weekly rollup
Frontend lead	Is the lab gate calibrated to field reality?	Lab vs field LCP/INP delta	Dual-line overlay	Per release
On-call engineer	Which route or build regressed?	Per-route P75 LCP, INP, CLS	Heatmap by route	Per build
QA / release	Did this PR breach any budget?	Resource bytes vs ceiling	Stacked bar vs limit	Per pull request
Whole team	How much budget headroom is left?	Headroom % to error threshold	Bullet gauge	Sprint review
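
Two of the derived numbers in the matrix, budget headroom and the lab-versus-field delta, are simple arithmetic. The sketch below shows one plausible way to compute them; the function names and the exact definitions (headroom as the unused share of the budget, delta as field relative to lab) are assumptions for illustration, not definitions taken from a specific tool.

```python
def headroom_pct(value_ms: float, budget_ms: float) -> float:
    """Share of the budget still unused, in percent; negative once over budget."""
    if budget_ms <= 0:
        raise ValueError("budget_ms must be positive")
    return (budget_ms - value_ms) * 100.0 / budget_ms


def lab_field_delta_pct(lab_ms: float, field_ms: float) -> float:
    """How far the field value sits above (+) or below (-) the lab value, in percent."""
    if lab_ms <= 0:
        raise ValueError("lab_ms must be positive")
    return (field_ms - lab_ms) * 100.0 / lab_ms


# A 2000 ms P75 LCP against a 2500 ms budget leaves 20% headroom.
print(headroom_pct(2000, 2500))         # 20.0
# A field P75 of 2600 ms against a lab value of 2000 ms is a +30% delta.
print(lab_field_delta_pct(2000, 2600))  # 30.0
```

A large positive delta suggests the lab profile is more optimistic than real devices, which is the calibration question the dual-line overlay is meant to answer.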

The discipline is to keep field percentiles honest. Always label a chart with both the percentile and the environment — a P75 LCP on a mid-range mobile device under a Fast 3G profile is a different number from a P75 LCP on desktop cable, and a dashboard that conflates them quietly hides the regression that matters. The aggregation that produces these percentiles is built in the RUM reference's pipeline work; the dashboard only displays what that pipeline computes.
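
To make the conflation risk concrete, here is a minimal sketch that computes a nearest-rank P75 per (route, device class) segment and compares it with the blended figure. The beacon shape (`route`, `device_class`, `lcp_ms`) and the sample values are hypothetical, and nearest-rank is only one of several percentile definitions a real pipeline might use.

```python
import math
from collections import defaultdict


def percentile_nearest_rank(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q of the data at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def p75_by_segment(beacons):
    """Group LCP samples by (route, device_class) and return the P75 of each group."""
    groups = defaultdict(list)
    for b in beacons:
        groups[(b["route"], b["device_class"])].append(b["lcp_ms"])
    return {seg: percentile_nearest_rank(vals, 0.75) for seg, vals in groups.items()}


# Hypothetical traffic mix: twelve desktop beacons, four mobile beacons.
beacons = (
    [{"route": "/checkout", "device_class": "desktop", "lcp_ms": v}
     for v in (900, 1000, 1100, 1200)] * 3
    + [{"route": "/checkout", "device_class": "mobile", "lcp_ms": v}
       for v in (2400, 2700, 3000, 3300)]
)

segments = p75_by_segment(beacons)
blended = percentile_nearest_rank([b["lcp_ms"] for b in beacons], 0.75)
print(segments[("/checkout", "mobile")])  # 3000
print(blended)                            # 1200
```

With this mix the blended P75 reads 1200 ms, comfortably inside a 2500 ms budget, while the mobile segment sits at 3000 ms. That is the regression an unlabeled chart would hide.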

Tooling & Implementation

Three tools cover the common stack. Grafana renders panels over a time-series store and is the right default when you already run Prometheus, InfluxDB, or a Postgres source. The LHCI server ships its own trend UI at no extra cost and is the lowest-friction starting point. A commercial RUM product such as Datadog covers field data without you operating an ingestion pipeline. They compose: LHCI for the lab gate's history, Grafana for the unified board, and Datadog (or an equivalent) for field percentiles when self-hosting RUM is not worth the cost.

A Grafana panel is just JSON; you can version it next to your application and provision it on boot. The panel below charts P75 LCP per route against a 2500 ms budget line, the canonical "is the gate calibrated" view a frontend lead reviews each release.

{
  "title": "P75 LCP by route vs budget",
  "type": "timeseries",
  "datasource": { "type": "prometheus", "uid": "perf-tsdb" },
  "fieldConfig": {
    "defaults": {
      "unit": "ms",
      "custom": {
        "thresholdsStyle": { "mode": "line" }
      },
      "thresholds": {
        "mode": "absolute",
        "steps": [
          { "color": "green", "value": null },
          { "color": "red", "value": 2500 }
        ]
      }
    }
  },
  "targets": [
    {
      "refId": "A",
      "expr": "histogram_quantile(0.75, sum by (route, le) (rate(lcp_milliseconds_bucket[1h])))",
      "legendFormat": "{{route}}"
    }
  ]
}

The query reads an LCP histogram exported by your RUM ingestion and estimates the 75th percentile per route over a one-hour rolling window. A time series panel does not display thresholds by default, so the thresholdsStyle setting is what draws the 2500 ms step as a red line across the chart; a route whose field P75 crosses the budget is then visible at a glance. The threshold is the same number the lab gate asserts, so the dashboard and the gate never disagree on what "over budget" means. Building the full board, including INP and CLS panels, is covered in the Grafana reference.
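
It helps to know what histogram_quantile actually computes, because the result is an estimate rather than an exact percentile. The sketch below is a simplified Python rendering of the interpolation applied to classic cumulative buckets; it takes raw cumulative counts instead of rates, skips several edge cases the real function handles, and uses made-up bucket boundaries.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) sorted by upper_bound,
    with at least one finite bucket followed by the (math.inf, total) bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        raise ValueError("need finite buckets followed by a +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # The quantile falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    lower, below = buckets[idx - 1] if idx > 0 else (0.0, 0)
    # Assume observations are spread evenly inside the bucket and interpolate.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Hypothetical LCP buckets in milliseconds: (le, cumulative count).
buckets = [(1000, 40), (2000, 70), (3000, 90), (4000, 100), (math.inf, 100)]
print(histogram_quantile(0.75, buckets))  # 2250.0
```

Because the value is interpolated inside a bucket, its precision depends on the bucket layout. Placing a bucket boundary at the budget itself (2500 ms here) makes the "over budget" decision exact even when the interpolated value is not.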

CI/CD Integration

A dashboard is only as current as its last write. Every CI run that produces metrics should publish them to the store so the trend line never has a hole where a build should be. With the LHCI server as the backend, lhci upload handles this, and lhci autorun runs it as its final step after collection and assertion; the job below runs the assertion gate and then pushes results to the dashboard's storage on every push to the main branch.

name: Publish performance to dashboard
on:
  push:
    branches: [main]

jobs:
  collect-and-publish:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: "20"
          cache: "npm"
      - run: npm ci
      - run: npm run build
      - name: Collect and assert
        run: npx lhci autorun
        env:
          LHCI_TOKEN: ${{ secrets.LHCI_TOKEN }}
          LHCI_SERVER_BASE_URL: ${{ secrets.LHCI_SERVER_BASE_URL }}
      - name: Annotate Grafana with deploy marker
        if: success()
        # -f makes curl exit non-zero on an HTTP error, so a rejected annotation fails the step
        run: |
          curl -fsS -X POST "$GRAFANA_URL/api/annotations" \
            -H "Authorization: Bearer $GRAFANA_TOKEN" \
            -H "Content-Type: application/json" \
            -d "{\"text\":\"deploy ${GITHUB_SHA:0:7}\",\"tags\":[\"deploy\"]}"
        env:
          GRAFANA_URL: ${{ secrets.GRAFANA_URL }}
          GRAFANA_TOKEN: ${{ secrets.GRAFANA_TOKEN }}

The annotation step is what turns a trend line into a forensic tool: every deploy leaves a vertical marker on the chart, so when a regression appears the team can read off the exact commit that introduced it instead of bisecting by hand. Wire the same lhci autorun invocation that gates pull requests into this publish job so the lab numbers on the dashboard are the identical numbers the gate enforced — calibrated per the methodology in Lighthouse CI Configuration & Storage.

Observability & Regression Detection

A dashboard shows trends; alerting catches the regression while it is still cheap to revert. The signal that matters is not a single bad build — synthetic runs carry noise — but a sustained shift in the percentile. Configure alerts to fire only when the field or lab metric exceeds its budget for a window long enough to clear the noise floor, typically three consecutive evaluations or a one-hour persistence. A point-in-time alert on a noisy metric is an alert that will be muted within a week.
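
The persistence rule can be sketched in a few lines. This is a minimal illustration of the "three consecutive evaluations" variant, with a hypothetical function name and sample values; a real alerting system would express the same idea in its own rule configuration.

```python
def sustained_breach(values, budget, required=3):
    """True only if the most recent `required` evaluations all exceed the budget."""
    if required < 1:
        raise ValueError("required must be at least 1")
    recent = values[-required:]
    return len(recent) == required and all(v > budget for v in recent)


# A single noisy spike does not fire.
print(sustained_breach([2400, 2900, 2450, 2480], budget=2500))  # False
# Three consecutive breaching evaluations do.
print(sustained_breach([2450, 2600, 2700, 2650], budget=2500))  # True
```

Requiring the full window also means the alert stays quiet until enough evaluations exist, which avoids firing on the first sample after a deploy.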

Tie alert thresholds to the same ceilings the gate asserts so a fired alert always corresponds to a real budget breach. When field data is the trigger, route the alert with the offending route, the percentile, and the device class in the payload so the on-call engineer can act without first opening the dashboard. The statistical machinery for separating a real regression from variance lives in Automated Regression Detection, and the alert-routing specifics for budget breaches are built in the Grafana reference's alerting section.
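
As an illustration of a payload that lets the on-call engineer act without opening the dashboard, the sketch below carries the route, percentile, and device class alongside the observed value and the budget. The class and field names are hypothetical and not tied to any alerting product's schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BudgetBreachAlert:
    route: str
    metric: str
    percentile: int
    device_class: str
    observed: float
    budget: float
    unit: str = "ms"

    def summary(self) -> str:
        """One-line description suitable for a chat or pager notification."""
        over = self.observed - self.budget
        return (
            f"P{self.percentile} {self.metric} on {self.route} ({self.device_class}) "
            f"is {self.observed:g} {self.unit}, "
            f"{over:g} {self.unit} over the {self.budget:g} {self.unit} budget"
        )


alert = BudgetBreachAlert("/checkout", "LCP", 75, "mobile", observed=3000, budget=2500)
print(alert.summary())
# P75 LCP on /checkout (mobile) is 3000 ms, 500 ms over the 2500 ms budget
print(json.dumps(asdict(alert)))  # structured form for the alert router
```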

Failure Modes & Escalation Paths

Dashboards and alerts fail in predictable, human ways. Knowing the failure mode tells you the escalation.

Alert fatigue. A flaky metric fires nightly, the team adds a mute, and the mute outlives the flakiness. Escalation: fix the variance at the source (see the staging-variance and significance-testing references), then re-enable the alert with a persistence window. Never loosen the threshold to stop the noise.
Dashboards no one reads. A board with forty panels answers no one's question, so it is opened during onboarding and never again. Escalation: delete panels until each one maps to a row in the audience matrix above; a board read weekly beats a board built once.
Budget erosion. Each PR widens a tolerance "just this once" and six months later the gate asserts nothing real. Escalation: require a budget change to go through the exception workflow with a named owner and an expiry date, enforced by the adoption process.
Lab/field divergence. The gate is green but RUM shows a P75 regression because the lab profile no longer matches real devices. Escalation: recalibrate the lab throttling against field percentiles; the dual-line overlay panel exists precisely to surface this gap before it becomes a customer complaint.
Orphaned ownership. The engineer who built the dashboard leaves and no codeowner inherits it. Escalation: assign the board and its alerts to a team in the policy, never to an individual, so a departure does not silently retire the coverage.
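
The budget-erosion escalation above depends on exceptions that carry an owner and an expiry date. One way to make that checkable is to keep exceptions as data and flag the ones that have lapsed. The sketch below assumes a hypothetical record shape and made-up entries; it is not the exception workflow's actual format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BudgetException:
    metric: str
    route: str
    relaxed_budget: float
    owner: str
    expires: date

    def __post_init__(self):
        # An exception without a named owner is exactly the erosion to prevent.
        if not self.owner.strip():
            raise ValueError("a budget exception needs a named owner")


def expired_exceptions(exceptions, today):
    """Return exceptions whose expiry date has passed and must be renewed or removed."""
    return [e for e in exceptions if e.expires < today]


exceptions = [
    BudgetException("LCP", "/checkout", 3000, "payments-web", date(2025, 3, 31)),
    BudgetException("INP", "/search", 300, "discovery-web", date(2025, 9, 30)),
]
for e in expired_exceptions(exceptions, today=date(2025, 6, 1)):
    print(f"{e.metric} exception on {e.route} (owner: {e.owner}) expired {e.expires}")
# LCP exception on /checkout (owner: payments-web) expired 2025-03-31
```

Run as a CI check, a non-empty result could fail the build so a lapsed tolerance cannot silently become the new budget.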

Every one of these is ultimately an adoption failure, not a tooling failure, which is why the human tier of this reference is as load-bearing as the data tier.

Frequently Asked Questions

Should I build dashboards on the LHCI server or on Grafana?

Start with the LHCI server's built-in trend UI — it ships free and answers the "did the lab gate regress" question immediately. Move to Grafana once you need to overlay field RUM data, chart per-route heatmaps, or put performance on a shared operational board next to other service metrics. Many teams run both: LHCI for lab history, Grafana for the unified view.

Why do my performance alerts get muted within a week?

Almost always because they fire on a single noisy sample instead of a sustained shift. Require three consecutive breaching evaluations or a one-hour persistence window before an alert fires, and fix the underlying variance rather than raising the threshold. An alert that only fires on real regressions stays trusted; one that cries wolf gets muted and then nobody notices the real breach.

How do I keep a dashboard from becoming write-only?

Tie every panel to a question a named audience actually asks, assign the board to a team via codeowners, and review it in an existing ritual such as sprint review rather than expecting people to visit it spontaneously. The adoption process in Driving Team Performance Budget Adoption covers making the dashboard load-bearing.
