1.0 Performance Gate
The Merlin 1.0 release gate turns benchmark and runtime expectations
into one repeatable report. It reads raw merlin bench JSON, combines it
with release dry-run timing samples, writes a Markdown report, and exits
non-zero when a metric fails without a release-manager waiver.
Run The Gate
fledge lanes run performance-gate
For release candidates, pass the provider set and timing samples that must be represented in the report:
MERLIN_REQUIRED_PROVIDERS=openai,ollama \
MERLIN_VERIFY_DURATIONS_MS=87306,84210,89902 \
MERLIN_STREAM_TTFT_MS=950,1210,1380 \
fledge lanes run performance-gate
Set MERLIN_PERFORMANCE_REPORT to write the release report to a specific
artifact path, for example one used for tagging or release-note links.
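To make the timing inputs concrete, here is a minimal sketch of how comma-separated samples could be reduced to the median and p50 figures that the SLOs refer to. The helper name `parse_samples` is hypothetical, and treating p50 as the plain median is an assumption; the gate's own parsing and percentile method may differ.

```python
import statistics


def parse_samples(raw: str) -> list[float]:
    """Parse comma-separated millisecond samples, skipping empty entries."""
    return [float(part) for part in raw.split(",") if part.strip()]


# Sample values copied from the release-candidate command above.
env = {
    "MERLIN_VERIFY_DURATIONS_MS": "87306,84210,89902",
    "MERLIN_STREAM_TTFT_MS": "950,1210,1380",
}

# Assumption: p50 is computed as the ordinary median of the samples.
verify_median = statistics.median(parse_samples(env["MERLIN_VERIFY_DURATIONS_MS"]))
ttft_p50 = statistics.median(parse_samples(env["MERLIN_STREAM_TTFT_MS"]))

print(verify_median)  # 87306.0
print(ttft_p50)       # 1210.0
```

With these samples, both figures fall under the targets of 90000 ms and 1500 ms.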
SLOs
| SLO | Target |
|---|---|
| Provider p95 latency | <= 8000 ms on basic and reasoning for every evaluated provider |
| Adversarial bundle cost | <= $0.10 across engineering, roleplaying, and refusal |
| Tool-call budget | <= 1.5x expected tool calls for every evaluated provider |
| Verify lane median | <= 90000 ms |
| Streaming TTFT p50 | <= 1500 ms |
Values close to the limit are reported as WARN. Missing data or a
target miss is reported as FAIL.
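The status rules can be sketched as a small classifier for upper-bound targets. The function name `classify` and the `warn_ratio` of 0.9 are assumptions made for illustration; this page does not say how close to the limit a value must be before the gate reports WARN.

```python
def classify(value, target, warn_ratio=0.9):
    """Return PASS, WARN, or FAIL for an upper-bound SLO.

    warn_ratio is a hypothetical threshold: values at or above
    warn_ratio * target (but not over target) are reported as WARN.
    """
    if value is None:
        return "FAIL"  # missing data is a failure
    if value > target:
        return "FAIL"  # target miss
    if value >= warn_ratio * target:
        return "WARN"  # within target, but close to the limit
    return "PASS"


print(classify(87306, 90000))  # WARN under the assumed 0.9 ratio
print(classify(1210, 1500))    # PASS
print(classify(None, 8000))    # FAIL
```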
Provider Scope
Set MERLIN_REQUIRED_PROVIDERS to the comma-separated list of providers
that must be included in the release report. When it is set, the latency,
cost, and tool-budget SLOs evaluate only that provider set. When it is
unset, the gate evaluates every provider present in the raw benchmark
archive.
Latency and tool-call checks are per-provider: every evaluated provider
needs both basic and reasoning runs plus expected_tool_calls
metadata. A strong provider cannot hide missing or over-budget data from
another provider.
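The scoping and per-provider rules can be sketched as follows. The archive layout and the field names `p95_ms` and `tool_calls` are hypothetical, and the numbers are made up for illustration; only `expected_tool_calls`, the 8000 ms target, and the 1.5x budget come from this page.

```python
P95_TARGET_MS = 8000      # from the SLO table
TOOL_BUDGET_FACTOR = 1.5  # from the SLO table


def evaluated_providers(required_env, archive_providers):
    """Use the required set when given, else every provider in the archive."""
    required = [p.strip() for p in (required_env or "").split(",") if p.strip()]
    return required or sorted(archive_providers)


def provider_problems(runs):
    """List SLO problems for one provider.

    `runs` maps a suite name to a dict with the hypothetical keys
    p95_ms, tool_calls, and expected_tool_calls.
    """
    problems = []
    for suite in ("basic", "reasoning"):
        run = runs.get(suite)
        if run is None:
            problems.append(f"{suite}: missing run")
            continue
        p95 = run.get("p95_ms")
        if p95 is None or p95 > P95_TARGET_MS:
            problems.append(f"{suite}: p95 latency missing or over target")
        expected = run.get("expected_tool_calls")
        calls = run.get("tool_calls")
        if expected is None or calls is None:
            problems.append(f"{suite}: missing tool-call data")
        elif calls > TOOL_BUDGET_FACTOR * expected:
            problems.append(f"{suite}: tool calls over budget")
    return problems


# Made-up archive: ollama has no reasoning run and exceeds its tool budget.
archive = {
    "openai": {
        "basic": {"p95_ms": 3100, "tool_calls": 2, "expected_tool_calls": 2},
        "reasoning": {"p95_ms": 7400, "tool_calls": 3, "expected_tool_calls": 2},
    },
    "ollama": {
        "basic": {"p95_ms": 5200, "tool_calls": 4, "expected_tool_calls": 2},
    },
}

for name in evaluated_providers("openai,ollama", archive):
    print(name, provider_problems(archive.get(name, {})))
```

Because problems are collected per provider, the clean openai result does nothing to offset the two problems reported for ollama.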
The gate never reads API keys and never contacts providers. Missing keys, quota failures, unavailable providers, and skipped runs all surface as missing raw benchmark data, which the gate reports as FAIL.
Pricing Data
The adversarial cost gate requires all 16 tests from engineering,
roleplaying, and refusal. Zero-cost rows are treated as missing
pricing data because Merlin records 0.0 when a provider has no
configured per-token pricing.
Providers without configured pricing need pricing added to fledge.toml
or an explicit release-manager waiver.
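A minimal sketch of the cost rule, assuming each benchmark row carries hypothetical `suite` and `cost_usd` fields. The split of the 16 tests across the three groups and the per-test costs below are invented for the example, and whether the real gate sums cost per provider or across providers is not stated here.

```python
REQUIRED_TESTS = 16  # from the text above
BUDGET_USD = 0.10    # from the SLO table
SUITES = {"engineering", "roleplaying", "refusal"}


def adversarial_cost(rows):
    """Return (status, total) for the adversarial bundle.

    A row whose cost is 0.0 or absent counts as missing pricing data.
    """
    relevant = [r for r in rows if r["suite"] in SUITES]
    unpriced = [r for r in relevant if not r.get("cost_usd")]
    if len(relevant) < REQUIRED_TESTS or unpriced:
        return "FAIL", None
    total = sum(r["cost_usd"] for r in relevant)
    return ("PASS" if total <= BUDGET_USD else "FAIL"), total


# Invented data: 16 tests at $0.005 each, split arbitrarily across suites.
names = ["engineering"] * 6 + ["roleplaying"] * 5 + ["refusal"] * 5
rows = [{"suite": s, "cost_usd": 0.005} for s in names]
print(adversarial_cost(rows)[0])  # PASS

# One zero-cost row is treated as missing pricing, so the bundle fails.
rows[0]["cost_usd"] = 0.0
print(adversarial_cost(rows))  # ('FAIL', None)
```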
Waivers
By default, any FAIL makes the gate exit non-zero. To continue
intentionally, set MERLIN_PERFORMANCE_WAIVER to the path of a non-empty
file that contains the text "Release manager:". For example:
Release manager: Leif
Date: 2026-05-31
Scope: Streaming TTFT p50
Reason: TTFT instrumentation is tracked separately and is not persisted
in bench JSON yet.
Follow-up: Add persisted TTFT samples before public 1.0.
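The waiver requirement can be sketched as a small validation function. The name `waiver_is_valid` is hypothetical, and the check mirrors only what this page states (a non-empty file containing "Release manager:"); the gate may validate more than that.

```python
import os
import tempfile


def waiver_is_valid(path):
    """Accept a waiver only if the file exists, is non-empty,
    and contains the text 'Release manager:'."""
    if not path or not os.path.isfile(path):
        return False
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    return bool(text.strip()) and "Release manager:" in text


# Write a waiver resembling the example above to a temporary file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".md", delete=False, encoding="utf-8"
) as fh:
    fh.write("Release manager: Leif\nScope: Streaming TTFT p50\n")
    waiver_path = fh.name

print(waiver_is_valid(waiver_path))  # True
print(waiver_is_valid(""))           # False: variable unset or empty
os.remove(waiver_path)
```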