How it works

Specs in, providers chosen, code verified.

Merlin reads your specs, picks a provider, runs fledge plugins for tools, and verifies the output. Eight pieces, one agent loop. Your machine, your keys, your code.

Spec-driven development

Specs go in, correct code comes out.

Merlin reads your module specs before writing a line of code. Invariants, public API, and error cases become hard constraints in the system prompt: the same *.spec.md contracts spec-sync enforces in CI.

spec-aware planning

$ cat specs/api/auth.spec.md
# Authentication
- JWT tokens, 24h expiry
- Refresh token rotation
- bcrypt password hashing (cost=12)
$ merlin "Implement auth per spec"
✓ Implementation matches all spec requirements
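
To make the idea concrete, here is a minimal sketch of how spec bullets could be turned into prompt constraints. It is illustrative only: the function names `extract_requirements` and `build_system_prompt` and the prompt wording are assumptions, not Merlin's actual implementation.

```python
from typing import List


def extract_requirements(spec_text: str) -> List[str]:
    """Collect the bullet items ("- ...") from a spec file's text."""
    requirements = []
    for line in spec_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            requirements.append(stripped[2:].strip())
    return requirements


def build_system_prompt(spec_text: str) -> str:
    """Render spec bullets as numbered hard constraints (hypothetical wording)."""
    reqs = extract_requirements(spec_text)
    lines = ["Hard constraints from the module spec:"]
    lines += [f"{i}. {req}" for i, req in enumerate(reqs, start=1)]
    return "\n".join(lines)


# Spec text copied from the example on this page.
SPEC = """# Authentication
- JWT tokens, 24h expiry
- Refresh token rotation
- bcrypt password hashing (cost=12)
"""

print(build_system_prompt(SPEC))
```

A real spec also carries invariants, public API, and error cases; this sketch only handles flat bullets.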

Multi-provider

31 providers, one interface.

Anthropic, OpenAI (11 SKUs incl. gpt-5 / o1 / o3 / 4o), OpenRouter (×5 vendors, one key), Groq, Together, and 11 Ollama Cloud models. Swap providers with a flag; your code and keys stay on your machine.

provider switching

# Switch providers with one flag
$ merlin --provider claude "Refactor auth"
Using claude-sonnet-4-6 via Anthropic
$ merlin --provider ollama "Refactor auth"
Using qwen3-coder:480b via Ollama Cloud

Plugin architecture

Every tool is a plugin you can swap.

Bundled plugins cover filesystem, code search, shell, git, spec-sync, snapshots, runtime checks, media, in-loop sub-agents, and the Discord + Telegram bridges. Write your own in any language; it's just a binary that speaks JSON-lines over the fledge-v1 protocol.

fledge.toml

[merlin.tools]
files = "plugins/fledge-plugin-files"
search = "plugins/fledge-plugin-search"
git = "plugins/fledge-plugin-git"
specsync = "plugins/fledge-plugin-specsync"
vision = "plugins/fledge-plugin-vision"
# Add your own. It's just a binary
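
As a rough sketch of what "a binary that speaks JSON-lines" can look like, the Python below reads one JSON object per line and writes one back. The field names (`type`, `name`, `args`, `content`) are taken from the protocol trace on this page; the `echo` tool, its `text` argument, and the error wording are hypothetical, and the real fledge-v1 protocol may require a handshake or extra fields not shown here.

```python
import io
import json
from typing import TextIO


def handle_line(line: str) -> str:
    """Turn one JSON-lines request into one JSON-lines response."""
    msg = json.loads(line)
    if msg.get("type") == "tool_call" and msg.get("name") == "echo":
        # Hypothetical tool: return the "text" argument unchanged.
        content = str(msg.get("args", {}).get("text", ""))
    else:
        content = f"unsupported message: {msg.get('type')}/{msg.get('name')}"
    return json.dumps({"type": "tool_result", "content": content})


def serve(inp: TextIO, out: TextIO) -> None:
    """One JSON object per line in, one per line out."""
    for line in inp:
        if line.strip():
            out.write(handle_line(line) + "\n")
            out.flush()


# Demo with in-memory streams. A real plugin would import sys and
# call serve(sys.stdin, sys.stdout) instead.
demo_in = io.StringIO('{"type":"tool_call","name":"echo","args":{"text":"hi"}}\n')
demo_out = io.StringIO()
serve(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Keeping the per-line handler separate from the I/O loop makes the plugin easy to test without spawning a process.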

Sub-agents

Delegate work without filling the parent's context.

subagent-spawn hands a self-contained subtask to a child Merlin process. The child runs its own full loop and returns a compact JSON envelope; the parent's working memory stays small no matter how wide it fans out. The default tier is tool, and recursion is capped at depth 2.

subagent-spawn

# Sub-agents keep the parent's context small
$ subagent-spawn { label: "summarize-files", ... }
⚙ subagent-spawn [7.0s] ✓
{ ok: true, depth: 1, tier: "tool",
tool_calls: 1, input_tokens: 8312, output_tokens: 42 }
# Parent saw ~250 tokens, not the whole file
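
The depth cap and compact envelope can be pictured with the sketch below. It is an assumption-laden illustration, not Merlin's code: `spawn_subagent`, `fake_child`, and the `summary` and `error` fields are hypothetical, and it assumes depths 1 and 2 are allowed while depth 3 is refused.

```python
from typing import Callable, Dict

MAX_DEPTH = 2  # recursion cap stated on this page


def spawn_subagent(label: str, parent_depth: int,
                   run_child: Callable[[str], str]) -> Dict[str, object]:
    """Run a child task and hand only a compact envelope back to the parent."""
    depth = parent_depth + 1
    if depth > MAX_DEPTH:
        return {"ok": False, "depth": depth, "error": "depth cap reached"}
    summary = run_child(label)  # the child's full transcript stays in the child
    return {"ok": True, "depth": depth, "tier": "tool", "summary": summary}


def fake_child(label: str) -> str:
    # Stand-in for a real child Merlin process.
    return f"done: {label}"


print(spawn_subagent("summarize-files", parent_depth=0, run_child=fake_child))
print(spawn_subagent("too-deep", parent_depth=2, run_child=fake_child))
```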

Media plugins

Agents that can see and hear.

The vision plugin sends images to a local Ollama model and returns descriptions. The voice plugin transcribes audio with Whisper and synthesizes replies. The same agent loop, with new senses; bridges save attachments where these plugins can find them.

vision + voice

# Agents that can see and hear
$ vision-describe "/tmp/screenshot.png"
A web app login form with email + password
fields and a blue Sign In button.
$ voice-transcribe "/tmp/voice-note.ogg"
# Same agent loop, new senses

Bridges

Run Merlin from Discord and Telegram.

First-class bridges so your team can @mention Merlin or run slash commands from any channel. Reply chains become threaded sessions, live progress shows the active tool, and each channel keeps its own session context. Image + voice attachments route through the media plugins automatically.

bridges/discord

# Talk to Merlin from your server
@Merlin refactor the auth middleware
Merlin (openrouter | claude-sonnet-4-6)
Thinking… 12s read_file 4,210in / 821out
/session new · /plugins · /status

Fledge protocol

Open protocol. You can read every message.

Merlin is built on fledge-v1, a JSON-lines protocol for agent-tool communication. Every tool call, every response, fully inspectable. Stream the same NDJSON over stdout with --output ndjson for scripting.

protocol trace

# fledge-v1: JSON-lines protocol
{"type":"tool_call","name":"read","args":{...}}
{"type":"tool_result","content":"fn main() {…}"}
{"type":"text","content":"I see the entry point…"}
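
Because each message is one JSON object per line, a consumer needs very little code. The sketch below parses a stream and counts messages by `type`; the sample lines are invented for the demo (their `args` and `content` values are placeholders), and only the `type`, `name`, `args`, and `content` fields shown in the trace are assumed to exist.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Iterator


def parse_ndjson(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one decoded message per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Invented sample lines in the shape of the trace above.
SAMPLE = [
    '{"type":"tool_call","name":"read","args":{}}',
    '{"type":"tool_result","content":"placeholder file contents"}',
    '{"type":"text","content":"placeholder reply"}',
]

messages = list(parse_ndjson(SAMPLE))
counts = Counter(m["type"] for m in messages)
print(dict(counts))
```

In a script, the same function could be fed `sys.stdin` with Merlin's `--output ndjson` stream piped in.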

Verification

A passing verify is the rollback anchor.

Specs are the contract, and the fledge lanes run verify command is the success oracle. When a verify loop exhausts its retries, Merlin rolls files back to the last green tree state. Nobody else treats a passing verify as the rollback anchor.

verify loop

$ merlin "Add the rate limiter"
→ fledge lanes run verify
✗ test rate_limiter::burst (retry 1/3)
→ fledge lanes run verify
✓ verify passed, tree committed as green
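
The retry-then-rollback behaviour can be sketched as follows. This is a simplified model under stated assumptions: an in-memory dict stands in for the working tree, `attempt_fix` and `verify` are hypothetical callbacks, and the limit of 3 comes from the "retry 1/3" line in the trace. How Merlin actually snapshots and restores files is not described on this page.

```python
from typing import Callable, Dict


def verify_loop(
    tree: Dict[str, str],
    attempt_fix: Callable[[Dict[str, str], int], None],
    verify: Callable[[Dict[str, str]], bool],
    max_retries: int = 3,
) -> bool:
    """Retry until verify passes; otherwise restore the last green snapshot."""
    last_green = dict(tree)  # snapshot of the last tree that passed verify
    for attempt in range(1, max_retries + 1):
        attempt_fix(tree, attempt)
        if verify(tree):
            return True  # the caller can record this tree as the new green state
    # Retries exhausted: roll files back to the last green state.
    tree.clear()
    tree.update(last_green)
    return False


def fix(tree: Dict[str, str], attempt: int) -> None:
    # Hypothetical edit: each attempt rewrites the file with a new version tag.
    tree["limiter.rs"] = f"v{attempt}"


passing = {"limiter.rs": "v0"}
print(verify_loop(passing, fix, lambda t: t["limiter.rs"] == "v2"), passing)

failing = {"limiter.rs": "v0"}
print(verify_loop(failing, fix, lambda t: False), failing)
```

In the first run verify passes on the second attempt and the edited tree is kept; in the second run every attempt fails and the tree returns to its starting contents.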

Transparency

We publish our benchmarks.

26 test suites, 168 tests, including tool-augmented modes, updated with every release. The live data and per-provider breakdown stay on Merlin's own site.