Local-first voice dictation
Talk to your computer.
Words appear.
Quill captures your mic, transcribes with Whisper on-device, polishes
with embedded llama.cpp when you ask for it, and pastes into whatever
field you have focused.
F8 is quick raw dictation. F9 is enhanced. Everything runs on your machine.
Universal arm64 + x86_64 macOS .dmg, signed and notarized. Linux x86_64 .AppImage and .deb also available. Closed-alpha, proprietary.
Local-first means local-first.
No transcript leaves the machine by default. The polish pass runs against a SHA256-pinned model the app manages itself. Any remote provider is opt-in per-config and surfaces a startup warning. The daemon never logs raw transcripts at the default tracing level. That is the contract.
Record, transcribe, polish, paste
Four stages, all on-device.
Each stage is a separate Rust crate, so raw and enhanced dictation stay predictable. Hold the hotkey, talk, release. The daemon does the rest without a single byte leaving your machine on the default path.
Daemon opens the mic via cpal. webrtc-vad trims silence and auto-stops.
On-device transcription with whisper-rs. Metal on Apple Silicon, CPU elsewhere.
F9 only: embedded llama.cpp against the verified GGUF, with a raw-text fallback.
Clipboard paste, or the macOS Accessibility API, into the focused field.
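The four stages above might compose roughly as follows. This is a minimal sketch, not Quill's actual code: `DictationPath`, `run_pipeline`, and the closure-based stage stubs are all hypothetical names chosen for illustration. It shows only the control flow the page describes, where F8 skips polish and F9 falls back to the raw text if polish fails.

```rust
/// Which hotkey path triggered the dictation (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DictationPath {
    /// F8: transcribe and paste, no polish pass.
    Quick,
    /// F9: polish pass, with a raw-text fallback.
    Enhanced,
}

/// Hypothetical pipeline driver. `transcribe` and `polish` stand in for
/// the real speech-to-text and LLM stages; the returned string is the
/// text that would be handed to the paste stage.
pub fn run_pipeline<T, P>(
    path: DictationPath,
    audio: &[f32],
    transcribe: T,
    polish: P,
) -> Result<String, String>
where
    T: Fn(&[f32]) -> Result<String, String>,
    P: Fn(&str) -> Result<String, String>,
{
    // A transcription failure aborts the run: there is nothing to paste.
    let raw = transcribe(audio)?;

    let text = match path {
        DictationPath::Quick => raw,
        DictationPath::Enhanced => match polish(&raw) {
            Ok(polished) => polished,
            // Raw-text fallback: a failed polish pass never loses the dictation.
            Err(_) => raw,
        },
    };
    Ok(text)
}
```

Passing the stages in as closures keeps the sketch self-contained; a real daemon would presumably call into the per-stage crates instead.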
First dictation, from a terminal
$ quill init
✓ model ready: ggml-base.en.bin
# Confirm your mic is visible
$ quill-daemon devices
# Hold-to-talk on F8, release to paste
$ quill-daemon listen --key f8
Drive it live over IPC
$ quill ping
pong
# Live-switch a hotkey's paste strategy, no restart
$ quill set-inject-mode enhanced clipboard
# Wayland: let the compositor own the keybind
$ quill press quick
$ quill release quick
Seven polish styles
Pick how the polish pass rewrites you.
On the F9 enhanced path, the embedded model rewrites the raw transcript through one of seven prompt templates. casual is the default. no-polish is a passthrough: Whisper text, untouched. Set it in Settings or via the polish_template field.
casual (default)
Light cleanup. Conversational tone, contractions kept.
Slack, quick notes, PR comments
formal
Proper grammar, no contractions, business voice.
Email bodies, reports, customer replies
technical
Preserve technical terms, code, and command names verbatim.
Code review, bug reports, eng chat
bullets
Restructure continuous speech into a bulleted list.
Meeting notes, action items, stand-ups
concise
Shorter, fewer words, filler and hedging removed.
Commit messages, status updates
email
Format as an email body. Greeting, body, and sign-off when context fits.
Dictating email replies
no-polish
Pass the raw Whisper transcript through unchanged.
Fastest path, no LLM pass at all
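One way to model the seven styles in Rust is sketched below. The style strings come from the list above; the `PolishStyle` enum, its `is_passthrough` helper, and the idea that the `polish_template` field is parsed this way are assumptions for illustration, not Quill's actual types.

```rust
use std::str::FromStr;

/// Hypothetical enum for the seven polish styles. `casual` is the default.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum PolishStyle {
    #[default]
    Casual,
    Formal,
    Technical,
    Bullets,
    Concise,
    Email,
    NoPolish,
}

impl PolishStyle {
    /// True when the LLM pass should be skipped and the Whisper text used as-is.
    pub fn is_passthrough(self) -> bool {
        self == PolishStyle::NoPolish
    }
}

impl FromStr for PolishStyle {
    type Err = String;

    /// Parses the style names exactly as they appear in the list above.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "casual" => Ok(Self::Casual),
            "formal" => Ok(Self::Formal),
            "technical" => Ok(Self::Technical),
            "bullets" => Ok(Self::Bullets),
            "concise" => Ok(Self::Concise),
            "email" => Ok(Self::Email),
            "no-polish" => Ok(Self::NoPolish),
            other => Err(format!("unknown polish style: {other}")),
        }
    }
}
```

Rejecting unknown names instead of silently falling back to the default is a design choice of this sketch; it makes a typo in a config file visible.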
Key concepts
The vocabulary; the daemon composes everything else from these.
- F8 (quick)
- The fast path, on by default: capture, transcribe, paste. No polish pass, lowest latency.
- F9 (enhanced)
- Adds the local polish pass before paste. Recommended but off until you set a binding for it.
- polish style
- One of seven prompt templates: casual, formal, technical, bullets, concise, email, no-polish.
- inject mode
- Per-hotkey: clipboard, clipboard-only, or keystroke. Live-switch with quill set-inject-mode.
- polish backend
- embedded by default (bundled llama.cpp). remote is the opt-in escape hatch and warns on non-loopback hosts.
- daemon
- The long-running process that owns the global hotkey via rdev and drives the whole pipeline over IPC.
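The per-hotkey inject mode can be pictured as a small table the daemon updates in place. The sketch below is hypothetical: `Hotkey`, `InjectMode`, and `InjectConfig` are illustrative names, and the clipboard default for both hotkeys is an assumption drawn from the page's statement that clipboard paste is the default. Only the three mode strings come from the text.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// The two hotkey slots named on this page (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Hotkey {
    Quick,
    Enhanced,
}

/// The three inject modes listed above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InjectMode {
    Clipboard,
    ClipboardOnly,
    Keystroke,
}

impl FromStr for InjectMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "clipboard" => Ok(Self::Clipboard),
            "clipboard-only" => Ok(Self::ClipboardOnly),
            "keystroke" => Ok(Self::Keystroke),
            other => Err(format!("unknown inject mode: {other}")),
        }
    }
}

/// Per-hotkey inject modes, switchable at runtime without a restart.
pub struct InjectConfig {
    modes: HashMap<Hotkey, InjectMode>,
}

impl InjectConfig {
    /// Starts with clipboard paste for both hotkeys (assumed default).
    pub fn new() -> Self {
        let mut modes = HashMap::new();
        modes.insert(Hotkey::Quick, InjectMode::Clipboard);
        modes.insert(Hotkey::Enhanced, InjectMode::Clipboard);
        Self { modes }
    }

    /// Applies a request shaped like `quill set-inject-mode enhanced clipboard`.
    /// An invalid mode name leaves the current setting untouched.
    pub fn set(&mut self, key: Hotkey, mode: &str) -> Result<(), String> {
        let parsed: InjectMode = mode.parse()?;
        self.modes.insert(key, parsed);
        Ok(())
    }

    pub fn get(&self, key: Hotkey) -> InjectMode {
        self.modes.get(&key).copied().unwrap_or(InjectMode::Clipboard)
    }
}
```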
The privacy contract
Dictate without trusting a cloud.
Quill is built so the private path is the default path. The embedded polish model is pinned and verified; remote is the loud exception, not the rule. Here is exactly what that buys you.
- Audio never leaves the machine. Whisper runs on-device. There is no transcript network round-trip on the default path.
- Telemetry and crash reports OFF. Both are opt-in and default OFF. Report-a-problem attaches only redacted log tails.
- No raw transcript logging. The daemon refuses to log raw transcripts at the default tracing level. That is enforced, not advised.
- SHA256-pinned polish model. The embedded GGUF (Qwen3 4B Q4_K_M) is verified against a pinned SHA256 before first use.
- Remote is explicit and loud. Pointing polish at a remote endpoint is per-config opt-in and fires a startup warning for non-loopback hosts.
- Proprietary, invite-only alpha. Closed-alpha binaries are gated through Discord during dogfooding, not anonymous download.
The pinned polish model
model = "Qwen3 4B Q4_K_M"
repo = "Qwen/Qwen3-4B-GGUF"
file = "Qwen3-4B-Q4_K_M.gguf"
sha256 = 7485fe6f11af...
# Verified before first use. No Ollama server.
backend = "embedded"
Polish backends: embedded (default), system, remote. Only remote leaves the machine, and only after you opt in.
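The "remote is loud" rule could be checked at startup along these lines. This is a sketch under stated assumptions: `PolishBackend`, `is_loopback_host`, and `startup_warning` are hypothetical names, the warning text is invented, and treating `localhost` plus loopback IP literals as the only quiet hosts is one reasonable reading of "non-loopback". It uses only the standard library's `IpAddr::is_loopback`.

```rust
use std::net::IpAddr;

/// The three polish backends named above (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PolishBackend {
    Embedded,
    System,
    Remote { host: String },
}

/// Returns true when `host` is `localhost` or a loopback IP literal
/// such as `127.0.0.1` or `[::1]`. No DNS lookup is performed.
pub fn is_loopback_host(host: &str) -> bool {
    if host.eq_ignore_ascii_case("localhost") {
        return true;
    }
    // Strip the brackets that wrap IPv6 literals in URLs.
    let bare = host.trim_start_matches('[').trim_end_matches(']');
    bare.parse::<IpAddr>()
        .map(|ip| ip.is_loopback())
        .unwrap_or(false)
}

/// Returns a warning for remote, non-loopback backends and nothing otherwise.
pub fn startup_warning(backend: &PolishBackend) -> Option<String> {
    match backend {
        PolishBackend::Remote { host } if !is_loopback_host(host) => Some(format!(
            "polish backend is remote: transcripts will be sent to {host}"
        )),
        _ => None,
    }
}
```

Skipping DNS resolution means a hostname that happens to resolve to a loopback address still triggers the warning, which errs on the loud side.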
What it looks like
Paper-like sheets instead of glowing panels. The setup view runs once. The live view is where you spend your time. Settings is one keystroke away.
What you get
Local-first by default
Audio never leaves your machine. Whisper runs on-device; enhanced dictation uses embedded llama.cpp against a verified GGUF in Quill's model cache. No API keys, no transcript network round-trip. Crash reports and usage telemetry both default OFF. The daemon refuses to log raw transcripts at the default tracing level.
Whisper with Metal acceleration
Speech-to-text via whisper-rs (whisper.cpp under the hood). Metal on Apple Silicon, CPU fallback everywhere else. Curated picker covers base.en, base, small.en, small, medium.en, medium. Quill manages the downloads into ~/.cache/quill/models/.
Enhanced polish without Ollama
After Whisper transcribes, a local Qwen3 4B Q4_K_M GGUF cleans up filler words, fixes punctuation, and disambiguates homophones through embedded llama.cpp. Pick one of seven styles. No Ollama install, no local HTTP server. A custom Ollama-compatible endpoint stays an explicit opt-in escape hatch.
F8 raw, F9 enhanced
F8 is the fast path: transcribe and paste. F9 adds the local polish pass before paste, and is recommended but off until you bind it. Both use the same hold-talk-release loop, and both leave the final text on the clipboard if automatic paste fails.
Pastes into any focused field
Quill uses the clipboard paste path by default and can use macOS Accessibility for richer focused-field writes (kAXSelectedText, falling back to keystrokes when a field is not AX-writable). Works in your editor, your browser, your terminal, your chat app.
Click-to-capture hotkey picker
Open Settings, click the hotkey field, press the binding you want: bare modifiers, function keys, or full chords like Cmd+Shift+Space, with reserved-combo warnings. The daemon picks changes up live over IPC: no restart, no TOML editing.
First-run model setup
First launch walks through mic and Accessibility permissions, downloads the Whisper model and embedded polish GGUF, then verifies the GGUF before use. If something breaks later, WHAT / WHY / DO error banners with stable IDs explain what happened, why, and what to do.
In-app updater
Background download, signature verify (macOS spctl), install on next launch. A 'What's new' card surfaces release highlights when the version bumps, sourced from a TOML asset baked into the binary, with no network call.
Paper-first app shell
Quill defaults to large Literata reading type, no-glow layered sheets, a live transcript-first recording layout, and AAA-checked Paper, Light, Dark, OLED, Tan, Brown, Blue, Red, Pink, Green, and Grey themes.
Stack
Pure Rust workspace, one crate per pipeline stage, plus a thin iced GUI for the app shell.
| Layer | Crate | Tool |
|---|---|---|
| Audio capture + VAD | quill-audio | cpal + webrtc-vad |
| Speech-to-text | quill-stt | whisper-rs (Metal on macOS) |
| LLM polish | quill-polish | Embedded llama.cpp + verified GGUF (BYO endpoint optional) |
| Text insertion | quill-inject | arboard clipboard · enigo keystroke · macOS Accessibility |
| Hotkey + pipeline | quill-daemon | rdev + tokio |
| CLI | quill-cli | clap |
| GUI app | quill-app | iced |
Status
v0.1.0-alpha.18 is shipping on macOS and Linux. What works today, what we are hardening, what is still planned. Listed honestly.
- Universal arm64 + x86_64 macOS .dmg, signed, notarized, stapled
- Linux x86_64 .AppImage and .deb packages
- Whisper STT with Metal on macOS; curated model picker
- Embedded llama.cpp polish with verified Qwen3 4B Q4_K_M GGUF
- F8 quick raw dictation by default; F9 recommended for enhanced (off until you set it)
- Seven polish styles: casual, formal, technical, bullets, concise, email, no-polish
- Click-to-capture hotkey picker with reserved-combo warnings
- Clipboard paste by default, with macOS Accessibility for richer fields
- 30-second first-run tour, WHAT / WHY / DO error banners
- In-app updater with background download + signature verify
- Crash reporter + usage telemetry: both opt-in, default OFF
- Polish quality tuning for the embedded Qwen path
- Long-recording UI and clearer enhance progress feedback
- App shell lifecycle and icon unification polish
- Windows packaging
- ARM64 Linux build
- Wayland-aware injection on Linux
- Per-app custom polish prompts
Built on the CorvidLabs spine
The same lanes, specs, and agent contracts as every other CorvidLabs project.