shipped 2026 · private utility · runs in a tray
Vox
The internal name is VoiceType; the showcase name is Vox.
- Built for: One person. Me, at the keyboard, all day, every day, when typing is slower than speaking.
- Not built for: Anyone who needs an account, a cloud upload, or a multi-language translation surface. Vox transcribes English at the cursor, full stop.
Hold right-Ctrl, speak, release. Whatever you said appears at the cursor — in any application, in any text field, with sub-second latency on a stock laptop CPU. No account, no cloud, no listening when you don’t want it to.
The problem
Cloud dictation tools are accurate, fast, and listening. The cost of accuracy and speed is that the audio leaves the machine; the cost of leaving the machine is that someone else gets a copy of every sentence I say at my desk. That’s a bad trade for daily use.
Local dictation has been good enough for a year and a half. Vox is a thin shim around faster-whisper with one trick: a single hotkey that turns on the mic, captures audio while held, transcribes on release, and types the result at the cursor via Win32 SendInput. No window, no menu, no decision to make.
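The source does not show how SendInput is wrapped, so the following is only a sketch of one common approach: send each UTF-16 code unit as a `KEYEVENTF_UNICODE` key-down and key-up pair through `ctypes`. The helper names (`utf16_units`, `key_events`, `type_text`) are hypothetical; the constants and struct layouts follow the Win32 `INPUT` definition.

```python
import ctypes
import sys

INPUT_KEYBOARD = 1
KEYEVENTF_KEYUP = 0x0002
KEYEVENTF_UNICODE = 0x0004
ULONG_PTR = ctypes.c_size_t  # pointer-sized unsigned integer


class KEYBDINPUT(ctypes.Structure):
    _fields_ = [("wVk", ctypes.c_uint16), ("wScan", ctypes.c_uint16),
                ("dwFlags", ctypes.c_uint32), ("time", ctypes.c_uint32),
                ("dwExtraInfo", ULONG_PTR)]


class MOUSEINPUT(ctypes.Structure):
    # Unused here, but it is the largest union member, so it fixes sizeof(INPUT).
    _fields_ = [("dx", ctypes.c_int32), ("dy", ctypes.c_int32),
                ("mouseData", ctypes.c_uint32), ("dwFlags", ctypes.c_uint32),
                ("time", ctypes.c_uint32), ("dwExtraInfo", ULONG_PTR)]


class _INPUTUNION(ctypes.Union):
    _fields_ = [("ki", KEYBDINPUT), ("mi", MOUSEINPUT)]


class INPUT(ctypes.Structure):
    _fields_ = [("type", ctypes.c_uint32), ("u", _INPUTUNION)]


def utf16_units(text):
    """Split text into UTF-16 code units (astral characters become two)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def key_events(text):
    """Return (code_unit, flags) pairs: one key-down and one key-up per unit."""
    events = []
    for unit in utf16_units(text):
        events.append((unit, KEYEVENTF_UNICODE))
        events.append((unit, KEYEVENTF_UNICODE | KEYEVENTF_KEYUP))
    return events


def type_text(text):
    """Inject text into the focused window. Windows only."""
    if sys.platform != "win32":
        raise OSError("SendInput is only available on Windows")
    events = key_events(text)
    if not events:
        return 0
    inputs = (INPUT * len(events))()
    for i, (unit, flags) in enumerate(events):
        inputs[i].type = INPUT_KEYBOARD
        inputs[i].u.ki.wVk = 0          # must be 0 when KEYEVENTF_UNICODE is set
        inputs[i].u.ki.wScan = unit
        inputs[i].u.ki.dwFlags = flags
    # Returns the number of events actually inserted into the input stream.
    return ctypes.windll.user32.SendInput(len(events), inputs, ctypes.sizeof(INPUT))
```

Calling `type_text("hello")` on Windows would type into whatever control has focus. Sending code units rather than virtual-key codes keeps the result independent of the active keyboard layout.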
Decisions
kept · 2026
Push-to-talk over voice activity detection. PTT is unambiguous; VAD picks up the radio, the dog, the air conditioning, my exhale at 11 PM. The wrong thing transcribed is worse than no transcription.
kept · 2026
A lockfile (voicetype.active) at the project root that exists only while the mic is hot. Other voice tools on the same machine respect the lockfile and mute themselves; without it, two listeners would fight for the mic and both would lose.
cut · 2026
A floating UI window. The status feedback is a tray icon: grey idle, blue recording, yellow processing, green flash on success. A real window would invite configuration, and configuration is the enemy of a tool I want to forget I’m using.
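The lockfile decision can be sketched from both sides: the tool that owns the mic, and a peer that defers to it. This is an illustration of the convention only; `mic_claim` and `peer_should_mute` are hypothetical names, and the source does not show how peer tools actually poll the file.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def mic_claim(lockfile: Path):
    """Hold the lockfile for exactly as long as the mic is hot."""
    lockfile.touch()
    try:
        yield
    finally:
        # Remove the file even if recording raised, so peers are never
        # muted by a stale claim.
        lockfile.unlink(missing_ok=True)


def peer_should_mute(lockfile: Path) -> bool:
    """What a peer voice tool checks before opening the mic itself."""
    return lockfile.exists()
```

A peer would call `peer_should_mute(Path("voicetype.active"))` before listening. Because the file's existence is the entire protocol, nothing needs to be parsed and nothing can be half-written.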
System
| Layer | Implementation | Purpose |
|---|---|---|
| Hotkey | Global Win32 hook | Right-Ctrl press / release events |
| Capture | PyAudio | 16 kHz mono WAV ring buffer while held |
| Transcribe | faster-whisper | Local Whisper inference, CPU or CUDA |
| Inject | Win32 SendInput | Types into the focused text field |
| Status | Tray icon (Tkinter) | Four colors, no chrome, no decisions |
| Coord | Lockfile | voicetype.active · respected by peer tools |
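The Capture row can be illustrated with a bounded buffer that keeps only the most recent audio and serializes it to WAV on release. This is a sketch under assumptions: the class name `RingRecorder`, the 60-second cap, the chunk size, and 16-bit samples are not stated in the source. In a real capture loop the chunks would come from something like PyAudio's `stream.read(chunk_frames)`.

```python
import io
import wave
from collections import deque

SAMPLE_RATE = 16_000   # Hz, from the Capture row
CHANNELS = 1           # mono, from the Capture row
SAMPLE_WIDTH = 2       # bytes per sample; 16-bit PCM is an assumption


class RingRecorder:
    """Keeps at most `max_seconds` of raw PCM chunks, dropping the oldest."""

    def __init__(self, max_seconds=60, chunk_frames=1024):
        max_chunks = max(1, (max_seconds * SAMPLE_RATE) // chunk_frames)
        self.chunks = deque(maxlen=max_chunks)  # deque evicts the oldest chunk

    def feed(self, pcm_chunk: bytes) -> None:
        """Append one chunk of raw PCM bytes while the key is held."""
        self.chunks.append(pcm_chunk)

    def to_wav(self) -> bytes:
        """Serialize the buffered audio as an in-memory WAV file."""
        buf = io.BytesIO()
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(CHANNELS)
            wav.setsampwidth(SAMPLE_WIDTH)
            wav.setframerate(SAMPLE_RATE)
            wav.writeframes(b"".join(self.chunks))
        return buf.getvalue()
```

The cap bounds memory if the key is held far longer than intended: the buffer never grows past a fixed number of chunks.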
# One global hotkey: right-Ctrl. Press starts recording and writes
# the lockfile so peer voice tools mute themselves. Release stops,
# transcribes locally, types into whatever has focus.
def on_press(key):
    if key != Key.ctrl_r or recorder.active:
        return
    LOCKFILE.touch()                  # peer tools see this
    set_tray("recording")
    recorder.start()                  # 16 kHz mono ring buf

def on_release(key):
    if key != Key.ctrl_r or not recorder.active:
        return
    audio = recorder.stop()
    LOCKFILE.unlink(missing_ok=True)
    set_tray("processing")
    text = whisper.transcribe(audio, language="en").strip()
    if text:
        SendInput(text)               # type at the cursor
        set_tray("done")
    else:
        set_tray("idle")
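In the handler above, `whisper.transcribe(...)` returns a plain string. faster-whisper's own `WhisperModel.transcribe` returns a `(segments, info)` tuple where `segments` is a lazy generator, so a thin wrapper is presumably involved. A sketch of what such a wrapper might look like follows; `LocalTranscriber` is a hypothetical name, not Vox's actual code.

```python
class LocalTranscriber:
    """Adapts a faster-whisper style model to return one string."""

    def __init__(self, model):
        # `model` is anything with transcribe(audio, language=...) that
        # returns (segments, info), e.g. faster_whisper.WhisperModel.
        self.model = model

    def transcribe(self, audio, language="en") -> str:
        segments, _info = self.model.transcribe(audio, language=language)
        # Inference runs as the generator is consumed. Segment texts
        # usually carry their own leading space, so join without a separator.
        return "".join(segment.text for segment in segments).strip()


# Usage with the real library (assumes faster-whisper is installed;
# MODEL_NAME is a placeholder for whichever model size or path you use):
#   from faster_whisper import WhisperModel
#   whisper = LocalTranscriber(WhisperModel(MODEL_NAME, device="cpu", compute_type="int8"))
```

Taking the model as a constructor argument keeps the wrapper testable without loading any weights.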
{"t":"14:02:18.402","event":"press","key":"right_ctrl","lockfile":"created"}
{"t":"14:02:18.418","event":"capture_start","sr":16000,"channels":1}
{"t":"14:02:21.071","event":"release","key":"right_ctrl","duration_ms":2653}
{"t":"14:02:21.084","event":"transcribe_start","model":"distil-whisper-large-v3","device":"cpu"}
{"t":"14:02:21.812","event":"transcribe_done","ms":728,"chars":62}
{"t":"14:02:21.815","event":"send_input","method":"win32_unicode","chars":62}
{"t":"14:02:21.821","event":"done","total_ms":1419,"text":"the cost of accuracy is that the audio leaves the machine"}
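The latency that matters at the keyboard is the gap between releasing the key and the text appearing. A small sketch for reading that gap out of log lines shaped like the ones above; `ms_between` is a hypothetical helper, and it assumes every event in a trace occurs once and on the same day.

```python
import json
from datetime import datetime, timedelta

SAMPLE_LOG = """\
{"t":"14:02:18.402","event":"press","key":"right_ctrl","lockfile":"created"}
{"t":"14:02:21.071","event":"release","key":"right_ctrl","duration_ms":2653}
{"t":"14:02:21.821","event":"done","total_ms":1419,"text":"the cost of accuracy is that the audio leaves the machine"}
"""


def ms_between(lines, start_event, end_event):
    """Whole milliseconds between two named events in one trace."""
    stamps = {}
    for line in lines:
        record = json.loads(line)
        stamps[record["event"]] = datetime.strptime(record["t"], "%H:%M:%S.%f")
    return (stamps[end_event] - stamps[start_event]) // timedelta(milliseconds=1)
```

For the sample trace, `ms_between(SAMPLE_LOG.splitlines(), "release", "done")` gives 750, the difference between 14:02:21.071 and 14:02:21.821.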

Acknowledgments
Vox stands on faster-whisper, the Whisper weights from OpenAI, and the long line of small-utility-as-tray-icon tools that prove a useful piece of software does not need a UI to be useful.