Welcome to VoxAction: Voice to Text for Windows

VoxAction is a Windows voice-to-text app that lets you write by speaking in the applications you already use. Press a hotkey, speak naturally, and VoxAction inserts the resulting text directly at the caret in your active app.

VoxAction is designed for everyday Windows workflows: email, documents, browser forms, chats, support tickets, CRM notes, IDEs, note-taking apps, and many legacy Windows tools. If an app accepts normal typing and paste input, VoxAction can usually insert dictated text there.

What VoxAction Does

VoxAction focuses on three voice workflows:

  1. Dictation: speak and insert speech-to-text output into the active text field.
  2. Voice translation: speak in one language and insert translated text in another language.
  3. AI assisted editing: select existing text, speak an instruction, and replace it with rewritten text.

Dictation works with either Nurgo AI or a Whisper-compatible API. Voice translation and AI assisted editing require a Nurgo AI subscription.

Why Use VoxAction

Many transcription tools make you dictate into a separate editor, then copy and paste the result. VoxAction works instead as a universal voice input layer for Windows: you start recording with a global hotkey, and it pastes the final text back into the current app.

VoxAction can also use nearby text around your cursor as context. This helps the AI service keep names, acronyms, style, language, and formatting consistent with what you were already writing. For important names, product terms, acronyms, SKUs, or technical identifiers, you can add entries to the personal dictionary.

Privacy and Control

Recording in VoxAction is triggered by a hotkey. It is not designed as an always-listening assistant. Recording starts when you press a configured hotkey. It stops when you press the hotkey again or, in push-to-talk mode, when you release it.
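
For readers who want a precise picture of the two recording modes, the behavior described above can be sketched as a small state machine. This is only an illustration in Python, not VoxAction's actual code: the names Mode, Recorder, on_hotkey_down, and on_hotkey_up are hypothetical, and ignoring keyboard auto-repeat is an assumption about how a held key should be treated.

```python
from enum import Enum


class Mode(Enum):
    TOGGLE = "toggle"              # press to start, press again to stop
    PUSH_TO_TALK = "push_to_talk"  # hold to record, release to stop


class Recorder:
    """Hypothetical model of hotkey-driven recording state."""

    def __init__(self, mode: Mode) -> None:
        self.mode = mode
        self.recording = False
        self._held = False  # True while the hotkey is physically down

    def on_hotkey_down(self) -> None:
        # Assumption: a held key may send repeated key-down events,
        # so only the first one per physical press is acted on.
        if self._held:
            return
        self._held = True
        if self.mode is Mode.PUSH_TO_TALK:
            self.recording = True
        else:
            self.recording = not self.recording

    def on_hotkey_up(self) -> None:
        self._held = False
        # Releasing the key only ends recording in push-to-talk mode.
        if self.mode is Mode.PUSH_TO_TALK:
            self.recording = False
```

In both modes of this sketch, nothing is recorded until a hotkey event arrives, which matches the hotkey-driven design described above.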

You choose the AI service:

  • Nurgo AI: the easiest setup, with managed dictation, one-step voice translation, AI assisted editing, account management, and monthly AI credits.
  • Whisper-compatible API: a bring-your-own endpoint mode for basic dictation with your own API key, provider, or self-hosted speech-to-text server. Provider costs and data policies depend on the endpoint you configure.