Speech to Text for Ubuntu

Speech to Text for Ubuntu: Your Voice, Your AI, Your Code

If you’ve recently joined the growing number of developers embracing Cursor – the AI-first code editor – you’ve likely experienced a paradigm shift in your workflow. The power of AI at your fingertips, generating and refining code, is truly transformative. However, if you’re working on Ubuntu, you might have quickly discovered a missing piece: the ability to seamlessly use speech to text for your AI interactions.

You have this incredible power, and your natural instinct is to speak to it. After all, when engaging with advanced AI models like those found in Cursor, ChatGPT, Gemini, and other platforms, prompting is akin to a new programming language. Unlike the rigid syntax of traditional coding, this “language” thrives on natural English. And let’s face it: it’s often significantly easier to speak English than to type it, especially when crafting complex and nuanced prompts for an AI.

This is precisely the gap our new solution addresses. While voice input is a game-changer for speed and accessibility, a robust and integrated speech to text for Ubuntu that works universally across your desktop has been a persistent challenge. Until now.

Introducing Our Custom Speech-to-Text Solution for Ubuntu

At CDNsun, we recognized this critical need within the Linux and AI communities. We’ve developed and open-sourced a dedicated Speech-to-Text for Ubuntu project, designed specifically to bring the power of voice dictation to your favorite applications, including Cursor, on your Ubuntu system (tested on Ubuntu 24.04.2 LTS).

Seamless Voice Prompting for All Your AI Interactions

Our solution delivers effortless dictation through a simple yet powerful mechanism: press and hold a customizable hotkey to record audio, and when you release it, your spoken words are transcribed and automatically typed into your active window. This means you can:

Prompt Cursor with Your Voice: Unleash the full potential of Cursor by speaking your code generation requests, refactoring instructions, or codebase queries. Maintain your flow and avoid the friction of typing.
Converse Naturally with AI: Dictate questions and commands directly into web-based AI interfaces like ChatGPT, Gemini, or any other LLM. Enjoy more fluid and intuitive conversations, unconstrained by typing speed.
Boost General Productivity on Ubuntu: Beyond AI applications, leverage this powerful speech to text tool for dictating emails, drafting documents, or inputting text into any application across your Ubuntu desktop.

How Our Speech-to-Text for Ubuntu Solution Works

At its core, our speech to text for Ubuntu system relies on two intelligent Python scripts:

key_listener.py: This script serves as your voice activation trigger. It monitors a configurable hotkey (we strongly recommend mapping an otherwise unused key, such as F16, to a spare mouse button for an ergonomic and seamless experience). Pressing and holding this key starts audio recording using arecord. Releasing the key stops the recording and automatically calls speech_to_text.py for processing.
speech_to_text.py: This is where the transcription magic happens. It takes the recorded audio, performs necessary adjustments (like converting to mono), and feeds it into the highly efficient Faster Whisper speech-to-text model for accurate transcription. Finally, using pyautogui, it simulates keyboard input, typing your recognized speech directly into your currently active application.
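
To make the second script's three steps concrete, here is a minimal sketch of the same flow: downmix to mono, transcribe, type. It is not the project's actual code. The function names, the "base" model size, the CPU/int8 settings, and the /tmp path are all assumptions chosen for illustration, and the downmix only handles 16-bit PCM WAV files. The faster_whisper and pyautogui imports are deferred so the audio helper can be tried without them installed.

```python
import array
import sys
import wave


def downmix_to_mono(src_path, dst_path):
    """Average the channels of a 16-bit PCM WAV file into a mono WAV file."""
    with wave.open(src_path, "rb") as src:
        channels = src.getnchannels()
        rate = src.getframerate()
        if src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = src.readframes(src.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # Samples are interleaved (L, R, L, R, ...); average each frame.
    mono = array.array(
        "h",
        (sum(samples[i:i + channels]) // channels
         for i in range(0, len(samples), channels)),
    )
    if sys.byteorder == "big":
        mono.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(mono.tobytes())


def transcribe(wav_path, model_size="base"):
    """Run Faster Whisper on a WAV file and return the recognized text."""
    from faster_whisper import WhisperModel  # third-party, imported lazily

    # Model size, device, and compute type are illustrative assumptions.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(wav_path)
    return " ".join(segment.text.strip() for segment in segments)


def type_text(text):
    """Simulate keystrokes so the text lands in the focused window."""
    import pyautogui  # third-party, needs a running desktop session

    pyautogui.write(text)


def speech_to_text(recorded_wav, mono_wav="/tmp/recording_mono.wav"):
    """Hypothetical end-to-end flow: downmix, transcribe, then type."""
    downmix_to_mono(recorded_wav, mono_wav)
    type_text(transcribe(mono_wav))
```

Loading the model on every call is slow. A long-running process that keeps one WhisperModel instance in memory would likely respond faster, at the cost of holding that memory between dictations.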

Get Started: Elevate Your Prompting and Productivity

Setting up our Speech-to-Text for Ubuntu solution is straightforward. If you’re an Ubuntu user looking to truly unlock the potential of AI tools like Cursor with the power of your voice, this project is for you.

1. Clone the Repository:
git clone https://github.com/CDNsun/speech-to-text-for-ubuntu
cd speech-to-text-for-ubuntu

2. Set Up Python Virtual Environment:
python3 -m venv venv
source venv/bin/activate

3. Install Dependencies:
pip install -r requirements.txt

4. Install System Utilities: Ensure arecord (part of alsa-utils) and the evdev Python bindings (python3-evdev) are available:
sudo apt install alsa-utils python3-evdev

5. Remap a Hotkey: Use a tool like input-remapper to assign an unused key (e.g., F16) to a convenient mouse button or keyboard key.
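
If you want to check which input devices advertise the key you picked, a small helper along these lines may help. This is a sketch under assumptions: devices_with_key and scan are hypothetical names, F16 is used only because it is the example key above, and whether your remapped device shows up this way depends on how the remapping tool exposes it. The constants are the Linux input codes for the key event type and for F16. The python-evdev import is deferred to scan(), which typically needs root to open the devices.

```python
EV_KEY = 1     # Linux input event type for keys and buttons
KEY_F16 = 186  # Linux key code for F16


def devices_with_key(devices, key_code=KEY_F16):
    """Return (path, name) for each device that advertises key_code."""
    matches = []
    for device in devices:
        # capabilities() maps event types to lists of supported codes.
        key_codes = device.capabilities().get(EV_KEY, [])
        if key_code in key_codes:
            matches.append((device.path, device.name))
    return matches


def scan():
    """List real input devices that advertise F16 (requires python-evdev)."""
    import evdev  # third-party, imported lazily

    devices = [evdev.InputDevice(path) for path in evdev.list_devices()]
    return devices_with_key(devices)
```

Running scan() as root after setting up the remap gives you candidate device paths to compare against whatever the listener is configured to watch.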

6. Start the Key Listener:
sudo python3 key_listener.py

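For continuous, hands-free operation, add an entry like the following to root's crontab (sudo crontab -e). It runs every minute, checks whether key_listener.py is already running, and starts it if it is not, so the listener comes up within a minute of system startup and restarts if it ever exits. Output is appended to /tmp/key_listener.log. Replace /home/david/Cursor/speech-to-text with the path to your own clone: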


* * * * * ps -ef | grep "/home/david/Cursor/speech-to-text/key_listener.py" | grep -v grep > /dev/null || /usr/bin/python3 /home/david/Cursor/speech-to-text/key_listener.py >> /tmp/key_listener.log 2>&1 &
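
If the one-liner is hard to read, the same check-then-start logic can be sketched in Python. This is an illustrative equivalent, not part of the project: is_running and ensure_running are hypothetical names, and the interpreter and log paths are simply copied from the cron entry above as defaults.

```python
import os
import subprocess


def is_running(script_path, ps_output, own_pid=None):
    """Return True if a process other than own_pid has script_path in its command."""
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8:
            continue
        pid, command = fields[1], fields[7]
        if own_pid is not None and pid == str(own_pid):
            continue
        # Mirror "grep -v grep": ignore the grep process itself.
        if script_path in command and "grep" not in command:
            return True
    return False


def ensure_running(script_path, python="/usr/bin/python3",
                   log_path="/tmp/key_listener.log"):
    """Start script_path in the background unless it is already running."""
    ps_output = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
    if is_running(script_path, ps_output, own_pid=os.getpid()):
        return False
    with open(log_path, "ab") as log:
        subprocess.Popen(
            [python, script_path],
            stdout=log, stderr=subprocess.STDOUT,
            start_new_session=True,  # keep running after this script exits
        )
    return True
```

Calling ensure_running with the full path to your key_listener.py once a minute reproduces what the cron entry does. A systemd service with automatic restart would be another way to get the same effect.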

7. Speak Your Code, Speak Your Prompts!

With key_listener.py running, simply press and hold your configured hotkey to start speaking. Release the key, and watch your words instantly appear in Cursor, ChatGPT, Gemini, or any other AI interface, transforming your Ubuntu workflow.
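
The press-and-hold behavior described here can be pictured as a small state machine. The sketch below is a simplified illustration, not the code in key_listener.py: PushToTalk, arecord_command, and listen are hypothetical names, and the arecord options (16-bit, 16 kHz, one channel) are assumptions you may need to change for your microphone. It relies on the evdev convention that a key event carries the value 1 on press, 2 while the key auto-repeats, and 0 on release.

```python
import subprocess

KEY_UP, KEY_DOWN = 0, 1  # evdev key event values; 2 means auto-repeat


def arecord_command(wav_path, rate=16000, channels=1):
    """Build an arecord command line (format options are assumptions)."""
    return ["arecord", "-q", "-t", "wav", "-f", "S16_LE",
            "-r", str(rate), "-c", str(channels), wav_path]


class PushToTalk:
    """Record while the hotkey is held; hand the file over on release."""

    def __init__(self, wav_path, on_recorded, launch=subprocess.Popen):
        self.wav_path = wav_path
        self.on_recorded = on_recorded  # called with wav_path after release
        self.launch = launch            # injectable so it can be tested
        self.process = None

    def handle(self, value):
        if value == KEY_DOWN and self.process is None:
            self.process = self.launch(arecord_command(self.wav_path))
        elif value == KEY_UP and self.process is not None:
            self.process.terminate()  # ask arecord to stop and close the file
            self.process.wait()
            self.process = None
            self.on_recorded(self.wav_path)
        # Auto-repeat events (value 2) are ignored while recording.


def listen(device_path, key_code, push_to_talk):
    """Feed key events from one input device into the state machine."""
    import evdev  # third-party, imported lazily; usually needs root

    device = evdev.InputDevice(device_path)
    for event in device.read_loop():
        if event.type == evdev.ecodes.EV_KEY and event.code == key_code:
            push_to_talk.handle(event.value)
```

Ignoring the auto-repeat events matters: a held key keeps emitting them, and starting a new recording on each one would spawn a pile of arecord processes.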

This project empowers you to interact with AI tools more naturally and efficiently than ever before. Stop typing, start speaking, and unlock a new dimension of productivity on your Ubuntu system with our dedicated speech to text solution.

Project Link: CDNsun/speech-to-text-for-ubuntu on GitHub (https://github.com/CDNsun/speech-to-text-for-ubuntu)