WhisPad is a transcription and note management tool designed so anyone can turn their voice into text and easily organize their ideas. The application lets you use cloud models (OpenAI) or local whisper.cpp models to work offline.
WARNING: Always back up your notes before upgrading to a new version. Never expose the app to the internet without additional security measures.
- Main Features
- Disclaimer
- Quick Setup
- Installing with Docker Desktop
- Installing from the Terminal
- API Key Configuration
- Speaker Diarization Setup
- Usage Guide
- Screenshots
- Contributors
- Real-time voice-to-text transcription from the browser.
- Write and edit markdown notes.
- Integrated note manager: create, search, tag, save, restore and download in Markdown format.
- Automatic text enhancement using AI (OpenAI, Google, OpenRouter or Groq) with streaming responses.
- A blue marker indicating where the transcription will be inserted.
- Compatible with multiple providers: OpenAI, SenseVoice and local whisper.cpp. No model is bundled, but you can download tiny, small, base, medium or large versions from the interface.
- NEW: SenseVoice Integration - Advanced multilingual speech recognition with emotion detection and audio event recognition for 50+ languages.
- NEW: Speaker Diarization Support - Automatically identify different speakers in audio recordings for both Whisper Local and SenseVoice (see Speaker Diarization Guide).
- Download or upload local (.bin) whisper.cpp models directly from the interface.
- Upload audio files which are automatically transcribed and stored alongside your notes.
- Export all notes in a ZIP file with one click.
- Deleting a note automatically removes its related quizzes and flashcards.
- Mobile-friendly interface.
- User login with per-user note folders and admin management tools.
- Admin has access to all providers and can manage which ones are available for each user.
- Reliable save and autosave: headings and ordered lists are now preserved correctly.
This application is currently in testing and is provided as-is. I take no responsibility for any data loss that may occur while using it. Make frequent backups of your data.
If you are not comfortable with the terminal, the easiest method is to use Docker Desktop. You only need to install Docker, download this project and run it.
- Download Docker Desktop from https://www.docker.com/products/docker-desktop/ and install it like any other application.
- Download this repository as a ZIP from GitHub and unzip it in the folder of your choice.
- Open Docker Desktop and select Open in Terminal (or open a terminal in that folder). Type:

  ```bash
  docker compose up
  ```

- Docker will download the dependencies and show "Starting services...". When everything is ready, open your browser at `https://localhost:5037`.
- Sign in with `admin` and the password from `ADMIN_PASSWORD` the first time to access the app.
- To stop the application, press `Ctrl+C` in the terminal or use the Stop button in Docker Desktop.
This option is ideal if you don't want to worry about installing Python or dependencies manually.
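If you would rather not keep the terminal window occupied, Docker Compose can also run WhisPad in the background. This is a minimal sketch using standard Compose flags; it works the same whether you opened the terminal from Docker Desktop or on your own:

```bash
# Start WhisPad detached (in the background)
docker compose up -d

# Follow the logs while the services start
docker compose logs -f

# Stop and remove the containers when you are done
docker compose down
```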
- Install Docker Desktop.
- Open a terminal and clone the repository (if you prefer, download the ZIP and unzip it):

  ```bash
  git clone https://github.com/tu_usuario/whispad.git
  cd whispad
  ```

- Run the application with:

  ```bash
  docker compose up
  ```

- Go to `https://localhost:5037` and start using WhisPad.
- Log in with `admin` using the password from `ADMIN_PASSWORD` on first use.
- To stop it, press `Ctrl+C` in the terminal or run `docker compose down`.
- If you want to use LM Studio or Ollama for local AI text improvement, set the host to `host.docker.internal` in the configuration page so the container can reach your local instance. Use the Update Models button to fetch the list of available models from `http://<lmstudio-host>:<port>/v1/models` automatically.
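If the Update Models button cannot reach your local server, it helps to test connectivity from inside the container. The following is only a sketch: it assumes the Compose service is named `whispad`, that LM Studio listens on its default port 1234, and that `curl` is available in the image; adjust any of these to your setup.

```bash
# Hit the OpenAI-compatible models endpoint from inside the WhisPad container
docker compose exec whispad curl http://host.docker.internal:1234/v1/models
```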
If you prefer not to use Docker, you can also run it directly with Python:
- Make sure you have Python 3.11 or higher and pip installed.
- Clone the repository or download the code and go to the project folder:

  ```bash
  git clone https://github.com/tu_usuario/whispad.git
  cd whispad
  ```

- Install the Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download a whisper.cpp model with the script or from the Models menu (no model is included by default). You can also download or upload models directly from the interface:

  ```bash
  bash install-whisper-cpp.sh
  ```

- Run the server:

  ```bash
  python backend.py
  ```

- Open `index.html` in your browser, or serve the folder with `python -m http.server 5037` and visit `https://localhost:5037`.
- Log in with `admin` using the password from `ADMIN_PASSWORD` the first time you access the app.
Copy `env.example` to `.env` and add your API keys:

```bash
cp env.example .env
```

Edit the `.env` file and fill in the variables `OPENAI_API_KEY`, `GOOGLE_API_KEY`, `DEEPSEEK_API_KEY`, `OPENROUTER_API_KEY` and `GROQ_API_KEY` for the services you want to use. These keys enable cloud transcription and text enhancement.

If you want to send each saved note to an external workflow (for example, an n8n or Dify instance), also set `WORKFLOW_WEBHOOK_URL` and optionally `WORKFLOW_WEBHOOK_TOKEN`. Use `WORKFLOW_WEBHOOK_USER` to choose which user's notes are sent. The webhook payload now includes the username so your workflow can fetch the note from the correct folder.
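As a debugging aid for the webhook integration, here is a minimal sketch of a receiver you could point `WORKFLOW_WEBHOOK_URL` at while testing. Only the presence of the username in the payload is documented above; the JSON body, the port, and the use of a Bearer token in the `Authorization` header are assumptions, so adapt this to whatever your n8n or Dify workflow actually expects.

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

EXPECTED_TOKEN = os.environ.get("WORKFLOW_WEBHOOK_TOKEN", "")

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumption: the token arrives as a Bearer token in the Authorization header
        auth = self.headers.get("Authorization", "")
        if EXPECTED_TOKEN and auth != f"Bearer {EXPECTED_TOKEN}":
            self.send_response(401)
            self.end_headers()
            return

        # Assumption: the payload is JSON; the username field is documented above
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        print("Note saved by:", payload.get("username"))
        print("Full payload:", json.dumps(payload, indent=2))

        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    # Point WORKFLOW_WEBHOOK_URL at http://<this-host>:9000/ to inspect what WhisPad sends
    HTTPServer(("0.0.0.0", 9000), WebhookHandler).serve_forever()
```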
Set the database credentials in your `.env` file using `POSTGRES_USER`, `POSTGRES_PASSWORD` and `POSTGRES_DB`. `DATABASE_URL` should point to `postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB}`.

Set `ADMIN_PASSWORD` to define the initial admin user's password.
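Putting the variables above together, a `.env` could look like the following. All values are placeholders and `env.example` in the repository remains the authoritative template; only fill in the services and integrations you actually use.

```
# Cloud provider keys (leave blank for any service you don't use)
OPENAI_API_KEY=sk-...
GOOGLE_API_KEY=...
DEEPSEEK_API_KEY=...
OPENROUTER_API_KEY=...
GROQ_API_KEY=...

# Optional: forward saved notes to an external workflow (n8n, Dify, ...)
WORKFLOW_WEBHOOK_URL=https://example.com/webhook/whispad
WORKFLOW_WEBHOOK_TOKEN=change-me
WORKFLOW_WEBHOOK_USER=admin

# PostgreSQL credentials and connection string
POSTGRES_USER=whispad
POSTGRES_PASSWORD=change-me
POSTGRES_DB=whispad
DATABASE_URL=postgresql://${POSTGRES_USER}:${POSTGRES_PASSWORD}@db:5432/${POSTGRES_DB}

# Initial admin password (change it after first login)
ADMIN_PASSWORD=change-me
```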
WhisPad supports speaker diarization to automatically identify different speakers in audio recordings. This feature uses pyannote-audio and requires a HuggingFace token with access to gated models.
- **Create a HuggingFace Account**
  - Go to huggingface.co and create a free account
- **Generate an Access Token**
  - Navigate to HuggingFace Settings > Access Tokens
  - Click "New token"
  - Choose "Read" permissions (sufficient for model access)
  - Copy the generated token
- **Request Access to Gated Models**: You need to accept the terms and conditions for these specific models:
  - pyannote/speaker-diarization-3.1: Accept terms here
  - pyannote/segmentation-3.0: Accept terms here
  - speechbrain/spkrec-ecapa-voxceleb: Accept terms here

  Important: Click "Accept" on each model page. Access is usually granted immediately, but may take a few minutes.
- **Add Token to Environment**: Add your HuggingFace token to your `.env` file:

  ```
  HUGGINGFACE_TOKEN=your_token_here
  ```

- **Enable Speaker Diarization**
  - Restart WhisPad after adding the token
  - In the transcription interface, check the "Enable Speaker Diarization" option
  - Works with both Whisper Local and SenseVoice providers
- Automatic Speaker Detection: Identifies different speakers in audio
- Speaker Labels: Adds `[SPEAKER 1]`, `[SPEAKER 2]` labels to transcriptions
- Multi-Provider Support: Works with both local Whisper and SenseVoice
- Format Compatibility: Automatically converts WebM recordings to WAV for processing
- Clear Line Separation: Each speaker segment is placed on a separate line for easy reading
- Token Issues: Ensure you've accepted terms for all required models
- Performance: Speaker diarization adds processing time, especially on CPU
- Quality: Works best with clear audio and distinct speakers
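For context on what the diarization step does before the `[SPEAKER 1]`, `[SPEAKER 2]` labels described above are applied, this is roughly how pyannote-audio is driven to obtain speaker turns. It is a simplified sketch, not WhisPad's actual code; the model name and token variable match the setup steps above, while the file name is just an example.

```python
import os
from pyannote.audio import Pipeline

# Load the gated diarization pipeline using the HuggingFace token from .env
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token=os.environ["HUGGINGFACE_TOKEN"],
)

# Diarize a WAV file (WebM recordings are converted to WAV before this step)
diarization = pipeline("recording.wav")

# Each turn says who spoke when; spans like these are what get matched to the
# transcription and rendered as speaker-labelled lines
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:6.1f}s - {turn.end:6.1f}s  {speaker}")
```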
- Press the microphone button to record audio and get real-time transcription.
- Select text fragments and apply style or clarity improvements with a click.
- Organize your notes: add titles, tags and search them easily.
- Download each note in Markdown or the entire set in a ZIP file.
- Download additional whisper.cpp models from the Models menu (you can still drag and drop your own files) and enjoy offline transcription.
- Use the Restore menu to import previously saved notes.
With these instructions you should have WhisPad running in just a few minutes with or without Docker. Enjoy fast transcription and all the benefits of organizing your ideas in one place!
WhisPad is designed to persist your data between container restarts, updates, and recreations through Docker volumes and a PostgreSQL database:
- Notes: Stored in `./saved_notes/` (mounted to `/app/saved_notes` in the container)
- Audio Files: Stored in `./saved_audios/` (mounted to `/app/saved_audios` in the container)
- Users: Stored in PostgreSQL (`whispad` database)
- Provider Config: Stored in the PostgreSQL database (no external file needed)
- Models: Stored in `./whisper-cpp-models/` (mounted to `/app/whisper-cpp-models` in the container)
- Logs: Stored in `./logs/` (mounted to `/var/log/nginx` in the container)
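As an illustration of how those mounts map into the containers, a compose definition along the following lines would produce the layout above. This is only a sketch; the repository's docker-compose.yml is the source of truth, and the service names, image tag, and database volume here are assumptions.

```yaml
services:
  whispad:
    ports:
      - "5037:5037"
    env_file: .env
    volumes:
      - ./saved_notes:/app/saved_notes
      - ./saved_audios:/app/saved_audios
      - ./whisper-cpp-models:/app/whisper-cpp-models
      - ./logs:/var/log/nginx
  db:
    image: postgres:16
    env_file: .env
    volumes:
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:
```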
- Default Admin: Username `admin`, password set via `ADMIN_PASSWORD`
- User Configuration: Admins can create users and assign different transcription/postprocessing providers
- Per-User Folders: Each user's notes are isolated in their own folder under `saved_notes/`
- Single User Mode: Set `MULTI_USER=false` in `.env` or the compose file to automatically sign in with the admin account and bypass the login page

On first start the backend creates the `admin` user using the password from `ADMIN_PASSWORD`.
Important: Change the admin password immediately after first login for security!
Here are some screenshots of WhisPad in action:
- @Drakonis96 - Main idea and core coding.
- @laweschan - Contributed fresh insights and tested the application.
This project was developed with the help of AI tools including Perplexity Labs, OpenAI Codex, and Claude 4. Local transcription models run thanks to whisper.cpp (a copy is bundled here for easier installation).