Cloud transcription services like Otter.ai charge $100 or more per year to transcribe your meetings. They also store your audio on their servers, process it through their AI, and retain it according to their privacy policies. For confidential business discussions, legal consultations, or medical appointments, that’s a problem.
Scriberr offers an alternative: fully offline transcription with speaker identification, running entirely on your own hardware. No subscriptions, no cloud uploads, no third-party access to your conversations.
What Scriberr Does
Scriberr is a self-hosted application that transcribes audio and video files locally. It uses WhisperX, which combines OpenAI’s Whisper speech recognition with speaker diarization (identifying who said what) and precise word-level timestamps.
Key features:
- Offline transcription using NVIDIA Parakeet, Canary, or Whisper models
- Speaker detection that labels different speakers in the transcript
- Chat integration with Ollama or OpenAI-compatible APIs for summarizing transcripts
- Built-in recording to capture audio directly
- Folder watcher that automatically processes new files
- PWA support for desktop and mobile access
The transcription happens on your machine. Your audio never leaves your network.
Hardware Requirements
Scriberr runs on both CPU and GPU, but performance differs substantially:
CPU-only:
- Works on any modern machine
- Transcription speed: roughly real-time or slower (a 60-minute recording takes about 60 minutes)
- Adequate for occasional use
GPU-accelerated:
- Requires NVIDIA GPU with at least 4GB VRAM
- GTX 1060 or better for basic acceleration
- RTX 3060/4060 or better recommended for smooth performance
- Transcription speed: 4-10x faster than real-time
For perspective, faster-whisper requires less than 8GB GPU memory for the large-v2 model with beam_size=5. A mid-range gaming GPU handles transcription comfortably.
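The speed figures above translate directly into wall-clock time. A rough sketch (the real-time factors are the estimates from this section, not benchmarks of any particular card):

```shell
# Estimate wall-clock transcription time from recording length (minutes)
# and a real-time factor. rtf=1 means the machine keeps pace with the
# audio; the 4-10x GPU figures above become rtf=4..10.
transcribe_minutes() {
  awk -v len="$1" -v rtf="$2" 'BEGIN { printf "%.1f\n", len / rtf }'
}

transcribe_minutes 60 1    # CPU-only, ~real time -> 60.0
transcribe_minutes 60 6    # mid-range GPU        -> 10.0
```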
Installation: Docker Method
Docker is the easiest way to run Scriberr. You’ll need Docker and Docker Compose installed on your system.
CPU-Only Setup
Create a docker-compose.yml file:
services:
  scriberr:
    image: ghcr.io/rishikanthc/scriberr:latest
    container_name: scriberr
    ports:
      - "8080:8080"
    environment:
      - PUID=1000
      - PGID=1000
      - SECURE_COOKIES=false
    volumes:
      - scriberr_data:/app/data
      - env_data:/app/env
    restart: unless-stopped

volumes:
  scriberr_data:
  env_data:
Then run:
docker compose up -d
GPU-Accelerated Setup (NVIDIA)
First, install the NVIDIA Container Toolkit on your host system. Verify it works with nvidia-smi.
Create docker-compose.yml:
services:
  scriberr:
    image: ghcr.io/rishikanthc/scriberr:v1.0.4-cuda
    container_name: scriberr
    ports:
      - "8080:8080"
    environment:
      - PUID=1000
      - PGID=1000
      - SECURE_COOKIES=false
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    volumes:
      - scriberr_data:/app/data
      - env_data:/app/env
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  scriberr_data:
  env_data:
Run with:
docker compose up -d
For RTX 50-series (Blackwell) GPUs: Use the Blackwell-specific image instead:
image: ghcr.io/rishikanthc/scriberr:v1.0.4-blackwell
First Startup
The first launch takes several minutes. Scriberr downloads machine learning models (Whisper, PyAnnote for diarization, NVIDIA NeMo) and initializes the Python environment. Subsequent starts are fast because models persist in the volumes.
Access the web interface at http://localhost:8080.
Installation: Homebrew Method (macOS/Linux)
If you prefer running without Docker:
brew tap rishikanthc/scriberr
brew install scriberr
scriberr
This installs Scriberr as a native application. It uses your system’s Python environment and runs transcription using Apple Metal (on M-series Macs) or CPU.
Using Scriberr
Manual Transcription
- Open the web interface
- Upload an audio or video file (MP3, WAV, FLAC, M4A, MP4, etc.)
- Wait for transcription to complete
- View the transcript with speaker labels and timestamps
Automatic Processing
Set up folder watching to automatically transcribe new recordings:
- Configure a watched folder in Scriberr’s settings
- Point your recording software to save files there
- Transcripts appear automatically
This works well with screen recording tools, voice memo apps, or meeting recorders that save local files.
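One practical detail: a folder watcher can fire on a file that is still being written. The usual workaround is to finish the file elsewhere and rename it into the watched folder, because a rename within one filesystem is atomic. A minimal sketch, assuming a hypothetical inbox path (substitute whatever folder you configured in Scriberr):

```shell
# Hand a finished recording to the watched folder without the watcher
# ever seeing a partial file. WATCH_DIR is an assumed path -- use the
# folder you configured in Scriberr's settings.
WATCH_DIR="${WATCH_DIR:-$HOME/recordings/inbox}"

deliver() {
  src="$1"
  mkdir -p "$WATCH_DIR"
  # Copy under a temporary dot-name first, then rename into place.
  # mv within one filesystem is a rename(2), which is atomic.
  tmp="$WATCH_DIR/.$(basename "$src").part"
  cp "$src" "$tmp"
  mv "$tmp" "$WATCH_DIR/$(basename "$src")"
}
```

Call it as the last step of your recording pipeline, e.g. `deliver recording.m4a`.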
AI Summarization
Connect Scriberr to a local LLM through Ollama or any OpenAI-compatible API. After transcription, you can chat with the transcript to generate summaries, extract action items, or ask questions about the conversation.
Understanding the Whisper Stack
Scriberr uses WhisperX, which builds on faster-whisper. Understanding this stack helps with troubleshooting and optimization.
faster-whisper reimplements OpenAI’s Whisper model using CTranslate2, a C++ inference engine. It runs up to 4x faster than the original Python implementation at the same accuracy, while using less memory through INT8/FP16 quantization.
WhisperX adds three capabilities on top:
- Voice Activity Detection (VAD): Identifies speech segments to reduce hallucinations on silent portions
- Forced alignment: Uses wav2vec2 to get precise word-level timestamps
- Speaker diarization: Uses pyannote-audio to identify different speakers
The tradeoff: WhisperX runs multiple models per audio file, so it uses more memory and processing time than plain faster-whisper. For meeting transcription where knowing who said what matters, that overhead is worth it.
Alternatives Worth Knowing
If Scriberr doesn’t fit your needs, consider these options:
faster-whisper via LinuxServer.io: A simpler container that provides raw transcription through the Wyoming protocol. Good for Home Assistant integration or if you don’t need speaker identification. Uses less memory.
services:
  faster-whisper:
    image: lscr.io/linuxserver/faster-whisper:latest
    container_name: faster-whisper
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Etc/UTC
      - WHISPER_MODEL=small
    volumes:
      - /path/to/config:/config
    ports:
      - 10300:10300
    restart: unless-stopped
WhisperX directly: If you’re comfortable with Python and want maximum control, run WhisperX as a library or use the whisperx-asr-service container for API access.
Meetily: A more polished meeting transcription tool with its own UI, designed specifically for meetings rather than general audio. Also self-hosted and open source.
Cost Comparison
Running your own transcription eliminates per-minute charges:
| Service | Cost |
|---|---|
| Otter.ai | $99-299/year |
| OpenAI Whisper API | $0.006/minute ($0.36/hour) |
| Rev.ai | $0.003-0.005/minute |
| Self-hosted (Scriberr) | $0 (electricity only) |
If you transcribe 10 hours of meetings monthly, Otter.ai costs about $100/year. The OpenAI API would cost $3.60/month (about $43/year). Self-hosting costs electricity: a few cents per hour of GPU time.
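The API figure is easy to check from the per-minute rate quoted above:

```shell
# Sanity-check the API cost: $0.006/min for the OpenAI Whisper API,
# 10 hours of audio per month.
awk -v rate=0.006 -v hours=10 'BEGIN {
  monthly = rate * 60 * hours
  printf "monthly: $%.2f  yearly: $%.2f\n", monthly, monthly * 12
}'
# prints: monthly: $3.60  yearly: $43.20
```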
The real benefit isn’t cost savings. It’s keeping your conversations private.
Why Self-Hosting Matters
Cloud transcription services store your audio. They process it through their systems. They may use it for training. Even with data protection promises, you’re trusting a third party with potentially sensitive information.
Self-hosted transcription stays local:
- GDPR/HIPAA compliance: No third-party data processor agreements needed
- Attorney-client privilege: Legal conversations never leave your network
- Corporate confidentiality: Board meetings, strategy discussions, and HR matters stay internal
- Personal privacy: Medical appointments, therapy sessions, and personal recordings remain private
Your audio files never touch the internet. The AI models run on your hardware. Transcripts stay on your storage.
Troubleshooting
“Unable to load audio stream” error: Set SECURE_COOKIES=false if accessing via HTTP instead of HTTPS.
Permission errors on Linux: Set PUID and PGID to your user’s UID/GID (find with id command). Default 1000 works for most single-user systems.
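The values can be read straight from `id`, assuming the data should be owned by the user who launches the container:

```shell
# Print PUID/PGID for the current user, in the form the compose file's
# environment block expects.
echo "PUID=$(id -u)"
echo "PGID=$(id -g)"
```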
GPU not detected: Verify the NVIDIA Container Toolkit installation with docker run --rm --gpus all ubuntu nvidia-smi (the toolkit injects the driver libraries into the container). If that fails, reinstall the toolkit.
Slow transcription on GPU: Check that Scriberr is actually using the GPU. The CUDA image should show GPU activity in nvidia-smi during transcription.
Models failing to download: First startup requires internet access to download ML models. After initial setup, Scriberr works fully offline.
What You Can Do
- Install Scriberr using the Docker method above—it takes about 15 minutes
- Test with a short recording to verify everything works
- Set up folder watching if you regularly record meetings
- Connect to Ollama for AI-powered summaries (optional)
Once running, you’ll never pay for transcription again. More importantly, your conversations stay yours.