
Podcast Transcription

Generate transcripts for podcast episodes using local Whisper AI. All processing happens on your device.

Overview

After archiving a podcast episode, Social Archiver can generate a full transcript using OpenAI's Whisper speech recognition model running locally on your computer.

Why Local Processing?

This feature is designed with two considerations in mind:

  1. Privacy: Your audio never leaves your device. All transcription happens locally using open-source tools, ensuring complete privacy for sensitive content.

  2. No API Costs: Unlike cloud-based transcription services that charge per minute of audio, local Whisper is completely free to use once installed.

The trade-off is that you need to install additional tools and transcription speed depends on your computer's performance.

Requirements

You need one of the following speech recognition tools installed:

| Tool | CLI Included | Speed | Auto Model Download |
|------|--------------|-------|---------------------|
| faster-whisper | No (requires wrapper) | Fastest | ✓ Yes |
| openai-whisper | Yes | Moderate | ✓ Yes |
| whisper.cpp | Yes | Fast | ✗ Manual |

Recommendation

  • Apple Silicon Mac (M1/M2/M3/M4): Use whisper.cpp for best performance (Metal GPU acceleration)
  • Windows: Use openai-whisper for easiest setup (CLI included, no wrapper needed)
  • Linux: Use faster-whisper for best CPU performance (auto model download)

Option 1: faster-whisper

CTranslate2-based implementation. Up to 4x faster than openai-whisper with lower memory usage. Models are automatically downloaded on first use.

CLI Wrapper Required

faster-whisper is a Python library without a built-in CLI. You need to install our CLI wrapper script.

Windows Users

For Windows, we recommend using openai-whisper instead (see Option 2 below). It includes a CLI out of the box and requires no additional setup. If you still want to use faster-whisper on Windows, follow the Windows-specific instructions below.

macOS / Linux Installation

Step 1: Install the library

bash
# Linux: install directly with pip
pip install faster-whisper

bash
# macOS: Homebrew blocks system-wide pip installs (PEP 668),
# so create a dedicated virtual environment instead:
python3 -m venv ~/.local/share/faster-whisper-venv
~/.local/share/faster-whisper-venv/bin/pip install faster-whisper
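
Optionally, confirm the library imports before moving on. The venv path below matches the one created in the macOS step; on Linux with a plain pip install, use python3 instead.

bash
# Should print "ok" if the faster-whisper library installed correctly
~/.local/share/faster-whisper-venv/bin/python -c "import faster_whisper; print('ok')"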

Step 2: Install CLI wrapper

Download the wrapper script and save it to a directory on your PATH:

bash
# Create bin directory if it doesn't exist
mkdir -p ~/.local/bin

# Download wrapper script
curl -o ~/.local/bin/faster-whisper \
  https://raw.githubusercontent.com/hyungyunlim/obsidian-social-archiver-releases/main/faster-whisper-cli.py

# Make it executable
chmod +x ~/.local/bin/faster-whisper

macOS Users: Update the shebang

If you installed faster-whisper in a venv (Step 1), you need to update the script's shebang to use the venv Python:

bash
# Replace the first line of the script with the venv interpreter
sed -i '' "1s|.*|#!$HOME/.local/share/faster-whisper-venv/bin/python|" ~/.local/bin/faster-whisper

Step 3: Add to PATH (if not already)

bash
# Add to your shell config (~/.zshrc or ~/.bashrc)
export PATH="$HOME/.local/bin:$PATH"

# Reload shell
source ~/.zshrc  # or ~/.bashrc

Verify installation:

bash
faster-whisper --version
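
If the command is not found or fails, confirm the wrapper is the one on your PATH and that its first line points at the venv interpreter set up above:

bash
# Show where the shell finds the wrapper and which interpreter its shebang uses
command -v faster-whisper
head -1 ~/.local/bin/faster-whisper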

Windows Installation

Step 1: Install the library

powershell
pip install faster-whisper

Step 2: Install CLI wrapper

Open PowerShell and run:

powershell
# Find your Python Scripts folder
python -c "import sys; print(sys.prefix + '\\Scripts')"

# Download the wrapper into that folder (adjust the path below to match the output above)
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/hyungyunlim/obsidian-social-archiver-releases/main/faster-whisper-cli.py" -OutFile "$env:LOCALAPPDATA\Programs\Python\Python311\Scripts\faster-whisper.py"

Step 3: Create a batch wrapper

Create a file named faster-whisper.bat in the same Scripts folder:

batch
@echo off
python "%~dp0faster-whisper.py" %*

Or run this PowerShell command:

powershell
# Passing an array to -Value writes each element as its own line
$scriptsPath = python -c "import sys; print(sys.prefix + '\\Scripts')"
Set-Content -Path "$scriptsPath\faster-whisper.bat" -Value @('@echo off', 'python "%~dp0faster-whisper.py" %*')

Verify installation:

powershell
faster-whisper --version

Option 2: openai-whisper

The original Python implementation by OpenAI. Easy to install, works out of the box. Models are automatically downloaded on first use.

Installation:

bash
# Requires Python 3.8+
pip install openai-whisper

# Or with pipx (recommended for isolation)
pipx install openai-whisper

Verify installation:

bash
whisper --help
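
As an optional end-to-end check outside the plugin, you can transcribe a file directly from the terminal. episode.mp3 below is a placeholder for any local audio file; the flags shown are standard openai-whisper options.

bash
# Transcribe with the small model and write a plain-text transcript to ./transcripts/
whisper episode.mp3 --model small --language en --output_format txt --output_dir ./transcripts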

Option 3: whisper.cpp

High-performance C++ implementation. Best performance on Apple Silicon Macs with Metal GPU acceleration.

Manual Model Download Required

whisper.cpp requires you to manually download GGML model files before use.

Installation:

bash
# macOS (using Homebrew)
brew install whisper-cpp

# Or build from source
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
cmake -B build && cmake --build build --config Release

Download models:

bash
# Create models directory
mkdir -p ~/whisper-models

# Download small model (recommended, 465MB)
curl -L -o ~/whisper-models/ggml-small.bin \
  https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin

# Other models available:
# ggml-tiny.bin (74MB), ggml-base.bin (142MB)
# ggml-medium.bin (1.5GB), ggml-large-v3.bin (2.9GB)

The plugin searches for models in these locations:

  • ~/whisper-models/
  • ~/.cache/whisper-cpp/
  • ~/whisper.cpp/models/
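
To confirm a downloaded model landed somewhere the plugin will look, list the directory you used:

bash
# The downloaded ggml-*.bin file should appear here
ls -lh ~/whisper-models/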

Verify installation:

bash
whisper-cli --help
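
For an end-to-end test from the terminal, convert a clip to the 16 kHz WAV input whisper.cpp expects and run it against the model downloaded above (episode.mp3 is a placeholder filename; ffmpeg must be installed):

bash
# Convert to 16 kHz mono WAV, whisper.cpp's expected input format
ffmpeg -i episode.mp3 -ar 16000 -ac 1 -c:a pcm_s16le episode.wav

# Transcribe with the small model; -otxt writes the transcript next to the input file
whisper-cli -m ~/whisper-models/ggml-small.bin -f episode.wav -otxt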

Plugin Settings

Configure transcription in Settings → Social Archiver → Transcription Settings:

| Setting | Description |
|---------|-------------|
| Enable Whisper transcription | Toggle transcription feature on/off |
| Preferred Whisper variant | Choose which Whisper implementation to use |
| Preferred model | Select model size (tiny to large) |
| Default language | Auto-detect or specify language |
| Custom Whisper path | Override automatic binary detection |

Choosing a Variant

If you have multiple Whisper variants installed, you can select which one to use:

  • Auto-detect: On Apple Silicon Mac, tries whisper.cpp first; on other systems, tries faster-whisper first
  • faster-whisper: Easiest setup, auto model download, great CPU performance
  • openai-whisper: Original implementation, easy setup, auto model download
  • whisper.cpp: Best for Apple Silicon Macs (Metal GPU acceleration), requires manual model download

Models Are Not Shared

Each variant uses a different model format. Models downloaded for one variant cannot be used by another:

| Variant | Model Format | Storage Location |
|---------|--------------|------------------|
| faster-whisper | CTranslate2 | ~/.cache/huggingface/ |
| openai-whisper | PyTorch (.pt) | ~/.cache/whisper/ |
| whisper.cpp | GGML (.bin) | Manual location |
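
If disk space is a concern, you can see how much each variant's model storage is using. The paths below come from the table above; directories that don't exist are simply skipped.

bash
# Report the size of each variant's model storage (absent paths are ignored)
du -sh ~/.cache/huggingface ~/.cache/whisper ~/whisper-models 2>/dev/null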

How It Works

  1. Archive a podcast: First, subscribe to a podcast or archive an episode with audio
  2. Open in Timeline: View the archived episode in the Timeline view
  3. Click transcribe button: Click the microphone icon on the podcast card
  4. Wait for processing: Transcription progress is shown in the button
  5. View transcript: Expandable transcript appears below the audio player

Transcript Features

Interactive Timestamps

Click any timestamp in the transcript to jump to that point in the audio. The current segment is highlighted during playback.

Search (Desktop)

Use the search box to find specific words or phrases in the transcript.

Collapse/Expand

Transcripts start collapsed to save space. Click to expand and view the full content.

Transcript Storage

Transcripts are saved directly in your markdown files under a ## Transcript section:

markdown
## Transcript

[00:00] Welcome to the podcast...

[00:15] Today we're discussing...

Metadata is stored in the YAML frontmatter:

yaml
transcriptionModel: small
transcriptionLanguage: en
transcriptionDuration: 1847.5
transcriptionTime: 2024-12-12T05:30:00.000Z

Model Selection

Choose a model based on your needs:

| Model | Size | Speed | Accuracy | Best For |
|-------|------|-------|----------|----------|
| tiny | 74MB | ~32x | Low | Quick previews |
| base | 142MB | ~16x | Fair | Short clips |
| small | 466MB | ~6x | Good | Default, balanced |
| medium | 1.5GB | ~2x | High | Important content |
| large | 2.9GB | ~1x | Best | Maximum accuracy |

Speed is relative to audio length (e.g., ~6x means 10 min of audio takes roughly 1.7 min to transcribe).

Recommendation

Start with the small model. It offers a good balance of speed and accuracy for most podcasts.

Troubleshooting

Transcribe Button Doesn't Appear

Cause: No Whisper tool detected in system PATH

Solution:

  1. Verify installation with whisper --help or faster-whisper --version
  2. Restart Obsidian after installing tools
  3. Check that the tool is in your system PATH (a quick check is shown below)
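
Run the following from a regular terminal; each command prints a path if the tool is discoverable and nothing otherwise. Note that GUI apps such as Obsidian may see a narrower PATH than your interactive shell.

bash
# Check which Whisper binaries are visible on PATH
command -v whisper
command -v faster-whisper
command -v whisper-cli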

Transcription Fails or Times Out

Common causes:

  • Insufficient memory for the selected model
  • Corrupted audio file
  • Very long audio (2+ hours)

Solutions:

  • Try a smaller model (e.g., tiny or base)
  • Verify the audio file plays correctly
  • Check available disk space and memory

Poor Transcription Quality

Common causes:

  • Background noise in audio
  • Multiple speakers talking over each other
  • Non-standard accents or technical jargon

Solutions:

  • Use a larger model (medium or large)
  • Specify the language in settings if auto-detection fails

Slow Transcription

Transcription speed depends on:

  • Your CPU/GPU performance
  • Selected model size
  • Audio length

Performance Tips

  • Apple Silicon Macs: Use whisper.cpp for best performance
  • NVIDIA GPU: Use faster-whisper with CUDA support
  • CPU only: Use tiny or base models for reasonable speed

Language Support

Whisper supports 99+ languages with automatic detection. For best results with non-English content:

  1. Let auto-detection identify the language first
  2. If detection fails, manually specify the language in plugin settings

Privacy & Storage

Local Processing Only

All transcription happens locally on your device. Audio files are never uploaded to any server. The transcript is stored only in your Obsidian vault.

Disk Space

Transcripts are text-only and very small (typically under 100KB even for long podcasts). They won't significantly impact your vault size.
