Overview
OpenAI provides two STT service implementations:

- OpenAISTTService: VAD-segmented speech recognition using OpenAI's transcription API (HTTP-based), supporting GPT-4o transcription and Whisper models
- OpenAIRealtimeSTTService: real-time streaming speech-to-text using OpenAI's Realtime API WebSocket transcription sessions, with support for local VAD and server-side VAD modes
OpenAI STT API Reference
Pipecat’s API methods for OpenAI STT integration
Example Implementation
Complete example with OpenAI ecosystem integration
OpenAI Documentation
Official OpenAI transcription documentation and features
OpenAI Platform
Access API keys and transcription models
Installation
To use OpenAI services, install the required dependency. The command below assumes Pipecat's standard openai extra; adjust for your version if needed:
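```bash
pip install "pipecat-ai[openai]"
```

Prerequisites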
OpenAI Account Setup
Before using OpenAI STT services, you need:
- OpenAI Account: Sign up at OpenAI Platform
- API Key: Generate an API key from your account dashboard
- Model Access: Ensure access to Whisper and GPT-4o transcription models
Required Environment Variables
OPENAI_API_KEY: Your OpenAI API key for authentication
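Pipecat's example apps typically load this variable from a local .env file. A minimal sketch using python-dotenv (an assumption; any method of setting the environment variable works):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads OPENAI_API_KEY from a local .env file into the environment
api_key = os.getenv("OPENAI_API_KEY")
```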
OpenAISTTService
OpenAISTTService uses VAD-based audio segmentation with HTTP transcription requests. It records speech segments detected by local VAD and sends them to OpenAI’s transcription API.
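A minimal construction sketch. The import path follows recent Pipecat releases, and the api_key and model parameters mirror Pipecat's usual service conventions; the model name shown is OpenAI's gpt-4o-transcribe, with whisper-1 as an assumed alternative. Check the API reference linked above for your version:

```python
import os

# Import path may vary by Pipecat version.
from pipecat.services.openai.stt import OpenAISTTService

# VAD-segmented HTTP transcription: each completed speech segment detected
# by local VAD is sent to OpenAI's transcription API as its own request.
stt = OpenAISTTService(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o-transcribe",  # assumption: "whisper-1" is also accepted
)
```

The service then sits between the transport input and downstream processors in a Pipeline, like any other Pipecat STT service.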
OpenAIRealtimeSTTService
OpenAIRealtimeSTTService provides real-time streaming speech-to-text using OpenAI’s Realtime API WebSocket transcription sessions. Audio is streamed continuously over a WebSocket connection for lower latency compared to HTTP-based transcription.
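A comparable sketch for the realtime service. The import path and constructor parameters here are assumptions modeled on the HTTP service above; consult the API reference linked above for the exact signature and the options that select local versus server-side VAD:

```python
import os

# Assumed import path; check your Pipecat version's API reference.
from pipecat.services.openai.stt import OpenAIRealtimeSTTService

# Audio is streamed continuously over a WebSocket transcription session,
# so transcripts arrive with lower latency than the HTTP-based service above.
stt = OpenAIRealtimeSTTService(
    api_key=os.getenv("OPENAI_API_KEY"),
)
```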