Overview

OpenAI provides two STT service implementations:
  • OpenAISTTService for VAD-segmented speech recognition using OpenAI’s transcription API (HTTP-based), supporting GPT-4o transcription and Whisper models
  • OpenAIRealtimeSTTService for real-time streaming speech-to-text using OpenAI’s Realtime API WebSocket transcription sessions, with support for local VAD and server-side VAD modes

Installation

To use OpenAI services, install the required dependency:
pip install "pipecat-ai[openai]"

Prerequisites

OpenAI Account Setup

Before using OpenAI STT services, you need:
  1. OpenAI Account: Sign up at OpenAI Platform
  2. API Key: Generate an API key from your account dashboard
  3. Model Access: Ensure your account has access to the Whisper and GPT-4o transcription models

Required Environment Variables

  • OPENAI_API_KEY: Your OpenAI API key for authentication
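
For local development, a common pattern is to keep the key in a .env file and load it before constructing the service. A minimal sketch, assuming python-dotenv is installed (not a Pipecat requirement):

import os

from dotenv import load_dotenv

load_dotenv()  # Reads OPENAI_API_KEY from a local .env file into the environment

api_key = os.getenv("OPENAI_API_KEY")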

OpenAISTTService

OpenAISTTService uses VAD-based audio segmentation with HTTP transcription requests. It records speech segments detected by local VAD and sends them to OpenAI’s transcription API.
import os

from pipecat.services.openai.stt import OpenAISTTService

stt = OpenAISTTService(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o-transcribe",
)
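
Because this service relies on local VAD segmentation, the transport feeding it must run a VAD analyzer. A minimal pipeline sketch, assuming a Daily transport and the Silero VAD analyzer (the room URL and bot name are placeholders):

import os

from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.openai.stt import OpenAISTTService
from pipecat.transports.services.daily import DailyParams, DailyTransport

transport = DailyTransport(
    "https://example.daily.co/room",  # Placeholder room URL
    None,
    "Transcription bot",
    DailyParams(
        audio_in_enabled=True,
        vad_analyzer=SileroVADAnalyzer(),  # Local VAD segments speech for the STT service
    ),
)

stt = OpenAISTTService(api_key=os.getenv("OPENAI_API_KEY"))

# Transcribed segments flow downstream as TranscriptionFrames
pipeline = Pipeline([transport.input(), stt])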

OpenAIRealtimeSTTService

OpenAIRealtimeSTTService provides real-time streaming speech-to-text using OpenAI’s Realtime API WebSocket transcription sessions. Audio is streamed continuously over a WebSocket connection for lower latency compared to HTTP-based transcription.

Usage Example

import os

from pipecat.services.openai.stt import OpenAIRealtimeSTTService

# Local VAD mode (default) - use with a VAD processor in the pipeline
stt = OpenAIRealtimeSTTService(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o-transcribe",
    noise_reduction="near_field",
)

# Server-side VAD mode - do NOT use a separate VAD processor
stt = OpenAIRealtimeSTTService(
    api_key=os.getenv("OPENAI_API_KEY"),
    model="gpt-4o-transcribe",
    turn_detection=None,  # None defers turn detection to OpenAI's server-side VAD
)
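
Both services emit transcription results downstream as TranscriptionFrame objects. A minimal sketch of consuming them with a custom frame processor (TranscriptPrinter is a hypothetical helper written for this example, not part of Pipecat):

import os

from pipecat.frames.frames import Frame, TranscriptionFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.processors.frame_processor import FrameDirection, FrameProcessor
from pipecat.services.openai.stt import OpenAIRealtimeSTTService


class TranscriptPrinter(FrameProcessor):
    # Hypothetical helper: prints final transcripts as they arrive
    async def process_frame(self, frame: Frame, direction: FrameDirection):
        await super().process_frame(frame, direction)
        if isinstance(frame, TranscriptionFrame):
            print(f"{frame.user_id}: {frame.text}")
        await self.push_frame(frame, direction)


stt = OpenAIRealtimeSTTService(
    api_key=os.getenv("OPENAI_API_KEY"),
    turn_detection=None,  # Server-side VAD; no local VAD analyzer needed in the transport
)

# transport.input() omitted here; see the OpenAISTTService pipeline sketch above
pipeline = Pipeline([stt, TranscriptPrinter()])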