Voice Router SDK - OpenAI Whisper Provider
adapters/openai-whisper-adapter
Classes
OpenAIWhisperAdapter
OpenAI Whisper transcription provider adapter
Implements transcription for OpenAI's Whisper and GPT-4o transcription models with support for:
- Multiple model options: whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-transcribe-diarize
- Speaker diarization (with gpt-4o-transcribe-diarize model)
- Word-level timestamps
- Multi-language support
- Prompt-based style guidance
- Known speaker references for improved diarization
- Temperature control for output randomness
See
- https://platform.openai.com/docs/guides/speech-to-text OpenAI Speech-to-Text Documentation
- https://platform.openai.com/docs/api-reference/audio OpenAI Audio API Reference
Examples

Basic transcription:

```typescript
import { OpenAIWhisperAdapter } from '@meeting-baas/sdk';

const adapter = new OpenAIWhisperAdapter();
adapter.initialize({
  apiKey: process.env.OPENAI_API_KEY
});

const result = await adapter.transcribe({
  type: 'url',
  url: 'https://example.com/audio.mp3'
}, {
  language: 'en'
});

console.log(result.data.text);
```

Transcription with speaker diarization:

```typescript
const result = await adapter.transcribe({
  type: 'url',
  url: 'https://example.com/meeting.mp3'
}, {
  language: 'en',
  diarization: true, // Uses the gpt-4o-transcribe-diarize model
  metadata: {
    model: 'gpt-4o-transcribe-diarize'
  }
});

console.log('Speakers:', result.data.speakers);
console.log('Utterances:', result.data.utterances);
```

Word-level timestamps with prompt and temperature control:

```typescript
const result = await adapter.transcribe(audio, {
  language: 'en',
  wordTimestamps: true,
  metadata: {
    model: 'gpt-4o-transcribe', // More accurate than whisper-1
    temperature: 0.2, // Lower temperature for more focused output
    prompt: 'Expect technical terminology related to AI and machine learning'
  }
});

console.log('Words:', result.data.words);
```

Known speaker references for improved diarization:

```typescript
const result = await adapter.transcribe(audio, {
  language: 'en',
  diarization: true,
  metadata: {
    model: 'gpt-4o-transcribe-diarize',
    knownSpeakerNames: ['customer', 'agent'],
    knownSpeakerReferences: [
      'data:audio/wav;base64,...', // Customer voice sample
      'data:audio/wav;base64,...'  // Agent voice sample
    ]
  }
});

// Speakers will be labeled as 'customer' and 'agent' instead of 'A' and 'B'
console.log('Speakers:', result.data.speakers);
```

Extends

BaseAdapter
Methods
createErrorResponse()
protected createErrorResponse(error, statusCode?, code?): UnifiedTranscriptResponse
Helper method to create error responses with stack traces
Parameters
| Parameter | Type | Description |
|---|---|---|
| error | unknown | Error object or unknown error |
| statusCode? | number | Optional HTTP status code |
| code? | ErrorCode | Optional error code (defaults to extracted or UNKNOWN_ERROR) |
Returns
UnifiedTranscriptResponse
Inherited from
BaseAdapter.createErrorResponse
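A hedged sketch of a typical call site inside a subclass method; the catch-block context and the 502 status code are illustrative, not taken from the SDK:

```typescript
// Illustrative only: return a normalized error response instead of throwing.
try {
  // ... call the provider API here ...
} catch (error) {
  return this.createErrorResponse(error, 502); // 502 is an example status code
}
```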
getAxiosConfig()
protected getAxiosConfig(): object
Get the axios config for generated API client functions. Configures headers and the base URL using Bearer token authorization.
Returns
object

baseURL: string
headers: Record<string, string>
timeout: number
Overrides
BaseAdapter.getAxiosConfig
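For orientation, a sketch of the returned config; the Bearer header follows the description above, while the remaining values are assumptions:

```typescript
// Inside a subclass method (illustrative):
const config = this.getAxiosConfig();
// config.baseURL -> "https://api.openai.com/v1" (the adapter's baseUrl)
// config.headers -> includes an Authorization: Bearer <apiKey> entry
// config.timeout -> request timeout in milliseconds
```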
getTranscript()
getTranscript(transcriptId): Promise<UnifiedTranscriptResponse<TranscriptionProvider>>
OpenAI Whisper returns results synchronously, so getTranscript is not needed. This method exists for interface compatibility but will return an error.
Parameters
| Parameter | Type |
|---|---|
| transcriptId | string |
Returns
Promise<UnifiedTranscriptResponse<TranscriptionProvider>>
Overrides
BaseAdapter.getTranscript
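A short sketch of what calling it anyway looks like; the transcript ID and the error-field check are hypothetical:

```typescript
// Always resolves to an error response for this synchronous provider.
const res = await adapter.getTranscript('tx_123'); // 'tx_123' is a made-up ID
if ('error' in res) {
  // Exact error fields depend on UnifiedTranscriptResponse
  console.error('getTranscript is not supported:', res.error);
}
```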
handleRealtimeMessage()
private handleRealtimeMessage(message, callbacks?, onTranscriptUpdate?): void
Handle incoming Realtime API messages
Parameters
| Parameter | Type |
|---|---|
| message | RealtimeServerEvent |
| callbacks? | StreamingCallbacks |
| onTranscriptUpdate? | (text) => void |
Returns
void
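The method is private; as a rough sketch of what such a dispatcher does, here is an assumed version using transcription event names from OpenAI's Realtime API and the onTranscript/onError callbacks shown under transcribeStream() below:

```typescript
// Assumed dispatch logic, not the actual implementation.
function handleRealtimeMessageSketch(
  message: { type: string; delta?: string; transcript?: string; error?: { message?: string } },
  callbacks?: { onTranscript?: (e: { text: string }) => void; onError?: (e: Error) => void }
): void {
  switch (message.type) {
    case 'conversation.item.input_audio_transcription.delta':
      callbacks?.onTranscript?.({ text: message.delta ?? '' }); // partial transcript
      break;
    case 'conversation.item.input_audio_transcription.completed':
      callbacks?.onTranscript?.({ text: message.transcript ?? '' }); // final transcript
      break;
    case 'error':
      callbacks?.onError?.(new Error(message.error?.message ?? 'Realtime API error'));
      break;
  }
}
```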
initialize()
initialize(config): void
Initialize the adapter with configuration
Parameters
| Parameter | Type |
|---|---|
| config | ProviderConfig |
Returns
void
Inherited from
BaseAdapter.initialize
normalizeResponse()
private normalizeResponse(response, model, isDiarization): UnifiedTranscriptResponse
Normalize OpenAI response to unified format
Parameters
| Parameter | Type |
|---|---|
| response | CreateTranscriptionResponseDiarizedJson \| CreateTranscriptionResponseVerboseJson \| { text: string; } |
| model | string |
| isDiarization | boolean |
Returns
UnifiedTranscriptResponse
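Also private; for intuition, a minimal sketch of the verbose-JSON branch. The input fields (text, words) match OpenAI's verbose_json response, the output fields come from the examples above, and the envelope shape is an assumption:

```typescript
// Assumed normalization sketch, not the actual implementation.
function normalizeVerboseSketch(response: {
  text: string;
  words?: { word: string; start: number; end: number }[];
}) {
  return {
    data: {
      text: response.text,   // full transcript
      words: response.words  // word-level timestamps, when requested
    }
  };
}
```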
pollForCompletion()
protected pollForCompletion(transcriptId, options?): Promise<UnifiedTranscriptResponse<TranscriptionProvider>>
Generic polling helper for async transcription jobs
Polls getTranscript() until job completes or times out.
Parameters
| Parameter | Type | Description |
|---|---|---|
| transcriptId | string | Job/transcript ID to poll |
| options? | { intervalMs?: number; maxAttempts?: number; } | Polling configuration |
| options.intervalMs? | number | Polling interval in milliseconds |
| options.maxAttempts? | number | Maximum number of polling attempts |
Returns
Promise<UnifiedTranscriptResponse<TranscriptionProvider>>
Final transcription result
Inherited from
BaseAdapter.pollForCompletion
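This adapter never needs to poll, but here is a hedged sketch of how an async-provider subclass might use the helper (submitJob is hypothetical and not part of the SDK):

```typescript
// Inside a hypothetical async-provider subclass of BaseAdapter:
const jobId = await this.submitJob(audio); // submitJob is illustrative only
return this.pollForCompletion(jobId, {
  intervalMs: 2000, // call getTranscript() every 2 seconds
  maxAttempts: 30   // stop after 30 attempts
});
```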
selectModel()
private selectModel(options?): string
Select appropriate model based on transcription options
Parameters
| Parameter | Type |
|---|---|
| options? | TranscribeOptions |
Returns
string
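A plausible reconstruction of the rules documented on this page (an explicit metadata.model wins; diarization: true implies gpt-4o-transcribe-diarize); the whisper-1 fallback is an assumption:

```typescript
// Assumed selection logic, not the actual private implementation.
function selectModelSketch(options?: TranscribeOptions): string {
  const explicit = options?.metadata?.model;
  if (typeof explicit === 'string') return explicit;            // caller pinned a model
  if (options?.diarization) return 'gpt-4o-transcribe-diarize'; // diarization needs this model
  return 'whisper-1';                                           // assumed default
}
```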
transcribe()
transcribe(audio, options?): Promise<UnifiedTranscriptResponse<TranscriptionProvider>>
Submit audio for transcription
OpenAI Whisper API processes audio synchronously and returns results immediately. Supports multiple models with different capabilities:
- whisper-1: Open source Whisper V2 model
- gpt-4o-transcribe: More accurate GPT-4o based transcription
- gpt-4o-mini-transcribe: Faster, cost-effective GPT-4o mini
- gpt-4o-transcribe-diarize: GPT-4o with speaker diarization
Parameters
| Parameter | Type | Description |
|---|---|---|
| audio | AudioInput | Audio input (URL or Buffer) |
| options? | TranscribeOptions | Transcription options |
Returns
Promise<UnifiedTranscriptResponse<TranscriptionProvider>>
Transcription response with full results
Overrides
BaseAdapter.transcribe
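The audio parameter also accepts a Buffer; the { type: 'buffer', buffer } shape below is an assumption by analogy with the { type: 'url', url } form shown earlier:

```typescript
import { readFileSync } from 'node:fs';

// Buffer input sketch; the exact AudioInput variant for buffers is assumed.
const result = await adapter.transcribe(
  { type: 'buffer', buffer: readFileSync('./meeting.wav') },
  { language: 'en', metadata: { model: 'gpt-4o-mini-transcribe' } }
);
console.log(result.data.text);
```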
transcribeStream()
transcribeStream(options?, callbacks?): Promise<StreamingSession>
Real-time streaming transcription using OpenAI Realtime API
Opens a WebSocket connection to OpenAI's Realtime API for live audio transcription. Audio should be sent as PCM16 format (16-bit signed, little-endian).
Parameters
| Parameter | Type | Description |
|---|---|---|
| options? | StreamingOptions | Streaming options including audio format and VAD settings |
| callbacks? | StreamingCallbacks | Event callbacks for transcription events |
Returns
Promise<StreamingSession>
StreamingSession for sending audio and controlling the session
Example

```typescript
const session = await adapter.transcribeStream({
  sampleRate: 24000,
  openaiStreaming: {
    model: 'gpt-4o-realtime-preview',
    turnDetection: {
      type: 'server_vad',
      threshold: 0.5,
      silenceDurationMs: 500
    }
  }
}, {
  onTranscript: (event) => console.log(event.text),
  onError: (error) => console.error(error)
});

// Send audio chunks (PCM16 format)
session.sendAudio({ data: audioBuffer });

// Close when done
await session.close();
```

validateConfig()
protected validateConfig(): void
Helper method to validate configuration
Returns
void
Inherited from
BaseAdapter.validateConfig
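A sketch of the usual call site; that it throws on a missing config is an assumption:

```typescript
// Hypothetical guard at the top of a provider method:
this.validateConfig(); // assumed to throw if initialize() was never called
const apiKey = this.config?.apiKey; // apiKey as in the initialize example above
```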
Constructors
Constructor
new OpenAIWhisperAdapter(): OpenAIWhisperAdapter
Returns
OpenAIWhisperAdapter
Inherited from
BaseAdapter.constructor
Properties
baseUrl
protected baseUrl: string = "https://api.openai.com/v1"
Base URL for provider API (must be defined by subclass)
Overrides
BaseAdapter.baseUrl
capabilities
readonly capabilities: ProviderCapabilities
Provider capabilities
Overrides
BaseAdapter.capabilities
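Callers can branch on this before choosing an API; the streaming field is hypothetical, since ProviderCapabilities' members are not listed on this page:

```typescript
// Field name is an assumption for illustration.
if (adapter.capabilities.streaming) {
  // safe to use transcribeStream()
}
```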
name
readonlyname:"openai-whisper"
Provider name
Overrides
BaseAdapter.name
config?
protected optional config: ProviderConfig
Inherited from
BaseAdapter.config
Functions
createOpenAIWhisperAdapter()
createOpenAIWhisperAdapter(config): OpenAIWhisperAdapter
Factory function to create an OpenAI Whisper adapter
Parameters
| Parameter | Type |
|---|---|
| config | ProviderConfig |
Returns
OpenAIWhisperAdapter
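A usage sketch; that the factory both constructs and initializes the adapter is assumed from its signature:

```typescript
import { createOpenAIWhisperAdapter } from '@meeting-baas/sdk';

// Assumed equivalent to new OpenAIWhisperAdapter() followed by initialize(config).
const adapter = createOpenAIWhisperAdapter({
  apiKey: process.env.OPENAI_API_KEY
});
```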