Voice Router SDK - OpenAI Whisper Provider

adapters/openai-whisper-adapter

Classes

OpenAIWhisperAdapter

OpenAI Whisper transcription provider adapter

Implements transcription for OpenAI's Whisper and GPT-4o transcription models with support for:

  • Multiple model options: whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-transcribe-diarize
  • Speaker diarization (with gpt-4o-transcribe-diarize model)
  • Word-level timestamps
  • Multi-language support
  • Prompt-based style guidance
  • Known speaker references for improved diarization
  • Temperature control for output randomness

Examples

Basic transcription:

import { OpenAIWhisperAdapter } from '@meeting-baas/sdk';

const adapter = new OpenAIWhisperAdapter();
adapter.initialize({
  apiKey: process.env.OPENAI_API_KEY
});

const result = await adapter.transcribe({
  type: 'url',
  url: 'https://example.com/audio.mp3'
}, {
  language: 'en'
});

console.log(result.data.text);

With speaker diarization:

const result = await adapter.transcribe({
  type: 'url',
  url: 'https://example.com/meeting.mp3'
}, {
  language: 'en',
  diarization: true,  // Uses gpt-4o-transcribe-diarize model
  metadata: {
    model: 'gpt-4o-transcribe-diarize'
  }
});

console.log('Speakers:', result.data.speakers);
console.log('Utterances:', result.data.utterances);

With word-level timestamps and a GPT-4o model:

const result = await adapter.transcribe(audio, {
  language: 'en',
  wordTimestamps: true,
  metadata: {
    model: 'gpt-4o-transcribe',  // More accurate than whisper-1
    temperature: 0.2,  // Lower temperature for more focused output
    prompt: 'Expect technical terminology related to AI and machine learning'
  }
});

console.log('Words:', result.data.words);

With known speaker references for improved diarization:

const result = await adapter.transcribe(audio, {
  language: 'en',
  diarization: true,
  metadata: {
    model: 'gpt-4o-transcribe-diarize',
    knownSpeakerNames: ['customer', 'agent'],
    knownSpeakerReferences: [
      'data:audio/wav;base64,...',  // Customer voice sample
      'data:audio/wav;base64,...'   // Agent voice sample
    ]
  }
});

// Speakers will be labeled as 'customer' and 'agent' instead of 'A' and 'B'
console.log('Speakers:', result.data.speakers);

Extends

  • BaseAdapter

Methods

createErrorResponse()

protected createErrorResponse(error, statusCode?, code?): UnifiedTranscriptResponse

Helper method to create error responses with stack traces

Parameters

  • error (unknown): Error object or unknown error
  • statusCode? (number): Optional HTTP status code
  • code? (ErrorCode): Optional error code (defaults to extracted or UNKNOWN_ERROR)
Returns

UnifiedTranscriptResponse

Inherited from

BaseAdapter.createErrorResponse
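
As a rough consumer-side sketch, an error built through this helper comes back on the response object rather than as a thrown exception; the error field shape below (error.code, error.message) is an assumption about UnifiedTranscriptResponse, not confirmed by this reference.

import { OpenAIWhisperAdapter } from '@meeting-baas/sdk';

const adapter = new OpenAIWhisperAdapter();
adapter.initialize({ apiKey: process.env.OPENAI_API_KEY });

// The 'error' field name and its 'code'/'message' members are assumptions here.
const result = await adapter.transcribe({ type: 'url', url: 'https://example.com/missing.mp3' });
if (result.error) {
  console.error('Transcription failed:', result.error.code, result.error.message);
}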

getAxiosConfig()

protected getAxiosConfig(): object

Get axios config for generated API client functions. Configures headers and base URL using Bearer token authorization.

Returns

object

baseURL

baseURL: string

headers

headers: Record<string, string>

timeout

timeout: number

Overrides

BaseAdapter.getAxiosConfig
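
For orientation, the returned object is the plain request config axios accepts; the concrete values below, especially the timeout, are illustrative assumptions rather than the adapter's actual defaults.

import axios from 'axios';

// Illustrative shape of the config this method returns; values are assumptions.
const axiosConfig = {
  baseURL: 'https://api.openai.com/v1',
  headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
  timeout: 30_000,
};

// Generated client functions would pass a config like this straight to axios.
const models = await axios.get('/models', axiosConfig);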

getTranscript()

getTranscript(transcriptId): Promise<UnifiedTranscriptResponse<TranscriptionProvider>>

OpenAI Whisper returns results synchronously, so getTranscript is not needed. This method exists for interface compatibility but will return an error.

Parameters

  • transcriptId (string)
Returns

Promise<UnifiedTranscriptResponse<TranscriptionProvider>>

Overrides

BaseAdapter.getTranscript
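
A minimal sketch of what calling it anyway might look like, assuming the error surfaces on the response object (the error field name is an assumption):

import { OpenAIWhisperAdapter } from '@meeting-baas/sdk';

const adapter = new OpenAIWhisperAdapter();
adapter.initialize({ apiKey: process.env.OPENAI_API_KEY });

// Expected to resolve with an error payload: this provider is synchronous,
// so there is nothing to fetch by ID. The 'error' field name is an assumption.
const res = await adapter.getTranscript('some-transcript-id');
if (res.error) {
  console.warn('getTranscript is not supported for openai-whisper:', res.error.message);
}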

handleRealtimeMessage()

private handleRealtimeMessage(message, callbacks?, onTranscriptUpdate?): void

Handle incoming Realtime API messages

Parameters

  • message (RealtimeServerEvent)
  • callbacks? (StreamingCallbacks)
  • onTranscriptUpdate? ((text) => void)
Returns

void

initialize()

initialize(config): void

Initialize the adapter with configuration

Parameters

  • config (ProviderConfig)
Returns

void

Inherited from

BaseAdapter.initialize

normalizeResponse()

private normalizeResponse(response, model, isDiarization): UnifiedTranscriptResponse

Normalize OpenAI response to unified format

Parameters

  • response (CreateTranscriptionResponseDiarizedJson | CreateTranscriptionResponseVerboseJson | { text: string; })
  • model (string)
  • isDiarization (boolean)
Returns

UnifiedTranscriptResponse

pollForCompletion()

protected pollForCompletion(transcriptId, options?): Promise<UnifiedTranscriptResponse<TranscriptionProvider>>

Generic polling helper for async transcription jobs

Polls getTranscript() until job completes or times out.

Parameters

  • transcriptId (string): Job/transcript ID to poll
  • options? ({ intervalMs?: number; maxAttempts?: number; }): Polling configuration
  • options.intervalMs? (number)
  • options.maxAttempts? (number)
Returns

Promise<UnifiedTranscriptResponse<TranscriptionProvider>>

Final transcription result

Inherited from

BaseAdapter.pollForCompletion
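
As a rough sketch of the polling pattern this helper implements (the status values, default interval, and attempt cap below are assumptions, not the SDK's actual defaults):

// Illustrative re-implementation of the polling loop; not the SDK source.
async function pollSketch(
  getTranscript: (id: string) => Promise<{ status?: string }>,
  transcriptId: string,
  { intervalMs = 5_000, maxAttempts = 60 } = {},
) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await getTranscript(transcriptId);
    // 'completed' / 'error' statuses are assumptions about the unified response.
    if (result.status === 'completed' || result.status === 'error') return result;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Transcription ${transcriptId} did not finish after ${maxAttempts} attempts`);
}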

selectModel()

private selectModel(options?): string

Select appropriate model based on transcription options

Parameters

  • options? (TranscribeOptions)
Returns

string
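
The real selection logic is private; as a sketch, it plausibly honors an explicit metadata.model override and otherwise picks the diarization model only when diarization is requested (the whisper-1 fallback is an assumption):

// Hypothetical sketch of the selection logic; the private method may differ.
function selectModelSketch(options?: { diarization?: boolean; metadata?: { model?: string } }): string {
  if (options?.metadata?.model) return options.metadata.model;   // explicit override wins
  if (options?.diarization) return 'gpt-4o-transcribe-diarize';  // diarization requires this model
  return 'whisper-1';                                            // assumed default
}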

transcribe()

transcribe(audio, options?): Promise<UnifiedTranscriptResponse<TranscriptionProvider>>

Submit audio for transcription

OpenAI Whisper API processes audio synchronously and returns results immediately. Supports multiple models with different capabilities:

  • whisper-1: Open source Whisper V2 model
  • gpt-4o-transcribe: More accurate GPT-4o based transcription
  • gpt-4o-mini-transcribe: Faster, cost-effective GPT-4o mini
  • gpt-4o-transcribe-diarize: GPT-4o with speaker diarization

Parameters

  • audio (AudioInput): Audio input (URL or Buffer)
  • options? (TranscribeOptions): Transcription options
Returns

Promise<UnifiedTranscriptResponse<TranscriptionProvider>>

Transcription response with full results

Overrides

BaseAdapter.transcribe
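
The Examples above all pass a URL; a Buffer input would look roughly like the sketch below, where the exact field names of the Buffer form of AudioInput ('type', 'data', 'filename') are assumptions inferred from the URL form, so check the AudioInput typing before relying on them.

import { readFileSync } from 'node:fs';
import { OpenAIWhisperAdapter } from '@meeting-baas/sdk';

const adapter = new OpenAIWhisperAdapter();
adapter.initialize({ apiKey: process.env.OPENAI_API_KEY });

// Buffer-based AudioInput sketch; field names are assumptions.
const result = await adapter.transcribe(
  { type: 'buffer', data: readFileSync('./meeting.wav'), filename: 'meeting.wav' },
  { language: 'en', metadata: { model: 'gpt-4o-mini-transcribe' } },
);
console.log(result.data.text);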

transcribeStream()

transcribeStream(options?, callbacks?): Promise<StreamingSession>

Real-time streaming transcription using OpenAI Realtime API

Opens a WebSocket connection to OpenAI's Realtime API for live audio transcription. Audio should be sent in PCM16 format (16-bit signed, little-endian).

Parameters

  • options? (StreamingOptions): Streaming options including audio format and VAD settings
  • callbacks? (StreamingCallbacks): Event callbacks for transcription events
Returns

Promise<StreamingSession>

StreamingSession for sending audio and controlling the session

Example
const session = await adapter.transcribeStream({
  sampleRate: 24000,
  openaiStreaming: {
    model: 'gpt-4o-realtime-preview',
    turnDetection: {
      type: 'server_vad',
      threshold: 0.5,
      silenceDurationMs: 500
    }
  }
}, {
  onTranscript: (event) => console.log(event.text),
  onError: (error) => console.error(error)
});

// Send audio chunks (PCM16 format)
session.sendAudio({ data: audioBuffer });

// Close when done
await session.close();

validateConfig()

protected validateConfig(): void

Helper method to validate configuration

Returns

void

Inherited from

BaseAdapter.validateConfig
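
A minimal sketch of the kind of check this likely performs before any request is made (the checked field and error message are assumptions):

// Illustrative only; not the SDK source.
function validateConfigSketch(config?: { apiKey?: string }): void {
  if (!config?.apiKey) {
    throw new Error('Adapter not initialized: call initialize({ apiKey }) first');
  }
}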

Constructors

Constructor

new OpenAIWhisperAdapter(): OpenAIWhisperAdapter

Returns

OpenAIWhisperAdapter

Inherited from

BaseAdapter.constructor

Properties

baseUrl

protected baseUrl: string = "https://api.openai.com/v1"

Base URL for provider API (must be defined by subclass)

Overrides

BaseAdapter.baseUrl

capabilities

readonly capabilities: ProviderCapabilities

Provider capabilities

Overrides

BaseAdapter.capabilities

name

readonly name: "openai-whisper"

Provider name

Overrides

BaseAdapter.name

config?

protected optional config: ProviderConfig

Inherited from

BaseAdapter.config

Functions

createOpenAIWhisperAdapter()

createOpenAIWhisperAdapter(config): OpenAIWhisperAdapter

Factory function to create an OpenAI Whisper adapter

Parameters

  • config (ProviderConfig)

Returns

OpenAIWhisperAdapter
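
Assuming the factory is equivalent to constructing the adapter and then calling initialize(config), which the signature suggests but this reference does not state explicitly, usage would look like:

import { createOpenAIWhisperAdapter } from '@meeting-baas/sdk';

const adapter = createOpenAIWhisperAdapter({ apiKey: process.env.OPENAI_API_KEY });

const result = await adapter.transcribe({
  type: 'url',
  url: 'https://example.com/audio.mp3'
});
console.log(result.data.text);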
