Voice Router SDK - OpenAI Whisper Provider

adapters/openai-whisper-adapter

Classes

OpenAIWhisperAdapter

OpenAI Whisper transcription provider adapter

Implements transcription for OpenAI's Whisper and GPT-4o transcription models with support for:

  • Multiple model options: whisper-1, gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-transcribe-diarize
  • Speaker diarization (with gpt-4o-transcribe-diarize model)
  • Word-level timestamps
  • Multi-language support
  • Prompt-based style guidance
  • Known speaker references for improved diarization
  • Temperature control for output randomness

Examples

Basic transcription:

import { OpenAIWhisperAdapter } from '@meeting-baas/sdk';

const adapter = new OpenAIWhisperAdapter();
adapter.initialize({
  apiKey: process.env.OPENAI_API_KEY
});

const result = await adapter.transcribe({
  type: 'url',
  url: 'https://example.com/audio.mp3'
}, {
  language: 'en'
});

console.log(result.data.text);

With speaker diarization:

const result = await adapter.transcribe({
  type: 'url',
  url: 'https://example.com/meeting.mp3'
}, {
  language: 'en',
  diarization: true,  // Uses gpt-4o-transcribe-diarize model
  metadata: {
    model: 'gpt-4o-transcribe-diarize'
  }
});

console.log('Speakers:', result.data.speakers);
console.log('Utterances:', result.data.utterances);

With word-level timestamps and a GPT-4o model:

const result = await adapter.transcribe(audio, {
  language: 'en',
  wordTimestamps: true,
  metadata: {
    model: 'gpt-4o-transcribe',  // More accurate than whisper-1
    temperature: 0.2,  // Lower temperature for more focused output
    prompt: 'Expect technical terminology related to AI and machine learning'
  }
});

console.log('Words:', result.data.words);

With known speaker references for improved diarization:

const result = await adapter.transcribe(audio, {
  language: 'en',
  diarization: true,
  metadata: {
    model: 'gpt-4o-transcribe-diarize',
    knownSpeakerNames: ['customer', 'agent'],
    knownSpeakerReferences: [
      'data:audio/wav;base64,...',  // Customer voice sample
      'data:audio/wav;base64,...'   // Agent voice sample
    ]
  }
});

// Speakers will be labeled as 'customer' and 'agent' instead of 'A' and 'B'
console.log('Speakers:', result.data.speakers);

Extends

  • BaseAdapter

Methods

createErrorResponse()

protected createErrorResponse(error, statusCode?, code?): UnifiedTranscriptResponse

Helper method to create error responses with stack traces

Parameters

  • error (unknown): Error object or unknown error
  • statusCode? (number): Optional HTTP status code
  • code? (ErrorCode): Optional error code (defaults to extracted or UNKNOWN_ERROR)
Returns

UnifiedTranscriptResponse

Inherited from

BaseAdapter.createErrorResponse
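
As a rough consumer-side sketch, an error built through this helper comes back on the response object rather than as a thrown exception; the error field shape below (error.code, error.message) is an assumption about UnifiedTranscriptResponse, not confirmed by this reference.

import { OpenAIWhisperAdapter } from '@meeting-baas/sdk';

const adapter = new OpenAIWhisperAdapter();
adapter.initialize({ apiKey: process.env.OPENAI_API_KEY });

// The 'error' field name and its 'code'/'message' members are assumptions here.
const result = await adapter.transcribe({ type: 'url', url: 'https://example.com/missing.mp3' });
if (result.error) {
  console.error('Transcription failed:', result.error.code, result.error.message);
}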

getAxiosConfig()

protected getAxiosConfig(): object

Get axios config for generated API client functions. Configures headers and base URL using Bearer token authorization.

Returns

object

baseURL

baseURL: string

headers

headers: Record<string, string>

timeout

timeout: number

Overrides

BaseAdapter.getAxiosConfig
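
For orientation, the returned object is the plain request config axios accepts; the concrete values below, especially the timeout, are illustrative assumptions rather than the adapter's actual defaults.

import axios from 'axios';

// Illustrative shape of the config this method returns; values are assumptions.
const axiosConfig = {
  baseURL: 'https://api.openai.com/v1',
  headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}` },
  timeout: 30_000,
};

// Generated client functions would pass a config like this straight to axios.
const models = await axios.get('/models', axiosConfig);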

getTranscript()

getTranscript(transcriptId): Promise<UnifiedTranscriptResponse<TranscriptionProvider>>

OpenAI Whisper returns results synchronously, so getTranscript is not needed. This method exists for interface compatibility but will return an error.

Parameters

  • transcriptId (string)
Returns

Promise<UnifiedTranscriptResponse<TranscriptionProvider>>

Overrides

BaseAdapter.getTranscript
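
A minimal sketch of what calling it anyway might look like, assuming the error surfaces on the response object (the error field name is an assumption):

import { OpenAIWhisperAdapter } from '@meeting-baas/sdk';

const adapter = new OpenAIWhisperAdapter();
adapter.initialize({ apiKey: process.env.OPENAI_API_KEY });

// Expected to resolve with an error payload: this provider is synchronous,
// so there is nothing to fetch by ID. The 'error' field name is an assumption.
const res = await adapter.getTranscript('some-transcript-id');
if (res.error) {
  console.warn('getTranscript is not supported for openai-whisper:', res.error.message);
}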

handleRealtimeMessage()

private handleRealtimeMessage(message, callbacks?, onTranscriptUpdate?): void

Handle incoming Realtime API messages

Parameters

  • message (RealtimeServerEvent)
  • callbacks? (StreamingCallbacks)
  • onTranscriptUpdate? ((text) => void)
Returns

void

initialize()

initialize(config): void

Initialize the adapter with configuration

Parameters

  • config (ProviderConfig)
Returns

void

Inherited from

BaseAdapter.initialize

normalizeResponse()

private normalizeResponse(response, model, isDiarization): UnifiedTranscriptResponse

Normalize OpenAI response to unified format

Parameters

  • response (CreateTranscriptionResponseDiarizedJson | CreateTranscriptionResponseVerboseJson | { text: string; })
  • model (string)
  • isDiarization (boolean)
Returns

UnifiedTranscriptResponse

pollForCompletion()

protected pollForCompletion(transcriptId, options?): Promise<UnifiedTranscriptResponse<TranscriptionProvider>>

Generic polling helper for async transcription jobs

Polls getTranscript() until job completes or times out.

Parameters

  • transcriptId (string): Job/transcript ID to poll
  • options? ({ intervalMs?: number; maxAttempts?: number; }): Polling configuration
  • options.intervalMs? (number)
  • options.maxAttempts? (number)
Returns

Promise<UnifiedTranscriptResponse<TranscriptionProvider>>

Final transcription result

Inherited from

BaseAdapter.pollForCompletion
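
As a rough sketch of the polling pattern this helper implements (the status values, default interval, and attempt cap below are assumptions, not the SDK's actual defaults):

// Illustrative re-implementation of the polling loop; not the SDK source.
async function pollSketch(
  getTranscript: (id: string) => Promise<{ status?: string }>,
  transcriptId: string,
  { intervalMs = 5_000, maxAttempts = 60 } = {},
) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await getTranscript(transcriptId);
    // 'completed' / 'error' statuses are assumptions about the unified response.
    if (result.status === 'completed' || result.status === 'error') return result;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Transcription ${transcriptId} did not finish after ${maxAttempts} attempts`);
}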

selectModel()

private selectModel(options?): string

Select appropriate model based on transcription options

Parameters

  • options? (TranscribeOptions)
Returns

string
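
The real selection logic is private; as a sketch, it plausibly honors an explicit metadata.model override and otherwise picks the diarization model only when diarization is requested (the whisper-1 fallback is an assumption):

// Hypothetical sketch of the selection logic; the private method may differ.
function selectModelSketch(options?: { diarization?: boolean; metadata?: { model?: string } }): string {
  if (options?.metadata?.model) return options.metadata.model;   // explicit override wins
  if (options?.diarization) return 'gpt-4o-transcribe-diarize';  // diarization requires this model
  return 'whisper-1';                                            // assumed default
}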

transcribe()

transcribe(audio, options?): Promise<UnifiedTranscriptResponse<TranscriptionProvider>>

Submit audio for transcription

OpenAI Whisper API processes audio synchronously and returns results immediately. Supports multiple models with different capabilities:

  • whisper-1: Open source Whisper V2 model
  • gpt-4o-transcribe: More accurate GPT-4o based transcription
  • gpt-4o-mini-transcribe: Faster, cost-effective GPT-4o mini
  • gpt-4o-transcribe-diarize: GPT-4o with speaker diarization

Parameters

  • audio (AudioInput): Audio input (URL or Buffer)
  • options? (TranscribeOptions): Transcription options
Returns

Promise<UnifiedTranscriptResponse<TranscriptionProvider>>

Transcription response with full results

Overrides

BaseAdapter.transcribe
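
The Examples above all pass a URL; a Buffer input would look roughly like the sketch below, where the exact field names of the Buffer form of AudioInput ('type', 'data', 'filename') are assumptions inferred from the URL form, so check the AudioInput typing before relying on them.

import { readFileSync } from 'node:fs';
import { OpenAIWhisperAdapter } from '@meeting-baas/sdk';

const adapter = new OpenAIWhisperAdapter();
adapter.initialize({ apiKey: process.env.OPENAI_API_KEY });

// Buffer-based AudioInput sketch; field names are assumptions.
const result = await adapter.transcribe(
  { type: 'buffer', data: readFileSync('./meeting.wav'), filename: 'meeting.wav' },
  { language: 'en', metadata: { model: 'gpt-4o-mini-transcribe' } },
);
console.log(result.data.text);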

transcribeStream()

transcribeStream(options?, callbacks?): Promise<StreamingSession>

Real-time streaming transcription using OpenAI Realtime API

Opens a WebSocket connection to OpenAI's Realtime API for live audio transcription. Audio should be sent in PCM16 format (16-bit signed, little-endian).

Parameters

  • options? (StreamingOptions): Streaming options including audio format and VAD settings
  • callbacks? (StreamingCallbacks): Event callbacks for transcription events
Returns

Promise<StreamingSession>

StreamingSession for sending audio and controlling the session

Example
const session = await adapter.transcribeStream({
  sampleRate: 24000,
  openaiStreaming: {
    model: 'gpt-4o-realtime-preview',
    turnDetection: {
      type: 'server_vad',
      threshold: 0.5,
      silenceDurationMs: 500
    }
  }
}, {
  onTranscript: (event) => console.log(event.text),
  onError: (error) => console.error(error)
});

// Send audio chunks (PCM16 format)
session.sendAudio({ data: audioBuffer });

// Close when done
await session.close();

validateConfig()

protected validateConfig(): void

Helper method to validate configuration

Returns

void

Inherited from

BaseAdapter.validateConfig
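
A minimal sketch of the kind of check this likely performs before any request is made (the checked field and error message are assumptions):

// Illustrative only; not the SDK source.
function validateConfigSketch(config?: { apiKey?: string }): void {
  if (!config?.apiKey) {
    throw new Error('Adapter not initialized: call initialize({ apiKey }) first');
  }
}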

Constructors

Constructor

new OpenAIWhisperAdapter(): OpenAIWhisperAdapter

Returns

OpenAIWhisperAdapter

Inherited from

BaseAdapter.constructor

Properties

baseUrl

protected baseUrl: string = "https://api.openai.com/v1"

Base URL for provider API (must be defined by subclass)

Overrides

BaseAdapter.baseUrl

capabilities

readonly capabilities: ProviderCapabilities

Provider capabilities

Overrides

BaseAdapter.capabilities

name

readonly name: "openai-whisper"

Provider name

Overrides

BaseAdapter.name

config?

protected optional config: ProviderConfig

Inherited from

BaseAdapter.config

Functions

createOpenAIWhisperAdapter()

createOpenAIWhisperAdapter(config): OpenAIWhisperAdapter

Factory function to create an OpenAI Whisper adapter

Parameters

  • config (ProviderConfig)

Returns

OpenAIWhisperAdapter
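
Assuming the factory is equivalent to constructing the adapter and then calling initialize(config), which the signature suggests but this reference does not state explicitly, usage would look like:

import { createOpenAIWhisperAdapter } from '@meeting-baas/sdk';

const adapter = createOpenAIWhisperAdapter({ apiKey: process.env.OPENAI_API_KEY });

const result = await adapter.transcribe({
  type: 'url',
  url: 'https://example.com/audio.mp3'
});
console.log(result.data.text);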
