Voice Router SDK - AssemblyAI Provider / router/types
router/types
Interfaces
AssemblyAIChapter
Chapter of the audio file
Properties
end
end:
number
The ending time, in milliseconds, for the chapter
gist
gist:
string
An ultra-short summary (just a few words) of the content spoken in the chapter
headline
headline:
string
A single sentence summary of the content spoken during the chapter
start
start:
number
The starting time, in milliseconds, for the chapter
summary
summary:
string
A one paragraph summary of the content spoken during the chapter
AssemblyAIContentSafetyResult
An array of results for the Content Moderation model, if it is enabled. See Content moderation for more information.
Properties
results
results:
ContentSafetyLabelResult[]
An array of results for the Content Moderation model
severity_score_summary
severity_score_summary:
ContentSafetyLabelsResultSeverityScoreSummary
A summary of the Content Moderation severity results for the entire audio file
status
status:
AudioIntelligenceModelStatus
The status of the Content Moderation model. Either success, or unavailable in the rare case that the model failed.
summary
summary:
ContentSafetyLabelsResultSummary
A summary of the Content Moderation confidence results for the entire audio file
AssemblyAIEntity
A detected entity
Properties
end
end:
number
The ending time, in milliseconds, for the detected entity in the audio file
entity_type
entity_type:
EntityType
The type of entity for the detected entity
start
start:
number
The starting time, in milliseconds, at which the detected entity appears in the audio file
text
text:
string
The text for the detected entity
AssemblyAIExtendedData
Extended data from AssemblyAI transcription. Includes chapters, entities, sentiment, content safety, and more.
Properties
chapters?
optionalchapters:AssemblyAIChapter[]
Auto-generated chapters with summaries
contentSafety?
optionalcontentSafety:AssemblyAIContentSafetyResult
Content safety/moderation labels
entities?
optionalentities:AssemblyAIEntity[]
Detected named entities (people, organizations, locations)
highlights?
optionalhighlights:AssemblyAIHighlightsResult
Key phrases and highlights
languageConfidence?
optionallanguageConfidence:number
Language detection confidence (0-1)
sentimentResults?
optionalsentimentResults:AssemblyAISentimentResult[]
Per-utterance sentiment analysis results
throttled?
optionalthrottled:boolean
Whether the request was throttled
topics?
optionaltopics:AssemblyAITopicsResult
IAB topic categories
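The sketch below shows one way to read this extended data from a typed response. It assumes an AssemblyAI adapter whose transcribe() resolves to UnifiedTranscriptResponse<'assemblyai'> and that the corresponding request features (auto_chapters, entity_detection, sentiment_analysis) were enabled; it only touches the fields documented above.

```ts
import type { UnifiedTranscriptResponse } from 'voice-router-dev';

function inspectAssemblyAIExtras(result: UnifiedTranscriptResponse<'assemblyai'>): void {
  if (!result.success || !result.extended) return;

  // Auto Chapters: gist/headline/summary per chapter, times in milliseconds
  result.extended.chapters?.forEach((chapter) => {
    console.log(`[${chapter.start}-${chapter.end} ms] ${chapter.headline}`);
  });

  // Detected named entities (people, organizations, locations, ...)
  result.extended.entities?.forEach((entity) => {
    console.log(`${entity.entity_type}: ${entity.text}`);
  });

  // Per-utterance sentiment
  result.extended.sentimentResults?.forEach((s) => {
    console.log(`${s.sentiment} (${s.confidence}): ${s.text}`);
  });
}
```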
AssemblyAIHighlightsResult
An array of results for the Key Phrases model, if it is enabled. See Key phrases for more information.
Properties
results
results:
AutoHighlightResult[]
A temporally-sequential array of Key Phrases
status
status:
AudioIntelligenceModelStatus
The status of the Key Phrases model. Either success, or unavailable in the rare case that the model failed.
AssemblyAIOptions
The parameters for creating a transcript
Properties
audio_end_at?
optionalaudio_end_at:number
The point in time, in milliseconds, to stop transcribing in your media file
audio_start_from?
optionalaudio_start_from:number
The point in time, in milliseconds, to begin transcribing in your media file
auto_chapters?
optionalauto_chapters:boolean
Enable Auto Chapters, can be true or false
auto_highlights?
optionalauto_highlights:boolean
Enable Key Phrases, either true or false
boost_param?
optionalboost_param:TranscriptBoostParam
How much to boost specified words
content_safety?
optionalcontent_safety:boolean
Enable Content Moderation, can be true or false
content_safety_confidence?
optionalcontent_safety_confidence:number
The confidence threshold for the Content Moderation model. Values must be between 25 and 100.
Minimum
25
Maximum
100
custom_spelling?
optionalcustom_spelling:TranscriptCustomSpelling[]
Customize how words are spelled and formatted using to and from values
custom_topics?
optionalcustom_topics:boolean
Enable custom topics, either true or false
Deprecated
disfluencies?
optionaldisfluencies:boolean
Transcribe Filler Words, like "umm", in your media file; can be true or false
entity_detection?
optionalentity_detection:boolean
Enable Entity Detection, can be true or false
filter_profanity?
optionalfilter_profanity:boolean
Filter profanity from the transcribed text, can be true or false
format_text?
optionalformat_text:boolean
Enable Text Formatting, can be true or false
iab_categories?
optionaliab_categories:boolean
Enable Topic Detection, can be true or false
keyterms_prompt?
optionalkeyterms_prompt:string[]
<Warning>keyterms_prompt is only supported when the speech_model is specified as slam-1</Warning>
Improve accuracy with up to 1000 domain-specific words or phrases (maximum 6 words per phrase).
language_code?
optionallanguage_code:TranscriptOptionalParamsLanguageCode
The language of your audio file. Possible values are found in Supported Languages. The default value is 'en_us'.
language_confidence_threshold?
optionallanguage_confidence_threshold:number
The confidence threshold for the automatically detected language. An error will be returned if the language confidence is below this threshold. Defaults to 0.
Minimum
0
Maximum
1
language_detection?
optionallanguage_detection:boolean
Enable Automatic language detection, either true or false.
multichannel?
optionalmultichannel:boolean
Enable Multichannel transcription, can be true or false.
prompt?
optionalprompt:string
This parameter does not currently have any functionality attached to it.
Deprecated
punctuate?
optionalpunctuate:boolean
Enable Automatic Punctuation, can be true or false
redact_pii?
optionalredact_pii:boolean
Redact PII from the transcribed text using the Redact PII model, can be true or false
redact_pii_audio?
optionalredact_pii_audio:boolean
Generate a copy of the original media file with spoken PII "beeped" out, can be true or false. See PII redaction for more details.
redact_pii_audio_quality?
optionalredact_pii_audio_quality:RedactPiiAudioQuality
Controls the filetype of the audio created by redact_pii_audio. Currently supports mp3 (default) and wav. See PII redaction for more details.
redact_pii_policies?
optionalredact_pii_policies:PiiPolicy[]
The list of PII Redaction policies to enable. See PII redaction for more details.
redact_pii_sub?
optionalredact_pii_sub:TranscriptOptionalParamsRedactPiiSub
The replacement logic for detected PII, can be "entity_type" or "hash". See PII redaction for more details.
sentiment_analysis?
optionalsentiment_analysis:boolean
Enable Sentiment Analysis, can be true or false
speaker_labels?
optionalspeaker_labels:boolean
Enable Speaker diarization, can be true or false
speakers_expected?
optionalspeakers_expected:TranscriptOptionalParamsSpeakersExpected
Tells the speaker label model how many speakers it should attempt to identify. See Speaker diarization for more details.
speech_model?
optionalspeech_model:TranscriptOptionalParamsSpeechModel
The speech model to use for the transcription. When null, the "best" model is used.
speech_threshold?
optionalspeech_threshold:TranscriptOptionalParamsSpeechThreshold
Reject audio files that contain less than this fraction of speech. Valid values are in the range [0, 1] inclusive.
Minimum
0
Maximum
1
summarization?
optionalsummarization:boolean
Enable Summarization, can be true or false
summary_model?
optionalsummary_model:SummaryModel
The model to summarize the transcript
summary_type?
optionalsummary_type:SummaryType
The type of summary
topics?
optionaltopics:string[]
The list of custom topics
webhook_auth_header_name?
optionalwebhook_auth_header_name:TranscriptOptionalParamsWebhookAuthHeaderName
The header name to be sent with the transcript completed or failed webhook requests
webhook_auth_header_value?
optionalwebhook_auth_header_value:TranscriptOptionalParamsWebhookAuthHeaderValue
The header value to send back with the transcript completed or failed webhook requests for added security
webhook_url?
optionalwebhook_url:string
The URL to which we send webhook requests. We send two different types of webhook requests: one when a transcript is completed or failed, and one when the redacted audio is ready if redact_pii_audio is enabled.
word_boost?
optionalword_boost:string[]
The list of custom vocabulary to boost transcription probability for
Deprecated
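In practice these parameters are typically forwarded through the assemblyai field of TranscribeOptions rather than set directly. A minimal sketch, assuming assemblyaiAdapter and audio are an already-configured adapter and a valid audio input (both placeholders here):

```ts
import type { TranscribeOptions } from 'voice-router-dev';

const options: TranscribeOptions = {
  summarization: true,      // provider-agnostic flag
  assemblyai: {             // passed through to the AssemblyAI API as-is
    auto_chapters: true,
    entity_detection: true,
    sentiment_analysis: true,
    iab_categories: true,   // Topic Detection
  },
};

const result = await assemblyaiAdapter.transcribe(audio, options);
```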
AssemblyAISentimentResult
The result of the Sentiment Analysis model
Properties
confidence
confidence:
number
The confidence score for the detected sentiment of the sentence, from 0 to 1
Minimum
0
Maximum
1
end
end:
number
The ending time, in milliseconds, of the sentence
sentiment
sentiment:
Sentiment
The detected sentiment for the sentence, one of POSITIVE, NEUTRAL, NEGATIVE
speaker
speaker:
SentimentAnalysisResultSpeaker
The speaker of the sentence if Speaker Diarization is enabled, else null
start
start:
number
The starting time, in milliseconds, of the sentence
text
text:
string
The transcript of the sentence
channel?
optionalchannel:SentimentAnalysisResultChannel
The channel of this utterance. The left and right channels are channels 1 and 2. Additional channels increment the channel number sequentially.
AssemblyAITopicsResult
The result of the Topic Detection model, if it is enabled. See Topic Detection for more information.
Properties
results
results:
TopicDetectionResult[]
An array of results for the Topic Detection model
status
status:
AudioIntelligenceModelStatus
The status of the Topic Detection model. Either success, or unavailable in the rare case that the model failed.
summary
summary:
TopicDetectionModelResultSummary
The overall relevance of topic to the entire audio file
AudioAckEvent
Audio chunk acknowledgment event
Properties
byteRange?
optionalbyteRange: [number,number]
Byte range of the acknowledged audio chunk [start, end]
timeRange?
optionaltimeRange: [number,number]
Time range in seconds of the acknowledged audio chunk [start, end]
timestamp?
optionaltimestamp:string
Acknowledgment timestamp
AudioChunk
Audio chunk for streaming transcription
Properties
data
data:
Buffer<ArrayBufferLike> |Uint8Array<ArrayBufferLike>
Audio data as Buffer or Uint8Array
isLast?
optionalisLast:boolean
Whether this is the last chunk
ChapterizationEvent
Post-processing chapterization event
Properties
chapters
chapters:
object[]
Generated chapters
end
end:
number
End time in seconds
headline
headline:
string
Chapter title/headline
start
start:
number
Start time in seconds
summary
summary:
string
Chapter summary
error?
optionalerror:string
Error if chapterization failed
DeepgramExtendedData
Extended data from Deepgram transcription. Includes detailed metadata, model info, and feature-specific data.
Properties
metadata?
optionalmetadata:ListenV1ResponseMetadata
Full response metadata
modelInfo?
optionalmodelInfo:Record<string,unknown>
Model versions used
requestId?
optionalrequestId:string
Request ID for debugging/tracking
sha256?
optionalsha256:string
SHA256 hash of the audio
tags?
optionaltags:string[]
Tags echoed back from request
EntityEvent
Named entity recognition result
Properties
text
text:
string
Entity text
type
type:
string
Entity type (PERSON, ORGANIZATION, LOCATION, etc.)
end?
optionalend:number
End position
start?
optionalstart:number
Start position
utteranceId?
optionalutteranceId:string
Utterance ID this entity belongs to
GladiaExtendedData
Extended data from Gladia transcription. Includes translation, moderation, entities, LLM outputs, and more.
Properties
audioToLlm?
optionalaudioToLlm:AudioToLlmListDTO
Audio-to-LLM custom prompt results
chapters?
optionalchapters:ChapterizationDTO
Auto-generated chapters
customMetadata?
optionalcustomMetadata:Record<string,unknown>
Custom metadata echoed back
entities?
optionalentities:NamedEntityRecognitionDTO
Named entity recognition results
moderation?
optionalmoderation:ModerationDTO
Content moderation results
sentiment?
optionalsentiment:SentimentAnalysisDTO
Sentiment analysis results
speakerReidentification?
optionalspeakerReidentification:SpeakerReidentificationDTO
AI speaker reidentification results
structuredData?
optionalstructuredData:StructuredDataExtractionDTO
Structured data extraction results
translation?
optionaltranslation:TranslationDTO
Translation results (if translation enabled)
LifecycleEvent
Lifecycle event (session start, recording end, etc.)
Properties
eventType
eventType:
"start_session"|"start_recording"|"stop_recording"|"end_recording"|"end_session"
Lifecycle event type
sessionId?
optionalsessionId:string
Session ID
timestamp?
optionaltimestamp:string
Event timestamp
ListTranscriptsOptions
Options for listing transcripts with date/time filtering
Providers support different filtering capabilities:
- AssemblyAI: status, created_on, before_id, after_id, throttled_only
- Gladia: status, date, before_date, after_date, custom_metadata
- Azure: status, skip, top, filter (OData)
- Deepgram: start, end, status, page, request_id, endpoint (requires projectId)
Examples
await adapter.listTranscripts({
date: '2026-01-07', // Exact date (ISO format)
status: 'completed',
limit: 50
})

await adapter.listTranscripts({
  afterDate: '2026-01-01',
  beforeDate: '2026-01-31',
  limit: 100
})

Properties
afterDate?
optionalafterDate:string
Filter for transcripts created after this date (ISO format)
assemblyai?
optionalassemblyai:Partial<ListTranscriptsParams>
AssemblyAI-specific list options
beforeDate?
optionalbeforeDate:string
Filter for transcripts created before this date (ISO format)
date?
optionaldate:string
Filter by exact date (ISO format: YYYY-MM-DD)
deepgram?
optionaldeepgram:Partial<ManageV1ProjectsRequestsListParams>
Deepgram-specific list options (request history)
gladia?
optionalgladia:Partial<TranscriptionControllerListV2Params>
Gladia-specific list options
limit?
optionallimit:number
Maximum number of transcripts to retrieve
offset?
optionaloffset:number
Pagination offset (skip N results)
status?
optionalstatus:string
Filter by transcript status
ListTranscriptsResponse
Response from listTranscripts
Example
import type { ListTranscriptsResponse } from 'voice-router-dev';
const response: ListTranscriptsResponse = await router.listTranscripts('assemblyai', {
status: 'completed',
limit: 50
});
response.transcripts.forEach(item => {
console.log(item.data?.id, item.data?.status);
});
if (response.hasMore) {
// Fetch next page
}

Properties
transcripts
transcripts:
UnifiedTranscriptResponse<TranscriptionProvider>[]
List of transcripts
hasMore?
optionalhasMore:boolean
Whether more results are available
total?
optionaltotal:number
Total count (if available from provider)
OpenAIWhisperOptions
Properties
file
file:
Blob
The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
model
model:
string
ID of the model to use. The options are gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-mini-transcribe-2025-12-15, whisper-1 (which is powered by our open source Whisper V2 model), and gpt-4o-transcribe-diarize.
chunking_strategy?
optionalchunking_strategy:TranscriptionChunkingStrategy
include?
optionalinclude:"logprobs"[]
Additional information to include in the transcription response.
logprobs will return the log probabilities of the tokens in the
response to understand the model's confidence in the transcription.
logprobs only works with response_format set to json and only with
the models gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-transcribe-2025-12-15. This field is not supported when using gpt-4o-transcribe-diarize.
known_speaker_names?
optionalknown_speaker_names:string[]
Optional list of speaker names that correspond to the audio samples provided in known_speaker_references[]. Each entry should be a short identifier (for example customer or agent). Up to 4 speakers are supported.
Max Items
4
known_speaker_references?
optionalknown_speaker_references:string[]
Optional list of audio samples (as data URLs) that contain known speaker references matching known_speaker_names[]. Each sample must be between 2 and 10 seconds, and can use any of the same input audio formats supported by file.
Max Items
4
language?
optionallanguage:string
The language of the input audio. Supplying the input language in ISO-639-1 (e.g. en) format will improve accuracy and latency.
prompt?
optionalprompt:string
An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language. This field is not supported when using gpt-4o-transcribe-diarize.
response_format?
optionalresponse_format:AudioResponseFormat
stream?
optionalstream:CreateTranscriptionRequestStream
temperature?
optionaltemperature:number
The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
timestamp_granularities?
optionaltimestamp_granularities:CreateTranscriptionRequestTimestampGranularitiesItem[]
The timestamp granularities to populate for this transcription. response_format must be set verbose_json to use timestamp granularities. Either or both of these options are supported: word, or segment. Note: There is no additional latency for segment timestamps, but generating word timestamps incurs additional latency.
This option is not available for gpt-4o-transcribe-diarize.
ProviderCapabilities
Provider capability flags
Each boolean indicates whether the provider supports a specific feature. Use ProviderCapabilitiesMap from provider-metadata for runtime access.
Properties
customVocabulary
customVocabulary:
boolean
Custom vocabulary/keyword boosting
deleteTranscript
deleteTranscript:
boolean
Delete transcriptions
diarization
diarization:
boolean
Speaker diarization (identifying different speakers)
entityDetection
entityDetection:
boolean
Entity detection
languageDetection
languageDetection:
boolean
Automatic language detection
listTranscripts
listTranscripts:
boolean
List/fetch previous transcriptions
piiRedaction
piiRedaction:
boolean
PII redaction
sentimentAnalysis
sentimentAnalysis:
boolean
Sentiment analysis
streaming
streaming:
boolean
Real-time streaming transcription support
summarization
summarization:
boolean
Audio summarization
wordTimestamps
wordTimestamps:
boolean
Word-level timestamps
getAudioFile?
optionalgetAudioFile:boolean
Download original audio file
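A short sketch of gating features on these flags at runtime. The note above points to ProviderCapabilitiesMap in provider-metadata; the import path below is an assumption, so adjust it to wherever that map is exported in your build.

```ts
// Assumed export location for ProviderCapabilitiesMap; see provider-metadata.
import { ProviderCapabilitiesMap } from 'voice-router-dev';
import type { ProviderCapabilities, TranscriptionProvider } from 'voice-router-dev';

function supportsLiveDiarization(provider: TranscriptionProvider): boolean {
  const caps: ProviderCapabilities = ProviderCapabilitiesMap[provider];
  // Offer live speaker labels only when the provider can both stream and diarize.
  return caps.streaming && caps.diarization;
}

if (!supportsLiveDiarization('assemblyai')) {
  console.warn('Falling back to batch transcription with speaker labels');
}
```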
SentimentEvent
Sentiment analysis result (for real-time sentiment)
Properties
sentiment
sentiment:
string
Sentiment label (positive, negative, neutral)
confidence?
optionalconfidence:number
Confidence score 0-1
utteranceId?
optionalutteranceId:string
Utterance ID this sentiment belongs to
Speaker
Speaker information from diarization
Properties
id
id:
string
Speaker identifier (e.g., "A", "B", "speaker_0")
confidence?
optionalconfidence:number
Confidence score for speaker identification (0-1)
label?
optionallabel:string
Speaker label if known
SpeechEvent
Speech event data (for speech_start/speech_end events)
Properties
timestamp
timestamp:
number
Timestamp in seconds
type
type:
"speech_start"|"speech_end"
Event type: speech_start or speech_end
channel?
optionalchannel:number
Channel number
sessionId?
optionalsessionId:string
Session ID
StreamEvent
Streaming transcription event
Properties
type
type:
StreamEventType
channel?
optionalchannel:number
Channel number for multi-channel audio
confidence?
optionalconfidence:number
Confidence score for this event
data?
optionaldata:unknown
Additional event data
error?
optionalerror:object
Error information (for type: "error")
code
code:
string
message
message:
string
details?
optionaldetails:unknown
isFinal?
optionalisFinal:boolean
Whether this is a final transcript (vs interim)
language?
optionallanguage:string
Language of the transcript/utterance
speaker?
optionalspeaker:string
Speaker ID if diarization is enabled
text?
optionaltext:string
Partial transcript text (for type: "transcript")
utterance?
optionalutterance:Utterance
Utterance data (for type: "utterance")
words?
optionalwords:Word[]
Words in this event
StreamingCallbacks
Callback functions for streaming events
Properties
onAudioAck()?
optionalonAudioAck: (event) =>void
Called for audio chunk acknowledgments (Gladia: requires receive_acknowledgments)
Parameters
| Parameter | Type |
|---|---|
event | AudioAckEvent |
Returns
void
onChapterization()?
optionalonChapterization: (event) =>void
Called when post-processing chapterization completes (Gladia: requires chapterization enabled)
Parameters
| Parameter | Type |
|---|---|
event | ChapterizationEvent |
Returns
void
onClose()?
optionalonClose: (code?,reason?) =>void
Called when the stream is closed
Parameters
| Parameter | Type |
|---|---|
code? | number |
reason? | string |
Returns
void
onEntity()?
optionalonEntity: (event) =>void
Called for named entity recognition (Gladia: requires named_entity_recognition enabled)
Parameters
| Parameter | Type |
|---|---|
event | EntityEvent |
Returns
void
onError()?
optionalonError: (error) =>void
Called when an error occurs
Parameters
| Parameter | Type |
|---|---|
error | { code: string; message: string; details?: unknown; } |
error.code | string |
error.message | string |
error.details? | unknown |
Returns
void
onLifecycle()?
optionalonLifecycle: (event) =>void
Called for session lifecycle events (Gladia: requires receive_lifecycle_events)
Parameters
| Parameter | Type |
|---|---|
event | LifecycleEvent |
Returns
void
onMetadata()?
optionalonMetadata: (metadata) =>void
Called when metadata is received
Parameters
| Parameter | Type |
|---|---|
metadata | Record<string, unknown> |
Returns
void
onOpen()?
optionalonOpen: () =>void
Called when connection is established
Returns
void
onSentiment()?
optionalonSentiment: (event) =>void
Called for real-time sentiment analysis (Gladia: requires sentiment_analysis enabled)
Parameters
| Parameter | Type |
|---|---|
event | SentimentEvent |
Returns
void
onSpeechEnd()?
optionalonSpeechEnd: (event) =>void
Called when speech ends (Gladia: requires receive_speech_events)
Parameters
| Parameter | Type |
|---|---|
event | SpeechEvent |
Returns
void
onSpeechStart()?
optionalonSpeechStart: (event) =>void
Called when speech starts (Gladia: requires receive_speech_events)
Parameters
| Parameter | Type |
|---|---|
event | SpeechEvent |
Returns
void
onSummarization()?
optionalonSummarization: (event) =>void
Called when post-processing summarization completes (Gladia: requires summarization enabled)
Parameters
| Parameter | Type |
|---|---|
event | SummarizationEvent |
Returns
void
onTranscript()?
optionalonTranscript: (event) =>void
Called when a transcript (interim or final) is received
Parameters
| Parameter | Type |
|---|---|
event | StreamEvent |
Returns
void
onTranslation()?
optionalonTranslation: (event) =>void
Called for real-time translation (Gladia: requires translation enabled)
Parameters
| Parameter | Type |
|---|---|
event | TranslationEvent |
Returns
void
onUtterance()?
optionalonUtterance: (utterance) =>void
Called when a complete utterance is detected
Parameters
| Parameter | Type |
|---|---|
utterance | Utterance |
Returns
void
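An illustrative callback bundle wired to the event shapes above. How the bundle is handed to the adapter (for example alongside the StreamingOptions given to transcribeStream) depends on the adapter's signature, so treat the delivery mechanism as an assumption.

```ts
import type { StreamingCallbacks } from 'voice-router-dev';

const callbacks: StreamingCallbacks = {
  onOpen: () => console.log('stream connected'),
  onTranscript: (event) => {
    // Interim results arrive with isFinal === false when interimResults is enabled.
    const kind = event.isFinal ? 'final' : 'interim';
    console.log(`[${kind}] ${event.text ?? ''}`);
  },
  onUtterance: (utterance) => {
    console.log(`Speaker ${utterance.speaker ?? '?'}: ${utterance.text}`);
  },
  onError: (error) => console.error(`${error.code}: ${error.message}`),
  onClose: (code, reason) => console.log(`closed (${code ?? 'n/a'}) ${reason ?? ''}`),
};
```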
StreamingOptions
Options for streaming transcription
Extends
Omit<TranscribeOptions,"webhookUrl">
Properties
assemblyai?
optionalassemblyai:Partial<AssemblyAIOptions>
AssemblyAI-specific options (passed directly to API)
See
https://www.assemblyai.com/docs/api-reference/transcripts/submit
Inherited from
TranscribeOptions.assemblyai
assemblyaiStreaming?
optionalassemblyaiStreaming:AssemblyAIStreamingOptions
AssemblyAI-specific streaming options (passed to WebSocket URL & configuration)
Includes end-of-turn detection tuning, VAD threshold, profanity filter, keyterms, speech model selection, and language detection.
See
https://www.assemblyai.com/docs/speech-to-text/streaming
Example
await adapter.transcribeStream({
assemblyaiStreaming: {
speechModel: 'universal-streaming-multilingual',
languageDetection: true,
endOfTurnConfidenceThreshold: 0.7,
minEndOfTurnSilenceWhenConfident: 500,
vadThreshold: 0.3,
formatTurns: true,
filterProfanity: true,
keyterms: ['TypeScript', 'JavaScript', 'API']
}
});

audioToLlm?
optionalaudioToLlm:AudioToLlmListConfigDTO
Audio-to-LLM configuration (Gladia-specific). Run custom LLM prompts on the transcription.
See
GladiaAudioToLlmConfig
Inherited from
TranscribeOptions.audioToLlm
bitDepth?
optionalbitDepth:number
Bit depth for PCM audio
Common depths: 8, 16, 24, 32. 16-bit is standard for most applications.
channels?
optionalchannels:number
Number of audio channels
- 1: Mono (recommended for transcription)
- 2: Stereo
- 3-8: Multi-channel (provider-specific support)
codeSwitching?
optionalcodeSwitching:boolean
Enable code switching (multilingual audio detection). Supported by: Gladia.
Inherited from
TranscribeOptions.codeSwitching
codeSwitchingConfig?
optionalcodeSwitchingConfig:CodeSwitchingConfigDTO
Code switching configuration (Gladia-specific)
See
GladiaCodeSwitchingConfig
Inherited from
TranscribeOptions.codeSwitchingConfig
customVocabulary?
optionalcustomVocabulary:string[]
Custom vocabulary to boost (provider-specific format)
Inherited from
TranscribeOptions.customVocabulary
deepgram?
optionaldeepgram:Partial<ListenV1MediaTranscribeParams>
Deepgram-specific options (passed directly to API)
See
https://developers.deepgram.com/reference/listen-file
Inherited from
TranscribeOptions.deepgram
deepgramStreaming?
optionaldeepgramStreaming:DeepgramStreamingOptions
Deepgram-specific streaming options (passed to WebSocket URL)
Includes filler_words, numerals, measurements, paragraphs, profanity_filter, topics, intents, custom_topic, custom_intent, keyterm, dictation, utt_split, and more.
See
https://developers.deepgram.com/docs/streaming
Example
await adapter.transcribeStream({
deepgramStreaming: {
fillerWords: true,
profanityFilter: true,
topics: true,
intents: true,
customTopic: ['sales', 'support'],
customIntent: ['purchase', 'complaint'],
numerals: true
}
});

diarization?
optionaldiarization:boolean
Enable speaker diarization
Inherited from
TranscribeOptions.diarization
encoding?
optionalencoding:AudioEncoding
Audio encoding format
Common formats:
- linear16: PCM 16-bit (universal, recommended)
- mulaw: μ-law telephony codec
- alaw: A-law telephony codec
- flac, opus, speex: Advanced codecs (Deepgram only)
See
AudioEncoding for full list of supported formats
endpointing?
optionalendpointing:number
Utterance end silence threshold in milliseconds
entityDetection?
optionalentityDetection:boolean
Enable entity detection
Inherited from
TranscribeOptions.entityDetection
gladia?
optionalgladia:Partial<InitTranscriptionRequest>
Gladia-specific options (passed directly to API)
See
Inherited from
TranscribeOptions.gladia
gladiaStreaming?
optionalgladiaStreaming:Partial<Omit<StreamingRequest,"encoding"|"channels"|"sample_rate"|"bit_depth">>
Gladia-specific streaming options (passed directly to API)
Includes pre_processing, realtime_processing, post_processing, messages_config, and callback configuration.
See
https://docs.gladia.io/api-reference/v2/live
Example
await adapter.transcribeStream({
gladiaStreaming: {
realtime_processing: {
words_accurate_timestamps: true
},
messages_config: {
receive_partial_transcripts: true
}
}
});

interimResults?
optionalinterimResults:boolean
Enable interim results (partial transcripts)
language?
optionallanguage:string
Language code with autocomplete from OpenAPI specs
Example
'en', 'en_us', 'fr', 'de', 'es'

See
TranscriptionLanguage for full list
Inherited from
TranscribeOptions.language
languageDetection?
optionallanguageDetection:boolean
Enable automatic language detection
Inherited from
TranscribeOptions.languageDetection
maxSilence?
optionalmaxSilence:number
Maximum duration without endpointing in seconds
model?
optionalmodel:TranscriptionModel
Model to use for transcription (provider-specific)
Type-safe with autocomplete for all known models:
- Deepgram: 'nova-2', 'nova-3', 'base', 'enhanced', 'whisper-large', etc.
- Gladia: 'solaria-1' (default)
- AssemblyAI: Not applicable (uses Universal-2 automatically)
Example
// Use Nova-2 for better multilingual support
{ model: 'nova-2', language: 'fr' }

Overrides
TranscribeOptions.model
openai?
optionalopenai:Partial<Omit<OpenAIWhisperOptions,"model"|"file">>
OpenAI Whisper-specific options (passed directly to API)
See
https://platform.openai.com/docs/api-reference/audio/createTranscription
Inherited from
TranscribeOptions.openai
openaiStreaming?
optionalopenaiStreaming:OpenAIStreamingOptions
OpenAI Realtime API streaming options
Configure the OpenAI Realtime WebSocket connection for audio transcription. Uses the Realtime API which supports real-time audio input transcription.
See
https://platform.openai.com/docs/guides/realtime
Example
await adapter.transcribeStream({
openaiStreaming: {
model: 'gpt-4o-realtime-preview',
voice: 'alloy',
turnDetection: {
type: 'server_vad',
threshold: 0.5,
silenceDurationMs: 500
}
}
});

piiRedaction?
optionalpiiRedaction:boolean
Enable PII redaction
Inherited from
TranscribeOptions.piiRedaction
region?
optionalregion:StreamingSupportedRegions
Regional endpoint for streaming (Gladia only)
Gladia supports regional streaming endpoints for lower latency:
- us-west: US West Coast
- eu-west: EU West (Ireland)
Example
import { GladiaRegion } from 'voice-router-dev/constants'
await adapter.transcribeStream({
region: GladiaRegion["us-west"]
})

See
https://docs.gladia.io/api-reference/v2/live
sampleRate?
optionalsampleRate:number
Sample rate in Hz
Common rates: 8000, 16000, 32000, 44100, 48000. Most providers recommend 16000 Hz for optimal quality/performance.
sentimentAnalysis?
optionalsentimentAnalysis:boolean
Enable sentiment analysis
Inherited from
TranscribeOptions.sentimentAnalysis
sonioxStreaming?
optionalsonioxStreaming:SonioxStreamingOptions
Soniox-specific streaming options
Configure the Soniox WebSocket connection for real-time transcription. Supports speaker diarization, language identification, translation, and custom context.
See
https://soniox.com/docs/stt/SDKs/web-sdk
Example
await adapter.transcribeStream({
sonioxStreaming: {
model: 'stt-rt-preview',
enableSpeakerDiarization: true,
enableEndpointDetection: true,
context: {
terms: ['TypeScript', 'React'],
text: 'Technical discussion'
},
translation: { type: 'one_way', target_language: 'es' }
}
});

speakersExpected?
optionalspeakersExpected:number
Expected number of speakers (for diarization)
Inherited from
TranscribeOptions.speakersExpected
summarization?
optionalsummarization:boolean
Enable summarization
Inherited from
TranscribeOptions.summarization
wordTimestamps?
optionalwordTimestamps:boolean
Enable word-level timestamps
Inherited from
TranscribeOptions.wordTimestamps
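A minimal cross-provider configuration using only the common fields above; provider-specific blocks such as assemblyaiStreaming or deepgramStreaming are omitted, and the defaults applied for any field left out are provider-dependent.

```ts
import type { StreamingOptions } from 'voice-router-dev';

const streamingOptions: StreamingOptions = {
  encoding: 'linear16',  // PCM 16-bit, the most widely supported format
  sampleRate: 16000,     // recommended rate for most providers
  channels: 1,           // mono is recommended for transcription
  language: 'en',
  interimResults: true,  // receive partial transcripts as they form
  diarization: true,
  endpointing: 500,      // end an utterance after 500 ms of silence
};
```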
StreamingSession
Represents an active streaming transcription session
Properties
close()
close: () =>
Promise<void>
Close the streaming session
Returns
Promise<void>
createdAt
createdAt:
Date
Session creation timestamp
getStatus()
getStatus: () =>
"open"|"connecting"|"closing"|"closed"
Get current session status
Returns
"open" | "connecting" | "closing" | "closed"
id
id:
string
Unique session ID
provider
provider:
TranscriptionProvider
Provider handling this stream
sendAudio()
sendAudio: (chunk) => Promise<void>
Send an audio chunk to the stream
Parameters
| Parameter | Type |
|---|---|
chunk | AudioChunk |
Returns
Promise<void>
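A sketch of driving a session, assuming transcribeStream() resolves to a StreamingSession (check your adapter's return type) and that audio arrives as Buffer chunks from some upstream source.

```ts
import type { AudioChunk, StreamingSession } from 'voice-router-dev';

async function pumpAudio(session: StreamingSession, chunks: Buffer[]): Promise<void> {
  console.log(`session ${session.id} on ${session.provider}: ${session.getStatus()}`);

  for (let i = 0; i < chunks.length; i++) {
    const chunk: AudioChunk = {
      data: chunks[i],
      isLast: i === chunks.length - 1, // flag the final chunk
    };
    await session.sendAudio(chunk);
  }

  await session.close();
}
```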
SummarizationEvent
Post-processing summarization event
Properties
summary
summary:
string
Full summarization text
error?
optionalerror:string
Error if summarization failed
TranscribeOptions
Common transcription options across all providers
For provider-specific options, use the typed provider options:
- deepgram: Full Deepgram API options
- assemblyai: Full AssemblyAI API options
- gladia: Full Gladia API options
Properties
assemblyai?
optionalassemblyai:Partial<AssemblyAIOptions>
AssemblyAI-specific options (passed directly to API)
See
https://www.assemblyai.com/docs/api-reference/transcripts/submit
audioToLlm?
optionalaudioToLlm:AudioToLlmListConfigDTO
Audio-to-LLM configuration (Gladia-specific). Run custom LLM prompts on the transcription.
See
GladiaAudioToLlmConfig
codeSwitching?
optionalcodeSwitching:boolean
Enable code switching (multilingual audio detection). Supported by: Gladia.
codeSwitchingConfig?
optionalcodeSwitchingConfig:CodeSwitchingConfigDTO
Code switching configuration (Gladia-specific)
See
GladiaCodeSwitchingConfig
customVocabulary?
optionalcustomVocabulary:string[]
Custom vocabulary to boost (provider-specific format)
deepgram?
optionaldeepgram:Partial<ListenV1MediaTranscribeParams>
Deepgram-specific options (passed directly to API)
See
https://developers.deepgram.com/reference/listen-file
diarization?
optionaldiarization:boolean
Enable speaker diarization
entityDetection?
optionalentityDetection:boolean
Enable entity detection
gladia?
optionalgladia:Partial<InitTranscriptionRequest>
Gladia-specific options (passed directly to API)
See
language?
optionallanguage:string
Language code with autocomplete from OpenAPI specs
Example
'en', 'en_us', 'fr', 'de', 'es'

See
TranscriptionLanguage for full list
languageDetection?
optionallanguageDetection:boolean
Enable automatic language detection
model?
optionalmodel:TranscriptionModel
Model to use for transcription (provider-specific)
Type-safe model selection derived from OpenAPI specs:
- Deepgram: 'nova-3', 'nova-2', 'enhanced', 'base', etc.
- AssemblyAI: 'best', 'slam-1', 'universal'
- Speechmatics: 'standard', 'enhanced' (operating point)
- Gladia: 'solaria-1' (streaming only)
See
TranscriptionModel for full list of available models
openai?
optionalopenai:Partial<Omit<OpenAIWhisperOptions,"model"|"file">>
OpenAI Whisper-specific options (passed directly to API)
See
https://platform.openai.com/docs/api-reference/audio/createTranscription
piiRedaction?
optionalpiiRedaction:boolean
Enable PII redaction
sentimentAnalysis?
optionalsentimentAnalysis:boolean
Enable sentiment analysis
speakersExpected?
optionalspeakersExpected:number
Expected number of speakers (for diarization)
summarization?
optionalsummarization:boolean
Enable summarization
webhookUrl?
optionalwebhookUrl:string
Webhook URL for async results
wordTimestamps?
optionalwordTimestamps:boolean
Enable word-level timestamps
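A short end-to-end sketch using only the common fields. It assumes router is an already-configured voice-router instance, that router.transcribe accepts the same options object the adapters do, and that audio is a valid audio input; all three are placeholders, and the webhook URL is illustrative.

```ts
import type { TranscribeOptions, UnifiedTranscriptResponse } from 'voice-router-dev';

const options: TranscribeOptions = {
  language: 'en',
  diarization: true,
  wordTimestamps: true,
  summarization: true,
  webhookUrl: 'https://example.com/webhooks/transcripts', // async delivery, if supported
};

const result: UnifiedTranscriptResponse = await router.transcribe(audio, options);
if (result.success && result.data) {
  console.log(result.data.text);
  console.log(result.data.summary ?? '(no summary)');
}
```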
TranscriptData
Transcript data structure
Contains the core transcript information returned by getTranscript and listTranscripts.
Example
const result = await router.getTranscript('abc123', 'assemblyai');
if (result.success && result.data) {
console.log(result.data.id); // string
console.log(result.data.text); // string
console.log(result.data.status); // TranscriptionStatus
console.log(result.data.metadata); // TranscriptMetadata
}

Properties
id
id:
string
Unique transcript ID
status
status:
TranscriptionStatus
Transcription status
text
text:
string
Full transcribed text (empty for list items)
completedAt?
optionalcompletedAt:string
Completion timestamp (shorthand for metadata.completedAt)
confidence?
optionalconfidence:number
Overall confidence score (0-1)
createdAt?
optionalcreatedAt:string
Creation timestamp (shorthand for metadata.createdAt)
duration?
optionalduration:number
Audio duration in seconds
language?
optionallanguage:string
Detected or specified language code
metadata?
optionalmetadata:TranscriptMetadata
Transcript metadata
speakers?
optionalspeakers:Speaker[]
Speaker diarization results
summary?
optionalsummary:string
Summary of the content (if summarization enabled)
utterances?
optionalutterances:Utterance[]
Utterances (speaker turns)
words?
optionalwords:Word[]
Word-level transcription with timestamps
TranscriptMetadata
Transcript metadata with typed common fields
Contains provider-agnostic metadata fields that are commonly available. Provider-specific fields can be accessed via the index signature.
Example
const { transcripts } = await router.listTranscripts('assemblyai', { limit: 20 });
transcripts.forEach(item => {
console.log(item.data?.metadata?.audioUrl); // string | undefined
console.log(item.data?.metadata?.createdAt); // string | undefined
console.log(item.data?.metadata?.audioDuration); // number | undefined
});

Indexable
[key: string]: unknown
Provider-specific fields
Properties
audioDuration?
optionalaudioDuration:number
Audio duration in seconds
audioFileAvailable?
optionalaudioFileAvailable:boolean
True if the provider stored the audio and it can be downloaded via adapter.getAudioFile(). Currently only Gladia supports this - other providers discard audio after processing.
Example
if (item.data?.metadata?.audioFileAvailable) {
const audio = await gladiaAdapter.getAudioFile(item.data.id)
// audio.data is a Blob
}

completedAt?
optionalcompletedAt:string
Completion timestamp (ISO 8601)
createdAt?
optionalcreatedAt:string
Creation timestamp (ISO 8601)
customMetadata?
optionalcustomMetadata:Record<string,unknown>
Custom metadata (Gladia)
displayName?
optionaldisplayName:string
Display name (Azure)
filesUrl?
optionalfilesUrl:string
Files URL (Azure)
kind?
optionalkind:"batch"|"streaming"|"pre-recorded"|"live"
Transcript type
lastActionAt?
optionallastActionAt:string
Last action timestamp (Azure)
resourceUrl?
optionalresourceUrl:string
Resource URL for the transcript
sourceAudioUrl?
optionalsourceAudioUrl:string
Original audio URL/source you provided to the API (echoed back). This is NOT a provider-hosted URL - it's what you sent when creating the transcription.
TranslationEvent
Translation event data (for real-time translation)
Properties
targetLanguage
targetLanguage:
string
Target language
translatedText
translatedText:
string
Translated text
isFinal?
optionalisFinal:boolean
Whether this is a final translation
original?
optionaloriginal:string
Original text
utteranceId?
optionalutteranceId:string
Utterance ID this translation belongs to
UnifiedTranscriptResponse
Unified transcription response with provider-specific type safety
When a specific provider is known at compile time, both raw and extended
fields will be typed with that provider's actual types.
Examples
const result: UnifiedTranscriptResponse<'assemblyai'> = await adapter.transcribe(audio);
// result.raw is typed as AssemblyAITranscript
// result.extended is typed as AssemblyAIExtendedData
const chapters = result.extended?.chapters; // AssemblyAIChapter[] | undefined
const entities = result.extended?.entities; // AssemblyAIEntity[] | undefined

const result: UnifiedTranscriptResponse<'gladia'> = await gladiaAdapter.transcribe(audio);
const translation = result.extended?.translation; // GladiaTranslation | undefined
const llmResults = result.extended?.audioToLlm; // GladiaAudioToLlmResult | undefined

const result: UnifiedTranscriptResponse = await router.transcribe(audio);
// result.raw is typed as unknown (could be any provider)
// result.extended is typed as a union of all extended types

Type Parameters
| Type Parameter | Default type | Description |
|---|---|---|
P extends TranscriptionProvider | TranscriptionProvider | The transcription provider (defaults to all providers) |
Properties
provider
provider:
P
Provider that performed the transcription
success
success:
boolean
Operation success status
data?
optionaldata:TranscriptData
Transcription data (only present on success)
error?
optionalerror:object
Error information (only present on failure)
code
code:
string
Error code (provider-specific or normalized)
message
message:
string
Human-readable error message
details?
optionaldetails:unknown
Additional error details
statusCode?
optionalstatusCode:number
HTTP status code if applicable
extended?
optional extended: P extends keyof ProviderExtendedDataMap ? ProviderExtendedDataMap[P] : unknown
Extended provider-specific data (fully typed from OpenAPI specs)
Contains rich data beyond basic transcription:
- AssemblyAI: chapters, entities, sentiment, content safety, topics
- Gladia: translation, moderation, entities, audio-to-llm, chapters
- Deepgram: detailed metadata, request tracking, model info
Example
const result = await assemblyaiAdapter.transcribe(audio, { summarization: true });
result.extended?.chapters?.forEach(chapter => {
console.log(`${chapter.headline}: ${chapter.summary}`);
});

raw?
optional raw: P extends keyof ProviderRawResponseMap ? ProviderRawResponseMap[P] : unknown
Raw provider response (for advanced usage)
Type-safe based on the provider:
- gladia: PreRecordedResponse
- deepgram: ListenV1Response
- openai-whisper: CreateTranscription200One
- assemblyai: AssemblyAITranscript
- azure-stt: AzureTranscription
tracking?
optionaltracking:object
Request tracking information for debugging
audioHash?
optionalaudioHash:string
Audio fingerprint (SHA256) if available
processingTimeMs?
optionalprocessingTimeMs:number
Processing duration in milliseconds
requestId?
optionalrequestId:string
Provider's request/job ID
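A small sketch of handling the failure branch; the fields follow the shapes documented above, and where the information is logged or reported is up to the caller.

```ts
import type { UnifiedTranscriptResponse } from 'voice-router-dev';

function reportFailure(result: UnifiedTranscriptResponse): void {
  if (result.success) return;

  // error is only present on failure; tracking may carry the provider's request ID.
  console.error(
    `[${result.provider}] ${result.error?.code ?? 'unknown'}: ${result.error?.message ?? ''}`,
    {
      statusCode: result.error?.statusCode,
      requestId: result.tracking?.requestId,
    }
  );
}
```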
Utterance
Utterance (sentence or phrase by a single speaker)
Normalized from provider-specific types:
- Gladia: UtteranceDTO
- AssemblyAI: TranscriptUtterance
- Deepgram: ListenV1ResponseResultsUtterancesItem
Properties
end
end:
number
End time in seconds
start
start:
number
Start time in seconds
text
text:
string
The transcribed text
channel?
optionalchannel:number
Audio channel number (for multi-channel/stereo recordings)
Channel numbering varies by provider:
- AssemblyAI: 1=left, 2=right, sequential for additional channels
- Deepgram: 0-indexed channel number
- Gladia: 0-indexed channel number
confidence?
optionalconfidence:number
Confidence score (0-1)
id?
optionalid:string
Unique utterance identifier (provider-assigned)
Available from: Deepgram. Useful for linking utterances to other data (entities, sentiment, etc.)
language?
optionallanguage:string
Detected language for this utterance (BCP-47 code)
Available from: Gladia (with code-switching enabled). Essential for multilingual transcription where language changes mid-conversation.
Example
'en', 'es', 'fr', 'de'

See
TranscriptionLanguage for full list of supported codes
speaker?
optionalspeaker:string
Speaker ID
words?
optionalwords:Word[]
Words in this utterance
Word
Word-level transcription with timing
Normalized from provider-specific types:
- Gladia: WordDTO
- AssemblyAI: TranscriptWord
- Deepgram: ListenV1ResponseResultsChannelsItemAlternativesItemWordsItem
Properties
end
end:
number
End time in seconds
start
start:
number
Start time in seconds
word
word:
string
The transcribed word
channel?
optionalchannel:number
Audio channel number (for multi-channel/stereo recordings)
Channel numbering varies by provider:
- AssemblyAI: 1=left, 2=right, sequential for additional channels
- Deepgram: 0-indexed channel number
- Gladia: 0-indexed channel number
confidence?
optionalconfidence:number
Confidence score (0-1)
speaker?
optionalspeaker:string
Speaker ID if diarization is enabled
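Because Utterance and Word are normalized across providers, rendering a speaker-labelled transcript works the same regardless of which provider produced it. The formatting below is purely illustrative.

```ts
import type { TranscriptData } from 'voice-router-dev';

function renderTranscript(data: TranscriptData): string {
  return (data.utterances ?? [])
    .map((u) => {
      const speaker = u.speaker ?? 'unknown';
      const stamp = `${u.start.toFixed(1)}s-${u.end.toFixed(1)}s`;
      return `[${stamp}] ${speaker}: ${u.text}`;
    })
    .join('\n');
}

// Word-level timings are only present when wordTimestamps was enabled.
function firstLowConfidenceWord(data: TranscriptData, threshold = 0.5) {
  return data.words?.find((w) => (w.confidence ?? 1) < threshold);
}
```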
Type Aliases
AudioInput
AudioInput =
AudioInputUrl|AudioInputFile|AudioInputStream
Union of all audio input types
BatchOnlyProvider
BatchOnlyProvider =
BatchOnlyProviderType
Providers that only support batch/async transcription
Automatically derived from providers where streaming is false or undefined. Note: Speechmatics has a WebSocket API but streaming is not yet implemented in this SDK.
ProviderExtendedDataMap
ProviderExtendedDataMap =
object
Map of provider names to their extended data types
Properties
assemblyai
assemblyai:
AssemblyAIExtendedData
azure-stt
azure-stt:
Record<string,never>
deepgram
deepgram:
DeepgramExtendedData
gladia
gladia:
GladiaExtendedData
openai-whisper
openai-whisper:
Record<string,never>
soniox
soniox:
Record<string,never>
speechmatics
speechmatics:
Record<string,never>
ProviderRawResponseMap
ProviderRawResponseMap =
object
Map of provider names to their raw response types. Enables type-safe access to provider-specific raw responses.
Properties
assemblyai
assemblyai:
Transcript
azure-stt
azure-stt:
AzureTranscription
deepgram
deepgram:
ListenV1Response
gladia
gladia:
PreRecordedResponse
openai-whisper
openai-whisper:
CreateTranscription200One
soniox
soniox:
unknown
speechmatics
speechmatics:
unknown
SessionStatus
SessionStatus =
"connecting"|"open"|"closing"|"closed"
WebSocket session status for streaming transcription
SpeechmaticsOperatingPoint
SpeechmaticsOperatingPoint =
"standard"|"enhanced"
Speechmatics operating point (model) type. Manually defined because the Speechmatics OpenAPI spec doesn't export this cleanly.
StreamEventType
StreamEventType =
"open"|"transcript"|"utterance"|"metadata"|"error"|"close"|"speech_start"|"speech_end"|"translation"|"sentiment"|"entity"|"summarization"|"chapterization"|"audio_ack"|"lifecycle"
Streaming transcription event types
StreamingProvider
StreamingProvider =
StreamingProviderType
Providers that support real-time streaming transcription
This type is automatically derived from ProviderCapabilitiesMap.streaming in provider-metadata.ts
No manual sync needed - if you set streaming: true for a provider, it's included here.
TranscriptionLanguage
TranscriptionLanguage =
AssemblyAILanguageCode|GladiaLanguageCode|string
Unified transcription language type with autocomplete for all providers
Includes language codes from AssemblyAI and Gladia OpenAPI specs. Deepgram uses string for flexibility.
TranscriptionModel
TranscriptionModel =
DeepgramModelType|StreamingSupportedModels|AssemblyAISpeechModel|SpeechmaticsOperatingPoint
Unified transcription model type with autocomplete for all providers
Strict union type - only accepts valid models from each provider:
- Deepgram: nova-3, nova-2, enhanced, base, etc.
- AssemblyAI: best, slam-1, universal
- Gladia: solaria-1
- Speechmatics: standard, enhanced
Use provider const objects for autocomplete:
Example
import { DeepgramModel } from 'voice-router-dev'
{ model: DeepgramModel["nova-3"] }

TranscriptionProvider
TranscriptionProvider =
"gladia"|"assemblyai"|"deepgram"|"openai-whisper"|"azure-stt"|"speechmatics"|"soniox"
Supported transcription provider identifiers
TranscriptionStatus
TranscriptionStatus =
"queued"|"processing"|"completed"|"error"
Transcription status