router/types
Interfaces
AssemblyAIExtendedData
Extended data from AssemblyAI transcription. Includes chapters, entities, sentiment, content safety, and more.
Properties
chapters?
optionalchapters:Chapter[]
Auto-generated chapters with summaries
contentSafety?
optionalcontentSafety:ContentSafetyLabelsResult
Content safety/moderation labels
entities?
optionalentities:Entity[]
Detected named entities (people, organizations, locations)
highlights?
optionalhighlights:AutoHighlightsResult
Key phrases and highlights
languageConfidence?
optionallanguageConfidence:number
Language detection confidence (0-1)
sentimentResults?
optionalsentimentResults:SentimentAnalysisResult[]
Per-utterance sentiment analysis results
throttled?
optionalthrottled:boolean
Whether the request was throttled
topics?
optionaltopics:TopicDetectionModelResult
IAB topic categories
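As a rough sketch of how these fields might be read in practice (the adapter and audio variables below are illustrative, and the response shape follows UnifiedTranscriptResponse<'assemblyai'> described later on this page):

```typescript
// Sketch: reading a few AssemblyAI extended fields; adapter/audio names are illustrative
const result = await assemblyaiAdapter.transcribe(audio, { sentimentAnalysis: true });

if (result.extended?.throttled) {
  console.warn('Request was throttled by AssemblyAI');
}
if (result.extended?.languageConfidence !== undefined) {
  console.log(`Language detection confidence: ${result.extended.languageConfidence}`);
}
console.log(`Sentiment results: ${result.extended?.sentimentResults?.length ?? 0}`);
```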
AudioAckEvent
Audio chunk acknowledgment event
Properties
byteRange?
optionalbyteRange: [number,number]
Byte range of the acknowledged audio chunk [start, end]
timeRange?
optionaltimeRange: [number,number]
Time range in seconds of the acknowledged audio chunk [start, end]
timestamp?
optionaltimestamp:string
Acknowledgment timestamp
AudioChunk
Audio chunk for streaming transcription
Properties
data
data:
Buffer<ArrayBufferLike> |Uint8Array<ArrayBufferLike>
Audio data as Buffer or Uint8Array
isLast?
optionalisLast:boolean
Whether this is the last chunk
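A minimal sketch of producing AudioChunk values from an in-memory PCM buffer before feeding them to StreamingSession.sendAudio (documented later on this page); the chunk size and the type import path are assumptions for illustration.

```typescript
import type { AudioChunk } from 'voice-router-dev'; // assumed export path

// Sketch: split a PCM buffer into chunks, flagging the final one with isLast
function* toAudioChunks(pcm: Buffer, chunkBytes = 3200): Generator<AudioChunk> {
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    const end = Math.min(offset + chunkBytes, pcm.length);
    yield { data: pcm.subarray(offset, end), isLast: end >= pcm.length };
  }
}
```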
ChapterizationEvent
Post-processing chapterization event
Properties
chapters
chapters:
object[]
Generated chapters
end
end:
number
End time in seconds
headline
headline:
string
Chapter title/headline
start
start:
number
Start time in seconds
summary
summary:
string
Chapter summary
error?
optionalerror:string
Error if chapterization failed
DeepgramExtendedData
Extended data from Deepgram transcription. Includes detailed metadata, model info, and feature-specific data.
Properties
metadata?
optionalmetadata:ListenV1ResponseMetadata
Full response metadata
modelInfo?
optionalmodelInfo:Record<string,unknown>
Model versions used
requestId?
optionalrequestId:string
Request ID for debugging/tracking
sha256?
optionalsha256:string
SHA256 hash of the audio
tags?
optionaltags:string[]
Tags echoed back from request
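A small hedged example of surfacing this data for debugging; the adapter and audio variables are illustrative and assume a Deepgram-typed UnifiedTranscriptResponse (see below).

```typescript
// Sketch: logging Deepgram request metadata for support/debugging purposes
const result = await deepgramAdapter.transcribe(audio);

if (result.extended?.requestId) {
  console.log(`Deepgram request ID: ${result.extended.requestId}`);
}
if (result.extended?.sha256) {
  console.log(`Audio SHA256: ${result.extended.sha256}`); // useful for deduplicating uploads
}
```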
EntityEvent
Named entity recognition result
Properties
text
text:
string
Entity text
type
type:
string
Entity type (PERSON, ORGANIZATION, LOCATION, etc.)
end?
optionalend:number
End position
start?
optionalstart:number
Start position
utteranceId?
optionalutteranceId:string
Utterance ID this entity belongs to
GladiaAudioToLlmConfig
Audio-to-LLM configuration, generated from the Gladia Control API OpenAPI spec (version 1.0).
Properties
prompts
prompts:
unknown[][]
The list of prompts applied on the audio transcription
Min Items
1
GladiaAudioToLlmResult
Properties
error
error:
AudioToLlmListDTOError
null if success is true. Contains the error details of the failed model
Nullable
exec_time
exec_time:
number
Time audio intelligence model took to complete the task
is_empty
is_empty:
boolean
The audio intelligence model returned an empty value
results
results:
AudioToLlmDTO[] |null
If audio_to_llm has been enabled, results of the AI custom analysis
Nullable
success
success:
boolean
The audio intelligence model succeeded to get a valid output
GladiaChapters
Properties
error
error:
ChapterizationDTOError
null if success is true. Contains the error details of the failed model
Nullable
exec_time
exec_time:
number
Time audio intelligence model took to complete the task
is_empty
is_empty:
boolean
The audio intelligence model returned an empty value
results
results:
ChapterizationDTOResults
If chapterization has been enabled, will generate chapters name for different parts of the given audio.
success
success:
boolean
The audio intelligence model succeeded to get a valid output
GladiaCodeSwitchingConfig
Properties
languages?
optionallanguages:TranscriptionLanguageCodeEnum[]
Specify the languages you want to use when detecting multiple languages
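A hedged sketch of pairing this config with the codeSwitching flag from TranscribeOptions (documented later on this page); the language codes and adapter variable are illustrative.

```typescript
// Sketch: limit Gladia code switching to a known set of languages
await gladiaAdapter.transcribe(audio, {
  codeSwitching: true,
  codeSwitchingConfig: {
    languages: ['en', 'fr', 'es'], // illustrative TranscriptionLanguageCodeEnum values
  },
});
```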
GladiaEntities
Properties
entity
entity:
string
If named_entity_recognition has been enabled, the detected entities.
error
error:
NamedEntityRecognitionDTOError
null if success is true. Contains the error details of the failed model
Nullable
exec_time
exec_time:
number
Time audio intelligence model took to complete the task
is_empty
is_empty:
boolean
The audio intelligence model returned an empty value
success
success:
boolean
The audio intelligence model succeeded to get a valid output
GladiaExtendedData
Extended data from Gladia transcription. Includes translation, moderation, entities, LLM outputs, and more.
Properties
audioToLlm?
optionalaudioToLlm:GladiaAudioToLlmResult
Audio-to-LLM custom prompt results
chapters?
optionalchapters:GladiaChapters
Auto-generated chapters
customMetadata?
optionalcustomMetadata:Record<string,unknown>
Custom metadata echoed back
entities?
optionalentities:GladiaEntities
Named entity recognition results
moderation?
optionalmoderation:GladiaModeration
Content moderation results
sentiment?
optionalsentiment:GladiaSentiment
Sentiment analysis results
speakerReidentification?
optionalspeakerReidentification:GladiaSpeakerReidentification
AI speaker reidentification results
structuredData?
optionalstructuredData:GladiaStructuredData
Structured data extraction results
translation?
optionaltranslation:GladiaTranslation
Translation results (if translation enabled)
GladiaModeration
Properties
error
error:
ModerationDTOError
null if success is true. Contains the error details of the failed model
Nullable
exec_time
exec_time:
number
Time audio intelligence model took to complete the task
is_empty
is_empty:
boolean
The audio intelligence model returned an empty value
results
results:
string|null
If moderation has been enabled, moderated transcription
Nullable
success
success:
boolean
The audio intelligence model succeeded to get a valid output
GladiaSentiment
Properties
error
error:
SentimentAnalysisDTOError
null if success is true. Contains the error details of the failed model
Nullable
exec_time
exec_time:
number
Time audio intelligence model took to complete the task
is_empty
is_empty:
boolean
The audio intelligence model returned an empty value
results
results:
string
If sentiment_analysis has been enabled, Gladia will analyze the sentiments and emotions of the audio
success
success:
boolean
The audio intelligence model succeeded to get a valid output
GladiaSpeakerReidentification
Properties
error
error:
SpeakerReidentificationDTOError
null if success is true. Contains the error details of the failed model
Nullable
exec_time
exec_time:
number
Time audio intelligence model took to complete the task
is_empty
is_empty:
boolean
The audio intelligence model returned an empty value
results
results:
string
If speaker_reidentification has been enabled, results of the AI speaker reidentification.
success
success:
boolean
The audio intelligence model succeeded to get a valid output
GladiaStructuredData
Properties
error
error:
StructuredDataExtractionDTOError
null if success is true. Contains the error details of the failed model
Nullable
exec_time
exec_time:
number
Time audio intelligence model took to complete the task
is_empty
is_empty:
boolean
The audio intelligence model returned an empty value
results
results:
string
If structured_data_extraction has been enabled, results of the AI structured data extraction for the defined classes.
success
success:
boolean
The audio intelligence model succeeded to get a valid output
GladiaTranslation
Properties
error
error:
TranslationDTOError
null if success is true. Contains the error details of the failed model
Nullable
exec_time
exec_time:
number
Time audio intelligence model took to complete the task
is_empty
is_empty:
boolean
The audio intelligence model returned an empty value
results
results:
TranslationResultDTO[] |null
List of translated transcriptions, one for each target_languages
Nullable
success
success:
boolean
The audio intelligence model succeeded to get a valid output
LifecycleEvent
Lifecycle event (session start, recording end, etc.)
Properties
eventType
eventType:
"start_session"|"start_recording"|"stop_recording"|"end_recording"|"end_session"
Lifecycle event type
sessionId?
optionalsessionId:string
Session ID
timestamp?
optionaltimestamp:string
Event timestamp
ListTranscriptsOptions
Options for listing transcripts with date/time filtering
Providers support different filtering capabilities:
- AssemblyAI: status, created_on, before_id, after_id, throttled_only
- Gladia: status, date, before_date, after_date, custom_metadata
- Azure: status, skip, top, filter (OData)
- Deepgram: start, end, status, page, request_id, endpoint (requires projectId)
Examples
await adapter.listTranscripts({
date: '2026-01-07', // Exact date (ISO format)
status: 'completed',
limit: 50
})
await adapter.listTranscripts({
afterDate: '2026-01-01',
beforeDate: '2026-01-31',
limit: 100
})
Properties
afterDate?
optionalafterDate:string
Filter for transcripts created after this date (ISO format)
assemblyai?
optionalassemblyai:Partial<ListTranscriptsParams>
AssemblyAI-specific list options
beforeDate?
optionalbeforeDate:string
Filter for transcripts created before this date (ISO format)
date?
optionaldate:string
Filter by exact date (ISO format: YYYY-MM-DD)
deepgram?
optionaldeepgram:Partial<ManageV1ProjectsRequestsListParams>
Deepgram-specific list options (request history)
gladia?
optionalgladia:Partial<TranscriptionControllerListV2Params>
Gladia-specific list options
limit?
optionallimit:number
Maximum number of transcripts to retrieve
offset?
optionaloffset:number
Pagination offset (skip N results)
status?
optionalstatus:string
Filter by transcript status
ListTranscriptsResponse
Response from listTranscripts
Example
import type { ListTranscriptsResponse } from 'voice-router-dev';
const response: ListTranscriptsResponse = await router.listTranscripts('assemblyai', {
status: 'completed',
limit: 50
});
response.transcripts.forEach(item => {
console.log(item.data?.id, item.data?.status);
});
if (response.hasMore) {
// Fetch next page
}
Properties
transcripts
transcripts:
UnifiedTranscriptResponse<TranscriptionProvider>[]
List of transcripts
hasMore?
optionalhasMore:boolean
Whether more results are available
total?
optionaltotal:number
Total count (if available from provider)
OpenAIWhisperOptions
Properties
file
file:
Blob
The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
model
model:
string
ID of the model to use. The options are gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-mini-transcribe-2025-12-15, whisper-1 (which is powered by our open source Whisper V2 model), and gpt-4o-transcribe-diarize.
chunking_strategy?
optionalchunking_strategy:TranscriptionChunkingStrategy
include?
optionalinclude:"logprobs"[]
Additional information to include in the transcription response.
logprobs will return the log probabilities of the tokens in the
response to understand the model's confidence in the transcription.
logprobs only works with response_format set to json and only with
the models gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-transcribe-2025-12-15. This field is not supported when using gpt-4o-transcribe-diarize.
known_speaker_names?
optionalknown_speaker_names:string[]
Optional list of speaker names that correspond to the audio samples provided in known_speaker_references[]. Each entry should be a short identifier (for example customer or agent). Up to 4 speakers are supported.
Max Items
4
known_speaker_references?
optionalknown_speaker_references:string[]
Optional list of audio samples (as data URLs) that contain known speaker references matching known_speaker_names[]. Each sample must be between 2 and 10 seconds, and can use any of the same input audio formats supported by file.
Max Items
4
language?
optionallanguage:string
The language of the input audio. Supplying the input language in ISO-639-1 (e.g. en) format will improve accuracy and latency.
prompt?
optionalprompt:string
An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language. This field is not supported when using gpt-4o-transcribe-diarize.
response_format?
optionalresponse_format:AudioResponseFormat
stream?
optionalstream:CreateTranscriptionRequestStream
temperature?
optionaltemperature:number
The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
timestamp_granularities?
optionaltimestamp_granularities:CreateTranscriptionRequestTimestampGranularitiesItem[]
The timestamp granularities to populate for this transcription. response_format must be set to verbose_json to use timestamp granularities. Either or both of these options are supported: word or segment. Note: There is no additional latency for segment timestamps, but generating word timestamps incurs additional latency.
This option is not available for gpt-4o-transcribe-diarize.
ProviderCapabilities
Provider capability flags
Each boolean indicates whether the provider supports a specific feature. Use ProviderCapabilitiesMap from provider-metadata for runtime access.
Properties
customVocabulary
customVocabulary:
boolean
Custom vocabulary/keyword boosting
deleteTranscript
deleteTranscript:
boolean
Delete transcriptions
diarization
diarization:
boolean
Speaker diarization (identifying different speakers)
entityDetection
entityDetection:
boolean
Entity detection
languageDetection
languageDetection:
boolean
Automatic language detection
listTranscripts
listTranscripts:
boolean
List/fetch previous transcriptions
piiRedaction
piiRedaction:
boolean
PII redaction
sentimentAnalysis
sentimentAnalysis:
boolean
Sentiment analysis
streaming
streaming:
boolean
Real-time streaming transcription support
summarization
summarization:
boolean
Audio summarization
wordTimestamps
wordTimestamps:
boolean
Word-level timestamps
getAudioFile?
optionalgetAudioFile:boolean
Download original audio file
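A minimal sketch of a runtime capability check, assuming ProviderCapabilitiesMap is exported from the package root as the note above suggests (the import path is an assumption):

```typescript
import { ProviderCapabilitiesMap } from 'voice-router-dev'; // assumed export path
import type { TranscriptionProvider } from 'voice-router-dev';

// Sketch: gate a streaming code path on the provider's declared capabilities
function supportsStreaming(provider: TranscriptionProvider): boolean {
  return ProviderCapabilitiesMap[provider].streaming;
}
```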
SentimentEvent
Sentiment analysis result (for real-time sentiment)
Properties
sentiment
sentiment:
string
Sentiment label (positive, negative, neutral)
confidence?
optionalconfidence:number
Confidence score 0-1
utteranceId?
optionalutteranceId:string
Utterance ID this sentiment belongs to
Speaker
Speaker information from diarization
Properties
id
id:
string
Speaker identifier (e.g., "A", "B", "speaker_0")
confidence?
optionalconfidence:number
Confidence score for speaker identification (0-1)
label?
optionallabel:string
Speaker label if known
SpeechEvent
Speech event data (for speech_start/speech_end events)
Properties
timestamp
timestamp:
number
Timestamp in seconds
type
type:
"speech_start"|"speech_end"
Event type: speech_start or speech_end
channel?
optionalchannel:number
Channel number
sessionId?
optionalsessionId:string
Session ID
StreamEvent
Streaming transcription event
Properties
type
type:
StreamEventType
channel?
optionalchannel:number
Channel number for multi-channel audio
confidence?
optionalconfidence:number
Confidence score for this event
data?
optionaldata:unknown
Additional event data
error?
optionalerror:object
Error information (for type: "error")
code
code:
string
message
message:
string
details?
optionaldetails:unknown
isFinal?
optionalisFinal:boolean
Whether this is a final transcript (vs interim)
language?
optionallanguage:string
Language of the transcript/utterance
speaker?
optionalspeaker:string
Speaker ID if diarization is enabled
text?
optionaltext:string
Partial transcript text (for type: "transcript")
utterance?
optionalutterance:Utterance
Utterance data (for type: "utterance")
words?
optionalwords:Word[]
Words in this event
StreamingCallbacks
Callback functions for streaming events
Properties
onAudioAck()?
optionalonAudioAck: (event) =>void
Called for audio chunk acknowledgments (Gladia: requires receive_acknowledgments)
Parameters
| Parameter | Type |
|---|---|
event | AudioAckEvent |
Returns
void
onChapterization()?
optionalonChapterization: (event) =>void
Called when post-processing chapterization completes (Gladia: requires chapterization enabled)
Parameters
| Parameter | Type |
|---|---|
event | ChapterizationEvent |
Returns
void
onClose()?
optionalonClose: (code?,reason?) =>void
Called when the stream is closed
Parameters
| Parameter | Type |
|---|---|
code? | number |
reason? | string |
Returns
void
onEntity()?
optionalonEntity: (event) =>void
Called for named entity recognition (Gladia: requires named_entity_recognition enabled)
Parameters
| Parameter | Type |
|---|---|
event | EntityEvent |
Returns
void
onError()?
optionalonError: (error) =>void
Called when an error occurs
Parameters
| Parameter | Type |
|---|---|
error | { code: string; message: string; details?: unknown; } |
error.code | string |
error.message | string |
error.details? | unknown |
Returns
void
onLifecycle()?
optionalonLifecycle: (event) =>void
Called for session lifecycle events (Gladia: requires receive_lifecycle_events)
Parameters
| Parameter | Type |
|---|---|
event | LifecycleEvent |
Returns
void
onMetadata()?
optionalonMetadata: (metadata) =>void
Called when metadata is received
Parameters
| Parameter | Type |
|---|---|
metadata | Record<string, unknown> |
Returns
void
onOpen()?
optionalonOpen: () =>void
Called when connection is established
Returns
void
onSentiment()?
optionalonSentiment: (event) =>void
Called for real-time sentiment analysis (Gladia: requires sentiment_analysis enabled)
Parameters
| Parameter | Type |
|---|---|
event | SentimentEvent |
Returns
void
onSpeechEnd()?
optionalonSpeechEnd: (event) =>void
Called when speech ends (Gladia: requires receive_speech_events)
Parameters
| Parameter | Type |
|---|---|
event | SpeechEvent |
Returns
void
onSpeechStart()?
optionalonSpeechStart: (event) =>void
Called when speech starts (Gladia: requires receive_speech_events)
Parameters
| Parameter | Type |
|---|---|
event | SpeechEvent |
Returns
void
onSummarization()?
optionalonSummarization: (event) =>void
Called when post-processing summarization completes (Gladia: requires summarization enabled)
Parameters
| Parameter | Type |
|---|---|
event | SummarizationEvent |
Returns
void
onTranscript()?
optionalonTranscript: (event) =>void
Called when a transcript (interim or final) is received
Parameters
| Parameter | Type |
|---|---|
event | StreamEvent |
Returns
void
onTranslation()?
optionalonTranslation: (event) =>void
Called for real-time translation (Gladia: requires translation enabled)
Parameters
| Parameter | Type |
|---|---|
event | TranslationEvent |
Returns
void
onUtterance()?
optionalonUtterance: (utterance) =>void
Called when a complete utterance is detected
Parameters
| Parameter | Type |
|---|---|
utterance | Utterance |
Returns
void
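A sketch of a callbacks object covering a few of these hooks; how it is passed to the streaming call depends on your adapter's transcribeStream signature, and the import path is an assumption.

```typescript
import type { StreamingCallbacks } from 'voice-router-dev'; // assumed export path

// Sketch: a minimal set of streaming callbacks
const callbacks: StreamingCallbacks = {
  onOpen: () => console.log('stream open'),
  onTranscript: (event) => {
    if (event.isFinal && event.text) console.log(`[final] ${event.text}`);
  },
  onUtterance: (utterance) => console.log(`${utterance.speaker ?? 'speaker'}: ${utterance.text}`),
  onError: (error) => console.error(`stream error ${error.code}: ${error.message}`),
  onClose: (code, reason) => console.log(`stream closed (${code ?? 'n/a'}) ${reason ?? ''}`),
};
```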
StreamingOptions
Options for streaming transcription
Extends
Omit<TranscribeOptions,"webhookUrl">
Properties
assemblyai?
optionalassemblyai:Partial<TranscriptOptionalParams>
AssemblyAI-specific options (passed directly to API)
See
https://www.assemblyai.com/docs/api-reference/transcripts/submit
Inherited from
assemblyaiStreaming?
optionalassemblyaiStreaming:AssemblyAIStreamingOptions
AssemblyAI-specific streaming options (passed to WebSocket URL & configuration)
Includes end-of-turn detection tuning, VAD threshold, profanity filter, keyterms, speech model selection, and language detection.
See
https://www.assemblyai.com/docs/speech-to-text/streaming
Example
await adapter.transcribeStream({
assemblyaiStreaming: {
speechModel: 'universal-streaming-multilingual',
languageDetection: true,
endOfTurnConfidenceThreshold: 0.7,
minEndOfTurnSilenceWhenConfident: 500,
vadThreshold: 0.3,
formatTurns: true,
filterProfanity: true,
keyterms: ['TypeScript', 'JavaScript', 'API']
}
});
audioToLlm?
optionalaudioToLlm:GladiaAudioToLlmConfig
Audio-to-LLM configuration (Gladia-specific). Run custom LLM prompts on the transcription.
See
GladiaAudioToLlmConfig
Inherited from
bitDepth?
optionalbitDepth:number
Bit depth for PCM audio
Common depths: 8, 16, 24, 32. 16-bit is standard for most applications.
channels?
optionalchannels:number
Number of audio channels
- 1: Mono (recommended for transcription)
- 2: Stereo
- 3-8: Multi-channel (provider-specific support)
codeSwitching?
optionalcodeSwitching:boolean
Enable code switching (multilingual audio detection). Supported by: Gladia.
Inherited from
TranscribeOptions.codeSwitching
codeSwitchingConfig?
optionalcodeSwitchingConfig:GladiaCodeSwitchingConfig
Code switching configuration (Gladia-specific)
See
GladiaCodeSwitchingConfig
Inherited from
TranscribeOptions.codeSwitchingConfig
customVocabulary?
optionalcustomVocabulary:string[]
Custom vocabulary to boost (provider-specific format)
Inherited from
TranscribeOptions.customVocabulary
deepgram?
optionaldeepgram:Partial<ListenV1MediaTranscribeParams>
Deepgram-specific options (passed directly to API)
See
https://developers.deepgram.com/reference/listen-file
Inherited from
deepgramStreaming?
optionaldeepgramStreaming:DeepgramStreamingOptions
Deepgram-specific streaming options (passed to WebSocket URL)
Includes filler_words, numerals, measurements, paragraphs, profanity_filter, topics, intents, custom_topic, custom_intent, keyterm, dictation, utt_split, and more.
See
https://developers.deepgram.com/docs/streaming
Example
await adapter.transcribeStream({
deepgramStreaming: {
fillerWords: true,
profanityFilter: true,
topics: true,
intents: true,
customTopic: ['sales', 'support'],
customIntent: ['purchase', 'complaint'],
numerals: true
}
});
diarization?
optionaldiarization:boolean
Enable speaker diarization
Inherited from
encoding?
optionalencoding:AudioEncoding
Audio encoding format
Common formats:
- linear16: PCM 16-bit (universal, recommended)
- mulaw: μ-law telephony codec
- alaw: A-law telephony codec
- flac, opus, speex: Advanced codecs (Deepgram only)
See
AudioEncoding for full list of supported formats
endpointing?
optionalendpointing:number
Utterance end silence threshold in milliseconds
entityDetection?
optionalentityDetection:boolean
Enable entity detection
Inherited from
TranscribeOptions.entityDetection
gladia?
optionalgladia:Partial<InitTranscriptionRequest>
Gladia-specific options (passed directly to API)
See
Inherited from
gladiaStreaming?
optionalgladiaStreaming:Partial<Omit<StreamingRequest,"encoding"|"channels"|"sample_rate"|"bit_depth">>
Gladia-specific streaming options (passed directly to API)
Includes pre_processing, realtime_processing, post_processing, messages_config, and callback configuration.
See
https://docs.gladia.io/api-reference/v2/live
Example
await adapter.transcribeStream({
gladiaStreaming: {
realtime_processing: {
words_accurate_timestamps: true
},
messages_config: {
receive_partial_transcripts: true
}
}
});
interimResults?
optionalinterimResults:boolean
Enable interim results (partial transcripts)
language?
optionallanguage:string
Language code with autocomplete from OpenAPI specs
Example
'en', 'en_us', 'fr', 'de', 'es'
See
TranscriptionLanguage for full list
Inherited from
languageDetection?
optionallanguageDetection:boolean
Enable automatic language detection
Inherited from
TranscribeOptions.languageDetection
maxSilence?
optionalmaxSilence:number
Maximum duration without endpointing in seconds
model?
optionalmodel:TranscriptionModel
Model to use for transcription (provider-specific)
Type-safe with autocomplete for all known models:
- Deepgram: 'nova-2', 'nova-3', 'base', 'enhanced', 'whisper-large', etc.
- Gladia: 'solaria-1' (default)
- AssemblyAI: Not applicable (uses Universal-2 automatically)
Example
// Use Nova-2 for better multilingual support
{ model: 'nova-2', language: 'fr' }
Overrides
openai?
optionalopenai:Partial<Omit<OpenAIWhisperOptions,"model"|"file">>
OpenAI Whisper-specific options (passed directly to API)
See
https://platform.openai.com/docs/api-reference/audio/createTranscription
Inherited from
openaiStreaming?
optionalopenaiStreaming:OpenAIStreamingOptions
OpenAI Realtime API streaming options
Configure the OpenAI Realtime WebSocket connection for audio transcription. Uses the Realtime API which supports real-time audio input transcription.
See
https://platform.openai.com/docs/guides/realtime
Example
await adapter.transcribeStream({
openaiStreaming: {
model: 'gpt-4o-realtime-preview',
voice: 'alloy',
turnDetection: {
type: 'server_vad',
threshold: 0.5,
silenceDurationMs: 500
}
}
});
piiRedaction?
optionalpiiRedaction:boolean
Enable PII redaction
Inherited from
TranscribeOptions.piiRedaction
region?
optionalregion:StreamingSupportedRegions
Regional endpoint for streaming (Gladia only)
Gladia supports regional streaming endpoints for lower latency:
- us-west: US West Coast
- eu-west: EU West (Ireland)
Example
import { GladiaRegion } from 'voice-router-dev/constants'
await adapter.transcribeStream({
region: GladiaRegion["us-west"]
})
See
https://docs.gladia.io/api-reference/v2/live
sampleRate?
optionalsampleRate:number
Sample rate in Hz
Common rates: 8000, 16000, 32000, 44100, 48000. Most providers recommend 16000 Hz for optimal quality/performance.
sentimentAnalysis?
optionalsentimentAnalysis:boolean
Enable sentiment analysis
Inherited from
TranscribeOptions.sentimentAnalysis
sonioxStreaming?
optionalsonioxStreaming:SonioxStreamingOptions
Soniox-specific streaming options
Configure the Soniox WebSocket connection for real-time transcription. Supports speaker diarization, language identification, translation, and custom context.
See
https://soniox.com/docs/stt/SDKs/web-sdk
Example
await adapter.transcribeStream({
sonioxStreaming: {
model: 'stt-rt-preview',
enableSpeakerDiarization: true,
enableEndpointDetection: true,
context: {
terms: ['TypeScript', 'React'],
text: 'Technical discussion'
},
translation: { type: 'one_way', target_language: 'es' }
}
});
speakersExpected?
optionalspeakersExpected:number
Expected number of speakers (for diarization)
Inherited from
TranscribeOptions.speakersExpected
summarization?
optionalsummarization:boolean
Enable summarization
Inherited from
TranscribeOptions.summarization
wordTimestamps?
optionalwordTimestamps:boolean
Enable word-level timestamps
Inherited from
TranscribeOptions.wordTimestamps
StreamingSession
Represents an active streaming transcription session
Properties
close()
close: () =>
Promise<void>
Close the streaming session
Returns
Promise<void>
createdAt
createdAt:
Date
Session creation timestamp
getStatus()
getStatus: () =>
"open"|"connecting"|"closing"|"closed"
Get current session status
Returns
"open" | "connecting" | "closing" | "closed"
id
id:
string
Unique session ID
provider
provider:
TranscriptionProvider
Provider handling this stream
sendAudio()
sendAudio: (
chunk) =>Promise<void>
Send an audio chunk to the stream
Parameters
| Parameter | Type |
|---|---|
chunk | AudioChunk |
Returns
Promise<void>
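A hedged sketch of driving an already-created session with the members above; how the session is obtained (transcribeStream and its options) follows the examples elsewhere on this page, and the helper names here are illustrative.

```typescript
import type { AudioChunk, StreamingSession } from 'voice-router-dev'; // assumed export path

// Sketch: send chunks through a session, then close it
async function streamAudio(session: StreamingSession, chunks: AsyncIterable<AudioChunk>): Promise<void> {
  console.log(`session ${session.id} via ${session.provider}, status=${session.getStatus()}`);
  for await (const chunk of chunks) {
    await session.sendAudio(chunk);
  }
  await session.close();
}
```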
SummarizationEvent
Post-processing summarization event
Properties
summary
summary:
string
Full summarization text
error?
optionalerror:string
Error if summarization failed
TranscribeOptions
Common transcription options across all providers
For provider-specific options, use the typed provider options:
- deepgram: Full Deepgram API options
- assemblyai: Full AssemblyAI API options
- gladia: Full Gladia API options
Properties
assemblyai?
optionalassemblyai:Partial<TranscriptOptionalParams>
AssemblyAI-specific options (passed directly to API)
See
https://www.assemblyai.com/docs/api-reference/transcripts/submit
audioToLlm?
optionalaudioToLlm:GladiaAudioToLlmConfig
Audio-to-LLM configuration (Gladia-specific). Run custom LLM prompts on the transcription.
See
GladiaAudioToLlmConfig
codeSwitching?
optionalcodeSwitching:boolean
Enable code switching (multilingual audio detection). Supported by: Gladia.
codeSwitchingConfig?
optionalcodeSwitchingConfig:GladiaCodeSwitchingConfig
Code switching configuration (Gladia-specific)
See
GladiaCodeSwitchingConfig
customVocabulary?
optionalcustomVocabulary:string[]
Custom vocabulary to boost (provider-specific format)
deepgram?
optionaldeepgram:Partial<ListenV1MediaTranscribeParams>
Deepgram-specific options (passed directly to API)
See
https://developers.deepgram.com/reference/listen-file
diarization?
optionaldiarization:boolean
Enable speaker diarization
entityDetection?
optionalentityDetection:boolean
Enable entity detection
gladia?
optionalgladia:Partial<InitTranscriptionRequest>
Gladia-specific options (passed directly to API)
See
language?
optionallanguage:string
Language code with autocomplete from OpenAPI specs
Example
'en', 'en_us', 'fr', 'de', 'es'
See
TranscriptionLanguage for full list
languageDetection?
optionallanguageDetection:boolean
Enable automatic language detection
model?
optionalmodel:TranscriptionModel
Model to use for transcription (provider-specific)
Type-safe model selection derived from OpenAPI specs:
- Deepgram: 'nova-3', 'nova-2', 'enhanced', 'base', etc.
- AssemblyAI: 'best', 'slam-1', 'universal'
- Speechmatics: 'standard', 'enhanced' (operating point)
- Gladia: 'solaria-1' (streaming only)
See
TranscriptionModel for full list of available models
openai?
optionalopenai:Partial<Omit<OpenAIWhisperOptions,"model"|"file">>
OpenAI Whisper-specific options (passed directly to API)
See
https://platform.openai.com/docs/api-reference/audio/createTranscription
piiRedaction?
optionalpiiRedaction:boolean
Enable PII redaction
sentimentAnalysis?
optionalsentimentAnalysis:boolean
Enable sentiment analysis
speakersExpected?
optionalspeakersExpected:number
Expected number of speakers (for diarization)
summarization?
optionalsummarization:boolean
Enable summarization
webhookUrl?
optionalwebhookUrl:string
Webhook URL for async results
wordTimestamps?
optionalwordTimestamps:boolean
Enable word-level timestamps
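A hedged example combining the common flags above with one provider-specific escape hatch; the option values (and the deepgram key shown) are illustrative, not recommendations.

```typescript
import type { TranscribeOptions } from 'voice-router-dev'; // assumed export path

// Sketch: common options plus a provider-specific block
const options: TranscribeOptions = {
  language: 'en',
  diarization: true,
  speakersExpected: 2,
  wordTimestamps: true,
  summarization: true,
  webhookUrl: 'https://example.com/webhooks/transcripts', // illustrative endpoint
  deepgram: { smart_format: true }, // anything not covered by the common flags
};
```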
TranscriptData
Transcript data structure
Contains the core transcript information returned by getTranscript and listTranscripts.
Example
const result = await router.getTranscript('abc123', 'assemblyai');
if (result.success && result.data) {
console.log(result.data.id); // string
console.log(result.data.text); // string
console.log(result.data.status); // TranscriptionStatus
console.log(result.data.metadata); // TranscriptMetadata
}
Properties
id
id:
string
Unique transcript ID
status
status:
TranscriptionStatus
Transcription status
text
text:
string
Full transcribed text (empty for list items)
completedAt?
optionalcompletedAt:string
Completion timestamp (shorthand for metadata.completedAt)
confidence?
optionalconfidence:number
Overall confidence score (0-1)
createdAt?
optionalcreatedAt:string
Creation timestamp (shorthand for metadata.createdAt)
duration?
optionalduration:number
Audio duration in seconds
language?
optionallanguage:string
Detected or specified language code
metadata?
optionalmetadata:TranscriptMetadata
Transcript metadata
speakers?
optionalspeakers:Speaker[]
Speaker diarization results
summary?
optionalsummary:string
Summary of the content (if summarization enabled)
utterances?
optionalutterances:Utterance[]
Utterances (speaker turns)
words?
optionalwords:Word[]
Word-level transcription with timestamps
TranscriptMetadata
Transcript metadata with typed common fields
Contains provider-agnostic metadata fields that are commonly available. Provider-specific fields can be accessed via the index signature.
Example
const { transcripts } = await router.listTranscripts('assemblyai', { limit: 20 });
transcripts.forEach(item => {
console.log(item.data?.metadata?.audioUrl); // string | undefined
console.log(item.data?.metadata?.createdAt); // string | undefined
console.log(item.data?.metadata?.audioDuration); // number | undefined
});
Indexable
[key: string]: unknown
Provider-specific fields
Properties
audioDuration?
optionalaudioDuration:number
Audio duration in seconds
audioFileAvailable?
optionalaudioFileAvailable:boolean
True if the provider stored the audio and it can be downloaded via adapter.getAudioFile(). Currently only Gladia supports this - other providers discard audio after processing.
Example
if (item.data?.metadata?.audioFileAvailable) {
const audio = await gladiaAdapter.getAudioFile(item.data.id)
// audio.data is a Blob
}
completedAt?
optionalcompletedAt:string
Completion timestamp (ISO 8601)
createdAt?
optionalcreatedAt:string
Creation timestamp (ISO 8601)
customMetadata?
optionalcustomMetadata:Record<string,unknown>
Custom metadata (Gladia)
displayName?
optionaldisplayName:string
Display name (Azure)
filesUrl?
optionalfilesUrl:string
Files URL (Azure)
kind?
optionalkind:"batch"|"streaming"|"pre-recorded"|"live"
Transcript type
lastActionAt?
optionallastActionAt:string
Last action timestamp (Azure)
resourceUrl?
optionalresourceUrl:string
Resource URL for the transcript
sourceAudioUrl?
optionalsourceAudioUrl:string
Original audio URL/source you provided to the API (echoed back). This is NOT a provider-hosted URL - it's what you sent when creating the transcription.
TranslationEvent
Translation event data (for real-time translation)
Properties
targetLanguage
targetLanguage:
string
Target language
translatedText
translatedText:
string
Translated text
isFinal?
optionalisFinal:boolean
Whether this is a final translation
original?
optionaloriginal:string
Original text
utteranceId?
optionalutteranceId:string
Utterance ID this translation belongs to
UnifiedTranscriptResponse
Unified transcription response with provider-specific type safety
When a specific provider is known at compile time, both raw and extended
fields will be typed with that provider's actual types.
Examples
const result: UnifiedTranscriptResponse<'assemblyai'> = await adapter.transcribe(audio);
// result.raw is typed as AssemblyAITranscript
// result.extended is typed as AssemblyAIExtendedData
const chapters = result.extended?.chapters; // AssemblyAIChapter[] | undefined
const entities = result.extended?.entities; // AssemblyAIEntity[] | undefined

const result: UnifiedTranscriptResponse<'gladia'> = await gladiaAdapter.transcribe(audio);
const translation = result.extended?.translation; // GladiaTranslation | undefined
const llmResults = result.extended?.audioToLlm; // GladiaAudioToLlmResult | undefined

const result: UnifiedTranscriptResponse = await router.transcribe(audio);
// result.raw is typed as unknown (could be any provider)
// result.extended is typed as union of all extended types
Type Parameters
| Type Parameter | Default type | Description |
|---|---|---|
P extends TranscriptionProvider | TranscriptionProvider | The transcription provider (defaults to all providers) |
Properties
provider
provider:
P
Provider that performed the transcription
success
success:
boolean
Operation success status
data?
optionaldata:TranscriptData
Transcription data (only present on success)
error?
optionalerror:object
Error information (only present on failure)
code
code:
string
Error code (provider-specific or normalized)
message
message:
string
Human-readable error message
details?
optionaldetails:unknown
Additional error details
statusCode?
optionalstatusCode:number
HTTP status code if applicable
extended?
optionalextended:P extends keyof ProviderExtendedDataMap ? ProviderExtendedDataMap[P] : unknown
Extended provider-specific data (fully typed from OpenAPI specs)
Contains rich data beyond basic transcription:
- AssemblyAI: chapters, entities, sentiment, content safety, topics
- Gladia: translation, moderation, entities, audio-to-llm, chapters
- Deepgram: detailed metadata, request tracking, model info
Example
const result = await assemblyaiAdapter.transcribe(audio, { summarization: true });
result.extended?.chapters?.forEach(chapter => {
console.log(`${chapter.headline}: ${chapter.summary}`);
});
raw?
optionalraw:P extends keyof ProviderRawResponseMap ? ProviderRawResponseMap[P] : unknown
Raw provider response (for advanced usage)
Type-safe based on the provider:
- gladia: PreRecordedResponse
- deepgram: ListenV1Response
- openai-whisper: CreateTranscription200One
- assemblyai: AssemblyAITranscript
- azure-stt: AzureTranscription
tracking?
optionaltracking:object
Request tracking information for debugging
audioHash?
optionalaudioHash:string
Audio fingerprint (SHA256) if available
processingTimeMs?
optionalprocessingTimeMs:number
Processing duration in milliseconds
requestId?
optionalrequestId:string
Provider's request/job ID
Utterance
Utterance (sentence or phrase by a single speaker)
Normalized from provider-specific types:
- Gladia: UtteranceDTO
- AssemblyAI: TranscriptUtterance
- Deepgram: ListenV1ResponseResultsUtterancesItem
Properties
end
end:
number
End time in seconds
start
start:
number
Start time in seconds
text
text:
string
The transcribed text
channel?
optionalchannel:number
Audio channel number (for multi-channel/stereo recordings)
Channel numbering varies by provider:
- AssemblyAI: 1=left, 2=right, sequential for additional channels
- Deepgram: 0-indexed channel number
- Gladia: 0-indexed channel number
confidence?
optionalconfidence:number
Confidence score (0-1)
id?
optionalid:string
Unique utterance identifier (provider-assigned)
Available from: Deepgram. Useful for linking utterances to other data (entities, sentiment, etc.)
language?
optionallanguage:string
Detected language for this utterance (BCP-47 code)
Available from: Gladia (with code-switching enabled). Essential for multilingual transcription where language changes mid-conversation.
Example
'en', 'es', 'fr', 'de'
See
TranscriptionLanguage for full list of supported codes
speaker?
optionalspeaker:string
Speaker ID
words?
optionalwords:Word[]
Words in this utterance
Word
Word-level transcription with timing
Normalized from provider-specific types:
- Gladia: WordDTO
- AssemblyAI: TranscriptWord
- Deepgram: ListenV1ResponseResultsChannelsItemAlternativesItemWordsItem
Properties
end
end:
number
End time in seconds
start
start:
number
Start time in seconds
word
word:
string
The transcribed word
channel?
optionalchannel:number
Audio channel number (for multi-channel/stereo recordings)
Channel numbering varies by provider:
- AssemblyAI: 1=left, 2=right, sequential for additional channels
- Deepgram: 0-indexed channel number
- Gladia: 0-indexed channel number
confidence?
optionalconfidence:number
Confidence score (0-1)
speaker?
optionalspeaker:string
Speaker ID if diarization is enabled
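A small sketch that renders the normalized Utterance/Word shapes above into a speaker-labelled transcript; the import path is an assumption.

```typescript
import type { Utterance } from 'voice-router-dev'; // assumed export path

// Sketch: turn normalized utterances into a readable, speaker-labelled transcript
function renderTranscript(utterances: Utterance[]): string {
  return utterances
    .map((u) => `[${u.start.toFixed(1)}s-${u.end.toFixed(1)}s] ${u.speaker ?? 'speaker'}: ${u.text}`)
    .join('\n');
}
```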
References
GladiaOptions
Renames and re-exports InitTranscriptionRequest
GladiaStreamingRequest
Renames and re-exports StreamingRequest
Type Aliases
AudioInput
AudioInput =
AudioInputUrl|AudioInputFile|AudioInputStream
Union of all audio input types
BatchOnlyProvider
BatchOnlyProvider =
BatchOnlyProviderType
Providers that only support batch/async transcription
Automatically derived from providers where streaming is false or undefined. Note: Speechmatics has a WebSocket API but streaming is not yet implemented in this SDK.
ProviderExtendedDataMap
ProviderExtendedDataMap =
object
Map of provider names to their extended data types
Properties
assemblyai
assemblyai:
AssemblyAIExtendedData
azure-stt
azure-stt:
Record<string,never>
deepgram
deepgram:
DeepgramExtendedData
gladia
gladia:
GladiaExtendedData
openai-whisper
openai-whisper:
Record<string,never>
soniox
soniox:
Record<string,never>
speechmatics
speechmatics:
Record<string,never>
ProviderRawResponseMap
ProviderRawResponseMap =
object
Map of provider names to their raw response types. Enables type-safe access to provider-specific raw responses.
Properties
assemblyai
assemblyai:
AssemblyAITranscript
azure-stt
azure-stt:
AzureTranscription
deepgram
deepgram:
ListenV1Response
gladia
gladia:
PreRecordedResponse
openai-whisper
openai-whisper:
CreateTranscription200One
soniox
soniox:
unknown
speechmatics
speechmatics:
unknown
SessionStatus
SessionStatus =
"connecting"|"open"|"closing"|"closed"
WebSocket session status for streaming transcription
SpeechmaticsOperatingPoint
SpeechmaticsOperatingPoint =
"standard"|"enhanced"
Speechmatics operating point (model) type. Manually defined as the Speechmatics OpenAPI spec doesn't export this cleanly.
StreamEventType
StreamEventType =
"open"|"transcript"|"utterance"|"metadata"|"error"|"close"|"speech_start"|"speech_end"|"translation"|"sentiment"|"entity"|"summarization"|"chapterization"|"audio_ack"|"lifecycle"
Streaming transcription event types
StreamingProvider
StreamingProvider =
StreamingProviderType
Providers that support real-time streaming transcription
This type is automatically derived from ProviderCapabilitiesMap.streaming in provider-metadata.ts
No manual sync needed - if you set streaming: true for a provider, it's included here.
TranscriptionLanguage
TranscriptionLanguage =
AssemblyAILanguageCode|TranscriptionLanguageCodeEnum|string
Unified transcription language type with autocomplete for all providers
Includes language codes from AssemblyAI and Gladia OpenAPI specs. Deepgram uses string for flexibility.
TranscriptionModel
TranscriptionModel =
DeepgramModelType|StreamingSupportedModels|AssemblyAISpeechModel|SpeechmaticsOperatingPoint
Unified transcription model type with autocomplete for all providers
Strict union type - only accepts valid models from each provider:
- Deepgram: nova-3, nova-2, enhanced, base, etc.
- AssemblyAI: best, slam-1, universal
- Gladia: solaria-1
- Speechmatics: standard, enhanced
Use provider const objects for autocomplete:
Example
import { DeepgramModel } from 'voice-router-dev'
{ model: DeepgramModel["nova-3"] }
TranscriptionProvider
TranscriptionProvider =
"gladia"|"assemblyai"|"deepgram"|"openai-whisper"|"azure-stt"|"speechmatics"|"soniox"
Supported transcription provider identifiers
TranscriptionStatus
TranscriptionStatus =
"queued"|"processing"|"completed"|"error"
Transcription status
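As a closing sketch, the status values above can drive a simple polling loop for async (batch) transcriptions; router.getTranscript usage mirrors the TranscriptData example earlier, and the 5-second interval is an arbitrary choice.

```typescript
// Sketch: poll an async transcription until it reaches a terminal status
// (router and TranscriptionProvider follow the examples earlier on this page)
async function waitForTranscript(id: string, provider: TranscriptionProvider) {
  for (;;) {
    const result = await router.getTranscript(id, provider);
    const status = result.data?.status;
    if (status === 'completed' || status === 'error') return result;
    await new Promise((resolve) => setTimeout(resolve, 5_000)); // arbitrary 5s delay
  }
}
```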