Voice Router SDK - Azure Speech-to-Text Provider / router/types
Interfaces
AssemblyAIExtendedData
Extended data from AssemblyAI transcription. Includes chapters, entities, sentiment, content safety, and more.
Properties
chapters?
optional chapters: Chapter[]
Auto-generated chapters with summaries
contentSafety?
optional contentSafety: ContentSafetyLabelsResult
Content safety/moderation labels
entities?
optional entities: Entity[]
Detected named entities (people, organizations, locations)
highlights?
optional highlights: AutoHighlightsResult
Key phrases and highlights
languageConfidence?
optional languageConfidence: number
Language detection confidence (0-1)
sentimentResults?
optional sentimentResults: SentimentAnalysisResult[]
Per-utterance sentiment analysis results
throttled?
optional throttled: boolean
Whether the request was throttled
topics?
optional topics: TopicDetectionModelResult
IAB topic categories
AudioAckEvent
Audio chunk acknowledgment event
Properties
byteRange?
optional byteRange: [number, number]
Byte range of the acknowledged audio chunk [start, end]
timeRange?
optional timeRange: [number, number]
Time range in seconds of the acknowledged audio chunk [start, end]
timestamp?
optional timestamp: string
Acknowledgment timestamp
AudioChunk
Audio chunk for streaming transcription
Properties
data
data: Buffer<ArrayBufferLike> | Uint8Array<ArrayBufferLike>
Audio data as Buffer or Uint8Array
isLast?
optional isLast: boolean
Whether this is the last chunk
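A minimal sketch of producing AudioChunk values from a raw buffer, using a local stub of the documented shape (illustration only; the real SDK type also accepts Buffer):

```typescript
// Local stub of the documented AudioChunk shape (illustration only;
// the SDK type also accepts Buffer).
interface AudioChunk {
  data: Uint8Array;
  isLast?: boolean;
}

// Split a raw audio buffer into fixed-size chunks, marking the final one
// so the provider knows the stream is complete.
function toAudioChunks(audio: Uint8Array, chunkSize: number): AudioChunk[] {
  const chunks: AudioChunk[] = [];
  for (let offset = 0; offset < audio.length; offset += chunkSize) {
    chunks.push({
      data: audio.subarray(offset, offset + chunkSize),
      isLast: offset + chunkSize >= audio.length,
    });
  }
  return chunks;
}
```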
ChapterizationEvent
Post-processing chapterization event
Properties
chapters
chapters: object[]
Generated chapters
end
end: number
End time in seconds
headline
headline: string
Chapter title/headline
start
start: number
Start time in seconds
summary
summary: string
Chapter summary
error?
optional error: string
Error if chapterization failed
DeepgramExtendedData
Extended data from Deepgram transcription. Includes detailed metadata, model info, and feature-specific data.
Properties
metadata?
optional metadata: ListenV1ResponseMetadata
Full response metadata
modelInfo?
optional modelInfo: Record<string, unknown>
Model versions used
requestId?
optional requestId: string
Request ID for debugging/tracking
sha256?
optional sha256: string
SHA256 hash of the audio
tags?
optional tags: string[]
Tags echoed back from request
EntityEvent
Named entity recognition result
Properties
text
text: string
Entity text
type
type: string
Entity type (PERSON, ORGANIZATION, LOCATION, etc.)
end?
optional end: number
End position
start?
optional start: number
Start position
utteranceId?
optional utteranceId: string
Utterance ID this entity belongs to
GladiaExtendedData
Extended data from Gladia transcription. Includes translation, moderation, entities, LLM outputs, and more.
Properties
audioToLlm?
optional audioToLlm: AudioToLlmListDTO
Audio-to-LLM custom prompt results
chapters?
optional chapters: ChapterizationDTO
Auto-generated chapters
customMetadata?
optional customMetadata: Record<string, unknown>
Custom metadata echoed back
entities?
optional entities: NamedEntityRecognitionDTO
Named entity recognition results
moderation?
optional moderation: ModerationDTO
Content moderation results
sentiment?
optional sentiment: SentimentAnalysisDTO
Sentiment analysis results
speakerReidentification?
optional speakerReidentification: SpeakerReidentificationDTO
AI speaker reidentification results
structuredData?
optional structuredData: StructuredDataExtractionDTO
Structured data extraction results
translation?
optional translation: TranslationDTO
Translation results (if translation enabled)
LifecycleEvent
Lifecycle event (session start, recording end, etc.)
Properties
eventType
eventType: "start_session" | "start_recording" | "stop_recording" | "end_recording" | "end_session"
Lifecycle event type
sessionId?
optional sessionId: string
Session ID
timestamp?
optional timestamp: string
Event timestamp
ListTranscriptsOptions
Options for listing transcripts with date/time filtering
Providers support different filtering capabilities:
- AssemblyAI: status, created_on, before_id, after_id, throttled_only
- Gladia: status, date, before_date, after_date, custom_metadata
- Azure: status, skip, top, filter (OData)
- Deepgram: start, end, status, page, request_id, endpoint (requires projectId)
Examples
await adapter.listTranscripts({
  date: '2026-01-07', // Exact date (ISO format)
  status: 'completed',
  limit: 50
})

await adapter.listTranscripts({
  afterDate: '2026-01-01',
  beforeDate: '2026-01-31',
  limit: 100
})
Properties
afterDate?
optional afterDate: string
Filter for transcripts created after this date (ISO format)
assemblyai?
optional assemblyai: Partial<ListTranscriptsParams>
AssemblyAI-specific list options
beforeDate?
optional beforeDate: string
Filter for transcripts created before this date (ISO format)
date?
optional date: string
Filter by exact date (ISO format: YYYY-MM-DD)
deepgram?
optional deepgram: Partial<ManageV1ProjectsRequestsListParams>
Deepgram-specific list options (request history)
gladia?
optional gladia: Partial<TranscriptionControllerListV2Params>
Gladia-specific list options
limit?
optional limit: number
Maximum number of transcripts to retrieve
offset?
optional offset: number
Pagination offset (skip N results)
status?
optional status: string
Filter by transcript status
ListTranscriptsResponse
Response from listTranscripts
Example
import type { ListTranscriptsResponse } from 'voice-router-dev';
const response: ListTranscriptsResponse = await router.listTranscripts('assemblyai', {
status: 'completed',
limit: 50
});
response.transcripts.forEach(item => {
console.log(item.data?.id, item.data?.status);
});
if (response.hasMore) {
// Fetch next page
}
Properties
transcripts
transcripts: UnifiedTranscriptResponse<TranscriptionProvider>[]
List of transcripts
hasMore?
optional hasMore: boolean
Whether more results are available
total?
optional total: number
Total count (if available from provider)
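Since hasMore only signals that another page exists, a typical consumer advances an offset until it goes false. A sketch with local stand-in types (illustration only; the real page shape is ListTranscriptsResponse, and fetchPage is a hypothetical wrapper around listTranscripts):

```typescript
// Local stand-in for the documented response shape (illustration only).
interface ListPage<T> {
  transcripts: T[];
  hasMore?: boolean;
}

// Drain all pages by advancing the offset until hasMore is falsy.
async function listAll<T>(
  fetchPage: (offset: number, limit: number) => Promise<ListPage<T>>,
  limit = 50
): Promise<T[]> {
  const all: T[] = [];
  let offset = 0;
  for (;;) {
    const page = await fetchPage(offset, limit);
    all.push(...page.transcripts);
    if (!page.hasMore || page.transcripts.length === 0) break;
    offset += page.transcripts.length;
  }
  return all;
}
```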
ProviderCapabilities
Provider capability flags
Each boolean indicates whether the provider supports a specific feature. Use ProviderCapabilitiesMap from provider-metadata for runtime access.
Properties
customVocabulary
customVocabulary: boolean
Custom vocabulary/keyword boosting
deleteTranscript
deleteTranscript: boolean
Delete transcriptions
diarization
diarization: boolean
Speaker diarization (identifying different speakers)
entityDetection
entityDetection: boolean
Entity detection
languageDetection
languageDetection: boolean
Automatic language detection
listTranscripts
listTranscripts: boolean
List/fetch previous transcriptions
piiRedaction
piiRedaction: boolean
PII redaction
sentimentAnalysis
sentimentAnalysis: boolean
Sentiment analysis
streaming
streaming: boolean
Real-time streaming transcription support
summarization
summarization: boolean
Audio summarization
wordTimestamps
wordTimestamps: boolean
Word-level timestamps
getAudioFile?
optional getAudioFile: boolean
Download original audio file
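Capability flags let you gate features before calling an adapter. A sketch using a local subset of the flags and a hypothetical capability table (at runtime the real values come from ProviderCapabilitiesMap):

```typescript
// Subset of the documented capability flags (illustration only).
interface Caps {
  streaming: boolean;
  diarization: boolean;
  summarization: boolean;
}

// Hypothetical table; real values come from ProviderCapabilitiesMap.
const table: Record<string, Caps> = {
  "openai-whisper": { streaming: true, diarization: false, summarization: false },
  assemblyai: { streaming: true, diarization: true, summarization: true },
};

// First provider that supports every required feature, if any.
function pickProvider(required: (keyof Caps)[]): string | undefined {
  return Object.keys(table).find((name) =>
    required.every((flag) => table[name][flag])
  );
}
```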
RawWebSocketMessage
Raw WebSocket message from provider
Captures the exact payload received from or sent to the provider WebSocket, before any SDK normalization or transformation. Useful for debugging and storing exact provider payloads for replay/analysis.
Example
onRawMessage: (msg) => {
// Store raw payload for debugging
rawMessages.push({
provider: msg.provider,
direction: msg.direction,
timestamp: msg.timestamp,
payload: msg.payload,
messageType: msg.messageType
});
}
Properties
direction
direction: "incoming" | "outgoing"
Message direction
payload
payload: string | ArrayBuffer
Raw payload exactly as received/sent
- For incoming JSON messages: the raw string before parsing
- For outgoing audio: the binary data being sent
- For incoming binary: ArrayBuffer as received
provider
provider: string
Provider name (e.g., 'gladia', 'deepgram', 'soniox')
timestamp
timestamp: number
Timestamp in milliseconds (Date.now())
messageType?
optional messageType: string
Message type if cheaply derivable from payload
Provider-specific values:
- Gladia: 'transcript', 'post_final_transcript', 'speech_start', 'speech_end', etc.
- Deepgram: 'Results', 'Metadata', 'UtteranceEnd', etc.
- AssemblyAI: 'SessionBegins', 'PartialTranscript', 'FinalTranscript', etc.
- Soniox: derived from payload structure
SentimentEvent
Sentiment analysis result (for real-time sentiment)
Properties
sentiment
sentiment: string
Sentiment label (positive, negative, neutral)
confidence?
optional confidence: number
Confidence score 0-1
utteranceId?
optional utteranceId: string
Utterance ID this sentiment belongs to
Speaker
Speaker information from diarization
Properties
id
id: string
Speaker identifier (e.g., "A", "B", "speaker_0")
confidence?
optional confidence: number
Confidence score for speaker identification (0-1)
label?
optional label: string
Speaker label if known
SpeechEvent
Speech event data (for speech_start/speech_end events)
Properties
timestamp
timestamp: number
Timestamp in seconds
type
type: "speech_start" | "speech_end"
Event type: speech_start or speech_end
channel?
optional channel: number
Channel number
sessionId?
optional sessionId: string
Session ID
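speech_start/speech_end pairs can be folded into voiced segments, e.g. for silence trimming or activity metrics. A sketch over a local stub of the event shape (illustration only):

```typescript
// Minimal stub of the documented SpeechEvent shape (illustration only).
interface SpeechEvent {
  type: "speech_start" | "speech_end";
  timestamp: number; // seconds
}

// Pair each speech_start with the next speech_end into [start, end] segments.
function toSegments(events: SpeechEvent[]): [number, number][] {
  const segments: [number, number][] = [];
  let openedAt: number | null = null;
  for (const event of events) {
    if (event.type === "speech_start") {
      openedAt = event.timestamp;
    } else if (openedAt !== null) {
      segments.push([openedAt, event.timestamp]);
      openedAt = null;
    }
  }
  return segments;
}
```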
StreamEvent
Streaming transcription event
Properties
type
type: StreamEventType
channel?
optional channel: number
Channel number for multi-channel audio
confidence?
optional confidence: number
Confidence score for this event
data?
optional data: unknown
Additional event data
error?
optional error: object
Error information (for type: "error")
code
code: string
message
message: string
details?
optional details: unknown
isFinal?
optional isFinal: boolean
Whether this is a final transcript (vs interim)
language?
optional language: string
Language of the transcript/utterance
speaker?
optional speaker: string
Speaker ID if diarization is enabled
text?
optional text: string
Partial transcript text (for type: "transcript")
utterance?
optional utterance: Utterance
Utterance data (for type: "utterance")
words?
optional words: Word[]
Words in this event
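Because interim and final transcripts arrive on the same event shape, consumers usually branch on type and isFinal. A sketch with a local subset of the fields (illustration only):

```typescript
// Local subset of the documented StreamEvent fields (illustration only).
interface StreamEvent {
  type: "transcript" | "utterance" | "error";
  text?: string;
  isFinal?: boolean;
  error?: { code: string; message: string };
}

// Keep only final transcript text; log errors; ignore interim results.
function collectFinalText(events: StreamEvent[]): string {
  const parts: string[] = [];
  for (const event of events) {
    if (event.type === "error" && event.error) {
      console.error(`[${event.error.code}] ${event.error.message}`);
    } else if (event.type === "transcript" && event.isFinal && event.text) {
      parts.push(event.text);
    }
  }
  return parts.join(" ");
}
```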
StreamingCallbacks
Properties
onAudioAck()?
optional onAudioAck: (event) => void
Called for audio chunk acknowledgments (Gladia: requires receive_acknowledgments)
Parameters
| Parameter | Type |
|---|---|
event | AudioAckEvent |
Returns
void
onChapterization()?
optional onChapterization: (event) => void
Called when post-processing chapterization completes (Gladia: requires chapterization enabled)
Parameters
| Parameter | Type |
|---|---|
event | ChapterizationEvent |
Returns
void
onClose()?
optional onClose: (code?, reason?) => void
Called when the stream is closed
Parameters
| Parameter | Type |
|---|---|
code? | number |
reason? | string |
Returns
void
onEntity()?
optional onEntity: (event) => void
Called for named entity recognition (Gladia: requires named_entity_recognition enabled)
Parameters
| Parameter | Type |
|---|---|
event | EntityEvent |
Returns
void
onError()?
optional onError: (error) => void
Called when an error occurs
Parameters
| Parameter | Type |
|---|---|
error | { code: string; message: string; details?: unknown; } |
error.code | string |
error.message | string |
error.details? | unknown |
Returns
void
onLifecycle()?
optional onLifecycle: (event) => void
Called for session lifecycle events (Gladia: requires receive_lifecycle_events)
Parameters
| Parameter | Type |
|---|---|
event | LifecycleEvent |
Returns
void
onMetadata()?
optional onMetadata: (metadata) => void
Called when metadata is received
Parameters
| Parameter | Type |
|---|---|
metadata | Record<string, unknown> |
Returns
void
onOpen()?
optional onOpen: () => void
Called when connection is established
Returns
void
onRawMessage()?
optional onRawMessage: (message) => void
Called for every raw WebSocket message before SDK processing
Captures exact provider payloads for debugging, replay, and analysis. Invoked for both incoming messages from the provider and outgoing audio chunks sent to the provider.
Parameters
| Parameter | Type |
|---|---|
message | RawWebSocketMessage |
Returns
void
Example
const rawMessages: RawWebSocketMessage[] = [];
await adapter.transcribeStream(options, {
onRawMessage: (msg) => {
rawMessages.push(msg);
// Or stream to storage immediately
},
onTranscript: (event) => { ... }
});
// After session, rawMessages contains all provider payloads
onSentiment()?
optional onSentiment: (event) => void
Called for real-time sentiment analysis (Gladia: requires sentiment_analysis enabled)
Parameters
| Parameter | Type |
|---|---|
event | SentimentEvent |
Returns
void
onSpeechEnd()?
optional onSpeechEnd: (event) => void
Called when speech ends (Gladia: requires receive_speech_events)
Parameters
| Parameter | Type |
|---|---|
event | SpeechEvent |
Returns
void
onSpeechStart()?
optional onSpeechStart: (event) => void
Called when speech starts (Gladia: requires receive_speech_events)
Parameters
| Parameter | Type |
|---|---|
event | SpeechEvent |
Returns
void
onSummarization()?
optional onSummarization: (event) => void
Called when post-processing summarization completes (Gladia: requires summarization enabled)
Parameters
| Parameter | Type |
|---|---|
event | SummarizationEvent |
Returns
void
onTranscript()?
optional onTranscript: (event) => void
Called when a transcript (interim or final) is received
Parameters
| Parameter | Type |
|---|---|
event | StreamEvent |
Returns
void
onTranslation()?
optional onTranslation: (event) => void
Called for real-time translation (Gladia: requires translation enabled)
Parameters
| Parameter | Type |
|---|---|
event | TranslationEvent |
Returns
void
onUtterance()?
optional onUtterance: (utterance) => void
Called when a complete utterance is detected
Parameters
| Parameter | Type |
|---|---|
utterance | Utterance |
Returns
void
StreamingOptions
Options for streaming transcription
Extends
Omit<TranscribeOptions, "webhookUrl">
Properties
assemblyai?
optional assemblyai: Partial<TranscriptOptionalParams>
AssemblyAI-specific options (passed directly to API)
See
https://www.assemblyai.com/docs/api-reference/transcripts/submit
Inherited from
TranscribeOptions.assemblyai
assemblyaiStreaming?
optional assemblyaiStreaming: AssemblyAIStreamingOptions
AssemblyAI-specific streaming options (passed to WebSocket URL & configuration)
Includes end-of-turn detection tuning, VAD threshold, profanity filter, keyterms, speech model selection, and language detection.
See
https://www.assemblyai.com/docs/speech-to-text/streaming
Example
await adapter.transcribeStream({
  assemblyaiStreaming: {
    speechModel: 'universal-streaming-multilingual',
    languageDetection: true,
    endOfTurnConfidenceThreshold: 0.7,
    minEndOfTurnSilenceWhenConfident: 500,
    vadThreshold: 0.3,
    formatTurns: true,
    filterProfanity: true,
    keyterms: ['TypeScript', 'JavaScript', 'API']
  }
});
audioToLlm?
optional audioToLlm: AudioToLlmListConfigDTO
Audio-to-LLM configuration (Gladia-specific). Run custom LLM prompts on the transcription.
See
GladiaAudioToLlmConfig
Inherited from
TranscribeOptions.audioToLlm
bitDepth?
optional bitDepth: number
Bit depth for PCM audio
Common depths: 8, 16, 24, 32. 16-bit is standard for most applications.
channels?
optional channels: number
Number of audio channels
- 1: Mono (recommended for transcription)
- 2: Stereo
- 3-8: Multi-channel (provider-specific support)
codeSwitching?
optional codeSwitching: boolean
Enable code switching (multilingual audio detection). Supported by: Gladia.
Inherited from
TranscribeOptions.codeSwitching
codeSwitchingConfig?
optional codeSwitchingConfig: CodeSwitchingConfigDTO
Code switching configuration (Gladia-specific)
See
GladiaCodeSwitchingConfig
Inherited from
TranscribeOptions.codeSwitchingConfig
customVocabulary?
optional customVocabulary: string[]
Custom vocabulary to boost (provider-specific format)
Inherited from
TranscribeOptions.customVocabulary
deepgram?
optional deepgram: Partial<ListenV1MediaTranscribeParams>
Deepgram-specific options (passed directly to API)
See
https://developers.deepgram.com/reference/listen-file
Inherited from
TranscribeOptions.deepgram
deepgramStreaming?
optional deepgramStreaming: DeepgramStreamingOptions
Deepgram-specific streaming options (passed to WebSocket URL)
Includes filler_words, numerals, measurements, paragraphs, profanity_filter, topics, intents, custom_topic, custom_intent, keyterm, dictation, utt_split, and more.
See
https://developers.deepgram.com/docs/streaming
Example
await adapter.transcribeStream({
  deepgramStreaming: {
    fillerWords: true,
    profanityFilter: true,
    topics: true,
    intents: true,
    customTopic: ['sales', 'support'],
    customIntent: ['purchase', 'complaint'],
    numerals: true
  }
});
diarization?
optional diarization: boolean
Enable speaker diarization
Inherited from
TranscribeOptions.diarization
encoding?
optional encoding: AudioEncoding
Audio encoding format
Common formats:
- linear16: PCM 16-bit (universal, recommended)
- mulaw: μ-law telephony codec
- alaw: A-law telephony codec
- flac, opus, speex: Advanced codecs (Deepgram only)
See
AudioEncoding for full list of supported formats
endpointing?
optional endpointing: number
Utterance end silence threshold in milliseconds
entityDetection?
optional entityDetection: boolean
Enable entity detection
Inherited from
TranscribeOptions.entityDetection
gladia?
optional gladia: Partial<InitTranscriptionRequest>
Gladia-specific options (passed directly to API)
See
Inherited from
TranscribeOptions.gladia
gladiaStreaming?
optional gladiaStreaming: Partial<Omit<StreamingRequest, "encoding" | "channels" | "sample_rate" | "bit_depth">>
Gladia-specific streaming options (passed directly to API)
Includes pre_processing, realtime_processing, post_processing, messages_config, and callback configuration.
See
https://docs.gladia.io/api-reference/v2/live
Example
await adapter.transcribeStream({
  gladiaStreaming: {
    realtime_processing: {
      words_accurate_timestamps: true
    },
    messages_config: {
      receive_partial_transcripts: true
    }
  }
});
interimResults?
optional interimResults: boolean
Enable interim results (partial transcripts)
language?
optional language: TranscriptionLanguage
Language code with autocomplete from OpenAPI specs
Example
'en', 'en_us', 'fr', 'de', 'es'
See
TranscriptionLanguage for full list
Inherited from
TranscribeOptions.language
languageDetection?
optional languageDetection: boolean
Enable automatic language detection
Inherited from
TranscribeOptions.languageDetection
maxSilence?
optional maxSilence: number
Maximum duration without endpointing in seconds
model?
optional model: TranscriptionModel
Model to use for transcription (provider-specific)
Type-safe with autocomplete for all known models:
- Deepgram: 'nova-2', 'nova-3', 'base', 'enhanced', 'whisper-large', etc.
- Gladia: 'solaria-1' (default)
- AssemblyAI: Not applicable (uses Universal-2 automatically)
Example
// Use Nova-2 for better multilingual support
{ model: 'nova-2', language: 'fr' }
Overrides
TranscribeOptions.model
openai?
optional openai: Partial<Omit<CreateTranscriptionRequest, "model" | "file">>
OpenAI Whisper-specific options (passed directly to API)
See
https://platform.openai.com/docs/api-reference/audio/createTranscription
Inherited from
TranscribeOptions.openai
openaiStreaming?
optional openaiStreaming: OpenAIStreamingOptions
OpenAI Realtime API streaming options
Configure the OpenAI Realtime WebSocket connection for audio transcription. Uses the Realtime API, which supports real-time audio input transcription.
See
https://platform.openai.com/docs/guides/realtime
Example
await adapter.transcribeStream({
  openaiStreaming: {
    model: 'gpt-4o-realtime-preview',
    voice: 'alloy',
    turnDetection: {
      type: 'server_vad',
      threshold: 0.5,
      silenceDurationMs: 500
    }
  }
});
piiRedaction?
optional piiRedaction: boolean
Enable PII redaction
Inherited from
TranscribeOptions.piiRedaction
region?
optional region: StreamingSupportedRegions
Regional endpoint for streaming (Gladia only)
Gladia supports regional streaming endpoints for lower latency:
- us-west: US West Coast
- eu-west: EU West (Ireland)
Example
import { GladiaRegion } from 'voice-router-dev/constants'
await adapter.transcribeStream({
  region: GladiaRegion["us-west"]
})
See
https://docs.gladia.io/api-reference/v2/live
sampleRate?
optional sampleRate: number
Sample rate in Hz
Common rates: 8000, 16000, 32000, 44100, 48000. Most providers recommend 16000 Hz for the best quality/performance balance.
sentimentAnalysis?
optional sentimentAnalysis: boolean
Enable sentiment analysis
Inherited from
TranscribeOptions.sentimentAnalysis
sonioxStreaming?
optional sonioxStreaming: SonioxStreamingOptions
Soniox-specific streaming options
Configure the Soniox WebSocket connection for real-time transcription. Supports speaker diarization, language identification, translation, and custom context.
See
https://soniox.com/docs/stt/SDKs/web-sdk
Example
await adapter.transcribeStream({
sonioxStreaming: {
model: 'stt-rt-preview',
enableSpeakerDiarization: true,
enableEndpointDetection: true,
context: {
terms: ['TypeScript', 'React'],
text: 'Technical discussion'
},
translation: { type: 'one_way', target_language: 'es' }
}
});
speakersExpected?
optional speakersExpected: number
Expected number of speakers (for diarization)
Inherited from
TranscribeOptions.speakersExpected
summarization?
optional summarization: boolean
Enable summarization
Inherited from
TranscribeOptions.summarization
wordTimestamps?
optional wordTimestamps: boolean
Enable word-level timestamps
Inherited from
TranscribeOptions.wordTimestamps
StreamingSession
Represents an active streaming transcription session
Properties
close()
close: () => Promise<void>
Close the streaming session
Returns
Promise<void>
createdAt
createdAt: Date
Session creation timestamp
getStatus()
getStatus: () => "open" | "connecting" | "closing" | "closed"
Get current session status
Returns
"open" | "connecting" | "closing" | "closed"
id
id: string
Unique session ID
provider
provider: TranscriptionProvider
Provider handling this stream
sendAudio()
sendAudio: (chunk) => Promise<void>
Send an audio chunk to the stream
Parameters
| Parameter | Type |
|---|---|
chunk | AudioChunk |
Returns
Promise<void>
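The usual call sequence is sendAudio per chunk (flagging the final one), then close. A minimal in-memory stand-in for the session contract, to illustrate that sequence (the real session is returned by the adapter's streaming entry point):

```typescript
// In-memory stand-in for the documented session contract (illustration only).
type SessionStatus = "open" | "connecting" | "closing" | "closed";

interface FakeSession {
  id: string;
  received: number;
  sendAudio: (chunk: { data: Uint8Array; isLast?: boolean }) => Promise<void>;
  close: () => Promise<void>;
  getStatus: () => SessionStatus;
}

function fakeSession(): FakeSession {
  let status: SessionStatus = "open";
  const session: FakeSession = {
    id: "session-1",
    received: 0,
    async sendAudio(chunk) {
      // Count bytes the way a provider would buffer them.
      session.received += chunk.data.length;
      if (chunk.isLast) status = "closing";
    },
    async close() {
      status = "closed";
    },
    getStatus: () => status,
  };
  return session;
}
```

Against a real session the same sequence applies: send chunks, mark the last one, then await close().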
SummarizationEvent
Post-processing summarization event
Properties
summary
summary: string
Full summarization text
error?
optional error: string
Error if summarization failed
TranscribeOptions
Common transcription options across all providers
For provider-specific options, use the typed provider options:
- deepgram: Full Deepgram API options
- assemblyai: Full AssemblyAI API options
- gladia: Full Gladia API options
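A common pattern is to keep shared defaults and spread per-call overrides on top. A sketch over a local subset of the option fields (illustration only):

```typescript
// Local subset of the documented common options (illustration only).
interface TranscribeOptions {
  language?: string;
  diarization?: boolean;
  speakersExpected?: number;
  wordTimestamps?: boolean;
  webhookUrl?: string;
}

// Later spreads win, so per-call overrides replace the shared defaults.
const defaults: TranscribeOptions = { language: "en", wordTimestamps: true };

function withDefaults(overrides: TranscribeOptions): TranscribeOptions {
  return { ...defaults, ...overrides };
}
```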
Properties
assemblyai?
optional assemblyai: Partial<TranscriptOptionalParams>
AssemblyAI-specific options (passed directly to API)
See
https://www.assemblyai.com/docs/api-reference/transcripts/submit
audioToLlm?
optional audioToLlm: AudioToLlmListConfigDTO
Audio-to-LLM configuration (Gladia-specific). Run custom LLM prompts on the transcription.
See
GladiaAudioToLlmConfig
codeSwitching?
optional codeSwitching: boolean
Enable code switching (multilingual audio detection). Supported by: Gladia.
codeSwitchingConfig?
optional codeSwitchingConfig: CodeSwitchingConfigDTO
Code switching configuration (Gladia-specific)
See
GladiaCodeSwitchingConfig
customVocabulary?
optional customVocabulary: string[]
Custom vocabulary to boost (provider-specific format)
deepgram?
optional deepgram: Partial<ListenV1MediaTranscribeParams>
Deepgram-specific options (passed directly to API)
See
https://developers.deepgram.com/reference/listen-file
diarization?
optional diarization: boolean
Enable speaker diarization
entityDetection?
optional entityDetection: boolean
Enable entity detection
gladia?
optional gladia: Partial<InitTranscriptionRequest>
Gladia-specific options (passed directly to API)
See
language?
optional language: TranscriptionLanguage
Language code with autocomplete from OpenAPI specs
Example
'en', 'en_us', 'fr', 'de', 'es'
See
TranscriptionLanguage for full list
languageDetection?
optional languageDetection: boolean
Enable automatic language detection
model?
optional model: TranscriptionModel
Model to use for transcription (provider-specific)
Type-safe model selection derived from OpenAPI specs:
- Deepgram: 'nova-3', 'nova-2', 'enhanced', 'base', etc.
- AssemblyAI: 'best', 'slam-1', 'universal'
- Speechmatics: 'standard', 'enhanced' (operating point)
- Gladia: 'solaria-1' (streaming only)
See
TranscriptionModel for full list of available models
openai?
optional openai: Partial<Omit<CreateTranscriptionRequest, "model" | "file">>
OpenAI Whisper-specific options (passed directly to API)
See
https://platform.openai.com/docs/api-reference/audio/createTranscription
piiRedaction?
optional piiRedaction: boolean
Enable PII redaction
sentimentAnalysis?
optional sentimentAnalysis: boolean
Enable sentiment analysis
speakersExpected?
optional speakersExpected: number
Expected number of speakers (for diarization)
summarization?
optional summarization: boolean
Enable summarization
webhookUrl?
optional webhookUrl: string
Webhook URL for async results
wordTimestamps?
optional wordTimestamps: boolean
Enable word-level timestamps
TranscriptData
Transcript data structure
Contains the core transcript information returned by getTranscript and listTranscripts.
Example
const result = await router.getTranscript('abc123', 'assemblyai');
if (result.success && result.data) {
console.log(result.data.id); // string
console.log(result.data.text); // string
console.log(result.data.status); // TranscriptionStatus
console.log(result.data.metadata); // TranscriptMetadata
}
Properties
id
id: string
Unique transcript ID
status
status: TranscriptionStatus
Transcription status
text
text: string
Full transcribed text (empty for list items)
completedAt?
optional completedAt: string
Completion timestamp (shorthand for metadata.completedAt)
confidence?
optional confidence: number
Overall confidence score (0-1)
createdAt?
optional createdAt: string
Creation timestamp (shorthand for metadata.createdAt)
duration?
optional duration: number
Audio duration in seconds
language?
optional language: string
Detected or specified language code
metadata?
optional metadata: TranscriptMetadata
Transcript metadata
speakers?
optional speakers: Speaker[]
Speaker diarization results
summary?
optional summary: string
Summary of the content (if summarization enabled)
utterances?
optional utterances: Utterance[]
Utterances (speaker turns)
words?
optional words: Word[]
Word-level transcription with timestamps
TranscriptMetadata
Transcript metadata with typed common fields
Contains provider-agnostic metadata fields that are commonly available. Provider-specific fields can be accessed via the index signature.
Example
const { transcripts } = await router.listTranscripts('assemblyai', { limit: 20 });
transcripts.forEach(item => {
console.log(item.data?.metadata?.audioUrl); // string | undefined
console.log(item.data?.metadata?.createdAt); // string | undefined
console.log(item.data?.metadata?.audioDuration); // number | undefined
});
Indexable
[key: string]: unknown
Provider-specific fields
Properties
audioDuration?
optional audioDuration: number
Audio duration in seconds
audioFileAvailable?
optional audioFileAvailable: boolean
True if the provider stored the audio and it can be downloaded via adapter.getAudioFile(). Currently only Gladia supports this; other providers discard audio after processing.
Example
if (item.data?.metadata?.audioFileAvailable) {
  const audio = await gladiaAdapter.getAudioFile(item.data.id)
  // audio.data is a Blob
}
completedAt?
optional completedAt: string
Completion timestamp (ISO 8601)
createdAt?
optional createdAt: string
Creation timestamp (ISO 8601)
customMetadata?
optional customMetadata: Record<string, unknown>
Custom metadata (Gladia)
displayName?
optional displayName: string
Display name (Azure)
filesUrl?
optional filesUrl: string
Files URL (Azure)
kind?
optional kind: "batch" | "streaming" | "pre-recorded" | "live"
Transcript type
lastActionAt?
optional lastActionAt: string
Last action timestamp (Azure)
resourceUrl?
optional resourceUrl: string
Resource URL for the transcript
sourceAudioUrl?
optional sourceAudioUrl: string
Original audio URL/source you provided to the API (echoed back). This is NOT a provider-hosted URL - it's what you sent when creating the transcription.
TranslationEvent
Translation event data (for real-time translation)
Properties
targetLanguage
targetLanguage: string
Target language
translatedText
translatedText: string
Translated text
isFinal?
optional isFinal: boolean
Whether this is a final translation
original?
optional original: string
Original text
utteranceId?
optional utteranceId: string
Utterance ID this translation belongs to
UnifiedTranscriptResponse
Unified transcription response with provider-specific type safety
When a specific provider is known at compile time, both raw and extended
fields will be typed with that provider's actual types.
Examples
const result: UnifiedTranscriptResponse<'assemblyai'> = await adapter.transcribe(audio);
// result.raw is typed as AssemblyAITranscript
// result.extended is typed as AssemblyAIExtendedData
const chapters = result.extended?.chapters; // AssemblyAIChapter[] | undefined
const entities = result.extended?.entities; // AssemblyAIEntity[] | undefined

const result: UnifiedTranscriptResponse<'gladia'> = await gladiaAdapter.transcribe(audio);
const translation = result.extended?.translation; // GladiaTranslation | undefined
const llmResults = result.extended?.audioToLlm; // GladiaAudioToLlmResult | undefined

const result: UnifiedTranscriptResponse = await router.transcribe(audio);
// result.raw is typed as unknown (could be any provider)
// result.extended is typed as union of all extended types
Type Parameters
| Type Parameter | Default type | Description |
|---|---|---|
P extends TranscriptionProvider | TranscriptionProvider | The transcription provider (defaults to all providers) |
Properties
provider
provider: P
Provider that performed the transcription
success
success: boolean
Operation success status
data?
optional data: TranscriptData
Transcription data (only present on success)
error?
optional error: object
Error information (only present on failure)
code
code: string
Error code (provider-specific or normalized)
message
message: string
Human-readable error message
details?
optional details: unknown
Additional error details
statusCode?
optional statusCode: number
HTTP status code if applicable
extended?
optional extended: P extends keyof ProviderExtendedDataMap ? ProviderExtendedDataMap[P] : unknown
Extended provider-specific data (fully typed from OpenAPI specs)
Contains rich data beyond basic transcription:
- AssemblyAI: chapters, entities, sentiment, content safety, topics
- Gladia: translation, moderation, entities, audio-to-llm, chapters
- Deepgram: detailed metadata, request tracking, model info
Example
const result = await assemblyaiAdapter.transcribe(audio, { summarization: true });
result.extended?.chapters?.forEach(chapter => {
console.log(`${chapter.headline}: ${chapter.summary}`);
});
raw?
optional raw: P extends keyof ProviderRawResponseMap ? ProviderRawResponseMap[P] : unknown
Raw provider response (for advanced usage)
Type-safe based on the provider:
- gladia: PreRecordedResponse
- deepgram: ListenV1Response
- openai-whisper: CreateTranscription200One
- assemblyai: AssemblyAITranscript
- azure-stt: AzureTranscription
tracking?
optional tracking: object
Request tracking information for debugging
audioHash?
optional audioHash: string
Audio fingerprint (SHA256) if available
processingTimeMs?
optional processingTimeMs: number
Processing duration in milliseconds
requestId?
optional requestId: string
Provider's request/job ID
Utterance
Utterance (sentence or phrase by a single speaker)
Normalized from provider-specific types:
- Gladia: UtteranceDTO
- AssemblyAI: TranscriptUtterance
- Deepgram: ListenV1ResponseResultsUtterancesItem
Properties
end
end: number
End time in seconds
start
start: number
Start time in seconds
text
text: string
The transcribed text
channel?
optional channel: number
Audio channel number (for multi-channel/stereo recordings)
Channel numbering varies by provider:
- AssemblyAI: 1=left, 2=right, sequential for additional channels
- Deepgram: 0-indexed channel number
- Gladia: 0-indexed channel number
confidence?
optional confidence: number
Confidence score (0-1)
id?
optional id: string
Unique utterance identifier (provider-assigned)
Available from: Deepgram. Useful for linking utterances to other data (entities, sentiment, etc.)
language?
optional language: string
Detected language for this utterance (BCP-47 code)
Available from: Gladia (with code-switching enabled). Essential for multilingual transcription where language changes mid-conversation.
Example
'en', 'es', 'fr', 'de'
See
TranscriptionLanguage for full list of supported codes
speaker?
optional speaker: string
Speaker ID
words?
optional words: Word[]
Words in this utterance
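A short sketch of rendering a speaker-labelled transcript from normalized utterances, using a local `Utterance` interface mirroring the properties above (not the SDK's export):

```typescript
// Local mirror of the normalized Utterance shape (hypothetical).
interface Utterance {
  start: number;      // start time in seconds
  end: number;        // end time in seconds
  text: string;       // transcribed text
  speaker?: string;   // speaker ID, when diarization is enabled
  confidence?: number;
}

// Render a simple speaker-labelled transcript, one line per utterance.
function renderTranscript(utterances: Utterance[]): string {
  return utterances
    .map(u => `[${u.start.toFixed(1)}s] ${u.speaker ?? "?"}: ${u.text}`)
    .join("\n");
}
```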
Word
Word-level transcription with timing
Normalized from provider-specific types:
- Gladia:
WordDTO - AssemblyAI:
TranscriptWord - Deepgram:
ListenV1ResponseResultsChannelsItemAlternativesItemWordsItem
Properties
end
end:
number
End time in seconds
start
start:
number
Start time in seconds
word
word:
string
The transcribed word
channel?
optionalchannel:number
Audio channel number (for multi-channel/stereo recordings)
Channel numbering varies by provider:
- AssemblyAI: 1=left, 2=right, sequential for additional channels
- Deepgram: 0-indexed channel number
- Gladia: 0-indexed channel number
confidence?
optionalconfidence:number
Confidence score (0-1)
speaker?
optionalspeaker:string
Speaker ID if diarization is enabled
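A sketch of one common use of word-level confidence: flagging low-confidence words for manual review. The `Word` interface is a local mirror of the properties above, not the SDK's export:

```typescript
// Local mirror of the normalized Word shape (hypothetical).
interface Word {
  start: number;       // start time in seconds
  end: number;         // end time in seconds
  word: string;        // transcribed word
  confidence?: number; // 0-1 confidence score
  speaker?: string;
}

// Collect words whose confidence falls below a threshold, for review.
function lowConfidenceWords(words: Word[], threshold = 0.5): string[] {
  return words
    .filter(w => w.confidence !== undefined && w.confidence < threshold)
    .map(w => w.word);
}
```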
Type Aliases
AudioInput
AudioInput =
AudioInputUrl|AudioInputFile|AudioInputStream
Union of all audio input types
BatchOnlyProvider
BatchOnlyProvider =
BatchOnlyProviderType
Providers that only support batch/async transcription
Automatically derived from providers where streaming is false or undefined. Note: Speechmatics has a WebSocket API but streaming is not yet implemented in this SDK.
ProviderExtendedDataMap
ProviderExtendedDataMap =
object
Map of provider names to their extended data types
Properties
assemblyai
assemblyai:
AssemblyAIExtendedData
azure-stt
azure-stt:
Record<string,never>
deepgram
deepgram:
DeepgramExtendedData
gladia
gladia:
GladiaExtendedData
openai-whisper
openai-whisper:
Record<string,never>
soniox
soniox:
Record<string,never>
speechmatics
speechmatics:
Record<string,never>
ProviderRawResponseMap
ProviderRawResponseMap =
object
Map of provider names to their raw response types. Enables type-safe access to provider-specific raw responses.
Properties
assemblyai
assemblyai:
AssemblyAITranscript
azure-stt
azure-stt:
AzureTranscription
deepgram
deepgram:
ListenV1Response
gladia
gladia:
PreRecordedResponse
openai-whisper
openai-whisper:
CreateTranscription200One
soniox
soniox:
unknown
speechmatics
speechmatics:
unknown
SessionStatus
SessionStatus =
"connecting"|"open"|"closing"|"closed"
WebSocket session status for streaming transcription
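A minimal sketch of gating audio sends on session status, using the documented union (the `canSendAudio` helper is hypothetical, not an SDK export):

```typescript
type SessionStatus = "connecting" | "open" | "closing" | "closed";

// Only an "open" session can accept audio; callers should buffer or
// reject chunks in any other state.
function canSendAudio(status: SessionStatus): boolean {
  return status === "open";
}
```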
SpeechmaticsOperatingPoint
SpeechmaticsOperatingPoint =
"standard"|"enhanced"
Speechmatics operating point (model) type. Manually defined because the Speechmatics OpenAPI spec doesn't export this cleanly.
StreamEventType
StreamEventType =
"open"|"transcript"|"utterance"|"metadata"|"error"|"close"|"speech_start"|"speech_end"|"translation"|"sentiment"|"entity"|"summarization"|"chapterization"|"audio_ack"|"lifecycle"
Streaming transcription event types
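A sketch of grouping the event types into coarse buckets, e.g. for metrics or log levels. The buckets and the `eventBucket` helper are illustrative assumptions, not part of the SDK:

```typescript
type StreamEventType =
  | "open" | "transcript" | "utterance" | "metadata" | "error" | "close"
  | "speech_start" | "speech_end" | "translation" | "sentiment" | "entity"
  | "summarization" | "chapterization" | "audio_ack" | "lifecycle";

// Classify events into coarse buckets (hypothetical grouping).
function eventBucket(type: StreamEventType): "session" | "content" | "analysis" {
  switch (type) {
    case "open":
    case "close":
    case "error":
    case "audio_ack":
    case "lifecycle":
      return "session";
    case "transcript":
    case "utterance":
    case "metadata":
    case "speech_start":
    case "speech_end":
      return "content";
    default:
      // translation, sentiment, entity, summarization, chapterization
      return "analysis";
  }
}
```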
StreamingProvider
StreamingProvider =
StreamingProviderType
Providers that support real-time streaming transcription
This type is automatically derived from ProviderCapabilitiesMap.streaming in provider-metadata.ts
No manual sync needed - if you set streaming: true for a provider, it's included here.
TranscriptionLanguage
TranscriptionLanguage =
AssemblyAILanguageCode|GladiaLanguageCode|DeepgramLanguageCode|SonioxLanguageCode|SpeechmaticsLanguageCode|AzureLocaleCode
Unified transcription language type with autocomplete for all providers
Strict union type - only accepts valid language codes from each provider's auto-generated types. This ensures compile-time validation of language codes.
Provider language sources:
- AssemblyAI: OpenAPI spec enum (102 languages)
- Gladia: OpenAPI spec enum (99 languages)
- Deepgram: Auto-generated from /v1/models API (161 BCP-47 codes)
- Soniox: Auto-generated from OpenAPI spec (60 languages)
- Speechmatics: Auto-generated from Feature Discovery API (62 languages)
- Azure: Auto-generated from Microsoft docs (154 locales)
Use provider const objects for autocomplete:
Example
import { DeepgramLanguage, SonioxLanguage } from 'voice-router-dev/constants'
{ language: DeepgramLanguage["en-US"] }
{ language: SonioxLanguage.en }
TranscriptionModel
TranscriptionModel =
DeepgramModelType|StreamingSupportedModels|AssemblyAISpeechModel|SonioxModelCode|SpeechmaticsOperatingPoint
Unified transcription model type with autocomplete for all providers
Strict union type - only accepts valid models from each provider:
- Deepgram: nova-3, nova-2, enhanced, base, etc.
- AssemblyAI: best, slam-1, universal
- Gladia: solaria-1
- Soniox: stt-rt-v3, stt-rt-preview, stt-async-v3, etc.
- Speechmatics: standard, enhanced
Use provider const objects for autocomplete:
Example
import { DeepgramModel, SonioxModel } from 'voice-router-dev'
{ model: DeepgramModel["nova-3"] }
{ model: SonioxModel.stt_rt_v3 }
TranscriptionProvider
TranscriptionProvider =
"gladia"|"assemblyai"|"deepgram"|"openai-whisper"|"azure-stt"|"speechmatics"|"soniox"
Supported transcription provider identifiers
TranscriptionStatus
TranscriptionStatus =
"queued"|"processing"|"completed"|"error"
Transcription status
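Batch jobs end in exactly one of two terminal states, which is the condition a polling loop should stop on. A minimal sketch (the `isTerminal` helper is hypothetical, not an SDK export):

```typescript
type TranscriptionStatus = "queued" | "processing" | "completed" | "error";

// A batch transcription is done once it reaches a terminal state;
// a polling loop keeps waiting on "queued" or "processing".
function isTerminal(status: TranscriptionStatus): boolean {
  return status === "completed" || status === "error";
}
```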