Voice Router SDK / router/types
Interfaces
AssemblyAIExtendedData
Extended data from AssemblyAI transcription. Includes chapters, entities, sentiment, content safety, and more.
Properties
chapters?
optional chapters: Chapter[]
Auto-generated chapters with summaries
contentSafety?
optional contentSafety: ContentSafetyLabelsResult
Content safety/moderation labels
entities?
optional entities: Entity[]
Detected named entities (people, organizations, locations)
highlights?
optional highlights: AutoHighlightsResult
Key phrases and highlights
languageConfidence?
optional languageConfidence: number
Language detection confidence (0-1)
sentimentResults?
optional sentimentResults: SentimentAnalysisResult[]
Per-utterance sentiment analysis results
throttled?
optional throttled: boolean
Whether the request was throttled
topics?
optional topics: TopicDetectionModelResult
IAB topic categories
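A minimal sketch of reading a few of these fields after a batch job, assuming an assemblyaiAdapter instance as used in the examples further down this page; which fields are populated depends on the features enabled at submission:
const result = await assemblyaiAdapter.transcribe(audio, { summarization: true, entityDetection: true });
const ext = result.extended; // AssemblyAIExtendedData | undefined
ext?.chapters?.forEach(ch => console.log(`${ch.headline}: ${ch.summary}`));
ext?.entities?.forEach(e => console.log(e.text));
if (ext?.languageConfidence !== undefined && ext.languageConfidence < 0.5) {
  console.warn('Low language detection confidence');
}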
AudioAckEvent
Audio chunk acknowledgment event
Properties
byteRange?
optional byteRange: [number, number]
Byte range of the acknowledged audio chunk [start, end]
timeRange?
optional timeRange: [number, number]
Time range in seconds of the acknowledged audio chunk [start, end]
timestamp?
optional timestamp: string
Acknowledgment timestamp
AudioChunk
Audio chunk for streaming transcription
Properties
data
data: Buffer<ArrayBufferLike> | Uint8Array<ArrayBufferLike>
Audio data as Buffer or Uint8Array
isLast?
optional isLast: boolean
Whether this is the last chunk
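A sketch of slicing a Buffer into AudioChunk values for a streaming session (the 3200-byte chunk size is an arbitrary illustration; pick one that matches your sample rate and latency needs):
import type { AudioChunk } from 'voice-router-dev';

function* toChunks(audio: Buffer, chunkSize = 3200): Generator<AudioChunk> {
  for (let offset = 0; offset < audio.length; offset += chunkSize) {
    yield {
      data: audio.subarray(offset, offset + chunkSize),
      isLast: offset + chunkSize >= audio.length, // mark the final chunk
    };
  }
}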
ChapterizationEvent
Post-processing chapterization event
Properties
chapters
chapters: object[]
Generated chapters. Each chapter object has the following fields:
end
end: number
End time in seconds
headline
headline: string
Chapter title/headline
start
start: number
Start time in seconds
summary
summary: string
Chapter summary
error?
optional error: string
Error if chapterization failed
DeepgramExtendedData
Extended data from Deepgram transcription. Includes detailed metadata, model info, and feature-specific data.
Properties
metadata?
optional metadata: ListenV1ResponseMetadata
Full response metadata
modelInfo?
optional modelInfo: Record<string, unknown>
Model versions used
requestId?
optional requestId: string
Request ID for debugging/tracking
sha256?
optional sha256: string
SHA256 hash of the audio
tags?
optional tags: string[]
Tags echoed back from request
EntityEvent
Named entity recognition result
Properties
text
text: string
Entity text
type
type: string
Entity type (PERSON, ORGANIZATION, LOCATION, etc.)
end?
optional end: number
End position
start?
optional start: number
Start position
utteranceId?
optional utteranceId: string
Utterance ID this entity belongs to
GladiaExtendedData
Extended data from Gladia transcription. Includes translation, moderation, entities, LLM outputs, and more.
Properties
audioToLlm?
optional audioToLlm: AudioToLlmListDTO
Audio-to-LLM custom prompt results
chapters?
optional chapters: ChapterizationDTO
Auto-generated chapters
customMetadata?
optional customMetadata: Record<string, unknown>
Custom metadata echoed back
entities?
optional entities: NamedEntityRecognitionDTO
Named entity recognition results
moderation?
optional moderation: ModerationDTO
Content moderation results
sentiment?
optional sentiment: SentimentAnalysisDTO
Sentiment analysis results
speakerReidentification?
optional speakerReidentification: SpeakerReidentificationDTO
AI speaker reidentification results
structuredData?
optional structuredData: StructuredDataExtractionDTO
Structured data extraction results
translation?
optional translation: TranslationDTO
Translation results (if translation enabled)
LifecycleEvent
Lifecycle event (session start, recording end, etc.)
Properties
eventType
eventType: "start_session" | "start_recording" | "stop_recording" | "end_recording" | "end_session"
Lifecycle event type
sessionId?
optional sessionId: string
Session ID
timestamp?
optional timestamp: string
Event timestamp
ListTranscriptsOptions
Options for listing transcripts with date/time filtering
Providers support different filtering capabilities:
- AssemblyAI: status, created_on, before_id, after_id, throttled_only
- Gladia: status, date, before_date, after_date, custom_metadata
- Azure: status, skip, top, filter (OData)
- Deepgram: start, end, status, page, request_id, endpoint (requires projectId)
Examples
await adapter.listTranscripts({
  date: '2026-01-07', // Exact date (ISO format)
  status: 'completed',
  limit: 50
})

await adapter.listTranscripts({
  afterDate: '2026-01-01',
  beforeDate: '2026-01-31',
  limit: 100
})
Properties
afterDate?
optional afterDate: string
Filter for transcripts created after this date (ISO format)
assemblyai?
optional assemblyai: Partial<ListTranscriptsParams>
AssemblyAI-specific list options
beforeDate?
optional beforeDate: string
Filter for transcripts created before this date (ISO format)
date?
optional date: string
Filter by exact date (ISO format: YYYY-MM-DD)
deepgram?
optional deepgram: Partial<ManageV1ProjectsRequestsListParams>
Deepgram-specific list options (request history)
gladia?
optional gladia: Partial<TranscriptionControllerListV2Params>
Gladia-specific list options
limit?
optional limit: number
Maximum number of transcripts to retrieve
offset?
optional offset: number
Pagination offset (skip N results)
status?
optional status: string
Filter by transcript status
ListTranscriptsResponse
Response from listTranscripts
Example
import type { ListTranscriptsResponse } from 'voice-router-dev';
const response: ListTranscriptsResponse = await router.listTranscripts('assemblyai', {
status: 'completed',
limit: 50
});
response.transcripts.forEach(item => {
console.log(item.data?.id, item.data?.status);
});
if (response.hasMore) {
// Fetch next page
}
Properties
transcripts
transcripts: UnifiedTranscriptResponse<TranscriptionProvider>[]
List of transcripts
hasMore?
optional hasMore: boolean
Whether more results are available
total?
optional total: number
Total count (if available from provider)
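A sketch of draining all pages with limit/offset until hasMore goes false (offset-based paging is an assumption here; some providers paginate with cursors such as before_id or after_date, see ListTranscriptsOptions above):
const pageSize = 50;
let offset = 0;
let page: ListTranscriptsResponse;
do {
  page = await router.listTranscripts('assemblyai', { limit: pageSize, offset });
  page.transcripts.forEach(item => console.log(item.data?.id, item.data?.status));
  offset += pageSize;
} while (page.hasMore);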
ProviderCapabilities
Provider capability flags
Each boolean indicates whether the provider supports a specific feature. Use ProviderCapabilitiesMap from provider-metadata for runtime access.
Properties
customVocabulary
customVocabulary: boolean
Custom vocabulary/keyword boosting
deleteTranscript
deleteTranscript: boolean
Delete transcriptions
diarization
diarization: boolean
Speaker diarization (identifying different speakers)
entityDetection
entityDetection: boolean
Entity detection
languageDetection
languageDetection: boolean
Automatic language detection
listTranscripts
listTranscripts: boolean
List/fetch previous transcriptions
piiRedaction
piiRedaction: boolean
PII redaction
sentimentAnalysis
sentimentAnalysis: boolean
Sentiment analysis
streaming
streaming: boolean
Real-time streaming transcription support
summarization
summarization: boolean
Audio summarization
wordTimestamps
wordTimestamps: boolean
Word-level timestamps
getAudioFile?
optional getAudioFile: boolean
Download original audio file
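A sketch of checking these flags before choosing a provider at runtime. The import location is an assumption based on the provider-metadata module mentioned above; adjust to wherever ProviderCapabilitiesMap is exported in your build:
import { ProviderCapabilitiesMap } from 'voice-router-dev'; // assumed export location
import type { TranscriptionProvider } from 'voice-router-dev';

function pickStreamingProvider(candidates: TranscriptionProvider[]): TranscriptionProvider | undefined {
  return candidates.find(p => ProviderCapabilitiesMap[p].streaming);
}

const provider = pickStreamingProvider(['deepgram', 'assemblyai', 'openai-whisper']);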
SentimentEvent
Sentiment analysis result (for real-time sentiment)
Properties
sentiment
sentiment: string
Sentiment label (positive, negative, neutral)
confidence?
optional confidence: number
Confidence score 0-1
utteranceId?
optional utteranceId: string
Utterance ID this sentiment belongs to
Speaker
Speaker information from diarization
Properties
id
id: string
Speaker identifier (e.g., "A", "B", "speaker_0")
confidence?
optional confidence: number
Confidence score for speaker identification (0-1)
label?
optional label: string
Speaker label if known
SpeechEvent
Speech event data (for speech_start/speech_end events)
Properties
timestamp
timestamp: number
Timestamp in seconds
type
type: "speech_start" | "speech_end"
Event type: speech_start or speech_end
channel?
optional channel: number
Channel number
sessionId?
optional sessionId: string
Session ID
StreamEvent
Streaming transcription event
Properties
type
type: StreamEventType
channel?
optional channel: number
Channel number for multi-channel audio
confidence?
optional confidence: number
Confidence score for this event
data?
optional data: unknown
Additional event data
error?
optional error: object
Error information (for type: "error"), with the following fields:
code
code: string
message
message: string
details?
optional details: unknown
isFinal?
optional isFinal: boolean
Whether this is a final transcript (vs interim)
language?
optional language: string
Language of the transcript/utterance
speaker?
optional speaker: string
Speaker ID if diarization is enabled
text?
optional text: string
Partial transcript text (for type: "transcript")
utterance?
optional utterance: Utterance
Utterance data (for type: "utterance")
words?
optional words: Word[]
Words in this event
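A sketch of a handler over this shape that separates errors and interim from final transcripts (a pure function over the fields documented above):
function handleEvent(event: StreamEvent): void {
  if (event.type === 'error') {
    console.error(event.error?.code, event.error?.message);
  } else if (event.type === 'transcript' && event.text) {
    const speaker = event.speaker ? `[${event.speaker}] ` : '';
    if (event.isFinal) console.log(`${speaker}${event.text}`);
    else process.stdout.write(`\r${event.text}`); // overwrite interim text in place
  }
}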
StreamingCallbacks
Callback functions for streaming events
Properties
onAudioAck()?
optional onAudioAck: (event) => void
Called for audio chunk acknowledgments (Gladia: requires receive_acknowledgments)
Parameters
| Parameter | Type |
|---|---|
| event | AudioAckEvent |
Returns
void
onChapterization()?
optional onChapterization: (event) => void
Called when post-processing chapterization completes (Gladia: requires chapterization enabled)
Parameters
| Parameter | Type |
|---|---|
| event | ChapterizationEvent |
Returns
void
onClose()?
optional onClose: (code?, reason?) => void
Called when the stream is closed
Parameters
| Parameter | Type |
|---|---|
| code? | number |
| reason? | string |
Returns
void
onEntity()?
optional onEntity: (event) => void
Called for named entity recognition (Gladia: requires named_entity_recognition enabled)
Parameters
| Parameter | Type |
|---|---|
| event | EntityEvent |
Returns
void
onError()?
optional onError: (error) => void
Called when an error occurs
Parameters
| Parameter | Type |
|---|---|
| error | { code: string; message: string; details?: unknown; } |
| error.code | string |
| error.message | string |
| error.details? | unknown |
Returns
void
onLifecycle()?
optional onLifecycle: (event) => void
Called for session lifecycle events (Gladia: requires receive_lifecycle_events)
Parameters
| Parameter | Type |
|---|---|
| event | LifecycleEvent |
Returns
void
onMetadata()?
optional onMetadata: (metadata) => void
Called when metadata is received
Parameters
| Parameter | Type |
|---|---|
| metadata | Record<string, unknown> |
Returns
void
onOpen()?
optional onOpen: () => void
Called when connection is established
Returns
void
onSentiment()?
optional onSentiment: (event) => void
Called for real-time sentiment analysis (Gladia: requires sentiment_analysis enabled)
Parameters
| Parameter | Type |
|---|---|
| event | SentimentEvent |
Returns
void
onSpeechEnd()?
optional onSpeechEnd: (event) => void
Called when speech ends (Gladia: requires receive_speech_events)
Parameters
| Parameter | Type |
|---|---|
| event | SpeechEvent |
Returns
void
onSpeechStart()?
optional onSpeechStart: (event) => void
Called when speech starts (Gladia: requires receive_speech_events)
Parameters
| Parameter | Type |
|---|---|
| event | SpeechEvent |
Returns
void
onSummarization()?
optional onSummarization: (event) => void
Called when post-processing summarization completes (Gladia: requires summarization enabled)
Parameters
| Parameter | Type |
|---|---|
| event | SummarizationEvent |
Returns
void
onTranscript()?
optional onTranscript: (event) => void
Called when a transcript (interim or final) is received
Parameters
| Parameter | Type |
|---|---|
| event | StreamEvent |
Returns
void
onTranslation()?
optional onTranslation: (event) => void
Called for real-time translation (Gladia: requires translation enabled)
Parameters
| Parameter | Type |
|---|---|
| event | TranslationEvent |
Returns
void
onUtterance()?
optional onUtterance: (utterance) => void
Called when a complete utterance is detected
Parameters
| Parameter | Type |
|---|---|
| utterance | Utterance |
Returns
void
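Putting a few of these together: a sketch of a callback bundle you might hand to transcribeStream (only the generic callbacks are shown; the Gladia-specific ones fire only when their features are enabled):
const callbacks: StreamingCallbacks = {
  onOpen: () => console.log('connected'),
  onTranscript: (event) => {
    if (event.isFinal) console.log('final:', event.text);
  },
  onUtterance: (utterance) => console.log(`${utterance.speaker ?? '?'}: ${utterance.text}`),
  onError: (error) => console.error(error.code, error.message),
  onClose: (code, reason) => console.log('closed', code, reason),
};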
StreamingOptions
Options for streaming transcription
Extends
Omit<TranscribeOptions, "webhookUrl">
Properties
assemblyai?
optional assemblyai: Partial<TranscriptOptionalParams>
AssemblyAI-specific options (passed directly to API)
See
https://www.assemblyai.com/docs/api-reference/transcripts/submit
Inherited from
TranscribeOptions.assemblyai
assemblyaiStreaming?
optional assemblyaiStreaming: AssemblyAIStreamingOptions
AssemblyAI-specific streaming options (passed to WebSocket URL & configuration)
Includes end-of-turn detection tuning, VAD threshold, profanity filter, keyterms, speech model selection, and language detection.
See
https://www.assemblyai.com/docs/speech-to-text/streaming
Example
await adapter.transcribeStream({
  assemblyaiStreaming: {
    speechModel: 'universal-streaming-multilingual',
    languageDetection: true,
    endOfTurnConfidenceThreshold: 0.7,
    minEndOfTurnSilenceWhenConfident: 500,
    vadThreshold: 0.3,
    formatTurns: true,
    filterProfanity: true,
    keyterms: ['TypeScript', 'JavaScript', 'API']
  }
});
audioToLlm?
optional audioToLlm: AudioToLlmListConfigDTO
Audio-to-LLM configuration (Gladia-specific). Run custom LLM prompts on the transcription.
See
GladiaAudioToLlmConfig
Inherited from
TranscribeOptions.audioToLlm
bitDepth?
optional bitDepth: number
Bit depth for PCM audio
Common depths: 8, 16, 24, 32. 16-bit is standard for most applications.
channels?
optional channels: number
Number of audio channels
- 1: Mono (recommended for transcription)
- 2: Stereo
- 3-8: Multi-channel (provider-specific support)
codeSwitching?
optional codeSwitching: boolean
Enable code switching (multilingual audio detection). Supported by: Gladia
Inherited from
TranscribeOptions.codeSwitching
codeSwitchingConfig?
optional codeSwitchingConfig: CodeSwitchingConfigDTO
Code switching configuration (Gladia-specific)
See
GladiaCodeSwitchingConfig
Inherited from
TranscribeOptions.codeSwitchingConfig
customVocabulary?
optional customVocabulary: string[]
Custom vocabulary to boost (provider-specific format)
Inherited from
TranscribeOptions.customVocabulary
deepgram?
optional deepgram: Partial<ListenV1MediaTranscribeParams>
Deepgram-specific options (passed directly to API)
See
https://developers.deepgram.com/reference/listen-file
Inherited from
TranscribeOptions.deepgram
deepgramStreaming?
optional deepgramStreaming: DeepgramStreamingOptions
Deepgram-specific streaming options (passed to WebSocket URL)
Includes filler_words, numerals, measurements, paragraphs, profanity_filter, topics, intents, custom_topic, custom_intent, keyterm, dictation, utt_split, and more.
See
https://developers.deepgram.com/docs/streaming
Example
await adapter.transcribeStream({
  deepgramStreaming: {
    fillerWords: true,
    profanityFilter: true,
    topics: true,
    intents: true,
    customTopic: ['sales', 'support'],
    customIntent: ['purchase', 'complaint'],
    numerals: true
  }
});
diarization?
optional diarization: boolean
Enable speaker diarization
Inherited from
TranscribeOptions.diarization
encoding?
optional encoding: AudioEncoding
Audio encoding format
Common formats:
- linear16: PCM 16-bit (universal, recommended)
- mulaw: μ-law telephony codec
- alaw: A-law telephony codec
- flac, opus, speex: Advanced codecs (Deepgram only)
See
AudioEncoding for full list of supported formats
endpointing?
optional endpointing: number
Utterance end silence threshold in milliseconds
entityDetection?
optional entityDetection: boolean
Enable entity detection
Inherited from
TranscribeOptions.entityDetection
gladia?
optional gladia: Partial<InitTranscriptionRequest>
Gladia-specific options (passed directly to API)
Inherited from
TranscribeOptions.gladia
gladiaStreaming?
optional gladiaStreaming: Partial<Omit<StreamingRequest, "encoding" | "channels" | "sample_rate" | "bit_depth">>
Gladia-specific streaming options (passed directly to API)
Includes pre_processing, realtime_processing, post_processing, messages_config, and callback configuration.
See
https://docs.gladia.io/api-reference/v2/live
Example
await adapter.transcribeStream({
  gladiaStreaming: {
    realtime_processing: {
      words_accurate_timestamps: true
    },
    messages_config: {
      receive_partial_transcripts: true
    }
  }
});
interimResults?
optional interimResults: boolean
Enable interim results (partial transcripts)
language?
optional language: string
Language code with autocomplete from OpenAPI specs
Example
'en', 'en_us', 'fr', 'de', 'es'
See
TranscriptionLanguage for full list
Inherited from
TranscribeOptions.language
languageDetection?
optional languageDetection: boolean
Enable automatic language detection
Inherited from
TranscribeOptions.languageDetection
maxSilence?
optional maxSilence: number
Maximum duration without endpointing in seconds
model?
optional model: TranscriptionModel
Model to use for transcription (provider-specific)
Type-safe with autocomplete for all known models:
- Deepgram: 'nova-2', 'nova-3', 'base', 'enhanced', 'whisper-large', etc.
- Gladia: 'solaria-1' (default)
- AssemblyAI: Not applicable (uses Universal-2 automatically)
Example
// Use Nova-2 for better multilingual support
{ model: 'nova-2', language: 'fr' }
Overrides
TranscribeOptions.model
openai?
optional openai: Partial<Omit<CreateTranscriptionRequest, "model" | "file">>
OpenAI Whisper-specific options (passed directly to API)
See
https://platform.openai.com/docs/api-reference/audio/createTranscription
Inherited from
TranscribeOptions.openai
openaiStreaming?
optional openaiStreaming: OpenAIStreamingOptions
OpenAI Realtime API streaming options
Configure the OpenAI Realtime WebSocket connection for audio transcription. Uses the Realtime API, which supports real-time audio input transcription.
See
https://platform.openai.com/docs/guides/realtime
Example
await adapter.transcribeStream({
  openaiStreaming: {
    model: 'gpt-4o-realtime-preview',
    voice: 'alloy',
    turnDetection: {
      type: 'server_vad',
      threshold: 0.5,
      silenceDurationMs: 500
    }
  }
});
piiRedaction?
optional piiRedaction: boolean
Enable PII redaction
Inherited from
TranscribeOptions.piiRedaction
region?
optional region: StreamingSupportedRegions
Regional endpoint for streaming (Gladia only)
Gladia supports regional streaming endpoints for lower latency:
- us-west: US West Coast
- eu-west: EU West (Ireland)
Example
import { GladiaRegion } from 'voice-router-dev/constants'
await adapter.transcribeStream({
  region: GladiaRegion["us-west"]
})
See
https://docs.gladia.io/api-reference/v2/live
sampleRate?
optional sampleRate: number
Sample rate in Hz
Common rates: 8000, 16000, 32000, 44100, 48000. Most providers recommend 16000 Hz for optimal quality/performance.
sentimentAnalysis?
optional sentimentAnalysis: boolean
Enable sentiment analysis
Inherited from
TranscribeOptions.sentimentAnalysis
sonioxStreaming?
optional sonioxStreaming: SonioxStreamingOptions
Soniox-specific streaming options
Configure the Soniox WebSocket connection for real-time transcription. Supports speaker diarization, language identification, translation, and custom context.
See
https://soniox.com/docs/stt/SDKs/web-sdk
Example
await adapter.transcribeStream({
  sonioxStreaming: {
    model: 'stt-rt-preview',
    enableSpeakerDiarization: true,
    enableEndpointDetection: true,
    context: {
      terms: ['TypeScript', 'React'],
      text: 'Technical discussion'
    },
    translation: { type: 'one_way', target_language: 'es' }
  }
});
speakersExpected?
optional speakersExpected: number
Expected number of speakers (for diarization)
Inherited from
TranscribeOptions.speakersExpected
summarization?
optional summarization: boolean
Enable summarization
Inherited from
TranscribeOptions.summarization
wordTimestamps?
optional wordTimestamps: boolean
Enable word-level timestamps
Inherited from
TranscribeOptions.wordTimestamps
StreamingSession
Represents an active streaming transcription session
Properties
close()
close: () => Promise<void>
Close the streaming session
Returns
Promise<void>
createdAt
createdAt: Date
Session creation timestamp
getStatus()
getStatus: () => "open" | "connecting" | "closing" | "closed"
Get current session status
Returns
"open" | "connecting" | "closing" | "closed"
id
id: string
Unique session ID
provider
provider: TranscriptionProvider
Provider handling this stream
sendAudio()
sendAudio: (chunk) => Promise<void>
Send an audio chunk to the stream
Parameters
| Parameter | Type |
|---|---|
| chunk | AudioChunk |
Returns
Promise<void>
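A sketch of a complete session lifecycle. That transcribeStream takes (options, callbacks) and resolves to a StreamingSession is an assumption based on the examples elsewhere on this page; check your adapter's signature. It reuses the callbacks object sketched under StreamingCallbacks and the toChunks helper sketched under AudioChunk, with audioBuffer assumed to hold PCM audio in memory:
const session: StreamingSession = await adapter.transcribeStream(
  { encoding: 'linear16', sampleRate: 16000, interimResults: true },
  callbacks
);

for (const chunk of toChunks(audioBuffer)) {
  await session.sendAudio(chunk);
}

console.log(session.provider, session.getStatus()); // e.g. "deepgram", "open"
await session.close();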
SummarizationEvent
Post-processing summarization event
Properties
summary
summary: string
Full summarization text
error?
optional error: string
Error if summarization failed
TranscribeOptions
Common transcription options across all providers
For provider-specific options, use the typed provider options:
- deepgram: Full Deepgram API options
- assemblyai: Full AssemblyAI API options
- gladia: Full Gladia API options
Properties
assemblyai?
optional assemblyai: Partial<TranscriptOptionalParams>
AssemblyAI-specific options (passed directly to API)
See
https://www.assemblyai.com/docs/api-reference/transcripts/submit
audioToLlm?
optional audioToLlm: AudioToLlmListConfigDTO
Audio-to-LLM configuration (Gladia-specific). Run custom LLM prompts on the transcription.
See
GladiaAudioToLlmConfig
codeSwitching?
optional codeSwitching: boolean
Enable code switching (multilingual audio detection). Supported by: Gladia
codeSwitchingConfig?
optional codeSwitchingConfig: CodeSwitchingConfigDTO
Code switching configuration (Gladia-specific)
See
GladiaCodeSwitchingConfig
customVocabulary?
optional customVocabulary: string[]
Custom vocabulary to boost (provider-specific format)
deepgram?
optional deepgram: Partial<ListenV1MediaTranscribeParams>
Deepgram-specific options (passed directly to API)
See
https://developers.deepgram.com/reference/listen-file
diarization?
optional diarization: boolean
Enable speaker diarization
entityDetection?
optional entityDetection: boolean
Enable entity detection
gladia?
optional gladia: Partial<InitTranscriptionRequest>
Gladia-specific options (passed directly to API)
language?
optional language: string
Language code with autocomplete from OpenAPI specs
Example
'en', 'en_us', 'fr', 'de', 'es'
See
TranscriptionLanguage for full list
languageDetection?
optional languageDetection: boolean
Enable automatic language detection
model?
optional model: TranscriptionModel
Model to use for transcription (provider-specific)
Type-safe model selection derived from OpenAPI specs:
- Deepgram: 'nova-3', 'nova-2', 'enhanced', 'base', etc.
- AssemblyAI: 'best', 'slam-1', 'universal'
- Speechmatics: 'standard', 'enhanced' (operating point)
- Gladia: 'solaria-1' (streaming only)
See
TranscriptionModel for full list of available models
openai?
optional openai: Partial<Omit<CreateTranscriptionRequest, "model" | "file">>
OpenAI Whisper-specific options (passed directly to API)
See
https://platform.openai.com/docs/api-reference/audio/createTranscription
piiRedaction?
optional piiRedaction: boolean
Enable PII redaction
sentimentAnalysis?
optional sentimentAnalysis: boolean
Enable sentiment analysis
speakersExpected?
optional speakersExpected: number
Expected number of speakers (for diarization)
summarization?
optional summarization: boolean
Enable summarization
webhookUrl?
optional webhookUrl: string
Webhook URL for async results
wordTimestamps?
optional wordTimestamps: boolean
Enable word-level timestamps
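A sketch combining the portable flags with a provider-specific pass-through; the call shape follows the examples above, and smart_format is a Deepgram API option used here purely for illustration:
const result = await router.transcribe(audio, {
  language: 'en',
  diarization: true,
  wordTimestamps: true,
  summarization: true,
  deepgram: { smart_format: true }, // forwarded untouched to the Deepgram API
});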
TranscriptData
Transcript data structure
Contains the core transcript information returned by getTranscript and listTranscripts.
Example
const result = await router.getTranscript('abc123', 'assemblyai');
if (result.success && result.data) {
console.log(result.data.id); // string
console.log(result.data.text); // string
console.log(result.data.status); // TranscriptionStatus
console.log(result.data.metadata); // TranscriptMetadata
}
Properties
id
id: string
Unique transcript ID
status
status: TranscriptionStatus
Transcription status
text
text: string
Full transcribed text (empty for list items)
completedAt?
optional completedAt: string
Completion timestamp (shorthand for metadata.completedAt)
confidence?
optional confidence: number
Overall confidence score (0-1)
createdAt?
optional createdAt: string
Creation timestamp (shorthand for metadata.createdAt)
duration?
optional duration: number
Audio duration in seconds
language?
optional language: string
Detected or specified language code
metadata?
optional metadata: TranscriptMetadata
Transcript metadata
speakers?
optional speakers: Speaker[]
Speaker diarization results
summary?
optional summary: string
Summary of the content (if summarization enabled)
utterances?
optional utterances: Utterance[]
Utterances (speaker turns)
words?
optional words: Word[]
Word-level transcription with timestamps
TranscriptMetadata
Transcript metadata with typed common fields
Contains provider-agnostic metadata fields that are commonly available. Provider-specific fields can be accessed via the index signature.
Example
const { transcripts } = await router.listTranscripts('assemblyai', { limit: 20 });
transcripts.forEach(item => {
console.log(item.data?.metadata?.audioUrl); // string | undefined
console.log(item.data?.metadata?.createdAt); // string | undefined
console.log(item.data?.metadata?.audioDuration); // number | undefined
});
Indexable
[key: string]: unknown
Provider-specific fields
Properties
audioDuration?
optional audioDuration: number
Audio duration in seconds
audioFileAvailable?
optional audioFileAvailable: boolean
True if the provider stored the audio and it can be downloaded via adapter.getAudioFile(). Currently only Gladia supports this - other providers discard audio after processing.
Example
if (item.data?.metadata?.audioFileAvailable) {
  const audio = await gladiaAdapter.getAudioFile(item.data.id)
  // audio.data is a Blob
}
completedAt?
optional completedAt: string
Completion timestamp (ISO 8601)
createdAt?
optional createdAt: string
Creation timestamp (ISO 8601)
customMetadata?
optional customMetadata: Record<string, unknown>
Custom metadata (Gladia)
displayName?
optional displayName: string
Display name (Azure)
filesUrl?
optional filesUrl: string
Files URL (Azure)
kind?
optional kind: "batch" | "streaming" | "pre-recorded" | "live"
Transcript type
lastActionAt?
optional lastActionAt: string
Last action timestamp (Azure)
resourceUrl?
optional resourceUrl: string
Resource URL for the transcript
sourceAudioUrl?
optional sourceAudioUrl: string
Original audio URL/source you provided to the API (echoed back). This is NOT a provider-hosted URL - it's what you sent when creating the transcription.
TranslationEvent
Translation event data (for real-time translation)
Properties
targetLanguage
targetLanguage: string
Target language
translatedText
translatedText: string
Translated text
isFinal?
optional isFinal: boolean
Whether this is a final translation
original?
optional original: string
Original text
utteranceId?
optional utteranceId: string
Utterance ID this translation belongs to
UnifiedTranscriptResponse
Unified transcription response with provider-specific type safety
When a specific provider is known at compile time, both raw and extended
fields will be typed with that provider's actual types.
Examples
const result: UnifiedTranscriptResponse<'assemblyai'> = await adapter.transcribe(audio);
// result.raw is typed as AssemblyAITranscript
// result.extended is typed as AssemblyAIExtendedData
const chapters = result.extended?.chapters; // AssemblyAIChapter[] | undefined
const entities = result.extended?.entities; // AssemblyAIEntity[] | undefined

const result: UnifiedTranscriptResponse<'gladia'> = await gladiaAdapter.transcribe(audio);
const translation = result.extended?.translation; // GladiaTranslation | undefined
const llmResults = result.extended?.audioToLlm; // GladiaAudioToLlmResult | undefined

const result: UnifiedTranscriptResponse = await router.transcribe(audio);
// result.raw is typed as unknown (could be any provider)
// result.extended is typed as union of all extended types
Type Parameters
| Type Parameter | Default type | Description |
|---|---|---|
| P extends TranscriptionProvider | TranscriptionProvider | The transcription provider (defaults to all providers) |
Properties
provider
provider: P
Provider that performed the transcription
success
success: boolean
Operation success status
data?
optional data: TranscriptData
Transcription data (only present on success)
error?
optional error: object
Error information (only present on failure), with the following fields:
code
code: string
Error code (provider-specific or normalized)
message
message: string
Human-readable error message
details?
optional details: unknown
Additional error details
statusCode?
optional statusCode: number
HTTP status code if applicable
extended?
optional extended: P extends keyof ProviderExtendedDataMap ? ProviderExtendedDataMap[P] : unknown
Extended provider-specific data (fully typed from OpenAPI specs)
Contains rich data beyond basic transcription:
- AssemblyAI: chapters, entities, sentiment, content safety, topics
- Gladia: translation, moderation, entities, audio-to-llm, chapters
- Deepgram: detailed metadata, request tracking, model info
Example
const result = await assemblyaiAdapter.transcribe(audio, { summarization: true });
result.extended?.chapters?.forEach(chapter => {
  console.log(`${chapter.headline}: ${chapter.summary}`);
});
raw?
optional raw: P extends keyof ProviderRawResponseMap ? ProviderRawResponseMap[P] : unknown
Raw provider response (for advanced usage)
Type-safe based on the provider:
- gladia: PreRecordedResponse
- deepgram: ListenV1Response
- openai-whisper: CreateTranscription200One
- assemblyai: AssemblyAITranscript
- azure-stt: AzureTranscription
tracking?
optional tracking: object
Request tracking information for debugging
audioHash?
optional audioHash: string
Audio fingerprint (SHA256) if available
processingTimeMs?
optional processingTimeMs: number
Processing duration in milliseconds
requestId?
optional requestId: string
Provider's request/job ID
Utterance
Utterance (sentence or phrase by a single speaker)
Normalized from provider-specific types:
- Gladia: UtteranceDTO
- AssemblyAI: TranscriptUtterance
- Deepgram: ListenV1ResponseResultsUtterancesItem
Properties
end
end: number
End time in seconds
start
start: number
Start time in seconds
text
text: string
The transcribed text
channel?
optional channel: number
Audio channel number (for multi-channel/stereo recordings)
Channel numbering varies by provider:
- AssemblyAI: 1=left, 2=right, sequential for additional channels
- Deepgram: 0-indexed channel number
- Gladia: 0-indexed channel number
confidence?
optional confidence: number
Confidence score (0-1)
id?
optional id: string
Unique utterance identifier (provider-assigned)
Available from: Deepgram. Useful for linking utterances to other data (entities, sentiment, etc.)
language?
optional language: string
Detected language for this utterance (BCP-47 code)
Available from: Gladia (with code-switching enabled). Essential for multilingual transcription where language changes mid-conversation.
Example
'en', 'es', 'fr', 'de'
See
TranscriptionLanguage for full list of supported codes
speaker?
optional speaker: string
Speaker ID
words?
optional words: Word[]
Words in this utterance
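A sketch of rendering utterances as a speaker-labeled transcript, a pure function over the normalized fields above:
function formatTranscript(utterances: Utterance[]): string {
  return utterances
    .map(u => `[${u.start.toFixed(1)}s] ${u.speaker ?? 'unknown'}: ${u.text}`)
    .join('\n');
}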
Word
Word-level transcription with timing
Normalized from provider-specific types:
- Gladia: WordDTO
- AssemblyAI: TranscriptWord
- Deepgram: ListenV1ResponseResultsChannelsItemAlternativesItemWordsItem
Properties
end
end: number
End time in seconds
start
start: number
Start time in seconds
word
word: string
The transcribed word
channel?
optional channel: number
Audio channel number (for multi-channel/stereo recordings)
Channel numbering varies by provider:
- AssemblyAI: 1=left, 2=right, sequential for additional channels
- Deepgram: 0-indexed channel number
- Gladia: 0-indexed channel number
confidence?
optional confidence: number
Confidence score (0-1)
speaker?
optional speaker: string
Speaker ID if diarization is enabled
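A sketch that folds word timings into rough caption windows (the 5-second window is an arbitrary illustration):
interface Caption { start: number; end: number; text: string }

function toCaptions(words: Word[], windowSeconds = 5): Caption[] {
  const captions: Caption[] = [];
  for (const w of words) {
    const last = captions[captions.length - 1];
    if (last && w.end - last.start <= windowSeconds) {
      last.text += ` ${w.word}`; // extend the current caption window
      last.end = w.end;
    } else {
      captions.push({ start: w.start, end: w.end, text: w.word });
    }
  }
  return captions;
}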
Type Aliases
AudioInput
AudioInput = AudioInputUrl | AudioInputFile | AudioInputStream
Union of all audio input types
BatchOnlyProvider
BatchOnlyProvider = BatchOnlyProviderType
Providers that only support batch/async transcription
Automatically derived from providers where streaming is false or undefined. Note: Speechmatics has a WebSocket API but streaming is not yet implemented in this SDK.
ProviderExtendedDataMap
ProviderExtendedDataMap = object
Map of provider names to their extended data types
Properties
assemblyai
assemblyai: AssemblyAIExtendedData
azure-stt
azure-stt: Record<string, never>
deepgram
deepgram: DeepgramExtendedData
gladia
gladia: GladiaExtendedData
openai-whisper
openai-whisper: Record<string, never>
soniox
soniox: Record<string, never>
speechmatics
speechmatics: Record<string, never>
ProviderRawResponseMap
ProviderRawResponseMap = object
Map of provider names to their raw response types. Enables type-safe access to provider-specific raw responses.
Properties
assemblyai
assemblyai: AssemblyAITranscript
azure-stt
azure-stt: AzureTranscription
deepgram
deepgram: ListenV1Response
gladia
gladia: PreRecordedResponse
openai-whisper
openai-whisper: CreateTranscription200One
soniox
soniox: unknown
speechmatics
speechmatics: unknown
SessionStatus
SessionStatus = "connecting" | "open" | "closing" | "closed"
WebSocket session status for streaming transcription
SpeechmaticsOperatingPoint
SpeechmaticsOperatingPoint = "standard" | "enhanced"
Speechmatics operating point (model) type. Manually defined because the Speechmatics OpenAPI spec doesn't export this cleanly.
StreamEventType
StreamEventType = "open" | "transcript" | "utterance" | "metadata" | "error" | "close" | "speech_start" | "speech_end" | "translation" | "sentiment" | "entity" | "summarization" | "chapterization" | "audio_ack" | "lifecycle"
Streaming transcription event types
StreamingProvider
StreamingProvider = StreamingProviderType
Providers that support real-time streaming transcription
This type is automatically derived from ProviderCapabilitiesMap.streaming in provider-metadata.ts
No manual sync needed - if you set streaming: true for a provider, it's included here.
TranscriptionLanguage
TranscriptionLanguage = AssemblyAILanguageCode | GladiaLanguageCode | string
Unified transcription language type with autocomplete for all providers
Includes language codes from AssemblyAI and Gladia OpenAPI specs. Deepgram uses string for flexibility.
TranscriptionModel
TranscriptionModel = DeepgramModelType | StreamingSupportedModels | AssemblyAISpeechModel | SpeechmaticsOperatingPoint
Unified transcription model type with autocomplete for all providers
Strict union type - only accepts valid models from each provider:
- Deepgram: nova-3, nova-2, enhanced, base, etc.
- AssemblyAI: best, slam-1, universal
- Gladia: solaria-1
- Speechmatics: standard, enhanced
Use provider const objects for autocomplete:
Example
import { DeepgramModel } from 'voice-router-dev'
{ model: DeepgramModel["nova-3"] }
TranscriptionProvider
TranscriptionProvider = "gladia" | "assemblyai" | "deepgram" | "openai-whisper" | "azure-stt" | "speechmatics" | "soniox"
Supported transcription provider identifiers
TranscriptionStatus
TranscriptionStatus = "queued" | "processing" | "completed" | "error"
Transcription status
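Batch transcription is asynchronous, so callers typically poll until the status leaves the non-terminal states. A sketch reusing the getTranscript call shape from the TranscriptData example above (the 3-second interval is arbitrary):
async function waitForTranscript(id: string) {
  for (;;) {
    const result = await router.getTranscript(id, 'assemblyai');
    const status = result.data?.status;
    if (status === 'completed' || status === 'error') return result; // terminal states
    await new Promise(resolve => setTimeout(resolve, 3000)); // still queued/processing: poll again in 3s
  }
}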