Voice Router SDK - AssemblyAI Provider / router/types
router/types
Interfaces
AssemblyAIChapter
Chapter of the audio file
Properties
end
end:
number
The ending time, in milliseconds, for the chapter
gist
gist:
string
An ultra-short summary (just a few words) of the content spoken in the chapter
headline
headline:
string
A single sentence summary of the content spoken during the chapter
start
start:
number
The starting time, in milliseconds, for the chapter
summary
summary:
string
A one paragraph summary of the content spoken during the chapter
AssemblyAIContentSafetyResult
An array of results for the Content Moderation model, if it is enabled. See Content moderation for more information.
Properties
results
results:
ContentSafetyLabelResult[]
An array of results for the Content Moderation model
severity_score_summary
severity_score_summary:
ContentSafetyLabelsResultSeverityScoreSummary
A summary of the Content Moderation severity results for the entire audio file
status
status:
AudioIntelligenceModelStatus
The status of the Content Moderation model. Either success, or unavailable in the rare case that the model failed.
summary
summary:
ContentSafetyLabelsResultSummary
A summary of the Content Moderation confidence results for the entire audio file
AssemblyAIEntity
A detected entity
Properties
end
end:
number
The ending time, in milliseconds, for the detected entity in the audio file
entity_type
entity_type:
EntityType
The type of entity for the detected entity
start
start:
number
The starting time, in milliseconds, at which the detected entity appears in the audio file
text
text:
string
The text for the detected entity
AssemblyAIExtendedData
Extended data from AssemblyAI transcription. Includes chapters, entities, sentiment, content safety, and more.
Properties
chapters?
optionalchapters:AssemblyAIChapter[]
Auto-generated chapters with summaries
contentSafety?
optionalcontentSafety:AssemblyAIContentSafetyResult
Content safety/moderation labels
entities?
optionalentities:AssemblyAIEntity[]
Detected named entities (people, organizations, locations)
highlights?
optionalhighlights:AssemblyAIHighlightsResult
Key phrases and highlights
languageConfidence?
optionallanguageConfidence:number
Language detection confidence (0-1)
sentimentResults?
optionalsentimentResults:AssemblyAISentimentResult[]
Per-utterance sentiment analysis results
throttled?
optionalthrottled:boolean
Whether the request was throttled
topics?
optionaltopics:AssemblyAITopicsResult
IAB topic categories
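The sketch below shows one way to read this extended data from a typed response. It assumes an AssemblyAI adapter whose transcribe() resolves to UnifiedTranscriptResponse<'assemblyai'> and that the corresponding request features (auto_chapters, entity_detection, sentiment_analysis) were enabled; it only touches the fields documented above.

```ts
import type { UnifiedTranscriptResponse } from 'voice-router-dev';

function inspectAssemblyAIExtras(result: UnifiedTranscriptResponse<'assemblyai'>): void {
  if (!result.success || !result.extended) return;

  // Auto Chapters: gist/headline/summary per chapter, times in milliseconds
  result.extended.chapters?.forEach((chapter) => {
    console.log(`[${chapter.start}-${chapter.end} ms] ${chapter.headline}`);
  });

  // Detected named entities (people, organizations, locations, ...)
  result.extended.entities?.forEach((entity) => {
    console.log(`${entity.entity_type}: ${entity.text}`);
  });

  // Per-utterance sentiment
  result.extended.sentimentResults?.forEach((s) => {
    console.log(`${s.sentiment} (${s.confidence}): ${s.text}`);
  });
}
```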
AssemblyAIHighlightsResult
An array of results for the Key Phrases model, if it is enabled. See Key phrases for more information.
Properties
results
results:
AutoHighlightResult[]
A temporally-sequential array of Key Phrases
status
status:
AudioIntelligenceModelStatus
The status of the Key Phrases model. Either success, or unavailable in the rare case that the model failed.
AssemblyAIOptions
The parameters for creating a transcript
Properties
audio_end_at?
optionalaudio_end_at:number
The point in time, in milliseconds, to stop transcribing in your media file
audio_start_from?
optionalaudio_start_from:number
The point in time, in milliseconds, to begin transcribing in your media file
auto_chapters?
optionalauto_chapters:boolean
Enable Auto Chapters, can be true or false
auto_highlights?
optionalauto_highlights:boolean
Enable Key Phrases, either true or false
boost_param?
optionalboost_param:TranscriptBoostParam
How much to boost specified words
content_safety?
optionalcontent_safety:boolean
Enable Content Moderation, can be true or false
content_safety_confidence?
optionalcontent_safety_confidence:number
The confidence threshold for the Content Moderation model. Values must be between 25 and 100.
Minimum
25
Maximum
100
custom_spelling?
optionalcustom_spelling:TranscriptCustomSpelling[]
Customize how words are spelled and formatted using to and from values
custom_topics?
optionalcustom_topics:boolean
Enable custom topics, either true or false
Deprecated
disfluencies?
optionaldisfluencies:boolean
Transcribe Filler Words, like "umm", in your media file; can be true or false
entity_detection?
optionalentity_detection:boolean
Enable Entity Detection, can be true or false
filter_profanity?
optionalfilter_profanity:boolean
Filter profanity from the transcribed text, can be true or false
format_text?
optionalformat_text:boolean
Enable Text Formatting, can be true or false
iab_categories?
optionaliab_categories:boolean
Enable Topic Detection, can be true or false
keyterms_prompt?
optionalkeyterms_prompt:string[]
<Warning>keyterms_prompt is only supported when the speech_model is specified as slam-1</Warning>
Improve accuracy with up to 1000 domain-specific words or phrases (maximum 6 words per phrase).
language_code?
optionallanguage_code:TranscriptOptionalParamsLanguageCode
The language of your audio file. Possible values are found in Supported Languages. The default value is 'en_us'.
language_confidence_threshold?
optionallanguage_confidence_threshold:number
The confidence threshold for the automatically detected language. An error will be returned if the language confidence is below this threshold. Defaults to 0.
Minimum
0
Maximum
1
language_detection?
optionallanguage_detection:boolean
Enable Automatic language detection, either true or false.
multichannel?
optionalmultichannel:boolean
Enable Multichannel transcription, can be true or false.
prompt?
optionalprompt:string
This parameter does not currently have any functionality attached to it.
Deprecated
punctuate?
optionalpunctuate:boolean
Enable Automatic Punctuation, can be true or false
redact_pii?
optionalredact_pii:boolean
Redact PII from the transcribed text using the Redact PII model, can be true or false
redact_pii_audio?
optionalredact_pii_audio:boolean
Generate a copy of the original media file with spoken PII "beeped" out, can be true or false. See PII redaction for more details.
redact_pii_audio_quality?
optionalredact_pii_audio_quality:RedactPiiAudioQuality
Controls the filetype of the audio created by redact_pii_audio. Currently supports mp3 (default) and wav. See PII redaction for more details.
redact_pii_policies?
optionalredact_pii_policies:PiiPolicy[]
The list of PII Redaction policies to enable. See PII redaction for more details.
redact_pii_sub?
optionalredact_pii_sub:TranscriptOptionalParamsRedactPiiSub
The replacement logic for detected PII, can be "entity_type" or "hash". See PII redaction for more details.
sentiment_analysis?
optionalsentiment_analysis:boolean
Enable Sentiment Analysis, can be true or false
speaker_labels?
optionalspeaker_labels:boolean
Enable Speaker diarization, can be true or false
speakers_expected?
optionalspeakers_expected:TranscriptOptionalParamsSpeakersExpected
Tells the speaker label model how many speakers it should attempt to identify. See Speaker diarization for more details.
speech_model?
optionalspeech_model:TranscriptOptionalParamsSpeechModel
The speech model to use for the transcription. When null, the "best" model is used.
speech_threshold?
optionalspeech_threshold:TranscriptOptionalParamsSpeechThreshold
Reject audio files that contain less than this fraction of speech. Valid values are in the range [0, 1] inclusive.
Minimum
0
Maximum
1
summarization?
optionalsummarization:boolean
Enable Summarization, can be true or false
summary_model?
optionalsummary_model:SummaryModel
The model to summarize the transcript
summary_type?
optionalsummary_type:SummaryType
The type of summary
topics?
optionaltopics:string[]
The list of custom topics
webhook_auth_header_name?
optionalwebhook_auth_header_name:TranscriptOptionalParamsWebhookAuthHeaderName
The header name to be sent with the transcript completed or failed webhook requests
webhook_auth_header_value?
optionalwebhook_auth_header_value:TranscriptOptionalParamsWebhookAuthHeaderValue
The header value to send back with the transcript completed or failed webhook requests for added security
webhook_url?
optionalwebhook_url:string
The URL to which we send webhook requests. We send two different types of webhook requests: one when a transcript is completed or failed, and one when the redacted audio is ready if redact_pii_audio is enabled.
word_boost?
optionalword_boost:string[]
The list of custom vocabulary to boost transcription probability for
Deprecated
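In practice these parameters are typically forwarded through the assemblyai field of TranscribeOptions rather than set directly. A minimal sketch, assuming assemblyaiAdapter and audio are an already-configured adapter and a valid audio input (both placeholders here):

```ts
import type { TranscribeOptions } from 'voice-router-dev';

const options: TranscribeOptions = {
  summarization: true,      // provider-agnostic flag
  assemblyai: {             // passed through to the AssemblyAI API as-is
    auto_chapters: true,
    entity_detection: true,
    sentiment_analysis: true,
    iab_categories: true,   // Topic Detection
  },
};

const result = await assemblyaiAdapter.transcribe(audio, options);
```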
AssemblyAISentimentResult
The result of the Sentiment Analysis model
Properties
confidence
confidence:
number
The confidence score for the detected sentiment of the sentence, from 0 to 1
Minimum
0
Maximum
1
end
end:
number
The ending time, in milliseconds, of the sentence
sentiment
sentiment:
Sentiment
The detected sentiment for the sentence, one of POSITIVE, NEUTRAL, NEGATIVE
speaker
speaker:
SentimentAnalysisResultSpeaker
The speaker of the sentence if Speaker Diarization is enabled, else null
start
start:
number
The starting time, in milliseconds, of the sentence
text
text:
string
The transcript of the sentence
channel?
optionalchannel:SentimentAnalysisResultChannel
The channel of this utterance. The left and right channels are channels 1 and 2. Additional channels increment the channel number sequentially.
AssemblyAITopicsResult
The result of the Topic Detection model, if it is enabled. See Topic Detection for more information.
Properties
results
results:
TopicDetectionResult[]
An array of results for the Topic Detection model
status
status:
AudioIntelligenceModelStatus
The status of the Topic Detection model. Either success, or unavailable in the rare case that the model failed.
summary
summary:
TopicDetectionModelResultSummary
The overall relevance of topic to the entire audio file
AudioAckEvent
Audio chunk acknowledgment event
Properties
byteRange?
optionalbyteRange: [number,number]
Byte range of the acknowledged audio chunk [start, end]
timeRange?
optionaltimeRange: [number,number]
Time range in seconds of the acknowledged audio chunk [start, end]
timestamp?
optionaltimestamp:string
Acknowledgment timestamp
AudioChunk
Audio chunk for streaming transcription
Properties
data
data:
Buffer<ArrayBufferLike> |Uint8Array<ArrayBufferLike>
Audio data as Buffer or Uint8Array
isLast?
optionalisLast:boolean
Whether this is the last chunk
ChapterizationEvent
Post-processing chapterization event
Properties
chapters
chapters:
object[]
Generated chapters
end
end:
number
End time in seconds
headline
headline:
string
Chapter title/headline
start
start:
number
Start time in seconds
summary
summary:
string
Chapter summary
error?
optionalerror:string
Error if chapterization failed
DeepgramExtendedData
Extended data from Deepgram transcription. Includes detailed metadata, model info, and feature-specific data.
Properties
metadata?
optionalmetadata:ListenV1ResponseMetadata
Full response metadata
modelInfo?
optionalmodelInfo:Record<string,unknown>
Model versions used
requestId?
optionalrequestId:string
Request ID for debugging/tracking
sha256?
optionalsha256:string
SHA256 hash of the audio
tags?
optionaltags:string[]
Tags echoed back from request
EntityEvent
Named entity recognition result
Properties
text
text:
string
Entity text
type
type:
string
Entity type (PERSON, ORGANIZATION, LOCATION, etc.)
end?
optionalend:number
End position
start?
optionalstart:number
Start position
utteranceId?
optionalutteranceId:string
Utterance ID this entity belongs to
GladiaExtendedData
Extended data from Gladia transcription. Includes translation, moderation, entities, LLM outputs, and more.
Properties
audioToLlm?
optionalaudioToLlm:AudioToLlmListDTO
Audio-to-LLM custom prompt results
chapters?
optionalchapters:ChapterizationDTO
Auto-generated chapters
customMetadata?
optionalcustomMetadata:Record<string,unknown>
Custom metadata echoed back
entities?
optionalentities:NamedEntityRecognitionDTO
Named entity recognition results
moderation?
optionalmoderation:ModerationDTO
Content moderation results
sentiment?
optionalsentiment:SentimentAnalysisDTO
Sentiment analysis results
speakerReidentification?
optionalspeakerReidentification:SpeakerReidentificationDTO
AI speaker reidentification results
structuredData?
optionalstructuredData:StructuredDataExtractionDTO
Structured data extraction results
translation?
optionaltranslation:TranslationDTO
Translation results (if translation enabled)
LifecycleEvent
Lifecycle event (session start, recording end, etc.)
Properties
eventType
eventType:
"start_session"|"start_recording"|"stop_recording"|"end_recording"|"end_session"
Lifecycle event type
sessionId?
optionalsessionId:string
Session ID
timestamp?
optionaltimestamp:string
Event timestamp
ListTranscriptsOptions
Options for listing transcripts with date/time filtering
Providers support different filtering capabilities:
- AssemblyAI: status, created_on, before_id, after_id, throttled_only
- Gladia: status, date, before_date, after_date, custom_metadata
- Azure: status, skip, top, filter (OData)
- Deepgram: start, end, status, page, request_id, endpoint (requires projectId)
Examples
await adapter.listTranscripts({
date: '2026-01-07', // Exact date (ISO format)
status: 'completed',
limit: 50
})

await adapter.listTranscripts({
  afterDate: '2026-01-01',
  beforeDate: '2026-01-31',
  limit: 100
})

Properties
afterDate?
optionalafterDate:string
Filter for transcripts created after this date (ISO format)
assemblyai?
optionalassemblyai:Partial<ListTranscriptsParams>
AssemblyAI-specific list options
beforeDate?
optionalbeforeDate:string
Filter for transcripts created before this date (ISO format)
date?
optionaldate:string
Filter by exact date (ISO format: YYYY-MM-DD)
deepgram?
optionaldeepgram:Partial<ManageV1ProjectsRequestsListParams>
Deepgram-specific list options (request history)
gladia?
optionalgladia:Partial<TranscriptionControllerListV2Params>
Gladia-specific list options
limit?
optionallimit:number
Maximum number of transcripts to retrieve
offset?
optionaloffset:number
Pagination offset (skip N results)
status?
optionalstatus:string
Filter by transcript status
ListTranscriptsResponse
Response from listTranscripts
Example
import type { ListTranscriptsResponse } from 'voice-router-dev';
const response: ListTranscriptsResponse = await router.listTranscripts('assemblyai', {
status: 'completed',
limit: 50
});
response.transcripts.forEach(item => {
console.log(item.data?.id, item.data?.status);
});
if (response.hasMore) {
// Fetch next page
}

Properties
transcripts
transcripts:
UnifiedTranscriptResponse<TranscriptionProvider>[]
List of transcripts
hasMore?
optionalhasMore:boolean
Whether more results are available
total?
optionaltotal:number
Total count (if available from provider)
OpenAIWhisperOptions
Properties
file
file:
Blob
The audio file object (not file name) to transcribe, in one of these formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, or webm.
model
model:
string
ID of the model to use. The options are gpt-4o-transcribe, gpt-4o-mini-transcribe, gpt-4o-mini-transcribe-2025-12-15, whisper-1 (which is powered by our open source Whisper V2 model), and gpt-4o-transcribe-diarize.
chunking_strategy?
optionalchunking_strategy:TranscriptionChunkingStrategy
include?
optionalinclude:"logprobs"[]
Additional information to include in the transcription response.
logprobs will return the log probabilities of the tokens in the
response to understand the model's confidence in the transcription.
logprobs only works with response_format set to json and only with
the models gpt-4o-transcribe, gpt-4o-mini-transcribe, and gpt-4o-mini-transcribe-2025-12-15. This field is not supported when using gpt-4o-transcribe-diarize.
known_speaker_names?
optionalknown_speaker_names:string[]
Optional list of speaker names that correspond to the audio samples provided in known_speaker_references[]. Each entry should be a short identifier (for example customer or agent). Up to 4 speakers are supported.
Max Items
4
known_speaker_references?
optionalknown_speaker_references:string[]
Optional list of audio samples (as data URLs) that contain known speaker references matching known_speaker_names[]. Each sample must be between 2 and 10 seconds, and can use any of the same input audio formats supported by file.
Max Items
4
language?
optionallanguage:string
The language of the input audio. Supplying the input language in ISO-639-1 (e.g. en) format will improve accuracy and latency.
prompt?
optionalprompt:string
An optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language. This field is not supported when using gpt-4o-transcribe-diarize.
response_format?
optionalresponse_format:AudioResponseFormat
stream?
optionalstream:CreateTranscriptionRequestStream
temperature?
optionaltemperature:number
The sampling temperature, between 0 and 1. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. If set to 0, the model will use log probability to automatically increase the temperature until certain thresholds are hit.
timestamp_granularities?
optionaltimestamp_granularities:CreateTranscriptionRequestTimestampGranularitiesItem[]
The timestamp granularities to populate for this transcription. response_format must be set verbose_json to use timestamp granularities. Either or both of these options are supported: word, or segment. Note: There is no additional latency for segment timestamps, but generating word timestamps incurs additional latency.
This option is not available for gpt-4o-transcribe-diarize.
ProviderCapabilities
Provider capability flags
Each boolean indicates whether the provider supports a specific feature. Use ProviderCapabilitiesMap from provider-metadata for runtime access.
Properties
customVocabulary
customVocabulary:
boolean
Custom vocabulary/keyword boosting
deleteTranscript
deleteTranscript:
boolean
Delete transcriptions
diarization
diarization:
boolean
Speaker diarization (identifying different speakers)
entityDetection
entityDetection:
boolean
Entity detection
languageDetection
languageDetection:
boolean
Automatic language detection
listTranscripts
listTranscripts:
boolean
List/fetch previous transcriptions
piiRedaction
piiRedaction:
boolean
PII redaction
sentimentAnalysis
sentimentAnalysis:
boolean
Sentiment analysis
streaming
streaming:
boolean
Real-time streaming transcription support
summarization
summarization:
boolean
Audio summarization
wordTimestamps
wordTimestamps:
boolean
Word-level timestamps
getAudioFile?
optionalgetAudioFile:boolean
Download original audio file
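A short sketch of gating features on these flags at runtime. The note above points to ProviderCapabilitiesMap in provider-metadata; the import path below is an assumption, so adjust it to wherever that map is exported in your build.

```ts
// Assumed export location for ProviderCapabilitiesMap; see provider-metadata.
import { ProviderCapabilitiesMap } from 'voice-router-dev';
import type { ProviderCapabilities, TranscriptionProvider } from 'voice-router-dev';

function supportsLiveDiarization(provider: TranscriptionProvider): boolean {
  const caps: ProviderCapabilities = ProviderCapabilitiesMap[provider];
  // Offer live speaker labels only when the provider can both stream and diarize.
  return caps.streaming && caps.diarization;
}

if (!supportsLiveDiarization('assemblyai')) {
  console.warn('Falling back to batch transcription with speaker labels');
}
```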
SentimentEvent
Sentiment analysis result (for real-time sentiment)
Properties
sentiment
sentiment:
string
Sentiment label (positive, negative, neutral)
confidence?
optionalconfidence:number
Confidence score 0-1
utteranceId?
optionalutteranceId:string
Utterance ID this sentiment belongs to
Speaker
Speaker information from diarization
Properties
id
id:
string
Speaker identifier (e.g., "A", "B", "speaker_0")
confidence?
optionalconfidence:number
Confidence score for speaker identification (0-1)
label?
optionallabel:string
Speaker label if known
SpeechEvent
Speech event data (for speech_start/speech_end events)
Properties
timestamp
timestamp:
number
Timestamp in seconds
type
type:
"speech_start"|"speech_end"
Event type: speech_start or speech_end
channel?
optionalchannel:number
Channel number
sessionId?
optionalsessionId:string
Session ID
StreamEvent
Streaming transcription event
Properties
type
type:
StreamEventType
channel?
optionalchannel:number
Channel number for multi-channel audio
confidence?
optionalconfidence:number
Confidence score for this event
data?
optionaldata:unknown
Additional event data
error?
optionalerror:object
Error information (for type: "error")
code
code:
string
message
message:
string
details?
optionaldetails:unknown
isFinal?
optionalisFinal:boolean
Whether this is a final transcript (vs interim)
language?
optionallanguage:string
Language of the transcript/utterance
speaker?
optionalspeaker:string
Speaker ID if diarization is enabled
text?
optionaltext:string
Partial transcript text (for type: "transcript")
utterance?
optionalutterance:Utterance
Utterance data (for type: "utterance")
words?
optionalwords:Word[]
Words in this event
StreamingCallbacks
Callback functions for streaming events
Properties
onAudioAck()?
optionalonAudioAck: (event) =>void
Called for audio chunk acknowledgments (Gladia: requires receive_acknowledgments)
Parameters
| Parameter | Type |
|---|---|
event | AudioAckEvent |
Returns
void
onChapterization()?
optionalonChapterization: (event) =>void
Called when post-processing chapterization completes (Gladia: requires chapterization enabled)
Parameters
| Parameter | Type |
|---|---|
event | ChapterizationEvent |
Returns
void
onClose()?
optionalonClose: (code?,reason?) =>void
Called when the stream is closed
Parameters
| Parameter | Type |
|---|---|
code? | number |
reason? | string |
Returns
void
onEntity()?
optionalonEntity: (event) =>void
Called for named entity recognition (Gladia: requires named_entity_recognition enabled)
Parameters
| Parameter | Type |
|---|---|
event | EntityEvent |
Returns
void
onError()?
optionalonError: (error) =>void
Called when an error occurs
Parameters
| Parameter | Type |
|---|---|
error | { code: string; message: string; details?: unknown; } |
error.code | string |
error.message | string |
error.details? | unknown |
Returns
void
onLifecycle()?
optionalonLifecycle: (event) =>void
Called for session lifecycle events (Gladia: requires receive_lifecycle_events)
Parameters
| Parameter | Type |
|---|---|
event | LifecycleEvent |
Returns
void
onMetadata()?
optionalonMetadata: (metadata) =>void
Called when metadata is received
Parameters
| Parameter | Type |
|---|---|
metadata | Record<string, unknown> |
Returns
void
onOpen()?
optionalonOpen: () =>void
Called when connection is established
Returns
void
onSentiment()?
optionalonSentiment: (event) =>void
Called for real-time sentiment analysis (Gladia: requires sentiment_analysis enabled)
Parameters
| Parameter | Type |
|---|---|
event | SentimentEvent |
Returns
void
onSpeechEnd()?
optionalonSpeechEnd: (event) =>void
Called when speech ends (Gladia: requires receive_speech_events)
Parameters
| Parameter | Type |
|---|---|
event | SpeechEvent |
Returns
void
onSpeechStart()?
optionalonSpeechStart: (event) =>void
Called when speech starts (Gladia: requires receive_speech_events)
Parameters
| Parameter | Type |
|---|---|
event | SpeechEvent |
Returns
void
onSummarization()?
optionalonSummarization: (event) =>void
Called when post-processing summarization completes (Gladia: requires summarization enabled)
Parameters
| Parameter | Type |
|---|---|
event | SummarizationEvent |
Returns
void
onTranscript()?
optionalonTranscript: (event) =>void
Called when a transcript (interim or final) is received
Parameters
| Parameter | Type |
|---|---|
event | StreamEvent |
Returns
void
onTranslation()?
optionalonTranslation: (event) =>void
Called for real-time translation (Gladia: requires translation enabled)
Parameters
| Parameter | Type |
|---|---|
event | TranslationEvent |
Returns
void
onUtterance()?
optionalonUtterance: (utterance) =>void
Called when a complete utterance is detected
Parameters
| Parameter | Type |
|---|---|
utterance | Utterance |
Returns
void
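An illustrative callback bundle wired to the event shapes above. How the bundle is handed to the adapter (for example alongside the StreamingOptions given to transcribeStream) depends on the adapter's signature, so treat the delivery mechanism as an assumption.

```ts
import type { StreamingCallbacks } from 'voice-router-dev';

const callbacks: StreamingCallbacks = {
  onOpen: () => console.log('stream connected'),
  onTranscript: (event) => {
    // Interim results arrive with isFinal === false when interimResults is enabled.
    const kind = event.isFinal ? 'final' : 'interim';
    console.log(`[${kind}] ${event.text ?? ''}`);
  },
  onUtterance: (utterance) => {
    console.log(`Speaker ${utterance.speaker ?? '?'}: ${utterance.text}`);
  },
  onError: (error) => console.error(`${error.code}: ${error.message}`),
  onClose: (code, reason) => console.log(`closed (${code ?? 'n/a'}) ${reason ?? ''}`),
};
```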
StreamingOptions
Options for streaming transcription
Extends
Omit<TranscribeOptions,"webhookUrl">
Properties
assemblyai?
optionalassemblyai:Partial<AssemblyAIOptions>
AssemblyAI-specific options (passed directly to API)
See
https://www.assemblyai.com/docs/api-reference/transcripts/submit
Inherited from
TranscribeOptions.assemblyai
assemblyaiStreaming?
optionalassemblyaiStreaming:AssemblyAIStreamingOptions
AssemblyAI-specific streaming options (passed to WebSocket URL & configuration)
Includes end-of-turn detection tuning, VAD threshold, profanity filter, keyterms, speech model selection, and language detection.
See
https://www.assemblyai.com/docs/speech-to-text/streaming
Example
await adapter.transcribeStream({
assemblyaiStreaming: {
speechModel: 'universal-streaming-multilingual',
languageDetection: true,
endOfTurnConfidenceThreshold: 0.7,
minEndOfTurnSilenceWhenConfident: 500,
vadThreshold: 0.3,
formatTurns: true,
filterProfanity: true,
keyterms: ['TypeScript', 'JavaScript', 'API']
}
});

audioToLlm?
optionalaudioToLlm:AudioToLlmListConfigDTO
Audio-to-LLM configuration (Gladia-specific). Run custom LLM prompts on the transcription.
See
GladiaAudioToLlmConfig
Inherited from
TranscribeOptions.audioToLlm
bitDepth?
optionalbitDepth:number
Bit depth for PCM audio
Common depths: 8, 16, 24, 32. 16-bit is standard for most applications.
channels?
optionalchannels:number
Number of audio channels
- 1: Mono (recommended for transcription)
- 2: Stereo
- 3-8: Multi-channel (provider-specific support)
codeSwitching?
optionalcodeSwitching:boolean
Enable code switching (multilingual audio detection). Supported by: Gladia.
Inherited from
TranscribeOptions.codeSwitching
codeSwitchingConfig?
optionalcodeSwitchingConfig:CodeSwitchingConfigDTO
Code switching configuration (Gladia-specific)
See
GladiaCodeSwitchingConfig
Inherited from
TranscribeOptions.codeSwitchingConfig
customVocabulary?
optionalcustomVocabulary:string[]
Custom vocabulary to boost (provider-specific format)
Inherited from
TranscribeOptions.customVocabulary
deepgram?
optionaldeepgram:Partial<ListenV1MediaTranscribeParams>
Deepgram-specific options (passed directly to API)
See
https://developers.deepgram.com/reference/listen-file
Inherited from
TranscribeOptions.deepgram
deepgramStreaming?
optionaldeepgramStreaming:DeepgramStreamingOptions
Deepgram-specific streaming options (passed to WebSocket URL)
Includes filler_words, numerals, measurements, paragraphs, profanity_filter, topics, intents, custom_topic, custom_intent, keyterm, dictation, utt_split, and more.
See
https://developers.deepgram.com/docs/streaming
Example
await adapter.transcribeStream({
deepgramStreaming: {
fillerWords: true,
profanityFilter: true,
topics: true,
intents: true,
customTopic: ['sales', 'support'],
customIntent: ['purchase', 'complaint'],
numerals: true
}
});

diarization?
optionaldiarization:boolean
Enable speaker diarization
Inherited from
TranscribeOptions.diarization
encoding?
optionalencoding:AudioEncoding
Audio encoding format
Common formats:
- linear16: PCM 16-bit (universal, recommended)
- mulaw: μ-law telephony codec
- alaw: A-law telephony codec
- flac, opus, speex: Advanced codecs (Deepgram only)
See
AudioEncoding for full list of supported formats
endpointing?
optionalendpointing:number
Utterance end silence threshold in milliseconds
entityDetection?
optionalentityDetection:boolean
Enable entity detection
Inherited from
TranscribeOptions.entityDetection
gladia?
optionalgladia:Partial<InitTranscriptionRequest>
Gladia-specific options (passed directly to API)
See
Inherited from
TranscribeOptions.gladia
gladiaStreaming?
optionalgladiaStreaming:Partial<Omit<StreamingRequest,"encoding"|"channels"|"sample_rate"|"bit_depth">>
Gladia-specific streaming options (passed directly to API)
Includes pre_processing, realtime_processing, post_processing, messages_config, and callback configuration.
See
https://docs.gladia.io/api-reference/v2/live
Example
await adapter.transcribeStream({
gladiaStreaming: {
realtime_processing: {
words_accurate_timestamps: true
},
messages_config: {
receive_partial_transcripts: true
}
}
});

interimResults?
optionalinterimResults:boolean
Enable interim results (partial transcripts)
language?
optionallanguage:string
Language code with autocomplete from OpenAPI specs
Example
'en', 'en_us', 'fr', 'de', 'es'

See
TranscriptionLanguage for full list
Inherited from
TranscribeOptions.language
languageDetection?
optionallanguageDetection:boolean
Enable automatic language detection
Inherited from
TranscribeOptions.languageDetection
maxSilence?
optionalmaxSilence:number
Maximum duration without endpointing in seconds
model?
optionalmodel:TranscriptionModel
Model to use for transcription (provider-specific)
Type-safe with autocomplete for all known models:
- Deepgram: 'nova-2', 'nova-3', 'base', 'enhanced', 'whisper-large', etc.
- Gladia: 'solaria-1' (default)
- AssemblyAI: Not applicable (uses Universal-2 automatically)
Example
// Use Nova-2 for better multilingual support
{ model: 'nova-2', language: 'fr' }

Overrides
TranscribeOptions.model
openai?
optionalopenai:Partial<Omit<OpenAIWhisperOptions,"model"|"file">>
OpenAI Whisper-specific options (passed directly to API)
See
https://platform.openai.com/docs/api-reference/audio/createTranscription
Inherited from
TranscribeOptions.openai
openaiStreaming?
optionalopenaiStreaming:OpenAIStreamingOptions
OpenAI Realtime API streaming options
Configure the OpenAI Realtime WebSocket connection for audio transcription. Uses the Realtime API which supports real-time audio input transcription.
See
https://platform.openai.com/docs/guides/realtime
Example
await adapter.transcribeStream({
openaiStreaming: {
model: 'gpt-4o-realtime-preview',
voice: 'alloy',
turnDetection: {
type: 'server_vad',
threshold: 0.5,
silenceDurationMs: 500
}
}
});

piiRedaction?
optionalpiiRedaction:boolean
Enable PII redaction
Inherited from
TranscribeOptions.piiRedaction
region?
optionalregion:StreamingSupportedRegions
Regional endpoint for streaming (Gladia only)
Gladia supports regional streaming endpoints for lower latency:
- us-west: US West Coast
- eu-west: EU West (Ireland)
Example
import { GladiaRegion } from 'voice-router-dev/constants'
await adapter.transcribeStream({
region: GladiaRegion["us-west"]
})

See
https://docs.gladia.io/api-reference/v2/live
sampleRate?
optionalsampleRate:number
Sample rate in Hz
Common rates: 8000, 16000, 32000, 44100, 48000. Most providers recommend 16000 Hz for optimal quality/performance.
sentimentAnalysis?
optionalsentimentAnalysis:boolean
Enable sentiment analysis
Inherited from
TranscribeOptions.sentimentAnalysis
sonioxStreaming?
optionalsonioxStreaming:SonioxStreamingOptions
Soniox-specific streaming options
Configure the Soniox WebSocket connection for real-time transcription. Supports speaker diarization, language identification, translation, and custom context.
See
https://soniox.com/docs/stt/SDKs/web-sdk
Example
await adapter.transcribeStream({
sonioxStreaming: {
model: 'stt-rt-preview',
enableSpeakerDiarization: true,
enableEndpointDetection: true,
context: {
terms: ['TypeScript', 'React'],
text: 'Technical discussion'
},
translation: { type: 'one_way', target_language: 'es' }
}
});

speakersExpected?
optionalspeakersExpected:number
Expected number of speakers (for diarization)
Inherited from
TranscribeOptions.speakersExpected
summarization?
optionalsummarization:boolean
Enable summarization
Inherited from
TranscribeOptions.summarization
wordTimestamps?
optionalwordTimestamps:boolean
Enable word-level timestamps
Inherited from
TranscribeOptions.wordTimestamps
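A minimal cross-provider configuration using only the common fields above; provider-specific blocks such as assemblyaiStreaming or deepgramStreaming are omitted, and the defaults applied for any field left out are provider-dependent.

```ts
import type { StreamingOptions } from 'voice-router-dev';

const streamingOptions: StreamingOptions = {
  encoding: 'linear16',  // PCM 16-bit, the most widely supported format
  sampleRate: 16000,     // recommended rate for most providers
  channels: 1,           // mono is recommended for transcription
  language: 'en',
  interimResults: true,  // receive partial transcripts as they form
  diarization: true,
  endpointing: 500,      // end an utterance after 500 ms of silence
};
```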
StreamingSession
Represents an active streaming transcription session
Properties
close()
close: () =>
Promise<void>
Close the streaming session
Returns
Promise<void>
createdAt
createdAt:
Date
Session creation timestamp
getStatus()
getStatus: () =>
"open"|"connecting"|"closing"|"closed"
Get current session status
Returns
"open" | "connecting" | "closing" | "closed"
id
id:
string
Unique session ID
provider
provider:
TranscriptionProvider
Provider handling this stream
sendAudio()
sendAudio: (chunk) => Promise<void>
Send an audio chunk to the stream
Parameters
| Parameter | Type |
|---|---|
chunk | AudioChunk |
Returns
Promise<void>
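A sketch of driving a session, assuming transcribeStream() resolves to a StreamingSession (check your adapter's return type) and that audio arrives as Buffer chunks from some upstream source.

```ts
import type { AudioChunk, StreamingSession } from 'voice-router-dev';

async function pumpAudio(session: StreamingSession, chunks: Buffer[]): Promise<void> {
  console.log(`session ${session.id} on ${session.provider}: ${session.getStatus()}`);

  for (let i = 0; i < chunks.length; i++) {
    const chunk: AudioChunk = {
      data: chunks[i],
      isLast: i === chunks.length - 1, // flag the final chunk
    };
    await session.sendAudio(chunk);
  }

  await session.close();
}
```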
SummarizationEvent
Post-processing summarization event
Properties
summary
summary:
string
Full summarization text
error?
optionalerror:string
Error if summarization failed
TranscribeOptions
Common transcription options across all providers
For provider-specific options, use the typed provider options:
- deepgram: Full Deepgram API options
- assemblyai: Full AssemblyAI API options
- gladia: Full Gladia API options
Properties
assemblyai?
optionalassemblyai:Partial<AssemblyAIOptions>
AssemblyAI-specific options (passed directly to API)
See
https://www.assemblyai.com/docs/api-reference/transcripts/submit
audioToLlm?
optionalaudioToLlm:AudioToLlmListConfigDTO
Audio-to-LLM configuration (Gladia-specific). Run custom LLM prompts on the transcription.
See
GladiaAudioToLlmConfig
codeSwitching?
optionalcodeSwitching:boolean
Enable code switching (multilingual audio detection). Supported by: Gladia.
codeSwitchingConfig?
optionalcodeSwitchingConfig:CodeSwitchingConfigDTO
Code switching configuration (Gladia-specific)
See
GladiaCodeSwitchingConfig
customVocabulary?
optionalcustomVocabulary:string[]
Custom vocabulary to boost (provider-specific format)
deepgram?
optionaldeepgram:Partial<ListenV1MediaTranscribeParams>
Deepgram-specific options (passed directly to API)
See
https://developers.deepgram.com/reference/listen-file
diarization?
optionaldiarization:boolean
Enable speaker diarization
entityDetection?
optionalentityDetection:boolean
Enable entity detection
gladia?
optionalgladia:Partial<InitTranscriptionRequest>
Gladia-specific options (passed directly to API)
See
language?
optionallanguage:string
Language code with autocomplete from OpenAPI specs
Example
'en', 'en_us', 'fr', 'de', 'es'

See
TranscriptionLanguage for full list
languageDetection?
optionallanguageDetection:boolean
Enable automatic language detection
model?
optionalmodel:TranscriptionModel
Model to use for transcription (provider-specific)
Type-safe model selection derived from OpenAPI specs:
- Deepgram: 'nova-3', 'nova-2', 'enhanced', 'base', etc.
- AssemblyAI: 'best', 'slam-1', 'universal'
- Speechmatics: 'standard', 'enhanced' (operating point)
- Gladia: 'solaria-1' (streaming only)
See
TranscriptionModel for full list of available models
openai?
optionalopenai:Partial<Omit<OpenAIWhisperOptions,"model"|"file">>
OpenAI Whisper-specific options (passed directly to API)
See
https://platform.openai.com/docs/api-reference/audio/createTranscription
piiRedaction?
optionalpiiRedaction:boolean
Enable PII redaction
sentimentAnalysis?
optionalsentimentAnalysis:boolean
Enable sentiment analysis
speakersExpected?
optionalspeakersExpected:number
Expected number of speakers (for diarization)
summarization?
optionalsummarization:boolean
Enable summarization
webhookUrl?
optionalwebhookUrl:string
Webhook URL for async results
wordTimestamps?
optionalwordTimestamps:boolean
Enable word-level timestamps
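A short end-to-end sketch using only the common fields. It assumes router is an already-configured voice-router instance, that router.transcribe accepts the same options object the adapters do, and that audio is a valid audio input; all three are placeholders, and the webhook URL is illustrative.

```ts
import type { TranscribeOptions, UnifiedTranscriptResponse } from 'voice-router-dev';

const options: TranscribeOptions = {
  language: 'en',
  diarization: true,
  wordTimestamps: true,
  summarization: true,
  webhookUrl: 'https://example.com/webhooks/transcripts', // async delivery, if supported
};

const result: UnifiedTranscriptResponse = await router.transcribe(audio, options);
if (result.success && result.data) {
  console.log(result.data.text);
  console.log(result.data.summary ?? '(no summary)');
}
```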
TranscriptData
Transcript data structure
Contains the core transcript information returned by getTranscript and listTranscripts.
Example
const result = await router.getTranscript('abc123', 'assemblyai');
if (result.success && result.data) {
console.log(result.data.id); // string
console.log(result.data.text); // string
console.log(result.data.status); // TranscriptionStatus
console.log(result.data.metadata); // TranscriptMetadata
}

Properties
id
id:
string
Unique transcript ID
status
status:
TranscriptionStatus
Transcription status
text
text:
string
Full transcribed text (empty for list items)
completedAt?
optionalcompletedAt:string
Completion timestamp (shorthand for metadata.completedAt)
confidence?
optionalconfidence:number
Overall confidence score (0-1)
createdAt?
optionalcreatedAt:string
Creation timestamp (shorthand for metadata.createdAt)
duration?
optionalduration:number
Audio duration in seconds
language?
optionallanguage:string
Detected or specified language code
metadata?
optionalmetadata:TranscriptMetadata
Transcript metadata
speakers?
optionalspeakers:Speaker[]
Speaker diarization results
summary?
optionalsummary:string
Summary of the content (if summarization enabled)
utterances?
optionalutterances:Utterance[]
Utterances (speaker turns)
words?
optionalwords:Word[]
Word-level transcription with timestamps
TranscriptMetadata
Transcript metadata with typed common fields
Contains provider-agnostic metadata fields that are commonly available. Provider-specific fields can be accessed via the index signature.
Example
const { transcripts } = await router.listTranscripts('assemblyai', { limit: 20 });
transcripts.forEach(item => {
console.log(item.data?.metadata?.audioUrl); // string | undefined
console.log(item.data?.metadata?.createdAt); // string | undefined
console.log(item.data?.metadata?.audioDuration); // number | undefined
});

Indexable
[key: string]: unknown
Provider-specific fields
Properties
audioDuration?
optionalaudioDuration:number
Audio duration in seconds
audioFileAvailable?
optionalaudioFileAvailable:boolean
True if the provider stored the audio and it can be downloaded via adapter.getAudioFile(). Currently only Gladia supports this - other providers discard audio after processing.
Example
if (item.data?.metadata?.audioFileAvailable) {
const audio = await gladiaAdapter.getAudioFile(item.data.id)
// audio.data is a Blob
}

completedAt?
optionalcompletedAt:string
Completion timestamp (ISO 8601)
createdAt?
optionalcreatedAt:string
Creation timestamp (ISO 8601)
customMetadata?
optionalcustomMetadata:Record<string,unknown>
Custom metadata (Gladia)
displayName?
optionaldisplayName:string
Display name (Azure)
filesUrl?
optionalfilesUrl:string
Files URL (Azure)
kind?
optionalkind:"batch"|"streaming"|"pre-recorded"|"live"
Transcript type
lastActionAt?
optionallastActionAt:string
Last action timestamp (Azure)
resourceUrl?
optionalresourceUrl:string
Resource URL for the transcript
sourceAudioUrl?
optionalsourceAudioUrl:string
Original audio URL/source you provided to the API (echoed back). This is NOT a provider-hosted URL - it's what you sent when creating the transcription.
TranslationEvent
Translation event data (for real-time translation)
Properties
targetLanguage
targetLanguage:
string
Target language
translatedText
translatedText:
string
Translated text
isFinal?
optionalisFinal:boolean
Whether this is a final translation
original?
optionaloriginal:string
Original text
utteranceId?
optionalutteranceId:string
Utterance ID this translation belongs to
UnifiedTranscriptResponse
Unified transcription response with provider-specific type safety
When a specific provider is known at compile time, both raw and extended
fields will be typed with that provider's actual types.
Examples
const result: UnifiedTranscriptResponse<'assemblyai'> = await adapter.transcribe(audio);
// result.raw is typed as AssemblyAITranscript
// result.extended is typed as AssemblyAIExtendedData
const chapters = result.extended?.chapters; // AssemblyAIChapter[] | undefined
const entities = result.extended?.entities; // AssemblyAIEntity[] | undefined

const result: UnifiedTranscriptResponse<'gladia'> = await gladiaAdapter.transcribe(audio);
const translation = result.extended?.translation; // GladiaTranslation | undefined
const llmResults = result.extended?.audioToLlm; // GladiaAudioToLlmResult | undefined

const result: UnifiedTranscriptResponse = await router.transcribe(audio);
// result.raw is typed as unknown (could be any provider)
// result.extended is typed as a union of all extended types

Type Parameters
| Type Parameter | Default type | Description |
|---|---|---|
P extends TranscriptionProvider | TranscriptionProvider | The transcription provider (defaults to all providers) |
Properties
provider
provider:
P
Provider that performed the transcription
success
success:
boolean
Operation success status
data?
optionaldata:TranscriptData
Transcription data (only present on success)
error?
optionalerror:object
Error information (only present on failure)
code
code:
string
Error code (provider-specific or normalized)
message
message:
string
Human-readable error message
details?
optionaldetails:unknown
Additional error details
statusCode?
optionalstatusCode:number
HTTP status code if applicable
extended?
optional extended: P extends keyof ProviderExtendedDataMap ? ProviderExtendedDataMap[P] : unknown
Extended provider-specific data (fully typed from OpenAPI specs)
Contains rich data beyond basic transcription:
- AssemblyAI: chapters, entities, sentiment, content safety, topics
- Gladia: translation, moderation, entities, audio-to-llm, chapters
- Deepgram: detailed metadata, request tracking, model info
Example
const result = await assemblyaiAdapter.transcribe(audio, { summarization: true });
result.extended?.chapters?.forEach(chapter => {
console.log(`${chapter.headline}: ${chapter.summary}`);
});

raw?
optional raw: P extends keyof ProviderRawResponseMap ? ProviderRawResponseMap[P] : unknown
Raw provider response (for advanced usage)
Type-safe based on the provider:
- gladia: PreRecordedResponse
- deepgram: ListenV1Response
- openai-whisper: CreateTranscription200One
- assemblyai: AssemblyAITranscript
- azure-stt: AzureTranscription
tracking?
optionaltracking:object
Request tracking information for debugging
audioHash?
optionalaudioHash:string
Audio fingerprint (SHA256) if available
processingTimeMs?
optionalprocessingTimeMs:number
Processing duration in milliseconds
requestId?
optionalrequestId:string
Provider's request/job ID
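A small sketch of handling the failure branch; the fields follow the shapes documented above, and where the information is logged or reported is up to the caller.

```ts
import type { UnifiedTranscriptResponse } from 'voice-router-dev';

function reportFailure(result: UnifiedTranscriptResponse): void {
  if (result.success) return;

  // error is only present on failure; tracking may carry the provider's request ID.
  console.error(
    `[${result.provider}] ${result.error?.code ?? 'unknown'}: ${result.error?.message ?? ''}`,
    {
      statusCode: result.error?.statusCode,
      requestId: result.tracking?.requestId,
    }
  );
}
```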
Utterance
Utterance (sentence or phrase by a single speaker)
Normalized from provider-specific types:
- Gladia: UtteranceDTO
- AssemblyAI: TranscriptUtterance
- Deepgram: ListenV1ResponseResultsUtterancesItem
Properties
end
end:
number
End time in seconds
start
start:
number
Start time in seconds
text
text:
string
The transcribed text
channel?
optionalchannel:number
Audio channel number (for multi-channel/stereo recordings)
Channel numbering varies by provider:
- AssemblyAI: 1=left, 2=right, sequential for additional channels
- Deepgram: 0-indexed channel number
- Gladia: 0-indexed channel number
confidence?
optionalconfidence:number
Confidence score (0-1)
id?
optionalid:string
Unique utterance identifier (provider-assigned)
Available from: Deepgram. Useful for linking utterances to other data (entities, sentiment, etc.)
language?
optionallanguage:string
Detected language for this utterance (BCP-47 code)
Available from: Gladia (with code-switching enabled). Essential for multilingual transcription where language changes mid-conversation.
Example
'en', 'es', 'fr', 'de'

See
TranscriptionLanguage for full list of supported codes
speaker?
optionalspeaker:string
Speaker ID
words?
optionalwords:Word[]
Words in this utterance
Word
Word-level transcription with timing
Normalized from provider-specific types:
- Gladia: WordDTO
- AssemblyAI: TranscriptWord
- Deepgram: ListenV1ResponseResultsChannelsItemAlternativesItemWordsItem
Properties
end
end:
number
End time in seconds
start
start:
number
Start time in seconds
word
word:
string
The transcribed word
channel?
optionalchannel:number
Audio channel number (for multi-channel/stereo recordings)
Channel numbering varies by provider:
- AssemblyAI: 1=left, 2=right, sequential for additional channels
- Deepgram: 0-indexed channel number
- Gladia: 0-indexed channel number
confidence?
optionalconfidence:number
Confidence score (0-1)
speaker?
optionalspeaker:string
Speaker ID if diarization is enabled
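Because Utterance and Word are normalized across providers, rendering a speaker-labelled transcript works the same regardless of which provider produced it. The formatting below is purely illustrative.

```ts
import type { TranscriptData } from 'voice-router-dev';

function renderTranscript(data: TranscriptData): string {
  return (data.utterances ?? [])
    .map((u) => {
      const speaker = u.speaker ?? 'unknown';
      const stamp = `${u.start.toFixed(1)}s-${u.end.toFixed(1)}s`;
      return `[${stamp}] ${speaker}: ${u.text}`;
    })
    .join('\n');
}

// Word-level timings are only present when wordTimestamps was enabled.
function firstLowConfidenceWord(data: TranscriptData, threshold = 0.5) {
  return data.words?.find((w) => (w.confidence ?? 1) < threshold);
}
```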
Type Aliases
AudioInput
AudioInput =
AudioInputUrl|AudioInputFile|AudioInputStream
Union of all audio input types
BatchOnlyProvider
BatchOnlyProvider =
BatchOnlyProviderType
Providers that only support batch/async transcription
Automatically derived from providers where streaming is false or undefined. Note: Speechmatics has a WebSocket API but streaming is not yet implemented in this SDK.
ProviderExtendedDataMap
ProviderExtendedDataMap =
object
Map of provider names to their extended data types
Properties
assemblyai
assemblyai:
AssemblyAIExtendedData
azure-stt
azure-stt:
Record<string,never>
deepgram
deepgram:
DeepgramExtendedData
gladia
gladia:
GladiaExtendedData
openai-whisper
openai-whisper:
Record<string,never>
soniox
soniox:
Record<string,never>
speechmatics
speechmatics:
Record<string,never>
ProviderRawResponseMap
ProviderRawResponseMap =
object
Map of provider names to their raw response types. Enables type-safe access to provider-specific raw responses.
Properties
assemblyai
assemblyai:
Transcript
azure-stt
azure-stt:
AzureTranscription
deepgram
deepgram:
ListenV1Response
gladia
gladia:
PreRecordedResponse
openai-whisper
openai-whisper:
CreateTranscription200One
soniox
soniox:
unknown
speechmatics
speechmatics:
unknown
SessionStatus
SessionStatus =
"connecting"|"open"|"closing"|"closed"
WebSocket session status for streaming transcription
SpeechmaticsOperatingPoint
SpeechmaticsOperatingPoint =
"standard"|"enhanced"
Speechmatics operating point (model) type. Manually defined because the Speechmatics OpenAPI spec doesn't export this cleanly.
StreamEventType
StreamEventType =
"open"|"transcript"|"utterance"|"metadata"|"error"|"close"|"speech_start"|"speech_end"|"translation"|"sentiment"|"entity"|"summarization"|"chapterization"|"audio_ack"|"lifecycle"
Streaming transcription event types
StreamingProvider
StreamingProvider =
StreamingProviderType
Providers that support real-time streaming transcription
This type is automatically derived from ProviderCapabilitiesMap.streaming in provider-metadata.ts
No manual sync needed - if you set streaming: true for a provider, it's included here.
TranscriptionLanguage
TranscriptionLanguage =
AssemblyAILanguageCode|GladiaLanguageCode|string
Unified transcription language type with autocomplete for all providers
Includes language codes from AssemblyAI and Gladia OpenAPI specs. Deepgram uses string for flexibility.
TranscriptionModel
TranscriptionModel =
DeepgramModelType|StreamingSupportedModels|AssemblyAISpeechModel|SpeechmaticsOperatingPoint
Unified transcription model type with autocomplete for all providers
Strict union type - only accepts valid models from each provider:
- Deepgram: nova-3, nova-2, enhanced, base, etc.
- AssemblyAI: best, slam-1, universal
- Gladia: solaria-1
- Speechmatics: standard, enhanced
Use provider const objects for autocomplete:
Example
import { DeepgramModel } from 'voice-router-dev'
{ model: DeepgramModel["nova-3"] }

TranscriptionProvider
TranscriptionProvider =
"gladia"|"assemblyai"|"deepgram"|"openai-whisper"|"azure-stt"|"speechmatics"|"soniox"
Supported transcription provider identifiers
TranscriptionStatus
TranscriptionStatus =
"queued"|"processing"|"completed"|"error"
Transcription status