VoiceRouter

Voice Router SDK - Core API

router/types

Interfaces

AssemblyAIExtendedData

Extended data from AssemblyAI transcription. Includes chapters, entities, sentiment, content safety, and more.

Properties

chapters?

optional chapters: Chapter[]

Auto-generated chapters with summaries

contentSafety?

optional contentSafety: ContentSafetyLabelsResult

Content safety/moderation labels

entities?

optional entities: Entity[]

Detected named entities (people, organizations, locations)

highlights?

optional highlights: AutoHighlightsResult

Key phrases and highlights

languageConfidence?

optional languageConfidence: number

Language detection confidence (0-1)

sentimentResults?

optional sentimentResults: SentimentAnalysisResult[]

Per-utterance sentiment analysis results

throttled?

optional throttled: boolean

Whether the request was throttled

topics?

optional topics: TopicDetectionModelResult

IAB topic categories

AudioAckEvent

Audio chunk acknowledgment event

Properties

byteRange?

optional byteRange: [number, number]

Byte range of the acknowledged audio chunk [start, end]

timeRange?

optional timeRange: [number, number]

Time range in seconds of the acknowledged audio chunk [start, end]

timestamp?

optional timestamp: string

Acknowledgment timestamp

AudioChunk

Audio chunk for streaming transcription

Properties

data

data: Buffer<ArrayBufferLike> | Uint8Array<ArrayBufferLike>

Audio data as Buffer or Uint8Array

isLast?

optional isLast: boolean

Whether this is the last chunk
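
As a sketch, a chunk wraps one buffer of audio and the last chunk can be flagged with isLast; the source buffer below is illustrative, and an empty payload for the terminating chunk is an assumption.

// Sketch: building AudioChunk values (the buffer here is illustrative silence).
const pcmBuffer = new Uint8Array(3200); // ~100 ms of 16 kHz, 16-bit mono PCM
const chunk: AudioChunk = { data: pcmBuffer };

// Final chunk; an empty payload is assumed to be acceptable for the terminator.
const finalChunk: AudioChunk = { data: new Uint8Array(0), isLast: true };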

ChapterizationEvent

Post-processing chapterization event

Properties

chapters

chapters: object[]

Generated chapters

end

end: number

End time in seconds

headline

headline: string

Chapter title/headline

start

start: number

Start time in seconds

summary

summary: string

Chapter summary

error?

optional error: string

Error if chapterization failed

DeepgramExtendedData

Extended data from Deepgram transcription. Includes detailed metadata, model info, and feature-specific data.

Properties

metadata?

optional metadata: ListenV1ResponseMetadata

Full response metadata

modelInfo?

optional modelInfo: Record<string, unknown>

Model versions used

requestId?

optional requestId: string

Request ID for debugging/tracking

sha256?

optional sha256: string

SHA256 hash of the audio

tags?

optional tags: string[]

Tags echoed back from request

EntityEvent

Named entity recognition result

Properties

text

text: string

Entity text

type

type: string

Entity type (PERSON, ORGANIZATION, LOCATION, etc.)

end?

optional end: number

End position

start?

optional start: number

Start position

utteranceId?

optional utteranceId: string

Utterance ID this entity belongs to

GladiaExtendedData

Extended data from Gladia transcription. Includes translation, moderation, entities, LLM outputs, and more.

Properties

audioToLlm?

optional audioToLlm: AudioToLlmListDTO

Audio-to-LLM custom prompt results

chapters?

optional chapters: ChapterizationDTO

Auto-generated chapters

customMetadata?

optional customMetadata: Record<string, unknown>

Custom metadata echoed back

entities?

optional entities: NamedEntityRecognitionDTO

Named entity recognition results

moderation?

optional moderation: ModerationDTO

Content moderation results

sentiment?

optional sentiment: SentimentAnalysisDTO

Sentiment analysis results

speakerReidentification?

optional speakerReidentification: SpeakerReidentificationDTO

AI speaker reidentification results

structuredData?

optional structuredData: StructuredDataExtractionDTO

Structured data extraction results

translation?

optional translation: TranslationDTO

Translation results (if translation enabled)

LifecycleEvent

Lifecycle event (session start, recording end, etc.)

Properties

eventType

eventType: "start_session" | "start_recording" | "stop_recording" | "end_recording" | "end_session"

Lifecycle event type

sessionId?

optional sessionId: string

Session ID

timestamp?

optional timestamp: string

Event timestamp

ListTranscriptsOptions

Options for listing transcripts with date/time filtering

Providers support different filtering capabilities:

  • AssemblyAI: status, created_on, before_id, after_id, throttled_only
  • Gladia: status, date, before_date, after_date, custom_metadata
  • Azure: status, skip, top, filter (OData)
  • Deepgram: start, end, status, page, request_id, endpoint (requires projectId)

Examples

await adapter.listTranscripts({
  date: '2026-01-07',           // Exact date (ISO format)
  status: 'completed',
  limit: 50
})

await adapter.listTranscripts({
  afterDate: '2026-01-01',
  beforeDate: '2026-01-31',
  limit: 100
})

Properties

afterDate?

optional afterDate: string

Filter for transcripts created after this date (ISO format)

assemblyai?

optional assemblyai: Partial<ListTranscriptsParams>

AssemblyAI-specific list options

beforeDate?

optional beforeDate: string

Filter for transcripts created before this date (ISO format)

date?

optional date: string

Filter by exact date (ISO format: YYYY-MM-DD)

deepgram?

optional deepgram: Partial<ManageV1ProjectsRequestsListParams>

Deepgram-specific list options (request history)

gladia?

optional gladia: Partial<TranscriptionControllerListV2Params>

Gladia-specific list options

limit?

optional limit: number

Maximum number of transcripts to retrieve

offset?

optional offset: number

Pagination offset (skip N results)

status?

optional status: string

Filter by transcript status

ListTranscriptsResponse

Response from listTranscripts

Example

import type { ListTranscriptsResponse } from 'voice-router-dev';

const response: ListTranscriptsResponse = await router.listTranscripts('assemblyai', {
  status: 'completed',
  limit: 50
});

response.transcripts.forEach(item => {
  console.log(item.data?.id, item.data?.status);
});

if (response.hasMore) {
  // Fetch next page
}

Properties

transcripts

transcripts: UnifiedTranscriptResponse<TranscriptionProvider>[]

List of transcripts

hasMore?

optional hasMore: boolean

Whether more results are available

total?

optional total: number

Total count (if available from provider)

ProviderCapabilities

Provider capability flags

Each boolean indicates whether the provider supports a specific feature. Use ProviderCapabilitiesMap from provider-metadata for runtime access.
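
As a sketch, the flags can gate optional features at runtime; the export path and shape of ProviderCapabilitiesMap are assumed from the note above.

// Sketch: only request features the provider actually supports.
import { ProviderCapabilitiesMap } from 'voice-router-dev'; // import path assumed
import type { ProviderCapabilities } from 'voice-router-dev';

const caps: ProviderCapabilities = ProviderCapabilitiesMap['deepgram'];
const options = {
  diarization: caps.diarization,
  summarization: caps.summarization ? true : undefined
};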

Properties

customVocabulary

customVocabulary: boolean

Custom vocabulary/keyword boosting

deleteTranscript

deleteTranscript: boolean

Delete transcriptions

diarization

diarization: boolean

Speaker diarization (identifying different speakers)

entityDetection

entityDetection: boolean

Entity detection

languageDetection

languageDetection: boolean

Automatic language detection

listTranscripts

listTranscripts: boolean

List/fetch previous transcriptions

piiRedaction

piiRedaction: boolean

PII redaction

sentimentAnalysis

sentimentAnalysis: boolean

Sentiment analysis

streaming

streaming: boolean

Real-time streaming transcription support

summarization

summarization: boolean

Audio summarization

wordTimestamps

wordTimestamps: boolean

Word-level timestamps

getAudioFile?

optional getAudioFile: boolean

Download original audio file

SentimentEvent

Sentiment analysis result (for real-time sentiment)

Properties

sentiment

sentiment: string

Sentiment label (positive, negative, neutral)

confidence?

optional confidence: number

Confidence score 0-1

utteranceId?

optional utteranceId: string

Utterance ID this sentiment belongs to

Speaker

Speaker information from diarization

Properties

id

id: string

Speaker identifier (e.g., "A", "B", "speaker_0")

confidence?

optional confidence: number

Confidence score for speaker identification (0-1)

label?

optional label: string

Speaker label if known

SpeechEvent

Speech event data (for speech_start/speech_end events)

Properties

timestamp

timestamp: number

Timestamp in seconds

type

type: "speech_start" | "speech_end"

Event type: speech_start or speech_end

channel?

optional channel: number

Channel number

sessionId?

optional sessionId: string

Session ID

StreamEvent

Streaming transcription event

Properties

type

type: StreamEventType

channel?

optional channel: number

Channel number for multi-channel audio

confidence?

optional confidence: number

Confidence score for this event

data?

optional data: unknown

Additional event data

error?

optional error: object

Error information (for type: "error")

code

code: string

message

message: string

details?

optional details: unknown

isFinal?

optional isFinal: boolean

Whether this is a final transcript (vs interim)

language?

optional language: string

Language of the transcript/utterance

speaker?

optional speaker: string

Speaker ID if diarization is enabled

text?

optional text: string

Partial transcript text (for type: "transcript")

utterance?

optional utterance: Utterance

Utterance data (for type: "utterance")

words?

optional words: Word[]

Words in this event

StreamingCallbacks

Callback functions for streaming events
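
For example, a minimal callback object covering the core events looks like the sketch below; how the object is attached to a stream depends on the adapter and is not shown here.

const callbacks: StreamingCallbacks = {
  onOpen: () => console.log('connection established'),
  onTranscript: (event) => {
    // event.isFinal distinguishes final transcripts from interim ones
    console.log(event.isFinal ? 'final:' : 'interim:', event.text);
  },
  onUtterance: (utterance) => console.log(`${utterance.speaker ?? '?'}: ${utterance.text}`),
  onError: (error) => console.error(error.code, error.message),
  onClose: (code, reason) => console.log('closed', code, reason)
};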

Properties

onAudioAck()?

optional onAudioAck: (event) => void

Called for audio chunk acknowledgments (Gladia: requires receive_acknowledgments)

Parameters

event: AudioAckEvent

Returns

void

onChapterization()?

optional onChapterization: (event) => void

Called when post-processing chapterization completes (Gladia: requires chapterization enabled)

Parameters

event: ChapterizationEvent

Returns

void

onClose()?

optional onClose: (code?, reason?) => void

Called when the stream is closed

Parameters

code?: number
reason?: string

Returns

void

onEntity()?

optional onEntity: (event) => void

Called for named entity recognition (Gladia: requires named_entity_recognition enabled)

Parameters

event: EntityEvent

Returns

void

onError()?

optional onError: (error) => void

Called when an error occurs

Parameters

error: { code: string; message: string; details?: unknown }
error.code: string
error.message: string
error.details?: unknown

Returns

void

onLifecycle()?

optional onLifecycle: (event) => void

Called for session lifecycle events (Gladia: requires receive_lifecycle_events)

Parameters

event: LifecycleEvent

Returns

void

onMetadata()?

optional onMetadata: (metadata) => void

Called when metadata is received

Parameters

metadata: Record<string, unknown>

Returns

void

onOpen()?

optional onOpen: () => void

Called when connection is established

Returns

void

onSentiment()?

optional onSentiment: (event) => void

Called for real-time sentiment analysis (Gladia: requires sentiment_analysis enabled)

Parameters

event: SentimentEvent

Returns

void

onSpeechEnd()?

optional onSpeechEnd: (event) => void

Called when speech ends (Gladia: requires receive_speech_events)

Parameters

event: SpeechEvent

Returns

void

onSpeechStart()?

optional onSpeechStart: (event) => void

Called when speech starts (Gladia: requires receive_speech_events)

Parameters

event: SpeechEvent

Returns

void

onSummarization()?

optional onSummarization: (event) => void

Called when post-processing summarization completes (Gladia: requires summarization enabled)

Parameters

event: SummarizationEvent

Returns

void

onTranscript()?

optional onTranscript: (event) => void

Called when a transcript (interim or final) is received

Parameters

event: StreamEvent

Returns

void

onTranslation()?

optional onTranslation: (event) => void

Called for real-time translation (Gladia: requires translation enabled)

Parameters

event: TranslationEvent

Returns

void

onUtterance()?

optional onUtterance: (utterance) => void

Called when a complete utterance is detected

Parameters

utterance: Utterance

Returns

void

StreamingOptions

Options for streaming transcription

Extends

  • TranscribeOptions

Properties

assemblyai?

optional assemblyai: Partial<TranscriptOptionalParams>

AssemblyAI-specific options (passed directly to API)

See

https://www.assemblyai.com/docs/api-reference/transcripts/submit

Inherited from

TranscribeOptions.assemblyai

assemblyaiStreaming?

optional assemblyaiStreaming: AssemblyAIStreamingOptions

AssemblyAI-specific streaming options (passed to WebSocket URL & configuration)

Includes end-of-turn detection tuning, VAD threshold, profanity filter, keyterms, speech model selection, and language detection.

See

https://www.assemblyai.com/docs/speech-to-text/streaming

Example
await adapter.transcribeStream({
  assemblyaiStreaming: {
    speechModel: 'universal-streaming-multilingual',
    languageDetection: true,
    endOfTurnConfidenceThreshold: 0.7,
    minEndOfTurnSilenceWhenConfident: 500,
    vadThreshold: 0.3,
    formatTurns: true,
    filterProfanity: true,
    keyterms: ['TypeScript', 'JavaScript', 'API']
  }
});

audioToLlm?

optional audioToLlm: AudioToLlmListConfigDTO

Audio-to-LLM configuration (Gladia-specific). Runs custom LLM prompts on the transcription.

See

GladiaAudioToLlmConfig

Inherited from

TranscribeOptions.audioToLlm

bitDepth?

optional bitDepth: number

Bit depth for PCM audio

Common depths: 8, 16, 24, 32. 16-bit is standard for most applications.

channels?

optional channels: number

Number of audio channels

  • 1: Mono (recommended for transcription)
  • 2: Stereo
  • 3-8: Multi-channel (provider-specific support)

codeSwitching?

optional codeSwitching: boolean

Enable code switching (multilingual audio detection). Supported by: Gladia.

Inherited from

TranscribeOptions.codeSwitching

codeSwitchingConfig?

optional codeSwitchingConfig: CodeSwitchingConfigDTO

Code switching configuration (Gladia-specific)

See

GladiaCodeSwitchingConfig

Inherited from

TranscribeOptions.codeSwitchingConfig

customVocabulary?

optional customVocabulary: string[]

Custom vocabulary to boost (provider-specific format)

Inherited from

TranscribeOptions.customVocabulary

deepgram?

optional deepgram: Partial<ListenV1MediaTranscribeParams>

Deepgram-specific options (passed directly to API)

See

https://developers.deepgram.com/reference/listen-file

Inherited from

TranscribeOptions.deepgram

deepgramStreaming?

optional deepgramStreaming: DeepgramStreamingOptions

Deepgram-specific streaming options (passed to WebSocket URL)

Includes filler_words, numerals, measurements, paragraphs, profanity_filter, topics, intents, custom_topic, custom_intent, keyterm, dictation, utt_split, and more.

See

https://developers.deepgram.com/docs/streaming

Example
await adapter.transcribeStream({
  deepgramStreaming: {
    fillerWords: true,
    profanityFilter: true,
    topics: true,
    intents: true,
    customTopic: ['sales', 'support'],
    customIntent: ['purchase', 'complaint'],
    numerals: true
  }
});

diarization?

optional diarization: boolean

Enable speaker diarization

Inherited from

TranscribeOptions.diarization

encoding?

optional encoding: AudioEncoding

Audio encoding format

Common formats:

  • linear16: PCM 16-bit (universal, recommended)
  • mulaw: μ-law telephony codec
  • alaw: A-law telephony codec
  • flac, opus, speex: Advanced codecs (Deepgram only)
See

AudioEncoding for full list of supported formats
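
These audio-format fields are typically set together for raw PCM streams; a sketch using the recommended values from this section:

// Sketch: raw PCM streaming configuration using the recommendations above.
const options: StreamingOptions = {
  encoding: 'linear16', // PCM 16-bit (universal)
  sampleRate: 16000,    // recommended by most providers
  channels: 1,          // mono
  bitDepth: 16,
  interimResults: true
};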

endpointing?

optional endpointing: number

Utterance end silence threshold in milliseconds

entityDetection?

optional entityDetection: boolean

Enable entity detection

Inherited from

TranscribeOptions.entityDetection

gladia?

optional gladia: Partial<InitTranscriptionRequest>

Gladia-specific options (passed directly to API)

See

https://docs.gladia.io/

Inherited from

TranscribeOptions.gladia

gladiaStreaming?

optional gladiaStreaming: Partial<Omit<StreamingRequest, "encoding" | "channels" | "sample_rate" | "bit_depth">>

Gladia-specific streaming options (passed directly to API)

Includes pre_processing, realtime_processing, post_processing, messages_config, and callback configuration.

See

https://docs.gladia.io/api-reference/v2/live

Example
await adapter.transcribeStream({
  gladiaStreaming: {
    realtime_processing: {
      words_accurate_timestamps: true
    },
    messages_config: {
      receive_partial_transcripts: true
    }
  }
});

interimResults?

optional interimResults: boolean

Enable interim results (partial transcripts)

language?

optional language: string

Language code with autocomplete from OpenAPI specs

Example
'en', 'en_us', 'fr', 'de', 'es'
See

TranscriptionLanguage for full list

Inherited from

TranscribeOptions.language

languageDetection?

optional languageDetection: boolean

Enable automatic language detection

Inherited from

TranscribeOptions.languageDetection

maxSilence?

optional maxSilence: number

Maximum duration without endpointing in seconds

model?

optional model: TranscriptionModel

Model to use for transcription (provider-specific)

Type-safe with autocomplete for all known models:

  • Deepgram: 'nova-2', 'nova-3', 'base', 'enhanced', 'whisper-large', etc.
  • Gladia: 'solaria-1' (default)
  • AssemblyAI: Not applicable (uses Universal-2 automatically)
Example
// Use Nova-2 for better multilingual support
{ model: 'nova-2', language: 'fr' }
Overrides

TranscribeOptions.model

openai?

optional openai: Partial<Omit<CreateTranscriptionRequest, "model" | "file">>

OpenAI Whisper-specific options (passed directly to API)

See

https://platform.openai.com/docs/api-reference/audio/createTranscription

Inherited from

TranscribeOptions.openai

openaiStreaming?

optional openaiStreaming: OpenAIStreamingOptions

OpenAI Realtime API streaming options

Configure the OpenAI Realtime WebSocket connection for audio transcription. Uses the Realtime API which supports real-time audio input transcription.

See

https://platform.openai.com/docs/guides/realtime

Example
await adapter.transcribeStream({
  openaiStreaming: {
    model: 'gpt-4o-realtime-preview',
    voice: 'alloy',
    turnDetection: {
      type: 'server_vad',
      threshold: 0.5,
      silenceDurationMs: 500
    }
  }
});

piiRedaction?

optional piiRedaction: boolean

Enable PII redaction

Inherited from

TranscribeOptions.piiRedaction

region?

optional region: StreamingSupportedRegions

Regional endpoint for streaming (Gladia only)

Gladia supports regional streaming endpoints for lower latency:

  • us-west: US West Coast
  • eu-west: EU West (Ireland)
Example
import { GladiaRegion } from 'voice-router-dev/constants'

await adapter.transcribeStream({
  region: GladiaRegion["us-west"]
})
See

https://docs.gladia.io/api-reference/v2/live

sampleRate?

optional sampleRate: number

Sample rate in Hz

Common rates: 8000, 16000, 32000, 44100, 48000. Most providers recommend 16000 Hz for optimal quality/performance.

sentimentAnalysis?

optional sentimentAnalysis: boolean

Enable sentiment analysis

Inherited from

TranscribeOptions.sentimentAnalysis

sonioxStreaming?

optional sonioxStreaming: SonioxStreamingOptions

Soniox-specific streaming options

Configure the Soniox WebSocket connection for real-time transcription. Supports speaker diarization, language identification, translation, and custom context.

See

https://soniox.com/docs/stt/SDKs/web-sdk

Example
await adapter.transcribeStream({
  sonioxStreaming: {
    model: 'stt-rt-preview',
    enableSpeakerDiarization: true,
    enableEndpointDetection: true,
    context: {
      terms: ['TypeScript', 'React'],
      text: 'Technical discussion'
    },
    translation: { type: 'one_way', target_language: 'es' }
  }
});

speakersExpected?

optional speakersExpected: number

Expected number of speakers (for diarization)

Inherited from

TranscribeOptions.speakersExpected

summarization?

optional summarization: boolean

Enable summarization

Inherited from

TranscribeOptions.summarization

wordTimestamps?

optional wordTimestamps: boolean

Enable word-level timestamps

Inherited from

TranscribeOptions.wordTimestamps

StreamingSession

Represents an active streaming transcription session

Properties

close()

close: () => Promise<void>

Close the streaming session

Returns

Promise<void>

createdAt

createdAt: Date

Session creation timestamp

getStatus()

getStatus: () => "open" | "connecting" | "closing" | "closed"

Get current session status

Returns

"open" | "connecting" | "closing" | "closed"

id

id: string

Unique session ID

provider

provider: TranscriptionProvider

Provider handling this stream

sendAudio()

sendAudio: (chunk) => Promise<void>

Send an audio chunk to the stream

Parameters

chunk: AudioChunk

Returns

Promise<void>
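
A sketch of a full session lifecycle using the members above; obtaining the session from transcribeStream is an assumption about the adapter's return value, and the audio source is illustrative.

// Assumed: adapter.transcribeStream(...) resolves to a StreamingSession.
const session: StreamingSession = await adapter.transcribeStream({
  encoding: 'linear16',
  sampleRate: 16000
});

const audioBuffers = [new Uint8Array(3200)]; // illustrative audio source
for (const buffer of audioBuffers) {
  await session.sendAudio({ data: buffer });
}
await session.sendAudio({ data: new Uint8Array(0), isLast: true }); // empty terminator assumed acceptable

if (session.getStatus() !== 'closed') {
  await session.close();
}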

SummarizationEvent

Post-processing summarization event

Properties

summary

summary: string

Full summarization text

error?

optional error: string

Error if summarization failed

TranscribeOptions

Common transcription options across all providers

For provider-specific options, use the typed provider options:

  • deepgram: Full Deepgram API options
  • assemblyai: Full AssemblyAI API options
  • gladia: Full Gladia API options
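
For example, the common flags can be combined with a provider-specific block that is passed through to the provider API; the auto_chapters field is an AssemblyAI API parameter shown here purely as an illustration.

const options: TranscribeOptions = {
  language: 'en',
  diarization: true,
  wordTimestamps: true,
  summarization: true,
  // Provider-specific passthrough (AssemblyAI field name, illustrative).
  assemblyai: { auto_chapters: true }
};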

Properties

assemblyai?

optional assemblyai: Partial<TranscriptOptionalParams>

AssemblyAI-specific options (passed directly to API)

See

https://www.assemblyai.com/docs/api-reference/transcripts/submit

audioToLlm?

optional audioToLlm: AudioToLlmListConfigDTO

Audio-to-LLM configuration (Gladia-specific). Runs custom LLM prompts on the transcription.

See

GladiaAudioToLlmConfig

codeSwitching?

optional codeSwitching: boolean

Enable code switching (multilingual audio detection). Supported by: Gladia.

codeSwitchingConfig?

optional codeSwitchingConfig: CodeSwitchingConfigDTO

Code switching configuration (Gladia-specific)

See

GladiaCodeSwitchingConfig

customVocabulary?

optional customVocabulary: string[]

Custom vocabulary to boost (provider-specific format)

deepgram?

optional deepgram: Partial<ListenV1MediaTranscribeParams>

Deepgram-specific options (passed directly to API)

See

https://developers.deepgram.com/reference/listen-file

diarization?

optional diarization: boolean

Enable speaker diarization

entityDetection?

optional entityDetection: boolean

Enable entity detection

gladia?

optional gladia: Partial<InitTranscriptionRequest>

Gladia-specific options (passed directly to API)

See

https://docs.gladia.io/

language?

optional language: string

Language code with autocomplete from OpenAPI specs

Example
'en', 'en_us', 'fr', 'de', 'es'
See

TranscriptionLanguage for full list

languageDetection?

optional languageDetection: boolean

Enable automatic language detection

model?

optional model: TranscriptionModel

Model to use for transcription (provider-specific)

Type-safe model selection derived from OpenAPI specs:

  • Deepgram: 'nova-3', 'nova-2', 'enhanced', 'base', etc.
  • AssemblyAI: 'best', 'slam-1', 'universal'
  • Speechmatics: 'standard', 'enhanced' (operating point)
  • Gladia: 'solaria-1' (streaming only)
See

TranscriptionModel for full list of available models

openai?

optional openai: Partial<Omit<CreateTranscriptionRequest, "model" | "file">>

OpenAI Whisper-specific options (passed directly to API)

See

https://platform.openai.com/docs/api-reference/audio/createTranscription

piiRedaction?

optional piiRedaction: boolean

Enable PII redaction

sentimentAnalysis?

optional sentimentAnalysis: boolean

Enable sentiment analysis

speakersExpected?

optional speakersExpected: number

Expected number of speakers (for diarization)

summarization?

optional summarization: boolean

Enable summarization

webhookUrl?

optional webhookUrl: string

Webhook URL for async results

wordTimestamps?

optional wordTimestamps: boolean

Enable word-level timestamps

TranscriptData

Transcript data structure

Contains the core transcript information returned by getTranscript and listTranscripts.

Example

const result = await router.getTranscript('abc123', 'assemblyai');
if (result.success && result.data) {
  console.log(result.data.id);           // string
  console.log(result.data.text);         // string
  console.log(result.data.status);       // TranscriptionStatus
  console.log(result.data.metadata);     // TranscriptMetadata
}

Properties

id

id: string

Unique transcript ID

status

status: TranscriptionStatus

Transcription status

text

text: string

Full transcribed text (empty for list items)

completedAt?

optional completedAt: string

Completion timestamp (shorthand for metadata.completedAt)

confidence?

optional confidence: number

Overall confidence score (0-1)

createdAt?

optional createdAt: string

Creation timestamp (shorthand for metadata.createdAt)

duration?

optional duration: number

Audio duration in seconds

language?

optional language: string

Detected or specified language code

metadata?

optional metadata: TranscriptMetadata

Transcript metadata

speakers?

optional speakers: Speaker[]

Speaker diarization results

summary?

optional summary: string

Summary of the content (if summarization enabled)

utterances?

optional utterances: Utterance[]

Utterances (speaker turns)

words?

optional words: Word[]

Word-level transcription with timestamps

TranscriptMetadata

Transcript metadata with typed common fields

Contains provider-agnostic metadata fields that are commonly available. Provider-specific fields can be accessed via the index signature.

Example

const { transcripts } = await router.listTranscripts('assemblyai', { limit: 20 });
transcripts.forEach(item => {
  console.log(item.data?.metadata?.audioUrl);     // string | undefined
  console.log(item.data?.metadata?.createdAt);    // string | undefined
  console.log(item.data?.metadata?.audioDuration); // number | undefined
});

Indexable

[key: string]: unknown

Provider-specific fields

Properties

audioDuration?

optional audioDuration: number

Audio duration in seconds

audioFileAvailable?

optional audioFileAvailable: boolean

True if the provider stored the audio and it can be downloaded via adapter.getAudioFile(). Currently only Gladia supports this - other providers discard audio after processing.

Example
if (item.data?.metadata?.audioFileAvailable) {
  const audio = await gladiaAdapter.getAudioFile(item.data.id)
  // audio.data is a Blob
}

completedAt?

optional completedAt: string

Completion timestamp (ISO 8601)

createdAt?

optional createdAt: string

Creation timestamp (ISO 8601)

customMetadata?

optional customMetadata: Record<string, unknown>

Custom metadata (Gladia)

displayName?

optional displayName: string

Display name (Azure)

filesUrl?

optional filesUrl: string

Files URL (Azure)

kind?

optional kind: "batch" | "streaming" | "pre-recorded" | "live"

Transcript type

lastActionAt?

optional lastActionAt: string

Last action timestamp (Azure)

resourceUrl?

optional resourceUrl: string

Resource URL for the transcript

sourceAudioUrl?

optional sourceAudioUrl: string

Original audio URL/source you provided to the API (echoed back). This is NOT a provider-hosted URL - it's what you sent when creating the transcription.

TranslationEvent

Translation event data (for real-time translation)

Properties

targetLanguage

targetLanguage: string

Target language

translatedText

translatedText: string

Translated text

isFinal?

optional isFinal: boolean

Whether this is a final translation

original?

optional original: string

Original text

utteranceId?

optional utteranceId: string

Utterance ID this translation belongs to

UnifiedTranscriptResponse

Unified transcription response with provider-specific type safety

When a specific provider is known at compile time, both raw and extended fields will be typed with that provider's actual types.

Examples

const result: UnifiedTranscriptResponse<'assemblyai'> = await adapter.transcribe(audio);
// result.raw is typed as AssemblyAITranscript
// result.extended is typed as AssemblyAIExtendedData
const chapters = result.extended?.chapters; // AssemblyAIChapter[] | undefined
const entities = result.extended?.entities; // AssemblyAIEntity[] | undefined

const result: UnifiedTranscriptResponse<'gladia'> = await gladiaAdapter.transcribe(audio);
const translation = result.extended?.translation; // GladiaTranslation | undefined
const llmResults = result.extended?.audioToLlm; // GladiaAudioToLlmResult | undefined

const result: UnifiedTranscriptResponse = await router.transcribe(audio);
// result.raw is typed as unknown (could be any provider)
// result.extended is typed as union of all extended types

Type Parameters

P extends TranscriptionProvider
Default type: TranscriptionProvider
Description: The transcription provider (defaults to all providers)

Properties

provider

provider: P

Provider that performed the transcription

success

success: boolean

Operation success status

data?

optional data: TranscriptData

Transcription data (only present on success)

error?

optional error: object

Error information (only present on failure)

code

code: string

Error code (provider-specific or normalized)

message

message: string

Human-readable error message

details?

optional details: unknown

Additional error details

statusCode?

optional statusCode: number

HTTP status code if applicable

extended?

optional extended: P extends keyof ProviderExtendedDataMap ? ProviderExtendedDataMap[P] : unknown

Extended provider-specific data (fully typed from OpenAPI specs)

Contains rich data beyond basic transcription:

  • AssemblyAI: chapters, entities, sentiment, content safety, topics
  • Gladia: translation, moderation, entities, audio-to-llm, chapters
  • Deepgram: detailed metadata, request tracking, model info
Example
const result = await assemblyaiAdapter.transcribe(audio, { summarization: true });
result.extended?.chapters?.forEach(chapter => {
  console.log(`${chapter.headline}: ${chapter.summary}`);
});

raw?

optional raw: P extends keyof ProviderRawResponseMap ? ProviderRawResponseMap[P] : unknown

Raw provider response (for advanced usage)

Type-safe based on the provider:

  • gladia: PreRecordedResponse
  • deepgram: ListenV1Response
  • openai-whisper: CreateTranscription200One
  • assemblyai: AssemblyAITranscript
  • azure-stt: AzureTranscription

tracking?

optional tracking: object

Request tracking information for debugging

audioHash?

optional audioHash: string

Audio fingerprint (SHA256) if available

processingTimeMs?

optional processingTimeMs: number

Processing duration in milliseconds

requestId?

optional requestId: string

Provider's request/job ID

Utterance

Utterance (sentence or phrase by a single speaker)

Normalized from provider-specific types:

  • Gladia: UtteranceDTO
  • AssemblyAI: TranscriptUtterance
  • Deepgram: ListenV1ResponseResultsUtterancesItem
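
For example, the normalized fields make it straightforward to render a speaker-labelled transcript regardless of provider:

// Sketch: render utterances as a speaker-labelled transcript.
function formatUtterances(utterances: Utterance[]): string {
  return utterances
    .map(u => `[${u.start.toFixed(1)}s] ${u.speaker ?? 'unknown'}: ${u.text}`)
    .join('\n');
}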

Properties

end

end: number

End time in seconds

start

start: number

Start time in seconds

text

text: string

The transcribed text

channel?

optional channel: number

Audio channel number (for multi-channel/stereo recordings)

Channel numbering varies by provider:

  • AssemblyAI: 1=left, 2=right, sequential for additional channels
  • Deepgram: 0-indexed channel number
  • Gladia: 0-indexed channel number

confidence?

optional confidence: number

Confidence score (0-1)

id?

optional id: string

Unique utterance identifier (provider-assigned)

Available from: Deepgram. Useful for linking utterances to other data (entities, sentiment, etc.).

language?

optional language: string

Detected language for this utterance (BCP-47 code)

Available from: Gladia (with code-switching enabled). Essential for multilingual transcription where the language changes mid-conversation.

Example
'en', 'es', 'fr', 'de'
See

TranscriptionLanguage for full list of supported codes

speaker?

optional speaker: string

Speaker ID

words?

optional words: Word[]

Words in this utterance

Word

Word-level transcription with timing

Normalized from provider-specific types:

  • Gladia: WordDTO
  • AssemblyAI: TranscriptWord
  • Deepgram: ListenV1ResponseResultsChannelsItemAlternativesItemWordsItem

Properties

end

end: number

End time in seconds

start

start: number

Start time in seconds

word

word: string

The transcribed word

channel?

optional channel: number

Audio channel number (for multi-channel/stereo recordings)

Channel numbering varies by provider:

  • AssemblyAI: 1=left, 2=right, sequential for additional channels
  • Deepgram: 0-indexed channel number
  • Gladia: 0-indexed channel number

confidence?

optional confidence: number

Confidence score (0-1)

speaker?

optional speaker: string

Speaker ID if diarization is enabled

Type Aliases

AudioInput

AudioInput = AudioInputUrl | AudioInputFile | AudioInputStream

Union of all audio input types

BatchOnlyProvider

BatchOnlyProvider = BatchOnlyProviderType

Providers that only support batch/async transcription

Automatically derived from providers where streaming is false or undefined. Note: Speechmatics has a WebSocket API but streaming is not yet implemented in this SDK.

ProviderExtendedDataMap

ProviderExtendedDataMap = object

Map of provider names to their extended data types

Properties

assemblyai

assemblyai: AssemblyAIExtendedData

azure-stt

azure-stt: Record<string, never>

deepgram

deepgram: DeepgramExtendedData

gladia

gladia: GladiaExtendedData

openai-whisper

openai-whisper: Record<string, never>

soniox

soniox: Record<string, never>

speechmatics

speechmatics: Record<string, never>

ProviderRawResponseMap

ProviderRawResponseMap = object

Map of provider names to their raw response types. Enables type-safe access to provider-specific raw responses.

Properties

assemblyai

assemblyai: AssemblyAITranscript

azure-stt

azure-stt: AzureTranscription

deepgram

deepgram: ListenV1Response

gladia

gladia: PreRecordedResponse

openai-whisper

openai-whisper: CreateTranscription200One

soniox

soniox: unknown

speechmatics

speechmatics: unknown

SessionStatus

SessionStatus = "connecting" | "open" | "closing" | "closed"

WebSocket session status for streaming transcription

SpeechmaticsOperatingPoint

SpeechmaticsOperatingPoint = "standard" | "enhanced"

Speechmatics operating point (model) type. Manually defined because the Speechmatics OpenAPI spec doesn't export this cleanly.

StreamEventType

StreamEventType = "open" | "transcript" | "utterance" | "metadata" | "error" | "close" | "speech_start" | "speech_end" | "translation" | "sentiment" | "entity" | "summarization" | "chapterization" | "audio_ack" | "lifecycle"

Streaming transcription event types
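
A sketch of dispatching on the event type from a single handler, using only the documented StreamEvent fields; only a subset of the event types is handled here.

function handleStreamEvent(event: StreamEvent): void {
  switch (event.type) {
    case 'transcript':
      console.log(event.isFinal ? 'final:' : 'interim:', event.text);
      break;
    case 'utterance':
      console.log('utterance:', event.utterance?.text);
      break;
    case 'error':
      console.error(event.error?.code, event.error?.message);
      break;
    case 'close':
      console.log('stream closed');
      break;
    default:
      break; // speech_start, translation, sentiment, etc. handled as needed
  }
}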

StreamingProvider

StreamingProvider = StreamingProviderType

Providers that support real-time streaming transcription

This type is automatically derived from ProviderCapabilitiesMap.streaming in provider-metadata.ts. No manual sync is needed: if you set streaming: true for a provider, it is included here.

TranscriptionLanguage

TranscriptionLanguage = AssemblyAILanguageCode | GladiaLanguageCode | string

Unified transcription language type with autocomplete for all providers

Includes language codes from AssemblyAI and Gladia OpenAPI specs. Deepgram uses string for flexibility.

TranscriptionModel

TranscriptionModel = DeepgramModelType | StreamingSupportedModels | AssemblyAISpeechModel | SpeechmaticsOperatingPoint

Unified transcription model type with autocomplete for all providers

Strict union type - only accepts valid models from each provider:

  • Deepgram: nova-3, nova-2, enhanced, base, etc.
  • AssemblyAI: best, slam-1, universal
  • Gladia: solaria-1
  • Speechmatics: standard, enhanced

Use provider const objects for autocomplete:

Example

import { DeepgramModel } from 'voice-router-dev'
{ model: DeepgramModel["nova-3"] }

TranscriptionProvider

TranscriptionProvider = "gladia" | "assemblyai" | "deepgram" | "openai-whisper" | "azure-stt" | "speechmatics" | "soniox"

Supported transcription provider identifiers

TranscriptionStatus

TranscriptionStatus = "queued" | "processing" | "completed" | "error"

Transcription status
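
For batch jobs the status settles at completed or error; a sketch that polls getTranscript (as used in the TranscriptData example) until then, with an assumed 3-second interval.

// Sketch: poll a batch transcript until it reaches a terminal status.
async function waitForCompletion(id: string) {
  for (;;) {
    const result = await router.getTranscript(id, 'assemblyai');
    const status = result.data?.status;
    if (status === 'completed' || status === 'error') return result;
    await new Promise(resolve => setTimeout(resolve, 3000)); // assumed poll interval
  }
}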
