VoiceRouter

Changelog

All notable changes to the VoiceRouter SDK


[0.8.0] - 2026-01-27

Fixed

WebSocket/Streaming URLs Now Respect baseUrl Override

Previously, setting config.baseUrl only affected REST/HTTP calls. WebSocket streaming URLs remained hardcoded or region-derived, so streaming traffic still went to production endpoints even when pointing at a proxy, mock, or private deployment.

Now all streaming adapters respect baseUrl (and the new wsBaseUrl for explicit WS override):

// All traffic (REST + WebSocket) goes to your custom endpoint
adapter.initialize({
  apiKey: 'test-key',
  baseUrl: 'https://my-proxy.internal:8080'
})

// Or override WS separately (e.g., proxy returns public WS URL you don't want)
adapter.initialize({
  apiKey: 'test-key',
  baseUrl: 'https://my-proxy.internal:8080',
  wsBaseUrl: 'wss://my-proxy.internal:8080'
})

Priority: wsBaseUrl > derived from baseUrl (https→wss) > region default > hardcoded default
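The resolution order can be sketched as a small helper; `resolveWsUrl`, `regionDefaults`, and `hardcodedDefault` are illustrative names, not SDK internals:

```typescript
// Hypothetical sketch of the documented priority chain; not the SDK's actual code.
interface WsConfig {
  wsBaseUrl?: string
  baseUrl?: string
  region?: string
}

function resolveWsUrl(
  cfg: WsConfig,
  regionDefaults: Record<string, string>,
  hardcodedDefault: string,
): string {
  // 1. Explicit WS override wins
  if (cfg.wsBaseUrl) return cfg.wsBaseUrl
  // 2. Derive from baseUrl: https → wss (and http → ws)
  if (cfg.baseUrl) return cfg.baseUrl.replace(/^https:/, 'wss:').replace(/^http:/, 'ws:')
  // 3. Region default, if a region was configured
  if (cfg.region && regionDefaults[cfg.region]) return regionDefaults[cfg.region]
  // 4. Hardcoded production default
  return hardcodedDefault
}
```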

Adapter changes:

  • Deepgram: WS URL was always built from region and ignored baseUrl. Now derives the WS URL from baseUrl; setRegion() skips when explicit URLs are set.
  • AssemblyAI: WS URL was hardcoded to streaming.assemblyai.com. Now derives the WS URL from baseUrl + /v3/ws.
  • Soniox: both REST and WS ignored baseUrl. Now the baseUrl getter and the WS URL both respect the config.
  • Gladia: WS URL came from the API response (correct for prod). Now the API response URL is overridable with wsBaseUrl.

Speechmatics Real-Time Spec Sync Fixed

The speechmaticsAsync spec sync was failing with HTTP 404 because Speechmatics moved the real-time AsyncAPI spec from their JS SDK repo to their docs repo (commit fb21f29).

  • Old URL: speechmatics/speechmatics-js-sdk/.../packages/real-time-client/schema/realtime.yml (deleted)
  • New URL: speechmatics/docs/.../spec/realtime.yaml (AsyncAPI 3.0.0)

The updated spec includes new fields: channel_diarization_labels, get_speakers, channel and channel_and_speaker diarization modes, and entity recognition result type.
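For illustration, a real-time config exercising the newly synced fields might look like this (a sketch based on the field names above; the exact nesting is an assumption, consult the Speechmatics spec):

```typescript
// Illustrative config using the newly synced spec fields; the nesting under
// transcription_config is an assumption, not copied from the Speechmatics spec.
const realtimeConfig = {
  transcription_config: {
    language: 'en',
    diarization: 'channel_and_speaker', // new mode alongside 'channel'
    channel_diarization_labels: ['agent', 'caller'],
  },
}
```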

custom_metadata Now Correctly Typed as Object

The custom_metadata field in Gladia (and other providers) was incorrectly typed as string in field metadata. It's now correctly typed as object with inputFormat: "json".

// Before (0.7.8)
{ name: "custom_metadata", type: "string", ... }

// After (0.8.0)
{ name: "custom_metadata", type: "object", inputFormat: "json", ... }

This allows UI importers to properly handle nested properties like custom_metadata.folder_id instead of treating them as unrecognized fields.

Technical fix: Added ZodRecord handling in zodToFieldConfigs() - record types like zod.record(zod.string(), zod.any()) are now recognized as objects with JSON input format.
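A minimal sketch of that mapping (illustrative; the real zodToFieldConfigs() inspects live Zod schema internals rather than a type-name string):

```typescript
// Sketch of the described ZodRecord handling; not the SDK's actual implementation.
interface FieldConfig {
  name: string
  type: string
  inputFormat?: string
}

function fieldConfigFromZodTypeName(name: string, typeName: string): FieldConfig {
  switch (typeName) {
    case 'ZodRecord': // e.g. zod.record(zod.string(), zod.any())
      return { name, type: 'object', inputFormat: 'json' }
    case 'ZodString':
      return { name, type: 'string' }
    default:
      return { name, type: 'unknown' }
  }
}
```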


[0.7.8] - 2026-01-26

Added

Field Equivalences Export for Cross-Provider Mapping

New voice-router-dev/field-equivalences export provides programmatic access to semantic field mappings across providers. Use this to build your own translation logic instead of relying on lossy auto-translation.

import {
  FIELD_EQUIVALENCES,
  getEquivalentField,
  getCategoryFields,
  supportsCategory
} from 'voice-router-dev/field-equivalences'

// Get the diarization field for Deepgram
getEquivalentField('diarization', 'deepgram', 'transcription') // 'diarize'
getEquivalentField('diarization', 'assemblyai', 'transcription') // 'speaker_labels'

// Check if provider supports a feature
supportsCategory('sentiment', 'openai', 'transcription') // false
supportsCategory('sentiment', 'deepgram', 'transcription') // true

// Get all fields for a category
getCategoryFields('diarization', 'transcription')
// { deepgram: ['diarize'], gladia: ['diarization', ...], ... }

// Access full metadata with notes and non-equivalences
FIELD_EQUIVALENCES.diarization.notes
FIELD_EQUIVALENCES.diarization.nonEquivalences

Categories covered: diarization, punctuation, language, model, translation, sentiment, entities, profanity, redaction, timestamps, callback

Also generates: docs/FIELD_EQUIVALENCES.md - Human-readable documentation

Regenerate with: pnpm docs:field-equivalences

Fixed

Gladia Streaming: Added Missing words_accurate_timestamps Field

The Gladia OpenAPI spec was missing the words_accurate_timestamps field in RealtimeProcessingConfig. This field exists in Gladia's V2 Live API (documented in their V1→V2 migration guide) but was not present in their published OpenAPI spec.

// Now available in streaming config
await gladia.transcribeStream({
  realtime_processing: {
    words_accurate_timestamps: true  // ✅ Now typed correctly
  }
})

Note: emotion_analysis and structured_data_extraction do NOT exist in Gladia's streaming API - only in batch transcription.


[0.7.7] - 2026-01-25

Changed

Improved Package Exports for Bundler Compatibility

Fixed ERR_PACKAGE_PATH_NOT_EXPORTED errors in Next.js and other bundlers:

  • Added default condition to all subpath exports as bundler fallback
  • Added explicit file path exports (./constants.mjs, ./constants.js, etc.) for direct aliasing
  • Exported ./package.json for tooling that needs version metadata

// Now works without aliasing hacks
import { DeepgramModel } from 'voice-router-dev/constants'
import { DeepgramModel } from 'voice-router-dev/constants.mjs'
import pkg from 'voice-router-dev/package.json'
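The resulting exports map looks roughly like this (an illustrative subset; the dist/ file paths are assumptions, check the published package.json for the real map):

```json
{
  "exports": {
    "./constants": {
      "import": "./dist/constants.mjs",
      "require": "./dist/constants.js",
      "default": "./dist/constants.mjs"
    },
    "./constants.mjs": "./dist/constants.mjs",
    "./constants.js": "./dist/constants.js",
    "./package.json": "./package.json"
  }
}
```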

[0.7.6] - 2026-01-25

Added

Auto-Generated Model Constants

Replaced hardcoded model lists with auto-generated constants from provider APIs/specs:

import {
  DeepgramModel,      // 48 models from Deepgram API
  OpenAIModel,        // 5 transcription models from OpenAI spec
  OpenAIRealtimeModel // 14 realtime models from OpenAI spec
} from 'voice-router-dev/constants'

// All models now auto-sync when specs update
{ model: DeepgramModel["nova-3"] }
{ model: OpenAIModel["gpt-4o-transcribe"] }
{ model: OpenAIRealtimeModel["gpt-4o-realtime-preview"] }

New generation scripts:

  • generate-deepgram-models.js - Fetches from https://api.deepgram.com/v1/models
  • generate-openai-models.js - Extracts from orval-generated TypeScript types

New exports:

  • DeepgramModel, DeepgramModelCodes, DeepgramModelLabels, DeepgramModelCode
  • OpenAIModelCodes, OpenAIModelLabels, OpenAIModelCode
  • OpenAITranscriptionModel, OpenAITranscriptionModelCode
  • OpenAIRealtimeModel, OpenAIRealtimeModelCode

Changed

  • Models are now automatically synced during prebuild and openapi:generate
  • Pipeline diagram updated to show MODEL EXTRACTION section

[0.7.5] - 2026-01-25

Added

Soniox Model Type Safety

Added type-safe SonioxModel constants with full autocomplete support. Model selection is now a dropdown in UI field configs.

import { SonioxModel, SonioxRealtimeModel, SonioxLanguage } from 'voice-router-dev/constants'

// Real-time streaming models with autocomplete
await adapter.transcribeStream({
  model: SonioxModel.stt_rt_v3,  // ✅ Type-safe, autocomplete
  sonioxStreaming: {
    model: SonioxRealtimeModel.stt_rt_v3,  // ✅ Strictly real-time models only
    languageHints: [SonioxLanguage.en, SonioxLanguage.es]  // ✅ Strict language codes
  }
})

Available models:

  • Real-time: stt-rt-v3, stt-rt-preview, stt-rt-v3-preview, stt-rt-preview-v2
  • Async: stt-async-v3, stt-async-preview, stt-async-preview-v1

Exports:

  • SonioxModel, SonioxRealtimeModel, SonioxAsyncModel - const objects for autocomplete
  • SonioxModelCode, SonioxRealtimeModelCode, SonioxAsyncModelCode - type unions
  • SonioxModelLabels - display names for UI

Speechmatics Operating Point Export

Added SpeechmaticsOperatingPoint constant for type-safe model quality tier selection:

import { SpeechmaticsOperatingPoint } from 'voice-router-dev/constants'

await router.transcribe('speechmatics', audioUrl, {
  model: SpeechmaticsOperatingPoint.enhanced  // ✅ Type-safe
})

Values: standard (faster), enhanced (higher accuracy)

Changed

Strict Typing for SonioxConfig

SonioxConfig.model now uses SonioxModelCode instead of loose string:

// Before (0.7.4)
const adapter = createSonioxAdapter({
  apiKey: '...',
  model: 'any-string'  // ❌ No validation
})

// After (0.7.5)
import { SonioxModel } from 'voice-router-dev/constants'

const adapter = createSonioxAdapter({
  apiKey: '...',
  model: SonioxModel.stt_async_v3  // ✅ Type-safe with autocomplete
})

Also improved getLanguagesForModel() parameter from string to SonioxModelCode.


[0.7.4] - 2026-01-23

Added

Raw WebSocket Message Capture (onRawMessage callback)

New onRawMessage callback for capturing raw, untouched provider WebSocket messages before any SDK normalization. Essential for debugging, logging, and replay scenarios.

import { VoiceRouter, RawWebSocketMessage } from 'voice-router-dev'

const session = await router.transcribeStream(
  { type: 'stream' },
  {
    onRawMessage: (msg: RawWebSocketMessage) => {
      console.log(`[${msg.provider}] ${msg.direction}: ${msg.messageType}`)
      // msg.payload contains the original, untouched data
    }
  }
)

RawWebSocketMessage interface:

interface RawWebSocketMessage {
  provider: string           // "deepgram", "gladia", "assemblyai", etc.
  direction: "incoming" | "outgoing"
  timestamp: number          // Date.now() at capture time
  payload: string | ArrayBuffer  // Raw, untouched message
  messageType?: string       // Provider-specific type if available
}

Supported providers: Deepgram, Gladia, AssemblyAI, Soniox, OpenAI Realtime

Captured messages include:

  • Incoming: All server responses (transcripts, errors, session events)
  • Outgoing: Audio chunks, control messages (stop, terminate, etc.)

[0.7.3] - 2026-01-23

Added

Strict Language Type Safety Across All Providers

Removed the | string escape hatch from TranscriptionLanguage type. Invalid language codes now cause compile-time errors instead of silent runtime failures.

// Before: any string was allowed (no compile-time validation)
export type TranscriptionLanguage = AssemblyAILanguageCode | GladiaLanguageCode | string

// After: strict union of all provider types
export type TranscriptionLanguage =
  | AssemblyAILanguageCode
  | GladiaLanguageCode
  | DeepgramLanguageCode
  | SonioxLanguageCode
  | SpeechmaticsLanguageCode
  | AzureLocaleCode

Soniox-specific options are now strictly typed:

// This now fails at compile time - "multi" is Deepgram-only
sonioxStreaming: {
  languageHints: ["multi"]  // ❌ Type error: "multi" not in SonioxLanguageCode
}

// Correct usage with autocomplete
import { SonioxLanguage } from 'voice-router-dev/constants'
sonioxStreaming: {
  languageHints: [SonioxLanguage.en, SonioxLanguage.es]  // ✅
}

Improved Soniox Error Messages

Added explicit error detection when Soniox closes connection immediately after opening (common with auth/config issues):

onError: {
  code: "SONIOX_CONFIG_REJECTED",
  message: "Soniox closed connection immediately after opening.
    Likely causes:
    - Invalid API key or region mismatch (keys are region-specific)
    - Invalid language value (e.g., 'multi' is Deepgram-only)
    - Unsupported audio format or sample rate for the model"
}

Also added console warning when language: "multi" is passed to Soniox adapter.


[0.7.2] - 2026-01-21

Added

Deepgram Architecture-Based Language Support

Added type-safe language constants with per-model architecture support:

import {
  DeepgramLanguage,
  DeepgramLanguageCodes,
  DeepgramArchitectures,
  DeepgramArchitectureLanguages,
  DeepgramMultilingualArchitectures
} from 'voice-router-dev/constants'

// 161 language codes (BCP-47 format)
{ language: DeepgramLanguage.en }
{ language: DeepgramLanguage["en-US"] }
{ language: DeepgramLanguage["pt-BR"] }

// Check languages supported by a specific architecture
const nova3Langs = DeepgramArchitectureLanguages["nova-3"]
// Includes: en, es, fr, de, ... and "multi" for codeswitching

// Multilingual codeswitching (Nova-2 and Nova-3 only)
if (DeepgramMultilingualArchitectures.includes("nova-3")) {
  // Use language: "multi" for automatic language detection
  { language: DeepgramLanguage.multi }
}

New exports:

  • DeepgramLanguage: constant object for autocomplete (161 codes)
  • DeepgramLanguageCodes: array of all language codes
  • DeepgramLanguageCode: type for language codes
  • DeepgramArchitectures: array of model architectures (base, nova-2, nova-3, etc.)
  • DeepgramArchitecture: type for architectures
  • DeepgramArchitectureLanguages: mapping of architecture → supported languages
  • DeepgramMultilingualArchitectures: architectures supporting language=multi
  • DeepgramMultilingualArchitecture: type for multilingual architectures

Technical details:

  • Languages auto-generated from https://api.deepgram.com/v1/models (public API)
  • multi (multilingual codeswitching) only available for nova-2 and nova-3 per Deepgram docs
  • Script: scripts/generate-deepgram-languages.js
  • Run: pnpm openapi:sync-deepgram-languages

Fixed

SonioxLanguage Constant Now Exported

Added missing SonioxLanguage constant object to the generated Soniox languages file. This matches the pattern used by other providers (Deepgram, Speechmatics, Azure) for autocomplete support.

import { SonioxLanguage } from 'voice-router-dev/constants'

{ language: SonioxLanguage.en }
{ language: SonioxLanguage.es }

[0.7.1] - 2026-01-16

Fixed

Nested Fields Now Extracted from Object Types

Fixed bug where nested fields inside object-type parameters were not extracted. The zodToFieldConfigs() function now properly recurses into nested Zod objects.

Affected providers:

  • Gladia: pre_processing (audio_enhancer, speech_threshold), realtime_processing (custom_vocabulary, translation, NER, sentiment), post_processing (summarization, chapterization), language_config, messages_config, callback_config
  • Azure: properties.diarization.speakers, properties.languageIdentification, properties.error, model, dataset, project
  • Soniox: translation (type, target_language), context (terms, text, translation_terms)

Example - accessing nested field metadata:

import { GLADIA_STREAMING_FIELDS } from 'voice-router-dev/field-metadata'

const preProcessing = GLADIA_STREAMING_FIELDS.find(f => f.name === 'pre_processing')
console.log(preProcessing?.nestedFields)
// [
//   { name: "audio_enhancer", type: "boolean", ... },
//   { name: "speech_threshold", type: "number", min: 0, max: 1, default: 0.6, ... }
// ]

Technical fix: extractShape() in zod-to-field-configs.ts now handles objects with .shape function but no _def property (created during recursive extraction).
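The guard can be sketched as (illustrative; the real extractShape() in zod-to-field-configs.ts handles more cases):

```typescript
// Sketch of the described fix: accept objects whose .shape is a function even
// when no _def property is present (as produced during recursive extraction).
function extractShape(obj: any): Record<string, unknown> | null {
  // Works for real Zod objects and for _def-less objects alike
  if (typeof obj?.shape === 'function') return obj.shape()
  if (obj?.shape && typeof obj.shape === 'object') return obj.shape
  return null
}
```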


[0.7.0] - 2026-01-16

Added

Unified Language Constants for All Providers

All 7 STT providers now have auto-generated language constants fetched from external sources:

  • Gladia: GladiaLanguage (99 codes, from OpenAPI spec enum)
  • AssemblyAI: AssemblyAILanguage (102 codes, from OpenAPI spec enum)
  • Deepgram: DeepgramLanguage (149 codes, from Deepgram API /v1/models)
  • Speechmatics: SpeechmaticsLanguage (62 codes, from Feature Discovery API)
  • Soniox: SonioxLanguage (60 codes, from OpenAPI spec)
  • Azure: AzureLocale (154 locales, from Microsoft docs parsing)
  • OpenAI: OpenAILanguage (30 codes, manual list of common Whisper languages)

import {
  GladiaLanguage,
  DeepgramLanguage,
  SpeechmaticsLanguage,
  SonioxLanguage,
  AzureLocale,
  OpenAILanguage
} from 'voice-router-dev/constants'

// All providers now have consistent language constant objects
adapter.transcribe(audio, { language: GladiaLanguage.en })
adapter.transcribe(audio, { language: DeepgramLanguage["en-US"] })
adapter.transcribe(audio, { locale: AzureLocale["en-US"] })

Auto-Generated Language Scripts

New generator scripts fetch language data from external sources at build time:

  • generate-speechmatics-languages.js - Fetches from Feature Discovery API
  • generate-azure-locales.js - Parses Microsoft documentation HTML
  • generate-soniox-languages.js - Parses OpenAPI spec (updated with SonioxLanguage constant)
  • generate-deepgram-languages.js - Fetches from Deepgram API (existing)

Run pnpm openapi:generate to regenerate all language constants.

Field Metadata Auto-Population

Language fields in field-metadata.ts now automatically:

  • Set type: "select" (or "multiselect" for array fields)
  • Populate options from generated language constants

This removes the need for manual overrides when building language dropdowns in UI forms.
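The population step can be pictured as follows (hypothetical helper name and shape; the actual generator works on the field-metadata structures):

```typescript
// Illustrative sketch: a language field becomes a select (or multiselect for
// array fields) with options drawn from the generated language constants.
interface LanguageFieldMeta {
  name: string
  type: string
  options?: { value: string; label: string }[]
}

function populateLanguageField(
  name: string,
  isArray: boolean,
  codes: readonly string[],
): LanguageFieldMeta {
  return {
    name,
    type: isArray ? 'multiselect' : 'select',
    options: codes.map((code) => ({ value: code, label: code })),
  }
}
```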

Changed

  • OpenAILanguageCodes and OpenAILanguage now exported from constants.ts (was only in provider-metadata.ts)
  • Pipeline diagram generator now includes locale scripts in LANGUAGE/LOCALE EXTRACTION section
  • Field metadata generator loads language codes from all 7 providers

[0.6.9] - 2026-01-15

Added

DeepgramLanguage Constants

New DeepgramLanguage constant with all 36+ language codes fetched from Deepgram API:

import { DeepgramLanguage } from 'voice-router-dev'

await adapter.transcribe(audioUrl, {
  language: DeepgramLanguage.en,      // English
  // or
  language: DeepgramLanguage["en-US"], // English (US)
  language: DeepgramLanguage.es,      // Spanish
  language: DeepgramLanguage.fr,      // French
})

Generated by scripts/generate-deepgram-languages.js from api.deepgram.com/v1/models.

Improved

Type Documentation

Enhanced JSDoc for Word and Utterance types with:

  • Provider-specific normalization notes (Gladia, AssemblyAI, Deepgram source types)
  • New channel field for multi-channel/stereo recordings
  • Channel numbering differences documented per provider

Pipeline Diagram

Updated docs/sdk-generation-pipeline.mmd with complete exports:

  • field-metadata.ts in SDK EXPORTS
  • constants.ts with all provider enums
  • Deepgram languages generation in LANGS section

[0.6.8] - 2026-01-14

Added

Complete Constants Export

All provider constants now exported from main SDK entry point:

import {
  // AssemblyAI batch
  AssemblyAITranscriptionModel,  // best, slam-1, universal
  AssemblyAILanguage,            // en, en_us, es, fr, de, ...
  AssemblyAIStatus,

  // Deepgram batch & TTS
  DeepgramCallbackMethod,
  DeepgramIntentMode,
  DeepgramRegion,
  DeepgramSampleRate,
  DeepgramStatus,
  DeepgramTTSContainer,
  DeepgramTTSEncoding,
  DeepgramTTSModel,
  DeepgramTTSSampleRate,

  // Gladia
  GladiaRegion,
  GladiaStatus,

  // OpenAI Whisper batch
  OpenAIModel,
  OpenAIResponseFormat,

  // OpenAI Realtime streaming
  OpenAIRealtimeAudioFormat,
  OpenAIRealtimeModel,
  OpenAIRealtimeTranscriptionModel,
  OpenAIRealtimeTurnDetection,

  // Soniox & Speechmatics
  SonioxRegion,
  SpeechmaticsRegion
} from 'voice-router-dev'

These constants enable type-safe select fields in forms with proper autocomplete.

[0.6.6] - 2026-01-14

Added

OpenAI Realtime Streaming Transcription

New transcribeStream() method for OpenAI adapter using the Realtime API:

import { createOpenAIWhisperAdapter, OpenAIRealtimeModel, OpenAIRealtimeAudioFormat } from 'voice-router-dev'

const adapter = createOpenAIWhisperAdapter({ apiKey: process.env.OPENAI_API_KEY })

const session = await adapter.transcribeStream({
  openaiStreaming: {
    model: OpenAIRealtimeModel["gpt-4o-realtime-preview"],
    inputAudioFormat: OpenAIRealtimeAudioFormat.pcm16,
    turnDetection: {
      type: "server_vad",
      threshold: 0.5,
      silenceDurationMs: 500
    }
  }
}, {
  onTranscript: (event) => console.log(event.text),
  onSpeechStart: (event) => console.log('Speech detected'),
  onSpeechEnd: (event) => console.log('Speech ended')
})

// Send audio (base64-encoded PCM16 at 24kHz)
session.sendAudio(audioChunk)
session.close()
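Producing those base64 chunks from raw PCM16 samples can be sketched as below (Node.js Buffer; pcm16ToBase64 is an illustrative helper, not an SDK export):

```typescript
// Hypothetical helper: base64-encode little-endian PCM16 samples for sendAudio().
// Assumes Node.js (Buffer); in browsers, build a binary string and use btoa instead.
function pcm16ToBase64(samples: Int16Array): string {
  const bytes = Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength)
  return bytes.toString('base64')
}

// session.sendAudio(pcm16ToBase64(chunk))
```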

New constants from generated OpenAPI types:

  • OpenAIRealtimeModel - Realtime API models (gpt-4o-realtime-preview, etc.)
  • OpenAIRealtimeAudioFormat - Input formats (pcm16, g711_ulaw, g711_alaw)
  • OpenAIRealtimeTurnDetection - VAD type (server_vad)
  • OpenAIRealtimeTranscriptionModel - Transcription models (whisper-1, etc.)

Type-Safe StreamingProvider Derivation

StreamingProvider type is now automatically derived from ProviderCapabilitiesMap:

// No more manual sync needed!
// If you set streaming: true in a provider's capabilities,
// it's automatically included in StreamingProvider type

type StreamingProvider = "gladia" | "deepgram" | "assemblyai" | "soniox" | "openai-whisper"
// ↑ Auto-derived from providers where streaming: true

How it works:

  • Capabilities use as const satisfies to preserve literal true/false types
  • ProvidersWithCapability<"streaming"> extracts providers where capability is true
  • Works at compile-time - no runtime overhead, full browser compatibility
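A self-contained sketch of the derivation (provider names abbreviated; the real map lives in ProviderCapabilitiesMap):

```typescript
// Sketch of deriving a StreamingProvider union from a capabilities map.
// `as const` preserves the literal true/false types per provider.
const providerCapabilities = {
  deepgram: { streaming: true },
  gladia: { streaming: true },
  azure: { streaming: false },
} as const

type Caps = typeof providerCapabilities

// Keep only keys whose capability is the literal `true`
type ProvidersWithCapability<C extends 'streaming'> = {
  [K in keyof Caps]: Caps[K][C] extends true ? K : never
}[keyof Caps]

type StreamingProvider = ProvidersWithCapability<'streaming'> // "deepgram" | "gladia"

// The same derivation at runtime, e.g. for building dropdowns
const streamingProviders = (Object.keys(providerCapabilities) as (keyof Caps)[])
  .filter((p) => providerCapabilities[p].streaming)
```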


Fixed

Encoding Format Documentation Clarification

Fixed misleading documentation that showed provider-specific encoding formats instead of unified SDK format:

SDK unified format → provider API format:

  • Gladia: linear16 → wav/pcm
  • Gladia: mulaw → wav/ulaw
  • Gladia: alaw → wav/alaw
  • AssemblyAI: linear16 → pcm_s16le
  • Deepgram: linear16 → linear16

When using VoiceRouter or adapters, always use the unified format (linear16, mulaw, alaw). The SDK handles translation to provider-specific formats automatically.

Field-configs show provider-native values (what the API expects). These are correct for direct API usage but different from VoiceRouter unified format.
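The Gladia translation amounts to a simple lookup (an illustrative table mirroring the mapping above; the SDK performs this internally):

```typescript
// Illustrative unified-to-Gladia encoding table; not the SDK's actual internals.
const gladiaEncodingFor: Record<string, string> = {
  linear16: 'wav/pcm',
  mulaw: 'wav/ulaw',
  alaw: 'wav/alaw',
}
```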


Lightweight Field Metadata Export (solves 2.8MB type bundle OOM issue)

New voice-router-dev/field-metadata export provides pre-computed field metadata without the heavy Zod schema types:

  • 156 KB types vs 2.8 MB for field-configs
  • 18x reduction in type declaration size
  • No Zod dependency - plain TypeScript const arrays
  • Same field information: name, type, required, description, options, min/max, etc.

// Lightweight import - 156KB types (was 2.8MB OOM-inducing)
import {
  GLADIA_STREAMING_FIELDS,
  GladiaStreamingFieldName,
  PROVIDER_FIELDS,
  FieldMetadata
} from 'voice-router-dev/field-metadata'

// Use for UI rendering without loading heavy Zod schemas
GLADIA_STREAMING_FIELDS.forEach(field => {
  if (field.type === 'select' && field.options) {
    renderDropdown(field.name, field.options)
  }
})

// Type-safe field names still work
const fieldName: GladiaStreamingFieldName = 'encoding' // ✓
// const bad: GladiaStreamingFieldName = 'typo' // ✗ TypeScript error

When to use which:

  • UI form generation (no validation): voice-router-dev/field-metadata (156 KB)
  • Runtime Zod validation needed: voice-router-dev/field-configs (2.8 MB)

Generated by new build script: pnpm openapi:generate-field-metadata


[0.6.5] - 2026-01-13

Added

Typed Field Names for Compile-Time Safety

New typed field name exports enable compile-time type checking for field overrides - no more typos going unnoticed!

import {
  GladiaStreamingFieldName,
  GladiaStreamingConfig,
  FieldOverrides,
  GladiaStreamingSchema
} from 'voice-router-dev/field-configs'

// Type-safe field overrides - typos caught at compile time!
const overrides: Partial<Record<GladiaStreamingFieldName, FieldConfig | null>> = {
  encoding: { name: 'encoding', type: 'select', required: false },
  language_config: null, // Hide this field
  // typo_field: null, // ✗ TypeScript error!
}

// Fully typed config values - option values validated too!
const config: Partial<GladiaStreamingConfig> = {
  encoding: 'wav/pcm', // ✓ Only valid options allowed
  sample_rate: 16000,
}

// Extract specific field's valid options
type EncodingOptions = GladiaStreamingConfig['encoding']
// = 'wav/pcm' | 'wav/alaw' | 'wav/ulaw'

Exports for all 7 providers:

ProviderTranscriptionStreamingStreamingUpdateList
Gladia-
Deepgram-
AssemblyAI
OpenAI---
Azure--
Speechmatics
Soniox

New exports:

  • Field name types: GladiaStreamingFieldName, DeepgramTranscriptionFieldName, etc.
  • Config types: GladiaStreamingConfig, AssemblyAITranscriptionConfig, etc.
  • Zod schemas: GladiaStreamingSchema, DeepgramTranscriptionSchema, etc.
  • Generic helper: FieldOverrides<Schema> for any Zod schema
  • Union types: TranscriptionFieldName, StreamingFieldName (all providers combined)

[0.6.4] - 2026-01-13

Fixed

Zod discriminatedUnion Fix (Complete)

The fix in 0.6.3 was incomplete - it didn't handle multiline patterns generated by Orval.

What was missing:

  • Regex patterns looked for zod.discriminatedUnion but Orval generates .discriminatedUnion (chained)
  • Regex patterns expected single-line code but Orval generates multiline patterns

Updated fixes:

  • Match both zod.discriminatedUnion AND .discriminatedUnion (method chaining)
  • Handle multiline patterns where type: zod\n .enum(...)\n .optional() spans multiple lines
  • Preserve original prefix when replacing (.union([ vs zod.union([)
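In the spirit of the fix, patterns along these lines handle both forms (sketches only, not the actual regexes in scripts/fix-generated.js):

```typescript
// Matches discriminatedUnion whether written as `zod.discriminatedUnion(`
// or chained as `.discriminatedUnion(` after another expression.
const discriminatedUnionCall = /(?:zod)?\.discriminatedUnion\s*\(/

// Matches the multiline pattern where `type: zod`, `.enum(...)`, and
// `.optional()` each sit on their own line.
const multilineOptionalEnum = /:\s*zod\s*\n\s*\.enum\([^)]*\)\s*\n\s*\.optional\(\)/
```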

[0.6.3] - 2026-01-12

Fixed

Zod discriminatedUnion Runtime Crash (Incomplete)

Fixed critical runtime error that crashed Node.js on module load:

Error: Discriminator property type has duplicate value undefined
Error: A discriminator value for key `type` could not be extracted from all schema options

Root cause: Orval-generated OpenAI Zod schemas had two bugs:

  1. Discriminator fields marked .optional() - allowing all variants to have undefined as the discriminator value
  2. Discriminator fields using zod.string() instead of zod.enum()/zod.literal() - Zod can't extract a literal discriminator from a generic string

Fix: Added two post-generation fixes in scripts/fix-generated.js:

  • fixDiscriminatedUnionOptionalDiscriminator: Removes .optional() from discriminator fields
  • Enhanced fixDiscriminatedUnionMissingField: Converts discriminatedUnion to union when discriminator uses zod.string()

Affected schemas: OpenAI Realtime API audio format and turn detection configs


[0.6.2] - 2026-01-12

Added

Typed Soniox Streaming Options

New SonioxStreamingOptions interface with full type safety - no more as any casts needed:

await adapter.transcribeStream({
  sonioxStreaming: {
    model: 'stt-rt-preview',
    enableSpeakerDiarization: true,
    enableEndpointDetection: true,
    context: {
      terms: ['TypeScript', 'React'],
      text: 'Technical discussion'
    },
    translation: { type: 'one_way', target_language: 'es' }
  }
});

Features:

  • audioFormat - PCM encodings (pcm_s16le, mulaw, etc.) or auto-detect (wav, mp3, etc.)
  • enableSpeakerDiarization - Speaker labels on each token
  • enableLanguageIdentification - Language detection per token
  • enableEndpointDetection - Detect when speaker finishes
  • context - Structured vocabulary hints (terms, text, translation terms)
  • translation - One-way or two-way translation config
  • languageHints - Expected languages for better accuracy
  • clientReferenceId - Custom tracking ID

[0.6.1] - 2026-01-12

Changed

Browser-Safe Main Entry Point

The main SDK entry point is now browser-safe. Webhooks (which use node:crypto) are moved to a separate entry point.

Before (0.6.0):

// This pulled in node:crypto and broke Next.js/browser builds
import { WebhookRouter, AllProviders } from 'voice-router-dev';

After (0.6.1):

// Main entry is now browser-safe
import { VoiceRouter, AllProviders, StreamingProviders } from 'voice-router-dev';

// Webhooks are server-side only - import separately
import { WebhookRouter } from 'voice-router-dev/webhooks';

Why this matters:

  • Next.js apps can now import from the main entry without webpack errors
  • No more node:crypto pollution in client bundles
  • Cloudflare Workers and edge runtimes work out of the box

Entry point summary:

  • voice-router-dev: ✅ browser-safe. Contains Router, Adapters, Types, Metadata.
  • voice-router-dev/webhooks: ❌ not browser-safe (node:crypto). Contains WebhookRouter, handlers.
  • voice-router-dev/constants: ✅ browser-safe. Contains enums only.
  • voice-router-dev/field-configs: ✅ browser-safe. Contains field configurations.
  • voice-router-dev/provider-metadata: ✅ browser-safe. Contains capabilities, languages.

[0.6.0] - 2026-01-11

Added

OpenAI Official Spec Integration

OpenAI types now auto-generated from the official Stainless-hosted OpenAPI spec:

import { OpenAIModel, OpenAIResponseFormat } from 'voice-router-dev/constants'
import type {
  RealtimeSessionCreateRequest,
  RealtimeTranscriptionSessionCreateRequest,
  CreateTranscriptionResponseDiarizedJson
} from 'voice-router-dev'

// All models from official spec
const model = OpenAIModel["gpt-4o-transcribe-diarize"]

// Response formats including diarization
const format = OpenAIResponseFormat.diarized_json

What changed:

  • Single source of truth: Stainless live spec (auto-updated by OpenAI)
  • 54 schemas generated (up from 15 manual types)
  • 7 endpoints included: batch audio + realtime streaming
  • Diarization types now from official spec (CreateTranscriptionResponseDiarizedJson)
  • Realtime API types: RealtimeSessionCreateRequest, RealtimeTranscriptionSessionCreateRequest, VadConfig, etc.

New models in OpenAIModel:

  • whisper-1 - Open source Whisper V2
  • gpt-4o-transcribe - GPT-4o based transcription
  • gpt-4o-mini-transcribe - Faster, cost-effective
  • gpt-4o-mini-transcribe-2025-12-15 - Dated version
  • gpt-4o-transcribe-diarize - With speaker diarization

New response formats in OpenAIResponseFormat:

  • diarized_json - JSON with speaker annotations (requires gpt-4o-transcribe-diarize)

OpenAI Realtime Streaming Types

WebSocket event types for OpenAI Realtime API:

import { OpenAIStreamingTypes } from 'voice-router-dev'

// Session creation
const session: OpenAIStreamingTypes.RealtimeSessionConfig = {
  modalities: ['text', 'audio'],
  voice: 'ash',
  input_audio_format: 'pcm16',
  input_audio_transcription: { model: 'whisper-1' },
  turn_detection: { type: 'server_vad', threshold: 0.6 }
}

// WebSocket event handling
type ServerEvent = OpenAIStreamingTypes.RealtimeServerEvent
type ClientEvent = OpenAIStreamingTypes.RealtimeClientEvent

Endpoints:

  • OpenAI: wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview
  • Azure OpenAI: wss://{endpoint}/openai/realtime?deployment={model}&api-version={version}

Soniox Provider (8th Provider)

New adapter for Soniox speech-to-text with batch and streaming support:

import { createSonioxAdapter, SonioxLanguages } from 'voice-router-dev'

const adapter = createSonioxAdapter({
  apiKey: process.env.SONIOX_API_KEY
})

// Batch transcription
const result = await adapter.transcribe({
  type: 'url',
  url: 'https://example.com/audio.mp3'
}, {
  language: 'en',
  diarization: true
})

// Real-time streaming
const session = await adapter.transcribeStream({
  language: 'en',
  sampleRate: 16000
}, {
  onTranscript: (event) => console.log(event.text),
  onError: (error) => console.error(error)
})

// Dynamic model/language discovery
const models = await adapter.getModels()
const languages = await adapter.getLanguagesForModel('stt-rt-preview')

Features:

  • Batch transcription via URL or file upload
  • Real-time WebSocket streaming with endpoint detection
  • Speaker diarization
  • Language identification (auto-detect)
  • Translation support (one-way and bidirectional)
  • Custom vocabulary via structured context
  • 60+ supported languages

Generated types from OpenAPI spec (api.soniox.com/v1/openapi.json):

  • SonioxLanguages - Array of {code, name} for all 60 languages
  • SonioxLanguageCodes - ISO 639-1 language codes
  • SonioxLanguageLabels - Code-to-name mapping
  • 90+ schema types via Orval (Transcription, Model, Language, etc.)

Speechmatics Batch API Type Generation

Full type generation from Speechmatics SDK batch spec (speechmatics-batch.yml):

import type { JobConfig, RetrieveTranscriptResponse } from 'voice-router-dev'
import { OperatingPoint, TranscriptionConfigDiarization } from 'voice-router-dev'

// Use generated enums instead of hardcoded strings
const config: JobConfig = {
  type: 'transcription',
  transcription_config: {
    language: 'en',
    operating_point: OperatingPoint.enhanced,
    diarization: TranscriptionConfigDiarization.speaker
  }
}

Generated from SDK spec:

  • 100+ TypeScript types from speechmatics-batch.yml
  • Enums: OperatingPoint, TranscriptionConfigDiarization, SummarizationConfigSummaryType, SummarizationConfigSummaryLength, JobDetailsStatus
  • Removed manual src/types/speechmatics.ts (replaced by generated types)

Soniox Field Configs

Field config functions for Soniox now available:

import {
  getSonioxTranscriptionFields,
  getSonioxStreamingFields,
  getSonioxListFilterFields,
  getSonioxFieldConfigs
} from 'voice-router-dev/field-configs'

const fields = getSonioxTranscriptionFields()
// → [{ name: 'model', type: 'string', ... }, { name: 'language_hints', ... }, ...]

Field Config Coverage (All Providers)

| Provider | Transcription | Streaming | List Filters | Update Config |
|---|---|---|---|---|
| Gladia | ✓ | ✓ | ✓ | – |
| Deepgram | ✓ | ✓ | ✓ | – |
| AssemblyAI | ✓ | ✓ | ✓ | ✓ |
| OpenAI | ✓ | – | – | – |
| Speechmatics | ✓ | ✓ | ✓ | ✓ |
| Soniox | ✓ | ✓ | ✓ | – |
| Azure | ✓ | ✓ | – | – |

Zod Schema Exports Reference

All generated Zod schemas are exported for direct use with zodToFieldConfigs():

| Export Name | Provider | Source |
|---|---|---|
| GladiaZodSchemas | Gladia | OpenAPI spec |
| DeepgramZodSchemas | Deepgram | OpenAPI spec |
| AssemblyAIZodSchemas | AssemblyAI | OpenAPI spec |
| OpenAIZodSchemas | OpenAI | OpenAPI spec |
| SpeechmaticsZodSchemas | Speechmatics | OpenAPI spec (batch) |
| SonioxApiZodSchemas | Soniox | OpenAPI spec (batch) |
| SonioxStreamingZodSchemas | Soniox | Manual spec (real-time WebSocket) |

Note on manual specs: Soniox and Deepgram streaming types are manually maintained because these providers do not publish AsyncAPI specs for their WebSocket APIs. Types were extracted from their official SDKs (@soniox/speech-to-text-web and @deepgram/sdk). The REST API types are auto-synced from their OpenAPI specs. If these providers publish AsyncAPI specs in the future, we will switch to auto-generation.

import { zodToFieldConfigs, SonioxApiZodSchemas } from 'voice-router-dev'

// Extract fields from any Zod schema
const transcriptionFields = zodToFieldConfigs(SonioxApiZodSchemas.createTranscriptionBody)

SDK Generation Pipeline Diagram

New auto-generated Mermaid diagram showing the SDK generation flow:

pnpm openapi:diagram

Generates docs/sdk-generation-pipeline.mmd from codebase analysis:

  • Analyzes sync-specs.js for remote/manual spec sources
  • Extracts orval config for API/Zod generation
  • Maps streaming type sync scripts
  • Includes consumer layer (router, webhooks, adapters)
  • Shows public API exports

Changed

  • OpenAI spec source: Now uses Stainless live spec instead of manual openai-whisper-openapi.yml
  • fix-openai-spec.js: Filters full OpenAI API to audio + realtime endpoints only
  • OpenAI adapter: Uses OpenAIModel constants instead of hardcoded strings
  • Provider capabilities: OpenAI now shows streaming: true (via Realtime API)
  • Azure adapter: Uses generated enums instead of hardcoded strings, removed any type casts
  • Speechmatics adapter now uses generated enums instead of hardcoded string values
  • Speechmatics adapter fixed API structure: sentiment_analysis_config and summarization_config moved to job level (was incorrectly in transcription_config)
  • Speechmatics adapter fixed additional_vocab format: now uses {content: string}[] per spec
  • Speechmatics adapter fixed speaker_diarization_config: uses speaker_sensitivity (not max_speakers)
  • Soniox language codes now generated from OpenAPI spec (60 languages vs 28 hardcoded)
  • OpenAPI sync scripts now include Speechmatics batch spec and Soniox specs
  • Added openapi:generate:speechmatics, openapi:generate:soniox, openapi:clean:speechmatics, openapi:clean:soniox scripts
  • Added openapi:sync-soniox-languages to generate flow
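The Speechmatics structural fixes above can be sketched as a job config shape (field placement per the bullets; the interface here is a hand-written illustration, not the generated JobConfig):

```typescript
// Illustrative shape only - the real types are generated from speechmatics-batch.yml.
interface SketchJobConfig {
  type: 'transcription'
  transcription_config: {
    language: string
    // additional_vocab entries are {content: string} objects per spec
    additional_vocab?: { content: string }[]
    // uses speaker_sensitivity, not max_speakers
    speaker_diarization_config?: { speaker_sensitivity: number }
  }
  // sentiment/summarization configs live at the job level,
  // not inside transcription_config
  sentiment_analysis_config?: Record<string, never>
  summarization_config?: { summary_type?: string; summary_length?: string }
}

const sketchConfig: SketchJobConfig = {
  type: 'transcription',
  transcription_config: {
    language: 'en',
    additional_vocab: [{ content: 'VoiceRouter' }],
    speaker_diarization_config: { speaker_sensitivity: 0.6 },
  },
  summarization_config: { summary_type: 'bullets' },
}
```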

Fixed

  • OpenAI model values now stay in sync with official spec
  • OpenAIResponseFormat now includes diarized_json from official spec
  • OpenAI languageDetection capability is now true (language is optional in request)
  • Azure languageDetection capability fixed (was incorrectly false)
  • Azure customVocabulary capability fixed
  • AssemblyAI/Speechmatics streaming types now survive openapi:clean (stored in specs/)
  • Speechmatics batch field configs now work (was returning empty array)
  • Speechmatics webhook handler now uses generated RetrieveTranscriptResponse type
  • AssemblyAI streaming field configs now include SDK v3 fields (keyterms, keytermsPrompt, speechModel, languageDetection, etc.) - sync script parses both AsyncAPI spec and SDK TypeScript types

Soniox Regional Endpoints (Sovereign Cloud)

Regional endpoint support for Soniox data residency:

import { createSonioxAdapter, SonioxRegion } from 'voice-router-dev'

const adapter = createSonioxAdapter({
  apiKey: process.env.SONIOX_EU_API_KEY,
  region: SonioxRegion.eu  // EU data residency
})

| Region | REST API | WebSocket |
|---|---|---|
| us (default) | api.soniox.com | stt-rt.soniox.com |
| eu | api.eu.soniox.com | stt-rt.eu.soniox.com |
| jp | api.jp.soniox.com | stt-rt.jp.soniox.com |

Note: Soniox API keys are region-specific. Each project is created with a specific region, and the API key only works with that region's endpoint.
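The region-to-endpoint mapping can be expressed as a small lookup (hosts taken from the table above; the resolver function itself is an illustrative sketch, not the adapter's internal API):

```typescript
type SonioxRegionId = 'us' | 'eu' | 'jp'

// Hosts from the regional endpoints table above.
const SONIOX_ENDPOINTS: Record<SonioxRegionId, { rest: string; ws: string }> = {
  us: { rest: 'api.soniox.com', ws: 'stt-rt.soniox.com' },
  eu: { rest: 'api.eu.soniox.com', ws: 'stt-rt.eu.soniox.com' },
  jp: { rest: 'api.jp.soniox.com', ws: 'stt-rt.jp.soniox.com' },
}

// 'us' is the default when no region is configured.
function resolveSonioxEndpoints(region: SonioxRegionId = 'us') {
  return SONIOX_ENDPOINTS[region]
}
```

Remember that Soniox API keys are region-bound, so resolving an endpoint for a region other than the key's project region will fail authentication.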


[0.5.5] - 2026-01-09

Changed

  • Dynamic streaming types synced from AsyncAPI/SDK specs for all providers
  • Deepgram streaming params derived from official SDK (TranscriptionSchema.ts)
  • AssemblyAI streaming Zod auto-generated from SDK types
  • Speechmatics streaming types from AsyncAPI spec

[0.5.0] - 2026-01-09

Added

Zero-Hardcoding Field Configs

All field configs are now derived from Zod schemas at runtime - zero hardcoded field definitions:

import { zodToFieldConfigs, DeepgramZodSchemas } from 'voice-router-dev'

// Extract fields directly from generated Zod schemas
const fields = zodToFieldConfigs(DeepgramZodSchemas.listenV1MediaTranscribeQueryParams)
// → [{ name, type, description, options, default, min, max, ... }]

// Or use pre-built helpers
import { getDeepgramTranscriptionFields } from 'voice-router-dev'
const deepgramFields = getDeepgramTranscriptionFields() // 36 fields from Zod

Exports:

  • zodToFieldConfigs(schema) - Extract field configs from any Zod schema
  • filterFields(fields, names) - Include only specified fields
  • excludeFields(fields, names) - Exclude specified fields
  • GladiaZodSchemas, DeepgramZodSchemas, AssemblyAIZodSchemas, etc.

100% Streaming Field Coverage

| Provider | Fields | Source |
|---|---|---|
| Gladia | 10 | OpenAPI Zod |
| Deepgram | 30 | OpenAPI Zod |
| AssemblyAI | 13 | SDK Zod |

Changed

  • Deleted streaming-field-schemas.ts (was 461 lines of hardcoding)
  • Rewrote field-configs.ts: 890 → 205 lines (zero hardcoded fields)
  • All field configs now derived from Zod schemas at runtime

[0.4.1] - 2026-01-09

Added

Provider Metadata Exports for UI Rendering

Static runtime data derived from OpenAPI specs and adapter definitions:

import {
  ProviderCapabilitiesMap,
  CapabilityLabels,
  LanguageLabels,
  AllLanguageCodes,
  ProviderDisplayNames,
  StreamingProviders,
  BatchOnlyProviders
} from 'voice-router-dev/provider-metadata'

// Capability matrix for all providers
const capabilities = ProviderCapabilitiesMap['deepgram']
// → { streaming: true, diarization: true, ... }

// Language dropdown data
const languages = AllLanguageCodes['gladia']
// → ['en', 'es', 'fr', ...]
const label = LanguageLabels['en'] // → 'English'

Browser-Safe Subpath Exports

New subpath exports with no node:crypto dependency:

// Browser-safe imports
import { AllFieldConfigs } from 'voice-router-dev/field-configs'
import { ProviderCapabilitiesMap } from 'voice-router-dev/provider-metadata'

// Full SDK (server-side only)
import { VoiceRouter } from 'voice-router-dev'

Exports:

  • voice-router-dev/constants - Enums only (existing)
  • voice-router-dev/field-configs - Field configurations
  • voice-router-dev/provider-metadata - Capabilities, languages, display names

Changed

  • Types refactored to shared src/types/core.ts for browser compatibility
  • router/types.ts re-exports from core.ts (no duplication)

[0.3.7] - 2026-01-09

Added

Region Support for Multiple Providers

Region support for data residency, compliance, and latency optimization:

Deepgram EU Region (GA Jan 2026):

import { createDeepgramAdapter, DeepgramRegion } from 'voice-router-dev'

const adapter = createDeepgramAdapter({
  apiKey: process.env.DEEPGRAM_API_KEY,
  region: DeepgramRegion.eu  // All processing in EU
})

Speechmatics Regional Endpoints (EU, US, AU):

import { createSpeechmaticsAdapter, SpeechmaticsRegion } from 'voice-router-dev'

const adapter = createSpeechmaticsAdapter({
  apiKey: process.env.SPEECHMATICS_API_KEY,
  region: SpeechmaticsRegion.us1  // USA endpoint
})

| Region | Endpoint | Availability |
|---|---|---|
| eu1 | eu1.asr.api.speechmatics.com | All customers |
| eu2 | eu2.asr.api.speechmatics.com | Enterprise only |
| us1 | us1.asr.api.speechmatics.com | All customers |
| us2 | us2.asr.api.speechmatics.com | Enterprise only |
| au1 | au1.asr.api.speechmatics.com | All customers |

Gladia Streaming Regions:

import { GladiaRegion } from 'voice-router-dev/constants'

await adapter.transcribeStream({
  region: GladiaRegion["us-west"]  // or "eu-west"
})

Dynamic region switching for debugging and testing:

// Switch regions on the fly without reinitializing
adapter.setRegion(DeepgramRegion.eu)
await adapter.transcribe(audio)

// Check current region
console.log(adapter.getRegion())
// Deepgram: { api: "https://api.eu.deepgram.com/v1", websocket: "wss://api.eu.deepgram.com/v1/listen" }
// Speechmatics: "https://us1.asr.api.speechmatics.com/v2"

Region support summary:

| Provider | Regions | Config Level | Dynamic Switch |
|---|---|---|---|
| Deepgram | global, eu | Adapter init | setRegion() |
| Speechmatics | eu1, eu2*, us1, us2*, au1 | Adapter init | setRegion() |
| Gladia | us-west, eu-west | Streaming options | Per-request |
| Azure | Via speechConfig | Adapter init | Reinitialize |

* Enterprise only

OpenAPI Spec Sync

New unified spec management system for syncing provider OpenAPI specs from official sources:

# Sync all specs from remote sources
pnpm openapi:sync

# Sync specific providers
pnpm openapi:sync:gladia
pnpm openapi:sync:deepgram
pnpm openapi:sync:assemblyai

# Full rebuild with fresh specs
pnpm openapi:rebuild

Spec sources:

All specs are now stored locally in ./specs/ for reproducible builds.

Fixed

  • Deepgram spec regeneration now works correctly with Orval input transformer
  • Manual Deepgram parameter files (SpeakV1Container, SpeakV1Encoding, SpeakV1SampleRate) are preserved during regeneration

Changed

  • prepublishOnly now syncs and validates specs before publishing

[0.3.3] - 2026-01-08

Added

Gladia Audio File Download

New getAudioFile() method for Gladia adapter - download the original audio used for transcription.

Returns ArrayBuffer for cross-platform compatibility (Node.js and browser):

const result = await gladiaAdapter.getAudioFile('transcript-123')
if (result.success && result.data) {
  // Node.js: Convert to Buffer and save
  const buffer = Buffer.from(result.data)
  fs.writeFileSync('audio.mp3', buffer)

  // Browser: Convert to Blob for playback/download
  const blob = new Blob([result.data], { type: result.contentType || 'audio/mpeg' })
  const url = URL.createObjectURL(blob)
}

// Download audio from a live/streaming session
const liveResult = await gladiaAdapter.getAudioFile('stream-456', 'streaming')
console.log('Size:', liveResult.data?.byteLength, 'bytes')

Note: This is a Gladia-specific feature. Other providers (Deepgram, AssemblyAI, Azure) do not store audio files after transcription.

New capability flag: capabilities.getAudioFile indicates provider support for audio retrieval.

Improved Metadata Clarity

New metadata fields for better discoverability:

interface TranscriptMetadata {
  /** Original audio URL you provided (echoed back) - renamed from audioUrl */
  sourceAudioUrl?: string

  /** True if getAudioFile() can retrieve the audio (Gladia only) */
  audioFileAvailable?: boolean
  // ...
}

Usage pattern:

const { transcripts } = await router.listTranscripts('gladia')

for (const item of transcripts) {
  // What you sent
  console.log(item.data?.metadata?.sourceAudioUrl)  // "https://your-bucket.s3.amazonaws.com/audio.mp3"

  // Can we download from provider?
  if (item.data?.metadata?.audioFileAvailable) {
    const audio = await gladiaAdapter.getAudioFile(item.data.id)
    // audio.data is an ArrayBuffer - the actual file stored by Gladia
  }
}

Changed

  • BREAKING: metadata.audioUrl renamed to metadata.sourceAudioUrl for clarity
    • This field contains the URL you originally provided, not a provider-hosted URL
  • audioFileAvailable is now set on all provider responses (derived from capabilities.getAudioFile)
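For the breaking rename, a small compatibility shim can bridge old call sites during migration (the shim is illustrative and not part of the SDK):

```typescript
// Pre-0.3.3 metadata used audioUrl; 0.3.3+ uses sourceAudioUrl.
interface OldMetadata { audioUrl?: string }
interface NewMetadata { sourceAudioUrl?: string; audioFileAvailable?: boolean }

// Map old-style metadata to the new field name.
function migrateMetadata(old: OldMetadata): NewMetadata {
  return { sourceAudioUrl: old.audioUrl }
}

const migrated = migrateMetadata({ audioUrl: 'https://example.com/audio.mp3' })
```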

listTranscripts Implementation

Full listTranscripts() support for AssemblyAI, Gladia, Azure, and Deepgram using only generated types:

// List recent transcripts with filtering
const { transcripts, hasMore } = await router.listTranscripts('assemblyai', {
  status: 'completed',
  date: '2026-01-07',
  limit: 50
})

// Date range filtering (Gladia)
const { transcripts } = await router.listTranscripts('gladia', {
  afterDate: '2026-01-01',
  beforeDate: '2026-01-31'
})

// Provider-specific passthrough
const { transcripts } = await router.listTranscripts('assemblyai', {
  assemblyai: { after_id: 'cursor-123' }
})

// Deepgram request history (requires projectId)
const adapter = new DeepgramAdapter()
adapter.initialize({
  apiKey: process.env.DEEPGRAM_API_KEY,
  projectId: process.env.DEEPGRAM_PROJECT_ID
})

// List requests (metadata only)
const { transcripts } = await adapter.listTranscripts({
  status: 'succeeded',
  afterDate: '2026-01-01'
})

// Get full transcript by request ID
const fullTranscript = await adapter.getTranscript(transcripts[0].data?.id)
console.log(fullTranscript.data?.text)  // Full transcript!

Status Enums for Filtering

New status constants with IDE autocomplete:

import { AssemblyAIStatus, GladiaStatus, AzureStatus, DeepgramStatus } from 'voice-router-dev/constants'

await router.listTranscripts('assemblyai', {
  status: AssemblyAIStatus.completed  // queued | processing | completed | error
})

await router.listTranscripts('gladia', {
  status: GladiaStatus.done  // queued | processing | done | error
})

await router.listTranscripts('azure-stt', {
  status: AzureStatus.Succeeded  // NotStarted | Running | Succeeded | Failed
})

// Deepgram (request history - requires projectId)
await adapter.listTranscripts({
  status: DeepgramStatus.succeeded  // succeeded | failed
})
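Because each provider uses different status strings (the values quoted in the comments above), code that filters across providers typically normalizes them first. An illustrative mapping (the unified status names here are a hypothetical convention, not an SDK export):

```typescript
type UnifiedStatus = 'queued' | 'processing' | 'completed' | 'error'

// Provider values taken from the status enums above.
const STATUS_MAP: Record<string, UnifiedStatus> = {
  // AssemblyAI
  queued: 'queued', processing: 'processing', completed: 'completed', error: 'error',
  // Gladia (shares queued/processing/error spellings)
  done: 'completed',
  // Azure
  NotStarted: 'queued', Running: 'processing', Succeeded: 'completed', Failed: 'error',
  // Deepgram
  succeeded: 'completed', failed: 'error',
}

function normalizeStatus(providerStatus: string): UnifiedStatus | undefined {
  return STATUS_MAP[providerStatus]
}
```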

JSDoc Comments for All Constants

All constants now have JSDoc with:

  • Available values listed
  • Usage examples
  • Provider-specific notes

Typed Response Interfaces

New exported types for full autocomplete on transcript responses:

import type {
  TranscriptData,
  TranscriptMetadata,
  ListTranscriptsResponse
} from 'voice-router-dev';

const response: ListTranscriptsResponse = await router.listTranscripts('assemblyai', { limit: 20 });

response.transcripts.forEach(item => {
  // Full autocomplete - no `as any` casts needed!
  console.log(item.data?.id);                    // string
  console.log(item.data?.status);                // TranscriptionStatus
  console.log(item.data?.metadata?.sourceAudioUrl); // string | undefined
  console.log(item.data?.metadata?.createdAt);   // string | undefined
});

Note: These are manual normalization types that unify different provider schemas. For raw provider types, use result.raw with the generic parameter:

const result: UnifiedTranscriptResponse<'assemblyai'> = await adapter.transcribe(audio);
// result.raw is typed as AssemblyAITranscript

DeepgramSampleRate Const

New convenience const for Deepgram sample rates (not in OpenAPI spec):

import { DeepgramSampleRate } from 'voice-router-dev/constants'

{ sampleRate: DeepgramSampleRate.NUMBER_16000 }

Additional Deepgram OpenAPI Re-exports

New constants directly re-exported from OpenAPI-generated types:

import { DeepgramIntentMode, DeepgramCallbackMethod } from 'voice-router-dev/constants'

// Intent detection mode
{ customIntentMode: DeepgramIntentMode.extended }  // extended | strict

// Async callback method
{ callbackMethod: DeepgramCallbackMethod.POST }  // POST | PUT

Changed

  • All adapter listTranscripts() implementations use generated API functions and types only
  • Status mappings use generated enums (TranscriptStatus, TranscriptionControllerListV2StatusItem, Status, ManageV1FilterStatusParameter)
  • Deepgram adapter now supports listTranscripts() via request history API (metadata only)
  • Deepgram getTranscript() now returns full transcript data from request history

Fixed

  • Gladia listTranscripts() now includes file metadata:

    • data.duration - audio duration in seconds
    • metadata.sourceAudioUrl - source URL (if audio_url was used)
    • metadata.filename - original filename
    • metadata.audioDuration - audio duration (also in metadata)
    • metadata.numberOfChannels - number of audio channels
  • All adapters now include raw: item in listTranscripts() responses for consistency:

    • AssemblyAI: now includes raw field with original TranscriptListItem
    • Azure: now includes raw field with original Transcription item
    • Added metadata.description to Azure list responses
  • Added clarifying comments in adapters about provider limitations:

    • AssemblyAI: audio_duration only available in full Transcript, not TranscriptListItem
    • Azure: contentUrls is write-only (not returned in list responses per API docs)

[0.3.0] - 2026-01-07

Added

Browser-Safe Constants Export

New /constants subpath export for browser, Cloudflare Workers, and edge environments:

// Browser-safe import (no node:crypto, ws, or axios)
import { DeepgramModel, GladiaEncoding, AssemblyAIEncoding } from 'voice-router-dev/constants'

const model = DeepgramModel["nova-3"]
const encoding = GladiaEncoding["wav/pcm"]

The main entry point (voice-router-dev) still works but bundles Node.js dependencies. Use /constants when you only need the enum values without the adapter classes.

Type-Safe Streaming Enums with Autocomplete

New const objects provide IDE autocomplete and compile-time validation for all streaming options. All enums are derived from OpenAPI specs and stay in sync with provider APIs.

Deepgram:

import { DeepgramEncoding, DeepgramModel, DeepgramRedact } from 'voice-router-dev'

await adapter.transcribeStream({
  deepgramStreaming: {
    encoding: DeepgramEncoding.linear16,       // "linear16" | "flac" | "mulaw" | ...
    model: DeepgramModel["nova-3"],            // "nova-3" | "nova-2" | "enhanced" | ...
    redact: [DeepgramRedact.pii],              // "pii" | "pci" | "numbers"
  }
})

Gladia:

import { GladiaEncoding, GladiaSampleRate, GladiaLanguage } from 'voice-router-dev'

await adapter.transcribeStream({
  encoding: GladiaEncoding['wav/pcm'],         // "wav/pcm" | "wav/alaw" | "wav/ulaw"
  sampleRate: GladiaSampleRate.NUMBER_16000,   // 8000 | 16000 | 32000 | 44100 | 48000
  language: GladiaLanguage.en,                 // 100+ language codes
})

AssemblyAI:

import { AssemblyAIEncoding, AssemblyAISpeechModel, AssemblyAISampleRate } from 'voice-router-dev'

await adapter.transcribeStream({
  assemblyaiStreaming: {
    encoding: AssemblyAIEncoding.pcmS16le,              // "pcm_s16le" | "pcm_mulaw"
    speechModel: AssemblyAISpeechModel.multilingual,    // English or multilingual
    sampleRate: AssemblyAISampleRate.rate16000,         // 8000-48000
  }
})

Type Safety Audit

All enums are either re-exported from OpenAPI-generated types or type-checked with satisfies:

| Enum | Source | Type Safety |
|---|---|---|
| DeepgramEncoding | Re-exported from ListenV1EncodingParameter | ✅ OpenAPI |
| DeepgramRedact | Re-exported from ListenV1RedactParameterOneOfItem | ✅ OpenAPI |
| DeepgramModel | Manual const with satisfies ListenV1ModelParameter | ⚠️ Type-checked |
| DeepgramTopicMode | Re-exported from SharedCustomTopicModeParameter | ✅ OpenAPI |
| GladiaEncoding | Re-exported from StreamingSupportedEncodingEnum | ✅ OpenAPI |
| GladiaSampleRate | Re-exported from StreamingSupportedSampleRateEnum | ✅ OpenAPI |
| GladiaBitDepth | Re-exported from StreamingSupportedBitDepthEnum | ✅ OpenAPI |
| GladiaModel | Re-exported from StreamingSupportedModels | ✅ OpenAPI |
| GladiaLanguage | Re-exported from TranscriptionLanguageCodeEnum | ✅ OpenAPI |
| AssemblyAIEncoding | Manual const with satisfies AudioEncoding | ⚠️ Type-checked |
| AssemblyAISpeechModel | Manual const with satisfies StreamingSpeechModel | ⚠️ Type-checked |
| AssemblyAISampleRate | Manual const (no generated type exists) | ❌ Unchecked |

Why some remain manual:

  • DeepgramModel: OpenAPI generates a type union, not a const object
  • AssemblyAI*: Synced from SDK types which are unions, not const objects
  • AssemblyAISampleRate: Not defined in any spec (values from SDK documentation)

The satisfies keyword ensures compile-time errors if values drift from the generated types.
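A minimal demonstration of that pattern (the union and const below are toy stand-ins, not the real generated types):

```typescript
// Stand-in for an OpenAPI-generated union type.
type ToyEncodingUnion = 'linear16' | 'flac' | 'mulaw'

// Manual const object checked against the union with `satisfies`:
// adding a value outside the union is a compile-time error, while
// the literal key/value types are preserved for IDE autocomplete.
const ToyEncoding = {
  linear16: 'linear16',
  flac: 'flac',
  mulaw: 'mulaw',
} as const satisfies Record<string, ToyEncodingUnion>

// Values remain assignable to the union.
const chosen: ToyEncodingUnion = ToyEncoding.linear16
```

Unlike a plain type annotation, `satisfies` does not widen the const's values to the union, so `ToyEncoding.flac` stays typed as the literal `'flac'`.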

Full Streaming Implementation for All Providers

  • Gladia: Complete streaming with pre-processing, real-time processing (translation, NER, sentiment), post-processing (summarization, chapterization), and all WebSocket message types
  • Deepgram: Full streaming with 30+ options including filler words, numerals, measurements, topics, intents, sentiment, entities, keyterm prompting, and VAD events
  • AssemblyAI: v3 Universal Streaming API with end-of-turn detection tuning, VAD threshold, format turns, profanity filtering, keyterms, and dynamic configuration updates

New Streaming Event Callbacks

await adapter.transcribeStream(options, {
  onTranscript: (event) => { /* interim/final transcripts */ },
  onUtterance: (utterance) => { /* complete utterances */ },
  onSpeechStart: (event) => { /* speech detected */ },
  onSpeechEnd: (event) => { /* speech ended */ },
  onTranslation: (event) => { /* real-time translation (Gladia) */ },
  onSentiment: (event) => { /* sentiment analysis (Gladia) */ },
  onEntity: (event) => { /* named entity recognition (Gladia) */ },
  onSummarization: (event) => { /* post-processing summary (Gladia) */ },
  onChapterization: (event) => { /* auto-chapters (Gladia) */ },
  onMetadata: (metadata) => { /* stream metadata */ },
  onError: (error) => { /* error handling */ },
  onClose: (code, reason) => { /* connection closed */ },
})

AssemblyAI Dynamic Configuration

const session = await adapter.transcribeStream(options, callbacks)

// Update configuration mid-stream
session.updateConfiguration?.({
  end_of_turn_confidence_threshold: 0.8,
  vad_threshold: 0.4,
  format_turns: true,
})

// Force end-of-turn detection
session.forceEndpoint?.()

Changed

  • TranscriptionModel (batch) now uses strict union type (no | string fallback)
  • DeepgramStreamingOptions.model now uses strict union type (no | string fallback)
  • AssemblyAIStreamingOptions.speechModel now uses strict union type
  • ProviderCapabilities now includes listTranscripts and deleteTranscript flags
  • DeepgramStreamingOptions now includes 30+ typed parameters from OpenAPI spec
  • AssemblyAIStreamingOptions now includes all v3 streaming parameters
  • GladiaStreamingOptions now includes full pre/realtime/post processing options
  • Provider-specific streaming options now have JSDoc examples for better discoverability

Deprecated

Raw generated enum exports are deprecated in favor of user-friendly aliases:

| Deprecated | Use Instead |
|---|---|
| ListenV1EncodingParameter | DeepgramEncoding |
| ListenV1ModelParameter | DeepgramModel |
| ListenV1RedactParameterOneOfItem | DeepgramRedact |
| StreamingSupportedEncodingEnum | GladiaEncoding |
| StreamingSupportedSampleRateEnum | GladiaSampleRate |
| StreamingSupportedBitDepthEnum | GladiaBitDepth |
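Under the hood, such aliases are typically plain re-assignments, so both names stay value-identical during the deprecation window. A sketch with toy values (not the actual generated enum):

```typescript
// Toy stand-in for a generated enum const.
const ListenV1EncodingParameterSketch = {
  linear16: 'linear16',
  flac: 'flac',
} as const

// The friendly alias points at the same object,
// so existing imports keep working unchanged.
const DeepgramEncodingSketch = ListenV1EncodingParameterSketch
```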

Migration Guide (0.2.x → 0.3.0)

1. Update Enum Imports

Before (0.2.x):

import {
  ListenV1EncodingParameter,
  StreamingSupportedEncodingEnum
} from 'voice-router-dev'

const encoding = ListenV1EncodingParameter.linear16
const gladiaEncoding = StreamingSupportedEncodingEnum['wav/pcm']

After (0.3.0):

import {
  DeepgramEncoding,
  GladiaEncoding
} from 'voice-router-dev'

const encoding = DeepgramEncoding.linear16
const gladiaEncoding = GladiaEncoding['wav/pcm']

2. Update Model References

Before:

// String literals (still work but no autocomplete)
model: "nova-3"

After:

import { DeepgramModel } from 'voice-router-dev'

// With autocomplete
model: DeepgramModel["nova-3"]

3. Update Streaming Options

Before (0.2.x):

await adapter.transcribeStream({
  encoding: 'linear16',
  sampleRate: 16000,
})

After (0.3.0):

await adapter.transcribeStream({
  deepgramStreaming: {
    encoding: DeepgramEncoding.linear16,
    sampleRate: 16000,
    // Now supports 30+ additional options with autocomplete
    fillerWords: true,
    smartFormat: true,
  }
})

4. New Callback Handlers

If you were only using onTranscript, you now have access to more granular events:

await adapter.transcribeStream(options, {
  onTranscript: (event) => { /* still works */ },

  // New in 0.3.0:
  onSpeechStart: (event) => console.log('Speech started'),
  onSpeechEnd: (event) => console.log('Speech ended'),
  onUtterance: (utterance) => console.log('Complete utterance:', utterance.text),
})

[0.2.8] - 2025-12-30

Added

  • Typed extended response data with extendedData field
  • Request tracking with requestId field
  • Type-safe provider-specific options from OpenAPI specs

Changed

  • Replace 'text' with 'words' in SDK responses

[0.2.5] - 2025-12-15

Added

  • Initial OpenAPI-generated types for Gladia, Deepgram, AssemblyAI
  • Webhook normalization handlers
  • Basic streaming support
