SDK Generation Pipeline

The Voice Router SDK is a multi-provider transcription SDK that provides a unified interface for multiple Speech-to-Text APIs (Gladia, AssemblyAI, Deepgram, etc.). This document explains the complete workflow from raw OpenAPI specifications to a production-ready, documented SDK.

The Journey

OpenAPI Specs → Type Generation → Adapter Code → SDK Build → Documentation → npm Package

Key Technologies

Technology	Purpose	Stage
OpenAPI/Swagger	API specification format	Input
Orval	OpenAPI → TypeScript generator	Type Generation
TypeScript	Type-safe SDK code	Development
tsup	TypeScript bundler	Build
TypeDoc	Documentation generator	Documentation
pnpm	Package manager & scripts	Orchestration

Pipeline Overview

REMOTE SOURCES
  gladia: api.gladia.io/openapi.json...
  assemblyai: raw.githubusercontent.com/AssemblyAI/ass...
  assemblyaiAsync: raw.githubusercontent.com/AssemblyAI/ass...
  assemblyaiStreaming: raw.githubusercontent.com/AssemblyAI/ass...
  deepgram: raw.githubusercontent.com/deepgram/deepg...
  deepgramStreaming: raw.githubusercontent.com/deepgram/deepg...
  openai: app.stainless.com/api/spec/documented/op...
  azure: raw.githubusercontent.com/Azure/azure-re...
  speechmatics: raw.githubusercontent.com/speechmatics/s...
  speechmaticsAsync: raw.githubusercontent.com/speechmatics/s...
  soniox: api.soniox.com/v1/openapi.json...

MANUAL SPECS
  assemblyai-streaming-types.ts: Curated streaming types, generated by sync-assemblyai-streaming-types.js
  openai-realtime-types.ts: Realtime API types, from Azure-Samples/RealtimeAIApp-JS
  speechmatics-batch-types.zod.ts: Curated batch schemas for field-configs
  soniox-streaming-types.ts: No official AsyncAPI spec; types extracted from the @soniox/speech-to-text-web SDK

SPEC SYNC
  sync-specs.js (pnpm openapi:sync) writes to the specs/ directory:
    assemblyai-asyncapi.json
    assemblyai-openapi.json
    assemblyai-streaming-sdk.ts
    assemblyai-streaming-types.ts
    azure-stt-openapi.json
    deepgram-openapi.yml
    deepgram-streaming-sdk.ts
    gladia-openapi.json
    openai-openapi.yaml
    openai-realtime-types.ts
    soniox-openapi.json
    soniox-streaming-types.ts
    speechmatics-asyncapi.yml
    speechmatics-batch-types.zod.ts
    speechmatics-batch.yml

PRE-ORVAL FIXES
  fix-deepgram-spec.js: fixes the Deepgram spec
  fix-openai-spec.js: fixes the OpenAI spec
  fix-speechmatics-spec.js: fixes the Speechmatics spec

MANUAL TYPE OVERRIDES
  deepgram/: 3 type files

ORVAL GENERATION
  orval.config.ts: 14 projects
  API clients: axios-functions
  Zod schemas: runtime validation
  TypeScript types: schema/ directories

POST-ORVAL FIXES
  fix-assemblyai-missing-schemas.js
  fix-generated.js
  sed fixes: speechmatics string literals

STREAMING TYPE GENERATION
  sync-assemblyai-streaming-types.js: assemblyai
  sync-deepgram-streaming-types.js: deepgram
  sync-soniox-streaming-types.js: soniox
  sync-speechmatics-streaming-types.js: speechmatics

LANGUAGE/LOCALE EXTRACTION
  generate-azure-locales.js
  generate-deepgram-languages.js
  generate-soniox-languages.js
  generate-speechmatics-languages.js

src/generated/
  assemblyai/: api/ schema/ streaming
  azure/: api/ schema/ locales
  deepgram/: api/ schema/ streaming languages models
  gladia/: api/ schema/
  openai/: api/ schema/ streaming models
  soniox/: api/ schema/ streaming languages models
  speechmatics/: api/ schema/ streaming batch languages

SDK EXPORTS
  field-configs.ts: zodToFieldConfigs() + typed field names
  field-metadata.ts: Pre-computed field metadata (lightweight)
  index.ts: Types + Zod namespaces
  provider-metadata.ts: Provider info + capabilities
  constants.ts: Enums + constants (all providers)

SDK INTERNALS
  Provider Adapters: assemblyai, azure-stt, base, deepgram, gladia, openai-whisper, soniox, speechmatics
  Webhook Handlers: assemblyai, azure, base, deepgram, gladia, speechmatics
  Voice Router:
    voice-router.ts: VoiceRouter class
    types.ts: TranscriptionConfig
    provider-streaming-types.ts

PUBLIC API (what users import)
  VoiceRouter: Multi-provider routing
  Provider Adapters: Direct provider access
  WebhookRouter: Webhook handling
  Types + Zod Schemas: Runtime validation
  Field Configs: UI form generation + typed overrides
  Field Metadata: Lightweight alternative to full Zod
  Provider Metadata: Provider capabilities
  Constants: Type-safe enums for all providers

Pipeline Stages

OpenAPI Fetching: Download provider API specifications
Type Generation: Convert OpenAPI to TypeScript types
Adapter Implementation: Write provider-specific adapters
Bridge Layer: Implement unified VoiceRouter interface
SDK Compilation: Bundle TypeScript → JavaScript/ESM/CJS
Documentation: Generate markdown from JSDoc comments
Publishing: Package and publish to npm

Phase 1: OpenAPI Specification Fetching

What Are OpenAPI Specifications?

OpenAPI (formerly Swagger) is a standard format for describing REST APIs. Providers like Gladia and AssemblyAI publish their API definitions as OpenAPI documents, in JSON or YAML.

Example OpenAPI Structure:

{
  "openapi": "3.0.0",
  "info": {
    "title": "Gladia API",
    "version": "1.0.0"
  },
  "paths": {
    "/transcription": {
      "post": {
        "summary": "Create transcription",
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/TranscriptionRequest"
              }
            }
          }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "TranscriptionRequest": {
        "type": "object",
        "properties": {
          "audio_url": { "type": "string" }
        }
      }
    }
  }
}
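
The "$ref" in the request body points at an entry under components.schemas. As a rough illustration of how a generator follows that pointer, here is a minimal sketch; OpenApiDoc and resolveSchemaRef are hypothetical names introduced for this example, and only local component-schema references are handled.

```typescript
// Minimal shape of the parts of an OpenAPI document used in this sketch.
interface OpenApiDoc {
  components?: { schemas?: Record<string, unknown> }
}

// Resolve a local reference such as "#/components/schemas/TranscriptionRequest".
// Anything that is not a local component-schema reference returns undefined.
function resolveSchemaRef(doc: OpenApiDoc, ref: string): unknown {
  const prefix = "#/components/schemas/"
  if (!ref.startsWith(prefix)) return undefined
  const name = ref.slice(prefix.length)
  return doc.components?.schemas?.[name]
}

// Same schema as in the example document above.
const exampleDoc: OpenApiDoc = {
  components: {
    schemas: {
      TranscriptionRequest: {
        type: "object",
        properties: { audio_url: { type: "string" } },
      },
    },
  },
}

const resolvedSchema = resolveSchemaRef(
  exampleDoc,
  "#/components/schemas/TranscriptionRequest"
)
console.log(JSON.stringify(resolvedSchema))
```

A real generator also handles nested and external references; this sketch only shows the lookup that ties a path's request body to its schema.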

Provider OpenAPI URLs

We fetch OpenAPI specs directly from provider URLs:

// orval.config.ts
export default {
  gladiaApi: {
    input: {
      target: "https://api.gladia.io/openapi.json",
    },
    output: {
      target: "./src/generated/gladia/schema/",
      mode: "single",
    }
  },
  assemblyaiApi: {
    input: {
      target: "https://raw.githubusercontent.com/AssemblyAI/assemblyai-api-spec/main/openapi.json",
    },
    output: {
      target: "./src/generated/assemblyai/schema/",
      mode: "single",
    }
  }
}

How Fetching Works

When you run pnpm openapi:generate, Orval:

1. Reads the configuration file (orval.config.ts)
2. Sends an HTTP GET request to each provider's OpenAPI URL
3. Downloads the specification (JSON or YAML)
4. Parses and validates the OpenAPI schema
5. Generates TypeScript types from the schema
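
To make the download-and-validate steps concrete, here is a minimal sketch of what they amount to. It is not Orval's implementation: SpecSummary, summarizeSpec, and fetchSpec are hypothetical names, the validation only checks a few top-level fields, and fetchSpec assumes a runtime with a global fetch (for example Node 18 or later) and a JSON spec.

```typescript
interface SpecSummary {
  version: string
  title: string
  pathCount: number
}

// Check a few top-level fields of a parsed OpenAPI 3.x document and summarize it.
function summarizeSpec(doc: unknown): SpecSummary {
  if (typeof doc !== "object" || doc === null) {
    throw new Error("Spec is not a JSON object")
  }
  const spec = doc as { openapi?: unknown; info?: { title?: unknown }; paths?: unknown }
  const version = spec.openapi
  if (typeof version !== "string" || !version.startsWith("3.")) {
    throw new Error("Missing or unsupported 'openapi' version field")
  }
  const title = spec.info?.title
  if (typeof title !== "string") {
    throw new Error("Missing 'info.title'")
  }
  // A missing "paths" object is treated as zero paths.
  const paths = typeof spec.paths === "object" && spec.paths !== null ? spec.paths : {}
  return { version, title, pathCount: Object.keys(paths).length }
}

// Download a JSON spec over HTTP, then validate and summarize it.
async function fetchSpec(url: string): Promise<SpecSummary> {
  const response = await fetch(url)
  if (!response.ok) {
    throw new Error(`Failed to fetch ${url}: HTTP ${response.status}`)
  }
  return summarizeSpec(await response.json())
}
```

Keeping the validation separate from the HTTP call means the same check can run against a spec that was already saved to disk.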

Phase 2: Type Generation with Orval

What Is Orval?

Orval is a code generator that converts OpenAPI specifications into TypeScript code. It creates:

TypeScript interfaces for request/response types
Zod schemas for runtime validation (optional)
Type-safe API client code (optional)

Orval Configuration

import { defineConfig } from "orval"

export default defineConfig({
  gladiaApi: {
    input: {
      target: "https://api.gladia.io/openapi.json",
    },
    output: {
      target: "./src/generated/gladia/schema/",
      mode: "single",
      client: "axios",
      clean: true,
      prettier: true,
    },
  },
  gladiaZod: {
    input: {
      target: "https://api.gladia.io/openapi.json",
    },
    output: {
      target: "./src/generated/gladia/zod/",
      mode: "single",
      client: "zod",
    },
  },
})

Generated Type Example

From OpenAPI:

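{
  "Transcript": {
    "type": "object",
    "properties": {
      "id": { "type": "string", "description": "Unique identifier" },
      "text": { "type": "string", "description": "Transcribed text" },
      "confidence": { "type": "number", "description": "Confidence score" }
    },
    "required": ["id"]
  }
}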

To TypeScript:

/**
 * Generated by orval v7.17.0 🍺
 * Do not edit manually.
 * Gladia API
 */

export interface Transcript {
  /** Unique identifier */
  id: string;
  /** Transcribed text */
  text?: string;
  /** Confidence score */
  confidence?: number;
}
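
Because only id is listed under "required", text and confidence come out as optional properties, and consuming code has to handle their absence. A small sketch of what that looks like in practice; describeTranscript is a hypothetical helper, and the interface is repeated here only so the example is self-contained.

```typescript
// Same shape as the generated interface, repeated for a self-contained example.
interface Transcript {
  id: string
  text?: string
  confidence?: number
}

// Optional properties are typed as `T | undefined`, so each one needs a fallback.
function describeTranscript(t: Transcript): string {
  const text = t.text ?? "(no text yet)"
  const confidence =
    t.confidence !== undefined ? `${Math.round(t.confidence * 100)}%` : "n/a"
  return `${t.id}: ${text} [confidence: ${confidence}]`
}

console.log(describeTranscript({ id: "abc" }))
console.log(describeTranscript({ id: "abc", text: "hi", confidence: 0.5 }))
```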

Output Structure

After generation:

src/generated/
├── gladia/
│   ├── schema/                    # TypeScript types (260 files)
│   │   ├── index.ts
│   │   ├── transcript.ts
│   │   ├── initTranscriptionRequest.ts
│   │   └── ...
│   └── zod/                       # Zod schemas (optional)
└── assemblyai/
    ├── schema/                    # TypeScript types (161 files)
    └── zod/

Phase 3: Adapter Implementation

The Three Layers

┌─────────────────────────────────────────┐
│   VoiceRouter (Bridge Layer)            │  ← User-facing API
│   - Provider selection                  │
│   - Unified interface                   │
└──────────────┬──────────────────────────┘
               │
    ┌──────────┴──────────┬──────────────────┐
    │                     │                   │
┌───▼─────────────┐  ┌───▼──────────────┐  ┌─▼────────────┐
│ GladiaAdapter   │  │ AssemblyAIAdapter│  │ DeepgramA... │
│ (Provider impl) │  │ (Provider impl)  │  │ (Provider..  │
└────────┬────────┘  └──────┬───────────┘  └──┬───────────┘
         │                  │                  │
    ┌────▼──────────────────▼──────────────────▼───┐
    │   BaseAdapter (Abstract Interface)           │
    │   - initialize()                             │
    │   - transcribe()                             │
    │   - getTranscript()                          │
    │   - createErrorResponse()                    │
    └──────────────────────────────────────────────┘
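
The layering in the diagram can be sketched in a few lines. This is an illustration of the pattern only, not the SDK's VoiceRouter: MiniRouter, Adapter, and Result are hypothetical, simplified stand-ins, and the adapter takes a plain URL string instead of the real AudioInput.

```typescript
type Provider = "gladia" | "assemblyai" | "deepgram"

interface Result {
  success: boolean
  provider: Provider
  text?: string
}

// Stand-in for BaseAdapter: only the members the router needs.
interface Adapter {
  readonly name: Provider
  transcribe(audioUrl: string): Promise<Result>
}

// Hypothetical router: keeps one adapter per provider and delegates to it.
class MiniRouter {
  private adapters = new Map<Provider, Adapter>()
  private defaultProvider: Provider

  constructor(defaultProvider: Provider) {
    this.defaultProvider = defaultProvider
  }

  register(adapter: Adapter): void {
    this.adapters.set(adapter.name, adapter)
  }

  // Use the requested provider, or fall back to the default one.
  async transcribe(audioUrl: string, provider?: Provider): Promise<Result> {
    const selected = provider ?? this.defaultProvider
    const adapter = this.adapters.get(selected)
    if (!adapter) {
      throw new Error(`No adapter registered for provider "${selected}"`)
    }
    return adapter.transcribe(audioUrl)
  }
}

// A fake adapter that echoes the URL back, standing in for a real provider call.
const echoAdapter: Adapter = {
  name: "gladia",
  transcribe: async (audioUrl) => ({ success: true, provider: "gladia", text: audioUrl }),
}

const miniRouter = new MiniRouter("gladia")
miniRouter.register(echoAdapter)
```

User code only talks to the router; swapping providers means passing a different provider name, not changing call sites.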

Base Adapter Interface

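// Shared types are assumed to live in src/router/types.ts
import type {
  AudioInput,
  ProviderCapabilities,
  ProviderConfig,
  TranscribeOptions,
  TranscriptionProvider,
  UnifiedTranscriptResponse,
} from "../router/types"

export abstract class BaseAdapter {
  abstract name: TranscriptionProvider
  abstract capabilities: ProviderCapabilities

  protected config?: ProviderConfig

  initialize(config: ProviderConfig): void {
    this.config = config
  }

  abstract transcribe(
    audio: AudioInput,
    options?: TranscribeOptions
  ): Promise<UnifiedTranscriptResponse>

  abstract getTranscript(
    transcriptId: string
  ): Promise<UnifiedTranscriptResponse>

  // Guard called by adapters before each request (illustrative body)
  protected validateConfig(): void {
    if (!this.config) throw new Error(`${this.name} adapter is not initialized`)
  }

  // Wrap any thrown value in the unified error shape (illustrative body; the code string is a placeholder)
  protected createErrorResponse(
    error: unknown,
    statusCode?: number
  ): UnifiedTranscriptResponse {
    const message = error instanceof Error ? error.message : String(error)
    return {
      success: false,
      provider: this.name,
      error: { code: "PROVIDER_ERROR", message, statusCode },
    }
  }
}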

Implementing an Adapter

Step 1: Import Generated Types

import type { Transcript } from "../generated/gladia/schema/transcript"
import type { InitTranscriptionRequest } from "../generated/gladia/schema/initTranscriptionRequest"
import type { PreRecordedResponse } from "../generated/gladia/schema/preRecordedResponse"

Step 2: Create Adapter Class

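import axios from "axios"
import type { AxiosInstance } from "axios"
import { BaseAdapter } from "./base-adapter"
// Shared types are assumed to live in src/router/types.ts
import type { ProviderCapabilities, ProviderConfig } from "../router/types"

export class GladiaAdapter extends BaseAdapter {
  readonly name = "gladia" as const
  readonly capabilities: ProviderCapabilities = {
    streaming: true,
    diarization: true,
    wordTimestamps: true,
  }

  private client?: AxiosInstance
  private baseUrl = "https://api.gladia.io/v2"

  // Create the HTTP client once, with the API key attached to every request
  initialize(config: ProviderConfig): void {
    super.initialize(config)

    this.client = axios.create({
      baseURL: config.baseUrl || this.baseUrl,
      headers: {
        "x-gladia-key": config.apiKey,
      },
    })
  }

  // transcribe() and getTranscript() are added in Step 3
}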

Step 3: Implement Transcribe Method

async transcribe(
  audio: AudioInput,
  options?: TranscribeOptions
): Promise<UnifiedTranscriptResponse> {
  this.validateConfig()

  try {
    const payload = this.buildTranscriptionRequest(audio, options)
    const response = await this.client!.post<PreRecordedResponse>(
      "/transcription",
      payload
    )
    return this.normalizeResponse(response.data)
  } catch (error) {
    return this.createErrorResponse(error)
  }
}
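
The normalizeResponse call is where provider-specific data becomes the unified format. A minimal sketch of that mapping follows. The ProviderResponse shape, its field names, and the "done" status are assumptions made for illustration; the real adapter works with the generated PreRecordedResponse type, and UnifiedResult is a trimmed-down stand-in for UnifiedTranscriptResponse.

```typescript
// Hypothetical provider payload; field names and statuses are assumptions.
interface ProviderResponse {
  id: string
  status: "queued" | "processing" | "done" | "error"
  result?: { transcription?: { full_transcript?: string } }
}

type UnifiedStatus = "queued" | "processing" | "completed" | "error"

// Trimmed-down stand-in for UnifiedTranscriptResponse.
interface UnifiedResult {
  success: boolean
  provider: "gladia"
  data: { id: string; text: string; status: UnifiedStatus }
  raw: unknown
}

function normalizeResponse(response: ProviderResponse): UnifiedResult {
  // Map the provider's status vocabulary onto the unified one.
  const status: UnifiedStatus =
    response.status === "done" ? "completed" : response.status
  return {
    success: status !== "error",
    provider: "gladia",
    data: {
      id: response.id,
      // Text is empty until the provider has finished processing.
      text: response.result?.transcription?.full_transcript ?? "",
      status,
    },
    // Keep the original payload for callers that need provider-specific fields.
    raw: response,
  }
}
```

Keeping the untouched payload under raw lets callers reach provider-specific fields without the unified type having to model them.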

Unified Types

All providers normalize to this format:

export interface UnifiedTranscriptResponse {
  success: boolean
  provider: TranscriptionProvider
  data?: {
    id: string
    text: string
    confidence?: number
    status: "queued" | "processing" | "completed" | "error"
    language?: string
    duration?: number
    speakers?: Array<{ id: string; label: string }>
    words?: Array<{
      text: string
      start: number
      end: number
      confidence?: number
      speaker?: string
    }>
    utterances?: Array<{
      text: string
      start: number
      end: number
      speaker?: string
      confidence?: number
      words: Array<{ text: string; start: number; end: number }>
    }>
    summary?: string
    metadata?: Record<string, unknown>
  }
  error?: {
    code: string
    message: string
    statusCode?: number
  }
  raw?: unknown
}
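
Since data and error are both optional, calling code should check success before reading either. A sketch of one way to consume the response; summarizeResponse is a hypothetical helper, and the interfaces below repeat only the fields it uses.

```typescript
// Reduced copies of the unified types, limited to the fields used below.
interface UnifiedError {
  code: string
  message: string
  statusCode?: number
}

interface UnifiedData {
  id: string
  text: string
  status: "queued" | "processing" | "completed" | "error"
}

interface UnifiedResponse {
  success: boolean
  provider: string
  data?: UnifiedData
  error?: UnifiedError
}

// Turn any unified response into a one-line summary.
function summarizeResponse(res: UnifiedResponse): string {
  if (!res.success || !res.data) {
    const err = res.error
    return err
      ? `[${res.provider}] failed: ${err.code} (${err.message})`
      : `[${res.provider}] failed: unknown error`
  }
  if (res.data.status !== "completed") {
    // Asynchronous providers may return before the transcript is ready.
    return `[${res.provider}] ${res.data.id} is ${res.data.status}`
  }
  return `[${res.provider}] ${res.data.id}: ${res.data.text}`
}
```

The same function works for every provider, which is the point of normalizing: no branch depends on which API produced the response.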

Phase 4: SDK Compilation

Build Configuration

We use tsup for bundling TypeScript into distributable formats:

import { defineConfig } from "tsup"

export default defineConfig({
  entry: { index: "src/index.ts" },
  format: ["cjs", "esm"],
  dts: true,
  splitting: false,
  sourcemap: true,
  clean: true,
  minify: false,
  target: "es2020",
  outDir: "dist",
})

Output Structure

dist/
├── index.js          # CommonJS bundle (~64 KB)
├── index.js.map      # CJS source map
├── index.mjs         # ES Module bundle (~62 KB)
├── index.mjs.map     # ESM source map
├── index.d.ts        # TypeScript declarations (~301 KB)
└── index.d.mts       # ESM TypeScript declarations

Package Exports

{
  "main": "dist/index.js",
  "module": "dist/index.mjs",
  "types": "dist/index.d.ts",
  "exports": {
    ".": {
      "types": "./dist/index.d.ts",
      "import": "./dist/index.mjs",
      "require": "./dist/index.js"
    }
  }
}

This ensures:

CommonJS consumers (require) load dist/index.js
ES Module consumers (import), in Node.js or a bundler, load dist/index.mjs
TypeScript picks up the type definitions from dist/index.d.ts
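
The selection rule behind the "exports" map is that the first key, in object order, that matches an active condition wins. The sketch below is a simplified model of that rule for a flat condition object, not Node's full resolution algorithm; resolveEntry and ConditionalExports are hypothetical names.

```typescript
type ConditionalExports = Record<string, string>

// Return the target of the first key (in object order) that is an active condition.
// "default" is always treated as active.
function resolveEntry(
  entry: ConditionalExports,
  conditions: string[]
): string | undefined {
  const active = new Set([...conditions, "default"])
  for (const [condition, target] of Object.entries(entry)) {
    if (active.has(condition)) return target
  }
  return undefined
}

// The "." entry from the package.json shown above.
const rootEntry: ConditionalExports = {
  types: "./dist/index.d.ts",
  import: "./dist/index.mjs",
  require: "./dist/index.js",
}

console.log(resolveEntry(rootEntry, ["node", "import"]))
console.log(resolveEntry(rootEntry, ["node", "require"]))
console.log(resolveEntry(rootEntry, ["types", "import"]))
```

Key order matters in this model, which is why "types" is listed first: a resolver that has the types condition active needs to reach it before "import" or "require".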

Phase 5: Documentation Generation

TypeDoc Pipeline

TypeDoc generates documentation by:

Parsing TypeScript source files
Extracting JSDoc comments
Analyzing types via TypeScript compiler
Generating Markdown via typedoc-plugin-markdown
Writing files to output directory

From JSDoc to Markdown

Input (TypeScript source):

/**
 * Submit audio for transcription
 *
 * @param audio - Audio input (URL, file buffer, or stream)
 * @param options - Transcription options
 * @returns Normalized transcription response
 * @throws {Error} If audio type is not supported
 *
 * @example
 * const result = await adapter.transcribe({
 *   type: 'url',
 *   url: 'https://example.com/audio.mp3'
 * });
 */
async transcribe(
  audio: AudioInput,
  options?: TranscribeOptions
): Promise<UnifiedTranscriptResponse>

Output (Generated Markdown):

### transcribe()

> **transcribe**(`audio`, `options?`): `Promise<UnifiedTranscriptResponse>`

Submit audio for transcription

#### Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `audio` | `AudioInput` | Audio input (URL, file buffer, or stream) |
| `options?` | `TranscribeOptions` | Transcription options |

#### Returns

`Promise<UnifiedTranscriptResponse>` - Normalized transcription response

Phase 6: Publishing

Pre-Publish Checklist

Version bump in package.json
Build SDK: pnpm build:all
Run tests: pnpm test
Lint code: pnpm lint
Generate docs: pnpm build:docs
Commit changes: git commit
Tag version: git tag v6.0.0

Publishing Process

# Automated publish (runs prepublishOnly)
pnpm publish

# Manual steps
pnpm lint:fix
pnpm build
npm publish --access public

Adding a New Provider

Step-by-Step Guide

1. Add OpenAPI Config

// orval.config.ts
newProviderApi: {
  input: {
    target: "https://api.newprovider.com/openapi.json",
  },
  output: {
    target: "./src/generated/newprovider/schema/",
    mode: "single",
    client: "axios",
    clean: true,
  },
}

2. Generate Types

pnpm openapi:generate

3. Create Adapter

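import { BaseAdapter } from "./base-adapter"
import type { Transcript } from "../generated/newprovider/schema/transcript"
// Shared types are assumed to live in src/router/types.ts
import type {
  AudioInput,
  ProviderCapabilities,
  TranscribeOptions,
  UnifiedTranscriptResponse,
} from "../router/types"

export class NewProviderAdapter extends BaseAdapter {
  readonly name = "newprovider" as const
  readonly capabilities: ProviderCapabilities = { /* ... */ }

  async transcribe(
    audio: AudioInput,
    options?: TranscribeOptions
  ): Promise<UnifiedTranscriptResponse> {
    // Build the provider request, call the API, then normalize the response
    throw new Error("Not implemented")
  }

  async getTranscript(transcriptId: string): Promise<UnifiedTranscriptResponse> {
    // Fetch the provider's Transcript by ID and normalize it
    throw new Error("Not implemented")
  }
}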

4. Update Types

// src/router/types.ts
export type TranscriptionProvider =
  | "gladia"
  | "assemblyai"
  | "newprovider"
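
Extending the union is what lets the compiler point out code that does not yet handle the new provider. A small sketch of the standard exhaustiveness check with never; displayName is a hypothetical function, and the union is repeated so the example stands alone.

```typescript
type TranscriptionProvider = "gladia" | "assemblyai" | "newprovider"

function displayName(provider: TranscriptionProvider): string {
  switch (provider) {
    case "gladia":
      return "Gladia"
    case "assemblyai":
      return "AssemblyAI"
    case "newprovider":
      return "New Provider"
    default: {
      // If a union member is not handled above, this assignment fails to compile.
      const unreachable: never = provider
      throw new Error(`Unhandled provider: ${String(unreachable)}`)
    }
  }
}

console.log(displayName("newprovider"))
```

Removing the "newprovider" case makes the never assignment a compile error, so a missed branch surfaces at build time rather than at runtime.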

5. Export Adapter

// src/adapters/index.ts
export * from "./newprovider-adapter"

6. Test

pnpm build
pnpm test

Development Commands

# Start development
pnpm dev                 # Watch mode for fast iteration

# Generate types after OpenAPI changes
pnpm openapi:generate

# Build everything
pnpm build:all           # Types + Bundle + Docs

# Quick build (no type generation)
pnpm build               # Bundle + Docs

# Test
pnpm test                # Run all tests

# Lint
pnpm lint                # Check code
pnpm lint:fix            # Auto-fix issues

# Clean slate
pnpm clean && pnpm build:all

Directory Structure

voice-router-sdk/
├── src/
│   ├── adapters/                   # Provider implementations
│   │   ├── base-adapter.ts
│   │   ├── gladia-adapter.ts
│   │   ├── assemblyai-adapter.ts
│   │   └── index.ts
│   ├── generated/                  # Auto-generated types
│   │   ├── gladia/schema/          # 260 TypeScript files
│   │   └── assemblyai/schema/      # 161 TypeScript files
│   ├── router/
│   │   ├── voice-router.ts         # Main router class
│   │   └── types.ts                # Unified types
│   └── index.ts                    # Main entry point
├── dist/                           # Compiled output
├── docs/                           # Documentation
├── orval.config.ts                 # Type generation config
├── tsup.config.ts                  # Build config
└── typedoc.*.config.mjs            # Documentation configs

Type Safety Chain

OpenAPI Spec (Provider)
    ↓ orval
Generated Types (Provider-specific)
    ↓ adapter imports
Adapter Implementation (Type-safe)
    ↓ normalizer
Unified Types (Provider-agnostic)
    ↓ router
User Code (Type-safe SDK usage)

Summary

The Voice Router SDK generation workflow:

OpenAPI Fetching: Download provider API specs via HTTP
Type Generation: Convert OpenAPI → TypeScript with Orval
Adapter Implementation: Write provider-specific code using generated types
SDK Compilation: Bundle TypeScript → JavaScript/ESM/CJS with tsup
Documentation: Generate Markdown from JSDoc with TypeDoc
Publishing: Package and publish to npm

The automated stages run with a single command:

pnpm build:all

It generates types, compiles TypeScript, bundles for distribution, and generates documentation. Adapter code is written by hand, and publishing is a separate step (see Phase 6).
