How It Works

Complete SDK generation workflow from OpenAPI specs to production-ready package

SDK Generation Pipeline

The Voice Router SDK is a multi-provider transcription SDK that provides a unified interface for multiple Speech-to-Text APIs (Gladia, AssemblyAI, Deepgram, etc.). This document explains the complete workflow from raw OpenAPI specifications to a production-ready, documented SDK.

The Journey

OpenAPI Specs → Type Generation → Adapter Code → SDK Build → Documentation → npm Package

Key Technologies

| Technology | Purpose | Stage |
|------------|---------|-------|
| OpenAPI/Swagger | API specification format | Input |
| Orval | OpenAPI → TypeScript generator | Type Generation |
| TypeScript | Type-safe SDK code | Development |
| tsup | TypeScript bundler | Build |
| TypeDoc | Documentation generator | Documentation |
| pnpm | Package manager & scripts | Orchestration |

Pipeline Overview

[Interactive diagram: SDK Generation Pipeline (102 nodes, 38 edges)]

Pipeline Stages

  1. OpenAPI Fetching: Download provider API specifications
  2. Type Generation: Convert OpenAPI to TypeScript types
  3. Adapter Implementation: Write provider-specific adapters
  4. Bridge Layer: Implement unified VoiceRouter interface
  5. SDK Compilation: Bundle TypeScript → JavaScript/ESM/CJS
  6. Documentation: Generate markdown from JSDoc comments
  7. Publishing: Package and publish to npm

Phase 1: OpenAPI Specification Fetching

What Are OpenAPI Specifications?

OpenAPI (formerly Swagger) is a standard format for describing REST APIs. Providers like Gladia and AssemblyAI publish their API definitions as OpenAPI JSON files.

Example OpenAPI Structure:

{
  "openapi": "3.0.0",
  "info": {
    "title": "Gladia API",
    "version": "1.0.0"
  },
  "paths": {
    "/transcription": {
      "post": {
        "summary": "Create transcription",
        "requestBody": {
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/TranscriptionRequest"
              }
            }
          }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "TranscriptionRequest": {
        "type": "object",
        "properties": {
          "audio_url": { "type": "string" }
        }
      }
    }
  }
}
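
The "$ref" entry in the example points at a schema defined elsewhere in the same document. As a minimal sketch of how such a local reference can be followed, the helper below walks the JSON pointer segment by segment. The name resolveLocalRef and the Json type are hypothetical and not part of the SDK or of Orval.

```typescript
// Hypothetical helper: resolves a local reference such as
// "#/components/schemas/TranscriptionRequest" inside a parsed spec.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json }

function resolveLocalRef(spec: Json, ref: string): Json {
  if (!ref.startsWith("#/")) {
    throw new Error(`Only local references are supported: ${ref}`)
  }
  let node: Json = spec
  for (const rawSegment of ref.slice(2).split("/")) {
    // JSON Pointer escapes: "~1" means "/" and "~0" means "~"
    const segment = rawSegment.replace(/~1/g, "/").replace(/~0/g, "~")
    if (node === null || typeof node !== "object" || Array.isArray(node)) {
      throw new Error(`Cannot descend into "${segment}" of ${ref}`)
    }
    const next: Json | undefined = node[segment]
    if (next === undefined) {
      throw new Error(`Cannot resolve segment "${segment}" of ${ref}`)
    }
    node = next
  }
  return node
}

// Trimmed copy of the example spec above
const spec: Json = {
  openapi: "3.0.0",
  paths: {},
  components: {
    schemas: {
      TranscriptionRequest: {
        type: "object",
        properties: { audio_url: { type: "string" } },
      },
    },
  },
}

const requestSchema = resolveLocalRef(spec, "#/components/schemas/TranscriptionRequest")
console.log(JSON.stringify(requestSchema))
```

Code generators perform this kind of lookup for every reference they meet, which is how a request body ends up typed as TranscriptionRequest.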

Provider OpenAPI URLs

We fetch OpenAPI specs directly from provider URLs:

// orval.config.ts
export default {
  gladiaApi: {
    input: {
      target: "https://api.gladia.io/openapi.json",
    },
    output: {
      target: "./src/generated/gladia/schema/",
      mode: "single",
    }
  },
  assemblyaiApi: {
    input: {
      target: "https://raw.githubusercontent.com/AssemblyAI/assemblyai-api-spec/main/openapi.json",
    },
    output: {
      target: "./src/generated/assemblyai/schema/",
      mode: "single",
    }
  }
}

How Fetching Works

When you run pnpm openapi:generate, Orval:

  1. Reads the configuration file (orval.config.ts)
  2. Sends an HTTP GET request to each provider's OpenAPI URL
  3. Downloads the JSON specification
  4. Parses and validates the OpenAPI schema
  5. Generates TypeScript types from the schema
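
The download, parse, and validate steps can be sketched roughly as follows. This is an illustration of the flow, not Orval's implementation: OpenApiDocument, SpecFetcher, parseAndValidate, and loadSpecs are hypothetical names, the validation is far shallower than a real OpenAPI validator, and the fetcher is injected so the sketch runs without network access.

```typescript
// Assumed minimal shape of a spec; real OpenAPI documents carry many more fields.
interface OpenApiDocument {
  openapi: string
  info: { title: string; version: string }
  paths: Record<string, unknown>
}

// In real use this could wrap fetch(url).then((r) => r.text()).
type SpecFetcher = (url: string) => Promise<string>

function parseAndValidate(rawJson: string): OpenApiDocument {
  const parsed: unknown = JSON.parse(rawJson)
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Spec must be a JSON object")
  }
  const doc = parsed as Partial<OpenApiDocument>
  if (typeof doc.openapi !== "string" || !doc.openapi.startsWith("3.")) {
    throw new Error("Expected an OpenAPI 3.x document")
  }
  if (typeof doc.info?.title !== "string") {
    throw new Error("Spec is missing info.title")
  }
  if (typeof doc.paths !== "object" || doc.paths === null) {
    throw new Error("Spec is missing paths")
  }
  return doc as OpenApiDocument
}

async function loadSpecs(
  urls: Record<string, string>,
  fetchSpec: SpecFetcher
): Promise<Record<string, OpenApiDocument>> {
  const result: Record<string, OpenApiDocument> = {}
  for (const [configName, url] of Object.entries(urls)) {
    const raw = await fetchSpec(url) // steps 2 and 3: request and download
    result[configName] = parseAndValidate(raw) // step 4: parse and validate
  }
  return result
}

// Stub fetcher standing in for the HTTP request
const stubFetcher: SpecFetcher = async () =>
  JSON.stringify({ openapi: "3.0.0", info: { title: "Gladia API", version: "1.0.0" }, paths: {} })

loadSpecs({ gladiaApi: "https://api.gladia.io/openapi.json" }, stubFetcher).then((specs) =>
  console.log(Object.keys(specs))
)
```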

Phase 2: Type Generation with Orval

What Is Orval?

Orval is a code generator that converts OpenAPI specifications into TypeScript code. It creates:

  • TypeScript interfaces for request/response types
  • Zod schemas for runtime validation (optional)
  • Type-safe API client code (optional)

Orval Configuration

import { defineConfig } from "orval"

export default defineConfig({
  gladiaApi: {
    input: {
      target: "https://api.gladia.io/openapi.json",
    },
    output: {
      target: "./src/generated/gladia/schema/",
      mode: "single",
      client: "axios",
      clean: true,
      prettier: true,
    },
  },
  gladiaZod: {
    input: {
      target: "https://api.gladia.io/openapi.json",
    },
    output: {
      target: "./src/generated/gladia/zod/",
      mode: "single",
      client: "zod",
    },
  },
})

Generated Type Example

From OpenAPI:

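{
  "Transcript": {
    "type": "object",
    "properties": {
      "id": { "type": "string", "description": "Unique identifier" },
      "text": { "type": "string", "description": "Transcribed text" },
      "confidence": { "type": "number", "description": "Confidence score" }
    },
    "required": ["id"]
  }
}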

To TypeScript:

/**
 * Generated by orval v7.17.0 🍺
 * Do not edit manually.
 * Gladia API
 */

export interface Transcript {
  /** Unique identifier */
  id: string;
  /** Transcribed text */
  text?: string;
  /** Confidence score */
  confidence?: number;
}
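
To make the mapping concrete, here is a deliberately simplified sketch of how an object schema can become an interface: names listed under "required" stay mandatory and every other property gets a "?". This is a toy and not Orval's code. ObjectSchema and schemaToInterface are hypothetical names, and only flat string, number, and boolean properties are handled.

```typescript
// Hypothetical, heavily simplified schema shape
interface ObjectSchema {
  type: "object"
  properties: Record<string, { type: "string" | "number" | "boolean"; description?: string }>
  required?: string[]
}

function schemaToInterface(interfaceName: string, schema: ObjectSchema): string {
  const required = new Set(schema.required ?? [])
  const lines = Object.entries(schema.properties).map(([key, prop]) => {
    // Properties absent from "required" become optional members
    const optional = required.has(key) ? "" : "?"
    return `  ${key}${optional}: ${prop.type};`
  })
  return [`export interface ${interfaceName} {`, ...lines, "}"].join("\n")
}

const transcriptSchema: ObjectSchema = {
  type: "object",
  properties: {
    id: { type: "string" },
    text: { type: "string" },
    confidence: { type: "number" },
  },
  required: ["id"],
}

console.log(schemaToInterface("Transcript", transcriptSchema))
```

Running it prints an interface with id required and text and confidence optional, matching the shape of the generated file shown above.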

Output Structure

After generation:

src/generated/
├── gladia/
│   ├── schema/                    # TypeScript types (260 files)
│   │   ├── index.ts
│   │   ├── transcript.ts
│   │   ├── initTranscriptionRequest.ts
│   │   └── ...
│   └── zod/                       # Zod schemas (optional)
└── assemblyai/
    ├── schema/                    # TypeScript types (161 files)
    └── zod/

Phase 3: Adapter Implementation

The Three Layers

┌─────────────────────────────────────────┐
│   VoiceRouter (Bridge Layer)            │  ← User-facing API
│   - Provider selection                  │
│   - Unified interface                   │
└──────────────┬──────────────────────────┘

    ┌──────────┴──────────┬──────────────────┐
    │                     │                   │
┌───▼─────────────┐  ┌───▼──────────────┐  ┌─▼────────────┐
│ GladiaAdapter   │  │ AssemblyAIAdapter│  │ DeepgramA... │
│ (Provider impl) │  │ (Provider impl)  │  │ (Provider..  │
└────────┬────────┘  └──────┬───────────┘  └──┬───────────┘
         │                  │                  │
    ┌────▼──────────────────▼──────────────────▼───┐
    │   BaseAdapter (Abstract Interface)           │
    │   - initialize()                             │
    │   - transcribe()                             │
    │   - getTranscript()                          │
    │   - createErrorResponse()                    │
    └──────────────────────────────────────────────┘
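
The bridge layer's two jobs, provider selection and a unified interface, can be sketched as below. This document does not show the VoiceRouter source, so RouterSketch, AdapterSketch, and UnifiedResultSketch are hypothetical stand-ins with trimmed-down signatures.

```typescript
// Hypothetical, trimmed-down types; the SDK's real interfaces are richer.
type TranscriptionProvider = "gladia" | "assemblyai"

interface UnifiedResultSketch {
  success: boolean
  provider: TranscriptionProvider
  text?: string
}

interface AdapterSketch {
  readonly name: TranscriptionProvider
  transcribe(audioUrl: string): Promise<UnifiedResultSketch>
}

class RouterSketch {
  private readonly adapters = new Map<TranscriptionProvider, AdapterSketch>()
  private readonly defaultProvider: TranscriptionProvider

  constructor(defaultProvider: TranscriptionProvider) {
    this.defaultProvider = defaultProvider
  }

  register(adapter: AdapterSketch): void {
    this.adapters.set(adapter.name, adapter)
  }

  registeredProviders(): TranscriptionProvider[] {
    return Array.from(this.adapters.keys())
  }

  // Provider selection: an explicit choice wins, otherwise the default is used.
  transcribe(audioUrl: string, provider?: TranscriptionProvider): Promise<UnifiedResultSketch> {
    const selected = provider ?? this.defaultProvider
    const adapter = this.adapters.get(selected)
    if (!adapter) {
      return Promise.reject(new Error(`Provider not registered: ${selected}`))
    }
    return adapter.transcribe(audioUrl)
  }
}

// Stub adapter standing in for a real provider implementation
const fakeGladia: AdapterSketch = {
  name: "gladia",
  transcribe: async (audioUrl) => ({
    success: true,
    provider: "gladia",
    text: `transcript of ${audioUrl}`,
  }),
}

const router = new RouterSketch("gladia")
router.register(fakeGladia)
router.transcribe("https://example.com/audio.mp3").then((result) => console.log(result.text))
```

Because callers only ever see the unified result type, swapping providers is a matter of passing a different name rather than rewriting call sites.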

Base Adapter Interface

export abstract class BaseAdapter {
  abstract name: TranscriptionProvider
  abstract capabilities: ProviderCapabilities

  protected config?: ProviderConfig

  initialize(config: ProviderConfig): void {
    this.config = config
  }

  abstract transcribe(
    audio: AudioInput,
    options?: TranscribeOptions
  ): Promise<UnifiedTranscriptResponse>

  abstract getTranscript(
    transcriptId: string
  ): Promise<UnifiedTranscriptResponse>

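  protected createErrorResponse(
    error: unknown,
    statusCode?: number
  ): UnifiedTranscriptResponse {
    // Sketch only: the error code below is a placeholder, not the SDK's actual value
    const message = error instanceof Error ? error.message : String(error)
    return {
      success: false,
      provider: this.name,
      error: { code: "UNKNOWN_ERROR", message, statusCode },
      raw: error,
    }
  }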
}

Implementing an Adapter

Step 1: Import Generated Types

import type { Transcript } from "../generated/gladia/schema/transcript"
import type { InitTranscriptionRequest } from "../generated/gladia/schema/initTranscriptionRequest"
import type { PreRecordedResponse } from "../generated/gladia/schema/preRecordedResponse"

Step 2: Create Adapter Class

export class GladiaAdapter extends BaseAdapter {
  readonly name = "gladia" as const
  readonly capabilities: ProviderCapabilities = {
    streaming: true,
    diarization: true,
    wordTimestamps: true,
  }

  private client?: AxiosInstance
  private baseUrl = "https://api.gladia.io/v2"

  initialize(config: ProviderConfig): void {
    super.initialize(config)

    this.client = axios.create({
      baseURL: config.baseUrl || this.baseUrl,
      headers: {
        "x-gladia-key": config.apiKey,
      },
    })
  }
}

Step 3: Implement Transcribe Method

async transcribe(
  audio: AudioInput,
  options?: TranscribeOptions
): Promise<UnifiedTranscriptResponse> {
  this.validateConfig()

  try {
    const payload = this.buildTranscriptionRequest(audio, options)
    const response = await this.client!.post<PreRecordedResponse>(
      "/transcription",
      payload
    )
    return this.normalizeResponse(response.data)
  } catch (error) {
    return this.createErrorResponse(error)
  }
}
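
The normalizeResponse call is where provider-specific data becomes the unified format. Its body is not shown in this document, so the following is only a sketch under assumptions: HypotheticalProviderResponse is an invented shape (not Gladia's actual PreRecordedResponse), and the unified type is trimmed to a few fields.

```typescript
// Invented provider payload, used purely for illustration
interface HypotheticalProviderResponse {
  id: string
  status: "queued" | "processing" | "done" | "error"
  transcript?: string
  error_message?: string
}

type UnifiedStatus = "queued" | "processing" | "completed" | "error"

// Trimmed version of the unified response
interface UnifiedSketch {
  success: boolean
  provider: string
  data?: { id: string; text: string; status: UnifiedStatus }
  error?: { code: string; message: string }
  raw?: unknown
}

function normalizeResponse(
  provider: string,
  response: HypotheticalProviderResponse
): UnifiedSketch {
  const status = response.status
  if (status === "error") {
    return {
      success: false,
      provider,
      error: {
        code: "PROVIDER_ERROR", // placeholder code
        message: response.error_message ?? "Unknown provider error",
      },
      raw: response,
    }
  }
  // Translate the provider's vocabulary into the unified status values
  const statusMap = { queued: "queued", processing: "processing", done: "completed" } as const
  return {
    success: true,
    provider,
    data: { id: response.id, text: response.transcript ?? "", status: statusMap[status] },
    raw: response, // keep the original payload for debugging
  }
}

console.log(normalizeResponse("gladia", { id: "abc", status: "done", transcript: "hello" }).data)
```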

Unified Types

All providers normalize to this format:

export interface UnifiedTranscriptResponse {
  success: boolean
  provider: TranscriptionProvider
  data?: {
    id: string
    text: string
    confidence?: number
    status: "queued" | "processing" | "completed" | "error"
    language?: string
    duration?: number
    speakers?: Array<{ id: string; label: string }>
    words?: Array<{
      text: string
      start: number
      end: number
      confidence?: number
      speaker?: string
    }>
    utterances?: Array<{
      text: string
      start: number
      end: number
      speaker?: string
      confidence?: number
      words: Array<{ text: string; start: number; end: number }>
    }>
    summary?: string
    metadata?: Record<string, unknown>
  }
  error?: {
    code: string
    message: string
    statusCode?: number
  }
  raw?: unknown
}
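
From the caller's side, the unified shape means one code path for every provider. A minimal consumer sketch follows, using a trimmed copy of the interface. describeResponse, ResponseSketch, and UtteranceSketch are hypothetical names, and the fallback label "unknown" is an arbitrary choice.

```typescript
interface UtteranceSketch {
  text: string
  start: number
  end: number
  speaker?: string
}

// Trimmed copy of the unified response, enough for this example
interface ResponseSketch {
  success: boolean
  provider: string
  data?: {
    id: string
    text: string
    status: "queued" | "processing" | "completed" | "error"
    utterances?: UtteranceSketch[]
  }
  error?: { code: string; message: string; statusCode?: number }
}

function describeResponse(response: ResponseSketch): string {
  if (!response.success || !response.data) {
    const code = response.error?.code ?? "UNKNOWN"
    const message = response.error?.message ?? "no error details"
    return `[${response.provider}] failed (${code}): ${message}`
  }
  if (response.data.status !== "completed") {
    return `[${response.provider}] ${response.data.id} is ${response.data.status}`
  }
  // Prefer speaker-labelled utterances when the provider returned them
  const lines = (response.data.utterances ?? []).map(
    (utterance) => `${utterance.speaker ?? "unknown"}: ${utterance.text}`
  )
  return lines.length > 0 ? lines.join("\n") : response.data.text
}

console.log(
  describeResponse({
    success: true,
    provider: "gladia",
    data: {
      id: "abc",
      text: "Hello there. Hi.",
      status: "completed",
      utterances: [
        { text: "Hello there.", start: 0, end: 1.2, speaker: "A" },
        { text: "Hi.", start: 1.4, end: 1.8, speaker: "B" },
      ],
    },
  })
)
```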

Phase 4: SDK Compilation

Build Configuration

We use tsup for bundling TypeScript into distributable formats:

import { defineConfig } from "tsup"

export default defineConfig({
  entry: { index: "src/index.ts" },
  format: ["cjs", "esm"],
  dts: true,
  splitting: false,
  sourcemap: true,
  clean: true,
  minify: false,
  target: "es2020",
  outDir: "dist",
})

Output Structure

dist/
├── index.js          # CommonJS bundle (~64 KB)
├── index.js.map      # CJS source map
├── index.mjs         # ES Module bundle (~62 KB)
├── index.mjs.map     # ESM source map
├── index.d.ts        # TypeScript declarations (~301 KB)
└── index.d.mts       # ESM TypeScript declarations

Package Exports

{
  "main": "dist/index.js",
  "module": "dist/index.mjs",
  "types": "dist/index.d.ts",
  "exports": {
    ".": {
      "types": "./dist/index.d.ts",
      "import": "./dist/index.mjs",
      "require": "./dist/index.js"
    }
  }
}

This ensures:

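  • require() callers receive the CommonJS build (dist/index.js)
  • import callers, including Node.js ESM and modern bundlers, receive the ES Module build (dist/index.mjs)
  • TypeScript resolves the type definitions (dist/index.d.ts)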
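
The selection rule behind the "exports" map is that keys are checked in the order they are written, and the first one matching an active condition wins. The toy resolver below illustrates that rule for the map above. resolveEntry is a hypothetical helper, not Node's actual resolver, which handles many more cases such as nested conditions and subpath patterns.

```typescript
type ExportConditions = Record<string, string>

function resolveEntry(
  conditionsMap: ExportConditions,
  active: ReadonlySet<string>
): string | undefined {
  // Keys are checked in declaration order; "default" always matches.
  for (const [condition, target] of Object.entries(conditionsMap)) {
    if (condition === "default" || active.has(condition)) {
      return target
    }
  }
  return undefined
}

// The "." entry from the package.json above
const rootExport: ExportConditions = {
  types: "./dist/index.d.ts",
  import: "./dist/index.mjs",
  require: "./dist/index.js",
}

console.log(resolveEntry(rootExport, new Set(["node", "import"])))  // ./dist/index.mjs
console.log(resolveEntry(rootExport, new Set(["node", "require"]))) // ./dist/index.js
console.log(resolveEntry(rootExport, new Set(["types", "import"]))) // ./dist/index.d.ts
```

This ordering is why "types" is listed first: a type-aware resolver that activates both "types" and "import" should land on the declaration file rather than the JavaScript bundle.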

Phase 5: Documentation Generation

TypeDoc Pipeline

TypeDoc generates documentation by:

  1. Parsing TypeScript source files
  2. Extracting JSDoc comments
  3. Analyzing types via TypeScript compiler
  4. Generating Markdown via typedoc-plugin-markdown
  5. Writing files to output directory

From JSDoc to Markdown

Input (TypeScript source):

/**
 * Submit audio for transcription
 *
 * @param audio - Audio input (URL, file buffer, or stream)
 * @param options - Transcription options
 * @returns Normalized transcription response
 * @throws {Error} If audio type is not supported
 *
 * @example
 * const result = await adapter.transcribe({
 *   type: 'url',
 *   url: 'https://example.com/audio.mp3'
 * });
 */
async transcribe(
  audio: AudioInput,
  options?: TranscribeOptions
): Promise<UnifiedTranscriptResponse>

Output (Generated Markdown):

### transcribe()

> **transcribe**(`audio`, `options?`): `Promise<UnifiedTranscriptResponse>`

Submit audio for transcription

#### Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `audio` | `AudioInput` | Audio input (URL, file buffer, or stream) |
| `options?` | `TranscribeOptions` | Transcription options |

#### Returns

`Promise<UnifiedTranscriptResponse>` - Normalized transcription response
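
As a rough illustration of the comment-to-table step, the sketch below pulls @param tags out of a JSDoc block and emits Markdown rows. It is a regex-based toy with a hypothetical name (paramTable). TypeDoc itself works from the TypeScript compiler's view of the source, which is also how it fills in the Type column that this toy omits.

```typescript
function paramTable(jsdoc: string): string {
  const rows: string[] = []
  for (const line of jsdoc.split("\n")) {
    // Matches lines such as " * @param audio - Audio input"
    const match = /@param\s+(\w+)\s+-\s+(.*)$/.exec(line.trim())
    if (match) {
      rows.push(`| \`${match[1]}\` | ${match[2]} |`)
    }
  }
  return ["| Parameter | Description |", "|-----------|-------------|", ...rows].join("\n")
}

const sampleComment = [
  "/**",
  " * Submit audio for transcription",
  " *",
  " * @param audio - Audio input (URL, file buffer, or stream)",
  " * @param options - Transcription options",
  " * @returns Normalized transcription response",
  " */",
].join("\n")

console.log(paramTable(sampleComment))
```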

Phase 6: Publishing

Pre-Publish Checklist

  1. Version bump in package.json
  2. Build SDK: pnpm build:all
  3. Run tests: pnpm test
  4. Lint code: pnpm lint
  5. Generate docs: pnpm build:docs
  6. Commit changes: Git commit
  7. Tag version: git tag v6.0.0

Publishing Process

# Automated publish (runs prepublishOnly)
pnpm publish

# Manual steps
pnpm lint:fix
pnpm build
npm publish --access public

Adding a New Provider

Step-by-Step Guide

1. Add OpenAPI Config

// orval.config.ts
newProviderApi: {
  input: {
    target: "https://api.newprovider.com/openapi.json",
  },
  output: {
    target: "./src/generated/newprovider/schema/",
    mode: "single",
    client: "axios",
    clean: true,
  },
}

2. Generate Types

pnpm openapi:generate

3. Create Adapter

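import { BaseAdapter } from "./base-adapter"
import type { Transcript } from "../generated/newprovider/schema/transcript"
// Assumed location of the unified types (see src/router/types.ts)
import type {
  AudioInput,
  ProviderCapabilities,
  TranscribeOptions,
  UnifiedTranscriptResponse,
} from "../router/types"

export class NewProviderAdapter extends BaseAdapter {
  readonly name = "newprovider" as const
  readonly capabilities: ProviderCapabilities = { /* ... */ }

  // Parameter and return types are spelled out so the overrides compile in strict mode
  async transcribe(
    audio: AudioInput,
    options?: TranscribeOptions
  ): Promise<UnifiedTranscriptResponse> {
    // Implementation
    throw new Error("Not implemented")
  }

  async getTranscript(transcriptId: string): Promise<UnifiedTranscriptResponse> {
    // Implementation
    throw new Error("Not implemented")
  }
}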

4. Update Types

// src/router/types.ts
export type TranscriptionProvider =
  | "gladia"
  | "assemblyai"
  | "newprovider"
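
One practical benefit of extending the union is that the compiler can flag every place that still needs a case for the new provider. A small sketch of that exhaustiveness pattern follows. displayName and its labels are hypothetical and only serve to show the technique.

```typescript
type TranscriptionProvider = "gladia" | "assemblyai" | "newprovider"

function displayName(provider: TranscriptionProvider): string {
  switch (provider) {
    case "gladia":
      return "Gladia"
    case "assemblyai":
      return "AssemblyAI"
    case "newprovider":
      return "New Provider"
    default: {
      // If a union member is left unhandled, this assignment stops compiling.
      const unhandled: never = provider
      throw new Error(`Unhandled provider: ${String(unhandled)}`)
    }
  }
}

console.log(displayName("newprovider"))
```

Removing the "newprovider" case from the switch turns the never assignment into a compile-time error, which is the signal that some code path was not updated.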

5. Export Adapter

// src/adapters/index.ts
export * from "./newprovider-adapter"

6. Test

pnpm build
pnpm test

Development Commands

# Start development
pnpm dev                 # Watch mode for fast iteration

# Generate types after OpenAPI changes
pnpm openapi:generate

# Build everything
pnpm build:all           # Types + Bundle + Docs

# Quick build (no type generation)
pnpm build               # Bundle + Docs

# Test
pnpm test                # Run all tests

# Lint
pnpm lint                # Check code
pnpm lint:fix            # Auto-fix issues

# Clean slate
pnpm clean && pnpm build:all

Directory Structure

voice-router-sdk/
├── src/
│   ├── adapters/                   # Provider implementations
│   │   ├── base-adapter.ts
│   │   ├── gladia-adapter.ts
│   │   ├── assemblyai-adapter.ts
│   │   └── index.ts
│   ├── generated/                  # Auto-generated types
│   │   ├── gladia/schema/          # 260 TypeScript files
│   │   └── assemblyai/schema/      # 161 TypeScript files
│   ├── router/
│   │   ├── voice-router.ts         # Main router class
│   │   └── types.ts                # Unified types
│   └── index.ts                    # Main entry point
├── dist/                           # Compiled output
├── docs/                           # Documentation
├── orval.config.ts                 # Type generation config
├── tsup.config.ts                  # Build config
└── typedoc.*.config.mjs            # Documentation configs

Type Safety Chain

OpenAPI Spec (Provider)
    ↓ orval
Generated Types (Provider-specific)
    ↓ adapter imports
Adapter Implementation (Type-safe)
    ↓ normalizer
Unified Types (Provider-agnostic)
    ↓ router
User Code (Type-safe SDK usage)

Summary

The Voice Router SDK generation workflow:

  1. OpenAPI Fetching: Download provider API specs via HTTP
  2. Type Generation: Convert OpenAPI → TypeScript with Orval
  3. Adapter Implementation: Write provider-specific code using generated types
  4. SDK Compilation: Bundle TypeScript → JavaScript/ESM/CJS with tsup
  5. Documentation: Generate Markdown from JSDoc with TypeDoc
  6. Publishing: Package and publish to npm

The automated steps run with a single command:

pnpm build:all

It generates types, compiles TypeScript, bundles for distribution, and generates documentation. Adapter implementation and publishing remain separate steps, as described above. The result is a production-ready, type-safe, multi-provider SDK.
