Voice Streaming API

This API lets you programmatically initiate an outgoing voice stream call, connecting a user's mobile number to a media WebSocket for real-time interaction.

The service consists of two main components:

  1. A REST API for initiating calls and checking service health

  2. A WebSocket interface for real-time bidirectional audio streaming

Important: In this architecture, your application hosts the WebSocket server, and the Alohaa Voice Stream service acts as the WebSocket client.

API Reference

POST https://ari-voice-stream.alohaa.ai/v1/voice-stream/call

Parameters

Request Headers

Field | Value | Description | Mandatory
Content-Type | application/json | Specifies the content type of the request | Yes
x-metro-api-key | ************************* | Your API key for authentication purposes | Yes

Request Body

Field | Data Type | Description | Mandatory
mobileNo | String | Phone number of the user to be called. Must be a valid 10-digit mobile number. | Yes
did | String | Direct Inward Dialing (DID) number used to place the call. Must be a valid 10-digit DID number. | Yes
wsUrl | String | WebSocket URL where the call's audio will be streamed in real time. | Yes
greetingType | String | Must be "audio". Indicates the type of greeting. Currently only audio files are supported. | Yes
greetingContent | String | Public URL of the audio file that will be played as a greeting at the start of the call. | Yes
webhook_details | String | Webhook configuration to receive call lifecycle events. Must be stringified JSON. url and request_type are mandatory; api_key and api_value are optional. | No

Sample Request

Headers:

{
    "Content-Type": "application/json",
    "x-metro-api-key": "****************"
}
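
The sample request does not show a body. A body consistent with the Request Body parameters might be assembled as in the sketch below. The helper name buildCallRequestBody, the phone numbers, and every URL are placeholders invented for illustration, and the request_type value "POST" is an assumption, since the accepted values are not listed in this document.

```javascript
// Hypothetical helper: assembles the JSON body for POST /v1/voice-stream/call.
function buildCallRequestBody({ mobileNo, did, wsUrl, greetingContent, webhook }) {
  const tenDigits = /^\d{10}$/;
  if (!tenDigits.test(mobileNo)) throw new Error('mobileNo must be a 10-digit number');
  if (!tenDigits.test(did)) throw new Error('did must be a 10-digit number');

  const body = {
    mobileNo,
    did,
    wsUrl,
    greetingType: 'audio', // the only value the parameter list allows
    greetingContent
  };

  if (webhook) {
    // webhook_details travels as a stringified JSON object, not a nested object.
    body.webhook_details = JSON.stringify(webhook);
  }
  return body;
}

const exampleBody = buildCallRequestBody({
  mobileNo: '9876543210',                               // placeholder number
  did: '8012345678',                                    // placeholder DID
  wsUrl: 'wss://voice.example.com/stream',              // placeholder URL
  greetingContent: 'https://cdn.example.com/hello.wav', // placeholder URL
  webhook: { url: 'https://example.com/hooks/call', request_type: 'POST' } // assumed value
});
console.log(JSON.stringify(exampleBody, null, 2));
```

Validating the two 10-digit fields locally avoids a round trip that would otherwise end in a 400 response.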

Responses

Success Response

{
  "success": true,
  "response": {
    "callId": "6829acf1da1fe68100b6XXXX"
  }
}

Failure Response

{
    "success": false,
    "error": {
        "code": 1022,
        "reason": "Organisation does not exists"
    }
}
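
Both responses share a success flag, so a caller can branch on it before reading either response.callId or error. The following is a minimal sketch of that branching; interpretCallResponse is a hypothetical helper name, and the fallback reason string is an assumption for bodies that match neither documented shape.

```javascript
// Hypothetical helper: normalizes the two documented response shapes.
function interpretCallResponse(body) {
  if (body && body.success === true && body.response && body.response.callId) {
    return { ok: true, callId: body.response.callId };
  }
  const error = (body && body.error) || {};
  return {
    ok: false,
    code: error.code ?? null,                 // e.g. 1022
    reason: error.reason ?? 'Unknown error'   // assumed fallback text
  };
}

const successResult = interpretCallResponse({
  success: true,
  response: { callId: '6829acf1da1fe68100b6XXXX' }
});
const failureResult = interpretCallResponse({
  success: false,
  error: { code: 1022, reason: 'Organisation does not exists' }
});
console.log(successResult, failureResult);
```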

Please go through the diagram to understand the overall flow:

Status Codes

Code | Description
200 OK | Request successful. Call initiation in progress.
400 Bad Request | Invalid parameters
500 Internal Server Error | Server encountered an error


WebSocket Protocol

Connection Architecture

Important: In this architecture, your application hosts the WebSocket server, and our Voice Stream service connects to it as a client.

  1. You host a WebSocket server at a publicly accessible URL

  2. You provide this URL in the wsUrl parameter when initiating a call

  3. Our Voice Stream service connects to your WebSocket server as a client

  4. Real-time bidirectional audio streaming occurs through this WebSocket connection

WebSocket Server Requirements:

  • Must be accessible via the public internet

  • Must support secure WebSocket connections (WSS)

  • Must implement the protocol defined in this document

Message Format: All messages sent and received through the WebSocket connection use JSON format, except for binary audio data which is encoded according to the audio specification.

Protocol Flow

  1. You host a WebSocket server at a publicly accessible URL

  2. You initiate a call via our REST API, providing your WebSocket server URL

  3. Our service connects to your WebSocket server

  4. Our service registers with your server by sending a register event

  5. Your server confirms registration by responding with a register.success event

  6. Exchange Media (Bidirectional)

  7. Connection Termination (When either party ends the call)
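
One way to keep a server honest about this ordering is a small per-connection state table. The sketch below is an illustration only: the state names (connected, registered, closed) and the rule that media before registration is rejected are assumptions of this example, not behavior the service is documented to enforce.

```javascript
// Hypothetical per-connection state table for the flow above.
const TRANSITIONS = {
  connected:  { register: 'registered', close: 'closed' },
  registered: { media: 'registered', interrupt: 'registered', close: 'closed' },
  closed:     {}
};

// Returns the next state, or throws if the event is out of order.
function nextState(state, event) {
  const next = TRANSITIONS[state] && TRANSITIONS[state][event];
  if (!next) throw new Error(`Unexpected "${event}" while ${state}`);
  return next;
}

let sessionState = 'connected';
for (const event of ['register', 'media', 'media', 'interrupt', 'close']) {
  sessionState = nextState(sessionState, event);
}
console.log(sessionState); // prints "closed"
```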

Events

1. Register (Our Service → Your Server)

Registers a new voice session with a unique callId and the mobile number to be dialed.

{
  "event": "register",
  "callId": "call_123456789",
  "mobileNo": "4155551234"
}

2. Register Success (Your Server → Our Service)

Acknowledges the successful registration of the session.

{
  "event": "register.success",
  "data": {
    "callId": "call_123456789"
  }
}
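
The acknowledgement echoes the callId from the register event inside a data object. A minimal sketch of deriving one message from the other follows; buildRegisterSuccess is a hypothetical helper name, and rejecting malformed input by throwing is a choice made for this example.

```javascript
// Hypothetical helper: turns a raw register message into the matching acknowledgement.
function buildRegisterSuccess(rawMessage) {
  const message = JSON.parse(rawMessage);
  if (message.event !== 'register' || !message.callId) {
    throw new Error('Expected a register event with a callId');
  }
  return JSON.stringify({
    event: 'register.success',
    data: { callId: message.callId }
  });
}

const registerAck = buildRegisterSuccess(
  '{"event":"register","callId":"call_123456789","mobileNo":"4155551234"}'
);
console.log(registerAck);
```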

3. Media (Bidirectional)

Streams raw audio data in real-time, including a timestamp for synchronization or logging.

From our Service to your server (Voice from the phone call):

This stream is continuous, even when the customer is not speaking. Silent packets are still transmitted, which keeps the data flow uninterrupted and preserves the timing of the session.

{
  "event": "media",
  "callId": "call_123456789",
  "payload": "<Buffer>",
  "timestamp": 1649433600000
}

From your server to our service (Voice to be transmitted to the phone call):

This stream is not continuous. Your server accumulates audio and sends it as a complete voice chunk once speech boundaries are detected (e.g., via silence detection). Audio is therefore handled in batches as it passes through the STT → LLM → TTS pipeline.

{
  "event": "media",
  "callId": "call_123456789",
  "payload": {
    "type": "Buffer",
    "data": [12, 34, 56, 78]
  },
  "timestamp": 1649433600000
}
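
The outgoing payload shape, { "type": "Buffer", "data": [...] }, is what Node.js produces when a Buffer is passed through JSON.stringify. The sketch below shows that round trip. The helper names are hypothetical, and treating incoming payloads as having the same shape is an assumption, because the incoming sample above only shows a "<Buffer>" placeholder.

```javascript
// Hypothetical helpers for the media event payload.
function encodeMediaMessage(callId, audioBuffer, timestamp = Date.now()) {
  // JSON.stringify calls Buffer.prototype.toJSON, yielding { type: 'Buffer', data: [...] }.
  return JSON.stringify({ event: 'media', callId, payload: audioBuffer, timestamp });
}

function decodePayload(payload) {
  if (payload && payload.type === 'Buffer' && Array.isArray(payload.data)) {
    return Buffer.from(payload.data);
  }
  throw new Error('Unsupported payload shape');
}

const mediaWire = encodeMediaMessage('call_123456789', Buffer.from([12, 34, 56, 78]), 1649433600000);
const mediaBytes = decodePayload(JSON.parse(mediaWire).payload);
console.log(mediaWire);
console.log(mediaBytes);
```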

4. Interrupt (Your Server → Our Service)

Signals the client to stop sending audio. This is typically triggered when the system detects the end of user speech.

{
  "event": "interrupt",
  "callId": "call_123456789"
}

5. Close WebSocket Connection (Client → Server)

Closes the active session. Typically called once the conversation is complete.

client.close();

Audio Configuration:

The audio data sent through the WebSocket must meet these requirements for compatibility:

  • Audio data should match the format expected by the system

  • The underlying protocol uses G.711 μ-law (PCMU) codec

  • Sample rate: 8kHz

Configure your TTS engine with the following settings:

{
  "audioEncoding": "MULAW",
  "sampleRateHertz": 8000,
  "pitch": 0,
  "speakingRate": 1.0
}
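
If your speech-to-text engine expects linear PCM rather than μ-law, the incoming bytes need expanding first. The sketch below implements the standard G.711 μ-law expansion to 16-bit samples. Whether you need this step at all depends on your STT provider, which is an assumption of this example, and the function names are hypothetical.

```javascript
// Standard G.711 mu-law expansion of one byte to a signed 16-bit sample.
function muLawToPcm16(byte) {
  const u = ~byte & 0xff;            // mu-law bytes are stored bit-inverted
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Expands a whole buffer of mu-law bytes into 16-bit PCM samples.
function decodeMuLaw(buffer) {
  const pcm = new Int16Array(buffer.length);
  for (let i = 0; i < buffer.length; i++) {
    pcm[i] = muLawToPcm16(buffer[i]);
  }
  return pcm;
}

const pcmSamples = decodeMuLaw(Buffer.from([0xff, 0x80, 0x00]));
console.log(Array.from(pcmSamples)); // [ 0, 32124, -32124 ]
```

At 8kHz with one byte per sample, each second of call audio is 8000 μ-law bytes and expands to 16000 bytes of PCM.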

WebSocket Audio Data Guidelines

  • When receiving audio from our service, each packet starts with a 12-byte RTP header, which helps with ordering the packets.

  • When sending audio to our service, send the audio data in μ-law format with .wav headers.
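
Both guidelines can be sketched in a few lines. The first helper splits off the 12-byte RTP header and reads the sequence number and timestamp from their standard RTP positions. The second prepends a 44-byte WAV header declaring μ-law audio (format code 7). The helper names are hypothetical, and mono audio plus this particular minimal header layout are assumptions, since the exact header the service expects is not specified here.

```javascript
const RTP_HEADER_BYTES = 12;

// Splits an incoming packet into RTP header fields and the mu-law audio that follows.
function splitRtpPacket(packet) {
  if (packet.length < RTP_HEADER_BYTES) throw new Error('Packet shorter than RTP header');
  return {
    sequenceNumber: packet.readUInt16BE(2), // useful for ordering packets
    rtpTimestamp: packet.readUInt32BE(4),
    audio: packet.subarray(RTP_HEADER_BYTES)
  };
}

// Wraps raw mu-law bytes in a minimal WAV container (assumed mono, 8 bits per sample).
function wrapMuLawInWav(audio, sampleRate = 8000) {
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + audio.length, 4); // remaining file size
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);               // fmt chunk size
  header.writeUInt16LE(7, 20);                // format code 7 = mu-law
  header.writeUInt16LE(1, 22);                // channels
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate, 28);       // byte rate: one byte per sample
  header.writeUInt16LE(1, 32);                // block align
  header.writeUInt16LE(8, 34);                // bits per sample
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(audio.length, 40);
  return Buffer.concat([header, audio]);
}

const samplePacket = Buffer.concat([Buffer.alloc(RTP_HEADER_BYTES), Buffer.from([0xff, 0x7f])]);
samplePacket.writeUInt16BE(7, 2); // pretend sequence number
const rtpParts = splitRtpPacket(samplePacket);
const wavChunk = wrapMuLawInWav(rtpParts.audio);
console.log(rtpParts.sequenceNumber, wavChunk.length);
```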

Connection Errors

The WebSocket connection may close with specific close codes:

Code | Interpretation
1000 | Normal closure (call ended)
1001 | Server going down or client navigating away
1002 | Protocol error
1008 | Policy violation (e.g., authentication failure)
1011 | Server error
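
For logging, these codes can be kept in a lookup table and resolved inside your close handler. This is a minimal sketch; describeCloseCode is a hypothetical helper, and the fallback text for unlisted codes is an assumption.

```javascript
// Close codes listed in this document, keyed by numeric code.
const CLOSE_CODES = new Map([
  [1000, 'Normal closure (call ended)'],
  [1001, 'Server going down or client navigating away'],
  [1002, 'Protocol error'],
  [1008, 'Policy violation (e.g., authentication failure)'],
  [1011, 'Server error']
]);

// Hypothetical helper: returns a readable description for a close code.
function describeCloseCode(code) {
  return CLOSE_CODES.get(code) ?? `Unrecognized close code ${code}`;
}

console.log(describeCloseCode(1000));
console.log(describeCloseCode(4000));
```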

Limitations and Constraints

  • Maximum WebSocket message size: 1MB

  • Inactive connections (no messages for 5 minutes) are automatically terminated

  • All WebSocket connections must use secure WebSockets (WSS)

  • WebSocket connections without registration confirmation within 10 seconds are automatically closed
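
The size and inactivity limits can be checked on your side before the service enforces them. The sketch below assumes that 1MB means 1,048,576 bytes and that the limit applies to the serialized JSON message; neither detail is stated here, and the helper names are hypothetical. Note that a Buffer serialized as a JSON array of numbers takes up to four characters per audio byte, so the limit is reached well before 1MB of raw audio.

```javascript
const MAX_MESSAGE_BYTES = 1024 * 1024; // assumed interpretation of "1MB"
const IDLE_LIMIT_MS = 5 * 60 * 1000;   // 5 minutes without messages

// True if the serialized message is within the assumed size limit.
function fitsMessageLimit(serialized) {
  return Buffer.byteLength(serialized, 'utf8') <= MAX_MESSAGE_BYTES;
}

// True if the connection has been silent long enough to be terminated.
function isIdle(lastMessageAt, now = Date.now()) {
  return now - lastMessageAt >= IDLE_LIMIT_MS;
}

const smallMessage = JSON.stringify({ event: 'media', callId: 'call_1', payload: Buffer.alloc(8000, 0xff) });
const largeMessage = JSON.stringify({ event: 'media', callId: 'call_1', payload: Buffer.alloc(300000, 0xff) });
console.log(fitsMessageLimit(smallMessage), fitsMessageLimit(largeMessage));
```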


Integration Guide

Prerequisites

  • API credentials (contact support to obtain these)

  • A publicly accessible WebSocket server

  • Basic understanding of REST APIs and WebSockets

Integration Steps

  1. Set up your WebSocket server to handle the protocol described above

  2. Initiate a call using our REST API, providing your WebSocket server URL

  3. Handle the registration when our service connects to your WebSocket server

  4. Process incoming audio from the phone call

  5. Send outgoing audio to be transmitted to the phone call

  6. Send interrupt signals when needed

  7. Handle connection closure when the call ends

Code Snippet

1. Initiate API Call

// Sample code to initiate a call
async function initiateCall(mobileNo, did, wsUrl, greetingContent, apiKey) {
  try {
    const response = await fetch('https://ari-voice-stream.alohaa.ai/v1/voice-stream/call', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-metro-api-key': apiKey // Your API key
      },
      body: JSON.stringify({
        mobileNo,
        did,
        wsUrl,                 // URL of your WebSocket server
        greetingType: 'audio', // Currently the only supported value
        greetingContent        // Public URL of the greeting audio file
      })
    });

    const data = await response.json();
    return data; // Contains the success flag and either the callId or an error
  } catch (error) {
    // Handle error
    console.error('Error initiating call:', error);
  }
}

2. WebSocket Server Implementation

// Sample code for implementing your WebSocket server
const WebSocket = require('ws');

class VoiceStreamServer {
  constructor(port) {
    this.port = port;
    this.server = new WebSocket.Server({ port: this.port });
    this.calls = new Map(); // Track active calls
    
    this.setupServerHandlers();
    console.log(`WebSocket server started on port ${this.port}`);
  }
  
  setupServerHandlers() {
    this.server.on('connection', (ws) => {
      console.log('New connection established');
      
      ws.on('message', (data) => {
        try {
          const message = JSON.parse(data);
          this.handleMessage(ws, message);
        } catch (error) {
          console.error('Error parsing message:', error);
        }
      });
      
      ws.on('close', (code, reason) => {
        console.log(`Connection closed: ${code} ${reason}`);
        this.handleConnectionClose(ws);
      });
      
      ws.on('error', (error) => {
        console.error('WebSocket error:', error);
      });
    });
  }
  
  handleMessage(ws, message) {
    switch (message.event) {
      case 'register':
        this.handleRegister(ws, message);
        break;
        
      case 'media':
        this.handleIncomingAudio(ws, message);
        break;
        
      default:
        console.warn('Unknown event type:', message.event);
    }
  }
  
  handleRegister(ws, message) {
    const { callId, mobileNo, agent } = message;
    
    console.log(`Registration received for call: ${callId}`);
    
    // Store call information
    this.calls.set(callId, {
      ws,
      mobileNo,
      agent,
      startTime: Date.now()
    });
    
    // Confirm registration, echoing the callId as the protocol requires
    ws.send(JSON.stringify({
      event: 'register.success',
      data: { callId }
    }));
  }
  
  handleIncomingAudio(ws, message) {
    const { callId, payload, timestamp } = message;
    
    // Process incoming audio from the phone call
    this.processAudio(callId, payload);
  }
  
  handleConnectionClose(ws) {
    // Clean up any resources associated with this connection
    for (const [callId, call] of this.calls.entries()) {
      if (call.ws === ws) {
        this.calls.delete(callId);
        console.log(`Call ${callId} removed from active calls`);
        break;
      }
    }
  }
  
  sendAudio(callId, audioData) {
    const call = this.calls.get(callId);
    
    if (!call) {
      console.warn(`Call ${callId} not found`);
      return false;
    }
    
    if (call.ws.readyState !== WebSocket.OPEN) {
      console.warn(`WebSocket for call ${callId} is not open`);
      return false;
    }
    
    // Send audio to be transmitted to the phone call
    call.ws.send(JSON.stringify({
      event: 'media',
      callId,
      payload: audioData, // A Buffer serializes as { type: 'Buffer', data: [...] }
      timestamp: Date.now()
    }));
    
    return true;
  }
  
  sendInterrupt(callId) {
    const call = this.calls.get(callId);
    
    if (!call || call.ws.readyState !== WebSocket.OPEN) {
      return false;
    }
    
    // Send interrupt signal
    call.ws.send(JSON.stringify({
      event: 'interrupt',
      callId
    }));
    
    return true;
  }
}

// Usage
const server = new VoiceStreamServer(8080);

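// Pseudocode for audio processing in your WebSocket server.
// STT, LLM, and TTS are placeholders for your own speech-to-text,
// language-model, and text-to-speech integrations.
// Attached to the class so that handleIncomingAudio can call this.processAudio.
VoiceStreamServer.prototype.processAudio = async function (callId, audioData) {
  const text = await STT(audioData);
  const llmResponse = await LLM(text);
  const speech = await TTS(llmResponse);

  if (speech.type === 'partial sentence') {
    this.sendInterrupt(callId);
  } else if (speech.type === 'full sentence') {
    this.sendAudio(callId, speech);
  }
};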

Error Handling

Implement robust error handling in your WebSocket server:

// Error handling in your WebSocket server
class ErrorHandler {
  constructor(server) {
    this.server = server;
    this.setupErrorMonitoring();
  }
  
  setupErrorMonitoring() {
    // Monitor WebSocket server errors
    this.server.on('error', (error) => {
      console.error('WebSocket server error:', error);
      this.attemptRecovery();
    });
    
    // Set up process error handling
    process.on('uncaughtException', (error) => {
      console.error('Uncaught exception:', error);
      this.logError(error);
      // Decide whether to attempt recovery or restart
    });
    
    process.on('unhandledRejection', (reason, promise) => {
      console.error('Unhandled rejection at:', promise, 'reason:', reason);
      this.logError(reason);
    });
  }
  
  logError(error) {
    // Log error to your monitoring/logging system
    // Implementation depends on your logging infrastructure
  }
  
  attemptRecovery() {
    // Implement recovery logic based on the error type
    // This might involve restarting the server or specific connections
  }
}
