Voice Streaming API

This API allows you to programmatically initiate an outgoing voice stream call, connecting a user's mobile number to a media WebSocket for real-time interaction.

We offer two kinds of voice streaming services that can be integrated with bots:

Outgoing Voice Streaming

The service consists of two main components:

  1. A REST API for initiating calls and checking service health

  2. A WebSocket interface for real-time bidirectional audio streaming

Important: In this architecture, your application hosts the WebSocket server, and our Voice Stream service (Alohaa) connects to it as a WebSocket client.

API Reference

POST https://voice-stream.alohaa.ai/v1/voice-stream/call

Parameters

Request Headers

Field | Value | Description | Mandatory
Content-Type | application/json | Specifies the content type of the request | Yes
x-metro-api-key | ************************* | Your API key, used for authentication | Yes

Request Body

Field | Data Type | Description | Mandatory
mobile_number | String | Phone number of the user to be called. Must be a valid 10-digit mobile number. | Yes
did | String | Direct Inward Dialing (DID) number used to place the call. Must be a valid 10-digit DID number. | Yes
ws_url | String | WebSocket URL where the call's audio will be streamed in real time. | Yes
webhook_details | Object | Webhook configuration for receiving call lifecycle events. url and request_type are mandatory; api_key and api_value are optional. | No
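
For reference, a webhook_details object could look like the following. The field names are those listed above; the values are placeholders, and the exact semantics of api_key and api_value (for example, a custom header name and value on the webhook request) should be confirmed with support:

{
  "url": "https://callback.example.com/voice-events",
  "request_type": "POST",
  "api_key": "x-api-key",
  "api_value": "YOUR_WEBHOOK_SECRET"
}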

Sample Request

curl --location 'https://voice-stream.alohaa.ai/v1/voice-stream/call' \
  --header 'x-metro-api-key: API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "mobile_number": "77XXXXXXXX",
    "did": "8645XXXXXX",
    "ws_url": "wss://voicebot.dev.alohaa.ai/media",
    "webhook_details": {
      "url": "CALLBACK_URL"
    }
  }'

Responses

Success Response

{
  "success": true,
  "response": {
    "message": "Call setup in progress",
    "status": "initiated",
    "callId": "a47f91c2e8d44b77b65f13f9"
  }
}

Failure Response

{
    "success": false,
    "error": {
        "code": 1022,
        "reason": "Organisation does not exists"
    }
}
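
A minimal sketch of handling these two response shapes on the caller side, using the field names from the samples above:

// data is the parsed JSON body of the REST response
if (data.success) {
  // Keep the callId so you can correlate later WebSocket events with this call
  const { callId, status, message } = data.response;
  console.log(`Call ${callId}: ${status} (${message})`);
} else {
  const { code, reason } = data.error;
  console.error(`Call initiation failed with code ${code}: ${reason}`);
}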


Status Codes

Code | Description
200 OK | Request successful. Call initiation in progress.
400 Bad Request | Invalid parameters.
500 Internal Server Error | Server encountered an error.


WebSocket Protocol

Connection Architecture

Important: In this architecture, your application hosts the WebSocket server, and our Voice Stream service connects to it as a client.

  1. You host a WebSocket server at a publicly accessible URL

  2. You provide this URL in the ws_url parameter when initiating a call

  3. Our Voice Stream service connects to your WebSocket server as a client

  4. Real-time bidirectional audio streaming occurs through this WebSocket connection

WebSocket Server Requirements:

  • Must be accessible via the public internet

  • Must support secure WebSocket connections (WSS)

  • Must implement the protocol defined in this document

Message Format: All messages sent and received through the WebSocket connection use JSON format, except for binary audio data which is encoded according to the audio specification.
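
Because only secure connections (WSS) are accepted, your WebSocket server must sit behind TLS. Below is a minimal sketch using Node's https module together with the ws package; the certificate paths, port, and /media path are placeholders, and in practice you may prefer to terminate TLS at a reverse proxy instead:

const https = require('https');
const fs = require('fs');
const WebSocket = require('ws');

// Placeholder certificate and key paths; replace with your own TLS material
const httpsServer = https.createServer({
  cert: fs.readFileSync('/etc/ssl/certs/voicebot.pem'),
  key: fs.readFileSync('/etc/ssl/private/voicebot.key')
});

// Attach the WebSocket server to the HTTPS server so it is reachable via wss://
const wss = new WebSocket.Server({ server: httpsServer, path: '/media' });

wss.on('connection', (ws) => {
  // Handle the protocol events described in the sections below
});

httpsServer.listen(443);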

Protocol Flow

1

You host a WebSocket server at a publicly accessible URL.

2

You initiate a call via our REST API, providing your WebSocket server URL

3

Our service connects to your WebSocket server

4

Our service registers with your server by sending a "connected" event

5

Your server confirms registration by responding with a "connected" event

6

Greeting event (optional): your server sends greeting audio, which the service plays at the start of the call.

7

Exchange Media (Bidirectional)

8

Connection Termination (When either party ends the call)

Events

1. Connected [WebSocket Client (Alohaa application) → WebSocket Server (Customer application)]

Registers a new voice session with a unique callId and the mobile number to be dialed.

{
  "event": "connected",
  "callId": "call_123456789",
  "mobileNo": "4155551234"
}

2. Connected [WebSocket Server (Customer application) → WebSocket Client (Alohaa application)]

Acknowledges the successful registration of the session.

{
  "event": "connected",
  "data": {
    "callId": "call_123456789"
  }
}

3. Greeting event (Optional)

A greeting event indicates that the customer application (WebSocket server) should send the WebSocket client (Alohaa application) greeting audio to be played when the call is answered.

{
  "event": "greeting",
  "payload": "audioBuffer"
}
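
As a sketch, your server could send this event right after acknowledging registration. This assumes the greeting file is already 8 kHz μ-law audio with a WAV header (see the audio configuration below) and that the payload is serialized the same way as outbound media, i.e. as a Node Buffer passed through JSON.stringify:

const fs = require('fs');

// Hypothetical helper: send greeting audio once the session is registered
function sendGreeting(ws) {
  const audioBuffer = fs.readFileSync('./greeting.wav'); // assumed 8 kHz mu-law WAV
  ws.send(JSON.stringify({
    event: 'greeting',
    payload: audioBuffer // serialized by JSON.stringify as { type: 'Buffer', data: [...] }
  }));
}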

4. Media (Bidirectional)

Streams raw audio data in real-time, including a timestamp for synchronization or logging.

From our Service to your server (Voice from the phone call):

This stream is continuous, even when the customer is not speaking. Silent packets are still transmitted to ensure an uninterrupted data flow and to maintain the temporal integrity of the session.

{
  "event": "media",
  "callId": "call_123456789",
  "payload": "<Buffer>",
  "timestamp": 1649433600000
}

From your server to our service (Voice to be transmitted to the phone call):

This is not continuous. Your server accumulates the audio data and sends it as a complete voice chunk after detecting speech boundaries (e.g., via silence detection). The chunk is sent in a batched format for downstream STT → LLM → TTS processing; a sketch of this batching follows the example below.

{
  "event": "media",
  "callId": "call_123456789",
  "payload": {
    "type": "Buffer",
    "data": [12, 34, 56, 78]
  },
  "timestamp": 1649433600000
}
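
One way to implement this batching is to buffer incoming frames per call and flush them once the audio has been quiet for a while. The sketch below uses a very rough μ-law loudness estimate in place of a real voice activity detector; the thresholds, frame duration, and the onUtterance callback are illustrative assumptions, not part of the protocol, and the frames are assumed to have their 12-byte RTP headers already stripped (see the audio data guidelines below):

// Illustrative per-call batching with a crude silence detector
const pending = new Map(); // callId -> { frames: Buffer[], quietFrames: number }

// Rough mu-law "loudness": distance of each byte from the mu-law near-silence values
function frameEnergy(frame) {
  let sum = 0;
  for (const byte of frame) sum += 0x7f - (byte & 0x7f);
  return sum / frame.length;
}

function onMediaFrame(callId, frame, onUtterance) {
  const entry = pending.get(callId) || { frames: [], quietFrames: 0 };
  pending.set(callId, entry);
  entry.frames.push(frame);

  // Count consecutive quiet frames; reset the counter whenever speech-like energy appears
  entry.quietFrames = frameEnergy(frame) < 5 ? entry.quietFrames + 1 : 0;

  // Roughly 0.5 s of silence at 20 ms frames: treat it as the end of an utterance and flush
  if (entry.quietFrames >= 25 && entry.frames.length > entry.quietFrames) {
    onUtterance(callId, Buffer.concat(entry.frames)); // hand off to STT -> LLM -> TTS
    pending.delete(callId);
  }
}
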
5. Interrupt [WebSocket Server (Customer application) → WebSocket Client (Alohaa application)]

Signals the client to stop sending audio, typically triggered when the system detects the end of user speech.

{
  "event": "interrupt",
  "callId": "call_123456789"
}

6. Client → Server: Close WebSocket Connection

Closes the active session. Typically called once the conversation is complete.

client.close();
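
If your server is the one ending the conversation, it can close the connection itself with the normal-closure code from the table below. For example, with the ws package and the per-call connection map used in the implementation sketch later in this guide:

// Server-initiated closure once the conversation is complete (1000 = normal closure)
call.ws.close(1000, 'conversation complete');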

Audio Configuration:

The audio data sent through the WebSocket must meet these requirements for compatibility:

  • Audio data should match the format expected by the system

  • The underlying protocol uses G.711 μ-law (PCMU) codec

  • Sample rate: 8kHz

Your application's TTS should use the following configuration:

{
  "audioEncoding": "MULAW",
  "sampleRateHertz": 8000,
  "pitch": 0,
  "speakingRate": 1.0
}
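
These fields correspond to the audioConfig object of Google Cloud Text-to-Speech. If that happens to be your TTS provider, a synthesis request might look like the sketch below; the voice selection is a placeholder, and any TTS engine that can output 8 kHz μ-law works equally well:

const textToSpeech = require('@google-cloud/text-to-speech');
const ttsClient = new textToSpeech.TextToSpeechClient();

async function synthesize(text) {
  const [response] = await ttsClient.synthesizeSpeech({
    input: { text },
    voice: { languageCode: 'en-IN' }, // placeholder voice selection
    audioConfig: {
      audioEncoding: 'MULAW',   // G.711 mu-law, as required above
      sampleRateHertz: 8000,
      pitch: 0,
      speakingRate: 1.0
    }
  });
  return response.audioContent; // audio bytes to send back over the WebSocket
}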

WebSocket Audio Data Guidelines

  • When receiving audio from our service, each packet includes a 12-byte RTP header, which can be used to order the packets; see the sketch below.

  • When sending audio to our service, send the audio data in μ-law format with WAV headers.
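
A minimal sketch of stripping that fixed 12-byte header before further processing; it assumes the standard RTP header with no extensions or CSRC entries beyond the first 12 bytes:

const RTP_HEADER_BYTES = 12;

// Remove the 12-byte RTP header from an incoming media payload to get raw mu-law audio
function stripRtpHeader(packet) {
  const buf = Buffer.isBuffer(packet) ? packet : Buffer.from(packet.data || packet);
  return buf.subarray(RTP_HEADER_BYTES);
}

// The RTP sequence number (bytes 2-3) can be used to re-order packets if needed
function rtpSequenceNumber(packet) {
  const buf = Buffer.isBuffer(packet) ? packet : Buffer.from(packet.data || packet);
  return buf.readUInt16BE(2);
}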

Connection Errors

The WebSocket connection may close with specific close codes:

Code | Interpretation
1000 | Normal closure (call ended)
1001 | Server going down or client navigating away
1002 | Protocol error
1008 | Policy violation (e.g., authentication failure)
1011 | Server error

Limitations and Constraints

  • Maximum WebSocket message size: 1MB

  • Inactive connections (no messages for 5 minutes) are automatically terminated

  • All WebSocket connections must use secure WebSockets (WSS)

  • WebSocket connections without registration confirmation within 10 seconds are automatically closed


Integration Guide

Prerequisites

  • API credentials (contact support to obtain these)

  • A publicly accessible WebSocket server

  • Basic understanding of REST APIs and WebSockets

Integration Steps

1

Set up your WebSocket server to handle the protocol described above

2

Initiate a call using our REST API, providing your WebSocket server URL

3

Handle the registration when our service connects to your WebSocket server

4

Process incoming audio from the phone call

5

Send outgoing audio to be transmitted to the phone call

6

Send interrupt signals when needed

7

Handle connection closure when the call ends

Code Snippet

1

Initiate API Call

async function initiateCall() {
  try {
    const response = await fetch("https://voice-stream.alohaa.ai/v1/voice-stream/call", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-metro-api-key": "abcdXXXXapikey"
      },
      body: JSON.stringify({
        mobile_number: "775689XXXX",
        did: "8645XXXXXX",
        ws_url: "wss://voicebot.yourdomain.com/media",
        webhook_details: {
          url: "https://callback.server/XXXX"
        }
      })
    });

    const data = await response.json();
    console.log("Call response:", data);
    return data;
  } catch (error) {
    console.error("Error initiating call:", error);
    throw error;
  }
}
2

WebSocket Server Implementation

const WebSocket = require('ws');

class VoiceStreamServer {
  constructor(port) {
    this.port = port;
    this.server = new WebSocket.Server({ port: this.port });
    this.calls = new Map(); // Track active calls
    
    this.setupServerHandlers();
    console.log(`WebSocket server started on port ${this.port}`);
  }
  
  setupServerHandlers() {
    this.server.on('connection', (ws) => {
      console.log('New connection established');
      
      ws.on('message', (data) => {
        try {
          const message = JSON.parse(data);
          this.handleMessage(ws, message);
        } catch (error) {
          console.error('Error parsing message:', error);
        }
      });
      
      ws.on('close', (code, reason) => {
        console.log(`Connection closed: ${code} ${reason}`);
        this.handleConnectionClose(ws);
      });
      
      ws.on('error', (error) => {
        console.error('WebSocket error:', error);
      });
    });
  }
  
  handleMessage(ws, message) {
    switch (message.event) {
      case 'connected':
        this.handleConnected(ws, message);
        break;
        
      case 'media':
        this.handleIncomingAudio(ws, message);
        break;
        
      default:
        console.warn('Unknown event type:', message.event);
    }
  }
  
  handleConnected(ws, message) {
    const { callId, mobileNo, agent } = message;
    
    console.log(`Connection established for call: ${callId}`);
    
    // Store call information
    this.calls.set(callId, {
      ws,
      mobileNo,
      agent,
      startTime: Date.now()
    });
    
    // Confirm registration, echoing the callId as described in the protocol
    ws.send(JSON.stringify({
      event: 'connected',
      data: { callId }
    }));
  }
  
  handleIncomingAudio(ws, message) {
    const { callId, payload, timestamp } = message;
    
    // Process incoming audio from the phone call
    // (processAudio is sketched below; attach it to this class or call your own pipeline here)
    this.processAudio(callId, payload);
  }
  
  handleConnectionClose(ws) {
    // Clean up any resources associated with this connection
    for (const [callId, call] of this.calls.entries()) {
      if (call.ws === ws) {
        this.calls.delete(callId);
        console.log(`Call ${callId} removed from active calls`);
        break;
      }
    }
  }
  
  sendAudio(callId, audioData) {
    const call = this.calls.get(callId);
    
    if (!call) {
      console.warn(`Call ${callId} not found`);
      return false;
    }
    
    if (call.ws.readyState !== WebSocket.OPEN) {
      console.warn(`WebSocket for call ${callId} is not open`);
      return false;
    }
    
    // Send audio to be transmitted to the phone call
    call.ws.send(JSON.stringify({
      event: 'media',
      callId,
      payload: audioData // Binary audio data
    }));
    
    return true;
  }
  
  sendInterrupt(callId) {
    const call = this.calls.get(callId);
    
    if (!call || call.ws.readyState !== WebSocket.OPEN) {
      return false;
    }
    
    // Send interrupt signal
    call.ws.send(JSON.stringify({
      event: 'interrupt',
      callId
    }));
    
    return true;
  }
}

// Usage
const server = new VoiceStreamServer(8080);

// Pseudocode for audio processing in your WebSocket server.
// STT, LLM and TTS stand in for your own speech-to-text, language-model and
// text-to-speech steps; sendAudio/sendInterrupt map to the VoiceStreamServer
// methods above (e.g. server.sendAudio).
function processAudio(callId, audioData) {
  const text = STT(audioData);
  const llmResponse = LLM(text);
  const replyAudio = TTS(llmResponse);

  if (replyAudio.type === "partial sentence") {
    server.sendInterrupt(callId);
  } else if (replyAudio.type === "full sentence") {
    server.sendAudio(callId, replyAudio);
  }
}

Error Handling

Implement robust error handling in your WebSocket server:

// Error handling in your WebSocket server
class ErrorHandler {
  constructor(server) {
    this.server = server;
    this.setupErrorMonitoring();
  }
  
  setupErrorMonitoring() {
    // Monitor WebSocket server errors
    this.server.on('error', (error) => {
      console.error('WebSocket server error:', error);
      this.attemptRecovery();
    });
    
    // Set up process error handling
    process.on('uncaughtException', (error) => {
      console.error('Uncaught exception:', error);
      this.logError(error);
      // Decide whether to attempt recovery or restart
    });
    
    process.on('unhandledRejection', (reason, promise) => {
      console.error('Unhandled rejection at:', promise, 'reason:', reason);
      this.logError(reason);
    });
  }
  
  logError(error) {
    // Log error to your monitoring/logging system
    // Implementation depends on your logging infrastructure
  }
  
  attemptRecovery() {
    // Implement recovery logic based on the error type
    // This might involve restarting the server or specific connections
  }
}

Incoming Voice Streaming

An Incoming Voicebot allows your system to automatically handle incoming calls through a WebSocket connection. When an incoming call hits your DID, it is routed to the configured bot, which connects to the specified WebSocket URL to handle call media and logic.

Step 1: Create a Voicebot

1

Navigate to Building Blocks → Voicebot in the left-hand menu.

2

Click Add Voicebot. Enter the following details:

Bot Name: A unique name to identify your bot (e.g., Sales Assistant Bot, Support IVR Bot).

WebSocket URL: The WebSocket endpoint where call media and events will be sent.

3

Click Save to finalize the bot.

Step 2: Assign Voicebot to a Number (DID)

After creating your bot, you need to link it to a specific DID (phone number) so that incoming calls can be routed correctly.

1

Navigate to Building Blocks → Numbers.

2

Click Assign next to the DID you want to use.

3

A pop-up will show a list of available voicebots. Select your Voicebot from the list.

4

Click Assign to complete the assignment.
