Voice Streaming API
This API allows you to programmatically initiate an outgoing voice stream call, connecting a user's mobile number to a media WebSocket for real-time interaction.
We offer two kinds of voice streaming services that can be integrated with bots:
Outgoing Voice Streaming
The service consists of two main components:
A REST API for initiating calls and checking service health
A WebSocket interface for real-time bidirectional audio streaming
Important: In this architecture, your application hosts the WebSocket server, and Alohaa's Voice Stream service connects to it as the WebSocket client.
API Reference
POST https://voice-stream.alohaa.ai/v1/voice-stream/call
Parameters
Request Headers
Content-Type (Required): application/json. Specifies the content type of the request.
x-metro-api-key (Required): Your API key for authentication.
Request Body
mobile_number (String, Required): Phone number of the user to be called. Must be a valid 10-digit mobile number.
did (String, Required): Direct Inward Dialing (DID) number used to place the call. Must be a valid 10-digit DID number.
ws_url (String, Required): WebSocket URL where the call's audio will be streamed in real time.
webhook_details (String, Optional): Webhook configuration for receiving call lifecycle events. Must be stringified JSON; url and request_type are mandatory, while api_key and api_value are optional (see the example below).
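For reference, a webhook_details value containing all four fields might look like the example below. The request_type value and the exact semantics of api_key and api_value are assumptions here (they are not spelled out above), so confirm the expected values with support; note that the sample requests in this document pass the object inline.
{
  "url": "https://callback.example.com/voice-events",
  "request_type": "POST",
  "api_key": "x-callback-key",
  "api_value": "YOUR_CALLBACK_SECRET"
}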
Sample Request
curl --location 'https://voice-stream.alohaa.ai/v1/voice-stream/call' \
--header 'x-metro-api-key: API_KEY' \
--header 'Content-Type: application/json' \
--data '{
"mobile_number": "77XXXXXXXX",
"did": "8645XXXXXX",
"ws_url": "wss://voicebot.dev.alohaa.ai/media",
"webhook_details": {
"url": "CALLBACK_URL"
}
}'

Responses
Success Response
{
"success": true,
"response": {
"message": "Call setup in progress",
"status": "initiated",
"callId": "a47f91c2e8d44b77b65f13f9"
}
}

Failure Response
{
"success": false,
"error": {
"code": 1022,
"reason": "Organisation does not exists"
}
}

Please go through the diagram below to understand the overall flow:

Status Codes
200 OK: Request successful. Call initiation is in progress.
400 Bad Request: Invalid parameters.
500 Internal Server Error: The server encountered an error.
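Before moving on to the WebSocket protocol, here is a minimal sketch of checking the response envelope shown under Responses above. Field names follow the success and failure samples; you can apply it to the parsed body returned by the initiateCall example further down.
// Sketch: extract the callId from a parsed response, or surface the error.
function extractCallId(data) {
  if (data.success) {
    return data.response.callId;               // e.g. "a47f91c2e8d44b77b65f13f9"
  }
  // e.g. { code: 1022, reason: "Organisation does not exists" }
  throw new Error(`Call failed (${data.error.code}): ${data.error.reason}`);
}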
WebSocket Protocol
Connection Architecture
Important: In this architecture, your application hosts the WebSocket server, and our Voice Stream service connects to it as a client.
You host a WebSocket server at a publicly accessible URL
You provide this URL in the ws_url parameter when initiating a call
Our Voice Stream service connects to your WebSocket server as a client
Real-time bidirectional audio streaming occurs through this WebSocket connection
Protocol Flow
You host a WebSocket server at a publicly accessible URL.
You initiate a call via our REST API, providing your WebSocket server URL
Our service connects to your WebSocket server
Our service registers with your server by sending a "connected" event
Your server confirms registration by responding with a "connected" event
Greeting event (optional): the service plays the greeting audio file at the start of the call; see the sketch after this list.
Exchange Media (Bidirectional)
Connection Termination (When either party ends the call)
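A condensed sketch of this flow on your side, using the Node.js ws package (a fuller server appears in the Code Snippet section later). The loadGreetingAudio helper is hypothetical and stands in for however you produce your greeting audio buffer.
const WebSocket = require('ws');

const wss = new WebSocket.Server({ port: 8080 }); // expose via WSS in production

wss.on('connection', (ws) => {
  ws.on('message', (raw) => {
    const message = JSON.parse(raw);
    if (message.event === 'connected') {
      // Confirm registration (must happen within 10 seconds of connecting)
      ws.send(JSON.stringify({ event: 'connected', data: { callId: message.callId } }));
      // Optionally send a greeting to be played when the call is answered
      const greetingAudio = loadGreetingAudio(); // hypothetical helper returning a Buffer
      ws.send(JSON.stringify({ event: 'greeting', payload: greetingAudio }));
    }
  });
});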
Events
1. Connected [WebSocket Client (Alohaa application) → WebSocket Server (Customer application)]
Registers a new voice session with a unique callId and the mobile number to be dialed.
{
"event": "connected",
"callId": "call_123456789",
"mobileNo": "4155551234"
}

2. Connected [WebSocket Server (Customer application) → WebSocket Client (Alohaa application)]
Acknowledges the successful registration of the session.
{
"event": "connected",
"data": {
"callId": "call_123456789"
}
}

3. Greeting event (Optional)
The greeting event indicates that the customer application (WebSocket server) should send the WebSocket client (Alohaa application) a greeting audio buffer to be played when the call is answered.
{
"event": "greeting",
"payload": "audioBuffer"
}

4. Media (Bidirectional)
Streams raw audio data in real-time, including a timestamp for synchronization or logging.
From our Service to your server (Voice from the phone call):
{
"event": "media",
"callId": "call_123456789",
"payload": "<Buffer>",
"timestamp": 1649433600000
}

From your server to our service (Voice to be transmitted to the phone call):
{
"event": "media",
"callId": "call_123456789",
"payload": {
"type": "Buffer",
"data": [12, 34, 56, 78]
},
"timestamp": 1649433600000
}

5. Interrupt [WebSocket Server (Customer application) → WebSocket Client (Alohaa application)]
Signals the client to stop sending audio — typically triggered when the system detects end of user speech.
{
"event": "interrupt",
"callId": "call_123456789"
}

6. Client → Server: Close WebSocket Connection
Closes the active session. Typically called once the conversation is complete.
client.close();
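One detail worth calling out from event 4: in Node.js, JSON.stringify on an object containing a Buffer already produces the { "type": "Buffer", "data": [...] } shape shown for server-to-service media. A minimal sketch of sending one outgoing frame, where ws is the open connection for the call:
// Sketch: send one outgoing media frame for an active call.
// `mulawAudio` is a Buffer of mu-law audio (see Audio Configuration below).
function sendMediaFrame(ws, callId, mulawAudio) {
  ws.send(JSON.stringify({
    event: 'media',
    callId,
    payload: mulawAudio,        // Buffer serializes to { type: 'Buffer', data: [...] }
    timestamp: Date.now()
  }));
}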
Audio Configuration:
The audio data sent through the WebSocket must meet these requirements for compatibility:
Audio data should match the format expected by the system
The underlying protocol uses G.711 μ-law (PCMU) codec
Sample rate: 8kHz
The client should use TTS with the following configuration:
{
"audioEncoding": "MULAW",
"sampleRateHertz": 8000,
"pitch": 0,
"speakingRate": 1.0
}
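The field names above match Google Cloud Text-to-Speech's AudioConfig, so if you happen to use that service a request could look like the sketch below; any TTS engine that can emit 8 kHz μ-law audio works just as well. The package and call shown are Google's, while the voice choice is an arbitrary assumption.
// Sketch: synthesizing 8 kHz mu-law audio with Google Cloud Text-to-Speech.
// Assumes the @google-cloud/text-to-speech package and credentials are configured.
const textToSpeech = require('@google-cloud/text-to-speech');
const ttsClient = new textToSpeech.TextToSpeechClient();

async function synthesizeReply(text) {
  const [response] = await ttsClient.synthesizeSpeech({
    input: { text },
    voice: { languageCode: 'en-US' },   // pick a voice to suit your bot
    audioConfig: {
      audioEncoding: 'MULAW',           // G.711 mu-law, as required above
      sampleRateHertz: 8000,
      pitch: 0,
      speakingRate: 1.0
    }
  });
  return response.audioContent;         // mu-law audio returned by the service
}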
WebSocket Audio Data Guidelines
When receiving audio from our service, each packet includes a 12-byte RTP header, which helps with ordering the packets.
When sending audio to our service, send the audio data in μ-law format with .wav headers.
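A sketch of both directions under these guidelines: stripping the 12-byte RTP header from an incoming packet, and wrapping outgoing μ-law audio in a minimal 44-byte WAV header for 8 kHz mono μ-law (format tag 7). Treat the header layout as a starting point and adjust it to whatever your audio pipeline expects.
// Incoming: drop the 12-byte RTP header to get the raw mu-law payload.
function stripRtpHeader(packet) {
  return packet.subarray(12);
}

// Outgoing: prepend a minimal WAV header for 8 kHz, mono, 8-bit mu-law audio.
function wrapInWavHeader(mulawData) {
  const header = Buffer.alloc(44);
  header.write('RIFF', 0);
  header.writeUInt32LE(36 + mulawData.length, 4);  // chunk size
  header.write('WAVE', 8);
  header.write('fmt ', 12);
  header.writeUInt32LE(16, 16);                    // fmt chunk size
  header.writeUInt16LE(7, 20);                     // format tag 7 = mu-law
  header.writeUInt16LE(1, 22);                     // mono
  header.writeUInt32LE(8000, 24);                  // sample rate
  header.writeUInt32LE(8000, 28);                  // byte rate (8000 * 1 * 1)
  header.writeUInt16LE(1, 32);                     // block align
  header.writeUInt16LE(8, 34);                     // bits per sample
  header.write('data', 36);
  header.writeUInt32LE(mulawData.length, 40);
  return Buffer.concat([header, mulawData]);
}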
Connection Errors
The WebSocket connection may close with specific close codes:
1000: Normal closure (call ended)
1001: Server going down or client navigating away
1002: Protocol error
1008: Policy violation (e.g., authentication failure)
1011: Server error
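A sketch of reacting to these codes inside your connection's close handler, where ws and callId are in scope; cleanUpCall is a hypothetical helper for your own session cleanup.
ws.on('close', (code) => {
  switch (code) {
    case 1000: // normal closure: the call ended
      cleanUpCall(callId);               // hypothetical cleanup helper
      break;
    case 1008: // policy violation, e.g. authentication failure
      console.error('Registration rejected; check credentials and the registration flow');
      break;
    case 1002: // protocol error
    case 1011: // server error
      console.error(`Unexpected closure (${code}); inspect logs before retrying`);
      break;
    default:
      console.warn(`Connection closed with code ${code}`);
  }
});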
Limitations and Constraints
Maximum WebSocket message size: 1MB
Inactive connections (no messages for 5 minutes) are automatically terminated
All WebSocket connections must use secure WebSockets (WSS); a TLS setup sketch follows this list
WebSocket connections without registration confirmation within 10 seconds are automatically closed
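Because only WSS is accepted, the plain ws server used in the examples below needs TLS in front of it. One option is to terminate TLS in Node itself by attaching the WebSocket server to an HTTPS server, as sketched here with placeholder certificate paths; terminating TLS at a reverse proxy such as nginx works just as well.
const fs = require('fs');
const https = require('https');
const WebSocket = require('ws');

// Placeholder certificate paths; substitute your own TLS material.
const httpsServer = https.createServer({
  cert: fs.readFileSync('/path/to/fullchain.pem'),
  key: fs.readFileSync('/path/to/privkey.pem')
});

// Attach the WebSocket server to the HTTPS server so clients connect via wss://
const wss = new WebSocket.Server({ server: httpsServer });
httpsServer.listen(443);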
Integration Guide
Prerequisites
API credentials (contact support to obtain these)
A publicly accessible WebSocket server
Basic understanding of REST APIs and WebSockets
Integration Steps
Set up your WebSocket server to handle the protocol described above
Initiate a call using our REST API, providing your WebSocket server URL
Handle the registration when our service connects to your WebSocket server
Process incoming audio from the phone call
Send outgoing audio to be transmitted to the phone call
Send interrupt signals when needed
Handle connection closure when the call ends
Code Snippet
Disclaimer: The provided code is for sample/reference purposes only and should be modified to suit your specific integration requirements.
Initiate API Call
async function initiateCall() {
try {
const response = await fetch("https://voice-stream.alohaa.ai/v1/voice-stream/call", {
method: "POST",
headers: {
"Content-Type": "application/json",
"x-metro-api-key": "abcdXXXXapikey"
},
body: JSON.stringify({
mobile_number: "775689XXXX",
did: "8645XXXXXX",
ws_url: "wss://voicebot.yourdomain.com/media",
webhook_details: {
url: "https://callback.server/XXXX"
}
})
});
const data = await response.json();
console.log("Call response:", data);
return data;
} catch (error) {
console.error("Error initiating call:", error);
throw error;
}
}

WebSocket Server Implementation
const WebSocket = require('ws');
class VoiceStreamServer {
constructor(port) {
this.port = port;
this.server = new WebSocket.Server({ port: this.port });
this.calls = new Map(); // Track active calls
this.setupServerHandlers();
console.log(`WebSocket server started on port ${this.port}`);
}
setupServerHandlers() {
this.server.on('connection', (ws) => {
console.log('New connection established');
ws.on('message', (data) => {
try {
const message = JSON.parse(data);
this.handleMessage(ws, message);
} catch (error) {
console.error('Error parsing message:', error);
}
});
ws.on('close', (code, reason) => {
console.log(`Connection closed: ${code} ${reason}`);
this.handleConnectionClose(ws);
});
ws.on('error', (error) => {
console.error('WebSocket error:', error);
});
});
}
handleMessage(ws, message) {
switch (message.event) {
case 'connected':
this.handleConnected(ws, message);
break;
case 'media':
this.handleIncomingAudio(ws, message);
break;
default:
console.warn('Unknown event type:', message.event);
}
}
handleConnected(ws, message) {
const { callId, mobileNo, agent } = message;
console.log(`Connection established for call: ${callId}`);
// Store call information
this.calls.set(callId, {
ws,
mobileNo,
agent,
startTime: Date.now()
});
// Confirm connection (matches the event 2 acknowledgement format)
ws.send(JSON.stringify({
event: 'connected',
data: { callId }
}));
}
handleIncomingAudio(ws, message) {
const { callId, payload, timestamp } = message;
// Process incoming audio from the phone call
this.processAudio(callId, payload);
}
handleConnectionClose(ws) {
// Clean up any resources associated with this connection
for (const [callId, call] of this.calls.entries()) {
if (call.ws === ws) {
this.calls.delete(callId);
console.log(`Call ${callId} removed from active calls`);
break;
}
}
}
sendAudio(callId, audioData) {
const call = this.calls.get(callId);
if (!call) {
console.warn(`Call ${callId} not found`);
return false;
}
if (call.ws.readyState !== WebSocket.OPEN) {
console.warn(`WebSocket for call ${callId} is not open`);
return false;
}
// Send audio to be transmitted to the phone call
call.ws.send(JSON.stringify({
event: 'media',
callId,
payload: audioData // Binary audio data
}));
return true;
}
sendInterrupt(callId) {
const call = this.calls.get(callId);
if (!call || call.ws.readyState !== WebSocket.OPEN) {
return false;
}
// Send interrupt signal
call.ws.send(JSON.stringify({
event: 'interrupt',
callId
}));
return true;
}
}
// Usage
const server = new VoiceStreamServer(8080);
// Pseudocode for audio processing in your WebSocket server
function processAudio(callId, audioData) {
  const text = STT(audioData);                 // speech-to-text on the caller's audio
  const llmResponse = LLM(text);               // generate the bot's reply
  const replyAudio = TTS(llmResponse);         // text-to-speech (8 kHz mu-law)
  if (replyAudio.type === "partial sentence") {
    sendInterrupt(callId);                     // e.g. server.sendInterrupt(callId)
  } else if (replyAudio.type === "full sentence") {
    sendAudio(callId, replyAudio);             // e.g. server.sendAudio(callId, replyAudio)
  }
}
Error Handling
Implement robust error handling in your WebSocket server:
// Error handling in your WebSocket server
class ErrorHandler {
constructor(server) {
this.server = server;
this.setupErrorMonitoring();
}
setupErrorMonitoring() {
// Monitor WebSocket server errors
this.server.on('error', (error) => {
console.error('WebSocket server error:', error);
this.attemptRecovery();
});
// Set up process error handling
process.on('uncaughtException', (error) => {
console.error('Uncaught exception:', error);
this.logError(error);
// Decide whether to attempt recovery or restart
});
process.on('unhandledRejection', (reason, promise) => {
console.error('Unhandled rejection at:', promise, 'reason:', reason);
this.logError(reason);
});
}
logError(error) {
// Log error to your monitoring/logging system
// Implementation depends on your logging infrastructure
}
attemptRecovery() {
// Implement recovery logic based on the error type
// This might involve restarting the server or specific connections
}
}
Incoming Voice Streaming
An Incoming Voicebot allows your system to automatically handle incoming calls through a WebSocket connection. When an incoming call hits your DID, it is routed to the configured bot, which connects to the specified WebSocket URL to handle call media and logic.
Step 1: Create a Voicebot
Navigate to Building Blocks → Voicebot in the left-hand menu.

Click Add Voicebot. Enter the following details:
Bot Name: A unique name to identify your bot (e.g., Sales Assistant Bot, Support IVR Bot).
WebSocket URL: The WebSocket endpoint where call media and events will be sent.

Click Save to finalize the bot.
Step 2: Assign Voicebot to a Number (DID)
After creating your bot, you need to link it to a specific DID (phone number) so that incoming calls can be routed correctly.
Navigate to Building Blocks → Numbers.

Click Assign next to the DID you want to use.
A pop-up will show a list of available voicebots. Select your Voicebot from the list.

Click Assign to complete the assignment.