Voice Streaming API
This API lets you programmatically initiate an outgoing voice stream call, connecting a user's mobile number to a media WebSocket for real-time interaction.
The service consists of two main components:
A REST API for initiating calls and checking service health
A WebSocket interface for real-time bidirectional audio streaming
Important: In this architecture, your application hosts the WebSocket server, and Alohaa's Voice Stream service acts as the WebSocket client.
API Reference
POST https://ari-voice-stream.alohaa.ai/v1/voice-stream/call
Parameters
Request Headers
Content-type (required): application/json. Specifies the content type of the request.
x-metro-api-key (required): *************************. Your API key for authentication purposes.
Request Body
mobileNo (String, required): Phone number of the user to be called. Must be a valid 10-digit mobile number.
did (String, required): Direct Inward Dialing (DID) number to be used for placing the call. Must be a valid 10-digit DID number.
wsUrl (String, required): WebSocket URL where the call's audio will be streamed in real time.
greetingType (String, required): Must be "audio". Indicates the type of greeting. Currently only audio files are supported.
greetingContent (String, required): Public URL of the audio file that will be played as a greeting at the start of the call.
webhook_details (String, optional): Webhook configuration to receive call lifecycle events. Must be a stringified JSON. url and request_type are mandatory. api_key and api_value are optional.
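The request body fields can be assembled as in the following sketch. The helper name `buildCallRequestBody`, the example phone numbers and URLs, and the `request_type` value `'POST'` are illustrative assumptions, not values confirmed by this API. The 10-digit checks mirror the constraints stated above.

```javascript
// Hypothetical helper (not part of the API): assembles the JSON body for
// POST /v1/voice-stream/call from the documented fields.
function buildCallRequestBody({ mobileNo, did, wsUrl, greetingContent, webhook }) {
  const tenDigits = /^\d{10}$/;
  if (!tenDigits.test(mobileNo)) throw new Error('mobileNo must be a 10-digit number');
  if (!tenDigits.test(did)) throw new Error('did must be a 10-digit number');

  const body = {
    mobileNo,
    did,
    wsUrl,
    greetingType: 'audio', // the only supported value according to this document
    greetingContent,
  };

  if (webhook) {
    // url and request_type are mandatory; api_key and api_value are optional.
    if (!webhook.url || !webhook.request_type) {
      throw new Error('webhook url and request_type are mandatory');
    }
    // webhook_details is sent as a stringified JSON, not as a nested object.
    body.webhook_details = JSON.stringify(webhook);
  }
  return body;
}

// All values below are placeholders.
const body = buildCallRequestBody({
  mobileNo: '9876543210',
  did: '8012345678',
  wsUrl: 'wss://example.com/voice',
  greetingContent: 'https://example.com/greeting.wav',
  webhook: { url: 'https://example.com/hooks/call', request_type: 'POST' }, // request_type value is assumed
});
console.log(JSON.stringify(body, null, 2));
```

Validating on the caller's side is optional, but it surfaces malformed numbers before a request is spent on a 400 response.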
Sample Request
{
"x-metro-api-key": "****************"
}
Responses
Success Response
{
"success": true,
"response": {
"callId": "6829acf1da1fe68100b6XXXX"
}
}
Failure Response
{
"success": false,
"error": {
"code": 1022,
"reason": "Organisation does not exists"
}
}
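Both responses share a `success` flag, so a caller can unwrap them in one place. The following is a minimal sketch; `unwrapCallResponse` is a hypothetical helper name, and attaching the numeric `code` to the thrown `Error` is a design choice of this example rather than something the API prescribes.

```javascript
// Hypothetical helper: returns the callId on success, throws on failure.
function unwrapCallResponse(data) {
  if (data && data.success === true && data.response) {
    return data.response.callId;
  }
  const { code, reason } = (data && data.error) || {};
  const err = new Error(reason || 'Unknown error');
  err.code = code; // e.g. 1022 in the failure sample
  throw err;
}

// Usage with the two sample payloads from this section.
const okSample = { success: true, response: { callId: '6829acf1da1fe68100b6XXXX' } };
const failSample = { success: false, error: { code: 1022, reason: 'Organisation does not exists' } };

console.log(unwrapCallResponse(okSample));
try {
  unwrapCallResponse(failSample);
} catch (err) {
  console.error(`Call failed (${err.code}): ${err.message}`);
}
```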
Please go through the diagram to understand the overall flow:

Status Codes
200 OK: Request successful. Call initiation in progress.
400 Bad Request: Invalid parameters
500 Internal Server Error: Server encountered an error
WebSocket Protocol
Connection Architecture
Important: In this architecture, your application hosts the WebSocket server, and our Voice Stream service connects to it as a client.
You host a WebSocket server at a publicly accessible URL
You provide this URL in the wsUrl parameter when initiating a call
Our Voice Stream service connects to your WebSocket server as a client
Real-time bidirectional audio streaming occurs through this WebSocket connection
Protocol Flow
You host a WebSocket server at a publicly accessible URL.
You initiate a call via our REST API, providing your WebSocket server URL
Our service connects to your WebSocket server
Our service registers with your server by sending a register event
Your server confirms registration by responding with a register.success event
Exchange Media (Bidirectional)
Connection Termination (When either party ends the call)
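The flow above can be tracked per connection with a small state machine, which makes out-of-order events (for example, media before registration) easy to detect. This is only a sketch: the state names (`connected`, `registered`, `streaming`, `closed`) and the `CallSession` class are assumptions made for illustration and are not part of the protocol.

```javascript
// Allowed transitions, derived from the protocol flow. State names are
// illustrative assumptions.
const TRANSITIONS = {
  connected: ['registered', 'closed'],
  registered: ['streaming', 'closed'],
  streaming: ['streaming', 'closed'],
  closed: [],
};

class CallSession {
  constructor() {
    this.state = 'connected'; // our service has just connected to your server
    this.callId = null;
  }

  advance(next) {
    if (!TRANSITIONS[this.state].includes(next)) {
      throw new Error(`Invalid transition: ${this.state} -> ${next}`);
    }
    this.state = next;
  }

  // Feed each parsed incoming event; returns the resulting state.
  onEvent(message) {
    if (message.event === 'register') {
      this.callId = message.callId;
      this.advance('registered');
    } else if (message.event === 'media') {
      this.advance('streaming');
    }
    return this.state;
  }

  onClose() {
    this.advance('closed');
  }
}

const session = new CallSession();
session.onEvent({ event: 'register', callId: 'call_123456789', mobileNo: '4155551234' });
session.onEvent({ event: 'media', callId: 'call_123456789' });
session.onClose();
console.log(session.state);
```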
Events
1. Register (Our Service → Your Server)
Registers a new voice session with a unique callId and the mobile number to be dialed.
{
"event": "register",
"callId": "call_123456789",
"mobileNo": "4155551234"
}
2. Register Success (Your Server → Our Service)
Acknowledges the successful registration of the session.
{
"event": "register.success",
"data": {
"callId": "call_123456789"
}
}
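A minimal sketch of building this acknowledgement from a received register event follows. `buildRegisterSuccess` is a hypothetical helper name; the message shape is taken from the sample above, with the callId echoed back inside `data`.

```javascript
// Hypothetical helper: builds the register.success reply for a register event.
function buildRegisterSuccess(registerMessage) {
  if (registerMessage.event !== 'register' || !registerMessage.callId) {
    throw new Error('Expected a register event with a callId');
  }
  return JSON.stringify({
    event: 'register.success',
    data: { callId: registerMessage.callId },
  });
}

const reply = buildRegisterSuccess({
  event: 'register',
  callId: 'call_123456789',
  mobileNo: '4155551234',
});
console.log(reply); // pass this string to ws.send() on the connection that registered
```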
3. Media (Bidirectional)
Streams raw audio data in real-time, including a timestamp for synchronization or logging.
From our Service to your server (Voice from the phone call):
{
"event": "media",
"callId": "call_123456789",
"payload": "<Buffer>",
"timestamp": 1649433600000
}
From your server to our service (Voice to be transmitted to the phone call):
{
"event": "media",
"callId": "call_123456789",
"payload": {
"type": "Buffer",
"data": [12, 34, 56, 78]
},
"timestamp": 1649433600000
}
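In Node.js, calling `JSON.stringify` on a `Buffer` produces exactly the `{ "type": "Buffer", "data": [...] }` shape shown in the server-to-service sample, so outgoing messages can be built as sketched below. The helper names are hypothetical. The exact shape of the incoming `"<Buffer>"` payload is not specified in this document, so `payloadToBuffer` handles only the object shape and should be checked against real traffic.

```javascript
// Hypothetical helper: builds an outgoing media message from a Node.js Buffer.
function buildMediaMessage(callId, audio) {
  return JSON.stringify({
    event: 'media',
    callId,
    payload: audio, // a Buffer serializes to { type: 'Buffer', data: [...] }
    timestamp: Date.now(), // milliseconds since epoch, as in the samples
  });
}

// Hypothetical helper: restores a Buffer from the { type, data } shape.
function payloadToBuffer(payload) {
  if (payload && payload.type === 'Buffer' && Array.isArray(payload.data)) {
    return Buffer.from(payload.data);
  }
  throw new Error('Unsupported payload shape');
}

const outgoing = buildMediaMessage('call_123456789', Buffer.from([12, 34, 56, 78]));
const roundTrip = payloadToBuffer(JSON.parse(outgoing).payload);
console.log(outgoing);
console.log(roundTrip);
```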
4. Interrupt (Your Server → Our Service)
Signals the client (our service) to stop sending audio. Typically triggered when the system detects the end of user speech.
{
"event": "interrupt",
"callId": "call_123456789"
}
5. Client → Server: Close WebSocket Connection
Closes the active session. Typically called once the conversation is complete.
client.close();

Audio Configuration:
The audio data sent through the WebSocket must meet these requirements for compatibility:
Audio data should match the format expected by the system
The underlying protocol uses G.711 μ-law (PCMU) codec
Sample rate: 8kHz
Your TTS engine should use the following configuration:
{
"audioEncoding": "MULAW",
"sampleRateHertz": 8000,
"pitch": 0,
"speakingRate": 1.0
}
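If your TTS engine can only produce 16-bit linear PCM, the samples need to be companded to μ-law before sending. The sketch below uses the standard G.711 μ-law encoding steps (bias, clip, segment and mantissa). The function names are hypothetical, and the input is assumed to be mono, little-endian, and already sampled at 8 kHz; resampling is out of scope here.

```javascript
const MULAW_BIAS = 0x84; // 132, added before locating the segment
const MULAW_CLIP = 32635; // largest magnitude that survives the bias

// Encodes one signed 16-bit PCM sample as one G.711 mu-law byte.
function linearToMulaw(sample) {
  const sign = sample < 0 ? 0x80 : 0x00;
  const magnitude = Math.min(Math.abs(sample), MULAW_CLIP) + MULAW_BIAS;

  // Find the segment (exponent): position of the highest set bit above bit 7.
  let exponent = 7;
  for (let mask = 0x4000; exponent > 0 && (magnitude & mask) === 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (magnitude >> (exponent + 3)) & 0x0f;

  // mu-law bytes are stored bit-inverted.
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}

// Converts a Buffer of little-endian 16-bit PCM into a Buffer of mu-law bytes.
function pcm16ToMulaw(pcm) {
  const out = Buffer.alloc(pcm.length >> 1);
  for (let i = 0; i < out.length; i++) {
    out[i] = linearToMulaw(pcm.readInt16LE(i * 2));
  }
  return out;
}

// Silence (0) encodes to 0xff; full-scale positive encodes to 0x80.
console.log(pcm16ToMulaw(Buffer.from([0x00, 0x00, 0xff, 0x7f])));
```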
WebSocket Audio Data Guidelines
When receiving audio from our service, each packet carries a 12-byte RTP header, which helps in ordering the packets.
When sending audio to our service, send the audio data in μ-law format with .wav headers.
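The two guidelines above can be sketched as follows. `parseRtpPacket` reads the sequence number and timestamp from the fixed 12-byte RTP header (field offsets per RFC 3550) and returns the remaining bytes as audio. `wrapMulawInWav` prepends a minimal 44-byte WAV header with format tag 7 (μ-law). Both helper names are hypothetical, and the exact WAV header layout the service expects is not specified in this document, so treat the header fields as an assumption to verify.

```javascript
const RTP_HEADER_BYTES = 12;

// Splits an incoming packet into RTP header fields and the audio payload.
function parseRtpPacket(packet) {
  if (packet.length < RTP_HEADER_BYTES) {
    throw new Error('Packet is shorter than the RTP header');
  }
  return {
    payloadType: packet[1] & 0x7f, // 0 is PCMU in the static RTP payload types
    sequenceNumber: packet.readUInt16BE(2), // use this to reorder packets
    timestamp: packet.readUInt32BE(4),
    audio: packet.subarray(RTP_HEADER_BYTES),
  };
}

// Wraps raw mu-law bytes in a minimal WAV container (mono, 8-bit samples).
function wrapMulawInWav(mulaw, sampleRate = 8000) {
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + mulaw.length, 4); // file size minus 8 bytes
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16); // fmt chunk size
  header.writeUInt16LE(7, 20); // format tag 7 = G.711 mu-law
  header.writeUInt16LE(1, 22); // channels
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(sampleRate, 28); // byte rate: one byte per sample
  header.writeUInt16LE(1, 32); // block align
  header.writeUInt16LE(8, 34); // bits per sample
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(mulaw.length, 40);
  return Buffer.concat([header, mulaw]);
}

// Example packet: 12 header bytes (sequence 42, timestamp 160) + 2 audio bytes.
const packet = Buffer.from([0x80, 0x00, 0x00, 0x2a, 0, 0, 0, 0xa0, 1, 2, 3, 4, 0xff, 0x7f]);
const parsedPacket = parseRtpPacket(packet);
const wav = wrapMulawInWav(parsedPacket.audio);
console.log(parsedPacket.sequenceNumber, parsedPacket.timestamp, wav.length);
```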
Connection Errors
The WebSocket connection may close with specific close codes:
1000: Normal closure (call ended)
1001: Server going down or client navigating away
1002: Protocol error
1008: Policy violation (e.g., authentication failure)
1011: Server error
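Inside a `close` handler, these codes can be translated into log messages with a small lookup, as in this sketch. The helper names are hypothetical, and treating every code other than 1000 as abnormal is an assumption of the example.

```javascript
// Close codes listed in this section.
const CLOSE_CODES = {
  1000: 'Normal closure (call ended)',
  1001: 'Server going down or client navigating away',
  1002: 'Protocol error',
  1008: 'Policy violation (e.g., authentication failure)',
  1011: 'Server error',
};

// Hypothetical helper: human-readable description for a close code.
function describeCloseCode(code) {
  return CLOSE_CODES[code] || `Unrecognized close code ${code}`;
}

// Hypothetical helper: anything other than 1000 is treated as abnormal here.
function isAbnormalClose(code) {
  return code !== 1000;
}

console.log(describeCloseCode(1008));
console.log(isAbnormalClose(1000));
```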
Limitations and Constraints
Maximum WebSocket message size: 1MB
Inactive connections (no messages for 5 minutes) are automatically terminated
All WebSocket connections must use secure WebSockets (WSS)
WebSocket connections without registration confirmation within 10 seconds are automatically closed
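These limits can be captured as constants and checked before a call is initiated or a message is sent. The sketch below is illustrative: the constant and helper names are hypothetical, and "1MB" is assumed to mean 1,048,576 bytes, which this document does not state explicitly.

```javascript
const MAX_MESSAGE_BYTES = 1024 * 1024; // assumption: 1MB = 1,048,576 bytes
const IDLE_TIMEOUT_MS = 5 * 60 * 1000; // inactive connections are terminated
const REGISTRATION_TIMEOUT_MS = 10 * 1000; // register.success must arrive in time

// True only for well-formed wss:// URLs.
function isSecureWsUrl(url) {
  try {
    return new URL(url).protocol === 'wss:';
  } catch {
    return false; // not a parseable URL
  }
}

// True when a string or Buffer message fits within the size limit.
function fitsMessageLimit(message) {
  return Buffer.byteLength(message) <= MAX_MESSAGE_BYTES;
}

console.log(isSecureWsUrl('wss://example.com/voice'));
console.log(isSecureWsUrl('ws://example.com/voice'));
console.log(fitsMessageLimit(JSON.stringify({ event: 'interrupt', callId: 'call_123456789' })));
```

Checking the wsUrl scheme before calling the REST API avoids initiating a call that the service could never stream.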
Integration Guide
Prerequisites
API credentials (contact support to obtain these)
A publicly accessible WebSocket server
Basic understanding of REST APIs and WebSockets
Integration Steps
Set up your WebSocket server to handle the protocol described above
Initiate a call using our REST API, providing your WebSocket server URL
Handle the registration when our service connects to your WebSocket server
Process incoming audio from the phone call
Send outgoing audio to be transmitted to the phone call
Send interrupt signals when needed
Handle connection closure when the call ends
Code Snippet
Disclaimer: The provided code is for sample/reference purposes only and should be modified to suit your specific integration requirements.
Initiate API Call
// Sample code to initiate a call
async function initiateCall(apiKey, mobileNo, did, wsUrl, greetingContent) {
  try {
    const response = await fetch('https://ari-voice-stream.alohaa.ai/v1/voice-stream/call', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-metro-api-key': apiKey // Your API key
      },
      body: JSON.stringify({
        mobileNo,
        did,
        wsUrl, // URL of your WebSocket server
        greetingType: 'audio', // Currently the only supported value
        greetingContent
      })
    });
    const data = await response.json();
    return data; // Contains call status information
  } catch (error) {
    // Handle error
    console.error('Error initiating call:', error);
  }
}
WebSocket Server Implementation
// Sample code for implementing your WebSocket server
const WebSocket = require('ws');

class VoiceStreamServer {
  constructor(port) {
    this.port = port;
    this.server = new WebSocket.Server({ port: this.port });
    this.calls = new Map(); // Track active calls by callId
    this.setupServerHandlers();
    console.log(`WebSocket server started on port ${this.port}`);
  }

  setupServerHandlers() {
    this.server.on('connection', (ws) => {
      console.log('New connection established');
      ws.on('message', (data) => {
        try {
          const message = JSON.parse(data.toString());
          this.handleMessage(ws, message);
        } catch (error) {
          console.error('Error handling message:', error);
        }
      });
      ws.on('close', (code, reason) => {
        console.log(`Connection closed: ${code} ${reason}`);
        this.handleConnectionClose(ws);
      });
      ws.on('error', (error) => {
        console.error('WebSocket error:', error);
      });
    });
  }

  handleMessage(ws, message) {
    switch (message.event) {
      case 'register':
        this.handleRegister(ws, message);
        break;
      case 'media':
        this.handleIncomingAudio(ws, message);
        break;
      default:
        console.warn('Unknown event type:', message.event);
    }
  }

  handleRegister(ws, message) {
    const { callId, mobileNo } = message;
    console.log(`Registration received for call: ${callId}`);
    // Store call information
    this.calls.set(callId, {
      ws,
      mobileNo,
      startTime: Date.now()
    });
    // Confirm registration, echoing the callId as shown in the Events section
    ws.send(JSON.stringify({
      event: 'register.success',
      data: { callId }
    }));
  }

  handleIncomingAudio(ws, message) {
    const { callId, payload } = message;
    // Process incoming audio from the phone call
    this.processAudio(callId, payload);
  }

  processAudio(callId, payload) {
    // Hook for your audio pipeline (for example STT, LLM, then TTS).
    // See the pseudocode after the Usage line for one possible flow.
  }

  handleConnectionClose(ws) {
    // Clean up any resources associated with this connection
    for (const [callId, call] of this.calls.entries()) {
      if (call.ws === ws) {
        this.calls.delete(callId);
        console.log(`Call ${callId} removed from active calls`);
        break;
      }
    }
  }

  sendAudio(callId, audioData) {
    const call = this.calls.get(callId);
    if (!call) {
      console.warn(`Call ${callId} not found`);
      return false;
    }
    if (call.ws.readyState !== WebSocket.OPEN) {
      console.warn(`WebSocket for call ${callId} is not open`);
      return false;
    }
    // Send audio to be transmitted to the phone call.
    // A Node.js Buffer serializes to { type: 'Buffer', data: [...] }.
    call.ws.send(JSON.stringify({
      event: 'media',
      callId,
      payload: audioData,
      timestamp: Date.now()
    }));
    return true;
  }

  sendInterrupt(callId) {
    const call = this.calls.get(callId);
    if (!call || call.ws.readyState !== WebSocket.OPEN) {
      return false;
    }
    // Send interrupt signal
    call.ws.send(JSON.stringify({
      event: 'interrupt',
      callId
    }));
    return true;
  }
}

// Usage
const server = new VoiceStreamServer(8080);
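// Pseudocode for audio processing in your WebSocket server.
// STT, LLM, and TTS are placeholders for your own speech-to-text,
// language-model, and text-to-speech integrations; `server` is the
// VoiceStreamServer instance created under "Usage".
async function processAudio(callId, audioData) {
  const text = await STT(audioData);
  const llmResponse = await LLM(text);
  const ttsAudio = await TTS(llmResponse);
  if (ttsAudio.type === 'partial sentence') {
    server.sendInterrupt(callId);
  } else if (ttsAudio.type === 'full sentence') {
    server.sendAudio(callId, ttsAudio);
  }
}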
Error Handling
Implement robust error handling in your WebSocket server:
// Error handling in your WebSocket server
class ErrorHandler {
  constructor(server) {
    this.server = server;
    this.setupErrorMonitoring();
  }

  setupErrorMonitoring() {
    // Monitor WebSocket server errors
    this.server.on('error', (error) => {
      console.error('WebSocket server error:', error);
      this.attemptRecovery();
    });
    // Set up process error handling
    process.on('uncaughtException', (error) => {
      console.error('Uncaught exception:', error);
      this.logError(error);
      // Decide whether to attempt recovery or restart
    });
    process.on('unhandledRejection', (reason, promise) => {
      console.error('Unhandled rejection at:', promise, 'reason:', reason);
      this.logError(reason);
    });
  }

  logError(error) {
    // Log error to your monitoring/logging system
    // Implementation depends on your logging infrastructure
  }

  attemptRecovery() {
    // Implement recovery logic based on the error type
    // This might involve restarting the server or specific connections
  }
}