Real-time Contextual Media Streaming

Bring Your Own AI to Avaya Infinity

Real-time Contextual Media Streaming (RCMS) is an open WebSocket-based protocol that lets you connect your own AI services directly to Avaya Infinity. Build a WebSocket server, point Avaya Infinity at it, and your AI is live on the call: hearing, speaking, and exchanging events in real time.

No proprietary SDKs. No middleware. Just a WebSocket connection carrying bidirectional audio and structured JSON messages.

[Figure: RCMS Media Flows, Single Endpoint]

What You Can Do Today

Virtual Agent Integration

Deploy your own conversational AI as a first responder in your Avaya Infinity contact center. Whether you're building on OpenAI, Anthropic, or your own models, Real-time Contextual Media Streaming gives your bot a direct line into the live call.

How it works:

  1. Build your WebSocket server to the Real-time Contextual Media Streaming specification. It receives real-time audio, processes it through your AI, and sends audio and events back.

  2. Configure your endpoint in the Avaya Infinity admin console — just a WebSocket URL and credentials.

  3. Select your Virtual Agent profile in your Avaya Infinity workflow, alongside IVR, queues, and skill-based routing.

  4. Connect — when a call hits your Virtual Agent step, Avaya Infinity opens a live WebSocket connection. Audio flows both ways instantly. Your AI is on the call.

  5. Escalate — when a human is needed, your AI sends a handoff event with the full transcript, sentiment, extracted entities, and any custom context. The agent picks up with complete intelligence. No one repeats themselves.

The single WebSocket connection carries two streams simultaneously: bidirectional audio and bidirectional events. Your AI hears and speaks in real time — it's not processing a recording after the fact.
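
As an illustration of the handoff in step 5, the sketch below assembles an escalation event as JSON. The field names (`type`, `sessionId`, `context`) and the `HandoffContext` class are hypothetical placeholders chosen for this example; the protocol specification defines the actual message schema.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class HandoffContext:
    """Hypothetical container for what the AI passes to the human agent."""
    transcript: list                 # e.g. [{"speaker": "customer", "text": "..."}]
    sentiment: str                   # illustrative label such as "negative"
    entities: dict                   # values extracted during the conversation
    custom: dict = field(default_factory=dict)  # any extra context


def build_handoff_event(session_id, ctx):
    """Serialize a handoff event as JSON text, ready to send over the WebSocket."""
    event = {
        "type": "handoff",        # assumed message type name
        "sessionId": session_id,  # assumed field name
        "context": asdict(ctx),   # dataclass converted to a plain dict
    }
    return json.dumps(event)
```

A server might call `build_handoff_event("sess-123", ctx)` once its AI decides a human is needed, then write the resulting text frame to the open connection.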


Media Flows

A single WebSocket connection between Avaya Infinity and your Real-time Contextual Media Streaming server carries media for multiple endpoints (participants) on a call. Each endpoint has two optional, independent media flows:

  • Egress (out) (Avaya Infinity → Your Server) — live audio from the endpoint, streamed to your server for processing. Used by bots, recording, ASR, and agent assist.
  • Ingress (in) (Your Server → Avaya Infinity) — audio generated by your server, played back to the endpoint. Used by bots (AI responses) and TTS.

A Virtual Agent bot typically uses both flows: it receives the customer's speech (egress) and sends AI-generated audio back (ingress). A recording service uses egress only. A TTS service uses ingress only. Your server controls which flows are active per endpoint.

[Figure: RCMS Media Flows, Two Endpoints]

All flows are multiplexed over the same WebSocket connection. Avaya Infinity handles codec negotiation, session multiplexing, and routing — your server simply reads and writes audio frames tagged by endpoint ID.
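
One way to picture "audio frames tagged by endpoint ID" is a small router that keeps per-endpoint state on the server side. This is a sketch only: `AudioFrame` and `EndpointRouter` are hypothetical names, and the in-memory frame shape shown here is an assumption, not the wire format from the specification.

```python
from collections import defaultdict
from typing import NamedTuple


class AudioFrame(NamedTuple):
    endpoint_id: str  # which call participant this audio belongs to
    payload: bytes    # encoded audio for one frame


class EndpointRouter:
    """Buffers egress audio per endpoint and tracks where ingress is enabled."""

    def __init__(self):
        self._egress = defaultdict(bytearray)  # endpoint_id -> received audio
        self._ingress_enabled = set()          # endpoints we may play audio to

    def enable_ingress(self, endpoint_id):
        self._ingress_enabled.add(endpoint_id)

    def on_egress_frame(self, frame):
        """Handle audio arriving from Avaya Infinity for one endpoint."""
        self._egress[frame.endpoint_id].extend(frame.payload)

    def buffered(self, endpoint_id):
        """Return all egress audio received so far for an endpoint."""
        return bytes(self._egress.get(endpoint_id, b""))

    def make_ingress_frame(self, endpoint_id, payload):
        """Tag server-generated audio for playback to one endpoint."""
        if endpoint_id not in self._ingress_enabled:
            raise ValueError(f"ingress not enabled for {endpoint_id}")
        return AudioFrame(endpoint_id, payload)
```

A recording-style service would only ever call `on_egress_frame`, while a bot would also enable ingress for the customer endpoint and produce frames with `make_ingress_frame`.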


Coming Soon

Real-time Contextual Media Streaming is designed as an extensible protocol. The same foundation that powers Virtual Agent will support additional services:

Service | Description | Direction
Agent Assist | Real-time AI support for live agents: transcription, knowledge suggestions, smart replies, summarization | Avaya Infinity → Your Server
Recording | Bring your own recording provider with built-in media buffering for resilience | Avaya Infinity → Your Server
Text-to-Speech | Custom TTS providers for branded voice experiences | Your Server → Avaya Infinity
Speech Recognition | Custom ASR engines optimized for your language and domain | Avaya Infinity → Your Server
Transcription | Live speech-to-text for agents, supervisors, and compliance | Avaya Infinity → Your Server
Translation | Real-time translation between participants speaking different languages | Bidirectional

All services use the same protocol and can run multiplexed over a single WebSocket connection.


Technical Overview

Architecture

Avaya Infinity initiates all connections and acts as the WebSocket client. Your server listens for incoming connections, so its WebSocket endpoint must be reachable from Avaya Infinity. Your server never needs to open a connection to Avaya Infinity.

Transport Modes

Real-time Contextual Media Streaming supports two transport modes:

Mode | Signaling | Media | Best For
avaya-wss | WebSocket (TLS) | Binary frames over WebSocket | Simplicity, broad compatibility
avaya-wss-rtp | WebSocket (TLS) | SRTP over UDP | Ultra-low latency environments

Both modes use JSON for control messages and support dynamic codec negotiation.

Security

  • TLS 1.2+ encryption on all connections
  • Public CA certificates required (no self-signed)
  • JWT authentication — Avaya Infinity sends a signed token, your server validates it
  • Session isolation with per-connection security context
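
To make the token check concrete, here is a minimal, standard-library-only sketch of validating a signed JWT. It assumes an HS256 token with a shared secret and an `exp` claim purely to keep the example self-contained; the signing algorithm, key distribution, and required claims that Avaya Infinity actually uses are defined in the protocol specification, and a production server would more likely rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment):
    # JWT segments drop base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw):
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims, secret):
    """Create a token; included only so the sketch can be exercised locally."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{body}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{body}.{_b64url_encode(sig)}"


def validate_hs256(token, secret, now=None):
    """Return the claims if signature and expiry check out, else raise ValueError."""
    try:
        header_b64, body_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(body_b64))
        signature = _b64url_decode(sig_b64)
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(claims, dict):
        raise ValueError("malformed token")
    if header.get("alg") != "HS256":  # assumption: only HS256 is accepted here
        raise ValueError("unexpected algorithm")
    signed_part = f"{header_b64}.{body_b64}".encode("ascii")
    expected = hmac.new(secret, signed_part, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):  # constant-time comparison
        raise ValueError("bad signature")
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

The server would run a check like this during connection setup and refuse the WebSocket session when it raises.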

Media

  • Codecs: PCMU (8 kHz), PCMA (8 kHz), G.722 (16 kHz)
  • Frame size: 20 ms default, configurable per session
  • Channels: Customer audio, agent audio, or both — negotiated at session setup
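
As a worked example of what these numbers imply, the sketch below computes the encoded payload size of one frame. It assumes the raw codec payload with no protocol header, and G.722 in its standard 64 kbit/s mode; the function and dictionary names are illustrative only.

```python
# Encoded bitrate in bits per second.
# G.711 (PCMU/PCMA): 8,000 samples/s x 8 bits per sample = 64 kbit/s.
# G.722 samples at 16 kHz but encodes to 64 kbit/s in its standard mode.
CODEC_BITRATE_BPS = {"PCMU": 64_000, "PCMA": 64_000, "G722": 64_000}


def frame_payload_bytes(codec, frame_ms=20):
    """Bytes of encoded audio in one frame of the given duration."""
    bits = CODEC_BITRATE_BPS[codec] * frame_ms // 1000
    return bits // 8


def frames_per_second(frame_ms=20):
    """How many frames per second each active flow carries."""
    return 1000 // frame_ms
```

At the 20 ms default, each of the three codecs yields 160 bytes per frame and 50 frames per second per active flow, which is a useful baseline when sizing buffers.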

Protocol

  • JSON message envelope with version, type, session ID, sequence number, and timestamp
  • Built-in keep-alive, graceful shutdown, and automatic session recovery
  • Message batching for reduced setup latency
  • Support for multiplexing multiple services on one connection
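
The envelope described above could be modeled roughly as follows. This is a sketch under assumptions: the JSON key names (`version`, `type`, `sessionId`, `seq`, `timestamp`, `payload`), the version string, the millisecond timestamp, and the `EnvelopeBuilder` class are all hypothetical; consult the specification for the real field names and formats.

```python
import itertools
import json
import time


class EnvelopeBuilder:
    """Wraps outgoing messages in a common envelope for one session."""

    def __init__(self, session_id, version="1.1"):  # version value is assumed
        self.session_id = session_id
        self.version = version
        self._seq = itertools.count(1)  # monotonically increasing sequence

    def build(self, msg_type, payload=None, now=None):
        """Return the JSON text for one control message."""
        seconds = time.time() if now is None else now
        envelope = {
            "version": self.version,
            "type": msg_type,
            "sessionId": self.session_id,
            "seq": next(self._seq),
            "timestamp": int(seconds * 1000),  # assumed: epoch milliseconds
            "payload": payload or {},          # assumed body field
        }
        return json.dumps(envelope)
```

Keeping the sequence counter inside one builder per session means every message sent on that session gets the next number without extra bookkeeping at the call sites.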

Get Started

1. Read the Protocol Specification

The complete Real-time Contextual Media Streaming specification covers session lifecycle, message definitions for all services, media encoding, security requirements, error handling, and wire trace examples. It's comprehensive and structured for use with AI coding assistants — feed it to your tool of choice and start building.

Download Real-time Contextual Media Streaming Protocol Specification v1.1 (PDF)

2. Try the Sample Server

The sample server is a reference implementation that demonstrates a production-ready Real-time Contextual Media Streaming server for the Virtual Agent use case. It includes session management, audio streaming, event handling, and error recovery patterns.

View Sample Server on GitHub


Questions?

For integration support, contact your Avaya representative or visit the Avaya Infinity developer community. We're here to help you get your AI live on the call.