Get Started with Media Stream

Introduction

Stream real-time audio from phone calls.

Media Stream

Voxology’s Media Stream service lets you stream real-time audio from phone calls to a user-defined endpoint over a WebSocket. Media Stream is supported on calls placed or received on Voxology’s platform, whether you use the Programmable Voice API or SIP Trunks. In addition to the call audio, the WebSocket Messages include metadata such as phone numbers, call direction, call_id values, and any user-defined stream_data.

Now that automated transcription services such as Google Speech-to-Text, Amazon Transcribe, and Microsoft Azure reliably transcribe speech to text in real time, machine learning can be applied more easily to build communications-related artificial intelligence (AI). Conversations that happen over the phone can now be transcribed and analyzed to help businesses meet compliance requirements, understand how to serve their customers better, and gain insight into the effectiveness of their communications.

Stuff you can build

  • Real-time Call Transcription // Stream phone call audio to a real-time speech-to-text transcription service.
  • Real-time Call Analytics // Stream phone call audio to a machine learning platform to provide predictive analytics.
  • Speech-based Emotion Recognition or Detection // Stream phone call audio to an AI platform to recognize emotion through speech.
  • Compliance-based Call Monitoring // Stream phone call audio to an AI engine to analyze speech for compliance purposes.
  • AI-enabled Speech Coaching // Stream phone call audio to an AI engine to provide real-time coaching to agents.
  • Call Recording Storage // Stream phone call audio to your own call recording storage service.

How it Works

Set up Media Stream on your inbound or outbound SIP Trunk, or add the STREAM Action to an existing call flow using the Programmable Voice API. When a call is placed or received, Voxology sends a series of WebSocket Messages to the endpoint you designate. These messages carry the call media in chunks, along with other information about the phone call.

Example WebSocket Message
{
  "stream_id":117179174,
  "type":"live_track",
  "stream_sequence_number":30,
  "track_sequence_number":28,
  "media":"/8fnz+/nv8////fn67/P8+fn78fn19/nv8//2+fv1+fn39/P/+fXx+fX58/n9+ff/+/37//v39//7...",
  "call_id":"326065@3423993409",
  "name":"far"
}
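The sketch below shows how a receiver might unpack one message like the example above in Python. It assumes each WebSocket text frame carries exactly one JSON object and that `media` uses the standard Base64 alphabet, as the Getting Started steps describe; `decode_live_track` is a hypothetical helper name, not part of any Voxology SDK.

```python
import base64
import json


def decode_live_track(raw: str) -> tuple[str, int, bytes]:
    """Return (track name, track_sequence_number, raw audio bytes) for a live_track message."""
    message = json.loads(raw)
    if message.get("type") != "live_track":
        raise ValueError(f"expected live_track, got {message.get('type')!r}")
    # The media field is Base64 text; decoding yields the raw audio bytes.
    audio = base64.b64decode(message["media"])
    return message["name"], message["track_sequence_number"], audio


# Usage with a synthetic message (the media here is three made-up audio bytes).
example = json.dumps({
    "stream_id": 117179174,
    "type": "live_track",
    "stream_sequence_number": 30,
    "track_sequence_number": 28,
    "media": base64.b64encode(b"\xff\x7f\x00").decode("ascii"),
    "call_id": "326065@3423993409",
    "name": "far",
})
name, seq, audio = decode_live_track(example)
print(name, seq, len(audio))  # far 28 3
```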
SIP Trunk Configuration

Media Stream can be configured on an Inbound (Origination) or Outbound (Termination) SIP Trunk in the Portal. Add the URL where Voxology should send WebSocket Messages to the SIP Trunk settings, and each call made or received on that Trunk will initiate a sequence of Media Stream messages.

See the Getting Started guide for Media Stream SIP Trunking.

Programmable Voice Call Flows API

Media Stream can be initiated with the STREAM action in a Call Flow through the Programmable Voice API. The STREAM action initiates a sequence of WebSocket Messages to the URL that you designate.

See the Getting Started guide for Media Stream Programmable Voice API.

WebSocket Message Sequence

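When a stream is initiated by a phone call on a SIP Trunk or through the Programmable Voice API, Voxology sends a sequence of WebSocket Messages to the URL you designate. The messages include media chunks for each track, as well as other information about the call. Below is an example sequence of eight (8) WebSocket Messages for a Programmable Voice call with two tracks, caller and transfer. Only one live_track message per track is shown; a real stream carries many more, so the sequence numbers in the example are not consecutive.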

See our API Reference for complete documentation.

1. Example Init Stream WebSocket Message
{
  "stream_id":117179172,
  "type":"init_stream",
  "stream_sequence_number":1,
  "call_id":"326065@3423993407",
  "app_id":"421f7113-df23-4fe6-ba05-6bb943bf9ce3",
  "subaccount_id":67891,
  "direction":"outbound",
  "stream_name":"stream1",
  "stream_data":null,
  "tracks":[
    {
      "name":"caller"
    },
    {
      "name":"transfer"
    }
  ],
  "service":"programmable_voice"
}
2. Example Start Track WebSocket Message // First Leg (caller)
{
  "stream_id":117179172,
  "type":"start_track",
  "stream_sequence_number":2,
  "call_id":"326065@3423993407",
  "api_no":"+19495551212",
  "caller_no":"+17145551212",
  "track_sequence_number":1,
  "media_format":{
    "encoding":"ulaw",
    "type":"text",
    "sample_rate":8000
  },
  "name":"caller"
}
3. Example Start Track WebSocket Message // Second Leg (transfer)
{
  "stream_id":117179172,
  "type":"start_track",
  "stream_sequence_number":3,
  "call_id":"326065@3423993407",
  "api_no":"+19495551212",
  "caller_no":"+17145551212",
  "track_sequence_number":1,
  "media_format":{
    "encoding":"ulaw",
    "type":"text",
    "sample_rate":8000
  },
  "name":"transfer"
}
4. Example Live Track WebSocket Message // First Leg (caller)
{
  "stream_id":117179172,
  "type":"live_track",
  "stream_sequence_number":27,
  "track_sequence_number":26,
  "media":"/v5+fv19/v3//v7+fn59fn7+/v9+ff97e378fn7+//59/f7+fv7///7+//5+/37/fn19fX5+/n3/fv...",
  "call_id":"326065@3423993407",
  "name":"caller"
}
5. Example Live Track WebSocket Message // Second Leg (transfer)
{
  "stream_id":117179172,
  "type":"live_track",
  "stream_sequence_number":28,
  "track_sequence_number":26,
  "media":"/9fnz+/nv9////fn79/P7+fn78fn19/nv8//3+fv1+fn39/P/+fXx+fX58/n7+ff/+/37//v39//7...",
  "call_id":"326065@3423993407",
  "name":"transfer"
}
6. Example Stop Track WebSocket Message // First Leg (caller)
{
  "stream_id":117179172,
  "type":"stop_track",
  "stream_sequence_number":31,
  "call_id":"326065@3423993407",
  "api_no":"+19495551212",
  "caller_no":"+17145551212",
  "track_sequence_number":30,
  "duration":4928,
  "start_time":"2020-12-02T17:39:32.588Z",
  "end_time":"2020-12-02T17:39:37.516Z",
  "name":"caller"
}
7. Example Stop Track WebSocket Message // Second Leg (transfer)
{
  "stream_id":117179172,
  "type":"stop_track",
  "stream_sequence_number":32,
  "call_id":"326065@3423993407",
  "api_no":"+19495551212",
  "caller_no":"+17145551212",
  "track_sequence_number":30,
  "duration":4920,
  "start_time":"2020-12-02T17:39:32.588Z",
  "end_time":"2020-12-02T17:39:37.508Z",
  "name":"transfer"
}
8. Example End Stream WebSocket Message
{
  "stream_id":117179172,
  "type":"end_stream",
  "stream_sequence_number":33,
  "call_id":"326065@3423993407",
  "app_id":"421f7113-df23-4fe6-ba05-6bb943bf9ce3",
  "subaccount_id":67891,
  "service":"programmable_voice",
  "stream_name":"stream1",
  "tracks":[
    {
      "name":"caller",
      "start_time":"2020-12-02T17:39:32.588Z",
      "end_time":"2020-12-02T17:39:37.516Z",
      "duration":4928
    },
    {
      "name":"transfer",
      "start_time":"2020-12-02T17:39:32.588Z",
      "end_time":"2020-12-02T17:39:37.508Z",
      "duration":4920
    }
  ],
  "error_code":null,
  "error":null,
  "stream_data":null
}
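Taken together, the examples follow one lifecycle: an init_stream, a start_track per track, live_track messages, a stop_track per track, and a final end_stream. The Python sketch below checks a stream against that observed order, collects decoded audio per track name, and confirms that each stop_track `duration` equals `end_time` minus `start_time` in milliseconds (true for both examples, for instance 37.516 s - 32.588 s = 4928 ms). It only requires sequence numbers to increase, since the example numbers are not consecutive. The class name and its rules are assumptions drawn from these examples, not documented guarantees; confirm them against the API Reference.

```python
import base64
from collections import defaultdict
from datetime import datetime, timedelta


def parse_timestamp(value: str) -> datetime:
    """Parse timestamps like '2020-12-02T17:39:32.588Z' as timezone-aware UTC datetimes."""
    return datetime.strptime(value.replace("Z", "+00:00"), "%Y-%m-%dT%H:%M:%S.%f%z")


class StreamReceiver:
    """Track one stream's lifecycle, collect audio per track, and record anomalies."""

    def __init__(self) -> None:
        self.last_stream_seq = 0
        self.last_track_seq = {}             # track name -> last track_sequence_number
        self.open_tracks = set()
        self.closed_tracks = set()
        self.audio = defaultdict(bytearray)  # track name -> concatenated media bytes
        self.started = self.ended = False
        self.problems = []

    def feed(self, msg: dict) -> None:
        kind, seq = msg["type"], msg["stream_sequence_number"]
        # Gaps are tolerated (the examples skip numbers); repeats or reversals are not.
        if seq <= self.last_stream_seq:
            self.problems.append(f"stream_sequence_number {seq} after {self.last_stream_seq}")
        self.last_stream_seq = seq
        if self.ended:
            self.problems.append(f"{kind} after end_stream")
        if kind == "init_stream":
            self.started = True
            return
        if not self.started:
            self.problems.append(f"{kind} before init_stream")
        if kind == "end_stream":
            # The end_stream summary should list exactly the tracks that were stopped.
            reported = {t["name"] for t in msg.get("tracks") or []}
            if self.open_tracks or reported != self.closed_tracks:
                self.problems.append("end_stream does not match the tracks seen")
            self.ended = True
            return
        name, track_seq = msg["name"], msg["track_sequence_number"]
        last = self.last_track_seq.get(name, 0)
        if track_seq <= last:
            self.problems.append(f"{name}: track_sequence_number {track_seq} after {last}")
        self.last_track_seq[name] = track_seq
        if kind == "start_track":
            self.open_tracks.add(name)
        elif name not in self.open_tracks:
            self.problems.append(f"{kind} for track {name!r} that is not open")
        elif kind == "live_track":
            self.audio[name] += base64.b64decode(msg["media"])
        elif kind == "stop_track":
            self.open_tracks.discard(name)
            self.closed_tracks.add(name)
            elapsed = parse_timestamp(msg["end_time"]) - parse_timestamp(msg["start_time"])
            if elapsed // timedelta(milliseconds=1) != msg["duration"]:
                self.problems.append(f"{name}: duration differs from end_time - start_time")
```

The receiver takes already-parsed dictionaries (for example, the output of `json.loads`), so it can sit behind any WebSocket library.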

Getting Started

Media Stream SIP Trunking

This tutorial is a step-by-step guide to stream media from a phone call using a Voxology SIP Trunk.

Prerequisites

Tutorial

1. Set Up A Service To Receive WebSocket Messages

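Build and configure a server that can accept WebSocket connections and parse WebSocket Messages. The media is delivered as Base64-encoded audio chunks wrapped in JSON messages; the media_format field of each start_track message describes the audio encoding (μ-law at 8000 Hz in the examples above).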
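Because Voxology opens the WebSocket connection to the URL you designate, your service plays the server role. The following Python sketch uses only the standard library to show what that involves under RFC 6455: completing the HTTP upgrade handshake, unmasking client frames, answering pings, and handing each text frame's JSON to a handler. It does not reassemble fragmented messages, terminate TLS, or validate the request beyond the Sec-WebSocket-Key header, so treat it as an illustration; a maintained WebSocket library and a `wss://` endpoint are the safer choice in production. The port 8765 and the `handle_message` hook are arbitrary placeholders.

```python
import asyncio
import base64
import hashlib
import json
import struct

# GUID fixed by RFC 6455 for deriving Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key: str) -> str:
    """Derive the Sec-WebSocket-Accept header from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


async def read_frame(reader: asyncio.StreamReader) -> tuple[int, bytes]:
    """Read one frame and return (opcode, unmasked payload).

    Simplification: fragmented messages (FIN bit clear) are not reassembled.
    """
    first, second = await reader.readexactly(2)
    opcode = first & 0x0F
    length = second & 0x7F
    if length == 126:
        (length,) = struct.unpack("!H", await reader.readexactly(2))
    elif length == 127:
        (length,) = struct.unpack("!Q", await reader.readexactly(8))
    # Client-to-server frames are masked per RFC 6455; unmasked ones are tolerated here.
    mask = await reader.readexactly(4) if second & 0x80 else bytes(4)
    data = await reader.readexactly(length)
    return opcode, bytes(b ^ mask[i % 4] for i, b in enumerate(data))


def handle_message(message: dict) -> None:
    """Placeholder hook: replace with transcription, storage, and so on."""
    print(message["type"], message.get("name", ""))


async def handle_connection(reader, writer) -> None:
    # 1. Read the HTTP upgrade request and collect its headers.
    request = await reader.readuntil(b"\r\n\r\n")
    headers = {}
    for line in request.decode("latin-1").split("\r\n")[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    key = headers.get("sec-websocket-key")
    if key is None:
        writer.close()
        return
    # 2. Complete the handshake.
    writer.write(
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {accept_key(key)}\r\n\r\n".encode("ascii")
    )
    await writer.drain()
    # 3. Each text frame carries one JSON Media Stream message.
    try:
        while True:
            opcode, payload = await read_frame(reader)
            if opcode == 0x1:  # text frame
                handle_message(json.loads(payload))
            elif opcode == 0x9:  # ping: reply with a pong carrying the same payload (<= 125 bytes)
                writer.write(bytes([0x8A, len(payload)]) + payload)
                await writer.drain()
            elif opcode == 0x8:  # close: reply with an empty close frame
                writer.write(b"\x88\x00")
                await writer.drain()
                break
    except asyncio.IncompleteReadError:
        pass  # peer disconnected without sending a close frame
    finally:
        writer.close()


async def main(host: str = "0.0.0.0", port: int = 8765) -> None:
    server = await asyncio.start_server(handle_connection, host, port)
    async with server:
        await server.serve_forever()

# Start the server with: asyncio.run(main())
```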

2. Configure Media Stream Settings On SIP Trunk

To configure Media Stream on a SIP Trunk, go to the Manage SIP Trunks page in the Portal, select the SIP Trunk, click the Media Stream tab, enter the URL and the tracks to stream (near end, far end, or both), and press Save.

3. Make Or Receive A Call

Place a call to a Voxology number (if you have an inbound trunk) or make an outbound call through the SIP Trunk, and a series of WebSocket Messages will be sent to the URL you designate.

Media Stream Programmable Voice API

This tutorial is a step-by-step guide to stream media from a phone call using Voxology’s Programmable Voice API.

Prerequisites

Tutorial

1. Set Up A Service To Receive WebSocket Messages

Build and configure a server that can accept WebSocket connections and parse WebSocket Messages. The media is delivered as Base64-encoded audio chunks wrapped in JSON messages.
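The start_track examples declare `"encoding":"ulaw"` and `"sample_rate":8000`, that is, 8-bit G.711 μ-law samples at 8 kHz. Many speech-to-text services and audio tools expect 16-bit linear PCM instead, so the Python sketch below expands μ-law with the standard G.711 decoding formula and wraps the result in a mono WAV file using the built-in `wave` module (the `audioop` module that once did this conversion was removed in Python 3.13). It assumes each track's decoded `media` bytes form one mono channel of μ-law samples; the function names are illustrative.

```python
import io
import struct
import wave


def ulaw_to_linear(u: int) -> int:
    """Expand one G.711 μ-law byte to a signed 16-bit linear sample."""
    u = ~u & 0xFF                 # μ-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 is the G.711 bias
    return -magnitude if sign else magnitude


def ulaw_to_pcm16(data: bytes) -> bytes:
    """Convert a μ-law chunk to little-endian 16-bit PCM."""
    return struct.pack(f"<{len(data)}h", *(ulaw_to_linear(b) for b in data))


def write_wav(pcm: bytes, sample_rate: int = 8000) -> bytes:
    """Wrap 16-bit mono PCM in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)       # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

At 8000 Hz, one second of a track is 8000 μ-law bytes, which expands to 16000 bytes of PCM.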

2. Add STREAM Action To A Call Flow

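Include a STREAM Action with "action":"start" at the beginning of an existing Call Flow. In the example script below, a second STREAM Action with "action":"stop" ends the stream before the final SAY and HANGUP actions.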

Example Call Flow Script
{
  "actions":[
    {
      "type":"STREAM",
      "params":{
        "url":"ws://test-u.rl",
        "action":"start",
        "tracks":[
          "caller",
          "transfer"
        ],
        "stream_data":{
          "test_key":"test_value"
        }
      }
    },
    {
      "type":"SAY",
      "params":{
        "text":"This is a call flow to test the media streaming action."
      }
    },
    {
      "type":"PAUSE",
      "params":{
        "seconds":10
      }
    },
    {
      "type":"STREAM",
      "params":{
        "url":"ws://test-u.rl",
        "action":"stop",
        "tracks":[
          "caller",
          "transfer"
        ],
        "stream_data":{
          "your_key":"your_value"
        }
      }
    },
    {
      "type":"SAY",
      "params":{
        "text":"The stream has ended"
      }
    },
    {
      "type":"HANGUP"
    }
  ]
}
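If you generate call flows programmatically, a small helper can prepend the STREAM action to an existing flow. This Python sketch builds actions in the same shape as the script above (`url`, `action`, `tracks`, `stream_data`); the function names, and the option to insert a matching `stop` action before the first HANGUP, are illustrative assumptions. The source does not say which STREAM parameters are required, so check the API Reference before relying on this shape.

```python
import copy
import json
from typing import Optional


def stream_action(url: str, action: str, tracks: list, stream_data: Optional[dict] = None) -> dict:
    """Build a STREAM action in the shape used by the example call flow script."""
    params = {"url": url, "action": action, "tracks": list(tracks)}
    if stream_data is not None:
        params["stream_data"] = stream_data
    return {"type": "STREAM", "params": params}


def add_streaming(flow: dict, url: str, tracks: list, stream_data: Optional[dict] = None,
                  stop_before_hangup: bool = True) -> dict:
    """Return a copy of `flow` that starts streaming first and, optionally,
    stops streaming just before the first HANGUP action."""
    actions = copy.deepcopy(flow.get("actions", []))  # leave the caller's flow untouched
    actions.insert(0, stream_action(url, "start", tracks, stream_data))
    hangup_at = next((i for i, a in enumerate(actions) if a.get("type") == "HANGUP"), None)
    if stop_before_hangup and hangup_at is not None:
        actions.insert(hangup_at, stream_action(url, "stop", tracks))
    return {**flow, "actions": actions}


existing_flow = {"actions": [
    {"type": "SAY", "params": {"text": "This is a call flow to test the media streaming action."}},
    {"type": "HANGUP"},
]}
streamed = add_streaming(existing_flow, "ws://test-u.rl", ["caller", "transfer"],
                         {"test_key": "test_value"})
print(json.dumps(streamed, indent=2))
```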

3. Make Or Receive A Call

Make a call from, or receive a call to, your Voxology number, and the media from the call will be streamed to the URL you designate.