Transcribe Your Voicemails with Python, Flask, and Twilio

April 14, 2020
Written by
James Putterman
Contributor
Opinions expressed by Twilio contributors are their own

Voice-to-Text (VTT), or speech recognition, is a relatively new feature of many software systems in business today. It allows spoken words to be automatically transcribed and entered into a given system, turning raw speech into data. This data can then be acted upon by the business for any number of uses: storage and analysis, automatic responses, or even having the messages transcribed and sent out via SMS like we’re going to do.

In this tutorial we’ll set up a voicemail phone line, where each incoming call is recorded and transcribed. The transcriptions will be sent by SMS to the number of your choice.

By the end of this tutorial you’ll be able to:

  • Set up a free Twilio Account
  • Set up a phone number linked to a Python application that records voicemails and sends their transcriptions by SMS
  • Start working with TwiML markup and the Twilio suite of APIs

Requirements:

Set up your Twilio account

After you set up your free Twilio account using the link in the Requirements section, you can access your Twilio Console and provision a phone number. Click the “Get a Trial Number” button and follow the prompts to choose and activate your new test number.

If you already have a Twilio account, you can use a number that you already have, or otherwise provision a new one by navigating to the “Phone Numbers” section of the Console. Click the ellipsis on the left side and then click “Phone Numbers” to change your number or create a new one:

twilio console

phone numbers menu option

Your Twilio dashboard should now show a phone number attached to your account:

twilio dashboard

Create a Python virtual environment

Now let’s create a Python virtual environment for our project. Here we’ll install the Flask framework and the Twilio helper library.

For Linux/macOS:

$ mkdir VoiceToText
$ cd VoiceToText
$ python3 -m venv voicetotext-venv
$ source voicetotext-venv/bin/activate
(voicetotext-venv) $ pip3 install twilio flask

For Windows/PowerShell:

$ md VoiceToText
$ cd VoiceToText
$ python -m venv voicetotext-venv
$ voicetotext-venv\Scripts\activate
(voicetotext-venv) $ pip install twilio flask

Record an incoming call

Before we can get started, you’ll need your Twilio Account SID and Auth Token from the Twilio Console:

twilio account sid and auth token

You should store them securely as environment variables before you proceed. This is an important step as the code below relies on authenticating to the Twilio REST API via these credentials. Once you have the credentials stored in your TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN environment variables, you’re ready to get started.
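
Before going further, it can help to confirm that both variables are actually visible to Python. The following is a minimal sketch and not part of the application itself; the helper name missing_twilio_credentials is hypothetical, and it only checks that the variables are set, not that their values are valid.

```python
import os

# Names of the environment variables this tutorial relies on
REQUIRED_VARS = ("TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN")


def missing_twilio_credentials(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_twilio_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Twilio credentials found in the environment.")
```

Run it in the same terminal session where you set the variables, since a new terminal may not inherit them.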

The Twilio Programmable Voice service will be configured to notify our application when an incoming call is placed, and the application will respond to the call with a greeting and then record a message. We will be using the Twilio Markup Language (TwiML), an XML-based language, to tell Twilio how to handle the incoming call. Doing so is very simple using the Flask framework. Put the following code in a file named record_incoming_voice.py:

from flask import Flask, request
from twilio.twiml.voice_response import VoiceResponse

app = Flask(__name__)


@app.route("/record", methods=["POST"])
def record():
    """Returns TwiML that prompts the caller to record a message."""
    # Define TwiML response object
    response = VoiceResponse()

    # On the first request there is no recording yet, so greet and record
    if 'RecordingUrl' not in request.form:
        # Use the <Say> verb to play an outgoing message
        response.say("Hello, please leave your message after the tone.")

        # Use the <Record> verb to record the message, with transcribe set to true
        response.record(transcribe=True)

        # Print the generated TwiML for debugging
        print(str(response))
    else:
        # Hang up the call
        print("Hanging up...")
        response.hangup()
    return str(response)


if __name__ == '__main__':
    app.run()

The above is a simple yet complete Flask server. We import our libraries, create the application object, and define a Flask view, /record, that accepts POST requests. The view function, record(), creates a TwiML <Response> object using the VoiceResponse helper class from the Twilio library. We then use the TwiML verbs <Say>, <Record>, and <Hangup> to control the call flow.

You’ll notice we have a conditional statement to parse the request to our view. That’s because Twilio will invoke our endpoint twice. The first invocation occurs when a call is received. At this point we return a TwiML response that plays a greeting via text-to-speech and then records a message. Twilio will invoke the endpoint a second time when the recording ends, at which point we just hang up the call. We look at the contents of request.form to determine whether we are being invoked at the start or the end of a call. The RecordingUrl field contains a URL for the recorded call, and is available only on the second invocation. See the docs here for more information.
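
To make the two invocations concrete, the sketch below builds the same kind of XML using only the standard library, so you can see roughly what Twilio receives in each case. It is an illustration rather than the tutorial's code: build_twiml is a hypothetical name, a plain dict stands in for request.form, and the helper library's real output also begins with an XML declaration.

```python
import xml.etree.ElementTree as ET


def build_twiml(form):
    """Return a TwiML-like XML string for the first or second invocation.

    `form` stands in for Flask's request.form.
    """
    response = ET.Element("Response")
    if "RecordingUrl" not in form:
        # First invocation: greet the caller, then record with transcription
        say = ET.SubElement(response, "Say")
        say.text = "Hello, please leave your message after the tone."
        ET.SubElement(response, "Record", {"transcribe": "true"})
    else:
        # Second invocation: the recording is done, so end the call
        ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")


# First request: no recording fields are present yet
print(build_twiml({}))
# Second request: Twilio includes the recording URL (placeholder value here)
print(build_twiml({"RecordingUrl": "https://example.com/recording"}))
```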

You can execute the application from within your IDE (if it is linked to a Python interpreter), and see something like this:

Flask application running in IDE

Or if you prefer you can run the Python file from the command line after activating your virtual environment:

(voicetotext-venv) $ python record_incoming_voice.py
 * Serving Flask app "record_incoming_voice" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

Now open another command prompt or terminal window and activate ngrok as follows:

ngrok http 5000

If you are successful, you should see a window like below:

ngrok screenshot

We now have our application running a development Flask server on port 5000 of the local system and ngrok exposing that port to the Internet. We can quickly test that everything is working by navigating to the forwarding link that starts with https:// in a web browser:

not found error

You should get some version of the above 404/Not Found error. You can also see in the ngrok terminal window that a request came in, but since we don’t have a route set up for that URL, the 404 error was returned.

ngrok screenshot with 404 error

This happens because our Flask application does not have any views associated with the root URL (our only view is mapped to the /record URL), so getting the Not Found error here is the expected result and just means that you have configured everything correctly.

Let’s tell Twilio about our endpoint. Navigate to your Twilio Console and hit the ellipsis button on the left side:

twilio console menu

Scroll to your “Phone Numbers” and select your number:

phone numbers menu option

Scroll down the page to the “Voice and Fax” section and paste the link ngrok generated into the “A Call Comes In” field. Make sure to append “/record” to the URL so the call can be routed to our endpoint. For example, the full URL would look like: https://122fd757.ngrok.io/record.
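
If you would rather compute the webhook URL than type it by hand, a small standard-library sketch like the following works. The helper name webhook_url is hypothetical, and the forwarding address is just the example value used above; yours will differ.

```python
from urllib.parse import urljoin


def webhook_url(forwarding_url, path="/record"):
    """Join an ngrok forwarding URL with the path of our Flask view."""
    return urljoin(forwarding_url, path)


# Example forwarding address; replace it with the one ngrok shows you
print(webhook_url("https://122fd757.ngrok.io"))
```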

configure webhook

Click “Save” to store this change.

We’re now ready to run an initial test on our app. Grab your smartphone and call your Twilio phone number. You’ll hear a robotic voice ask you to leave a message after the beep. Do so!

You should also see the call come in on both ngrok and within your application. If you get a status code other than 200, you’ve got an error somewhere. Check your webhook link and use the Debug Console to figure out where the break is. You’ll know you’re good when you see this:

successful requests in ngrok

successful requests in flask

Retrieve message transcriptions from the Twilio REST API

Next, let’s write a function to retrieve our message from the API. You can put the code below in a file named message.py:

from twilio.rest import Client


def message():
    """Creates a client object and retrieves the latest transcription."""
    # Create a Client object; with no arguments it reads TWILIO_ACCOUNT_SID
    # and TWILIO_AUTH_TOKEN from the environment
    client = Client()

    # List the most recent transcription from Twilio's REST API
    transcription = client.transcriptions.list(limit=1)

    # Get the SID of that transcription and store it for our retrieval call
    sid = transcription[0].sid

    # Fetch the transcription by its SID
    t = client.transcriptions(sid).fetch()

    # Print the text of the last message
    print(t.transcription_text)

    return str(sid)


if __name__ == '__main__':
    message()

Here we create a Twilio client object, then call the transcriptions list method to return the most recent transcription (limit=1). Then we get the SID of that transcription, use it to fetch the transcription, and print its transcription_text string.

Note that the TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN environment variables must be set for the Twilio client to authenticate with the service.
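
One caveat: indexing transcription[0] raises an IndexError if the account has no transcriptions yet. A more defensive variant might look like the sketch below. The function name latest_transcription_text is hypothetical; it takes the client as a parameter so it can be tried without network access, and it assumes the items returned by list() already carry transcription_text, which would make the separate fetch() optional.

```python
def latest_transcription_text(client):
    """Return the text of the newest transcription, or None if there is none.

    `client` is expected to behave like twilio.rest.Client: it needs a
    `transcriptions.list(limit=...)` method that returns objects with a
    `transcription_text` attribute.
    """
    transcriptions = client.transcriptions.list(limit=1)
    if not transcriptions:
        # Nothing has been transcribed on this account yet
        return None
    return transcriptions[0].transcription_text
```

With real credentials in the environment, you would call it as latest_transcription_text(Client()).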

Calling the function, whether in your IDE or via the command line, should show the transcription for the message you left:

(voicetotext-venv) $ python message.py
Hello this is a test message bye bye.

Disclaimer: Voice to Text, while it has come a long way in recent years, can still get parts of the message wrong depending on how clearly you speak, your dialect, background noise, etc. Your mileage may vary, but it worked pretty well for me.

Putting it all together

Now we’ve got our Flask server and a message retrieval system for the transcript. Let’s put it all together and send the transcriptions of the recordings that we receive as SMS messages to ourselves.

First we need to add a few things to our initial view, then create a new view and move our message function into it. Below you can see the updated version of our record_incoming_voice.py file:

from flask import Flask, request
from twilio.twiml.voice_response import VoiceResponse
from twilio.rest import Client

app = Flask(__name__)


@app.route("/record", methods=["POST"])
def record():
    """Returns TwiML that prompts the caller to record a message."""
    # Define TwiML response object
    response = VoiceResponse()

    # On the first request there is no recording yet, so greet and record
    if 'RecordingUrl' not in request.form:
        # Use the <Say> verb to play an outgoing message
        response.say("Hello, please leave your message after the tone.")

        # Use the <Record> verb to record the message and have Twilio
        # POST to /message when the transcription is ready
        response.record(
            transcribe_callback="/message",
            # transcribe=True,  (implied by transcribe_callback)
        )

        # Print the generated TwiML for debugging
        print(str(response))
    else:
        # Hang up the call
        print("Hanging up...")
        response.hangup()
    return str(response)


@app.route("/message", methods=["POST"])
def message():
    """Creates a client object and sends the transcription text as an SMS message."""
    # Create a Client object; it reads our credentials from the environment
    client = Client()

    # List the most recent transcription from Twilio's REST API
    transcription = client.transcriptions.list(limit=1)

    # Get the SID of that transcription and store it for our retrieval call
    sid = transcription[0].sid

    # Fetch the transcription by its SID
    t = client.transcriptions(sid).fetch()

    # Create a text message and send ourselves the transcription text
    m = client.messages.create(
        body=str(t.transcription_text),
        from_='<your-twilio-phone-number>',
        to='<your-personal-phone-number>'
    )
    print(t.transcription_text)
    print(m.sid)
    return str(sid)


if __name__ == "__main__":
    app.run()

There are a few things going on, so let’s break it down:

  • We’ve given the response object’s record verb a new TwiML attribute: transcribe_callback. We also commented out transcribe because it is implied when you pass an argument to transcribe_callback.
  • The transcribe_callback attribute is set to the newly created /message view. Now our server will receive an asynchronous POST request from Twilio’s API when the transcription is ready, and we can start the retrieval process.
  • We have incorporated our message retrieval code from earlier into the server, with a tweak at the end where we create an SMS message object and pass it the retrieved text. Make sure you enter your Twilio and personal phone numbers in the from_ and to arguments of that final call to send the SMS. Use the E.164 format for all phone numbers.
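
As an alternative to listing the latest transcription, the POST that Twilio sends to the transcribe_callback URL carries the transcription itself in form fields such as TranscriptionText and TranscriptionStatus. The sketch below shows one way the /message view could use them. It is an illustration under stated assumptions, not the method used in this tutorial: sms_arguments is a hypothetical helper, a plain dict stands in for request.form, and the regular expression is a simplified E.164 check.

```python
import re

# Simplified E.164 pattern: "+" followed by 2 to 15 digits, not starting with 0
E164_PATTERN = re.compile(r"^\+[1-9]\d{1,14}$")


def sms_arguments(form, from_number, to_number):
    """Build keyword arguments for client.messages.create() from a callback.

    `form` stands in for Flask's request.form inside the /message view.
    """
    for number in (from_number, to_number):
        if not E164_PATTERN.match(number):
            raise ValueError(f"Not an E.164 phone number: {number!r}")

    if form.get("TranscriptionStatus") == "completed":
        body = form.get("TranscriptionText", "")
    else:
        # The transcription failed or the status field is missing
        body = "New voicemail received, but no transcription is available."

    return {"body": body, "from_": from_number, "to": to_number}
```

Inside the view, the result could be passed along as client.messages.create(**sms_arguments(request.form, twilio_number, personal_number)), where the two number variables hold your own phone numbers. Reading the text from the callback avoids picking up a different caller's transcription when several voicemails arrive close together.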

Run the server once again and make sure ngrok is still running. If for any reason you need to restart ngrok, keep in mind that it will assign a different URL, so you will need to go back to the Twilio Console and update your webhook.

If everything goes right, whenever someone leaves a message on your Twilio phone line, you will receive a text message with the transcription. Try it out!

sms with transcription

Success!

Wrap up and further resources

Be aware that there are legal implications to recording someone’s voice. Make sure you take that into account before deploying this in any production scenario.

I hope you had as much fun making this as I did. We really only scratched the surface of what the Twilio Voice API is capable of. You can retrieve multiple transcriptions at once, write more views that take parameters and respond to incoming calls and messages, define special logic and call handling, or just gather general analytics about traffic as you send recordings out. Whatever you do, working with a robust suite of well-documented APIs enables rapid, performant application development. Head over to the docs and start building!

James Putterman is an Integration Data Architect and Full Stack Developer in Kansas City. Reach out on LinkedIn to connect!