Guided Lab: Converting Text to Speech with Amazon Polly
Description
Amazon Polly is a cloud-based service that converts text into lifelike speech using advanced deep learning technologies. Polly provides various natural-sounding voices, which can be customized to read text in multiple languages and with different speaking styles.
In this guided lab, you will learn how to use Amazon Polly to automate text-to-speech conversion. You will upload text files to one folder of an S3 bucket, and the resulting audio files will be stored in a separate folder of the same bucket.
Prerequisites
This lab assumes you have a basic understanding of Amazon S3 and AWS Lambda.
If you find any gaps in your knowledge, consider taking the following labs:
- Creating an Amazon S3 bucket
- Creating an AWS Lambda function
- Automated File Processing with S3 Event Notifications and Lambda function
Objectives
In this lab, you will learn how to:
- Explore Amazon Polly’s text-to-speech conversion capabilities.
- Set up an S3 bucket for storing text files and audio outputs.
- Create and deploy a Lambda function to automate the text-to-speech conversion process.
- Verify and play the generated audio file from the S3 bucket.
Lab Steps
Explore Amazon Polly
1. Navigate to the Amazon Polly service.
2. Click Try Polly.
3. Test Amazon Polly:
- For Engine, select Standard. Take your time to read the description of each engine; this lab uses only the Standard engine.
- For Language, select English, US, and for Voice, select Ivy.
- Modify the sample text in the text box or leave it as is.
- Click Listen to hear the generated speech.
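The console test above can also be reproduced from code. The following is a minimal sketch, assuming the boto3 SDK is installed and your AWS credentials allow polly:SynthesizeSpeech; the helper names build_speech_request and synthesize_to_file are hypothetical and are not part of the lab.

```python
def build_speech_request(text, voice_id="Ivy", engine="standard", output_format="mp3"):
    """Keyword arguments for Polly's synthesize_speech, mirroring the console test."""
    return {
        "Engine": engine,
        "VoiceId": voice_id,
        "OutputFormat": output_format,
        "Text": text,
    }


def synthesize_to_file(polly_client, text, path, **options):
    """Call Polly with the given client and write the returned audio stream to path."""
    response = polly_client.synthesize_speech(**build_speech_request(text, **options))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
    return path
```

For example, synthesize_to_file(boto3.client("polly"), "Hello from Ivy.", "ivy-sample.mp3") should write an MP3 you can play locally. Keeping the request builder separate from the client call makes the parameters easy to inspect without contacting AWS.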
Create an S3 Bucket and Folders
1. Navigate to the S3 service.
2. Create a new S3 bucket:
- Create a new bucket with a unique name (e.g., my-polly-bucket-3000).
- Keep the default settings and click Create bucket.
3. Inside the bucket, create two folders:
- polly-text-input
- polly-audio-output
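If you prefer to script this step, the sketch below does the same thing with boto3. It assumes boto3, credentials with s3:CreateBucket and s3:PutObject permissions, and the example bucket name; the helper names are hypothetical. The S3 console represents a folder as a zero-byte object whose key ends in "/", so the sketch creates those marker objects.

```python
LAB_FOLDERS = ("polly-text-input/", "polly-audio-output/")


def create_bucket_params(bucket, region):
    """Arguments for s3.create_bucket in the given Region."""
    params = {"Bucket": bucket}
    # us-east-1 is the default Region: omit CreateBucketConfiguration there.
    # Every other Region needs an explicit LocationConstraint.
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return params


def folder_marker_keys(folders=LAB_FOLDERS):
    """Keys for the zero-byte 'folder' objects, each ending in '/'."""
    return [name if name.endswith("/") else name + "/" for name in folders]


def create_lab_bucket(s3_client, bucket, region):
    """Create the bucket, then the two folder marker objects inside it."""
    s3_client.create_bucket(**create_bucket_params(bucket, region))
    for key in folder_marker_keys():
        s3_client.put_object(Bucket=bucket, Key=key, Body=b"")
```

For example: create_lab_bucket(boto3.client("s3", region_name="us-east-1"), "my-polly-bucket-3000", "us-east-1"). Remember that bucket names must be globally unique, so replace the example name with your own.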
Create a Lambda Function
1. Navigate to the Lambda service.
2. Create a new function with the following configuration:
- Function name: myLambdaFunction
- Runtime: Python 3.8 or higher
- Execution role: Use an existing role and select PlayCloud-Sandbox
- Click Create function.
3. Replace the default code with the following Python script:
```python
import json
from urllib.parse import unquote_plus

import boto3

# Initialize S3 and Polly clients
s3_client = boto3.client('s3')
polly_client = boto3.client('polly')


def lambda_handler(event, context):
    # Extract bucket and key (file name) from the S3 event.
    # Keys in S3 event notifications are URL-encoded (spaces arrive as '+'), so decode them.
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = unquote_plus(event['Records'][0]['s3']['object']['key'])

    # Get the text content from the uploaded file in S3
    response = s3_client.get_object(Bucket=bucket, Key=key)
    text = response['Body'].read().decode('utf-8')

    # Convert the text to speech using Polly
    polly_response = polly_client.synthesize_speech(
        Engine='standard',
        Text=text,
        OutputFormat='mp3',
        VoiceId='Joanna'  # You can choose a different voice from Polly (e.g. 'Ivy')
    )

    # Save the audio file to the polly-audio-output/ folder of the same bucket
    output_key = 'polly-audio-output/' + key.split('/')[-1].replace('.txt', '') + '.mp3'
    s3_client.put_object(
        Bucket=bucket,
        Key=output_key,
        Body=polly_response['AudioStream'].read(),
        ContentType='audio/mpeg'
    )

    return {
        'statusCode': 200,
        'body': json.dumps('Text to speech conversion completed successfully.')
    }
```
Take your time to review the code:
- Boto3 clients: We initialized two clients using Boto3 to interact with S3 and Polly services.
- S3 Event Handling: The Lambda function is triggered by an S3 event when a text file is uploaded, and it fetches the file content using get_object().
- Polly Synthesize Speech: The text is sent to Polly, which returns an MP3 audio stream from the synthesize_speech() call.
- Audio Upload to S3: The resulting MP3 file is uploaded to the polly-audio-output/ folder of the same bucket.
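4. Click Deploy to save the function code.
5. In the Configuration tab, choose General configuration > Edit and set Timeout to 1 minute. The default timeout of 3 seconds may not be enough for Polly to synthesize longer texts.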
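To check the handler's logic before wiring up the trigger, one option is to run the same flow locally with stand-in clients. The sketch below is a hypothetical refactor, not the lab's code: handle_s3_text_event mirrors lambda_handler but takes the S3 and Polly clients as arguments, and StubS3 and StubPolly are fakes that return canned data instead of calling AWS. It decodes the object key with unquote_plus, because keys in S3 event notifications are URL-encoded (a file named hello world.txt arrives as hello+world.txt). set_timeout shows a scripted equivalent of step 5 using the Lambda update_function_configuration API, which takes the timeout in seconds.

```python
import io
import json
import posixpath
from urllib.parse import unquote_plus

OUTPUT_PREFIX = "polly-audio-output/"


def output_key_for(input_key):
    """Map e.g. polly-text-input/notes.txt to polly-audio-output/notes.mp3."""
    stem, _ext = posixpath.splitext(posixpath.basename(input_key))
    return OUTPUT_PREFIX + stem + ".mp3"


def sample_event(bucket, key):
    """A trimmed-down S3 put event containing only the fields the handler reads."""
    return {"Records": [{"s3": {"bucket": {"name": bucket}, "object": {"key": key}}}]}


def handle_s3_text_event(event, s3_client, polly_client, voice_id="Joanna"):
    """Same flow as the lab's lambda_handler, but with the clients passed in."""
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    # Keys in S3 event notifications are URL-encoded (spaces arrive as '+').
    key = unquote_plus(s3_info["object"]["key"])

    text = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    speech = polly_client.synthesize_speech(
        Engine="standard", Text=text, OutputFormat="mp3", VoiceId=voice_id
    )
    out_key = output_key_for(key)
    s3_client.put_object(
        Bucket=bucket,
        Key=out_key,
        Body=speech["AudioStream"].read(),
        ContentType="audio/mpeg",
    )
    return {"statusCode": 200, "body": json.dumps({"output_key": out_key})}


def set_timeout(lambda_client, function_name="myLambdaFunction", seconds=60):
    """Scripted equivalent of step 5; Lambda expects the timeout in seconds."""
    return lambda_client.update_function_configuration(
        FunctionName=function_name, Timeout=seconds
    )


class StubS3:
    """In-memory stand-in for the S3 client (only the two calls the handler uses)."""

    def __init__(self, objects):
        self.objects = dict(objects)  # {(bucket, key): bytes}

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}

    def put_object(self, Bucket, Key, Body, **kwargs):
        self.objects[(Bucket, Key)] = Body
        return {}


class StubPolly:
    """Stand-in for the Polly client that echoes the text back as fake audio bytes."""

    def synthesize_speech(self, **kwargs):
        self.last_request = kwargs
        return {"AudioStream": io.BytesIO(b"fake-mp3:" + kwargs["Text"].encode("utf-8"))}
```

In a real run you would pass boto3.client("s3") and boto3.client("polly") in place of the stubs, and set_timeout(boto3.client("lambda")) would set the function's timeout to 60 seconds. Injecting the clients is a design choice that keeps the conversion logic testable without AWS credentials.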
Add S3 Trigger to Lambda
1. Go back to the S3 bucket you created earlier.
2. In the Properties tab, create an Event Notification with the following settings:
- Event name: text-upload-event
- Prefix: polly-text-input/
- Event type: Put
- Destination: Lambda Function
- Choose myLambdaFunction or paste its ARN.
- Save changes.
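The same trigger can be configured with boto3. This is a sketch under assumptions: the function ARN and account ID are placeholders you supply, and add_s3_trigger is a hypothetical helper. When you use the console, it grants S3 permission to invoke the function for you; through the API you add that resource-based permission yourself with the Lambda add_permission call. The prefix filter also matters: because the function writes its output under polly-audio-output/, limiting the trigger to polly-text-input/ keeps the generated MP3 files from invoking the function again.

```python
TEXT_PREFIX = "polly-text-input/"


def notification_config(function_arn, prefix=TEXT_PREFIX, event_id="text-upload-event"):
    """Mirror of the console settings: Put events under the prefix invoke the function."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": event_id,
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:Put"],
                "Filter": {"Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}},
            }
        ]
    }


def add_s3_trigger(s3_client, lambda_client, bucket, function_arn, account_id):
    """Grant S3 permission to invoke the function, then attach the notification."""
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="s3-invoke-" + bucket.replace(".", "-"),
        Action="lambda:InvokeFunction",
        Principal="s3.amazonaws.com",
        SourceArn="arn:aws:s3:::" + bucket,
        SourceAccount=account_id,
    )
    # Caution: this REPLACES any notification configuration already on the bucket.
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=notification_config(function_arn)
    )
```

Because put_bucket_notification_configuration overwrites the bucket's whole notification setup, merge in any existing rules first if the bucket already has some.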
Test the Lambda Function
1. Upload a text file to the polly-text-input/ folder of your S3 bucket.
Here is a text file you can upload:
https://media.tutorialsdojo.com/public/td-pc-lab-sample-text.txt
2. Navigate to the polly-audio-output/ folder of your S3 bucket. Verify that an MP3 file with the same base name as your text file has been created (for example, notes.txt produces notes.mp3).
3. Download the MP3 file and play it to verify the text-to-speech conversion.
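These test steps can also be scripted end to end. The sketch below assumes boto3 and the lab's folder names; upload_and_fetch_audio and expected_audio_key are hypothetical helpers. It uploads the text file, waits for the MP3 using S3's built-in object_exists waiter (which polls HeadObject), and then downloads the result. Note that upload_file switches to multipart upload for large files, which emits an s3:ObjectCreated:CompleteMultipartUpload event rather than Put; small text files like the sample are sent as a single PutObject request.

```python
import os
import posixpath

INPUT_PREFIX = "polly-text-input/"
OUTPUT_PREFIX = "polly-audio-output/"


def expected_audio_key(text_filename):
    """notes.txt -> polly-audio-output/notes.mp3, matching the Lambda's naming for .txt files."""
    stem, _ext = posixpath.splitext(posixpath.basename(text_filename))
    return OUTPUT_PREFIX + stem + ".mp3"


def upload_and_fetch_audio(s3_client, bucket, local_text_path, local_audio_path):
    """Upload a text file, wait for the generated MP3, then download it."""
    name = os.path.basename(local_text_path)
    s3_client.upload_file(local_text_path, bucket, INPUT_PREFIX + name)

    audio_key = expected_audio_key(name)
    # The object_exists waiter polls HeadObject: every 5 s, up to 24 times (2 minutes).
    waiter = s3_client.get_waiter("object_exists")
    waiter.wait(Bucket=bucket, Key=audio_key, WaiterConfig={"Delay": 5, "MaxAttempts": 24})

    s3_client.download_file(bucket, audio_key, local_audio_path)
    return audio_key
```

For example: upload_and_fetch_audio(boto3.client("s3"), "my-polly-bucket-3000", "td-pc-lab-sample-text.txt", "td-pc-lab-sample-text.mp3"). If an MP3 with that name already exists from an earlier run, the waiter returns immediately, so delete the old file first when re-testing.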
Congratulations! You have successfully set up an automated text-to-speech process using Amazon Polly, an S3 bucket, and a Lambda function. The generated speech is stored as an MP3 file in S3 and is ready for use in other applications.
This lab introduces Amazon Polly’s capabilities, providing a foundation for more advanced text-to-speech workflows. Happy learning, and enjoy exploring further with Polly!