Guided Lab: Sentiment Analysis of Text Files with Amazon Comprehend

Description

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to uncover valuable insights from text. It can perform sentiment analysis, entity recognition, key phrase extraction, and more, allowing you to gain deeper insights into unstructured text data.

In this lab, you will automate sentiment analysis of text files using Amazon Comprehend. When a text file is uploaded to the input folder of an S3 bucket, a Lambda function is triggered to analyze its sentiment and store the results in an output folder of the same bucket.

Prerequisites

This lab assumes you have a basic understanding of Amazon S3 and AWS Lambda.

If you find any gaps in your knowledge, consider taking the following lab:

Objectives

In this lab, you will learn how to:

  • Explore the capabilities of Amazon Comprehend for natural language processing.
  • Use the Amazon Comprehend API to perform sentiment analysis on text files.
  • Integrate AWS Lambda and S3 with Amazon Comprehend to automate sentiment analysis for uploaded text files.

Lab Steps

Explore Amazon Comprehend

1. Navigate to Amazon Comprehend.

2. Click on Launch Amazon Comprehend.

3. You will be redirected to the Real-time analysis dashboard.

a. Scroll down and take your time to review the Input data section. The Input text textbox should already contain sample text.

b. Move to the Insights section. Take your time to review the results in each tab, from the Entities tab through to the Syntax tab.


In Amazon Comprehend, a confidence score is a numerical value that indicates the level of certainty or probability the service has about its analysis or classification results. A higher confidence score generally means the results are more reliable, while a lower score suggests less certainty.
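As a rough illustration of how confidence scores might be used in code, the sketch below picks the highest-scoring label from a response shaped like the output of DetectSentiment and checks it against a cutoff. The `top_sentiment` helper, the 0.8 threshold, and the sample scores are assumptions made for this example, not values produced by the lab.

```python
def top_sentiment(response, min_confidence=0.8):
    """Return (label, score, is_confident) for a DetectSentiment-style response.

    min_confidence is an arbitrary cutoff chosen for this sketch.
    """
    scores = response["SentimentScore"]
    # Pick the label with the highest confidence score
    label, score = max(scores.items(), key=lambda item: item[1])
    return label.upper(), score, score >= min_confidence


# Hand-written sample in the shape of a DetectSentiment response (illustrative values)
sample_response = {
    "Sentiment": "POSITIVE",
    "SentimentScore": {"Positive": 0.97, "Negative": 0.01, "Neutral": 0.02, "Mixed": 0.0},
}

label, score, confident = top_sentiment(sample_response)
print(label, score, confident)
```

A cutoff like this is one simple way to route low-confidence results to manual review instead of trusting them automatically.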



    • Entities:
      • Description: Identifies and classifies entities within the text, such as people, organizations, locations, dates, and other relevant items.
      • Use Case: Useful for extracting specific details from unstructured text, such as identifying names in documents or organizations in news articles.


    • Key Phrases:
      • Description: Extracts and highlights the most relevant phrases from the text that convey the main points or concepts.
      • Use Case: Helps summarize content and understand the key topics discussed in the text.


    • Language:
      • Description: Detects the language in which the text is written.
      • Use Case: Ensures proper text processing in multilingual environments by identifying the language, which can be useful for translation or further language-specific analysis.


    • PII (Personally Identifiable Information):
      • Description: Identifies and redacts or flags sensitive information related to individuals, such as names, addresses, phone numbers, and social security numbers.
      • Use Case: Important for data privacy and compliance, ensuring that sensitive information is handled appropriately in various applications.


    • Sentiment:
      • Description: Analyzes the overall sentiment of the text, categorizing it as positive, negative, neutral, or mixed.
      • Use Case: Useful for understanding customer feedback, social media posts, or reviews to gauge general sentiment and customer satisfaction.


    • Targeted Sentiment:
      • Description: Provides sentiment analysis specific to particular aspects or targets within the text, such as a product feature or service.
      • Use Case: Allows for more granular sentiment analysis related to specific topics or entities, helping businesses understand nuanced opinions about particular aspects.


    • Syntax:
      • Description: Analyzes the grammatical structure of the text, including parts of speech (nouns, verbs, adjectives) and sentence structure.
      • Use Case: Useful for deeper linguistic analysis, such as building chatbots or improving text readability and understanding.
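Each tab described above corresponds to an operation in the Comprehend API. The following is a minimal sketch, assuming the boto3 Comprehend client, of how the same insights could be requested programmatically. The `analyze_text` helper and its dictionary keys are made up for this example; the client is passed in as a parameter so the function can be tried with a stand-in object before running it against AWS.

```python
def analyze_text(client, text, language_code="en"):
    """Call the Comprehend operation behind each Insights tab and collect the results."""
    # Language detection takes only the text; the other calls also need a language code
    results = {"Language": client.detect_dominant_language(Text=text)}

    operations = {
        "Entities": client.detect_entities,
        "Key phrases": client.detect_key_phrases,
        "PII": client.detect_pii_entities,
        "Sentiment": client.detect_sentiment,
        "Targeted sentiment": client.detect_targeted_sentiment,
        "Syntax": client.detect_syntax,
    }
    for tab, operation in operations.items():
        results[tab] = operation(Text=text, LanguageCode=language_code)
    return results


# Usage (requires the boto3 package, AWS credentials, and Comprehend permissions):
# import boto3
# insights = analyze_text(boto3.client("comprehend"), "Tutorials Dojo is a great resource!")
# print(insights["Sentiment"]["Sentiment"])
```

Each call is billed separately, so a real workflow would normally request only the insights it needs.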

4. Scroll up to the Input data section. Enter the following sample text in the Input text textbox.

“My experience with AWS has been fantastic! The services are intuitive, and the support team is very responsive. Tutorials Dojo has been a great learning resource, and I highly recommend it to anyone. If you want to contact me, my email is john.doe@example.com, and my phone number is (555) 123-4567. Looking forward to learning more and collaborating with others in this space!”

a. Click on Analyze

5. Similar to the previous steps, take your time to review the analysis results in the Insights section.
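The sample text contains an email address and a phone number, so the PII tab should flag both. As a sketch of how such findings could be used, the code below masks each detected span using the character offsets that a DetectPiiEntities-style response provides. The `redact_pii` helper, the 0.5 score cutoff, and the hand-written entity list are assumptions for illustration; they are not output captured from the service.

```python
def redact_pii(text, pii_entities, min_score=0.5):
    """Replace each detected PII span with its type, for example [EMAIL]."""
    # Work from the end of the text so that earlier offsets stay valid
    for entity in sorted(pii_entities, key=lambda e: e["BeginOffset"], reverse=True):
        if entity["Score"] >= min_score:
            start, end = entity["BeginOffset"], entity["EndOffset"]
            text = text[:start] + f"[{entity['Type']}]" + text[end:]
    return text


text = "My email is john.doe@example.com, and my phone number is (555) 123-4567."
email, phone = "john.doe@example.com", "(555) 123-4567"

# Hand-written stand-in for the "Entities" list of a DetectPiiEntities response
entities = [
    {"Type": "EMAIL", "Score": 0.99,
     "BeginOffset": text.index(email), "EndOffset": text.index(email) + len(email)},
    {"Type": "PHONE", "Score": 0.99,
     "BeginOffset": text.index(phone), "EndOffset": text.index(phone) + len(phone)},
]

print(redact_pii(text, entities))
# prints: My email is [EMAIL], and my phone number is [PHONE].
```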

Set Up S3 Bucket

1. Navigate to the S3 service:

2. Create a new S3 bucket:

  • Use a unique name (e.g., my-comprehend-bucket-3000).
  • Use default settings and click Create bucket.

3. Create two folders within the bucket:

  • comprehend-text-input
  • comprehend-analysis-output

Create a Lambda Function

1. Navigate to the Lambda service

2. Create a new function with the following configuration

  • Function name: myLambdaFunction
  • Runtime: Python 3.8 or higher
  • Execution role:
    • Use an existing role: PlayCloud-Sandbox

  • Click Create function.

3. Replace the default code with the following Python script:

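import json
from urllib.parse import unquote_plus

import boto3
 
# Initialize S3 and Comprehend clients
s3_client = boto3.client('s3')
comprehend_client = boto3.client('comprehend')
 
def lambda_handler(event, context):
    # Extract bucket and key (file name) from the S3 event.
    # S3 URL-encodes object keys in event notifications (a space arrives as '+'),
    # so decode the key before using it.
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = unquote_plus(event['Records'][0]['s3']['object']['key'])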
     
    # Get the text content from the uploaded file in S3
    response = s3_client.get_object(Bucket=bucket, Key=key)
    text = response['Body'].read().decode('utf-8')
     
    # Perform sentiment analysis using Comprehend
    sentiment_response = comprehend_client.detect_sentiment(
        Text=text,
        LanguageCode='en'  # Set the language of the text
    )
     
    # Save the sentiment analysis result to the output bucket
    output_key = 'comprehend-analysis-output/' + key.split('/')[-1].replace('.txt', '') + '-sentiment.json'
     
    s3_client.put_object(
        Bucket=bucket,
        Key=output_key,
        Body=json.dumps(sentiment_response, indent=4)
    )
     
    return {
        'statusCode': 200,
        'body': json.dumps('Sentiment analysis completed successfully.')
    }

Take your time to review the code:

  • Initializing Clients: Initializes S3 and Comprehend clients using boto3 to interact with S3 and perform sentiment analysis.
  • Handling S3 Event: The Lambda function is triggered when a text file is uploaded to the S3 bucket. The bucket name and file key are extracted from the event.
  • Reading Text File: The function retrieves the content of the uploaded file from S3 using get_object() and decodes the text.
  • Sentiment Analysis: The text is analyzed for sentiment using detect_sentiment() from the Comprehend API. The sentiment result includes the overall sentiment and confidence scores.
  • Saving Results: The sentiment analysis results are saved as a JSON file in the comprehend-analysis-output folder of the same S3 bucket.
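One limitation worth knowing about: detect_sentiment rejects text above a size limit. The sketch below assumes that limit is 5,000 bytes of UTF-8, which matches the 5 KB figure in the Comprehend documentation as far as we are aware; confirm it against the current quotas before relying on it. The `truncate_utf8` helper is hypothetical and is not part of the lab's Lambda code.

```python
MAX_SENTIMENT_BYTES = 5000  # assumed DetectSentiment limit; verify against current docs


def truncate_utf8(text, max_bytes=MAX_SENTIMENT_BYTES):
    """Trim text so that its UTF-8 encoding fits within max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # errors="ignore" drops a multi-byte character that was cut in half at the boundary
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


long_text = "a" * 6000
print(len(truncate_utf8(long_text).encode("utf-8")))
```

In the handler, this could be applied to the decoded file content before calling detect_sentiment. Truncation discards the rest of the file, so splitting long documents into chunks may be a better fit when the whole text matters.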

4. Deploy the function.

5. Adjust the Timeout to 1 minute in the Configuration tab > General configuration > Timeout

Add S3 Trigger to Lambda

1. Go back to the S3 bucket created in the previous step.

2. In the Properties tab, create an Event Notification with the following settings:

  • Event name: text-upload-event
  • Prefix: comprehend-text-input/
  • Event type: Put
  • Destination: Lambda function. Choose myLambdaFunction or paste its ARN.
  • Save changes.
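The same event notification can also be expressed in code. The following is a minimal sketch, assuming the boto3 S3 client, of the settings above as a notification configuration. The helper names are made up for this example, and the Lambda ARN in the usage comment is a placeholder.

```python
def build_notification_config(lambda_arn, prefix="comprehend-text-input/"):
    """Build the configuration that mirrors the console settings listed above."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "Id": "text-upload-event",           # Event name
                "LambdaFunctionArn": lambda_arn,     # Destination
                "Events": ["s3:ObjectCreated:Put"],  # Event type: Put
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
        ]
    }


def apply_notification(s3_client, bucket, lambda_arn):
    """Attach the notification to the bucket, replacing any existing configuration."""
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=build_notification_config(lambda_arn),
    )


# Usage (requires boto3 and AWS credentials; the ARN below is a placeholder):
# import boto3
# apply_notification(
#     boto3.client("s3"),
#     "my-comprehend-bucket-3000",
#     "arn:aws:lambda:us-east-1:123456789012:function:myLambdaFunction",
# )
```

The console grants S3 permission to invoke the function when you save the notification. When using the API, that permission has to be added to the function separately beforehand, otherwise S3 rejects the configuration.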

Test the Lambda Function

1. Upload a text file to the comprehend-text-input folder of the S3 bucket.

Here is a text file you can upload:

https://media.tutorialsdojo.com/public/td-pc-lab-comprehend-sample-text.txt

2. Navigate to the comprehend-analysis-output folder of the S3 bucket. Verify that a new JSON file containing the sentiment analysis result has been created.

3. Download and review the sentiment analysis result, which includes the detected sentiment and confidence scores.
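If you prefer to check the result in code, the sketch below shows where the output file should appear and how its contents could be summarized. `expected_output_key` mirrors the output_key expression in the Lambda function; `summarize_result` and the sample JSON values are assumptions for illustration, not the actual result for the sample file.

```python
import json


def expected_output_key(input_key):
    """Mirror the handler's naming: <file>.txt becomes <file>-sentiment.json."""
    name = input_key.split("/")[-1].replace(".txt", "")
    return f"comprehend-analysis-output/{name}-sentiment.json"


def summarize_result(result_json):
    """Turn the saved JSON into a one-line summary with percentages."""
    result = json.loads(result_json)
    scores = ", ".join(
        f"{label}: {value:.1%}" for label, value in result["SentimentScore"].items()
    )
    return f"{result['Sentiment']} ({scores})"


print(expected_output_key("comprehend-text-input/td-pc-lab-comprehend-sample-text.txt"))

# Illustrative content in the shape the Lambda function saves (values are made up)
sample_json = json.dumps({
    "Sentiment": "POSITIVE",
    "SentimentScore": {"Positive": 0.97, "Negative": 0.01, "Neutral": 0.02, "Mixed": 0.0},
})
print(summarize_result(sample_json))
```

The saved file also contains a ResponseMetadata section from the API call, which this summary ignores.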

Congratulations! You have successfully set up an automated sentiment analysis process using Amazon Comprehend, S3 buckets, and a Lambda function. This lab introduces Amazon Comprehend’s capabilities and sets the foundation for further text analysis workflows. Happy learning!