Guided Lab: Extracting Text from Documents with Amazon Textract
Description
Amazon Textract is an AWS service that uses machine learning to automatically extract printed text, handwriting, and structured data from scanned documents. Beyond raw text, it can identify form fields (as key-value pairs) and tables, keeping track of how that data is laid out on the page.
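To make the raw-text case concrete, here is a minimal sketch that pulls the lines of text out of a Textract `DetectDocumentText` response. The synchronous `detect_document_text` call works on single-page documents and returns a flat list of `Blocks`. `PAGE`, `LINE`, and `WORD` are the main block types, and `LINE` blocks carry one line of text each. The helper names (`extract_lines`, `detect_text_in_s3_object`) are hypothetical. The client is passed in rather than created inside the function, on the assumption that a real Lambda function would pass `boto3.client("textract")`. The sample response is hand-made, not real Textract output.

```python
from typing import Any, Dict, List


def extract_lines(response: Dict[str, Any]) -> List[str]:
    """Return the text of every LINE block in a Textract response, in order."""
    # Textract returns a flat list of Blocks: PAGE blocks describe the page,
    # LINE blocks hold one line of text, WORD blocks hold individual words.
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def detect_text_in_s3_object(textract_client: Any, bucket: str, key: str) -> List[str]:
    """Run synchronous text detection on a single-page document stored in S3."""
    # Assumption: textract_client is boto3.client("textract"). It is injected
    # so this function can be exercised without AWS credentials.
    response = textract_client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return extract_lines(response)


if __name__ == "__main__":
    # Hand-made response in the shape Textract uses (not real output).
    sample = {
        "Blocks": [
            {"BlockType": "PAGE", "Id": "p1"},
            {"BlockType": "LINE", "Id": "l1", "Text": "Invoice #123"},
            {"BlockType": "WORD", "Id": "w1", "Text": "Invoice"},
            {"BlockType": "LINE", "Id": "l2", "Text": "Total: $42.00"},
        ]
    }
    print(extract_lines(sample))  # ['Invoice #123', 'Total: $42.00']
```

Filtering on `LINE` rather than `WORD` keeps the natural reading order of each line. Extracting form fields and tables is a different operation (`AnalyzeDocument` with `FeatureTypes` such as `FORMS` and `TABLES`), and it returns additional block types that this sketch ignores.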
In this guided lab, you will automate document text extraction with Amazon Textract and an AWS Lambda function. Uploading a document to an S3 bucket triggers the function, which sends the document to Textract and saves the extracted results back into the same bucket.
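The trigger-and-write-back flow above can be sketched as follows. The folder names `uploads/` and `output/` are hypothetical, since the lab does not name its folders. The `handle_event` function and its injected `detect_lines` and `write_text` callables are also illustrative names, not part of any AWS API. The sketch reads the bucket and key from the S3 event notification, URL-decodes the key (S3 encodes keys in event payloads, for example turning spaces into `+`), and processes only objects under the upload folder. That last check matters because writing results back into the same bucket could otherwise retrigger the function on its own output.

```python
import posixpath
from typing import Any, Callable, Dict, List, Tuple
from urllib.parse import unquote_plus

# Hypothetical folder names; adjust to match the folders in your bucket.
INPUT_PREFIX = "uploads/"
OUTPUT_PREFIX = "output/"


def objects_from_event(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for every S3 record in a Lambda event."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def should_process(key: str) -> bool:
    """Only handle uploaded files, so written results cannot retrigger the function."""
    return key.startswith(INPUT_PREFIX) and not key.endswith("/")


def output_key_for(input_key: str) -> str:
    """Map e.g. uploads/scan.png to output/scan.txt."""
    stem, _ext = posixpath.splitext(posixpath.basename(input_key))
    return f"{OUTPUT_PREFIX}{stem}.txt"


def handle_event(
    event: Dict[str, Any],
    detect_lines: Callable[[str, str], List[str]],
    write_text: Callable[[str, str, str], None],
) -> List[str]:
    """Extract text from each uploaded document and return the output keys written."""
    written = []
    for bucket, key in objects_from_event(event):
        if not should_process(key):
            continue  # skip results and folder placeholders
        lines = detect_lines(bucket, key)
        out_key = output_key_for(key)
        write_text(bucket, out_key, "\n".join(lines))
        written.append(out_key)
    return written
```

In a deployed function, `detect_lines` could wrap `boto3.client("textract").detect_document_text(...)`, and `write_text` could wrap `boto3.client("s3").put_object(Bucket=..., Key=..., Body=...)`. Configuring the S3 event notification with a prefix filter on the upload folder gives a second layer of protection against recursive invocations.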
Prerequisites
This lab assumes you have a basic understanding of Amazon S3 and AWS Lambda.
If you find any gaps in your knowledge, consider taking the following labs:
- Creating an Amazon S3 bucket
- Creating an AWS Lambda function
- Automated File Processing with S3 Event Notifications and Lambda function
Objectives
In this lab, you will learn how to:
- Explore Amazon Textract’s capabilities for document text extraction.
- Set up an S3 bucket with folders for document uploads and extraction outputs.
- Create and deploy a Lambda function to automate document text extraction.
- Verify and analyze the extracted output.
Logging In to the Amazon Web Services Console
Copy the Launch URL below and paste it into a new browser tab. Enter the Username and Password on the login page.