Guided Lab: Querying Data with Amazon Athena and AWS Glue Crawler Integration


Data analytics has become an indispensable part of business strategy and decision-making. Amazon Web Services (AWS) provides a suite of scalable and flexible services designed for data analytics. Three of them work especially well together: Amazon S3 stores massive datasets, Amazon Athena queries that data directly in place, and AWS Glue catalogs and crawls data across various data stores so that it stays organized.

Overview of Steps:

  1. Setting Up an Amazon S3 Bucket: Your data needs a place to reside. Amazon S3 serves as the foundation, providing a secure, scalable, and durable storage solution. Here, you’ll store the raw data files that Athena will query.
  2. Creating a Database in AWS Glue Data Catalog: Think of the database as a container or namespace within which you’ll organize your data. It doesn’t store data itself but acts as a logical grouping mechanism for your tables, which represent different datasets or aspects of your data.
  3. Adding Tables to the Database: Tables define the schema or structure of your data (such as columns and data types) and point to the actual data stored in S3. This step is crucial because it tells Athena how to interpret the raw data during queries. You can create tables manually by defining the schema, or automatically by using a crawler that scans your data in S3 and infers the schema. In this lab, we will create tables using an AWS Glue crawler.
  4. Querying Data with Amazon Athena: With your data in S3, a database to organize your tables, and tables to define your data schema, you’re now ready to use Athena to run SQL queries directly against your data. Because Athena is serverless, you don’t manage any infrastructure and can focus solely on analyzing your data.
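
The lab itself uses the AWS console, but steps 2 through 4 above can also be sketched programmatically. The following is a minimal, hedged sketch rather than the lab's official method: it assumes the clients passed in behave like the boto3 Glue and Athena clients, and every name here (the function names, database, crawler, role ARN, and S3 paths) is a hypothetical placeholder, not something defined by this lab.

```python
import time


def build_catalog(glue, database, crawler, role_arn, s3_path):
    """Steps 2-3: create a Glue database, then a crawler that infers tables.

    `glue` is assumed to be a boto3 Glue client (or anything with the same
    methods). All argument values are placeholders chosen by the caller.
    """
    # The database is only a logical namespace; it holds no data itself.
    glue.create_database(DatabaseInput={"Name": database})
    # The crawler scans the S3 path and writes inferred tables to the database.
    glue.create_crawler(
        Name=crawler,
        Role=role_arn,
        DatabaseName=database,
        Targets={"S3Targets": [{"Path": s3_path}]},
    )
    # Crawlers run asynchronously; this call only starts the run.
    glue.start_crawler(Name=crawler)


def run_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Step 4: submit a query and poll until Athena reports a final state.

    `athena` is assumed to be a boto3 Athena client. `output_location` is an
    S3 path where Athena writes result files.
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)
```

With boto3 installed and credentials configured, the clients would typically come from boto3.client("glue") and boto3.client("athena"). Because the crawler runs asynchronously, a real script would also need to wait for it to finish before querying, and the IAM role passed to the crawler must be allowed to read the S3 path.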


This lab assumes you have experience creating an Amazon S3 bucket and are familiar with its basic components.

If you find any gaps in your knowledge, consider taking the following lab first:

  • Creating an Amazon S3 bucket.


In this lab, you will:

  • Learn how to query data directly from S3 using Amazon Athena.
  • Use AWS Glue to create a data catalog (database and tables) for organizing data from Amazon S3.