Ends in
00
days
00
hrs
00
mins
00
secs
SHOP NOW

đź“š eBook Sale - Get eBooks as LOW as $2.99 USD ONLY!

Guided Lab: Retrieving Data using Amazon S3 Select

Description

Amazon S3 Select is a powerful feature that enhances the capabilities of Amazon Simple Storage Service (S3) by offering efficient and selective data retrieval. S3 Select allows you to retrieve data from objects stored in your S3 buckets without downloading and processing the entire file. You can apply SQL-like queries to semi-structured data in JSON, CSV, and Parquet, enabling you to filter, transform, and aggregate data on the fly. This makes S3 Select an ideal tool for extracting valuable insights from massive datasets and improving data analytics.

S3 Select has substantial performance benefits as it minimizes data transfer and processing overhead, which reduces costs and speeds up data retrieval and analysis tasks. Whether working with log files, sensor data, or large datasets, Amazon S3 Select empowers you to access and process only the data you need efficiently, improving your data analytics and reducing the time and resources required for complex data manipulation tasks. It’s also seamlessly integrated with AWS Glue and Amazon Athena, extending its utility for a comprehensive and streamlined data analysis experience.

In this hands-on lab, you’ll explore the capabilities of Amazon S3 Select. This powerful feature efficiently retrieves, filters, and processes data from your S3 objects, making data analysis faster, cost-effective, and more precise. This practical experience will empower you to streamline your data workflows and easily extract valuable insights from your datasets.

Prerequisite

To guarantee a successful completion of this lab, you must possess prior experience creating Amazon S3 buckets and have a solid understanding of their core components. If you believe that your knowledge in this regard is lacking, we strongly advise you to consider taking the following to acquire the required proficiency:

  • Creating an Amazon S3 bucket

Objectives

In this lab, you will:

  • Execute SQL-like queries on your S3 data using S3 Select
  • Understand the benefits of using S3 Select over traditional data retrieval methods

Lab Steps

Creating an S3 Bucket

1. Create an S3 bucket using the following configurations:

  • Add a Bucket Name
    Note: Provide a unique and descriptive name for your bucket. Remember that bucket names must be globally unique across all AWS accounts. Try a different combination if you receive an error message regarding your selected name. After that, set the AWS Region to N. Virginia to place the bucket to be created there.

2. Leave other settings at their default value.

3. Click on the “Create bucket” button. You should see a green notification that your bucket was created successfully.

Uploading a file into the S3 Bucket

1. Download and save the following JSON file into your computer and upload it to your S3 bucket.

marvel_heroes.json

2. Scroll down to the page’s bottom and click Upload to initiate the file upload process. 

Retrieving Data using Amazon S3 Select

1. In the left navigation pane, choose Buckets.

2. Choose the bucket containing the object that you want to select content from, and then choose the name of the object JSON file that you’ve uploaded. Then, on the Actions drop-down, click Query with S3 Select.

3. Select JSON as the Input and Output format.

4. To extract records from the chosen object, under SQL query, run the SELECT SQL query provided by the s3.

SELECT * FROM s3object s LIMIT 5

The result should display the first 5 rows of your .csv file.

Using Aggregate Functions in Amazon S3 Select Query

Amazon S3 Select supports the following aggregate functions. You can use these aggregate functions in combination with the SELECT statement to process and analyze your data directly in Amazon S3 without having to retrieve the entire dataset.

1. Now, let’s explore the Aggregate functions by using SQL Query. The first function we will use is COUNT, which will count the total number of records of the S3 file. Use the following query to run the function:

SELECT COUNT(s.gender) AS "Your Alias" FROM s3object s

The expected result should be as follows:

2. The next function is MAX, which will get the maximum value of a given attribute (e.g. weight) and also you can combine it with the WHERE clause to make it more specific. Use the following query to run the function:

SELECT MAX(s.your_attribute) AS "Your Alias" FROM s3object s WHERE condition

The expected result should be as follows:

3. The third function is MIN, which will get the minimum value of a given attribute (e.g. height). Use the following query to run the function:

SELECT MIN(s.your_attribute) AS "Your Alias" FROM s3object s WHERE condition

The expected result should be as follows:

5. The fourth function is AVG, which will return the average number of your selected attribute. Use the following query to run the function:

SELECT AVG (s.your_attribute) AS "Your Alias" FROM s3object s

The expected result should be as follows:

5. Lastly, we will use the CAST conversion function with the AVG aggregate function, which will calculate the average Human Marvel hero weight and convert it to an INTEGER data type. Use the following query to run the function:

SELECT CAST(AVG(s.your_attribute) AS INT) AS "Your Alias" FROM s3object s WHERE condition

And the expected result should be as follows:

Skip to content