Choosing Dynamo db over S3 – Question in time mode set-4 - AWS Certified Solutions Architect Professional

meera-k

Member
August 17, 2021 at 10:39 pm

A data analytics company has recently adopted a hybrid cloud infrastructure with AWS. They are in the business of collecting and processing vast amounts of data. Each data set generates up to several thousand files, which can range from 10 MB to 1 GB in size. The archived data is rarely restored, and if there is a request to retrieve it, the company has a maximum of 24 hours to send the files. The data sets can be searched using their file ID, set name, authors, tags, and other criteria.

Which of the following options provides the most cost-effective architecture to meet the above requirements?

The answer to this question is given as “1. For each completed data set, compress and concatenate all of the files into a single Glacier archive. 2. Store the associated archive ID for the compressed files along with other search metadata in a DynamoDB table. 3. For retrieving the data, query the DynamoDB table for files that match the search criteria and then restore the files from the retrieved archive ID.”

Isn’t DynamoDB priced higher than S3? What is the math behind the conclusion that DynamoDB + Glacier is the better option compared to “1. Store individually compressed files to an S3 bucket. Also, store the search metadata and the S3 object key of the files in a separate S3 bucket. 2. Create a lifecycle rule to move the data from an S3 Standard class to Glacier after a certain month. 3. For retrieving the data, query the S3 bucket for files matching the search criteria and then retrieve the file from the other S3 bucket”?
Kenneth-Samonte-Tutorials-Dojo

Member
August 18, 2021 at 9:38 pm

Hello meera-k,

Thank you for your feedback.

The question asks for the “most cost-effective architecture to meet the above requirements”. So let’s compare the two options.

1. Store individually compressed files to an S3 bucket. Also, store the search metadata and the S3 object key of the files in a separate S3 bucket.

2. Create a lifecycle rule to move the data from an S3 Standard class to Glacier after a certain month.

3. For retrieving the data, query the S3 bucket for files matching the search criteria and then retrieve the file from the other S3 bucket.

> If you selected this, let’s compare it to the cost of the correct answer. If you store the files in one S3 bucket and the metadata and object keys in another, you pay S3 Standard pricing for all of that storage. If you wait a certain time (a month or more) before moving the objects to Glacier, you still incur a significant cost while the objects are in S3 Standard. On top of that, the metadata files stay in S3, which incurs costs too.

1. For each completed data set, compress and concatenate all of the files into a single Glacier archive.

2. Store the associated archive ID for the compressed files along with other search metadata in a DynamoDB table.

3. For retrieving the data, query the DynamoDB table for files that match the search criteria and then restore the files from the retrieved archive ID.

> If you choose this, you incur a lower cost because the files go to Glacier immediately, which is cheaper than S3 Standard. You don’t need S3 Standard because the data is rarely accessed. Storing the metadata in DynamoDB is cost-effective too: the metadata is small, so you pay very little for storage, and because it is very rarely queried, the read traffic may fit within the DynamoDB free tier. On S3, you pay for each object request.
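To put rough numbers on the comparison above, here is a minimal sketch of the storage math. The per-GB rates are illustrative assumptions (approximate public list prices; check the current AWS pricing pages for your region), and the data set sizes are hypothetical. It deliberately ignores request, lifecycle transition, and retrieval charges as well as the free tier, so treat it as a back-of-the-envelope estimate only.

```python
# Illustrative per-GB-month storage rates in USD.
# ASSUMPTIONS: approximate list prices, not authoritative; verify before relying on them.
S3_STANDARD_PER_GB = 0.023
GLACIER_PER_GB = 0.0036
DYNAMODB_PER_GB = 0.25


def option_s3_then_glacier(data_gb, metadata_gb, months, months_in_standard):
    """Files wait in S3 Standard, then move to Glacier; metadata stays in S3 Standard."""
    standard_months = min(months, months_in_standard)
    glacier_months = months - standard_months
    data_cost = data_gb * (
        standard_months * S3_STANDARD_PER_GB + glacier_months * GLACIER_PER_GB
    )
    metadata_cost = metadata_gb * months * S3_STANDARD_PER_GB
    return data_cost + metadata_cost


def option_glacier_plus_dynamodb(data_gb, metadata_gb, months):
    """Files go straight to Glacier; metadata lives in a DynamoDB table."""
    data_cost = data_gb * months * GLACIER_PER_GB
    metadata_cost = metadata_gb * months * DYNAMODB_PER_GB
    return data_cost + metadata_cost


if __name__ == "__main__":
    # Hypothetical workload: 10 TB of archived files, 1 GB of search metadata, one year.
    data_gb, metadata_gb, months = 10_000, 1, 12
    a = option_s3_then_glacier(data_gb, metadata_gb, months, months_in_standard=1)
    b = option_glacier_plus_dynamodb(data_gb, metadata_gb, months)
    print(f"S3 Standard then Glacier: ${a:,.2f}")
    print(f"Glacier + DynamoDB:       ${b:,.2f}")
```

Under these assumptions, DynamoDB is indeed more expensive per GB than S3, but the metadata is tiny compared to the files, so the extra month the files spend in S3 Standard dominates the difference.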

Also, storing metadata in DynamoDB is a recommended practice because it is easier to maintain than storing metadata on S3.

https://aws.amazon.com/blogs/big-data/building-and-maintaining-an-amazon-s3-metadata-index-without-servers/
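The idea of a metadata index that maps search criteria to a Glacier archive ID can be sketched as follows. This is an in-memory stand-in written in plain Python, not the DynamoDB API; the record fields, the function name, and the sample IDs are all hypothetical and only illustrate the shape of the lookup.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ArchiveRecord:
    """One index entry per data set (hypothetical schema)."""
    set_name: str
    archive_id: str  # ID returned when the concatenated archive was uploaded
    authors: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)
    file_ids: List[str] = field(default_factory=list)


def find_archive_ids(
    records: List[ArchiveRecord],
    set_name: Optional[str] = None,
    author: Optional[str] = None,
    tag: Optional[str] = None,
    file_id: Optional[str] = None,
) -> List[str]:
    """Return archive IDs of records matching every criterion that was given."""
    matches = []
    for rec in records:
        if set_name is not None and rec.set_name != set_name:
            continue
        if author is not None and author not in rec.authors:
            continue
        if tag is not None and tag not in rec.tags:
            continue
        if file_id is not None and file_id not in rec.file_ids:
            continue
        matches.append(rec.archive_id)
    return matches


if __name__ == "__main__":
    index = [
        ArchiveRecord("survey-2020", "archive-001", ["alice"], ["genomics"], ["f1", "f2"]),
        ArchiveRecord("survey-2021", "archive-002", ["bob"], ["climate"], ["f3"]),
    ]
    # Only the small index is searched; the archive itself is restored afterwards.
    print(find_archive_ids(index, tag="genomics"))
```

The point of the design is that searches touch only the small index, and the archive ID they return is all that is needed to start the (slow, rare) restore from Glacier.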

I understand that you chose what looked like the most cost-effective solution, as the question asks. However, there is a difference between the cheapest possible solution and the cost-effective solution that AWS recommends.

This is an AWS exam, and AWS wants you to select the service it designed and recommends for a particular situation. Therefore, we select DynamoDB, as it is the service AWS recommends for storing an index of the files in your S3 bucket.

Think of it this way: an EC2 instance with MySQL installed is cheaper than an RDS instance. However, you will never see an exam answer that tells you to choose EC2 with MySQL.

On the exam, the cost-effective solution would always be RDS, because it is the service AWS designed for this situation and the one AWS recommends.

Hope this helps.

Let us know if you need further assistance. The Tutorials Dojo team is dedicated to helping you pass your AWS exam!

Regards,

Kenneth Samonte @ Tutorials Dojo
