
  • Quiz – Ambiguous questions for your review

  • PGC

    April 2, 2022 at 5:56 am

    Hello. Comments on three questions due to potential ambiguity:

    Q1 A company wants to analyze the data inside a GZIP-compressed comma-separated values (CSV) file that they generate every month. The file is 150 MB in size with 25,000 data records and is currently archived in Amazon S3 Glacier. The data analytics team needs to query a subset of the data and extract the first ten columns for the records that match a specific condition.

    >>> The issue here is S3 Select vs Athena. S3 Select can ONLY read within a single given object. But if the records have to match a condition, they can be quickly selected by Athena. Kindly review your answer.

    Q2 – A company uses Amazon Athena Workgroups to isolate query executions among its Data Analytics team. The company wants to limit the amount of data that can be scanned from queries made within each workgroup to enforce cost controls. Any queries that exceed the recommended threshold must be canceled immediately.

    >> Ambiguity here: you use the plural, but the per-query limit applies to a single long-running query. So the wording gives the impression that you want to track all the queries in a workgroup exceeding a byte count. Please check the wording.

    Q3 – A recruitment agency scans millions of physical application forms as JPEG files and stores them in an Amazon S3 bucket. The forms contain different information, such as the applicant’s full name, home address, phone number, job position, and relevant skill sets. The agency uses Amazon Textract to extract the metadata values from the scanned forms. The agency wants to build a solution that will enable its data analysts to drive insights by analyzing and screening applications based on the extracted text information. The JPEG files should also be downloadable. The agency prioritizes query performance over cost reduction.

    >>> Here you only say “insights”, not visual insights, which is what would be needed to rule out Redshift. Insights need not be visual, as they can be tabular, and Redshift excels in query speed (but not visualization). Please check your wording here.

    I noticed a few more ambiguities, but I am only pointing out these.

  • Carlo-TutorialsDojo

    April 5, 2022 at 4:55 am

    Hello PGC,

    Thanks for your feedback. Please find the answers to your questions below.


    >> S3 Select allows you to filter data using SQL expressions. Because we’re only interested in retrieving data from a few specific columns, it’s easier and more suitable to use S3 Select over Athena. And, unlike Athena, we can do all of this without setting up a database.
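To make the Q1 discussion concrete, here is a minimal sketch of what an S3 Select request for "the first ten columns of matching records" might look like. It only builds the keyword arguments for boto3's `select_object_content` call and does not contact AWS. The helper name, bucket, key, and filter condition are hypothetical placeholders, and the sketch assumes the object is in a state that S3 Select can read. Note that the request names exactly one `Key`, which is the point PGC raises about S3 Select working on a single given object.

```python
def build_s3_select_request(bucket, key, condition, n_columns=10):
    """Build kwargs for boto3's S3 client select_object_content (hypothetical helper)."""
    # With no header row in use, S3 Select addresses CSV columns by position: s._1, s._2, ...
    columns = ", ".join(f"s._{i}" for i in range(1, n_columns + 1))
    expression = f"SELECT {columns} FROM S3Object s WHERE {condition}"
    return {
        "Bucket": bucket,
        "Key": key,  # a single object; S3 Select does not span multiple objects
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {
            # "NONE" treats the first line as data; use "IGNORE" to skip a header row
            "CSV": {"FileHeaderInfo": "NONE"},
            "CompressionType": "GZIP",
        },
        "OutputSerialization": {"CSV": {}},
    }


# Placeholder values, chosen only for illustration.
request = build_s3_select_request(
    "example-bucket", "reports/2022-03.csv.gz", "s._3 = 'ACTIVE'"
)
# With AWS credentials configured, this could be passed on as:
#   boto3.client("s3").select_object_content(**request)
```

No database or table definition is involved, which is the convenience Carlo points to; the trade-off is that the caller must already know which object to query.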


    >> I don’t quite agree with you here. Any ≠ all. ‘Any’ could be a subset of the data, while ‘all’ means every item. The per-query control limit applies to each query made within a workgroup, so ‘any queries’ could simply mean one or more queries running in a workgroup.
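For readers following the Q2 exchange, the per-query limit being discussed can be sketched as a workgroup configuration update. The snippet below only builds the keyword arguments for boto3's Athena `update_work_group` call; the helper name, workgroup name, and the 1 GB threshold are hypothetical. The limit is checked against each query individually, not against the combined total of all queries in the workgroup.

```python
MIN_CUTOFF_BYTES = 10_000_000  # Athena documents a 10 MB minimum for this setting


def build_per_query_limit_update(workgroup, max_bytes_per_query):
    """Build kwargs for boto3's Athena client update_work_group (hypothetical helper)."""
    if max_bytes_per_query < MIN_CUTOFF_BYTES:
        raise ValueError("per-query cutoff must be at least 10 MB")
    return {
        "WorkGroup": workgroup,
        "ConfigurationUpdates": {
            # A query that scans more than this many bytes is canceled.
            "BytesScannedCutoffPerQuery": max_bytes_per_query,
            # Make the workgroup settings override client-side settings.
            "EnforceWorkGroupConfiguration": True,
        },
    }


# Placeholder workgroup name and a 1 GB threshold, for illustration only.
update = build_per_query_limit_update("analytics-team-a", 1_000_000_000)
# With AWS credentials configured, this could be passed on as:
#   boto3.client("athena").update_work_group(**update)
```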


    >> Thanks for pointing out our mistake here. We will revise this item.


    Carlo @ Tutorials Dojo

  • PGC

    April 5, 2022 at 7:08 am

    Thanks, Carlo. I still feel Q1 and Q2 are not clear questions. For Q1, the point is how you choose the object; once that is settled, S3 Select can work.

    Further, for Q2 it is the English: changing “Any queries” (which is grammatically wrong anyway) to “Any query” should do. Thanks for the prompt response and the very nice questions.

