Incorrect answers?
JR-TutorialsDojo updated 6 months, 4 weeks ago
-
Hello, I would like to confirm the answer to the following question. I think the answer is B, but the given answer is C.
A Data Engineer is designing a solution for customer data analysis using Amazon Athena. An on-premises application produces the data as CSV files in near real-time. The Engineer needs to convert the data to Apache Parquet format before saving it on an Amazon S3 bucket.
Which method provides the LEAST configuration overhead?
A. Configure an Amazon EMR cluster with Apache Spark Structured Streaming to consume and transform the customer data into Apache Parquet.
B. Use Amazon Kinesis Data Streams to ingest customer data and configure a Firehose stream as a consumer to convert the data into Apache Parquet.
C. Use Amazon Kinesis Data Streams to consume customer data. Create a streaming ETL job in AWS Glue to convert data into Apache Parquet.
D. Configure an Amazon EC2 instance with Apache Kafka to consume the customer data. Export the data to the S3 bucket in Parquet format with Kafka Connect S3 sink connector.
-
Hello Ziwei Gao,
Thank you for your feedback.
Could you please elaborate on why Option B is considered correct?
From the explanation provided:
“The option that says: Use Amazon Kinesis Data Streams to ingest customer data and configure a Firehose stream as a consumer to convert the data into Apache Parquet is incorrect. Although this could be a valid solution, it typically entails more development effort as Data Firehose does not support converting CSV files directly into Apache Parquet, unlike JSON.”
This seems to suggest that Option B is not ideal due to format conversion limitations and added development overhead. I’d appreciate your insights to better understand the rationale.
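To illustrate the kind of extra development effort the explanation refers to, here is a minimal sketch of a Lambda transformation that a Firehose stream could invoke to turn CSV records into JSON before record format conversion to Parquet. The column names in `COLUMNS` and the helper `csv_line_to_json` are hypothetical, since the question does not give the CSV schema; the event and response shapes follow the Firehose data transformation contract (`recordId`, `result`, base64-encoded `data`).

```python
import base64
import csv
import io
import json

# Hypothetical column names; the question does not specify the CSV schema.
COLUMNS = ["customer_id", "name", "purchase_amount"]


def csv_line_to_json(line: str) -> str:
    """Convert one CSV line into a newline-terminated JSON object string."""
    row = next(csv.reader(io.StringIO(line)))
    return json.dumps(dict(zip(COLUMNS, row))) + "\n"


def lambda_handler(event, context):
    """Firehose transformation handler: CSV records in, JSON records out."""
    output = []
    for record in event["records"]:
        try:
            # Firehose delivers each record payload base64-encoded.
            line = base64.b64decode(record["data"]).decode("utf-8").strip()
            payload = csv_line_to_json(line)
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(payload.encode("utf-8")).decode("utf-8"),
            })
        except Exception:
            # Return the original payload and mark the record as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Under this approach, the Firehose stream would still need record format conversion enabled with a schema defined in the AWS Glue Data Catalog, so Option B involves both this function and the conversion settings.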
Best regards
JR @ Tutorials Dojo