EMR Cost optimization

Carlo-TutorialsDojo updated 2 years, 8 months ago 2 Members · 2 Posts
AWS Certified Data Analytics – Specialty
Klimok

Member
August 15, 2021 at 8:41 am

A company has moved its data transformation job to an Amazon EMR cluster with Apache Pig. The cluster uses On-Demand instances to process large datasets, and the output is critical to operations. It typically takes around 1 hour to complete the job. Even so, the company must ensure that the whole process strictly adheres to the service level agreement (SLA) of 2 hours. The company is looking for a solution that reduces cost with negligible impact on availability.

Which combination of solutions should be implemented to meet these requirements? (Select TWO.)

---

I get your suggested right answer, but there is an EMR Best Practices video in which an AWS rep explicitly says that you should build the cluster on resilient nodes to deliver within the SLA, and then add cheap Spot task nodes to finish the job faster and save money.

So I suggest rescoping the question to explicitly eliminate one of the options, maybe by indicating that the cluster is long-running.
Carlo-TutorialsDojo

Member
August 16, 2021 at 10:13 am

Hello Klimok,

Thanks for sharing your thoughts on this.

I had another thing in mind when I wrote this question, but as the video suggests, using an instance fleet (Spot + On-Demand for resiliency) for core nodes and Spot Instances as task nodes to help with the processing seems to be the better setup for the scenario. We’ll try to revise this question so that it better fits that design pattern/best practice.
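
For anyone who wants to see what that setup might look like, here is a rough sketch of the fleet layout as a plain Python structure, shaped like the `InstanceFleets` list that boto3's `emr.run_job_flow` accepts under `Instances`. The helper name `build_instance_fleets`, the instance types, the capacities, and the Spot timeout values are all assumptions for illustration; none of them come from the question or the video.

```python
def build_instance_fleets(core_on_demand=2, core_spot=2, task_spot=4):
    """Return an InstanceFleets list (hypothetical sizing, illustrative only).

    Core fleet: On-Demand + Spot mix, so HDFS and the job survive Spot loss.
    Task fleet: Spot only, extra compute that is safe to lose.
    """
    # Assumed Spot behaviour: if Spot capacity is not fulfilled within
    # 10 minutes while provisioning, fall back to On-Demand.
    spot_launch = {
        "SpotSpecification": {
            "TimeoutDurationMinutes": 10,
            "TimeoutAction": "SWITCH_TO_ON_DEMAND",
        }
    }
    return [
        {
            "Name": "master",
            "InstanceFleetType": "MASTER",
            "TargetOnDemandCapacity": 1,
            "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}],
        },
        {
            "Name": "core",
            "InstanceFleetType": "CORE",
            "TargetOnDemandCapacity": core_on_demand,
            "TargetSpotCapacity": core_spot,
            "InstanceTypeConfigs": [
                {"InstanceType": "m5.xlarge", "WeightedCapacity": 1},
                {"InstanceType": "m5.2xlarge", "WeightedCapacity": 2},
            ],
            "LaunchSpecifications": spot_launch,
        },
        {
            "Name": "task",
            "InstanceFleetType": "TASK",
            "TargetOnDemandCapacity": 0,
            "TargetSpotCapacity": task_spot,
            "InstanceTypeConfigs": [
                {"InstanceType": "m5.xlarge", "WeightedCapacity": 1},
                {"InstanceType": "m5.2xlarge", "WeightedCapacity": 2},
            ],
            "LaunchSpecifications": spot_launch,
        },
    ]


fleets = build_instance_fleets()
```

The resulting list would be passed as `Instances={"InstanceFleets": fleets, ...}` in a `run_job_flow` call, alongside the usual cluster settings. Listing more than one instance type per fleet gives EMR more Spot pools to draw from, which is the usual way to lower the chance of interruption.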

Regards,

Carlo @ Tutorials Dojo
