EMR Cost optimization

  • Klimok

    August 15, 2021 at 8:41 am

    A company has moved its data transformation job to an Amazon EMR cluster with Apache Pig. The cluster uses On-Demand instances to process large datasets, and the output is critical to operations. It typically takes around 1 hour to complete the job. Even so, the company must ensure that the whole process strictly adheres to the service level agreement (SLA) of 2 hours. The company is looking for a solution that reduces cost with negligible impact on availability.

    Which combination of solutions should be implemented to meet these requirements? (Select TWO.)


    I understand your suggested answer, but… there is an EMR Best Practices video in which an AWS rep explicitly says that you need to structure your cluster with resilient nodes to deliver within the SLA, and then add cheap Spot task nodes to finish the job faster and save money.

    So I suggest rescoping the question to explicitly eliminate one of the options, maybe by indicating that the cluster is long-running.

  • Carlo-TutorialsDojo

    August 16, 2021 at 10:13 am

    Hello Klimok,

    Thanks for sharing your thoughts on this.

    I had another thing in mind when I wrote this question, but as the video suggests, using an instance fleet (Spot + On-Demand for resiliency) for core nodes and Spot instances as task nodes to help with the processing seems to be the better setup for the scenario. We’ll try to revise this question so that it better fits that design pattern/best practice.
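
For readers who want to see what that layout might look like, here is a minimal sketch in Python. It only builds the `InstanceFleets` structure in the shape that boto3's EMR `run_job_flow` call accepts; it does not contact AWS. The helper name `build_instance_fleets`, the instance type, the capacities, and the timeout values are all illustrative assumptions, not values from the question or the video.

```python
def build_instance_fleets(core_on_demand=2, core_spot=2, task_spot=4):
    """Build an InstanceFleets list: mixed core fleet plus Spot-only task fleet.

    All capacities and the instance type below are hypothetical placeholders.
    """
    # Assumed Spot behavior: if Spot capacity is not fulfilled within the
    # timeout, fall back to On-Demand so the job can still meet its SLA.
    spot_spec = {
        "SpotSpecification": {
            "TimeoutDurationMinutes": 10,
            "TimeoutAction": "SWITCH_TO_ON_DEMAND",
        }
    }
    # Assumed instance type; a real fleet would usually list several types.
    type_configs = [{"InstanceType": "m5.xlarge", "WeightedCapacity": 1}]

    return [
        {
            # Master node stays On-Demand for resiliency.
            "Name": "master",
            "InstanceFleetType": "MASTER",
            "TargetOnDemandCapacity": 1,
            "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}],
        },
        {
            # Core fleet mixes On-Demand and Spot capacity.
            "Name": "core",
            "InstanceFleetType": "CORE",
            "TargetOnDemandCapacity": core_on_demand,
            "TargetSpotCapacity": core_spot,
            "InstanceTypeConfigs": type_configs,
            "LaunchSpecifications": spot_spec,
        },
        {
            # Task fleet is Spot-only: extra processing power at lower cost.
            "Name": "task",
            "InstanceFleetType": "TASK",
            "TargetOnDemandCapacity": 0,
            "TargetSpotCapacity": task_spot,
            "InstanceTypeConfigs": type_configs,
            "LaunchSpecifications": spot_spec,
        },
    ]


fleets = build_instance_fleets()
for fleet in fleets:
    print(fleet["Name"], fleet["InstanceFleetType"])
```

In a real setup, a list like this would typically go under `Instances={"InstanceFleets": fleets, ...}` in the `run_job_flow` request, alongside the release label, roles, and the Pig step, which are omitted here.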


    Carlo @ Tutorials Dojo
