EMR Cost optimization
-
A company has moved its data transformation job to an Amazon EMR cluster with Apache Pig. The cluster uses On-Demand instances to process large datasets, and the output is critical to operations. It typically takes around 1 hour to complete the job. Even so, the company must ensure that the whole process strictly adheres to the service level agreement (SLA) of 2 hours. The company is looking for a solution that reduces cost with negligible impact on availability.
Which combination of solutions should be implemented to meet these requirements? (Select TWO.)
—-
I get your suggested right answer, but there is an EMR Best Practices video where an AWS rep explicitly says that you need to structure your cluster with resilient nodes to deliver within the SLA, and then add cheap Spot task nodes to finish the job faster and save money.
So I suggest rescoping the question to explicitly eliminate one of the options, maybe by indicating that the cluster is long-running.
-
Hello Klimok,
Thanks for sharing your thoughts on this.
I had another thing in mind when I wrote this question, but as the video suggests, using an instance fleet (Spot + On-Demand for resiliency) for core nodes and Spot Instances as task nodes to help with the processing seems to be the better setup for the scenario. We’ll try to revise this question so that it better fits that design pattern/best practice.
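For anyone who wants to see roughly what that layout looks like, here is a minimal sketch that builds an `InstanceFleets` list in the shape boto3's EMR `run_job_flow` call accepts. The fleet names, instance types, capacities, and the 10-minute Spot timeout are all assumptions for illustration, not values taken from the question or the video.

```python
def spot_launch_spec(timeout_minutes=10):
    """Spot settings: fall back to On-Demand if Spot capacity is not
    fulfilled within the timeout while the cluster is being provisioned."""
    return {
        "SpotSpecification": {
            "TimeoutDurationMinutes": timeout_minutes,  # assumed value
            "TimeoutAction": "SWITCH_TO_ON_DEMAND",
        }
    }


def build_instance_fleets(core_on_demand=2, core_spot=2, task_spot=4):
    """Return an InstanceFleets list: On-Demand primary node, mixed
    On-Demand + Spot core fleet, Spot-only task fleet.

    Instance types and capacities are hypothetical placeholders.
    """
    return [
        {
            "Name": "primary",
            "InstanceFleetType": "MASTER",
            "TargetOnDemandCapacity": 1,
            "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}],
        },
        {
            # Core nodes hold HDFS data, so part of the capacity stays On-Demand.
            "Name": "core",
            "InstanceFleetType": "CORE",
            "TargetOnDemandCapacity": core_on_demand,
            "TargetSpotCapacity": core_spot,
            "InstanceTypeConfigs": [
                {"InstanceType": "m5.xlarge", "WeightedCapacity": 1},
                {"InstanceType": "m5a.xlarge", "WeightedCapacity": 1},
            ],
            "LaunchSpecifications": spot_launch_spec(),
        },
        {
            # Task nodes only add compute, so losing them does not lose data.
            "Name": "task",
            "InstanceFleetType": "TASK",
            "TargetOnDemandCapacity": 0,
            "TargetSpotCapacity": task_spot,
            "InstanceTypeConfigs": [
                {"InstanceType": "m5.xlarge", "WeightedCapacity": 1},
                {"InstanceType": "m5a.xlarge", "WeightedCapacity": 1},
            ],
            "LaunchSpecifications": spot_launch_spec(),
        },
    ]


fleets = build_instance_fleets()
```

The resulting list would go under `Instances={"InstanceFleets": fleets, ...}` in a `boto3.client("emr").run_job_flow(...)` call, alongside the usual name, release label, and role arguments. How much of the core capacity to keep On-Demand depends on how much risk the 2-hour SLA can tolerate.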
Regards,
Carlo @ Tutorials Dojo