Hi Dinaya,
I hope you’re doing well!
Thank you for your feedback, and I understand your point about the ordering of the steps. I wanted to clarify the reasoning behind the corrected order and why it’s structured the way it is for the model evaluation workflow.
Here’s why the order works:
1. Define clear evaluation criteria that measure attributes such as correctness, relevance, and linguistic quality.
   - This step comes first because it establishes the standards by which you will evaluate the model's performance. Without a defined evaluation framework, you have no way to judge whether the new model succeeds. Setting the criteria upfront keeps every subsequent step aligned with your goals.
2. Assemble a benchmark dataset containing varied prompts, rare scenarios, and regulatory edge cases.
   - Once the evaluation criteria are in place, you need to gather the right data. The benchmark dataset should cover typical prompts, rare scenarios, and regulatory edge cases so that the evaluation is comprehensive. Curate it against the criteria you've just set, so that each criterion is actually exercised by the data.
3. Run controlled A/B testing to directly compare outputs from the new FM and the current production model.
   - After defining the criteria and assembling the dataset, it's time to perform A/B testing. This step runs both the new and the current model against the same benchmark dataset to measure and compare their performance. Running the test in a controlled environment matters because it lets you observe the results without introducing bias.
4. Use AWS Step Functions to enforce automated approval checkpoints before advancing the workflow.
   - Following the A/B testing, AWS Step Functions is used to automate the workflow and enforce approvals before moving forward. This step ensures that stakeholders have signed off on the results of the A/B test and that there is a structured process for advancing the model deployment.
5. Review the comparison data, summarize findings, and generate a formal evaluation report with performance conclusions.
   - The final step is to analyze the comparison data, summarize the findings from the A/B tests, and document the results in a formal evaluation report. This is where you draw conclusions about whether the new model performs better than the current production model and whether it meets the desired performance criteria.
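To make steps 1 to 3 more concrete, here is a minimal Python sketch of how criteria, a benchmark dataset, and an A/B comparison could fit together. Everything in it is an assumption for illustration: the `BenchmarkCase` type, the toy `correctness` and `relevance` scorers, the two sample prompts, and the stub models are hypothetical stand-ins, not part of the exam question or any AWS API. In practice the models would be calls to the production FM and the new FM, and the scorers would be far more robust.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List


@dataclass(frozen=True)
class BenchmarkCase:
    """One benchmark entry (hypothetical shape, for illustration only)."""
    prompt: str
    reference: str  # key fact a correct answer is expected to contain
    category: str   # e.g. "typical", "rare", "regulatory"


Model = Callable[[str], str]
Criterion = Callable[[str, BenchmarkCase], float]


# Step 1: evaluation criteria, each returning a score between 0 and 1.
# These are deliberately simple stand-ins for real metrics.
def correctness(output: str, case: BenchmarkCase) -> float:
    return 1.0 if case.reference.lower() in output.lower() else 0.0


def relevance(output: str, case: BenchmarkCase) -> float:
    prompt_words = set(case.prompt.lower().split())
    output_words = set(output.lower().split())
    if not prompt_words:
        return 0.0
    return len(prompt_words & output_words) / len(prompt_words)


CRITERIA: Dict[str, Criterion] = {"correctness": correctness, "relevance": relevance}

# Step 2: benchmark dataset covering more than one category of prompt.
BENCHMARK: List[BenchmarkCase] = [
    BenchmarkCase("What is the refund window?", "30 days", "typical"),
    BenchmarkCase("Can a minor open an account?", "guardian", "regulatory"),
]


# Step 3: run both models on the same cases and average each criterion.
def run_ab_test(
    current: Model,
    candidate: Model,
    cases: List[BenchmarkCase],
    criteria: Dict[str, Criterion],
) -> Dict[str, Dict[str, float]]:
    models = {"current": current, "candidate": candidate}
    return {
        name: {
            label: mean(score(model(case.prompt), case) for case in cases)
            for label, model in models.items()
        }
        for name, score in criteria.items()
    }


# Stub models standing in for the production FM and the new FM.
def current_model(prompt: str) -> str:
    return "Please contact support."


def candidate_model(prompt: str) -> str:
    if "refund" in prompt:
        return "The refund window is 30 days."
    return "A guardian must co-sign for a minor."


results = run_ab_test(current_model, candidate_model, BENCHMARK, CRITERIA)
print(results)
```

Because both models see exactly the same cases and are scored by the same functions, any difference in the averages comes from the models themselves, which is the point of keeping the comparison controlled.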
Why this order makes sense:
Steps 1 and 2 logically set the stage for Step 3 (A/B testing). You can’t effectively compare the models without first defining what success looks like and gathering the data that will test those criteria. Step 4 ensures that the workflow is automated and that appropriate approvals are in place, helping to maintain governance and compliance standards, which is crucial before any changes are made in a production environment. Finally, Step 5 wraps everything up by reviewing the results and making a formal decision based on the evaluation.
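For the approval checkpoint in Step 4, one possible shape is a Step Functions state machine that pauses on a callback task until an approver responds. The sketch below builds such a definition in Amazon States Language as a Python dictionary. It is only an illustration under assumptions: the state names and the three Lambda function names are hypothetical, and it assumes the approver's callback returns output like `{"approved": true}`. The `.waitForTaskToken` integration pattern and the `$$.Task.Token` context value are standard Step Functions features.

```python
import json
from typing import Any, Dict, Set


def build_evaluation_workflow(
    ab_test_fn: str, approval_fn: str, report_fn: str
) -> Dict[str, Any]:
    """Return an ASL definition: A/B test, approval gate, then report."""
    return {
        "Comment": "Model evaluation with an approval checkpoint (sketch)",
        "StartAt": "RunABTest",
        "States": {
            "RunABTest": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": ab_test_fn, "Payload.$": "$"},
                "ResultPath": "$.ab_test",
                "Next": "ApprovalCheckpoint",
            },
            # Execution pauses here until the task token is sent back
            # via SendTaskSuccess or SendTaskFailure.
            "ApprovalCheckpoint": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
                "Parameters": {
                    "FunctionName": approval_fn,
                    "Payload": {
                        "taskToken.$": "$$.Task.Token",
                        "results.$": "$.ab_test",
                    },
                },
                "TimeoutSeconds": 86400,  # assumed one-day limit
                "ResultPath": "$.approval",
                "Next": "ApprovalDecision",
            },
            # Assumes the callback output contains an "approved" boolean.
            "ApprovalDecision": {
                "Type": "Choice",
                "Choices": [
                    {
                        "Variable": "$.approval.approved",
                        "BooleanEquals": True,
                        "Next": "GenerateReport",
                    }
                ],
                "Default": "Rejected",
            },
            "GenerateReport": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": report_fn, "Payload.$": "$"},
                "End": True,
            },
            "Rejected": {
                "Type": "Fail",
                "Error": "ApprovalRejected",
                "Cause": "Approver did not sign off on the A/B results",
            },
        },
    }


def undefined_targets(definition: Dict[str, Any]) -> Set[str]:
    """Return transition targets that are not defined as states."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                targets.add(state[key])
        for choice in state.get("Choices", []):
            targets.add(choice["Next"])
    return targets - set(states)


# Hypothetical Lambda function names.
definition = build_evaluation_workflow(
    "ab-test-runner", "request-approval", "report-writer"
)
print(json.dumps(definition, indent=2))
```

The printed JSON is what you would supply as the state machine definition. The workflow cannot reach `GenerateReport` unless the approval task returns an explicit approval, which is how the checkpoint enforces governance between the A/B test and the final report.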
I hope this clears up the reasoning behind the order! If you have any more questions or need further clarification, please feel free to reach out.
Best regards,
Nikee @ Tutorials Dojo