Question 1 of 24
A company needs to load streaming data directly into a data store to analyze the price movements of various financial products. Occasionally, the data will be modified using SQL for custom processing. A Data Analyst has been instructed to create a solution that can aggregate data, run complex analytic queries, and publish the results to an interactive dashboard.
Which of the following is the MOST suitable solution that the Data Analyst should implement for this scenario?
Question 2 of 24
A US-based software company has a hybrid cloud architecture. Its Data Analyst plans to use Amazon QuickSight to create and publish interactive dashboards that will be accessed by their on-premises users. The company uses Amazon S3 and Amazon Redshift as its data lake and data warehouse respectively. The manager instructed the Data Analyst to ensure secure access from the on-premises Active Directory to Amazon QuickSight.
Which of the following must be done to meet the above requirement?
Question 3 of 24
A startup is using Apache HBase on Amazon EMR with a single master node to process its mission-critical workloads. The data stored in its Hadoop Distributed File System (HDFS) is over 8 TB. The Data Analyst has been instructed to design a solution that provides the highest level of availability for the EMR cluster.
Which is the MOST cost-effective solution that the Data Analyst should implement to fulfill this requirement?
Question 4 of 24
A company hosts its web application in an Auto Scaling group of Amazon EC2 instances. The data analytics team needs to create a solution that will collect and analyze the logs from all of the EC2 instances running in production. The solution must be highly accessible and must allow viewing of new log information in near real time.
Which of the following is the most suitable solution to meet the requirement?
Question 5 of 24
A company is using AWS Glue to perform ETL jobs on a 120 GB dataset. The created job is triggered to run with a Standard worker type. A Data Analyst noticed that the job was still running after 2 hours, and there were no errors found in the logs. Three hours later, the ETL job finally completed the processing. The Data Analyst needs to improve the job execution time in AWS Glue.
Which of the following should be implemented to achieve this requirement?
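For reference only (not part of the question): worker type and worker count determine how much compute a Spark-based Glue job gets. Below is a minimal boto3 sketch of creating a job with a larger worker configuration; the job name, IAM role, and script location are hypothetical.
```python
import boto3

glue = boto3.client("glue")

# Hypothetical job definition; a larger worker type (e.g., G.1X/G.2X) and more
# workers increase the capacity available to a Spark-based ETL job.
glue.create_job(
    Name="etl-financial-data",                       # hypothetical name
    Role="arn:aws:iam::123456789012:role/GlueRole",  # hypothetical role
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://example-bucket/scripts/etl.py",
        "PythonVersion": "3",
    },
    GlueVersion="3.0",
    WorkerType="G.1X",
    NumberOfWorkers=10,
)
```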
Question 6 of 24
A company is using Amazon S3 to store financial data in CSV format. An AWS Glue crawler is used to populate the AWS Glue Data Catalog and create the tables and schema. The Data Analyst launched an AWS Glue job that processes the data from the tables and writes it to Amazon Redshift tables. After running several jobs, the Data Analyst noticed that duplicate records exist in the Amazon Redshift table. The analyst needs to ensure that the Redshift table does not contain any duplicates when jobs are rerun.
Which of the following is the best approach to satisfy this requirement?
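For reference only: AWS Glue job bookmarks are one mechanism for tracking data that was already processed in previous runs. The sketch below shows the skeleton of a Glue ETL script that uses a transformation_ctx so bookmarks can do that tracking; it assumes the job is created with --job-bookmark-option=job-bookmark-enable, and the database and table names are hypothetical.
```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Minimal Glue ETL skeleton; assumes the job itself was created with the
# argument --job-bookmark-option=job-bookmark-enable.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# transformation_ctx is what job bookmarks use to remember which source
# objects have already been processed on previous runs.
source = glue_context.create_dynamic_frame.from_catalog(
    database="finance_db",        # hypothetical Data Catalog database
    table_name="financial_csv",   # hypothetical table created by the crawler
    transformation_ctx="source",
)

# ... transform and write to Amazon Redshift here ...

job.commit()  # commits the bookmark state for the next run
```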
Question 7 of 24
A company launched a streaming application that reads hundreds of shards from Amazon Kinesis Data Streams and then directly stores the results in an Amazon S3 bucket every 15 seconds. The data is then analyzed by the data analytics team using Amazon Athena. The team noticed that the query performance in Athena degrades over time.
How can the data analytics team improve the performance of Amazon Athena in the most cost-effective way?
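For reference only: frequent small writes tend to produce many small objects, a common cause of Athena slowdowns. One general optimization pattern is compacting data into larger, columnar, partitioned files. The sketch below runs an Athena CTAS query via boto3 that rewrites data as partitioned Parquet; the database, tables, columns, and bucket names are hypothetical.
```python
import boto3

athena = boto3.client("athena")

# Hypothetical CTAS statement that compacts raw data into partitioned Parquet.
ctas = """
CREATE TABLE analytics.trades_parquet
WITH (
    format = 'PARQUET',
    external_location = 's3://example-bucket/compacted/trades/',
    partitioned_by = ARRAY['trade_date']
) AS
SELECT symbol, price, volume, trade_date
FROM analytics.trades_raw
"""

athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
```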
Question 8 of 24
A company has a clickstream analytics solution using Amazon OpenSearch Service. The solution ingests 2 TB of data from Amazon Kinesis Data Firehose and stores the latest data collected within 24 hours in an Amazon OpenSearch cluster. The cluster is running on a single index that has 12 data nodes and 3 dedicated master nodes. The cluster is configured with 3,000 shards, and each node has 3 TB of EBS storage attached. The Data Analyst noticed that the query performance of OpenSearch is sluggish, and some intermittent errors are produced by Kinesis Data Firehose when it tries to write to the index. Upon further investigation, occasional JVMMemoryPressure errors were found in the Amazon OpenSearch logs.
What should be done to improve the performance of the Amazon OpenSearch Service cluster?
Question 9 of 24
A company collects sensor data from hundreds of IoT devices and stores 1 TB of data in an Amazon Redshift cluster every day. Based on the ingestion rate, the cluster will run out of storage capacity after a few months. The majority of the queries only use the most recent 12 months of data. The other queries are for quarterly reports that use historical data generated over the past 5 years. The data analytics team has been tasked to develop a long-term solution that is cost-effective and requires minimal administrative effort.
Which of the following is the most suitable solution to meet the requirement?
Question 10 of 24
A company is using Amazon Kinesis Data Firehose to buffer incoming data before delivering it to Amazon S3. To update the schema in the AWS Glue Data Catalog, a Data Analyst created a scheduled AWS Glue crawler that runs every 4 hours. The data will be analyzed using Amazon EMR and Spark SQL, with the AWS Glue Data Catalog integrated as its metastore. The Data Analyst noticed that the data retrieved is sometimes stale.
What must be done to fetch the most recent data consistently?
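For reference only: outside of its fixed schedule, a Glue crawler can also be started on demand and polled until it finishes, so the catalog is refreshed right before querying. A minimal boto3 sketch is below; the crawler name is hypothetical.
```python
import time

import boto3

glue = boto3.client("glue")
CRAWLER_NAME = "firehose-delivery-crawler"  # hypothetical crawler name

# Kick off the crawler on demand instead of waiting for the next 4-hour run.
glue.start_crawler(Name=CRAWLER_NAME)

# Poll until the crawler returns to the READY state (no timeout for brevity).
while True:
    state = glue.get_crawler(Name=CRAWLER_NAME)["Crawler"]["State"]
    if state == "READY":
        break
    time.sleep(30)
```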
Question 11 of 24
A company provides insights into user behaviors of its social media platform using Amazon Athena. Data Analysts from different teams run ad-hoc queries on the data stored in Amazon S3 buckets. However, some of the data contains sensitive information that must adhere to certain security policies. The query history and execution must be separated among different users and teams for compliance purposes.
Which of the following should be implemented to meet the above requirements?
Question 12 of 24
A company runs multiple Apache Spark jobs using Amazon EMR. Each job extracts and analyzes data from a Hadoop Distributed File System (HDFS) and then writes the results to an Amazon S3 bucket. However, some of the jobs fail with an HTTP 503 "Slow Down" AmazonS3Exception.
Which methods could be used to rectify the error? (SELECT TWO)
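For reference only: an HTTP 503 "Slow Down" response indicates S3 request throttling, and one general mitigation pattern is to have EMRFS retry throttled requests more aggressively. The sketch below creates an EMR cluster with a higher EMRFS retry limit via the emrfs-site classification; the cluster name, release label, instance settings, and roles are hypothetical.
```python
import boto3

emr = boto3.client("emr")

# Hypothetical cluster definition; the emrfs-site classification raises the
# EMRFS retry limit so transient S3 throttling (HTTP 503 Slow Down) is retried.
emr.run_job_flow(
    Name="spark-etl-cluster",  # hypothetical name
    ReleaseLabel="emr-6.9.0",
    Applications=[{"Name": "Spark"}],
    Configurations=[
        {
            "Classification": "emrfs-site",
            "Properties": {"fs.s3.maxRetries": "20"},
        }
    ],
    Instances={
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
        "KeepJobFlowAliveWhenNoSteps": True,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
```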
Question 13 of 24
An online grocery store sends order information to a Kinesis data stream for third-parties to analyze. The schema and format of the data received by each consumer may vary due to their individual requirements and preferences. The company wants to efficiently manage and control schema changes across its entire data stream.
Which option would be most effective in fulfilling these requirements?
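For reference only: AWS Glue Schema Registry is a service commonly used to centrally manage and evolve schemas for streaming data. The sketch below creates a registry and registers an Avro schema with a compatibility mode via boto3; the registry, schema, and field names are hypothetical.
```python
import json

import boto3

glue = boto3.client("glue")

# Hypothetical registry for order events.
glue.create_registry(RegistryName="order-events")

# Hypothetical Avro schema with BACKWARD compatibility, so consumers can keep
# reading data written with older schema versions as the schema evolves.
order_schema = {
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "string"},
        {"name": "customer_id", "type": "string"},
        {"name": "total", "type": "double"},
    ],
}

glue.create_schema(
    RegistryId={"RegistryName": "order-events"},
    SchemaName="order",
    DataFormat="AVRO",
    Compatibility="BACKWARD",
    SchemaDefinition=json.dumps(order_schema),
)
```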
Question 14 of 24
A Smart City initiative has deployed IoT sensors throughout the city to collect data on various aspects of the environment, such as air and water quality, traffic, and weather. The data is periodically sent to an Amazon Managed Streaming for Apache Kafka (MSK) cluster, where it’s processed by multiple consumers and then persisted in Amazon Redshift for analysis. The majority of queries focus on data that has been gathered in the past week.
As the data accumulates over time, the query performance of the Redshift cluster starts to degrade, and maintenance tasks such as vacuuming tables, running backups, and data archiving take longer to complete.
Which solution would address the problem most cost-effectively?
Question 15 of 24
A group of medical researchers is using computer simulations in studying the growth of cancer cells. The simulations generate millions of data points that are partitioned and stored in Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA). The analytics processing for the data is performed on Amazon EMR clusters using EMRFS with consistent view enabled. The researchers noticed that the overall performance of the cluster can’t keep up with the increasing number of concurrent queries and analytics jobs running on it. It has been determined that the EMR task nodes are taking longer to list objects in Amazon S3.
Which of the following actions will most likely increase the performance of the cluster in reading Amazon S3 objects?
Question 16 of 24
A company has multiple data analytics teams, each running its own Amazon EMR cluster. Each team has its own metadata for running different SQL queries using Hive. A centralized metadata layer must be created that exposes S3 objects as tables that can be used by all teams.
What should be done to fulfill this requirement?
Question 17 of 24
A Security Analyst uses AWS Web Application Firewall (WAF) to protect a web application hosted on an EC2 instance from common web exploits. The AWS WAF sends web ACL traffic logs to an Amazon Kinesis Data Firehose delivery stream for format conversion and uses an Amazon S3 bucket to store the processed logs.
The analyst is looking for a cost-efficient solution to perform infrequent log analysis and data visualizations with minimal development effort.
Which approach best fits the requirements?
Question 18 of 24
A logistics company is looking to improve the efficiency of its delivery operations. To achieve this, the company plans to build an operational intelligence dashboard that gives situational awareness to its operations team in near-real time.
The company has equipped each delivery truck with GPS devices that transmit location data every few seconds. The company will use Amazon Redshift as its data warehouse and Grafana to develop interactive dashboards. The operations team must be notified if specific threshold values are met within the dashboard.
Which solution can fulfill the requirements while minimizing the latency and the processing time?
Question 19 of 24
A company runs a Reserved Amazon EC2 instance to process ETL jobs before sending the results to an Amazon Redshift cluster. Because of scaling issues, the company eventually replaced the EC2 instance with AWS Glue and Amazon S3. Since the architecture has changed, the Data Analyst must also make the necessary changes in the workflow. Part of the new process is to save the Redshift query results to external storage for occasional analysis.
Which of the following methods is the most cost-efficient solution for the new process?
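For reference only: Amazon Redshift's UNLOAD command can export query results to Amazon S3. The sketch below issues an UNLOAD through the Redshift Data API with boto3; the cluster identifier, database, user, table, S3 path, and IAM role are hypothetical.
```python
import boto3

rsd = boto3.client("redshift-data")

# Hypothetical UNLOAD statement that exports query results to S3 as Parquet.
unload_sql = """
UNLOAD ('SELECT * FROM sales WHERE sale_date >= ''2023-01-01''')
TO 's3://example-bucket/unload/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
FORMAT AS PARQUET;
"""

rsd.execute_statement(
    ClusterIdentifier="analytics-cluster",  # hypothetical cluster
    Database="dev",
    DbUser="analyst",
    Sql=unload_sql,
)
```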
Question 20 of 24
A fitness company collects wearable device data from millions of users, accumulating 15 TB of JSON data in Amazon S3 every month. The company wants to use Amazon Athena for ad-hoc, complex analyses in identifying health-related patterns.
Which approach would enable the company to optimize query execution speed and storage use?
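For reference only: converting raw JSON into compressed, partitioned, columnar files is a common way to reduce both Athena scan costs and S3 storage. The PySpark sketch below (for example, run on Glue or EMR) rewrites JSON as date-partitioned Parquet; the S3 paths and the event_timestamp column are hypothetical.
```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("json-to-parquet").getOrCreate()

# Hypothetical paths and column names: read raw JSON device data and rewrite it
# as partitioned Parquet so Athena scans far less data per query.
raw = spark.read.json("s3://example-bucket/raw/wearables/")

(
    raw.withColumn("event_date", F.to_date("event_timestamp"))
       .repartition("event_date")
       .write.mode("overwrite")
       .partitionBy("event_date")
       .parquet("s3://example-bucket/curated/wearables/")
)
```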
Question 21 of 24
A company uses a 3-node Amazon OpenSearch Service cluster for its search application. They’ve set up multiple CloudWatch alarms to invoke a Lambda function for scaling activities based on cluster health metrics. When these scaling activities are triggered, a blue/green deployment is initiated. This results in temporary spikes in cluster load and increases the latency for search and indexing operations. As a consequence, users face service disruptions during these periods.
How can the company improve the cluster’s stability?
Question 22 of 24
A company maintains a record of accounts linked to suspicious activities over time. Recently, new registrations with subtle variations in names and emails from the flagged data have been observed. Given that their security system is built for exact matches, detecting discrepancies in data becomes challenging. To address this, the company wants to build a fraud detection system that can identify when a newly registered account potentially matches a known flagged user.
Which solution would meet the company’s objective in the most cost-effective manner?
Question 23 of 24
A marketing agency conducts daily sales campaigns and subsequently stores the ad performance data in its Amazon S3 data lake. The agency plans to create a system that will enable analysts to conduct visual data profiling so they can understand the correlations between different data quality metrics.
Which set of actions can fulfill the requirements with the least amount of management overhead?
Question 24 of 24
A streaming company runs various microservices for handling tasks like content recommendation, user profile management, or feedback collection. Each microservice consumes TLS-encrypted streaming data from a designated Amazon Managed Streaming for Apache Kafka (Amazon MSK) topic.
However, after deploying a new recommendation algorithm, the content recommendation microservice began receiving data intended for feedback collection, leading to inconsistencies in user experience and skewed analytics.
Which solution would enable the company to guarantee that each microservice can only access data from its assigned topic?