hiexam
amazon · AWS-Certified-Machine-Learning-Engineer---Associate-MLA-C01 · Q425 · multiple_choice · topic_1

Case study - An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3. The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data. Which AWS service or feature can aggregate the data from the various data sources?
  • A. Amazon EMR Spark jobs
  • B. Amazon Kinesis Data Streams
  • C. Amazon DynamoDB
  • D. AWS Lake Formation
Explanation
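Selected Answer: A. Amazon EMR with Spark is an excellent choice for aggregating, processing, and transforming large datasets from multiple sources, such as the transaction logs and customer profiles in Amazon S3 and the tables in the on-premises MySQL database (which Spark can read through its JDBC data source, provided the cluster has network connectivity to the database). Spark jobs can handle both structured and unstructured data. Lake Formation is great for managing data lakes, but it centers on data lake setup, cataloging, and access control rather than on the ETL and data processing needed to aggregate and transform datasets from multiple sources.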
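As a rough, local illustration of what "aggregating the data" means here, the sketch below joins transaction logs and customer profiles (hypothetical in-memory records standing in for the S3 objects) with a relational table (an in-memory `sqlite3` database standing in for the on-premises MySQL instance), keyed on a shared customer ID. It is not Spark code; all table names, column names, and sample rows are hypothetical assumptions chosen only to show the join-and-aggregate pattern.

```python
import sqlite3
from collections import defaultdict

# Hypothetical records standing in for the transaction logs stored in Amazon S3.
TRANSACTIONS = [
    {"txn_id": 1, "customer_id": "c1", "amount": 120.0},
    {"txn_id": 2, "customer_id": "c2", "amount": 15.5},
    {"txn_id": 3, "customer_id": "c1", "amount": 980.0},
]

# Hypothetical customer profiles, also standing in for S3 objects.
PROFILES = {
    "c1": {"segment": "retail"},
    "c2": {"segment": "business"},
}


def load_mysql_standin() -> sqlite3.Connection:
    """Create an in-memory SQLite table standing in for the on-premises MySQL table.

    The `accounts` table and its columns are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (customer_id TEXT PRIMARY KEY, chargebacks INTEGER)"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("c1", 2), ("c2", 0)])
    return conn


def aggregate(transactions, profiles, conn):
    """Join the three sources on customer_id and compute per-customer features."""
    # Load the relational table into a dict keyed on the join column.
    accounts = dict(conn.execute("SELECT customer_id, chargebacks FROM accounts"))

    # Group the transaction logs by customer (the "aggregate" step).
    totals = defaultdict(lambda: {"txn_count": 0, "total_amount": 0.0})
    for txn in transactions:
        agg = totals[txn["customer_id"]]
        agg["txn_count"] += 1
        agg["total_amount"] += txn["amount"]

    # Join the grouped logs with profiles and the relational table.
    rows = []
    for customer_id, agg in sorted(totals.items()):
        rows.append({
            "customer_id": customer_id,
            **agg,
            "segment": profiles.get(customer_id, {}).get("segment"),
            "chargebacks": accounts.get(customer_id),
        })
    return rows


if __name__ == "__main__":
    for row in aggregate(TRANSACTIONS, PROFILES, load_mysql_standin()):
        print(row)
```

On EMR, the same pattern would typically be written with Spark DataFrames: reading the S3 data with `spark.read` (for example `spark.read.parquet(...)`), reading the MySQL tables with `spark.read.format("jdbc")`, and combining them with `groupBy` and `join`. Spark distributes that work across the cluster instead of running it in a single process.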

Reference: examtopics_top_comment