# AWS-Certified-Machine-Learning-Engineer---Associate-MLA-C01 — Question 425

**Type:** multiple_choice
**Topics:** topic_1

## Question

Case study -
An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.
The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.
Which AWS service or feature can aggregate the data from the various data sources?

## Correct Answer

_See scenario._

## Explanation

Selected Answer: A
Amazon EMR with Spark is a good fit for aggregating, processing, and transforming large datasets from multiple sources (e.g., Amazon S3 and an on-premises MySQL database). Spark jobs can handle both structured and unstructured data. While Lake Formation is great for managing data lakes, it doesn’t provide the ETL and data processing capabilities required to aggregate and transform datasets from multiple sources.
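
The aggregation described above can be sketched as a PySpark job. This is a minimal illustration, not the exam's reference solution. The helper names (`build_jdbc_options`, `aggregate_sources`), the `customer_id` join key, and the file formats (JSON logs, Parquet profiles) are all assumptions; the scenario does not specify them. It also assumes the EMR cluster has network access to the on-premises database and that the MySQL Connector/J JAR is on the Spark classpath.

```python
# Sketch only: paths, table names, column names, and formats are hypothetical.

def build_jdbc_options(host, port, database, table, user, password):
    """Build the option dict for Spark's generic JDBC data source."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # Driver class shipped with MySQL Connector/J 8.x (assumed to be installed).
        "driver": "com.mysql.cj.jdbc.Driver",
    }


def aggregate_sources(spark, logs_path, profiles_path, jdbc_options):
    """Join S3 transaction logs, S3 customer profiles, and a MySQL table.

    `spark` is an existing pyspark.sql.SparkSession, e.g. one created on EMR.
    """
    # Assumed formats: JSON for logs, Parquet for profiles.
    logs = spark.read.json(logs_path)
    profiles = spark.read.parquet(profiles_path)

    # Read the on-premises MySQL table through the JDBC data source.
    accounts = spark.read.format("jdbc").options(**jdbc_options).load()

    # Assumed shared key: every source carries a `customer_id` column.
    return (
        logs.join(profiles, on="customer_id", how="left")
            .join(accounts, on="customer_id", how="left")
    )
```

On a cluster, `aggregate_sources` would be called with a `SparkSession` and S3 URIs for the two paths, and the joined DataFrame could then be written back to S3 for training. In practice the database password would come from a secrets store rather than a literal argument.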

**Reference:** examtopics_top_comment

---
Source: https://hiexam.net/q/amazon/AWS-Certified-Machine-Learning-Engineer---Associate-MLA-C01/425  