amazon · AWS-Certified-Big-Data---Specialty · Q424 · multiple_choice · topic_1

A data engineer in a manufacturing company is designing a data processing platform that receives a large volume of unstructured data. The data engineer must populate a well-structured star schema in Amazon Redshift. What is the most efficient architecture strategy for this purpose?
  • A. Transform the unstructured data using Amazon EMR and generate CSV data. COPY the CSV data into the analysis schema within Redshift.
  • B. Load the unstructured data into Redshift, and use string parsing functions to extract structured data for inserting into the analysis schema.
  • C. When the data is saved to Amazon S3, use S3 Event Notifications and AWS Lambda to transform the file contents. Insert the data into the analysis schema on Redshift.
  • D. Normalize the data using an AWS Marketplace ETL tool, persist the results to Amazon S3, and use AWS Lambda to INSERT the data into Redshift.
Explanation
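A is correct: transform the data outside Redshift with Amazon EMR, write CSV, and bulk-load it with COPY.
  • Not B: never load unstructured data into Redshift.
  • Not C: S3 Event Notifications plus Lambda are better suited to incremental, continuous S3-to-Redshift integration. Here we have one large bulk load, so event notifications don't make sense, and Lambda may not be able to handle the whole transformation in one invocation because of service limits.
  • Not D: read in the statistical sense, normalization means rescaling values, usually by subtracting the mean and dividing by the standard deviation, which doesn't make sense here. Read in the database sense it does not fit either, because a star schema is deliberately denormalized, and loading rows through Lambda INSERT statements is much slower than a bulk COPY.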
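The flow in option A can be sketched in two steps: turn raw records into CSV, then issue one COPY. The snippet below is only an illustration under stated assumptions. The log-line format, the table name, the bucket and the IAM role ARN are all hypothetical. The plain-Python parser stands in for whatever job would actually run on EMR.

```python
import csv
import io
import re

# Hypothetical raw format (an assumption, not from the question), e.g.:
# "2021-03-04 12:00:01 machine=press-7 temp=81.5C status=OK"
LINE_PATTERN = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"machine=(?P<machine>\S+)\s+"
    r"temp=(?P<temp>[\d.]+)C\s+"
    r"status=(?P<status>\w+)"
)


def lines_to_csv(lines):
    """Extract structured fields from raw lines; lines that do not match are skipped."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in lines:
        match = LINE_PATTERN.search(line)
        if match:
            writer.writerow(
                [match["ts"], match["machine"], float(match["temp"]), match["status"]]
            )
    return buffer.getvalue()


def build_copy_statement(table, s3_prefix, iam_role_arn):
    """Build a Redshift COPY statement that bulk-loads CSV files from an S3 prefix."""
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' FORMAT AS CSV;"
    )
```

In practice the CSV output would be written to S3 in several files so that COPY can load them in parallel, and the statement would be run through a SQL client connected to the cluster.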

Reference: examtopics_top_comment
