Exam AWS Certified Machine Learning - Specialty topic 1 question 212 discussion

A data scientist has 20 TB of data in CSV format in an Amazon S3 bucket. The data scientist needs to convert the data to Apache Parquet format.

How can the data scientist convert the file format with the LEAST amount of effort?

  • A. Use an AWS Glue crawler to convert the file format.
  • B. Write a script to convert the file format. Run the script as an AWS Glue job.
  • C. Write a script to convert the file format. Run the script on an Amazon EMR cluster.
  • D. Write a script to convert the file format. Run the script in an Amazon SageMaker notebook.
Suggested Answer: B

Comments

jopaca1216
9 months, 2 weeks ago
B is right. It is very simple to create a file-conversion job in AWS Glue using just three workflow steps, with no code to write: Glue generates the script automatically (Scala or Python). (S3 source data file) --> (data mapping) --> (target transformed data file)
upvoted 1 times
loict
9 months, 3 weeks ago
Selected Answer: B
A. NO - A crawler only populates the Data Catalog.
B. YES - Leverages serverless, distributed processing.
C. NO - Although EMR can run Spark like Glue, it is not serverless.
D. NO - Using the PySpark kernel will be single-instance (running in the notebook).
upvoted 1 times
Mickey321
10 months, 2 weeks ago
Selected Answer: B
Option B is better than option A because option A uses an AWS Glue crawler to convert the file format. A crawler is a component of AWS Glue that scans your data sources and infers the schema, format, partitioning, and other properties of your data. A crawler can create or update a table in the AWS Glue Data Catalog that points to your data source. However, a crawler cannot change the format of your data source itself. You still need to write a script or use a tool to convert your CSV files to Parquet files.
upvoted 2 times
GiyeonShin
1 year, 4 months ago
Selected Answer: B
Option B.
A - A Glue crawler creates a Glue Data Catalog from S3 buckets, which Athena can then query. It does not convert the data.
C, D - Not serverless and not generally used for ETL.
upvoted 2 times
AjoseO
1 year, 4 months ago
Selected Answer: B
AWS Glue is a fully-managed ETL service that makes it easy to move data between data stores. AWS Glue can be used to automate the conversion of CSV files to Parquet format with minimal effort. AWS Glue supports reading data from CSV files, transforming the data, and writing the transformed data to Parquet files. Option A is incorrect because AWS Glue crawler is used to infer the schema of data stored in S3 and create AWS Glue Data Catalog tables. Option C is incorrect because while Amazon EMR can be used to process large amounts of data and perform data conversions, it requires more operational effort than AWS Glue. Option D is incorrect because Amazon SageMaker is a machine learning service, and while it can be used for data processing, it is not the best option for simple data format conversion tasks.
upvoted 2 times
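
For anyone wondering what the "script" in option B might look like, here is a minimal sketch of a Glue PySpark job that reads CSV from S3 and writes Parquet back. The job parameters `source_path` and `target_path` are hypothetical names, the CSV files are assumed to have a header row, and the final guard assumes Glue passes `--JOB_NAME` when it launches the script (as in the scripts Glue generates). Treat it as an illustration, not a tuned job for 20 TB.

```python
import sys


def convert_csv_to_parquet(glue_context, source_path, target_path):
    """Read CSV files under source_path and write them back as Parquet."""
    frame = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [source_path], "recurse": True},
        format="csv",
        format_options={"withHeader": True},  # assumption: files have a header row
    )
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": target_path},
        format="parquet",
    )


def main():
    # These modules are only available inside the Glue job runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # source_path / target_path are hypothetical job parameters,
    # passed to the job as --source_path and --target_path.
    args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_path", "target_path"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)
    convert_csv_to_parquet(glue_context, args["source_path"], args["target_path"])
    job.commit()


# Only run the job when launched with Glue-style arguments.
if __name__ == "__main__" and "--JOB_NAME" in sys.argv:
    main()
```

Glue provisions and scales the Spark workers, so there is no cluster to manage, which is the "least effort" argument against options C and D.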
drcok87
1 year, 4 months ago
In a SageMaker notebook, you'd have to write Python code, but the question is asking for something easy, so I chose option B.
https://blog.searce.com/convert-csv-json-files-to-apache-parquet-using-aws-glue-a760d177b45f
upvoted 2 times
Jerry84
1 year, 4 months ago
From your link, A (Glue crawler) should be correct.
upvoted 1 times
drcok87
1 year, 4 months ago
A crawler just creates the Data Catalog (schema); it does not actually convert the data to another format. As described in that article, you create a job whose source is the schema created by the crawler and whose destination is the output S3 location where the converted data is stored.
upvoted 3 times
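
To make the crawler/job split concrete, here is a rough sketch of the conversion step when the source is a Data Catalog table that a crawler has already created. The function name and its `database`, `table_name`, and `target_path` arguments are hypothetical; `glue_context` is assumed to be a `GlueContext` built inside a Glue job.

```python
def convert_catalog_table(glue_context, database, table_name, target_path):
    """Convert a crawler-created catalog table to Parquet files in S3."""
    # The crawler only registered the schema; the data is still CSV in S3.
    frame = glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
    )
    # The job, not the crawler, performs the actual format conversion.
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": target_path},
        format="parquet",
    )
    return frame
```

So the crawler contributes the table definition, and the job is the part that rewrites the data.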