B is right.
It is very simple to create a file-conversion job in AWS Glue using just three workflow steps, with no code required; the script is generated automatically by Glue (in Scala or Python):
(S3 source data file) --> (data mapping) --> (target transformed data file)
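For reference, here is a minimal sketch of the kind of PySpark script Glue auto-generates for this workflow; the bucket paths and column mappings are hypothetical placeholders:

import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Step 1: source - read the CSV files from S3 (hypothetical path)
source = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-bucket/csv-input/"]},
    format="csv",
    format_options={"withHeader": True},
)

# Step 2: data mapping - rename/retype columns as needed (example mappings)
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("id", "string", "id", "long"), ("name", "string", "name", "string")],
)

# Step 3: target - write the transformed data back to S3 as Parquet (hypothetical path)
glueContext.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/parquet-output/"},
    format="parquet",
)

job.commit()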
A. NO - a crawler is for populating the Data Catalog, not for converting data
B. YES - leverage serverless Glue for distributed processing
C. NO - although EMR can run Spark like Glue, it is not serverless
D. NO - using the PySpark kernel in a notebook runs on a single instance (inside the notebook)
Option B is better than option A because option A relies on an AWS Glue crawler to convert the file format. A crawler is a component of AWS Glue that scans your data sources and infers the schema, format, partitioning, and other properties of your data. A crawler can create or update a table in the AWS Glue Data Catalog that points to your data source, but it cannot change the format of the data source itself. You still need to write a script or use a tool to convert your CSV files to Parquet files.
Option B.
A - A Glue crawler creates Glue Data Catalog tables from S3 buckets, which can then be queried with Athena.
C, D - not serverless and not generally used for ETL.
AWS Glue is a fully managed ETL service that makes it easy to move data between data stores. AWS Glue can automate the conversion of CSV files to Parquet format with minimal effort: it supports reading data from CSV files, transforming the data, and writing the transformed data to Parquet files.
Option A is incorrect because the AWS Glue crawler is used to infer the schema of data stored in S3 and create AWS Glue Data Catalog tables, not to convert data.
Option C is incorrect because while Amazon EMR can be used to process large amounts of data and perform data conversions, it requires more operational effort than AWS Glue.
Option D is incorrect because Amazon SageMaker is a machine learning service, and while it can be used for data processing, it is not the best option for simple data format conversion tasks.
In a SageMaker notebook you'd have to write the Python code yourself, but the question asks for something easy, so I choose option B: https://blog.searce.com/convert-csv-json-files-to-apache-parquet-using-aws-glue-a760d177b45f
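For comparison, the notebook route means hand-writing something like the sketch below, and it all runs on the single notebook instance. Paths are hypothetical, and reading/writing S3 paths with pandas assumes s3fs and pyarrow are installed:

import pandas as pd

# Manual, single-instance conversion in a SageMaker notebook (hypothetical paths)
df = pd.read_csv("s3://my-bucket/csv-input/data.csv")
df.to_parquet("s3://my-bucket/parquet-output/data.parquet")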
The crawler just creates the Data Catalog entry (the schema); it does not actually convert the data to another format. As described in that article, you create a job whose source is the schema created by the crawler and whose destination is the output S3 location where the formatted data is stored.
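A sketch of that crawler-plus-job pattern, reusing the glueContext setup from the job script above; the database and table names are hypothetical, matching whatever the crawler created:

# Source: the Data Catalog table the crawler created (schema only)
source = glueContext.create_dynamic_frame.from_catalog(
    database="my_database",   # hypothetical catalog database
    table_name="csv_input",   # hypothetical table created by the crawler
)

# Destination: the output S3 location where the Parquet data lands
glueContext.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/parquet-output/"},
    format="parquet",
)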