Exam AWS Certified Data Analytics - Specialty topic 1 question 114 discussion

A company receives data from its vendor in JSON format with a timestamp in the file name. The vendor uploads the data to an Amazon S3 bucket, and the data is registered into the company's data lake for analysis and reporting. The company has configured an S3 Lifecycle policy to archive all files to S3 Glacier after 5 days.
The company wants to ensure that its AWS Glue crawler catalogs data only from S3 Standard storage and ignores the archived files. A data analytics specialist must implement a solution to achieve this goal without changing the current S3 bucket configuration.
Which solution meets these requirements?

  • A. Use the exclude patterns feature of AWS Glue to identify the S3 Glacier files for the crawler to exclude.
  • B. Schedule an automation job that uses AWS Lambda to move files from the original S3 bucket to a new S3 bucket for S3 Glacier storage.
  • C. Use the excludeStorageClasses property in the AWS Glue Data Catalog table to exclude files on S3 Glacier storage.
  • D. Use the include patterns feature of AWS Glue to identify the S3 Standard files for the crawler to include.
Suggested Answer: C

Comments

Kash12345
Highly Voted 3 years, 7 months ago
C - https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-storage-classes.html
upvoted 12 times
...
cloudlearnerhere
Highly Voted 2 years, 8 months ago
C is the right answer. To exclude Amazon S3 storage classes while creating a dynamic frame, use excludeStorageClasses in additionalOptions. AWS Glue automatically uses its own Amazon S3 Lister implementation to list and exclude files in the specified storage classes.

glueContext.create_dynamic_frame.from_catalog(
    database = "my_database",
    table_name = "my_table_name",
    redshift_tmp_dir = "",
    transformation_ctx = "my_transformation_context",
    additional_options = {"excludeStorageClasses": ["GLACIER", "DEEP_ARCHIVE"]}
)

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-storage-classes.html
upvoted 5 times
...
pk349
Most Recent 2 years, 2 months ago
C: I passed the test
upvoted 2 times
...
rocky48
2 years, 6 months ago
Selected Answer: C
Selected C
upvoted 1 times
...
Nubosperta
2 years, 8 months ago
Selected Answer: C
Selected C
upvoted 1 times
...
muhsin
2 years, 10 months ago
C https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-storage-classes.html
upvoted 1 times
...
Dun6
2 years, 10 months ago
C please https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-storage-classes.html#aws-glue-programming-etl-storage-classes-table
upvoted 1 times
...
rudramadhu
2 years, 10 months ago
Selected Answer: C
due to https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-storage-classes.html
upvoted 1 times
...
rudramadhu
2 years, 10 months ago
Go with C - due to https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-storage-classes.html Why is A wrong? Exclude patterns are used to ignore file paths/folders that have already been crawled (or files whose data is not required). Because this reduces the number of files the crawler needs to list on every run, crawler runtime is reduced accordingly.
upvoted 1 times
...
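To illustrate the crawler-level exclude patterns discussed in the comments above, here is a minimal sketch that approximates the glob matching with Python's fnmatch (AWS Glue uses its own matcher, so semantics may differ slightly; the object keys and patterns are hypothetical):

```python
from fnmatch import fnmatch

# Hypothetical object keys under a crawler include path such as s3://my-bucket/vendor/.
keys = [
    "2023-01-01-data.json",
    "2023-01-01-data.json.tmp",
    "archive/2022-12-01-data.json",
]

# Crawler exclude patterns are glob expressions matched against the key,
# approximated here with fnmatch.
exclude_patterns = ["archive/*", "*.tmp"]

def is_excluded(key, patterns):
    return any(fnmatch(key, p) for p in patterns)

# Keys the crawler would still visit after applying the exclusions.
crawled = [k for k in keys if not is_excluded(k, exclude_patterns)]
print(crawled)
```

Note that these patterns match only key names; a lifecycle transition to S3 Glacier does not change an object's key, which is why exclude patterns cannot distinguish storage classes.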
alfredofmt
2 years, 11 months ago
Selected Answer: B
A - WRONG: exclude patterns are configured at the crawler level with static glob patterns; they cannot be parameterized with the current date. To accomplish this, you would need to re-create the crawler every day with a new pattern.
B - CORRECT: the only available way to keep objects out of the crawl is to physically move them to a separate prefix or bucket.
C - WRONG: excludeStorageClasses applies to a Glue ETL job that reads the table, but the table has already been crawled.
D - WRONG: same reason as A.
upvoted 1 times
...
ClementChan
2 years, 11 months ago
Answer is A. The crawler configuration allows an "exclude pattern" only; for includes, it is the "include path". https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html#crawler-data-stores-exclude
upvoted 2 times
...
carbita
2 years, 11 months ago
Guys, please READ. The question is about CRAWLING data, not PROCESSING data with an ETL job. The correct answer is "A". If you want to exclude certain files in your ETL process, you would use the "C" answer.
upvoted 2 times
roymunson
1 year, 7 months ago
Answer C: The crawler can access data stores directly as the source of the crawl, or it can use existing tables in the Data Catalog as the source. If the crawler uses existing catalog tables, it crawls the data stores that are specified by those catalog tables. https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html You can specify storage class exclusions to be used by an AWS Glue ETL job as a table parameter in the AWS Glue Data Catalog. You can include this parameter in the CreateTable operation using the AWS Command Line Interface (AWS CLI) or programmatically using the API. For more information, see Table Structure and CreateTable. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-storage-classes.html
upvoted 1 times
...
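As a sketch of the approach the linked storage-classes page describes (setting the exclusion as a Data Catalog table parameter, which the crawler then honors when it uses the catalog table as its source), the snippet below only builds the TableInput payload; the database/table names and the exact parameter encoding are assumptions based on that page, and the boto3 call itself is shown only in a comment:

```python
import json

# Hypothetical database/table names; the "excludeStorageClasses" table
# parameter is described in the AWS Glue storage-classes documentation.
table_input = {
    "Name": "my_table_name",
    "Parameters": {
        # Assumed encoding: a JSON-encoded list of storage classes.
        "excludeStorageClasses": json.dumps(["GLACIER", "DEEP_ARCHIVE"]),
    },
}

# With boto3, this payload would be applied roughly as (not executed here):
#   boto3.client("glue").update_table(
#       DatabaseName="my_database", TableInput=table_input)
print(table_input["Parameters"]["excludeStorageClasses"])
```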
...
MBP911
2 years, 11 months ago
The link provided says "You can specify storage class exclusions to be used by an ***AWS Glue ETL job*** as a table parameter in the AWS Glue Data Catalog". The question is asking how to make the ***Glue CRAWLER*** avoid files in those classes. The fact that the question mentions a timestamp in the file name suggests there is a filename-based pattern that can be used as an include pattern in the crawler config. So shouldn't it be D?
upvoted 2 times
gopi_data_guy
2 years, 5 months ago
@MBP911, We can also add exclude storage class option in Glue catalog table. So C makes sense. https://docs.amazonaws.cn/en_us/glue/latest/dg/aws-glue-programming-etl-storage-classes.html#aws-glue-programming-etl-storage-classes-table
upvoted 1 times
...
...
tobsam
3 years, 7 months ago
Answer is C. Option A is wrong: exclude patterns enable you to exclude certain files or tables from the crawl, not storage classes.
upvoted 5 times
lakediver
3 years, 6 months ago
Agree C for further reading see this https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-storage-classes.html
upvoted 4 times
...
...
Community vote distribution: A (35%), C (25%), B (20%), Other