Exam AWS Certified Data Analytics - Specialty topic 1 question 114 discussion

A company receives data from its vendor in JSON format with a timestamp in the file name. The vendor uploads the data to an Amazon S3 bucket, and the data is registered into the company's data lake for analysis and reporting. The company has configured an S3 Lifecycle policy to archive all files to S3 Glacier after 5 days.
The company wants to ensure that its AWS Glue crawler catalogs data only from S3 Standard storage and ignores the archived files. A data analytics specialist must implement a solution to achieve this goal without changing the current S3 bucket configuration.
Which solution meets these requirements?

  • A. Use the exclude patterns feature of AWS Glue to identify the S3 Glacier files for the crawler to exclude.
  • B. Schedule an automation job that uses AWS Lambda to move files from the original S3 bucket to a new S3 bucket for S3 Glacier storage.
  • C. Use the excludeStorageClasses property in the AWS Glue Data Catalog table to exclude files on S3 Glacier storage.
  • D. Use the include patterns feature of AWS Glue to identify the S3 Standard files for the crawler to include.
Suggested Answer: C

Comments

Kash12345
Highly Voted 3 years, 7 months ago
C - https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-storage-classes.html
upvoted 12 times
...
cloudlearnerhere
Highly Voted 2 years, 8 months ago
C is the right answer. To exclude Amazon S3 storage classes while creating a dynamic frame, use excludeStorageClasses in additionalOptions. AWS Glue automatically uses its own Amazon S3 Lister implementation to list and exclude files in the specified storage classes.

glueContext.create_dynamic_frame.from_catalog(
    database = "my_database",
    table_name = "my_table_name",
    redshift_tmp_dir = "",
    transformation_ctx = "my_transformation_context",
    additional_options = {"excludeStorageClasses": ["GLACIER", "DEEP_ARCHIVE"]}
)

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-storage-classes.html
upvoted 5 times
...
pk349
Most Recent 2 years, 2 months ago
C: I passed the test
upvoted 2 times
...
rocky48
2 years, 6 months ago
Selected Answer: C
Selected C
upvoted 1 times
...
Nubosperta
2 years, 8 months ago
Selected Answer: C
Selected C
upvoted 1 times
...
muhsin
2 years, 10 months ago
C https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-storage-classes.html
upvoted 1 times
...
Dun6
2 years, 10 months ago
C please https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-storage-classes.html#aws-glue-programming-etl-storage-classes-table
upvoted 1 times
...
rudramadhu
2 years, 10 months ago
Selected Answer: C
due to https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-storage-classes.html
upvoted 1 times
...
rudramadhu
2 years, 10 months ago
Go with C - due to https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-storage-classes.html Why is A wrong? Exclude patterns are used to ignore file paths/folders that have already been crawled (or files whose data is not required). Because this reduces the number of files the crawler needs to list on every run, crawler runtime is reduced accordingly.
upvoted 1 times
...
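To illustrate the crawler-level exclude patterns discussed in the comments above, here is a minimal sketch that approximates the glob matching with Python's fnmatch (AWS Glue uses its own matcher, so semantics may differ slightly; the object keys and patterns are hypothetical):

```python
from fnmatch import fnmatch

# Hypothetical object keys under a crawler include path such as s3://my-bucket/vendor/.
keys = [
    "2023-01-01-data.json",
    "2023-01-01-data.json.tmp",
    "archive/2022-12-01-data.json",
]

# Crawler exclude patterns are glob expressions matched against the key,
# approximated here with fnmatch.
exclude_patterns = ["archive/*", "*.tmp"]

def is_excluded(key, patterns):
    return any(fnmatch(key, p) for p in patterns)

# Keys the crawler would still visit after applying the exclusions.
crawled = [k for k in keys if not is_excluded(k, exclude_patterns)]
print(crawled)
```

Note that these patterns match only key names; a lifecycle transition to S3 Glacier does not change an object's key, which is why exclude patterns cannot distinguish storage classes.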
alfredofmt
2 years, 11 months ago
Selected Answer: B
A - WRONG: exclude patterns are configured at the crawler level with static glob patterns; they cannot be parameterized with the current date. To accomplish this, you would need to re-create the crawler every day with a new pattern.
B - CORRECT: the only available way to keep objects out of the crawl is to physically move them to a separate prefix or bucket.
C - WRONG: excludeStorageClasses applies to a Glue ETL job that reads the table, but the table has already been crawled.
D - WRONG: same reason as A.
upvoted 1 times
...
ClementChan
2 years, 11 months ago
Answer is A. The crawler configuration allows an "exclude pattern" only; for includes, it is the "include path". https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html#crawler-data-stores-exclude
upvoted 2 times
...
carbita
2 years, 11 months ago
Guys, please READ. The question is about CRAWLING data, not PROCESSING data with an ETL job. The correct answer is "A". If you want to exclude certain files in your ETL process, you would use the "C" answer.
upvoted 2 times
roymunson
1 year, 7 months ago
Answer C: The crawler can access data stores directly as the source of the crawl, or it can use existing tables in the Data Catalog as the source. If the crawler uses existing catalog tables, it crawls the data stores that are specified by those catalog tables. https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html You can specify storage class exclusions to be used by an AWS Glue ETL job as a table parameter in the AWS Glue Data Catalog. You can include this parameter in the CreateTable operation using the AWS Command Line Interface (AWS CLI) or programmatically using the API. For more information, see Table Structure and CreateTable. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-storage-classes.html
upvoted 1 times
...
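As a sketch of the approach the linked storage-classes page describes (setting the exclusion as a Data Catalog table parameter, which the crawler then honors when it uses the catalog table as its source), the snippet below only builds the TableInput payload; the database/table names and the exact parameter encoding are assumptions based on that page, and the boto3 call itself is shown only in a comment:

```python
import json

# Hypothetical database/table names; the "excludeStorageClasses" table
# parameter is described in the AWS Glue storage-classes documentation.
table_input = {
    "Name": "my_table_name",
    "Parameters": {
        # Assumed encoding: a JSON-encoded list of storage classes.
        "excludeStorageClasses": json.dumps(["GLACIER", "DEEP_ARCHIVE"]),
    },
}

# With boto3, this payload would be applied roughly as (not executed here):
#   boto3.client("glue").update_table(
#       DatabaseName="my_database", TableInput=table_input)
print(table_input["Parameters"]["excludeStorageClasses"])
```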
...
MBP911
2 years, 11 months ago
The link provided says "You can specify storage class exclusions to be used by an ***AWS Glue ETL job*** as a table parameter in the AWS Glue Data Catalog". The question is asking how to make the ***Glue CRAWLER*** avoid files in those classes. The fact that the question mentions a timestamp in the file name suggests there is a filename-based pattern that can be used as an include pattern in the crawler config. So shouldn't it be D?
upvoted 2 times
gopi_data_guy
2 years, 5 months ago
@MBP911, We can also add exclude storage class option in Glue catalog table. So C makes sense. https://docs.amazonaws.cn/en_us/glue/latest/dg/aws-glue-programming-etl-storage-classes.html#aws-glue-programming-etl-storage-classes-table
upvoted 1 times
...
...
tobsam
3 years, 7 months ago
Answer is C. Option A is wrong: exclude patterns enable you to exclude certain files or tables from the crawl, not storage classes.
upvoted 5 times
lakediver
3 years, 6 months ago
Agree C for further reading see this https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-storage-classes.html
upvoted 4 times
...
...
Community vote distribution: A (35%), C (25%), B (20%), Other