Exam AWS Certified Data Analytics - Specialty topic 1 question 18 discussion

Exam question from Amazon's AWS Certified Data Analytics - Specialty

Question #: 18
Topic #: 1

A company currently uses Amazon Athena to query its global datasets. The regional data is stored in Amazon S3 in the us-east-1 and us-west-2 Regions. The data is not encrypted. To simplify the query process and manage it centrally, the company wants to use Athena in us-west-2 to query data from Amazon S3 in both Regions. The solution should be as low-cost as possible.
What should the company do to achieve this goal?

A. Use AWS DMS to migrate the AWS Glue Data Catalog from us-east-1 to us-west-2. Run Athena queries in us-west-2.
B. Run the AWS Glue crawler in us-west-2 to catalog datasets in all Regions. Once the data is crawled, run Athena queries in us-west-2.
C. Enable cross-Region replication for the S3 buckets in us-east-1 to replicate data in us-west-2. Once the data is replicated in us-west-2, run the AWS Glue crawler there to update the AWS Glue Data Catalog in us-west-2 and run Athena queries.
D. Update AWS Glue resource policies to provide us-east-1 AWS Glue Data Catalog access to us-west-2. Once the catalog in us-west-2 has access to the catalog in us-east-1, run Athena queries in us-west-2.

Suggested Answer: B 🗳️

by zanhsieh at Aug. 16, 2020, 2:25 p.m.

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.
- Trademarks, certification & product names are used for reference only and belong to Amazon.

Comments

zanhsieh

Highly Voted 3 years, 8 months ago

B. AWS DMS is not for this purpose, so A dropped. C would be costly since it literally replicates all data. There’s no “resource policies” in AWS Glue, so D dropped.

upvoted 30 times

Huy

3 years, 7 months ago

I agree with you that D is wrong, but my point is that you shouldn't base the answer on a feature supposedly being unavailable for the service. Instead, think carefully about what the answer actually suggests. https://docs.aws.amazon.com/glue/latest/dg/glue-resource-policies.html Here, the answer wants AWS Glue to use a Data Catalog from a different Region, which is not supported.

upvoted 2 times

...

certificationJunkie

3 years ago

The Glue crawler will simply generate the metadata on top of the S3 files. But Athena running in another Region will still not have access to the first Region's files. Also, the Glue crawler might not even have permission to crawl S3 files in another Region. Hence replication is the only option.

upvoted 2 times

certificationJunkie

3 years ago

No, a Glue crawler is not restricted to a Region and can catalog data in other Regions. Athena can then use the catalog and generate results. I have seen this work in my project.

upvoted 5 times

...
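As a rough illustration of the point above (Athena in us-west-2 reading tables that a crawler registered for buckets in both Regions), here is a minimal sketch. It only builds the request dictionary in the shape that boto3's `athena.start_query_execution` accepts and does not call AWS. The database name, table names, and results bucket are hypothetical, and the `UNION ALL` assumes both tables share the same schema.

```python
def build_athena_query_request(database, tables, output_location):
    """Build keyword arguments shaped for boto3's athena.start_query_execution.

    All names passed in are hypothetical placeholders, not values from the question.
    """
    if not tables:
        raise ValueError("at least one table is required")
    # Assumes every table has the same columns, so the results can be stacked.
    sql = " UNION ALL ".join(f'SELECT * FROM "{t}"' for t in tables)
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


query_request = build_athena_query_request(
    database="global_sales",                        # hypothetical Glue database
    tables=["sales_east", "sales_west"],            # hypothetical crawled tables
    output_location="s3://example-athena-results/", # hypothetical results bucket
)
print(query_request["QueryString"])
```

With credentials configured, the dictionary could be passed as `boto3.client("athena", region_name="us-west-2").start_query_execution(**query_request)`.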

JoellaLi

2 years, 7 months ago

There is 'resource policies': https://docs.aws.amazon.com/glue/latest/dg/glue-policy-examples-resource-policies.html

upvoted 3 times

...

cloudlearnerhere

Highly Voted 2 years, 7 months ago

Selected Answer: B

B is correct as AWS Glue can crawl data in different AWS Regions. When you define an Amazon S3 data store to crawl, you can choose whether to crawl a path in your account or another account. The output of the crawler is one or more metadata tables defined in the AWS Glue Data Catalog. A table is created for one or more files found in your data store. If all the Amazon S3 files in a folder have the same schema, the crawler creates one table. Also, if the Amazon S3 object is partitioned, only one metadata table is created. A is wrong because you can't use AWS DMS with AWS Glue Data Catalog. C is incorrect because replicating the data in S3 means that your storage costs will also double. D is wrong because a resource-based policy is primarily used to provide IAM users and roles granular access to metadata definitions of databases, tables, connections, and user-defined functions, and not the actual S3 data.

upvoted 14 times

...
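To make option B concrete, the sketch below assembles a crawler definition in the shape that boto3's `glue.create_crawler` accepts, with one S3 target per Region. It is an illustration under stated assumptions, not a tested deployment: the crawler name, role ARN, database name, and bucket paths are all hypothetical, and the code builds the request without calling AWS.

```python
def build_crawler_request(name, role_arn, database, s3_paths):
    """Build keyword arguments shaped for boto3's glue.create_crawler.

    Every value passed in below is a hypothetical placeholder.
    """
    if not s3_paths:
        raise ValueError("at least one S3 path is required")
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # One S3 target per bucket; the buckets may live in different Regions.
        "Targets": {"S3Targets": [{"Path": path} for path in s3_paths]},
    }


crawler_request = build_crawler_request(
    name="global-sales-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="global_sales",
    s3_paths=[
        "s3://example-sales-us-east-1/data/",  # hypothetical bucket in us-east-1
        "s3://example-sales-us-west-2/data/",  # hypothetical bucket in us-west-2
    ],
)
print(crawler_request["Targets"])
```

With credentials configured, the request could be submitted from us-west-2 as `boto3.client("glue", region_name="us-west-2").create_crawler(**crawler_request)`, so that the resulting tables land in the us-west-2 Data Catalog.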

GCPereira

Most Recent 1 year, 5 months ago

A: DMS is not required to migrate data from one region to another. It can even be used to migrate data from an S3 bucket to another bucket in another account, but there are better and cheaper ways to do this (considering the volume of data, of course). B: It is the correct alternative. Glue crawlers can catalog data that is in different regions. It's simple to set up and not expensive. C: Cross-region replication works for copying the data, but the data will be duplicated unnecessarily. D: This type of permission is best suited for Lake Formation and would not help catalog data that is in different regions.

upvoted 1 times

...

nroopa

1 year, 9 months ago

Option D https://aws.amazon.com/blogs/big-data/configure-cross-region-table-access-with-the-aws-glue-catalog-and-aws-lake-formation/

upvoted 1 times

...

NikkyDicky

1 year, 10 months ago

Selected Answer: B

going w B

upvoted 1 times

...

Cloudbert

2 years, 1 month ago

Selected Answer: B

B. Source: https://docs.aws.amazon.com/glue/latest/dg/crawler-data-stores.html. You can choose to crawl a path in your account or in another account. Crawlers use an AWS Identity and Access Management (IAM) role for permission to access your data stores. The role you pass to the crawler must have permission to access Amazon S3 paths and Amazon DynamoDB tables that are crawled. Another source: https://docs.aws.amazon.com/athena/latest/ug/querying-across-regions.html. Athena can query across Regions: Athena supports the ability to query Amazon S3 data in an AWS Region that is different from the Region in which you are using Athena. Querying across Regions can be an option when moving the data is not practical or permissible, or if you want to query data across multiple regions. Even if Athena is not available in a particular Region, data from that Region can be queried from another Region in which Athena is available.

upvoted 1 times

...
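The permission requirement quoted above (the crawler's IAM role must be able to read the S3 paths it crawls) can be sketched as a policy document. This is a minimal example with hypothetical bucket names, not a complete or hardened policy. Note that S3 bucket ARNs carry no Region component, so the same statement shape covers buckets in either Region.

```python
import json


def crawler_s3_read_policy(buckets):
    """Return an IAM policy document granting read access to the given buckets.

    Bucket names are hypothetical; a real role would likely scope the
    object-level statement to specific prefixes.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the objects themselves.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{b}/*" for b in buckets],
            },
            {
                # List the bucket contents so paths can be discovered.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{b}" for b in buckets],
            },
        ],
    }


policy = crawler_s3_read_policy(
    ["example-sales-us-east-1", "example-sales-us-west-2"]
)
print(json.dumps(policy, indent=2))
```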

Debi_mishra

2 years, 1 month ago

B is correct in the context of this question but would be a bad implementation in real life. D can be a good pattern, but only with the help of Lake Formation.

upvoted 2 times

...

pk349

2 years, 1 month ago

B: I passed the test

upvoted 2 times

...

austinoy

2 years, 3 months ago

the data is not encrypted so moving data is not "practical or permissible"?

upvoted 1 times

...

Ashoks

2 years, 4 months ago

D should be...

upvoted 1 times

...

mulder1989

2 years, 4 months ago

A, B, D simply wouldn't work because they lack a connection to the data source. The only thing I am not sure about is the 'lowest cost'. It can be option B if the wording implies that the connectivity exists https://aws.amazon.com/blogs/big-data/create-cross-account-and-cross-region-aws-glue-connections/

upvoted 1 times

...

Nicoben

2 years, 4 months ago

D. See: https://docs.aws.amazon.com/glue/latest/dg/cross-account-access.html

upvoted 1 times

...

Chelseajcole

2 years, 5 months ago

That's why D is wrong? Each AWS account owns a single catalog in an AWS Region whose catalog ID is the same as the AWS account ID https://docs.aws.amazon.com/glue/latest/dg/glue-resource-policies.html

upvoted 2 times

...