Exam AWS Certified Data Analytics - Specialty topic 1 question 18 discussion

A company currently uses Amazon Athena to query its global datasets. The regional data is stored in Amazon S3 in the us-east-1 and us-west-2 Regions. The data is not encrypted. To simplify the query process and manage it centrally, the company wants to use Athena in us-west-2 to query data from Amazon S3 in both Regions. The solution should be as low-cost as possible.
What should the company do to achieve this goal?

  • A. Use AWS DMS to migrate the AWS Glue Data Catalog from us-east-1 to us-west-2. Run Athena queries in us-west-2.
  • B. Run the AWS Glue crawler in us-west-2 to catalog datasets in all Regions. Once the data is crawled, run Athena queries in us-west-2.
  • C. Enable cross-Region replication for the S3 buckets in us-east-1 to replicate data in us-west-2. Once the data is replicated in us-west-2, run the AWS Glue crawler there to update the AWS Glue Data Catalog in us-west-2 and run Athena queries.
  • D. Update AWS Glue resource policies to provide us-east-1 AWS Glue Data Catalog access to us-west-2. Once the catalog in us-west-2 has access to the catalog in us-east-1, run Athena queries in us-west-2.
Suggested Answer: B

Comments

zanhsieh
Highly Voted 3 years, 8 months ago
B. AWS DMS is not for this purpose, so A is out. C would be costly since it literally replicates all the data. There are no “resource policies” in AWS Glue, so D is out.
upvoted 30 times
Huy
3 years, 7 months ago
I agree with you that D is wrong, but my view is that you shouldn't base that on a property supposedly not being available for the service. Instead, think in depth about what the answer actually suggests. https://docs.aws.amazon.com/glue/latest/dg/glue-resource-policies.html Here, the answer wants AWS Glue to use a Data Catalog from a different Region, which is not supported.
upvoted 2 times
...
A Glue crawler will simply generate the metadata on top of the S3 files. But Athena running in another Region will still not have access to the first Region's files. Also, the Glue crawler might not even have permission to crawl S3 files in another Region. Hence replication is the only option.
upvoted 2 times
No, a Glue crawler is not restricted to a Region and can catalog data in other Regions. Athena can then use the catalog and generate results. I have seen this happen in my project.
upvoted 5 times
...
...
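Editor's note: the replies above turn on whether the crawler's IAM role can read S3 data in another Region. As a minimal sketch (not an official AWS template), the Python below builds a read-only policy document covering buckets in both Regions. The bucket names are hypothetical, and a real crawler role would typically need Glue permissions as well. Note that S3 ARNs carry no Region component, so one policy can cover buckets wherever they live.

```python
from typing import Any, Dict, List


def build_s3_read_policy(bucket_names: List[str]) -> Dict[str, Any]:
    """Build an IAM policy document granting read access to the given buckets."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read the objects themselves.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{b}/*" for b in bucket_names],
            },
            {
                # List the bucket contents so prefixes can be discovered.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{b}" for b in bucket_names],
            },
        ],
    }


# Hypothetical bucket names, one per Region, for illustration only.
policy = build_s3_read_policy(
    ["example-data-us-east-1", "example-data-us-west-2"]
)
```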
JoellaLi
2 years, 7 months ago
There is 'resource policies': https://docs.aws.amazon.com/glue/latest/dg/glue-policy-examples-resource-policies.html
upvoted 3 times
...
...
cloudlearnerhere
Highly Voted 2 years, 7 months ago
Selected Answer: B
B is correct as AWS Glue can crawl data in different AWS Regions. When you define an Amazon S3 data store to crawl, you can choose whether to crawl a path in your account or another account. The output of the crawler is one or more metadata tables defined in the AWS Glue Data Catalog. A table is created for one or more files found in your data store. If all the Amazon S3 files in a folder have the same schema, the crawler creates one table. Also, if the Amazon S3 object is partitioned, only one metadata table is created. A is wrong because you can't use AWS DMS with the AWS Glue Data Catalog. C is incorrect because replicating the data in S3 means that your storage costs will also double. D is wrong because a resource-based policy is primarily used to provide IAM users and roles granular access to metadata definitions of databases, tables, connections, and user-defined functions, and not the actual S3 data.
upvoted 14 times
...
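Editor's note: a rough sketch of what option B could look like with boto3, assuming a single crawler defined in us-west-2 with one S3 target per Region. The crawler name, role ARN, database name, and bucket paths are all hypothetical. The request builder is plain Python; the function that actually calls AWS needs boto3 and valid credentials and is not invoked here.

```python
from typing import Any, Dict, List


def build_crawler_request(name: str, role_arn: str, database: str,
                          s3_paths: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for the Glue CreateCrawler API call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # One S3 target per dataset; the buckets may be in different Regions.
        "Targets": {"S3Targets": [{"Path": p} for p in s3_paths]},
    }


def create_and_start_crawler(request: Dict[str, Any],
                             region: str = "us-west-2") -> None:
    """Create and start the crawler in `region` (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without it

    glue = boto3.client("glue", region_name=region)
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical values, for illustration only.
crawler_request = build_crawler_request(
    name="global-datasets-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="global_datasets",
    s3_paths=[
        "s3://example-data-us-east-1/sales/",
        "s3://example-data-us-west-2/sales/",
    ],
)
```

Calling `create_and_start_crawler(crawler_request)` would then register the tables in the us-west-2 Data Catalog without copying any data between Regions.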
GCPereira
Most Recent 1 year, 5 months ago
A: DMS is not required to migrate data from one region to another. It can even be used to migrate data from an S3 bucket to another bucket in another account, but there are better and cheaper ways to do this (considering the volume of data, of course). B: It is the correct alternative. Glue crawlers can catalog data that is in different regions. It's simple to set up and not expensive. C: Cross-region replication works for copying the data, but the data will be duplicated unnecessarily. D: This type of permission is best suited for Lake Formation and would not help catalog data that is in different regions.
upvoted 1 times
...
nroopa
1 year, 9 months ago
Option D https://aws.amazon.com/blogs/big-data/configure-cross-region-table-access-with-the-aws-glue-catalog-and-aws-lake-formation/
upvoted 1 times
...
NikkyDicky
1 year, 10 months ago
Selected Answer: B
going w B
upvoted 1 times
...
Cloudbert
2 years, 1 month ago
Selected Answer: B
B. Source: https://docs.aws.amazon.com/glue/latest/dg/crawler-data-stores.html. You can choose to crawl a path in your account or in another account. Crawlers use an AWS Identity and Access Management (IAM) role for permission to access your data stores. The role you pass to the crawler must have permission to access the Amazon S3 paths and Amazon DynamoDB tables that are crawled. Another source: https://docs.aws.amazon.com/athena/latest/ug/querying-across-regions.html. Athena can query across Regions: it supports querying Amazon S3 data in an AWS Region that is different from the Region in which you are using Athena. Querying across Regions can be an option when moving the data is not practical or permissible, or if you want to query data across multiple Regions. Even if Athena is not available in a particular Region, data from that Region can be queried from another Region in which Athena is available.
upvoted 1 times
...
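Editor's note: once the tables are cataloged, the query side of option B might look like the sketch below, which runs Athena in us-west-2 against a table whose underlying S3 data may sit in either Region. The database, table, and results-bucket names are hypothetical. The function that submits the query needs boto3 and AWS credentials and is not invoked here.

```python
from typing import Any, Dict


def build_query_request(sql: str, database: str,
                        output_location: str) -> Dict[str, Any]:
    """Build keyword arguments for the Athena StartQueryExecution API call."""
    return {
        "QueryString": sql,
        # The Glue Data Catalog database that holds the crawled tables.
        "QueryExecutionContext": {"Database": database},
        # Where Athena writes the query results.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request: Dict[str, Any], region: str = "us-west-2") -> str:
    """Submit the query in `region` and return its execution ID (requires boto3)."""
    import boto3  # imported lazily so the builder above works without it

    athena = boto3.client("athena", region_name=region)
    response = athena.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical values, for illustration only.
query_request = build_query_request(
    sql="SELECT COUNT(*) FROM sales",
    database="global_datasets",
    output_location="s3://example-athena-results-us-west-2/",
)
```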
Debi_mishra
2 years, 1 month ago
B is correct in the context of this question but would be a bad implementation in real life. D can be a good pattern, but only with the help of Lake Formation.
upvoted 2 times
...
pk349
2 years, 1 month ago
B: I passed the test
upvoted 2 times
...
austinoy
2 years, 3 months ago
The data is not encrypted, so is moving the data not "practical or permissible"?
upvoted 1 times
...
Ashoks
2 years, 4 months ago
D should be...
upvoted 1 times
...
mulder1989
2 years, 4 months ago
A, B, and D simply wouldn't work because they lack a connection to the data source. The only thing I am not sure about is the 'lowest cost'. It can be option B if the wording implies that the connectivity exists https://aws.amazon.com/blogs/big-data/create-cross-account-and-cross-region-aws-glue-connections/
upvoted 1 times
...
Nicoben
2 years, 4 months ago
D. See: https://docs.aws.amazon.com/glue/latest/dg/cross-account-access.html
upvoted 1 times
...
Chelseajcole
2 years, 5 months ago
Is this why D is wrong? Each AWS account owns a single catalog in an AWS Region whose catalog ID is the same as the AWS account ID https://docs.aws.amazon.com/glue/latest/dg/glue-resource-policies.html
upvoted 2 times
...
Haimett
2 years, 7 months ago
Selected Answer: B
Both B and D will work. The answer is B because option D is a bit more expensive.
upvoted 1 times
...
LukeTran3206
2 years, 8 months ago
Selected Answer: D
Must be D
upvoted 2 times
...
rav009
2 years, 8 months ago
Selected Answer: B
B is correct. D is wrong because there are no resource policies, only trust policies.
upvoted 1 times
VishalSingh
2 years, 7 months ago
It has https://docs.aws.amazon.com/glue/latest/dg/glue-resource-policies.html
upvoted 2 times
...
...
Arka_01
2 years, 8 months ago
Selected Answer: B
The lowest-cost option is B. All other options involve greater cost, as data migration between regions costs more.
upvoted 1 times
...