Exam AWS Certified Data Engineer - Associate DEA-C01 topic 1 question 31 discussion

A company is building an analytics solution. The solution uses Amazon S3 for data lake storage and Amazon Redshift for a data warehouse. The company wants to use Amazon Redshift Spectrum to query the data that is in Amazon S3.
Which actions will provide the FASTEST queries? (Choose two.)

  • A. Use gzip compression to compress individual files to sizes that are between 1 GB and 5 GB.
  • B. Use a columnar storage file format.
  • C. Partition the data based on the most common query predicates.
  • D. Split the data into files that are less than 10 KB.
  • E. Use file formats that are not splittable.
Suggested Answer: BC

Comments

GiorgioGss
Highly Voted 1 year, 1 month ago
Selected Answer: BC
https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-performance.html
upvoted 6 times
...
rralucard_
Highly Voted 1 year, 3 months ago
Selected Answer: BC
B. Use a columnar storage file format: Columnar formats like Parquet and ORC are highly recommended for use with Redshift Spectrum. They store data by column, which allows Spectrum to scan only the columns a query needs, significantly improving query performance and reducing the amount of data scanned.
C. Partition the data based on the most common query predicates: Partitioning data in S3 by commonly used query predicates (such as date or region) allows Redshift Spectrum to skip large portions of data that are irrelevant to a particular query. This can lead to substantial performance improvements, especially for large datasets.
upvoted 5 times
...
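The columnar-format argument in the comment above can be illustrated with a minimal, stdlib-only Python sketch (this is not Redshift or Parquet code; the record layout and byte counts are purely illustrative, standing in for Parquet's actual columnar encoding):

```python
import json

# Hypothetical table: 1000 rows with a wide "note" field.
records = [{"id": i, "amount": i * 10, "note": "comment " * 10} for i in range(1000)]

# Row-oriented scan: a query like SELECT SUM(amount) must still read
# every full record, including the columns it never uses.
row_scan_bytes = sum(len(json.dumps(r)) for r in records)

# Columnar scan: the engine reads only the "amount" column chunk.
amount_column = [r["amount"] for r in records]
col_scan_bytes = len(json.dumps(amount_column))

print(col_scan_bytes < row_scan_bytes)  # True: far fewer bytes scanned
```

The ratio between the two byte counts is the intuition behind Spectrum's per-terabyte-scanned pricing and speed: with Parquet or ORC, queries that touch a few columns of a wide table scan a small fraction of the data.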
andrologin
Most Recent 9 months, 4 weeks ago
Selected Answer: BC
Partitioning helps filter the data, and columnar storage is optimised for analytical (OLAP) queries.
upvoted 1 times
...
pypelyncar
10 months, 4 weeks ago
Selected Answer: BC
Redshift Spectrum is optimized for querying data stored in columnar formats like Parquet or ORC. These formats store each data column separately, allowing Redshift Spectrum to scan only the columns relevant to a specific query, significantly improving performance compared to row-oriented formats. Partitioning organizes data files in S3 based on specific column values (e.g., date, region). When your queries filter or join on these partitioning columns (common query predicates), Redshift Spectrum can quickly locate the relevant data files, minimizing the amount of data scanned and accelerating query execution.
upvoted 3 times
...
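The partition-pruning behavior described above can be sketched with a short stdlib-only Python example (the bucket prefix, column names, and key layout are hypothetical; real pruning is done by the query engine against Hive-style `col=value` key paths in S3):

```python
# Hypothetical S3 keys laid out in Hive-style partitions.
S3_KEYS = [
    "sales/region=us-east/date=2024-01-01/part-0.parquet",
    "sales/region=us-east/date=2024-01-02/part-0.parquet",
    "sales/region=eu-west/date=2024-01-01/part-0.parquet",
]

def parse_partitions(key):
    """Extract partition column=value pairs from a Hive-style key."""
    return dict(seg.split("=", 1) for seg in key.split("/") if "=" in seg)

def prune(keys, **predicates):
    """Keep only keys whose partition values satisfy every predicate,
    so objects in non-matching partitions are never read."""
    return [k for k in keys
            if all(parse_partitions(k).get(c) == v for c, v in predicates.items())]

# A query with WHERE region = 'eu-west' scans one object instead of three.
print(prune(S3_KEYS, region="eu-west"))
# ['sales/region=eu-west/date=2024-01-01/part-0.parquet']
```

This is why option C says to partition on the *most common* query predicates: pruning only helps when the `WHERE` clause actually filters on the partition columns.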
d8945a1
12 months ago
Selected Answer: BC
https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/
upvoted 1 times
...
certplan
1 year, 1 month ago
2. **Partitioning**: AWS documentation for Amazon Redshift Spectrum highlights the importance of partitioning data based on commonly used query predicates to improve query performance. By partitioning data, Redshift Spectrum can prune unnecessary partitions during query execution, reducing the amount of data scanned and improving overall query performance. This guidance can be found in the AWS documentation for Amazon Redshift Spectrum under "Using Partitioning to Improve Query Performance": https://docs.aws.amazon.com/redshift/latest/dg/c-using-spectrum-partitioning.html
upvoted 1 times
...
certplan
1 year, 1 month ago
1. **Columnar Storage File Format**: According to AWS documentation, columnar storage file formats like Apache Parquet and Apache ORC are recommended for optimizing query performance with Amazon Redshift Spectrum. They state that these formats are highly efficient for selective column reads, which aligns with the way analytical queries typically operate. This can be found in the AWS documentation for Amazon Redshift Spectrum under "Choosing Data Formats": https://docs.aws.amazon.com/redshift/latest/dg/c-using-spectrum.html#spectrum-columnar-storage
upvoted 1 times
...