Exam AWS Certified Data Analytics - Specialty topic 1 question 115 discussion

A company analyzes historical data and needs to query data that is stored in Amazon S3. New data is generated daily as .csv files that are stored in Amazon S3.
The company's analysts are using Amazon Athena to perform SQL queries against a recent subset of the overall data. The amount of data that is ingested into
Amazon S3 has increased substantially over time, and the query latency also has increased.
Which solutions could the company implement to improve query performance? (Choose two.)

  • A. Use MySQL Workbench on an Amazon EC2 instance, and connect to Athena by using a JDBC or ODBC connector. Run the query from MySQL Workbench instead of Athena directly.
  • B. Use Athena to extract the data and store it in Apache Parquet format on a daily basis. Query the extracted data.
  • C. Run a daily AWS Glue ETL job to convert the data files to Apache Parquet and to partition the converted files. Create a periodic AWS Glue crawler to automatically crawl the partitioned data on a daily basis.
  • D. Run a daily AWS Glue ETL job to compress the data files by using the .gzip format. Query the compressed data.
  • E. Run a daily AWS Glue ETL job to compress the data files by using the .lzo format. Query the compressed data.
Suggested Answer: CE
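
The partitioning part of option C can be illustrated with a minimal sketch, assuming a Hive-style year=/month=/day= layout. The bucket name, prefix, and function names are hypothetical and do not come from the question. The sketch only shows why partitioning helps when analysts query a recent subset: a query for the last few days needs to touch a handful of prefixes rather than every historical file. The Parquet conversion itself is not shown.

```python
from datetime import date, timedelta

# Hypothetical bucket and prefix, used only for illustration.
BASE = "s3://example-bucket/sales-parquet"


def partition_prefix(day):
    """Return a Hive-style partition prefix (year=/month=/day=) for one day."""
    return f"{BASE}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"


def prefixes_for_recent(end, days):
    """Prefixes that a query over the most recent `days` days would need to read."""
    return [partition_prefix(end - timedelta(days=i)) for i in range(days)]


# A query restricted to the last three days maps to three prefixes,
# regardless of how much older data sits under BASE.
recent = prefixes_for_recent(date(2024, 3, 2), 3)
for prefix in recent:
    print(prefix)
```

With this kind of layout, a filter on the partition columns lets the query engine skip every prefix outside the requested date range.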

Comments

irene7
Highly Voted 3 years, 5 months ago
Selected Answer: CE
The LZO format focuses on high compression and decompression speed, so C & E.
upvoted 13 times
jove
1 year, 1 month ago
The bzip2 and LZO compression formats are splittable, but are not recommended if you want performance and compatibility.
upvoted 1 times
...
...
priyashri_13
Highly Voted 3 years, 3 months ago
C & D. As per https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/ the preferred compression is Gzip: "For Athena, we recommend using either Apache Parquet or Apache ORC, which compress data by default and are splittable. When they are not an option, then try BZip2 or Gzip with an optimal file size."
upvoted 12 times
rudramadhu
2 years, 9 months ago
Agree with C & D. Please refer to https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/ which says: "For Athena, we recommend using either Apache Parquet or Apache ORC, which compress data by default and are splittable."
upvoted 1 times
...
...
akarsh17
Most Recent 1 year, 6 months ago
Selected Answer: CE
C & E are correct answers
upvoted 1 times
...
LocalHero
1 year, 6 months ago
Generally, higher compression means slower decompression. Gzip compresses more than LZO, so I choose E. I think C and E are correct.
upvoted 1 times
...
SKIRAR
1 year, 8 months ago
C and D are the correct answers.
upvoted 1 times
...
zbyroger0902
1 year, 8 months ago
Selected Answer: CD
It seems no one has mentioned that LZO is splittable on text files while Gzip is not. When we use the Parquet format, we intend to use its features: it is splittable and compressible. A CSV file is one example of a text file, so it seems LZO should be chosen over Gzip. Refer to https://aws.amazon.com/cn/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/
upvoted 1 times
zbyroger0902
1 year, 8 months ago
My bad, I meant CE
upvoted 1 times
...
...
AbNada
1 year, 8 months ago
C & D are the correct answers for me.
upvoted 1 times
...
whenthan
1 year, 9 months ago
Selected Answer: CE
https://docs.aws.amazon.com/whitepapers/latest/aws-glue-best-practices-build-performant-data-pipeline/building-a-performance-efficient-data-pipeline.html
upvoted 1 times
...
wally_1995
1 year, 10 months ago
D and E would not convert the data to a columnar format, so I am not sure why or how they would improve the latency. Therefore the only viable options are B (use Athena to unload Parquet files) and C (use Glue to convert to Parquet and partition the data). Also, the question asks to choose two solutions.
upvoted 1 times
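
One way the Athena-side extraction in option B could look is a CREATE TABLE AS SELECT (CTAS) statement that writes Parquet. The sketch below only builds the SQL string. The table names, column names, S3 location, and the `build_ctas` helper are all hypothetical, and the helper does no escaping of identifiers, so treat it as an illustration rather than something to run against untrusted input.

```python
def build_ctas(source_table, target_table, location, partition_col, columns):
    """Build an Athena CTAS statement that writes Parquet partitioned by one column."""
    # In a partitioned CTAS, the partition column is listed last in the SELECT.
    select_list = ", ".join(list(columns) + [partition_col])
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (\n"
        f"  format = 'PARQUET',\n"
        f"  external_location = '{location}',\n"
        f"  partitioned_by = ARRAY['{partition_col}']\n"
        f")\n"
        f"AS SELECT {select_list}\n"
        f"FROM {source_table}"
    )


# Hypothetical names, for illustration only.
sql = build_ctas(
    source_table="sales_csv",
    target_table="sales_parquet",
    location="s3://example-bucket/sales-parquet/",
    partition_col="dt",
    columns=["order_id", "amount"],
)
print(sql)
```

The resulting statement could then be submitted through the Athena console or an SDK; scheduling it daily is a separate concern that the sketch does not cover.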
...
pk349
2 years ago
CE: I passed the test
upvoted 1 times
...
rags1482
2 years, 1 month ago
GZip is often a good choice for cold data, which is accessed infrequently. Snappy or LZO are a better choice for hot data, which is accessed frequently.
upvoted 2 times
...
rocky48
2 years, 5 months ago
Selected Answer: CE
Answer should be C and E.
upvoted 1 times
...
cloudlearnerhere
2 years, 6 months ago
Correct answers are C & D, as AWS recommends using either Parquet or ORC columnar data stores and BZip2 or Gzip compression: "For Athena, we recommend using either Apache Parquet or Apache ORC, which compress data by default and are splittable. When they are not an option, then try BZip2 or Gzip with an optimal file size."
Option A is wrong, as MySQL Workbench does not improve query performance. It instead increases operational overhead.
Option B is wrong, as Athena is not ideal for running batch jobs. Use Glue instead.
Option E is wrong: although .lzo is supported and can provide better compression and decompression speeds, Gzip is recommended for Athena as per the AWS documentation.
upvoted 6 times
...
Nubosperta
2 years, 7 months ago
Selected Answer: CE
E over D: https://docs.aws.amazon.com/athena/latest/ug/compression-formats.html It states that the LZO compression format compresses less than Gzip, but this makes decompression much faster, which will improve query performance. Since that is what's needed, my answer is E. There is no cost concern in the question, so it doesn't matter if the files stored in S3 are bigger with LZO compression.
upvoted 1 times
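
The ratio-versus-speed trade-off discussed in this thread can be measured locally with a rough sketch. LZO is not in the Python standard library, so this uses gzip, bz2, and lzma as stand-ins. The synthetic CSV data and the helper names are assumptions, and the numbers say nothing about how Athena itself behaves; timings will also vary by machine.

```python
import bz2
import gzip
import lzma
import time


def sample_csv(rows=20000):
    """Generate synthetic CSV bytes (hypothetical columns, for illustration)."""
    lines = ["id,region,amount"]
    lines += [f"{i},region-{i % 7},{(i * 37) % 1000}" for i in range(rows)]
    return "\n".join(lines).encode("utf-8")


def measure(name, compress, decompress, raw):
    """Compress once, time a single decompression, and check the round trip."""
    packed = compress(raw)
    start = time.perf_counter()
    restored = decompress(packed)
    elapsed = time.perf_counter() - start
    if restored != raw:
        raise ValueError(f"{name} round trip failed")
    return {"codec": name, "ratio": len(packed) / len(raw), "decompress_s": elapsed}


raw = sample_csv()
results = [
    measure("gzip", gzip.compress, gzip.decompress, raw),
    measure("bz2", bz2.compress, bz2.decompress, raw),
    measure("lzma", lzma.compress, lzma.decompress, raw),
]
for r in results:
    print(f"{r['codec']:5s} ratio={r['ratio']:.3f} "
          f"decompress={r['decompress_s'] * 1000:.2f} ms")
```

A single timed run is noisy; repeating the measurement and using real daily files would give a more trustworthy comparison.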
...
he11ow0rId
2 years, 8 months ago
Selected Answer: CD
CD, as explained by others.
upvoted 1 times
...
Bik000
2 years, 12 months ago
Selected Answer: BC
Answer should be B & C
upvoted 1 times
...
Shammy45
3 years ago
Selected Answer: CD
GZIP is the default compression format for Parquet.
upvoted 1 times
...