Exam AWS Certified Big Data - Specialty topic 2 question 10 discussion

Question #: 10
Topic #: 2

An organization currently runs a large Hadoop environment in their data center and is in the process of creating an alternative Hadoop environment on AWS, using Amazon EMR.
They generate around 20 TB of data on a monthly basis. Also on a monthly basis, files need to be grouped and copied to Amazon S3 to be used for the Amazon EMR environment. They have multiple S3 buckets across AWS accounts to which data needs to be copied. There is a 10 Gbps AWS Direct Connect setup between their data center and AWS, and the network team has agreed to allocate 50% of the AWS Direct Connect bandwidth to data transfer. The data transfer cannot take more than two days.
What would be the MOST efficient approach to transfer data to AWS on a monthly basis?

  • A. Use an offline copy method, such as an AWS Snowball device, to copy and transfer data to Amazon S3.
  • B. Configure a multipart upload for Amazon S3 on AWS Java SDK to transfer data over AWS Direct Connect.
  • C. Use Amazon S3 transfer acceleration capability to transfer data to Amazon S3 over AWS Direct Connect.
  • D. Set up the S3DistCop tool on the on-premises Hadoop environment to transfer data to Amazon S3 over AWS Direct Connect.
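
As a quick sanity check on the numbers in the question (a rough estimate assuming ideal, sustained throughput): 50% of the 10 Gbps link is 5 Gbps, and 20 TB is about 160,000 gigabits, so the transfer would take roughly 160,000 / 5 = 32,000 seconds, or about 9 hours, comfortably inside the two-day window. A recurring monthly transfer that fits over Direct Connect weighs against the offline Snowball approach in option A.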
Suggested Answer: B
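
For context on what option B involves, here is a minimal sketch of a multipart upload using the AWS SDK for Java's TransferManager, which automatically splits files above a threshold into parallel part uploads. The bucket, key, file path, and 64 MB threshold are illustrative assumptions, not values from the question:

    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.transfer.TransferManager;
    import com.amazonaws.services.s3.transfer.TransferManagerBuilder;
    import com.amazonaws.services.s3.transfer.Upload;
    import java.io.File;

    public class MultipartUploadSketch {
        public static void main(String[] args) throws Exception {
            // TransferManager uploads files above the threshold as parallel parts.
            TransferManager tm = TransferManagerBuilder.standard()
                    .withS3Client(AmazonS3ClientBuilder.defaultClient())
                    .withMultipartUploadThreshold(64L * 1024 * 1024) // multipart above 64 MB (assumed)
                    .build();
            // Hypothetical local file and target bucket, for illustration only.
            Upload upload = tm.upload("example-target-bucket",
                    "monthly/part-00000.gz",
                    new File("/data/monthly/part-00000.gz"));
            upload.waitForCompletion(); // blocks until all parts finish
            tm.shutdownNow();
        }
    }

Note that TransferManager handles part sizing, concurrency, and retries, but the monthly grouping of files and the fan-out to buckets in multiple accounts would still be custom orchestration code.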

Comments

yogesh88
3 years, 6 months ago
Just attempted the exam. This is a typo; in the exam, S3DistCp was spelled properly. I selected D.
upvoted 1 times
...
awane
3 years, 7 months ago
There is no S3DistCOP tool (it is actually called S3DistCp), so the answer is B.
upvoted 1 times
...
k115
3 years, 7 months ago
D is the right answer
upvoted 1 times
...
srirampc
3 years, 7 months ago
D. Since the transfer is from on-premises Hadoop, S3DistCp would work from Hadoop to S3.
upvoted 1 times
...
viru
3 years, 7 months ago
D. https://forums.aws.amazon.com/thread.jspa?threadID=120522 and https://d0.awsstatic.com/whitepapers/aws-amazon-emr-best-practices.pdf
upvoted 1 times
...
Bulti
3 years, 7 months ago
Answer D: https://blog.ippon.tech/aws-white-paper-in-5-minutes-or-less-best-practices-for-amazon-emr/
upvoted 3 times
...
susan8840
3 years, 7 months ago
B makes sense, not D, since S3DistCp is used to copy large amounts of data from Amazon S3 into HDFS; the issue here is going from on-premises to S3.
upvoted 2 times
DerekKey
3 years, 6 months ago
Not true: "You can also use S3DistCp to copy data from HDFS to Amazon S3. S3DistCp is more scalable and efficient for parallel copying large numbers of objects across buckets and across AWS accounts. S3DistCp is the same as the Hadoop binary DistCp, except it takes advantage of multi-part upload to S3 for larger files. Hadoop is optimized for large file blocks, so it is usually best to use S3DistCp for copying HDFS files from an external data center or local disk to S3 to take advantage of this optimization."
upvoted 1 times
...
...
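
To make the S3DistCp path concrete: S3DistCp is a Hadoop jar launched from the command line, so a monthly job on the on-premises cluster could invoke it roughly as sketched below. The jar path, bucket, and grouping regex are illustrative assumptions; S3DistCp ships with EMR, so running it on-premises means installing the jar and S3 credentials on the cluster first:

    import java.util.List;

    public class RunS3DistCp {
        public static void main(String[] args) throws Exception {
            // Launch S3DistCp as a Hadoop job; --groupBy concatenates small files
            // whose names share the same regex capture group (here, a year-month).
            Process p = new ProcessBuilder(List.of(
                    "hadoop", "jar", "/opt/s3-dist-cp/s3-dist-cp.jar", // assumed install path
                    "--src", "hdfs:///monthly-data/",
                    "--dest", "s3://example-target-bucket/monthly-data/", // hypothetical bucket
                    "--groupBy", ".*(\\d{4}-\\d{2}).*")) // illustrative monthly grouping
                    .inheritIO() // stream the job's output to this process
                    .start();
            System.exit(p.waitFor());
        }
    }

Because this runs as a MapReduce job, the copy is parallelized across the cluster, which is what the comments above mean by S3DistCp being more scalable than a single-machine multipart upload.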
san2020
3 years, 8 months ago
My selection: D.
upvoted 3 times
...
bigdatalearner
3 years, 8 months ago
D. S3DistCp is the right answer; it's used for moving big data from Hadoop to S3, S3 to Hadoop, or S3 to S3, and on the backend it uses a MapReduce job. Multipart upload is not the best choice here.
upvoted 4 times
...
jlpl
3 years, 8 months ago
Vote D for now
upvoted 4 times
mattyb123
3 years, 8 months ago
Answer is D. There is a typo.
upvoted 5 times
...
...
mattyb123
3 years, 8 months ago
Confirmed.
1. https://toolstud.io/data/filesize.php?speed=5&speed_unit=Gbps&duration=12&duration_unit=hours&compare=harddisk
2. https://aws.amazon.com/snowball/faqs/#Q.3A_How_long_does_it_take_to_transfer_my_data.3F
"Q: When should I consider using Snowball instead of AWS Direct Connect? AWS Direct Connect provides you with dedicated, fast connections from your premises to the AWS network. If you need to transfer large quantities of data to AWS on an ongoing basis, AWS Direct Connect might be the right choice."
upvoted 1 times
mattyb123
3 years, 8 months ago
Anyone disagree with B? I selected this answer last time, and the numbers seem to add up.
upvoted 1 times
mattyb123
3 years, 8 months ago
By reviewing https://d0.awsstatic.com/whitepapers/aws-amazon-emr-best-practices.pdf, it looks like D is the correct answer.
upvoted 2 times
kttttt
3 years, 7 months ago
More info: http://www.thecloudxperts.co.uk/moving-large-amounts-of-data-from-hdfs-data-center-to-amazon-s3-using-s3distcp
Two important tools to move data, S3DistCp and DistCp, can help you move data stored on your local (data center) HDFS storage to Amazon S3.
upvoted 2 times
...
...
...
...
Community vote distribution: A (35%), C (25%), B (20%), Other