Exam CCD-410 topic 1 question 21 discussion

Actual exam question from Cloudera's CCD-410

Question #: 21
Topic #: 1

What is the disadvantage of using multiple reducers with the default HashPartitioner and distributing your workload across your cluster?

A. You will not be able to compress the intermediate data.
B. You will no longer be able to take advantage of a Combiner.
C. By using multiple reducers with the default HashPartitioner, output files may not be in globally sorted order.
D. There are no concerns with this approach. It is always advisable to use multiple reducers.
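The effect behind these options can be sketched with a plain-Java simulation of the default partitioning (no Hadoop dependency). It assumes Integer keys, whose hashCode() is the value itself (as with Hadoop's IntWritable), and reuses the HashPartitioner formula `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The class and method names are hypothetical, and the per-reducer sort stands in for Hadoop's shuffle/sort phase.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation class; only the partition formula mirrors Hadoop's HashPartitioner.
public class HashPartitionDemo {

    // Mask the sign bit so negative hash codes still map to a valid reducer index.
    static int partition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Route each key to a reducer, then sort each reducer's keys,
    // approximating what the shuffle/sort phase does before reduce() runs.
    static List<List<Integer>> shuffleAndSort(List<Integer> keys, int numReduceTasks) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            parts.add(new ArrayList<>());
        }
        for (Integer k : keys) {
            parts.get(partition(k, numReduceTasks)).add(k);
        }
        for (List<Integer> p : parts) {
            Collections.sort(p);
        }
        return parts;
    }

    // Concatenate reducer outputs in file order (part-r-00000, part-r-00001, ...).
    static List<Integer> concatenate(List<List<Integer>> parts) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> p : parts) {
            all.addAll(p);
        }
        return all;
    }

    public static void main(String[] args) {
        List<Integer> keys = List.of(5, 3, 6, 1, 4, 2);
        List<List<Integer>> parts = shuffleAndSort(keys, 2);
        System.out.println(parts);              // [[2, 4, 6], [1, 3, 5]]
        System.out.println(concatenate(parts)); // [2, 4, 6, 1, 3, 5]
    }
}
```

Each reducer's output is sorted on its own, but because even keys land on reducer 0 and odd keys on reducer 1, simply concatenating the files does not produce a globally sorted result.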


Suggested Answer: C
Multiple reducers and total ordering
If your sort job runs with multiple reducers (either because mapreduce.job.reduces in mapred-site.xml has been set to a number larger than 1, or because you've used the -r option to specify the number of reducers on the command line), then by default Hadoop will use the HashPartitioner to distribute records across the reducers. The HashPartitioner assigns each key to a reducer based on the key's hash code modulo the number of reducers, so every reducer receives keys from across the whole key range: each output file is sorted internally, but the files do not cover consecutive key ranges. As a result, you can't concatenate your output files to create a single sorted output file. To do this you'll need total ordering.
Reference: Sorting text files with MapReduce
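Total ordering means partitioning by key range instead of by hash, so that every key sent to reducer i is smaller than every key sent to reducer i + 1. Below is a minimal plain-Java sketch, assuming integer keys and hand-picked split points. In Hadoop the split points would usually be sampled from the input (for example with InputSampler) and handed to TotalOrderPartitioner; all names in the code are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of range (total-order) partitioning.
public class RangePartitionDemo {

    // splits must be sorted, distinct, and hold (numReducers - 1) entries.
    // Reducer i receives keys in [splits[i-1], splits[i]); the last reducer gets the rest.
    static int partition(int key, int[] splits) {
        int idx = Arrays.binarySearch(splits, key);
        // A key equal to a split point starts the next range;
        // otherwise binarySearch returns -(insertionPoint) - 1.
        return idx >= 0 ? idx + 1 : -(idx + 1);
    }

    // Route keys by range, then sort within each reducer (stand-in for shuffle/sort).
    static List<List<Integer>> shuffleAndSort(List<Integer> keys, int[] splits) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i <= splits.length; i++) {
            parts.add(new ArrayList<>());
        }
        for (int k : keys) {
            parts.get(partition(k, splits)).add(k);
        }
        for (List<Integer> p : parts) {
            Collections.sort(p);
        }
        return parts;
    }

    // Concatenate reducer outputs in file order.
    static List<Integer> concatenate(List<List<Integer>> parts) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> p : parts) {
            all.addAll(p);
        }
        return all;
    }

    public static void main(String[] args) {
        List<Integer> keys = List.of(5, 3, 6, 1, 4, 2);
        int[] splits = {4}; // hypothetical split point: two reducers
        List<List<Integer>> parts = shuffleAndSort(keys, splits);
        System.out.println(parts);              // [[1, 2, 3], [4, 5, 6]]
        System.out.println(concatenate(parts)); // [1, 2, 3, 4, 5, 6]
    }
}
```

The trade-off is that the split points must be chosen well: if they are skewed relative to the actual key distribution, one reducer receives most of the data and the job loses much of its parallelism.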

by johikepe at April 5, 2025, 1:50 p.m.
