exam questions

Exam CCD-410 All Questions

View all questions & answers for the CCD-410 exam

Exam CCD-410 topic 1 question 17 discussion

Actual exam question from Cloudera's CCD-410
Question #: 17
Topic #: 1
[All CCD-410 Questions]

Which best describes how TextInputFormat processes input files and line breaks?

  • A. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the beginning of the broken line.
  • B. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReaders of both splits containing the broken line.
  • C. The input file is split exactly at the line breaks, so each RecordReader will read a series of complete lines.
  • D. Input file splits may cross line breaks. A line that crosses file splits is ignored.
  • E. Input file splits may cross line breaks. A line that crosses file splits is read by the RecordReader of the split that contains the end of the broken line.
Show Suggested Answer Hide Answer
Suggested Answer: E 🗳️
As the Map operation is parallelized the input file set is first split to several pieces called FileSplits. If an individual file is so large that it will affect seek time it will be split to several Splits. The splitting does not know anything about the input file's internal logical structure, for example line-oriented text files are split on arbitrary byte boundaries. Then a new map task is created per FileSplit.
When an individual map task starts it will open a new output writer per configured reduce task. It will then proceed to read its FileSplit using the RecordReader it gets from the specified InputFormat. InputFormat parses the input and generates key-value pairs. InputFormat must also handle records that may be split on the
FileSplit boundary. For example TextInputFormat will read the last line of the FileSplit past the split boundary and, when reading other than the first FileSplit,
TextInputFormat ignores the content up to the first newline.
Reference: How Map and Reduce operations are actually carried out

Comments

Chosen Answer:
This is a voting comment (?). It is better to Upvote an existing comment if you don't have anything to add.
Switch to a voting comment New
Currently there are no comments in this discussion, be the first to comment!
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...