You want to build a managed Hadoop system as your data lake. The data transformation process is composed of a series of Hadoop jobs executed in sequence.
To accomplish the design of separating storage from compute, you decided to use the Cloud Storage connector to store all input data, output data, and intermediary data. However, you noticed that one Hadoop job runs very slowly with Cloud Dataproc, when compared with the on-premises bare-metal Hadoop environment (8-core nodes with 100-GB RAM). Analysis shows that this particular Hadoop job is disk I/O intensive. You want to resolve the issue. What should you do?
Rajokkiyam
Highly Voted 4 years, 9 months agoChelseajcole
3 years, 3 months agoAlasmindas
Highly Voted 4 years, 2 months agoMaxNRG
Most Recent 1 year agoMaxNRG
1 year agosquishy_fishy
1 year, 2 months agobarnac1es
1 year, 3 months agoDeepakVenkatachalam
1 year, 4 months agoDeepakVenkatachalam
1 year, 3 months agoarien_chen
1 year, 4 months agovamgcp
1 year, 5 months agozellck
2 years, 1 month agoJohn_Pongthorn
2 years, 3 months agorrr000
2 years, 4 months agorrr000
2 years, 4 months agoSoerenE
3 years agomedeis_jar
3 years agoJG123
3 years, 1 month agoRT30
3 years, 10 months agoashuchip
4 years agoharoldbenites
4 years, 4 months ago