You want to build a managed Hadoop system as your data lake. The data transformation process is composed of a series of Hadoop jobs executed in sequence.
To accomplish the design of separating storage from compute, you decided to use the Cloud Storage connector to store all input data, output data, and intermediary data. However, you noticed that one Hadoop job runs very slowly with Cloud Dataproc, when compared with the on-premises bare-metal Hadoop environment (8-core nodes with 100-GB RAM). Analysis shows that this particular Hadoop job is disk I/O intensive. You want to resolve the issue. What should you do?
[Removed]
Highly Voted 4 years, 7 months agoRajokkiyam
Highly Voted 4 years, 7 months agoChelseajcole
3 years, 1 month agoMaxNRG
Most Recent 10 months, 2 weeks agoMaxNRG
10 months, 2 weeks agosquishy_fishy
1 year agobarnac1es
1 year, 1 month agoDeepakVenkatachalam
1 year, 1 month agoDeepakVenkatachalam
1 year, 1 month agoarien_chen
1 year, 2 months agovamgcp
1 year, 3 months agozellck
1 year, 11 months agoJohn_Pongthorn
2 years, 1 month agorrr000
2 years, 2 months agorrr000
2 years, 2 months agoSoerenE
2 years, 9 months agomedeis_jar
2 years, 10 months agoJG123
2 years, 11 months agoRT30
3 years, 7 months agoashuchip
3 years, 10 months agoAlasmindas
3 years, 11 months ago