
Exam Professional Cloud Database Engineer topic 1 question 14 discussion

Actual exam question from Google's Professional Cloud Database Engineer
Question #: 14
Topic #: 1

Your ecommerce website captures user clickstream data to analyze customer traffic patterns in real time and support personalization features on your website. You plan to analyze this data using big data tools. You need a low-latency solution that can store 8 TB of data and can scale to millions of read and write requests per second. What should you do?

  • A. Write your data into Bigtable and use Dataproc and the Apache HBase libraries for analysis.
  • B. Deploy a Cloud SQL environment with read replicas for improved performance. Use Datastream to export data to Cloud Storage and analyze with Dataproc and the Cloud Storage connector.
  • C. Use Memorystore to handle your low-latency requirements and for real-time analytics.
  • D. Stream your data into BigQuery and use Dataproc and the BigQuery Storage API to analyze large volumes of data.
Suggested Answer: A
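
For context, a minimal sketch of option A's write path using the google-cloud-bigtable Python client; the project, instance, table, and column-family names are hypothetical, and the table with its "events" column family is assumed to already exist.

```python
# Minimal sketch of option A's ingestion path, using the google-cloud-bigtable
# Python client. Project, instance, table, and column-family names are
# hypothetical; the table and its "events" column family are assumed to exist.
import datetime

from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("clickstream-instance").table("clickstream")

# Row key = user ID + event timestamp: writes from many users spread across
# nodes, while one user's recent activity stays a contiguous, cheap scan.
row = table.direct_row(b"user123#2024-01-01T12:00:00Z")
row.set_cell(
    "events",     # column family
    "page",       # column qualifier
    b"/checkout",
    timestamp=datetime.datetime.now(datetime.timezone.utc),
)
row.commit()
```

This row-key design is what lets Bigtable absorb millions of writes per second: keys from different users land on different nodes, while per-user reads remain a contiguous range.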

Comments

dynamic_dba
Highly Voted 1 year, 7 months ago
A. Cloud SQL couldn't handle the load, so B is wrong. Memorystore scales up to 300 GB, and the question needs 8 TB, so C must be wrong. BigQuery couldn't meet the latency requirements of the question, which leaves A: Bigtable can handle the volume of writes at the speeds required.
upvoted 7 times
...
pk349
Highly Voted 1 year, 10 months ago
A: Write your data into Bigtable and use Dataproc and the Apache HBase libraries for analysis.
upvoted 6 times
...
Jason_Cloud_at
Most Recent 8 months, 1 week ago
Selected Answer: A
Bigtable is ideal for clickstream and IoT use cases, and it can handle high-performance reads and writes at global scale.
upvoted 4 times
...
ToniTovar
8 months, 2 weeks ago
Selected Answer: D
This option uses BigQuery, which has low latency and is a big data tool.
upvoted 2 times
...
VG1900
11 months, 4 weeks ago
Selected Answer: D
A is not correct because Bigtable is not designed for real-time analytics. It is a good choice for storing and retrieving small amounts of data quickly, but it is not as efficient for analyzing large volumes of data. B is not correct because it cannot support millions of reads and writes. C is not correct because of its storage limitation. D is correct.
upvoted 4 times
...
goodsport
1 year, 1 month ago
Selected Answer: A
I would opt for A.
upvoted 2 times
ArtistS
10 months, 3 weeks ago
Why opt for A? It is not real time, and the question says they want to analyze the data. Why not use BigQuery? It's only 8 TB.
upvoted 1 times
...
...
learnazureportal
1 year, 1 month ago
The correct answer is D: stream your data into BigQuery and use Dataproc and the BigQuery Storage API to analyze large volumes of data. Bigtable (A) is a NoSQL database.
upvoted 2 times
...
CloudKida
1 year, 4 months ago
Selected Answer: A
At a high level, Bigtable is a NoSQL wide-column database. It's optimized for low latency, large numbers of reads and writes, and maintaining performance at scale. Bigtable use cases are of a certain scale or throughput with strict latency requirements, such as IoT, AdTech, FinTech, and so on. If high throughput and low latency at scale are not priorities for you, then another NoSQL database like Firestore might be a better fit.
upvoted 2 times
...
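Building on CloudKida's point about low latency at scale, here is a read-side sketch. Option A names the HBase libraries on Dataproc; the plain google-cloud-bigtable Python client is used instead for brevity, with the same hypothetical names as the write sketch above.

```python
# Read-side sketch with the google-cloud-bigtable Python client (option A
# names the HBase libraries on Dataproc; the plain client is shorter).
# Names are the same hypothetical ones as in the write sketch above.
from google.cloud import bigtable
from google.cloud.bigtable.row_set import RowSet

client = bigtable.Client(project="my-project")
table = client.instance("clickstream-instance").table("clickstream")

# Point read of a single row: single-digit-millisecond latency at scale.
row = table.read_row(b"user123#2024-01-01T12:00:00Z")
if row is not None:
    print(row.cells["events"][b"page"][0].value)

# Prefix scan over one user's events: '#' is 0x23 and '$' is 0x24, so this
# half-open range covers every key that starts with "user123#".
row_set = RowSet()
row_set.add_row_range_from_keys(start_key=b"user123#", end_key=b"user123$")
for row in table.read_rows(row_set=row_set):
    print(row.row_key)
```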
Hilab
1 year, 7 months ago
B. Normalize the data model. D. Promote high-cardinality attributes in multi-attribute primary keys. When designing a schema for Cloud Spanner, it is important to follow best practices to avoid hotspots and ensure optimal performance. Hotspots occur when too many requests are targeted at a single node or group of nodes, causing them to become overloaded and potentially impacting performance.
upvoted 1 times
Hilab
1 year, 7 months ago
Normalization is a recommended best practice in database schema design, including in Cloud Spanner. It involves breaking down large tables into smaller, more manageable tables that are linked together by relationships. This can help reduce duplication of data and improve performance by reducing the amount of data that needs to be read or written to the database.

Promoting high-cardinality attributes in multi-attribute primary keys is also recommended in Cloud Spanner schema design. High-cardinality attributes are those that have a large number of distinct values, such as product IDs or customer IDs. Including these attributes in the primary key can help distribute data more evenly across nodes, reducing the likelihood of hotspots (a minimal sketch of this key design follows after this thread).

Using an auto-incrementing value as the primary key can result in hotspots, particularly if new data is being added at a high rate, because all new rows land on a single node; a bit-reversed sequential value avoids this by scattering consecutive values across the key space.
upvoted 1 times
Hilab
1 year, 7 months ago
The above answer is for Question #15; my mistake, I put the comments here.
upvoted 1 times
...
...
...
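Although Hilab's note above belongs to Question #15, the key-design advice is easy to illustrate. A minimal, hypothetical Cloud Spanner sketch of promoting a high-cardinality attribute into a multi-attribute primary key (all names invented for illustration):

```python
# Hypothetical Cloud Spanner DDL illustrating the advice above: lead the
# primary key with a high-cardinality attribute (CustomerId) instead of a
# monotonically increasing order ID, so inserts spread across splits rather
# than piling onto one node. All names are invented for illustration.
from google.cloud import spanner

client = spanner.Client(project="my-project")
database = client.instance("my-instance").database("orders-db")

operation = database.update_ddl([
    """
    CREATE TABLE Orders (
        CustomerId STRING(36) NOT NULL,
        OrderId    STRING(36) NOT NULL,
        CreatedAt  TIMESTAMP  NOT NULL
    ) PRIMARY KEY (CustomerId, OrderId)
    """
])
operation.result()  # wait for the schema change to complete
```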
Hilab
1 year, 7 months ago
D. Stream your data into BigQuery and use Dataproc and the BigQuery Storage API to analyze large volumes of data.

BigQuery is a fully managed, serverless data warehouse that allows you to store and analyze large datasets using SQL-like queries. It is designed to handle petabyte-scale data and is optimized for fast query performance. By streaming your clickstream data into BigQuery, you can store and process large amounts of data in real time.

Dataproc, on the other hand, is a fully managed cloud service for running Apache Hadoop and Spark clusters. It provides a managed, easy-to-use environment for data processing, which can be used to analyze the data stored in BigQuery. The BigQuery Storage API allows you to directly access data stored in BigQuery from external applications, including Dataproc, which enables you to run advanced analytics on large volumes of data with low latency.

This approach provides a scalable, low-latency solution for storing and analyzing large volumes of data, making it a good fit for your requirements.
upvoted 1 times
...
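A sketch of option D's path as Hilab describes it, using the google-cloud-bigquery and google-cloud-bigquery-storage Python clients; the table name is hypothetical, the dataset and table are assumed to exist, and pandas is needed for to_dataframe().

```python
# Sketch of option D's path: stream rows into BigQuery, then read results back
# through the BigQuery Storage API. The table name is hypothetical, the
# dataset and table are assumed to exist, and pandas must be installed.
from google.cloud import bigquery
from google.cloud import bigquery_storage

bq = bigquery.Client(project="my-project")
table_id = "my-project.analytics.clickstream"

# Streaming insert via the insertAll-backed client method.
errors = bq.insert_rows_json(table_id, [
    {"user_id": "user123", "page": "/checkout", "ts": "2024-01-01T12:00:00Z"},
])
assert not errors, errors

# to_dataframe() uses the Storage API when handed a BigQueryReadClient,
# which streams large result sets far faster than the REST API.
storage_client = bigquery_storage.BigQueryReadClient()
df = (
    bq.query(f"SELECT page, COUNT(*) AS views FROM `{table_id}` GROUP BY page")
    .result()
    .to_dataframe(bqstorage_client=storage_client)
)
print(df.head())
```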
H_S
1 year, 7 months ago
Selected Answer: A
A. Write your data into Bigtable and use Dataproc and the Apache HBase libraries for analysis.
upvoted 2 times
...
Nirca
1 year, 7 months ago
Selected Answer: A
A looks like the best option.
upvoted 2 times
...
GCP72
1 year, 10 months ago
Selected Answer: A
A is the correct answer; C couldn't handle 8 TB of data. Memorystore is scalable: start with the lowest tier and smallest size, then grow your instance as needed. Memorystore provides automated scaling using APIs and optimized node placement across zones for redundancy. Memorystore for Memcached can support clusters as large as 5 TB, enabling millions of QPS at very low latency.
upvoted 3 times
...
Kloudgeek
1 year, 10 months ago
The answer is A. This is clickstream and time-series data, the size is 8 TB, and low-latency reads and writes are required. Use Bigtable for storage and either the cbt CLI or the HBase API to interact with the data.
upvoted 5 times
...
fredcaram
1 year, 10 months ago
Selected Answer: A
B couldn't handle this volume of writes and reads, D wouldn't be able to handle the write rate, and C isn't suited for this.
upvoted 3 times
...
juancambb
1 year, 10 months ago
Selected Answer: A
must be A
upvoted 2 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other