
Exam Professional Cloud Database Engineer topic 1 question 14 discussion

Actual exam question from Google's Professional Cloud Database Engineer
Question #: 14
Topic #: 1

Your ecommerce website captures user clickstream data to analyze customer traffic patterns in real time and support personalization features on your website. You plan to analyze this data using big data tools. You need a low-latency solution that can store 8 TB of data and can scale to millions of read and write requests per second. What should you do?

  • A. Write your data into Bigtable and use Dataproc and the Apache HBase libraries for analysis.
  • B. Deploy a Cloud SQL environment with read replicas for improved performance. Use Datastream to export data to Cloud Storage and analyze with Dataproc and the Cloud Storage connector.
  • C. Use Memorystore to handle your low-latency requirements and for real-time analytics.
  • D. Stream your data into BigQuery and use Dataproc and the BigQuery Storage API to analyze large volumes of data.
Suggested Answer: A
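
For context, a minimal sketch of option A's write path using the google-cloud-bigtable Python client; the project, instance, table, and column-family names are hypothetical, and the table with its "events" column family is assumed to already exist.

```python
# Minimal sketch of option A's ingestion path, using the google-cloud-bigtable
# Python client. Project, instance, table, and column-family names are
# hypothetical; the table and its "events" column family are assumed to exist.
import datetime

from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("clickstream-instance").table("clickstream")

# Row key = user ID + event timestamp: writes from many users spread across
# nodes, while one user's recent activity stays a contiguous, cheap scan.
row = table.direct_row(b"user123#2024-01-01T12:00:00Z")
row.set_cell(
    "events",     # column family
    "page",       # column qualifier
    b"/checkout",
    timestamp=datetime.datetime.now(datetime.timezone.utc),
)
row.commit()
```

This row-key design is what lets Bigtable absorb millions of writes per second: keys from different users land on different nodes, while per-user reads remain a contiguous range.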

Comments

dynamic_dba
Highly Voted 1 year, 7 months ago
A. Cloud SQL couldn't handle the load, so B is wrong. Memorystore scales up to 300 GB, and the question needs 8 TB, so C must be wrong. BigQuery couldn't meet the latency requirements of the question, which leaves A: Bigtable can handle the volume of writes at the speeds required.
upvoted 7 times
...
pk349
Highly Voted 1 year, 10 months ago
A: Write your data into Bigtable and use Dataproc and the Apache HBase libraries for analysis.
upvoted 6 times
...
Jason_Cloud_at
Most Recent 8 months, 1 week ago
Selected Answer: A
Bigtable is ideal for clickstream and IoT use cases, and it can handle high-performance reads and writes at global scale.
upvoted 4 times
...
ToniTovar
8 months, 2 weeks ago
Selected Answer: D
This option uses BigQuery, which has low latency and is a big data tool.
upvoted 2 times
...
VG1900
11 months, 4 weeks ago
Selected Answer: D
A is not correct because Bigtable is not designed for real-time analytics. It is a good choice for storing and retrieving small amounts of data quickly, but it is not as efficient for analyzing large volumes of data. B is not correct because it cannot support millions of reads and writes. C is not correct because of its storage limitation. D is correct.
upvoted 4 times
...
goodsport
1 year, 1 month ago
Selected Answer: A
I would opt for A.
upvoted 2 times
ArtistS
10 months, 3 weeks ago
Why opt for A? It is not real time, and the question says they want to analyze the data. Why not use BigQuery? It's only 8 TB.
upvoted 1 times
...
...
learnazureportal
1 year, 1 month ago
The correct answer is D: stream your data into BigQuery and use Dataproc and the BigQuery Storage API to analyze large volumes of data. Bigtable (A) is a NoSQL database.
upvoted 2 times
...
CloudKida
1 year, 4 months ago
Selected Answer: A
At a high level, Bigtable is a NoSQL wide-column database. It's optimized for low latency, large numbers of reads and writes, and maintaining performance at scale. Bigtable use cases are of a certain scale or throughput with strict latency requirements, such as IoT, AdTech, FinTech, and so on. If high throughput and low latency at scale are not priorities for you, then another NoSQL database like Firestore might be a better fit.
upvoted 2 times
...
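Building on CloudKida's point about low latency at scale, here is a read-side sketch. Option A names the HBase libraries on Dataproc; the plain google-cloud-bigtable Python client is used instead for brevity, with the same hypothetical names as the write sketch above.

```python
# Read-side sketch with the google-cloud-bigtable Python client (option A
# names the HBase libraries on Dataproc; the plain client is shorter).
# Names are the same hypothetical ones as in the write sketch above.
from google.cloud import bigtable
from google.cloud.bigtable.row_set import RowSet

client = bigtable.Client(project="my-project")
table = client.instance("clickstream-instance").table("clickstream")

# Point read of a single row: single-digit-millisecond latency at scale.
row = table.read_row(b"user123#2024-01-01T12:00:00Z")
if row is not None:
    print(row.cells["events"][b"page"][0].value)

# Prefix scan over one user's events: '#' is 0x23 and '$' is 0x24, so this
# half-open range covers every key that starts with "user123#".
row_set = RowSet()
row_set.add_row_range_from_keys(start_key=b"user123#", end_key=b"user123$")
for row in table.read_rows(row_set=row_set):
    print(row.row_key)
```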
Hilab
1 year, 7 months ago
B. Normalize the data model. D. Promote high-cardinality attributes in multi-attribute primary keys. When designing a schema for Cloud Spanner, it is important to follow best practices to avoid hotspots and ensure optimal performance. Hotspots occur when too many requests are targeted at a single node or group of nodes, causing them to become overloaded and potentially impacting performance.
upvoted 1 times
Hilab
1 year, 7 months ago
Normalization is a recommended best practice in database schema design, including in Cloud Spanner. It involves breaking down large tables into smaller, more manageable tables that are linked together by relationships. This can help reduce duplication of data and improve performance by reducing the amount of data that needs to be read or written to the database.

Promoting high-cardinality attributes in multi-attribute primary keys is also recommended in Cloud Spanner schema design. High-cardinality attributes are those that have a large number of distinct values, such as product IDs or customer IDs. Including these attributes in the primary key can help distribute data more evenly across nodes, reducing the likelihood of hotspots (a minimal sketch of this key design follows after this thread).

Using an auto-incrementing value as the primary key can result in hotspots, particularly if new data is being added at a high rate, because all new rows land on a single node; a bit-reversed sequential value avoids this by scattering consecutive values across the key space.
upvoted 1 times
Hilab
1 year, 7 months ago
The above answer is for Question #15; my mistake, I put the comments here.
upvoted 1 times
...
...
...
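Although Hilab's note above belongs to Question #15, the key-design advice is easy to illustrate. A minimal, hypothetical Cloud Spanner sketch of promoting a high-cardinality attribute into a multi-attribute primary key (all names invented for illustration):

```python
# Hypothetical Cloud Spanner DDL illustrating the advice above: lead the
# primary key with a high-cardinality attribute (CustomerId) instead of a
# monotonically increasing order ID, so inserts spread across splits rather
# than piling onto one node. All names are invented for illustration.
from google.cloud import spanner

client = spanner.Client(project="my-project")
database = client.instance("my-instance").database("orders-db")

operation = database.update_ddl([
    """
    CREATE TABLE Orders (
        CustomerId STRING(36) NOT NULL,
        OrderId    STRING(36) NOT NULL,
        CreatedAt  TIMESTAMP  NOT NULL
    ) PRIMARY KEY (CustomerId, OrderId)
    """
])
operation.result()  # wait for the schema change to complete
```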
Hilab
1 year, 7 months ago
D. Stream your data into BigQuery and use Dataproc and the BigQuery Storage API to analyze large volumes of data.

BigQuery is a fully managed, serverless data warehouse that allows you to store and analyze large datasets using SQL-like queries. It is designed to handle petabyte-scale data and is optimized for fast query performance. By streaming your clickstream data into BigQuery, you can store and process large amounts of data in real time.

Dataproc, on the other hand, is a fully managed cloud service for running Apache Hadoop and Spark clusters. It provides a managed, easy-to-use environment for data processing, which can be used to analyze the data stored in BigQuery. The BigQuery Storage API allows you to directly access data stored in BigQuery from external applications, including Dataproc, which enables you to run advanced analytics on large volumes of data with low latency.

This approach provides a scalable, low-latency solution for storing and analyzing large volumes of data, making it a good fit for your requirements.
upvoted 1 times
...
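A sketch of option D's path as Hilab describes it, using the google-cloud-bigquery and google-cloud-bigquery-storage Python clients; the table name is hypothetical, the dataset and table are assumed to exist, and pandas is needed for to_dataframe().

```python
# Sketch of option D's path: stream rows into BigQuery, then read results back
# through the BigQuery Storage API. The table name is hypothetical, the
# dataset and table are assumed to exist, and pandas must be installed.
from google.cloud import bigquery
from google.cloud import bigquery_storage

bq = bigquery.Client(project="my-project")
table_id = "my-project.analytics.clickstream"

# Streaming insert via the insertAll-backed client method.
errors = bq.insert_rows_json(table_id, [
    {"user_id": "user123", "page": "/checkout", "ts": "2024-01-01T12:00:00Z"},
])
assert not errors, errors

# to_dataframe() uses the Storage API when handed a BigQueryReadClient,
# which streams large result sets far faster than the REST API.
storage_client = bigquery_storage.BigQueryReadClient()
df = (
    bq.query(f"SELECT page, COUNT(*) AS views FROM `{table_id}` GROUP BY page")
    .result()
    .to_dataframe(bqstorage_client=storage_client)
)
print(df.head())
```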
H_S
1 year, 7 months ago
Selected Answer: A
A. Write your data into Bigtable and use Dataproc and the Apache HBase libraries for analysis.
upvoted 2 times
...
Nirca
1 year, 7 months ago
Selected Answer: A
A looks like the best option.
upvoted 2 times
...
GCP72
1 year, 10 months ago
Selected Answer: A
A is the correct answer; C couldn't handle 8 TB of data. Memorystore is scalable: start with the lowest tier and smallest size, then grow your instance as needed. Memorystore provides automated scaling using APIs and optimized node placement across zones for redundancy. Memorystore for Memcached can support clusters as large as 5 TB, enabling millions of QPS at very low latency.
upvoted 3 times
...
Kloudgeek
1 year, 10 months ago
The answer is A. This is clickstream and time-series data, the size is 8 TB, and low-latency reads and writes are required. Use Bigtable for storage and either the cbt CLI or the HBase API to interact with the data.
upvoted 5 times
...
fredcaram
1 year, 10 months ago
Selected Answer: A
B couldn't handle this volume of writes and reads, D wouldn't be able to handle the write rate, and C isn't suited for this.
upvoted 3 times
...
juancambb
1 year, 10 months ago
Selected Answer: A
must be A
upvoted 2 times
...
Community vote distribution: A (35%), C (25%), B (20%), Other