Exam Professional Data Engineer topic 2 question 78 discussion

Actual exam question from Google's Professional Data Engineer
Question #: 78
Topic #: 2

What is the recommended way to switch between SSD and HDD storage for your Google Cloud Bigtable instance?

  • A. create a third instance and sync the data from the two storage types via batch jobs
  • B. export the data from the existing instance and import the data into a new instance
  • C. run parallel instances where one is HDD and the other is SSD
  • D. the selection is final and you must resume using the same storage type
Suggested Answer: B
When you create a Cloud Bigtable instance and cluster, your choice of SSD or HDD storage for the cluster is permanent. You cannot use the Google Cloud Platform Console to change the type of storage that is used for the cluster.
If you need to convert an existing HDD cluster to SSD, or vice-versa, you can export the data from the existing instance and import the data into a new instance. Alternatively, you can write a Cloud Dataflow or Hadoop MapReduce job that copies the data from one instance to another.
Reference: https://cloud.google.com/bigtable/docs/choosing-ssd-hdd
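The "copy from one instance to another" idea can be pictured with a toy model. The sketch below uses plain Python dictionaries as stand-ins for the HDD and SSD tables. The `Table` type, `batches`, and `copy_rows` are hypothetical names made up for illustration and are not part of any Bigtable client library; a real migration would rely on the export/import tooling or a Dataflow job as described above.

```python
from itertools import islice
from typing import Dict, Iterator, List, Tuple

# Toy stand-in for a table: row key -> {column: value}.
# This is NOT the Bigtable client API; it only illustrates the copy pattern.
Cells = Dict[str, bytes]
Table = Dict[bytes, Cells]


def batches(rows: Iterator[Tuple[bytes, Cells]], size: int) -> Iterator[List[Tuple[bytes, Cells]]]:
    """Yield rows in fixed-size batches so writes can be grouped."""
    while True:
        chunk = list(islice(rows, size))
        if not chunk:
            return
        yield chunk


def copy_rows(source: Table, destination: Table, batch_size: int = 2) -> int:
    """Copy every row from source to destination in key order; return the row count."""
    copied = 0
    ordered = iter(sorted(source.items()))
    for chunk in batches(ordered, batch_size):
        for key, cells in chunk:
            destination[key] = dict(cells)  # copy cells so the tables stay independent
            copied += 1
    return copied


# Hypothetical data: the "old" HDD-backed table and an empty "new" SSD-backed one.
hdd_table: Table = {
    b"user#001": {"cf:name": b"ada"},
    b"user#002": {"cf:name": b"bob"},
    b"user#003": {"cf:name": b"cy"},
}
ssd_table: Table = {}
rows_copied = copy_rows(hdd_table, ssd_table)
```

The point of the sketch is only that the storage type is never changed in place: a second, separately created destination receives a full copy of the rows.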

Comments

nez15
Highly Voted 2 years, 7 months ago
QUESTION 2 Your company built a TensorFlow neural-network model with a large number of neurons and layers. The model fits well for the training data. However, when tested against new data, it performs poorly. What method can you employ to address this? A. Threading B. Serialization C. Dropout Methods D. Dimensionality Reduction Correct Answer: C
upvoted 9 times
...
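For readers unfamiliar with dropout (answer C above), here is a minimal sketch of "inverted" dropout in plain Python. It is an illustration under stated assumptions, not TensorFlow's implementation: the `dropout` function, the `rate` parameter name, and the sample data are all made up for this example.

```python
import random
from typing import List


def dropout(activations: List[float], rate: float, rng: random.Random) -> List[float]:
    """Inverted dropout: zero each activation with probability `rate` and
    scale the survivors by 1 / (1 - rate) so the expected value is unchanged."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)             # fixed seed so the run is repeatable
layer_output = [1.0] * 10_000      # hypothetical activations from one layer
dropped = dropout(layer_output, rate=0.5, rng=rng)
mean_after = sum(dropped) / len(dropped)  # stays close to the original mean of 1.0
```

Randomly silencing units during training keeps the network from relying on any single neuron, which is why it helps with a model that fits the training data well but generalizes poorly.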
nez15
Highly Voted 2 years, 7 months ago
QUESTION 16 You need to store and analyze social media postings in Google BigQuery at a rate of 10,000 messages per minute in near real-time. Initially, design the application to use streaming inserts for individual postings. Your application also performs data aggregations right after the streaming inserts. You discover that the queries after streaming inserts do not exhibit strong consistency, and reports from the queries might miss in-flight data. How can you adjust your application design? A. Re-write the application to load accumulated data every 2 minutes. B. Convert the streaming insert code to batch load for individual messages. C. Load the original message to Google Cloud SQL, and export the table every hour to BigQuery via streaming inserts. D. Estimate the average latency for data availability after streaming inserts, and always run queries after waiting twice as long. Correct Answer: D
upvoted 8 times
...
anji007
Most Recent 10 months ago
Ans: B
upvoted 2 times
...
daghayeghi
1 year, 5 months ago
B: updated link: https://cloud.google.com/bigtable/docs/choosing-ssd-hdd#switching
upvoted 4 times
...
[Removed]
2 years, 4 months ago
Answer: B. Description: Once the storage type is chosen it cannot be changed; you need to create a new instance and move the data.
upvoted 4 times
...
[Removed]
2 years, 4 months ago
Answer : B https://cloud.google.com/bigtable/docs/choosing-ssd-hdd#switching
upvoted 4 times
...
nez15
2 years, 7 months ago
QUESTION 164 You have data pipelines running on BigQuery, Cloud Dataflow, and Cloud Dataproc. You need to perform health checks and monitor their behavior, and then notify the team managing the pipelines if they fail. You also need to be able to work across multiple projects. Your preference is to use managed products of features of the platform. What should you do? A. Export the information to Cloud Stackdriver, and set up an Alerting policy B. Run a Virtual Machine in Compute Engine with Airflow, and export the information to Stackdriver C. Export the logs to BigQuery, and set up App Engine to read that information and send emails if you find a failure in the logs D. Develop an App Engine application to consume logs using GCP API calls, and send emails if you find a failure in the logs Correct Answer: B
upvoted 2 times
godot
2 years, 5 months ago
"Your preference is to use managed products of features of the platform." doesn't seem to favor answer B...
upvoted 1 times
...
priyam
2 years, 4 months ago
Answer is A. The preference is for managed products. BigQuery, Dataflow, and Dataproc can be configured to export logs to Stackdriver.
upvoted 7 times
...
...
nez15
2 years, 7 months ago
QUESTION 163 C. Specify customer-supplied encryption key (CSEK) in the .boto configuration file. Use gsutil cp to upload each archival file to the Cloud Storage bucket. Save the CSEK in Cloud Memorystore as permanent storage of the secret. D. Specify customer-supplied encryption key (CSEK) in the .boto configuration file. Use gsutil cp to upload each archival file to the Cloud Storage bucket. Save the CSEK in a different project that only the security team can access. Correct Answer: B
upvoted 1 times
lcgcastro96
2 years, 5 months ago
IMO, if it's a TNO approach then one should avoid KMS and supply their own key, which is kept on-premises rather than in the cloud, and is thus more secure. Between C and D, I would choose D, since saving the keys in Memorystore does not make that much sense.
upvoted 2 times
...
...
nez15
2 years, 7 months ago
QUESTION 163 You want to archive data in Cloud Storage. Because some data is very sensitive, you want to use the “Trust No One” (TNO) approach to encrypt your data to prevent the cloud provider staff from decrypting your data. What should you do? A. Use gcloud kms keys create to create a symmetric key. Then use gcloud kms encrypt to encrypt each archival file with the key and unique additional authenticated data (AAD). Use gsutil cp to upload each encrypted file to the Cloud Storage bucket, and keep the AAD outside of Google Cloud. B. Use gcloud kms keys create to create a symmetric key. Then use gcloud kms encrypt to encrypt each archival file with the key. Use gsutil cp to upload each encrypted file to the Cloud Storage bucket. Manually destroy the key previously used for encryption, and rotate the key once.
upvoted 1 times
timolo
1 year, 1 month ago
Answer is A: https://cloud.google.com/kms/docs/additional-authenticated-data
upvoted 1 times
...
...
nez15
2 years, 7 months ago
QUESTION 162 You need to choose a database to store time series CPU and memory usage for millions of computers. You need to store this data in one-second interval samples. Analysts will be performing real-time, ad hoc analytics against the database. You want to avoid being charged for every query executed and ensure that the schema design will allow for future growth of the dataset. Which database and data model should you choose? A. Create a table in BigQuery, and append the new samples for CPU and memory to the table B. Create a wide table in BigQuery, create a column for the sample value at each second, and update the row with the interval for each second C. Create a narrow table in Cloud Bigtable with a row key that combines the Compute Engine computer identifier with the sample time at each second D. Create a wide table in Cloud Bigtable with a row key that combines the computer identifier with the sample time at each minute, and combine the values for each second as column data. Correct Answer: D
upvoted 2 times
lcgcastro96
2 years, 5 months ago
Time series data seems to suggest BigTable, as well as the requirement for real-time analytics (BigTable has ~ms latency while Big Query has ~s). Narrow tables are indicated for time-series data that have large number of events per row, as we need here to encompass memory usage for millions of computers. This being said, my pick would be C
upvoted 8 times
priyam
2 years, 4 months ago
Option D is right. As an optimisation we can use short and wide tables https://cloud.google.com/bigtable/docs/schema-design-time-series
upvoted 1 times
daghayeghi
1 year, 5 months ago
But the question says "ad hoc analytics", and that optimization applies when the analysis is planned in advance, not ad hoc. C is correct.
upvoted 1 times
...
...
...
...
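The difference between options C and D in QUESTION 162 comes down to how the row key is built. The sketch below shows one possible layout for each; the `#` separator, the timestamp formats, and the function names are assumptions chosen for illustration, not a prescribed Bigtable schema.

```python
from datetime import datetime, timezone
from typing import Tuple


def narrow_row_key(machine_id: str, ts: datetime) -> str:
    """Option C style: one row per machine per second."""
    return f"{machine_id}#{ts.strftime('%Y%m%d%H%M%S')}"


def wide_row_key_and_column(machine_id: str, ts: datetime) -> Tuple[str, str]:
    """Option D style: one row per machine per minute; the second within
    that minute becomes the column qualifier."""
    return f"{machine_id}#{ts.strftime('%Y%m%d%H%M')}", f"{ts.second:02d}"


# Hypothetical sample taken at 03:04:05 UTC on 2 January 2020.
sample_time = datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
narrow_key = narrow_row_key("vm-42", sample_time)
wide_key, wide_column = wide_row_key_and_column("vm-42", sample_time)
```

With the narrow layout every sample is its own row; with the wide layout sixty samples share one row and differ only by column.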
nez15
2 years, 7 months ago
QUESTION 161 You work for a mid-sized enterprise that needs to move its operational system transaction data from an on-premises database to GCP. The database is about 20 TB in size. Which database should you choose? A. Cloud SQL B. Cloud BigTable C. Cloud Spanner D. Cloud Datastore Correct Answer: A
upvoted 3 times
rosy
2 years, 6 months ago
Should this not be Datastore?
upvoted 3 times
lcgcastro96
2 years, 5 months ago
yes, I agree. Cloud SQL does not scale well up to TB of data, BigTable is not transaction oriented and Spanner would be overkill and expensive (also, no requirements about horizontal scalability are made)
upvoted 2 times
ramukbk
2 years, 1 month ago
For transactional data, shouldn't it be Cloud SQL? It provides up to 30 TB. Bigtable is not transaction oriented and Datastore is NoSQL.
upvoted 1 times
lcgcastro96
1 year, 11 months ago
When the answer was given, Cloud SQL did not scale. Now it does.
upvoted 1 times
...
...
...
...
...
nez15
2 years, 7 months ago
QUESTION 160 You need to choose a database for a new project that has the following requirements: Fully managed Able to automatically scale up Transactionally consistent Able to scale up to 6 TB Able to be queried using SQL Which database do you choose? A. Cloud SQL B. Cloud BigTable C. Cloud Spanner D. Cloud Datastore Correct Answer: C
upvoted 1 times
Fab451
2 years, 5 months ago
I think it's A Cloud SQL (scale up)
upvoted 2 times
godot
2 years, 5 months ago
Spanner is transactionally consistent & scalable. https://cloud.google.com/spanner (see table)
upvoted 2 times
ramukbk
2 years, 1 month ago
A. Cloud SQL: provides up to 30 TB and ACID
upvoted 1 times
AP73998
1 year ago
It should be C since the question explicitly asks for transactional consistency, which Spanner offers. I don't see it listed as a feature for Cloud SQL. https://cloud.google.com/sql#all-features
upvoted 1 times
...
...
...
...
...
nez15
2 years, 7 months ago
QUESTION 159 You need to deploy additional dependencies to all of a Cloud Dataproc cluster at startup using an existing initialization action. Company security policies require that Cloud Dataproc nodes do not have access to the Internet so public initialization actions cannot fetch resources. What should you do? A. Deploy the Cloud SQL Proxy on the Cloud Dataproc master B. Use an SSH tunnel to give the Cloud Dataproc cluster access to the Internet C. Copy all dependencies to a Cloud Storage bucket within your VPC security perimeter D. Use Resource Manager to add the service account used by the Cloud Dataproc cluster to the Network User role Correct Answer: D
upvoted 1 times
lcgcastro96
2 years, 5 months ago
I think the correct answer might be C instead, due to https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/network#create_a_cloud_dataproc_cluster_with_internal_ip_address_only "You can create a Dataproc cluster that is isolated from the public internet whose VM instances communicate over a private IP subnetwork (the VM instances will not have public IP addresses). To do this, the subnetwork of the cluster must have Private Google Access enabled to allow cluster nodes to access Google APIs and services, such as Cloud Storage, from internal IPs."
upvoted 4 times
godot
2 years, 5 months ago
I think it's D: Provides access to a shared VPC network https://cloud.google.com/compute/docs/access/iam#compute.networkUser "Once granted, service owners can use VPC networks and subnets that belong to the host project. For example, a network user can create a VM instance that belongs to a host project network but they cannot delete or create new networks in the host project."
upvoted 1 times
godot
2 years, 5 months ago
I think it's D: https://cloud.google.com/compute/docs/access/iam#compute.networkUser " Provides access to a shared VPC network. Once granted, service owners can use VPC networks and subnets that belong to the host project. For example, a network user can create a VM instance that belongs to a host project network but they cannot delete or create new networks in the host project."
upvoted 2 times
...
...
...
...
nez15
2 years, 7 months ago
QUESTION 158 Your team is working on a binary classification problem. You have trained a support vector machine (SVM) classifier with default parameters, and received an Area Under the Curve (AUC) of 0.87 on the validation set. You want to increase the AUC of the model. What should you do? A. Perform hyperparameter tuning B. Train a classifier with deep neural networks, because neural networks would always beat SVMs C. Deploy the model and measure the real-world AUC; it’s always higher because of generalization D. Scale predictions you get out of the model (tune a scaling factor as a hyperparameter) in order to get the highest AUC Correct Answer: D
upvoted 1 times
lcgcastro96
2 years, 5 months ago
scaling "predictions out of the model" does not make that much sense to me... scaling/normalizing data should be applied before any ML algorithm is trained in order to improve training convergence (thus resulting in better results such as a higher AUC), as a pre-processing step, but scaling binary predictions that are originated by the model seems non-logical and useless. I would go with A, due to the fact that the SVM has "default parameters".
upvoted 6 times
...
...
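The objection to option D in QUESTION 158 can be checked directly: AUC depends only on how the scores rank the examples, so multiplying every prediction by a positive factor leaves it unchanged. The sketch below computes AUC by pairwise comparison in plain Python; the `auc` helper, the labels, and the scores are invented for illustration.

```python
from typing import List


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    tied pairs count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical validation labels and model scores.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
auc_original = auc(labels, scores)
auc_scaled = auc(labels, [3.0 * s for s in scores])  # same ranking, same AUC
```

Because a scaling factor cannot change the ranking, tuning it cannot raise the AUC, which supports choosing hyperparameter tuning (A) instead.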
nez15
2 years, 7 months ago
QUESTION 157 You are planning to migrate your current on-premises Apache Hadoop deployment to the cloud. You need to ensure that the deployment is as fault-tolerant and cost-effective as possible for long-running batch jobs. You want to use a managed service. What should you do? A. Deploy a Cloud Dataproc cluster. Use a standard persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs:// B. Deploy a Cloud Dataproc cluster. Use an SSD persistent disk and 50% preemptible workers. Store data in Cloud Storage, and change references in scripts from hdfs:// to gs:// C. Install Hadoop and Spark on a 10-node Compute Engine instance group with standard instances. Install the Cloud Storage connector, and store the data in Cloud Storage. Change references in scripts from hdfs:// to gs:// D. Install Hadoop and Spark on a 10-node Compute Engine instance group with preemptible instances. Store data in HDFS. Change references in scripts from hdfs:// to gs:// Correct Answer: A
upvoted 4 times
...
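The "change references in scripts from hdfs:// to gs://" step in QUESTION 157 can be sketched as a small text rewrite. This is a rough illustration only: the regular expression, the `to_gcs` helper, and the bucket name `example-bucket` are assumptions, and real scripts may need more careful handling than a pattern substitution.

```python
import re

# Matches hdfs://<optional host:port><optional /path>, stopping at whitespace or quotes.
HDFS_URI = re.compile(r"hdfs://[^/\s]*(/[^\s'\"]*)?")


def to_gcs(text: str, bucket: str) -> str:
    """Rewrite every hdfs:// URI in `text` to a gs:// URI in `bucket`,
    keeping the path and dropping the namenode host and port."""
    def swap(match: "re.Match[str]") -> str:
        path = match.group(1) or "/"
        return f"gs://{bucket}{path}"
    return HDFS_URI.sub(swap, text)


# Hypothetical job command with both URI forms.
script = "spark-submit job.py hdfs://namenode:8020/data/in hdfs:///data/out"
migrated = to_gcs(script, "example-bucket")
```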
nez15
2 years, 7 months ago
QUESTION 156 Your company is selecting a system to centralize data ingestion and delivery. You are considering messaging and data integration systems to address the requirements. The key requirements are: The ability to seek to a particular offset in a topic, possibly back to the start of all data ever captured Support for publish/subscribe semantics on hundreds of topics Retain per-key ordering Which system should you choose? A. Apache Kafka B. Cloud Storage C. Cloud Pub/Sub D. Firebase Cloud Messaging Correct Answer: A
upvoted 6 times
...
nez15
2 years, 7 months ago
QUESTION 155 You plan to deploy Cloud SQL using MySQL. You need to ensure high availability in the event of a zone failure. What should you do? A. Create a Cloud SQL instance in one zone, and create a failover replica in another zone within the same region. B. Create a Cloud SQL instance in one zone, and create a read replica in another zone within the same region. C. Create a Cloud SQL instance in one zone, and configure an external read replica in a zone in a different region. D. Create a Cloud SQL instance in a region, and configure automatic backup to a Cloud Storage bucket in the same region. Correct Answer: C
upvoted 2 times
scarf77
2 years, 6 months ago
Read replicas do not provide failover capability; also, placing a replica in another region is overkill (we are asked only about zone failure). For Cloud SQL with (legacy) MySQL, A is correct. Note: another solution is the standby replica, which is not present in these answers.
upvoted 8 times
...
...