Welcome to ExamTopics
ExamTopics Logo
- Expert Verified, Online, Free.

Unlimited Access

Get Unlimited Contributor Access to the all ExamTopics Exams!
Take advantage of PDF Files for 1000+ Exams along with community discussions and pass IT Certification Exams Easily.

Exam Professional Data Engineer topic 1 question 3 discussion

Actual exam question from Google's Professional Data Engineer
Question #: 3
Topic #: 1
[All Professional Data Engineer Questions]

You designed a database for patient records as a pilot project to cover a few hundred patients in three clinics. Your design used a single database table to represent all patients and their visits, and you used self-joins to generate reports. The server resource utilization was at 50%. Since then, the scope of the project has expanded. The database must now store 100 times more patient records. You can no longer run the reports, because they either take too long or they encounter errors with insufficient compute resources. How should you adjust the database design?

  • A. Add capacity (memory and disk space) to the database server by the order of 200.
  • B. Shard the tables into smaller ones based on date ranges, and only generate reports with prespecified date ranges.
  • C. Normalize the master patient-record table into the patient table and the visits table, and create other necessary tables to avoid self-join.
  • D. Partition the table into smaller tables, with one for each clinic. Run queries against the smaller table pairs, and use unions for consolidated reports.
Show Suggested Answer Hide Answer
Suggested Answer: C 🗳️

Comments

Chosen Answer:
This is a voting comment (?) , you can switch to a simple comment.
Switch to a voting comment New
Hanakon
Highly Voted 3 years, 3 months ago
C - based on Google documentation, self-join is an anti-pattern: https://cloud.google.com/bigquery/docs/best-practices-performance-patterns
upvoted 29 times
awssp12345
2 years, 9 months ago
Super helpful! Thanks. :)
upvoted 1 times
...
...
[Removed]
Highly Voted 4 years ago
Correct: C is correct because this option provides the least amount of inconvenience over using pre-specified date ranges or one table per clinic while also increasing performance due to avoiding self-joins.
upvoted 12 times
...
rocky48
Most Recent 4 months, 3 weeks ago
Selected Answer: C
Normalization is a technique used to organize data in a relational database to reduce data redundancy and improve data integrity. Breaking the patient records into separate tables (patient and visits) and eliminating self-joins will make the database more scalable and improve query performance. It also helps maintain data integrity and makes it easier to manage large datasets efficiently. Options A, B, and D may provide some benefits in specific cases, but for a scenario where the project scope has expanded significantly and there are performance issues with self-joins, normalization (Option C) is the most robust and scalable solution.
upvoted 1 times
...
rtcpost
5 months, 1 week ago
Selected Answer: C
Normalization is a technique used to organize data in a relational database to reduce data redundancy and improve data integrity. Breaking the patient records into separate tables (patient and visits) and eliminating self-joins will make the database more scalable and improve query performance. It also helps maintain data integrity and makes it easier to manage large datasets efficiently. Options A, B, and D may provide some benefits in specific cases, but for a scenario where the project scope has expanded significantly and there are performance issues with self-joins, normalization (Option C) is the most robust and scalable solution.
upvoted 2 times
...
Victor2087
6 months ago
Hello Everyone, I am going to take this exam. It would be a great help if anyone could forward the full deck of questions to this email id - [email protected]. Hoping to hear something good soon :) Thanks a ton.
upvoted 1 times
SB5007
6 months ago
try here https://www.passnexam.com/google/professional-data-engineer
upvoted 2 times
...
ALLYDAN
6 months ago
Did anyone send this to you, would you please forward it to me [email protected]
upvoted 1 times
...
...
Anushka0712
7 months, 3 weeks ago
Hi everyone, I am going to appear for this exam on 25th August. It would be a great help if anyone could forward the full set of questions to this email id: [email protected]. Thanks in advance.
upvoted 1 times
...
vaga1
10 months, 1 week ago
Selected Answer: C
"100 times more patient records"immediately brings to create a patient dimensional table to save space on disk if a generical relational database is mentioned.
upvoted 1 times
...
C - https://cloud.google.com/bigquery/docs/best-practices-performance-patterns
upvoted 1 times
...
bha11111
1 year ago
Selected Answer: C
C- This is correct have verified from different sources
upvoted 1 times
...
Morock
1 year, 1 month ago
Selected Answer: C
Should be C. Basic ER design...
upvoted 1 times
...
GCPpro
1 year, 2 months ago
c - is the correct one.
upvoted 1 times
...
testoneAZ
1 year, 2 months ago
C should be the correct answer
upvoted 1 times
...
Brillianttyagi
1 year, 3 months ago
Selected Answer: C
C- Is the correct answer!
upvoted 1 times
...
Arkon88
2 years ago
Selected Answer: C
C - based on Google documentation, self-join is an anti-pattern: https://cloud.google.com/bigquery/docs/best-practices-performance-patterns
upvoted 2 times
...
ch1nczyk
2 years, 1 month ago
Selected Answer: C
Correct
upvoted 1 times
...
samdhimal
2 years, 2 months ago
correct answer -> Normalize the master patient-record table into the patient table and the visits table, and create other necessary tables to avoid self-join. Avoid self-join at all cost because that's what google says. Reference: https://cloud.google.com/bigquery/docs/best-practices-performance-patterns
upvoted 3 times
samdhimal
1 year, 2 months ago
Normalizing the database design will help to minimize data redundancy and improve the efficiency of the queries. By separating the patient and visit information into separate tables, the database will be able to handle the increased number of records and generate reports more efficiently, because the self-joins will no longer be required. Option A is not a good solution because adding more capacity to the server will not address the underlying problem of the database design, and it may not be sufficient to handle the increased data volume. Option B is not a good solution because it limits the flexibility of the queries and reports, and it may not be sufficient to handle the increased data volume. Option D is not a good solution because partitioning the table into smaller tables may lead to data redundancy and it may not be sufficient to handle the increased data volume.
upvoted 2 times
...
...
MaxNRG
2 years, 4 months ago
C is correct because this option provides the least amount of inconvenience over using pre-specified date ranges or one table per clinic while also increasing performance due to avoiding self-joins. A is not correct because adding additional compute resources is not a recommended way to resolve database schema problems. B is not correct because this will reduce the functionality of the database and make running reports more difficult. D is not correct because this will likely increase the number of tables so much that it will be more difficult to generate reports vs. the correct option. https://cloud.google.com/bigquery/docs/best-practices-performance-patterns https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#explicit-alias-visibility
upvoted 6 times
...
Community vote distribution
A (35%)
C (25%)
B (20%)
Other
Most Voted
A voting comment increases the vote count for the chosen answer by one.

Upvoting a comment with a selected answer will also increase the vote count towards that answer by one. So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.

SaveCancel
Loading ...