Welcome to ExamTopics

Exam Professional Data Engineer topic 1 question 168 discussion

Actual exam question from Google's Professional Data Engineer
Question #: 168
Topic #: 1

You work for a financial institution that lets customers register online. As new customers register, their user data is sent to Pub/Sub before being ingested into BigQuery. For security reasons, you decide to redact your customers' government-issued identification number while allowing customer service representatives to view the original values when necessary. What should you do?

  • A. Use BigQuery's built-in AEAD encryption to encrypt the SSN column. Save the keys to a new table that is only viewable by permissioned users.
  • B. Use BigQuery column-level security. Set the table permissions so that only members of the Customer Service user group can see the SSN column.
  • C. Before loading the data into BigQuery, use Cloud Data Loss Prevention (DLP) to replace input values with a cryptographic hash.
  • D. Before loading the data into BigQuery, use Cloud Data Loss Prevention (DLP) to replace input values with a cryptographic format-preserving encryption token.
Suggested Answer: D

Comments

AWSandeep
Highly Voted 1 year, 7 months ago
Selected Answer: B
B. While C and D are intriguing, they don't specify how to enable customer service representatives to receive access to the encryption token.
upvoted 10 times
MaxNRG
4 months, 1 week ago
B. BigQuery column-level security: Pros: Granular control over column access, ensures only authorized users see the SSN column. Cons: Doesn't truly redact the data. The SSN values are still stored in BigQuery, even if hidden from unauthorized users. A potential security breach could expose them.
upvoted 1 time
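To make MaxNRG's point concrete, here is a toy in-memory simulation of column-level security. It is not BigQuery's mechanism (BigQuery configures this through policy tags); the table, the `COLUMN_POLICY` mapping, the group names, and the `read_rows` helper are all hypothetical. The sketch shows that the raw ID stays in storage and is only withheld at read time.

```python
from dataclasses import dataclass

# Hypothetical stored table with made-up values: the raw ID is persisted as-is.
STORED_ROWS = [
    {"customer_id": 1, "name": "Ana", "gov_id": "123-45-6789"},
    {"customer_id": 2, "name": "Ben", "gov_id": "987-65-4321"},
]

# Hypothetical policy: the groups allowed to read each restricted column.
COLUMN_POLICY = {"gov_id": {"customer-service"}}


@dataclass(frozen=True)
class User:
    name: str
    groups: frozenset


def read_rows(user: User, columns: list[str]) -> list[dict]:
    """Return the requested columns, refusing restricted ones the user may not read."""
    for col in columns:
        allowed = COLUMN_POLICY.get(col)
        if allowed is not None and not (allowed & user.groups):
            # Loosely mirrors a query being denied for selecting a protected column.
            raise PermissionError(f"{user.name} may not read column {col!r}")
    # Authorized reads see the original stored value; nothing was transformed.
    return [{c: row[c] for c in columns} for row in STORED_ROWS]
```

The check only runs on the read path, so anyone with access to the underlying storage (or a compromised privileged account) still sees the original values, which is the weakness the comment describes.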
ffggrre
6 months ago
There is no SSN in the question; it can be any ID.
upvoted 1 time
MaxNRG
Most Recent 4 months, 1 week ago
Selected Answer: D
The best option is D - Before loading the data into BigQuery, use Cloud Data Loss Prevention (DLP) to replace input values with a cryptographic format-preserving encryption token. The key reasons are: DLP allows redacting sensitive PII like SSNs before loading into BigQuery. This provides security by default for the raw SSN values. Using format-preserving encryption keeps the column format intact while still encrypting, allowing business logic relying on SSN format to continue functioning. The encrypted tokens can be reversed to view original SSNs when required, meeting the access requirement for customer service reps.
upvoted 1 time
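The format-preserving property described above can be illustrated with a toy Feistel cipher over decimal digits: the token has the same length and alphabet as the input, and anyone holding the key can reverse it. This is only a teaching sketch under stated assumptions (an HMAC-SHA256 round function, 10 rounds, a made-up key); it is not NIST FF1/FF3-1, not what Cloud DLP runs internally, and must not be used on real data.

```python
import hashlib
import hmac

ROUNDS = 10  # toy choice; an even count keeps the output split equal to the input split


def _round_value(key: bytes, round_index: int, half: str) -> int:
    """Pseudorandom round function: HMAC-SHA256 over the round number and one half."""
    mac = hmac.new(key, f"{round_index}:{half}".encode(), hashlib.sha256)
    return int.from_bytes(mac.digest(), "big")


def _check(digits: str) -> None:
    if len(digits) < 2 or any(ch not in "0123456789" for ch in digits):
        raise ValueError("toy FPE needs a string of at least two ASCII digits")


def fpe_encrypt(key: bytes, digits: str) -> str:
    """Toy Feistel FPE: returns a digit string of the same length as the input."""
    _check(digits)
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v  # width of the half being updated this round
        c = (int(a) + _round_value(key, i, b)) % 10**m
        a, b = b, str(c).zfill(m)
    return a + b


def fpe_decrypt(key: bytes, token: str) -> str:
    """Inverse of fpe_encrypt: undo each round in reverse order with the same key."""
    _check(token)
    u = len(token) // 2
    v = len(token) - u
    a, b = token[:u], token[u:]
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        c = (int(b) - _round_value(key, i, a)) % 10**m
        a, b = str(c).zfill(m), a
    return a + b
```

Because a 9-digit ID still encrypts to a 9-digit token, downstream schemas and format checks keep working, and reversibility with the key is what would let customer service recover the original value.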
MaxNRG
4 months, 1 week ago
Option A does encrypt SSN but requires managing keys separately. Option B relies on complex IAM policy changes instead of encrypting by default. Option C hashes irreversibly, preventing customer service reps from viewing original SSNs when required. Therefore, using DLP format-preserving encryption before BigQuery ingestion balances both security and analytics requirements for SSN data.
upvoted 1 time
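The contrast with option C can be shown with Python's standard library. A keyed hash (HMAC-SHA256 here, an assumption standing in for DLP's hash transformation) is deterministic and fixed-length, but there is no decrypt function: the only way "back" is to recompute the hash for candidate values you already have. The key and helper names are hypothetical.

```python
import hashlib
import hmac
from typing import Optional


def hash_token(key: bytes, value: str) -> str:
    """Keyed one-way token: deterministic, always 64 hex characters, no inverse."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


def lookup_original(key: bytes, token: str, candidates: list[str]) -> Optional[str]:
    """The only route back from a hash: recompute it for known candidates and compare."""
    for candidate in candidates:
        if hmac.compare_digest(hash_token(key, candidate), token):
            return candidate
    return None
```

A representative would need the original value (or a list of candidates) already in hand to make use of such a token, which is why C does not satisfy "view the original values when necessary".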
MaxNRG
4 months, 1 week ago
Why not B (BigQuery column-level security)? It doesn't truly redact the data. The SSN values are still stored in BigQuery, even if hidden from unauthorized users, so a potential security breach could expose them.
upvoted 1 time
Aman47
4 months, 1 week ago
Selected Answer: D
Even if you apply column-level access control, data owners and other roles higher in the hierarchy will still be able to view very sensitive data. It's better to just use encryption and decryption, since this data will never be used for any analytic workloads anyway.
upvoted 2 times
spicebits
5 months, 2 weeks ago
Selected Answer: D
Answer has to be D. Question says "you decide to redact your customers' Government issued Identification Number while allowing customer service representatives to view the original values when necessary"... Redact... view the original values... D is the only choice.
upvoted 2 times
Nirca
5 months, 3 weeks ago
Selected Answer: B
It might not be D, since only the format is kept and the data itself is changed. Format-preserving encryption (FPE), endorsed by NIST, is an advanced encryption technique that transforms data into an encrypted format while preserving its original structure. For instance, a 16-digit credit card number encrypted with FPE will still be a 16-digit number.
upvoted 1 time
Helinia
3 months, 3 weeks ago
No, the value using FPE can be decrypted with key. "Encrypted values can be re-identified using the original cryptographic key and the entire output value, including surrogate annotation." https://cloud.google.com/dlp/docs/pseudonymization#supported-methods
upvoted 1 time
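To illustrate the re-identification Helinia quotes, the sketch below only builds request bodies, as plain dictionaries in the shape the `google-cloud-dlp` Python client accepts, without calling the API. The project ID, KMS key name, wrapped key bytes, and the `GOV_ID_TOKEN` surrogate name are placeholders, and `US_SOCIAL_SECURITY_NUMBER` is used only because the answer options mention SSNs; check the fields against the current Cloud DLP reference before relying on them.

```python
SURROGATE = "GOV_ID_TOKEN"  # placeholder surrogate info type used to annotate tokens
INFO_TYPE = "US_SOCIAL_SECURITY_NUMBER"


def _fpe_transformation(kms_key_name: str, wrapped_key: bytes) -> dict:
    """FPE primitive with a KMS-wrapped key; both values are caller-supplied placeholders."""
    return {
        "crypto_replace_ffx_fpe_config": {
            "crypto_key": {
                "kms_wrapped": {"wrapped_key": wrapped_key, "crypto_key_name": kms_key_name}
            },
            "common_alphabet": "NUMERIC",
            # Annotating output with a surrogate type is what makes re-identification
            # of free text possible later.
            "surrogate_info_type": {"name": SURROGATE},
        }
    }


def build_deidentify_request(project_id: str, kms_key_name: str, wrapped_key: bytes, text: str) -> dict:
    """Request body that finds the ID info type in `text` and replaces it with an FPE token."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "info_types": [{"name": INFO_TYPE}],
                        "primitive_transformation": _fpe_transformation(kms_key_name, wrapped_key),
                    }
                ]
            }
        },
        "inspect_config": {"info_types": [{"name": INFO_TYPE}]},
        "item": {"value": text},
    }


def build_reidentify_request(project_id: str, kms_key_name: str, wrapped_key: bytes, text: str) -> dict:
    """Request body that locates surrogate-annotated tokens and decrypts them with the same key."""
    return {
        "parent": f"projects/{project_id}/locations/global",
        "reidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "info_types": [{"name": SURROGATE}],
                        "primitive_transformation": _fpe_transformation(kms_key_name, wrapped_key),
                    }
                ]
            }
        },
        "inspect_config": {
            "custom_info_types": [{"info_type": {"name": SURROGATE}, "surrogate_type": {}}]
        },
        "item": {"value": text},
    }
```

With the client library installed and credentials configured, these dictionaries would be passed as `request=` to `DlpServiceClient.deidentify_content` and `reidentify_content`. Using the same key material in both calls is what lets an authorized caller recover the original value.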
ffggrre
6 months, 1 week ago
Selected Answer: B
Customer service needs to see the original value, not possible with other options.
upvoted 1 time
kcl10
6 months, 3 weeks ago
Selected Answer: B
Of course B.
upvoted 1 time
ckanaar
7 months, 1 week ago
Selected Answer: D
I believe the crux of the question is that the cryptographic format-preserving encryption token is re-identifiable, whereas the cryptographic hash is not (https://cloud.google.com/dlp/docs/transformations-reference). Therefore, customer service can view the original values when necessary in the case of D.
upvoted 2 times
ckanaar
7 months, 1 week ago
Never mind, this can actually also be done with answer B. They are both correct, just different implementations. No idea.
upvoted 1 time
Lanro
8 months, 4 weeks ago
Selected Answer: D
I don't see why we should use DLP, since we know exactly which column should be locked or encrypted. On the other hand, having a cryptographic representation of the SSN helps to aggregate and analyze entries. So I will vote for D, but B is much easier to implement. Garbage question indeed.
upvoted 4 times
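The point that a cryptographic representation still supports aggregation holds for deterministic transformations, where the same input always yields the same token. A minimal sketch, assuming a keyed HMAC token as a stand-in for whichever deterministic DLP transformation is used; the key and the sample IDs are made up.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"demo-key-not-for-production"  # hypothetical; real keys would live in a KMS


def token(gov_id: str) -> str:
    """Deterministic token, truncated to 16 hex characters for readability (a toy choice)."""
    return hmac.new(KEY, gov_id.encode(), hashlib.sha256).hexdigest()[:16]


def registrations_per_person(gov_ids: list[str]) -> Counter:
    """Count registrations per person using only tokens, never the raw IDs."""
    return Counter(token(g) for g in gov_ids)
```

Grouping, joining, and counting distinct customers all work on the tokens; a randomized (non-deterministic) encryption would break this, since equal inputs would no longer produce equal outputs.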
knith66
9 months ago
The question mentions that "user data is sent to Pub/Sub before being ingested" instead of just saying the data goes to BigQuery through Pub/Sub. So some alteration is expected before the data is ingested into BigQuery, which means option D should work.
upvoted 2 times
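This reading places the protection step on the hop between Pub/Sub and BigQuery. A hypothetical transform for that hop might look like the sketch below: it decodes a JSON message body, replaces the ID field with a token, and returns the row to load. The `gov_id` field name, the message shape, and the injected `tokenize` callable are assumptions; in a real pipeline `tokenize` would call Cloud DLP (for example from a Dataflow step), which is not shown.

```python
import json
from typing import Callable

SENSITIVE_FIELD = "gov_id"  # hypothetical field name in the registration payload


def to_bigquery_row(payload: bytes, tokenize: Callable[[str], str]) -> dict:
    """Turn one Pub/Sub message body into a BigQuery row with the ID tokenized."""
    record = json.loads(payload.decode("utf-8"))
    if record.get(SENSITIVE_FIELD) is not None:
        # Only the sensitive field is rewritten; every other field passes through.
        record[SENSITIVE_FIELD] = tokenize(str(record[SENSITIVE_FIELD]))
    return record
```

Injecting `tokenize` keeps the transform testable and lets the same function apply either a reversible FPE token (option D) or an irreversible hash (option C).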
sr25
9 months ago
Selected Answer: D
D. The question says CSRs get access to values "when necessary", not default access as in B. D is the better option, using the token.
upvoted 1 time
ZZHZZH
9 months, 2 weeks ago
Selected Answer: B
One of the key requirements is to be able to let authorized personnel see the ID. D doesn't specify that.
upvoted 1 time
vaga1
11 months, 2 weeks ago
Selected Answer: D
The answer is between B and D, as well described in many comments. I personally do not see any reason to keep the information available using a token or a mask. It is not a PAN card number; it's just a personal ID. It should not be useful for analytical purposes. I'm gonna go for D then.
upvoted 1 time
vaga1
11 months, 2 weeks ago
Sorry, B.
upvoted 1 time
mialll
11 months, 4 weeks ago
Selected Answer: D
https://cloud.google.com/dlp/docs/classification-redaction
upvoted 3 times
Oleksandr0501
12 months ago
gpt: Both options B and D can be used to redact sensitive data while still allowing authorized users to view the original values when necessary. However, the choice between them would depend on specific business requirements and security considerations. Option B uses BigQuery column-level security to set table permissions for users, allowing only members of the Customer Service user group to view the SSN column. This approach is straightforward and can be implemented easily. However, it requires creating a separate user group for customer service representatives and granting them access to only the required data columns.
upvoted 1 time
Oleksandr0501
12 months ago
gpt: Option D uses Cloud Data Loss Prevention (DLP) to replace input values with a cryptographic format-preserving encryption token before loading the data into BigQuery. This approach allows for more granular control over data access and can provide an added layer of security. However, it may require additional configuration and implementation effort, and it may also affect the performance of queries on the encrypted data. Google recommends using a combination of data protection techniques to safeguard sensitive data, such as encryption, data masking, and access controls. In this scenario, a possible best practice would be to use both options B and D together to provide multiple layers of protection for the sensitive data while still allowing authorized users to view the original values when necessary.
upvoted 1 time
Oleksandr0501
12 months ago
I'll take D.
upvoted 1 time
Oleksandr0501
11 months, 3 weeks ago
Now I've read more and think it's better to choose A or B... garbage question.
upvoted 1 time
muhusman
1 year ago
The answer is B. If we select C, that approach would also prevent unauthorized access to sensitive data, but it would not allow customer service representatives to view the original values when necessary.
upvoted 1 time
streeeber
1 year ago
Selected Answer: D
PII and DLP go hand in hand
upvoted 2 times
El_Bosco
9 months, 1 week ago
That is not an argument. Option D does not explain how Customer Services will have access.
upvoted 1 time
Community vote distribution: A (35%), C (25%), B (20%), Other