You need to copy millions of sensitive patient records from a relational database to BigQuery. The total size of the database is 10 TB. You need to design a solution that is secure and time-efficient. What should you do?
A.
Export the records from the database as an Avro file. Upload the file to GCS using gsutil, and then load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.
B.
Export the records from the database as an Avro file. Copy the file onto a Transfer Appliance and send it to Google, and then load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.
C.
Export the records from the database into a CSV file. Create a public URL for the CSV file, and then use Storage Transfer Service to move the file to Cloud Storage. Load the CSV file into BigQuery using the BigQuery web UI in the GCP Console.
D.
Export the records from the database as an Avro file. Create a public URL for the Avro file, and then use Storage Transfer Service to move the file to Cloud Storage. Load the Avro file into BigQuery using the BigQuery web UI in the GCP Console.
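For reference, option A's workflow can be sketched with the Cloud SDK command-line tools. The bucket, dataset, table, and file names below are placeholders, and the sketch assumes the Cloud SDK is installed and authenticated:

```shell
# Sketch of option A; bucket, dataset, and table names are hypothetical.

# 1. Upload the exported Avro file to Cloud Storage.
#    -m enables parallel transfer; traffic is encrypted in transit via TLS.
gsutil -m cp patients.avro gs://example-secure-bucket/patients.avro

# 2. Load the Avro file into BigQuery. Avro files are self-describing,
#    so the schema is read from the file itself.
bq load --source_format=AVRO clinical_dataset.patients \
    gs://example-secure-bucket/patients.avro
```

The web UI load in the answer options and the `bq load` command above are equivalent paths to the same BigQuery load job.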
You are transferring sensitive patient information, so C and D (which expose the data via a public URL) are ruled out. The choice then comes down to A and B, and this is where it gets tricky. On how to choose the Transfer Appliance, see https://cloud.google.com/transfer-appliance/docs/2.0/overview
Without knowing the available bandwidth, we cannot tell whether the upload would complete within the seven days Google recommends. So the safest and most predictable option is the Transfer Appliance.
Therefore my choice is B.
https://cloud.google.com/solutions/migration-to-google-cloud-transferring-your-large-datasets
The table shows that at 1 Gbps, transferring 10 TB takes about 30 hours. Corporate internet connections are commonly 1 Gbps or faster, so I'm inclined to pick A.
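The table's figures are easy to sanity-check. A minimal back-of-envelope calculation, assuming an ideal fully utilized link (real transfers add protocol overhead, which is why Google's table quotes ~30 hours at 1 Gbps rather than the ideal ~22):

```python
def transfer_hours(size_tb: float, link_gbps: float) -> float:
    """Ideal transfer time in hours for size_tb terabytes over a link_gbps link."""
    bits = size_tb * 1e12 * 8           # decimal terabytes -> bits
    seconds = bits / (link_gbps * 1e9)  # link speed in bits per second
    return seconds / 3600

for gbps in (0.1, 1, 10):
    print(f"10 TB over {gbps:>4} Gbps: {transfer_hours(10, gbps):7.1f} hours")
```

At 1 Gbps the ideal time is ~22 hours, and at 10 Gbps ~2.2 hours, consistent with the "one weekend" and "3 hours" estimates quoted later in this thread.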
If you transfer 10 TB over the wire, your network will be saturated for the entire transfer time. That isn't something a company would be happy to swallow.
Answer should be B. A would also work, but it runs into a Cloud Storage limit: the maximum size of a single object is 5 TB, so a single 10 TB Avro file could not be uploaded as one object.
https://cloud.google.com/storage/quotas
I will go with B
IMO "A" is the most suitable option, since it could take about 25 days to receive the Transfer Appliance and another 25 days to ship it back and have the data available.
https://cloud.google.com/transfer-appliance/docs/4.0/overview#transfer-speeds
Given the sensitivity of the patient records and the large size of the data, using Google's Transfer Appliance is a secure and efficient method. The Transfer Appliance is a hardware solution provided by Google for transferring large amounts of data. It enables you to securely transfer data without exposing it over the internet.
10 TB is nothing. With a single 10 Gbps interconnect you could transfer the data in about 3 hours, and even at 1 Gbps without an interconnect you could finish it over a weekend. The Transfer Appliance could take 25 days to arrive and another 25 days before the data is available, which is not "time-efficient" at all. I go with A instead of B.
Per Google's recommendation, for transfers above 1 TB from on-premises or from another cloud's object storage (such as Amazon S3), you should use Storage Transfer Service.
Transfer Appliance would take 20 days of expected turnaround time. https://cloud.google.com/architecture/migration-to-google-cloud-transferring-your-large-datasets#expected%20turnaround:~:text=The%20expected%20turnaround%20time%20for%20a%20network%20appliance%20to%20be%20shipped%2C%20loaded%20with%20your%20data%2C%20shipped%20back%2C%20and%20rehydrated%20on%20Google%20Cloud%20is%2020%20days.
The best answer would be A.
If gsutil can sustain 100 Mbps, the transfer would take about 12 days, which is still more time-efficient than B.
That is a reasonable assumption.
https://cloud.google.com/static/architecture/images/big-data-transfer-how-to-get-started-transfer-size-and-speed.png
I will go with "A" because of the transit time to ship the Transfer Appliance to Google, which also depends on the organization's location. gsutil works anywhere internet is available.
Brother, hear me out once and then do your own research...
Option A, because don't just look at the 10 TB figure: the file is being compressed with Avro,
which can reduce it by roughly 90-92%, so in the end we have about 1 TB or less of data to transfer. So why use the Transfer Appliance? Its capacity tiers are 40 TB and 300 TB. Why go offline, which would take 7 days or more before your data is online?
And if you use gsutil, even at only 100 Mbps without dedicated bandwidth, it can get roughly 1 TB online in about a day, because Avro has already shrunk the file from 10 TB to about 1 TB. So gsutil is best; the question doesn't ask for cost-efficiency, but consider the time as well.
Transfer Appliance is not as time-efficient when you have enough bandwidth. https://cloud.google.com/architecture/migration-to-google-cloud-transferring-your-large-datasets#transfer_appliance_for_larger_transfers
A is the answer, the question states the following facts:
- Total size of database 10TB.
- Solution needs to be:
* Secure
* Time-efficient
Total size of database:
will be significantly reduced by Avro compression (up to ~90%)
Secure transfer:
Even though we are dealing with sensitive data, the data is encrypted in transit when using `gsutil cp` to upload it to GCS. https://cloud.google.com/storage/docs/gsutil/addlhelp/SecurityandPrivacyConsiderations#transport-layer-security
Time-Efficient:
gsutil could upload 10 TB of data in about 30 hours at 1 Gbps (or roughly 3 hours for ~1 TB if it is Avro-compressed first).
It has to be gsutil.
In this documentation: https://cloud.google.com/architecture/migration-to-google-cloud-transferring-your-large-datasets#transfer-options
It states that if gsutil meets the project's deadline, you should use gsutil.
Also, for Transfer Appliance, here (https://cloud.google.com/architecture/migration-to-google-cloud-transferring-your-large-datasets#transfer_appliance_for_larger_transfers), it states:
The expected turnaround time for a network appliance to be shipped, loaded with your data, shipped back, and rehydrated on Google Cloud is 20 days.
Even at only 100 Mbps, transferring 10 TB takes about 12 days, noticeably faster than the Transfer Appliance's 20-day turnaround.
It's, of course, option A.