You are designing a schema for a table that will be moved from MySQL to Cloud Bigtable. The MySQL table is as follows: How should you design a row key for Cloud Bigtable for this table?
From the link:
"Include a timestamp as part of your row key if you often need to retrieve data based on the time when it was recorded.
For example, your application might need to record performance-related data, such as CPU and memory usage, once per second for a large number of machines. Your row key for this data could combine an identifier for the machine with a timestamp for the data (for example, machine_4223421#1425330757685). Keep in mind that row keys are sorted lexicographically."
By using Event_timestamp_Account_id as the row key, you can efficiently query for events within a specific time range for a given account.
While Option B would allow you to query events for a specific account, it might not be as efficient for time-based queries.
Designing an appropriate row key for Cloud Bigtable requires considering the access patterns and ensuring that the read and write operations are spread evenly across the key space to avoid hotspots.
B. Set Account_id_Event_timestamp as a key.
This option is likely the best choice because:
Combining Account_id with Event_timestamp in the row key would allow you to maintain a good level of data distribution while preserving the ability to query efficiently by Account_id and sort by Event_timestamp within each account. This aligns well with Bigtable’s strengths in handling large, scalable, and sparse datasets.
By leading with Account_id, you group all events for a single account close together in the key space, which can be efficient for reads that are interested in the activity of a specific account.
B. Set Account_id_Event_timestamp as a key.
The primary key in the MySQL table is a composite key consisting of Account_id and Event_timestamp, so it would make sense to use both of these values as the row key in Cloud Bigtable. This allows for efficient querying and sorting by both Account_id and Event_timestamp.
https://cloud.google.com/bigtable/docs/schema-design#row-keys
It's B because :
"Row keys that start with a timestamp. This pattern causes sequential writes to be pushed onto a single node, creating a hotspot. If you put a timestamp in a row key, precede it with a high-cardinality value like a user ID to avoid hotspotting."
https://cloud.google.com/bigtable/docs/schema-design#row-keys
avoid row keys that starts with a timestamp.
Also using a key such as userID_timestamp allows bigtable to query related rows in a range rather than parsing the entire database.
"Row keys that start with a timestamp. This will cause sequential writes to be pushed onto a single node, creating a hotspot. If you put a timestamp in a row key, you need to precede it with a high-cardinality value like a user ID to avoid hotspotting."
A voting comment increases the vote count for the chosen answer by one.
Upvoting a comment with a selected answer will also increase the vote count towards that answer by one.
So if you see a comment that you already agree with, you can upvote it instead of posting a new comment.
salgabri
Highly Voted 4 years, 5 months agosyu31svc
3 years, 10 months agoPinkeshExampTopics
Most Recent 4 months, 3 weeks agosantoshchauhan
1 year, 1 month ago__rajan__
1 year, 7 months agoomermahgoub
2 years, 3 months agoomermahgoub
2 years, 3 months agoomermahgoub
2 years, 3 months agoomermahgoub
2 years, 3 months agotomato123
2 years, 8 months agomaxdanny
2 years, 9 months agosaumabhaM
3 years agowhigy
4 years, 5 months agomastodilu
3 years, 11 months agodonchick
4 years, 4 months agofraloca
4 years, 3 months ago