![](https://crypto4nerd.com/wp-content/uploads/2023/11/0G_dpUPYFCsNvME3l-1024x587.jpg)
Problem:
A 1 GB model file is generated every 24 hours on one of the machines inside a Google data center. This file needs to be copied exactly once to all machines across all data centers. Design this system.
Google has millions of machines in data centers across the globe, so it is impossible to use the source machine as a single centralized point of file sharing. The goal is to design a system architecture and flow that can distribute these huge files. Design interviews test your ability to analyze and solve big problems quickly. They are hard, but not impossible, especially if you prepare carefully.
Important Design Considerations:
The most important part is to gather requirements, reduce problem ambiguity, and explore all possible hidden constraints.
What clarifying questions can we ask about the original problem?
- Can we compress that file?
- Can we connect any pair of machines?
- Can we be fault-tolerant to network failures?
- How much time will it take to copy that file?
- What are the different types of failure events that can occur?
- How do we optimize sync time?
- Are all nodes equal in compute, storage, and network characteristics, and in current load (a node could be running heavy CPU/GPU tasks)?
- Can the file be segmented for parallel transmission?
- What are the security protocols for data transfer between machines?
- How do we ensure data integrity in the case of a partial transfer failure? (See the chunk-manifest sketch after this list.)
- What are the network latency characteristics between different data center nodes?
- What is the scalability plan for the system as data centers grow?
Capacity Estimation and Constraints:
- Let’s assume 1 data center has 100k server nodes and machines on average. Likewise, all these servers are connected via a (single, local) flat-topology network where any node can connect to any…
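As a rough sanity check on these numbers, here is a back-of-envelope sketch. The 10 Gbps per-machine link speed is an assumption (the original gives only the 1 GB file size and the 100k-node figure), and the "doubling" model is an idealization of a peer-to-peer, BitTorrent-style fan-out:

```python
import math

FILE_SIZE_GB = 1
NODES = 100_000      # servers per data center, from the estimate above
LINK_GBPS = 10       # assumed per-machine NIC bandwidth

# Time to push one full copy of the file over a single link.
seconds_per_copy = FILE_SIZE_GB * 8 / LINK_GBPS                # 0.8 s at 10 Gbps

# Naive fan-out: the source sends the file to every node, one after another.
naive_seconds = seconds_per_copy * NODES                       # 80,000 s, ~22 hours

# Peer-to-peer doubling: every node that already holds the file re-serves it,
# so the copy count doubles each round and only log2(NODES) rounds are needed.
p2p_seconds = seconds_per_copy * math.ceil(math.log2(NODES))   # ~14 s

print(f"naive: {naive_seconds:,.0f} s  |  p2p doubling: {p2p_seconds:.1f} s")
```

Under these assumptions, serial fan-out from one machine cannot even finish within the 24-hour generation window for a single data center, while any scheme where receivers re-serve the file finishes in seconds. This is the core argument for a decentralized distribution design.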