How would you design a system to handle 100 million daily active users for a messaging platform, ensuring 99.9% uptime and sub-100ms latency?
Why interviewers ask this
This question assesses your system design skills, understanding of distributed systems, scalability challenges, and ability to make trade-offs under real-world constraints. Interviewers want to see if you can break down complex requirements and communicate technical solutions clearly.
Sample Answer
I'd start by gathering requirements and estimating capacity: 100M DAU at an average of 50 messages per user per day is 5 billion messages per day, or roughly 58K messages/second on average. Traffic is not uniform, so I'd provision for a peak several times higher, assuming a 2-3x peak-to-average ratio until real traffic data is available.
For the architecture, I'd use a microservices approach with:
1) Load balancers distributing traffic across multiple regions
2) An API gateway for authentication and rate limiting
3) A message service sharded horizontally by user_id
4) Real-time delivery via WebSocket connections with connection pooling
5) Database sharding using consistent hashing across multiple MySQL clusters
6) Redis for caching recent messages and user presence
7) A CDN for media files
For 99.9% uptime, which leaves a budget of about 8.8 hours of downtime per year, I'd implement circuit breakers, auto-scaling, health checks, and multi-region deployment. To achieve sub-100ms latency, I'd use geographic load balancing, connection pooling, database read replicas, and aggressive caching strategies.
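The sharding step in the answer can be illustrated with a small consistent-hashing sketch. This is a minimal example under stated assumptions, not a production router: the class name ConsistentHashRing, the shard names such as "mysql-0", and the choice of 100 virtual nodes per shard are all hypothetical, and MD5 is used only as a convenient stable hash, not for security.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys (here, user IDs) to shards using a hash ring with virtual nodes."""

    def __init__(self, shards, vnodes=100):
        # Each shard is placed on the ring at `vnodes` pseudo-random points
        # so that load spreads evenly (vnodes=100 is an illustrative assumption).
        self._ring = []
        for shard in shards:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{shard}#{i}"), shard))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        # Stable 64-bit hash derived from MD5 (Python's built-in hash() is
        # randomized per process, so it is unsuitable for routing).
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def shard_for(self, user_id):
        # Walk clockwise to the first ring point after the key's hash,
        # wrapping around to the start of the ring if necessary.
        h = self._hash(str(user_id))
        idx = bisect.bisect_right(self._hashes, h) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    old_ring = ConsistentHashRing(["mysql-0", "mysql-1", "mysql-2", "mysql-3"])
    new_ring = ConsistentHashRing(["mysql-0", "mysql-1", "mysql-2", "mysql-3", "mysql-4"])
    users = range(10_000)
    moved = sum(1 for u in users if old_ring.shard_for(u) != new_ring.shard_for(u))
    print(f"users remapped after adding one shard: {moved / len(users):.1%}")
```

The point worth making in an interview is the trade-off: with plain modulo sharding (user_id % N), adding a shard remaps most users, whereas with a hash ring only the keys that land on the new shard move, which is roughly one fifth of them when growing from four shards to five.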
Pro Tips
Start with requirements gathering and capacity estimation, discuss the trade-offs between consistency and availability, and name specific technologies while justifying each choice.
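The capacity estimation step is simple arithmetic, and it helps to show the work. The sketch below uses the figures from the question and sample answer (100M DAU, 50 messages per user per day, 99.9% uptime). The peak-to-average ratio of 3 and the 1 KB average message size are assumptions made for illustration only, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_YEAR = 365 * 24


def message_rates(dau, messages_per_user_per_day, peak_to_average=3):
    """Return (messages per day, average msgs/sec, estimated peak msgs/sec).

    peak_to_average is an assumed multiplier, not a measured value.
    """
    per_day = dau * messages_per_user_per_day
    average = per_day / SECONDS_PER_DAY
    return per_day, average, average * peak_to_average


def storage_tb_per_day(messages_per_day, avg_message_bytes=1_000):
    """Raw message storage per day in terabytes (1 KB per message is assumed)."""
    return messages_per_day * avg_message_bytes / 1e12


def downtime_hours_per_year(availability):
    """Downtime budget implied by an availability target such as 0.999."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    per_day, average, peak = message_rates(100_000_000, 50)
    print(f"messages/day:      {per_day:,}")
    print(f"average msgs/sec:  {average:,.0f}")
    print(f"peak msgs/sec:     {peak:,.0f}")
    print(f"storage/day (TB):  {storage_tb_per_day(per_day):.1f}")
    print(f"downtime budget:   {downtime_hours_per_year(0.999):.2f} hours/year")
```

Dividing the daily total by the seconds in a day gives an average of roughly 58K messages/second, so the peak the system must absorb is higher, about 174K messages/second under the assumed 3x ratio. Stating which number is the average and which is the peak avoids under-provisioning.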
Avoid These Mistakes
Overcomplicating the design without justification, ignoring real-world constraints like cost and operational complexity, and not addressing failure scenarios.
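One way to show that failure scenarios have been considered is to explain how the circuit breaker named in the sample answer behaves. The following is a minimal single-threaded sketch, not a production implementation: the class names CircuitBreaker and CircuitOpenError, the threshold of 3 failures, and the 30-second reset timeout are illustrative assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and a call is rejected without being attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self._clock = clock                         # injectable for testing
        self._failures = 0
        self._opened_at = None                      # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                # Open: fail fast instead of piling load on a struggling dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                # Open the circuit, or re-open it if the half-open trial failed.
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each downstream request, for example breaker.call(fetch_presence, user_id) where fetch_presence is a hypothetical function, and treat CircuitOpenError as a signal to serve a cached or degraded response. A real deployment would also need thread safety and per-dependency breakers, which this sketch omits.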