"Can you design Twitter's feed?" The system design interviewer slides the question across during your Fortune 500 technical interview. Half an hour later, you've sketched a basic three-tier architecture any junior engineer could draw, and you already know the offer isn't coming.
System design interviews separate senior engineers from staff+ candidates. You know how to build systems at work. Explaining your architectural thinking clearly while someone probes every decision is entirely different.
What makes this hard: unlike coding questions with right answers, system design is deliberately open-ended. Interviewers evaluate your thought process, not a final diagram. They're testing whether you ask the right questions, make reasonable trade-offs, and think at an organizational scale. The framework matters as much as the solution.
This guide breaks down practical system design questions that appear in senior+ loops, reveals what interviewers actually evaluate, shows approaches that fail, and provides frameworks demonstrating staff-level architectural thinking.
1. Design a URL Shortener Like bit.ly
URL shortener questions test fundamental distributed systems thinking — unique ID generation, database design, caching strategy, and redirect handling at scale. Interviewers probe whether you understand read-heavy versus write-heavy systems, can estimate capacity requirements, and make pragmatic trade-offs between consistency and performance.
Specifically, they want to evaluate four core capabilities:
- Can you estimate the scale properly? Do you understand this is a 100:1 read-to-write ratio, not balanced traffic?
- Do you prevent URL collisions? Can you explain the trade-offs between Base62 encoding and hashing?
- Can you design for 100M+ daily requests? Do you architect for actual scale or theoretical perfection?
- Do you consider cache invalidation? Do you understand that caching strategy matters more than database choice?
Most engineers fail by jumping straight to the database schema without discussing requirements first. They skip back-of-envelope calculations, which makes their scaling claims feel baseless. Many also design around a single database, treating horizontal scaling as an afterthought.
Design Approach
Strong candidates follow a structured approach that demonstrates architectural maturity. Start by clarifying requirements:
- Daily active users
- Custom URL support
- Persistence duration
- Analytics needs
These questions establish constraints that justify every decision afterward. For a service like bit.ly, assume 100 million URLs created monthly with a 100:1 read-to-write ratio.
Next, run back-of-the-envelope calculations to ground your design in reality. With 100 million new URLs monthly, you're handling roughly 40 writes per second. At a 100:1 ratio, that's 4,000 redirects per second.
Storage grows predictably: 100 million URLs at 500 bytes each is roughly 50GB per month, or about 600GB per year. These numbers immediately tell you this is a read-heavy system that needs aggressive caching.
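To make the arithmetic explicit, here's the estimate as a few lines of Python; the inputs are just the assumptions stated above:

```python
# Back-of-envelope capacity estimate for the URL shortener (assumed inputs from above).
URLS_PER_MONTH = 100_000_000
READ_TO_WRITE_RATIO = 100
BYTES_PER_RECORD = 500
SECONDS_PER_MONTH = 30 * 24 * 3600

writes_per_second = URLS_PER_MONTH / SECONDS_PER_MONTH              # ~39 writes/s
reads_per_second = writes_per_second * READ_TO_WRITE_RATIO          # ~3,900 redirects/s
new_storage_gb_per_month = URLS_PER_MONTH * BYTES_PER_RECORD / 1e9  # ~50 GB/month

print(f"{writes_per_second:.0f} writes/s, {reads_per_second:.0f} reads/s, "
      f"{new_storage_gb_per_month:.0f} GB of new storage per month")
```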
Your high-level architecture should reflect these constraints. Place a load balancer in front of stateless application servers that handle both URL creation and redirection. Behind them sits your primary database (PostgreSQL works well here), fronted by Redis for caching hot URLs.
This simple stack handles the scale while remaining operationally straightforward.
ID Generation Strategy
Deep dive into ID generation strategy because interviewers always probe this. Base62 encoding of auto-increment IDs gives you over 56 billion possible six-character URLs (62^6) while remaining human-readable.
Alternatively, hash-based approaches using MD5 provide natural distribution but require collision detection. Defend your choice with specific trade-offs: auto-increment is simpler but reveals creation order; hashing adds complexity but scales horizontally more easily.
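A minimal sketch of the Base62 approach in Python, assuming the ID comes from an auto-increment counter (the function and alphabet names are illustrative):

```python
import string

# 62-character alphabet: 0-9, a-z, A-Z
BASE62_ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_base62(n: int) -> str:
    """Convert an auto-increment integer ID into a short Base62 string."""
    if n == 0:
        return BASE62_ALPHABET[0]
    chars = []
    while n > 0:
        n, remainder = divmod(n, 62)
        chars.append(BASE62_ALPHABET[remainder])
    return "".join(reversed(chars))

# ID 125 -> "21"; the largest six-character ID, 62**6 - 1, -> "ZZZZZZ"
print(encode_base62(125), encode_base62(62**6 - 1))
```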
Database design centers on a single URLs table with columns for short_url, long_url, created_at, and expiry. Index short_url so redirect lookups stay fast. If you need custom aliases, add a uniqueness constraint and handle collisions gracefully with retry logic.
Caching Strategy
Caching strategy determines user-experienced latency. Implement a cache-aside pattern where the application checks Redis first, then falls back to the database on a cache miss. Give popular URLs generous TTLs and let rarely accessed ones fall out of the cache and be served straight from the database. This approach exploits the 80/20 rule: 80% of traffic hits 20% of URLs.
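Here's a hedged sketch of the cache-aside read path using redis-py; the key prefix, TTL value, and `fetch_from_db` helper are placeholders, not a specific production implementation:

```python
import redis

r = redis.Redis(host="localhost", port=6379)
HOT_URL_TTL_SECONDS = 3600  # illustrative TTL for popular short URLs

def fetch_from_db(short_code: str) -> str | None:
    """Placeholder for the primary-database lookup (e.g. a PostgreSQL query)."""
    ...

def resolve(short_code: str) -> str | None:
    """Cache-aside: check Redis first, fall back to the database on a miss."""
    cached = r.get(f"url:{short_code}")
    if cached is not None:
        return cached.decode()

    long_url = fetch_from_db(short_code)
    if long_url is not None:
        # Populate the cache so subsequent redirects skip the database.
        r.set(f"url:{short_code}", long_url, ex=HOT_URL_TTL_SECONDS)
    return long_url
```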
Interviewers will push on bottlenecks. At 10,000 writes per second, a single database saturates. Propose sharding the database by URL hash to distribute writes evenly. If the cache miss rate climbs above 5%, consider pre-warming the cache with trending URLs identified through analytics.
Common follow-up questions test your depth:
- How do you prevent URL enumeration attacks? Rate limiting by IP address and API key
- How would you implement analytics? Asynchronous event streaming to a separate analytics database
- How do you handle expired URLs? Background workers scanning for TTL expiry with lazy deletion
Address these concerns proactively, and you demonstrate the operational thinking staff engineers need.
2. Design a Rate Limiter
Rate limiter questions evaluate your understanding of distributed systems constraints, algorithm trade-offs, and real-time decision-making under scale. Interviewers want to see whether you can protect shared resources, prevent abuse, and maintain service quality during unexpected traffic spikes.
They evaluate whether you understand core distributed systems concepts:
- Do you know different rate-limiting algorithms? Can you compare the token bucket, leaky bucket, and sliding window, and explain their actual trade-offs?
- Can you design for distributed environments? Do you understand why single-server counters don't work at scale?
- Do you consider edge cases? What happens with clock skew across data centers, or with race conditions in concurrent requests?
- Do you think about failure modes? What happens when your rate limiter itself becomes unavailable?
Most candidates fail by proposing centralized counters that don't scale horizontally, or by selecting algorithms without discussing trade-offs. Others ignore the distributed nature of modern systems, where multiple data centers must enforce limits consistently despite network partitions.
Some forget that rate limiters themselves become critical infrastructure requiring high availability.
Begin by clarifying what you're limiting and why:
- Are limits per-user, per-API endpoint, or per-IP address?
- Do you need hard limits that immediately reject requests, or soft limits that queue overflow?
- What's the time window — per second, minute, or hour?
- Will this run in a single data center or globally?
These questions establish whether you need strong consistency or can tolerate eventual consistency across regions.
Algorithm Selection
Algorithm selection drives your entire architecture, so compare options systematically:
- Token bucket allows brief traffic bursts while maintaining average rate — perfect for API gateways serving bursty mobile clients
- Leaky bucket enforces strict constant outflow, useful for backend services that can't handle spikes
- Sliding window log provides the most accurate tracking but consumes memory linearly with request count
The sliding window counter balances accuracy and memory efficiency, making it popular in production systems. Choose based on your clarified requirements and defend the trade-offs.
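To ground the comparison, here's a minimal single-process token bucket sketch in Python; a production limiter would hold this state in Redis or another shared store rather than in local memory:

```python
import time

class TokenBucket:
    """Allows short bursts up to `capacity` while enforcing an average rate."""

    def __init__(self, rate_per_second: float, capacity: int):
        self.rate = rate_per_second
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens proportionally to the time elapsed since the last check.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate_per_second=5, capacity=10)  # 5 req/s average, bursts of 10
print([bucket.allow() for _ in range(12)])            # roughly the first 10 pass, then throttled
```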
System Architecture
For high-level architecture serving millions of requests per second, sketch API gateway instances with rate limiter middleware, backed by a distributed Redis cluster for shared state. Store rate limit rules in a configuration service, allowing dynamic updates without deployment.
Return HTTP 429 with the Retry-After header when limits are exceeded. This design separates policy from enforcement, enabling operators to respond to attacks quickly.
Deep dive into implementation mechanics because details matter at scale. Use Redis atomic operations (INCR with EXPIRE) to safely maintain distributed counters. Handle clock skew across data centers by implementing loose synchronization with acceptable drift windows.
Prevent race conditions by using Lua scripts to execute multiple Redis commands atomically. Address Redis failure scenarios: do you fail open (allowing all traffic) or fail closed (blocking all traffic)? Justify your choice based on system priorities.
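A small sketch of the atomic counter idea using a Lua script through redis-py; the key scheme and default limits are assumptions for illustration:

```python
import redis

r = redis.Redis()

# Lua keeps INCR + EXPIRE atomic, so concurrent requests can't race past the EXPIRE call.
FIXED_WINDOW_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
"""

def allow_request(user_id: str, limit: int = 100, window_seconds: int = 60) -> bool:
    """Fixed-window counter: permit at most `limit` requests per user per window."""
    count = r.eval(FIXED_WINDOW_LUA, 1, f"ratelimit:{user_id}", window_seconds)
    return int(count) <= limit
```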
Scaling Considerations
Scale considerations become critical at the staff level. A Redis cluster provides horizontal scaling through key sharding, but hot keys (such as rate limits for celebrity users) can skew load across shards.
Implement local caching to reduce Redis queries by 90%, trading slight accuracy for massive throughput gains. Deploy CDN-level rate limiting as the first line of defense against DDoS attacks, protecting your entire origin infrastructure.
Interviewers can probe edge cases to test depth:
- How do you handle distributed counting across regions when network partitions occur? Accept temporary over-limit allowance using gossip protocols for eventual consistency.
- How do you rate-limit without user authentication? Fingerprint by IP plus User-Agent, acknowledging NAT limitations.
- How do you prevent legitimate traffic from getting blocked? Implement multiple limit tiers with exponential backoff, reserving strictest limits for detected abuse patterns.
The key is showing you understand rate limiting not just as an algorithmic problem, but as a distributed systems challenge that requires careful trade-offs among accuracy, performance, and availability.
3. Design a Distributed Cache Like Redis
Distributed cache questions test your grasp of low-latency data access, consistency models, and availability trade-offs under failure conditions. Interviewers evaluate whether you understand when to prioritize speed over consistency, how replication affects durability, and why cache eviction policies matter operationally.
They're evaluating whether you think systematically about caching:
- Do you understand cache eviction policies? Can you explain when LRU fails and why LFU or TTL-based eviction might be better?
- Can you design for high availability? How do you handle cache node failures without losing all data?
- Do you know caching patterns? Can you compare cache-aside, write-through, and write-behind with real trade-offs?
- Do you consider cache invalidation? Can you reason about one of the hardest problems in computer science?
Common failure patterns reveal shallow thinking. Candidates propose single-node caches with no availability story, or discuss consistency guarantees without acknowledging CAP theorem trade-offs.
Others focus entirely on get/set operations while ignoring memory management under pressure, cache warming strategies, or monitoring approaches that detect degradation before users notice.
Start with requirements that establish your design's constraints:
- What QPS do you need to support — 10,000 or 1 million reads per second?
- How much data must stay in memory — 10GB or 1TB?
- Can you tolerate eventual consistency after failover, or do reads require strong consistency?
- What's an acceptable cache miss rate given downstream database capacity?
These numbers ground every subsequent decision.
Data Structure Design
Core data structures determine performance characteristics. Implement a hash map providing O(1) lookups by key, paired with a doubly-linked list enabling O(1) LRU eviction. This combination delivers the constant-time reads and writes users expect while keeping memory bounded through eviction.
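As a concrete reference, Python's `collections.OrderedDict` is itself a hash map backed by a doubly-linked list, so a compact sketch of the get/set/evict path might look like this:

```python
from collections import OrderedDict

class LRUCache:
    """Hash map + doubly-linked list (via OrderedDict): O(1) get, set, and eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items: OrderedDict[str, str] = OrderedDict()

    def get(self, key: str) -> str | None:
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def set(self, key: str, value: str) -> None:
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(capacity=2)
cache.set("a", "1"); cache.set("b", "2"); cache.get("a"); cache.set("c", "3")
print(list(cache.items))  # ['a', 'c'] -- 'b' was least recently used and got evicted
```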
For persistence, add an append-only log capturing writes so recovery after crashes doesn't lose recent updates. Balance memory usage against durability guarantees based on your clarified requirements.
High-level architecture for production deployment distributes data across clustered cache nodes using consistent hashing for key assignment. This minimizes rehashing when nodes join or leave, preserving cache hit rates during scaling events.
Add a replication factor of 3 (one primary, two replicas) to provide fault tolerance without excessive storage overhead. Place a stateless proxy layer in front that routes requests to correct nodes and handles failover transparently, shielding clients from cluster topology changes.
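A minimal consistent-hash ring sketch (the virtual-node count and MD5 hash are illustrative choices) showing how keys map to nodes:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys to cache nodes; adding or removing a node only remaps nearby keys."""

    def __init__(self, nodes: list[str], vnodes: int = 100):
        self.ring: list[tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):  # virtual nodes smooth out the distribution
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        h = self._hash(key)
        # First vnode clockwise from the key's hash, wrapping around the ring.
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache-1", "cache-2", "cache-3"])
print(ring.node_for("user:42"), ring.node_for("session:abc"))
```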
Caching Patterns
Caching strategies fundamentally change application behavior, so explicitly discuss the trade-offs:
- The cache-aside pattern gives applications complete control over cache population and invalidation, making it well-suited to read-heavy workloads with infrequent updates
- The write-through pattern updates the cache and database atomically, sacrificing write latency for guaranteed consistency
Choose based on whether your system tolerates stale reads or requires strong consistency guarantees.
Likewise, eviction and expiration policies prevent memory exhaustion while keeping the working set cached:
- Least Recently Used (LRU) evicts whichever item has gone longest without access, capturing the temporal locality of most access patterns
- Least Frequently Used (LFU) tracks access frequency instead, making it better suited to workloads with stable hot data. Combine either approach with TTL-based expiration so keys can carry explicit lifetimes.
Implement background scanning to delete expired keys, supplemented by lazy deletion when an expired key is read, so expiry never adds synchronous overhead to the hot path. Monitor the eviction rate as a leading indicator of insufficient memory before performance degrades.
Performance Optimization
Scale considerations expose architectural maturity. Hot keys (data accessed orders of magnitude more than average) create load imbalance across cluster nodes. Replicate hot keys across multiple nodes to distribute read load horizontally. Implement local application-side caching for extremely hot data to eliminate network round-trips.
Address thundering herd during cache misses by implementing request coalescing, where multiple concurrent requests for the same key wait on a single upstream fetch.
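One way to sketch request coalescing in a threaded Python service; the `loader` callback stands in for whatever upstream or database fetch the cache fronts:

```python
import threading

class Coalescer:
    """Only one upstream fetch runs per key; concurrent callers share the leader's result."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight: dict[str, threading.Event] = {}
        self._results: dict[str, object] = {}

    def get(self, key: str, loader):
        with self._lock:
            event = self._in_flight.get(key)
            is_leader = event is None
            if is_leader:
                event = threading.Event()
                self._in_flight[key] = event

        if is_leader:
            try:
                self._results[key] = loader(key)   # the single database/origin fetch
            finally:
                event.set()                        # wake every waiting follower
                with self._lock:
                    self._in_flight.pop(key, None)
        else:
            event.wait()                           # followers block until the leader finishes
        return self._results.get(key)
```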
4. Design a News Feed Like X (Twitter) or Facebook
News feed questions test your ability to design read-heavy systems with real-time requirements, personalization at scale, and complex ranking algorithms. Interviewers probe whether you can balance latency against consistency, handle celebrity user edge cases, and reason about storage costs when pre-computing feeds.
They evaluate your architectural thinking across several dimensions:
- Do you understand fan-out patterns? Can you explain when to pre-compute and when to compute on demand?
- Can you balance latency versus consistency? Do you know when eventual consistency is acceptable?
- Do you consider celebrity user edge cases? How do you efficiently handle users with millions of followers?
- Can you design ranking systems? Do you understand feed algorithms beyond simple chronological ordering?
Most engineers stumble by proposing pull-only models that can't meet latency requirements, or push-only models that drive up storage costs, without considering the hybrid approach production systems actually use.
Others ignore ranking complexity, treating feeds as simple chronological lists. Many also forget about privacy filtering, spam detection, or handling deleted posts across pre-computed feeds.
Requirements clarification establishes scale and constraints. For an X (Twitter) scale system, assume 300 million daily active users and an average of 200 followers per user, with users expecting fresh content within seconds.
Ask whether feeds must be strictly chronological or algorithmically ranked. Clarify the read-to-write ratio (typically 100:1) and whether real-time updates require WebSocket connections or whether polling suffices.
Fan-Out Strategy
Fan-out approaches determine your entire architecture, so discuss trade-offs methodically:
- Fan-out on write pre-computes feeds by copying each post to all followers' timelines, delivering speedy reads but expensive writes for users with millions of followers
- Fan-out on read computes feeds on demand by gathering posts from all followed users, minimizing write cost, but potentially slow for large following lists
Production systems use a hybrid approach: pre-compute feeds for regular users, compute on demand for celebrities, and merge results at read time.
System Architecture
High-level architecture reflects this hybrid approach. When users post, the content flows to a fan-out service that asynchronously distributes posts to followers' feed caches stored in Redis sorted sets.
For celebrity posts, skip fan-out entirely and store them in a separate high-fan-out cache. The timeline service merges precomputed and on-demand feeds, ranks them by relevance, and returns the top results.
Store the social graph in a dedicated graph database optimized for follower/following queries. Serve media content through a CDN, keeping origin servers focused on feed logic.
Deep dive into implementation mechanics that distinguish staff-level thinking. Model feed storage using Redis sorted sets where scores represent timestamps or ranking signals, enabling efficient chronological or ranked retrieval. Implement feed trimming, keeping only the most recent 1,000 posts per user to prevent unbounded storage growth.
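A hedged sketch of the sorted-set feed operations with redis-py; the key naming and trim threshold simply mirror the assumptions above:

```python
import time
import redis

r = redis.Redis()
FEED_MAX_POSTS = 1000  # trim each cached feed to the most recent 1,000 entries

def push_to_feed(follower_id: int, post_id: int) -> None:
    """Fan-out on write: add the post to one follower's cached timeline."""
    key = f"feed:{follower_id}"
    r.zadd(key, {str(post_id): time.time()})          # score = timestamp (or ranking signal)
    r.zremrangebyrank(key, 0, -(FEED_MAX_POSTS + 1))  # keep only the newest N posts

def read_feed(user_id: int, count: int = 50) -> list[bytes]:
    """Return the newest posts first for the user's home timeline."""
    return r.zrevrange(f"feed:{user_id}", 0, count - 1)
```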
Handle post deletions through lazy cleanup during reads rather than trying to remove from millions of pre-computed feeds synchronously. Use message queues to buffer fan-out work and provide back-pressure when celebrity posts spike the load.
Feed Ranking
Ranking and personalization layers add complexity worth discussing. Beyond simple chronological order, incorporate engagement signals (likes, replies, click-through rate) to surface relevant content.
Apply ML models predicting user interest based on historical interactions. Implement real-time spam filtering before posts reach feeds. Balance fresh content with popular posts using time-decay functions. Pre-compute ranking for cached feeds, but recalculate dynamically for on-demand portions.
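As an illustration of time-decay ranking, here's a toy scoring function; the weights and half-life are made-up tuning parameters, not a known production formula:

```python
import math
import time

HALF_LIFE_HOURS = 6  # engagement's influence halves every 6 hours (assumed tuning value)

def feed_score(likes: int, replies: int, clicks: int, posted_at: float) -> float:
    """Blend engagement signals with exponential time decay so fresh posts stay competitive."""
    engagement = 1.0 * likes + 2.0 * replies + 0.5 * clicks   # illustrative weights
    age_hours = (time.time() - posted_at) / 3600
    decay = math.exp(-math.log(2) * age_hours / HALF_LIFE_HOURS)
    return engagement * decay

# A two-hour-old post with modest engagement vs. a day-old viral post:
print(feed_score(50, 10, 200, time.time() - 2 * 3600))
print(feed_score(5000, 800, 20000, time.time() - 24 * 3600))
```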
Scale bottlenecks emerge around celebrity users and viral content. For example, when a user has 50 million followers, fan-out work overwhelms queue workers: partition follower lists and process in batches with rate limiting.
For viral posts receiving millions of likes per minute, aggregate engagement metrics asynchronously rather than updating counters in real time. Implement read replicas for the feed cache to distribute query load horizontally. Monitor queue lag as an early warning signal for capacity issues.
5. Design a Video Streaming Service Like YouTube
Video streaming questions evaluate your understanding of content delivery at scale, encoding pipelines, storage economics, and handling massive global bandwidth requirements. Interviewers test whether you can design upload flows, processing pipelines, and playback systems that work reliably across terrible network conditions.
They test multiple dimensions of systems knowledge:
- Do you understand video encoding? Can you explain transcoding, bitrates, and adaptive streaming protocols?
- Can you design for global CDN distribution? Do you understand how CDNs reduce the load on origin servers?
- Do you handle upload processing? Can you design asynchronous pipelines that process hours of video uploads per minute?
- Do you consider storage costs? Do you understand petabyte-scale storage and lifecycle management?
Candidates commonly fail by focusing only on storage without discussing encoding complexity, or by proposing to serve videos directly from origin servers without a CDN architecture.
They don't consider how adaptive bitrate streaming works, ignore the upload bandwidth problem for multi-gigabyte files, or forget about metadata search and recommendation systems that drive discovery.
Start requirements gathering by establishing scale parameters. YouTube receives 500 hours of video uploaded every minute while serving millions of concurrent viewers. Storage scales to petabyte levels quickly — a single 4K video can consume 20GB after encoding across multiple quality tiers.
Ask about supported video qualities, upload size limits, acceptable buffering rates, and whether live streaming is required. These numbers inform every architectural choice.
Upload Processing
The upload pipeline handles the most technically complex flow. Implement a resumable upload protocol so users can retry failed uploads from the last checkpoint rather than restarting from scratch. Use multipart uploads to split large files into manageable chunks uploaded in parallel.
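A rough sketch of the chunking side of a resumable upload; `send_chunk` and `already_uploaded` are placeholders for whatever transport and checkpoint store the service exposes:

```python
import hashlib

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB chunks (assumed size)

def chunk_file(path: str):
    """Yield (index, checksum, bytes) chunks so a failed upload can resume mid-file."""
    with open(path, "rb") as f:
        index = 0
        while chunk := f.read(CHUNK_SIZE):
            yield index, hashlib.sha256(chunk).hexdigest(), chunk
            index += 1

def upload_resumable(path: str, already_uploaded: set[int], send_chunk) -> None:
    """Skip chunks the server already acknowledged; re-send only what's missing."""
    for index, checksum, data in chunk_file(path):
        if index in already_uploaded:
            continue  # checkpointed on the server, no need to retransmit
        send_chunk(index, checksum, data)  # hypothetical transport call (e.g. an HTTP PUT)
```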
Once the upload completes, trigger the video processing queue that fans out to the encoding worker farm.
Each worker transcodes the original file into multiple quality levels (360p, 720p, 1080p, 4K), plus generates thumbnails and extracts metadata. Store all artifacts in object storage, such as Amazon S3, which provides durability at scale.
Content Delivery
Streaming architecture delivers content globally with minimal latency through CDN distribution. Store encoded videos as segmented HLS or DASH streams whose manifests point to 2- to 10-second chunks. When users request a video, edge servers serve manifests and chunks from cache, falling back to origin servers on a cache miss.
Implement adaptive bitrate streaming where the client player automatically switches quality levels based on measured network conditions, preventing buffering. Pre-warm CDN caches for newly published videos from popular creators, ensuring first viewers get cache hits.
Discovery Systems
Metadata and discovery systems enable users to find content among billions of videos. Store video metadata (title, description, upload date, view count) in a sharded relational database partitioned by creator ID.
Maintain an Elasticsearch search index asynchronously via change data capture, enabling sub-second search latency. Build a recommendation engine using collaborative filtering and content-based approaches, precomputing suggestions and caching them.
Track view count and engagement metrics through an event streaming pipeline aggregating statistics in a real-time data warehouse.
Scale considerations dominate operational discussions. Video encoding compute represents a massive ongoing cost — optimize by detecting duplicate uploads via perceptual hashing before transcoding. Storage costs grow unboundedly — implement lifecycle policies moving older, less-watched content to cheaper cold storage tiers.
Bandwidth costs at CDN scale become significant: negotiate committed-use discounts and implement caching policies that prioritize popular content. Monitor for cache hit rates above 95%, encoding queue lag under 5 minutes, and playback start times under 2 seconds.
Interviewers can then probe failure scenarios and edge cases:
- How do you handle corrupted uploads? Validate file headers and run integrity checks before queuing for encoding.
- What happens when encoding fails? Implement retry with exponential backoff and dead-letter queues for manual investigation.
- How do you prevent copyright violations? Implement a content ID system that matches uploaded videos against a reference database.
- How do you handle sudden viral spikes? CDN absorption, combined with origin rate limiting, protects the infrastructure while serving cached content.
These failure scenarios test whether you think beyond happy path designs to real-world operational challenges. Strong candidates proactively address edge cases, demonstrating the production mindset that distinguishes staff-level engineers from those still learning system design fundamentals.
6. Design a Messaging System Like WhatsApp or Slack
Messaging system questions test your understanding of real-time communication, WebSocket connection management, message delivery guarantees, and scaling persistent connections across millions of users. Interviewers evaluate whether you can design for low latency, handle offline users gracefully, and ensure messages arrive exactly once in the correct order.
They evaluate your understanding of real-time systems:
- Do you understand WebSocket versus polling? Can you explain why persistent connections matter for real-time messaging?
- Can you design message persistence? How do you store billions of messages with fast retrieval?
- Do you handle offline users? What happens to messages when recipients aren't connected?
- Can you implement read receipts? How do you track message state across a distributed system?
Common failures reveal gaps in distributed systems thinking. Candidates propose HTTP polling instead of persistent connections, which wastes bandwidth and adds latency. They don't discuss message ordering guarantees or explain how offline message delivery works.
Many also ignore the complexity of group chat scaling — naive fan-out to thousands of members creates hotspots. Others forget about read receipts, typing indicators, and presence information that users expect.
Requirements clarification establishes your system's scope:
- Confirm support for one-to-one and group conversations, expected message delivery latency, offline message retention period, and whether media sharing is required
- Ask about read receipts, typing indicators, and end-to-end encryption requirements. For WhatsApp scale, assume roughly 3 billion users and tens of billions of messages daily, implying on the order of 1 million messages per second at peak
This helps you make core architectural decisions.
Connection Architecture
Core architecture centers on persistent connections for real-time delivery. Deploy WebSocket gateway servers maintaining long-lived connections to online clients, enabling sub-100ms message delivery.
Behind gateways sits a message-routing service that determines the destination users and invokes delivery logic. Use a message queue like Kafka for reliable delivery — messages persist in the queue until the recipient acknowledges them. Store message history in a NoSQL database, such as Cassandra, partitioned by conversation ID for efficient retrieval.
Message Delivery
Message flow demonstrates your systems thinking. When the sender transmits a message, it reaches the WebSocket gateway, which forwards it to the message service. The message service writes to a Kafka topic, persists to Cassandra, and attempts immediate delivery if the recipient is online.
Delivery workers consume from Kafka and push messages to the recipient's WebSocket connection. If the recipient is offline, the message waits in Kafka with delivery retry logic. The client acknowledges receipt, at which point the delivery worker commits the offset and the message counts as delivered.
This flow provides at-least-once delivery semantics; client-side deduplication makes it effectively exactly-once from the user's perspective.
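A minimal sketch of that client-side deduplication (the message-ID scheme and acknowledgment call are illustrative):

```python
class DedupingReceiver:
    """Client-side deduplication: at-least-once delivery plus an idempotent apply step."""

    def __init__(self):
        self.seen_message_ids: set[str] = set()  # in practice, bounded per conversation

    def on_message(self, message_id: str, payload: str) -> None:
        if message_id in self.seen_message_ids:
            return                 # duplicate redelivery from the queue, ignore it
        self.seen_message_ids.add(message_id)
        self.render(payload)       # applied exactly once from the user's perspective
        self.acknowledge(message_id)

    def render(self, payload: str) -> None:
        print("new message:", payload)

    def acknowledge(self, message_id: str) -> None:
        # In the real system this would send an ACK back to the delivery worker.
        pass
```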
Group chat introduces scaling complexity worth discussing. For small groups of fewer than 100 members, fan out messages to all member connections synchronously. For large groups like company-wide channels, don't fan out — instead, publish once to the group's Kafka topic and have clients subscribe.
Track per-user read cursors to indicate the last-seen message, enabling catch-up when users come online. Implement pagination for historical message retrieval to prevent large groups from overwhelming clients during join.
Additional features demonstrate attention to product details that interviewers value. Typing indicators use ephemeral events stored in Redis with a 5-second TTL — no need to persist these transient signals. Read receipts update message metadata in Cassandra when clients confirm reading.
Presence information tracks online status through heartbeat protocol, timing out connections after 30 seconds of inactivity. Media sharing uploads files to object storage and inserts the URL into the message payload rather than inline binary data.
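A small sketch of the ephemeral presence and typing signals with redis-py, using the TTLs mentioned above; the key names are illustrative:

```python
import redis

r = redis.Redis()

def heartbeat(user_id: str) -> None:
    """Refresh presence on every client heartbeat; the key expires after 30s of silence."""
    r.set(f"presence:{user_id}", "online", ex=30)

def set_typing(user_id: str, conversation_id: str) -> None:
    """Typing indicators are transient: a 5-second TTL, never persisted."""
    r.set(f"typing:{conversation_id}:{user_id}", "1", ex=5)

def is_online(user_id: str) -> bool:
    return r.exists(f"presence:{user_id}") == 1
```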
Scaling Challenges
Scale considerations reveal operational maturity. WebSocket connections consume server memory — each connection requires 10KB, so 1 million connections need 10GB RAM per gateway node.
Implement connection draining during deployments to enable graceful failover. Message queues accumulate while recipients are offline, so monitor lag and scale delivery workers dynamically. Cap backfill so users who have been offline for weeks don't receive thousands of queued messages all at once.
Shard Cassandra by conversation hash, evenly balancing storage across cluster nodes.
How DataAnnotation Builds Technical Interview Readiness
In technical interviews, the challenge isn't knowing the patterns; you apply them at work constantly. The problem is that during extended interview cycles, you're not regularly exercising the rapid evaluation and trade-off analysis these interviews demand.
What keeps architectural thinking sharp is regularly evaluating technical decisions under realistic constraints. When you review AI-generated code for platforms like DataAnnotation that pay $40+ per hour, you diagnose problems, choose fixes, and justify decisions clearly.
You're constantly making technical judgments about Python, JavaScript, and other languages while getting paid. Every evaluation mirrors interview pressure: assess complex situations quickly, explain your reasoning concisely, and communicate decisions knowing they'll be scrutinized.
The platform has paid over $20 million to remote workers since 2020, maintaining 3.7/5 stars on Indeed with 700+ reviews and 3.9/5 stars on Glassdoor with 300+ reviews.
You understand distributed systems from work experience. What you need is active practice evaluating architectural decisions under time pressure — exactly what these interviews test.
Stay Sharp for Technical Interviews With DataAnnotation
You have the engineering experience. What you're missing is practice articulating complex situations clearly under pressure while someone evaluates your reasoning. Code evaluation work solves this challenge.
DataAnnotation's coding projects at $40+ per hour develop the rapid, clear communication these interviews demand. After hundreds of evaluations, delivering crisp, well-structured answers becomes natural because you've practiced that exact skill repeatedly.
Getting from interested to earning takes five straightforward steps:
- Visit the DataAnnotation application page and click “Apply”
- Fill out the brief form with your background and availability
- Complete the Starter Assessment
- Check your inbox for the approval decision (which should arrive within a few days)
- Log in to your dashboard, choose your first project, and start earning
No signup fees. DataAnnotation stays selective to maintain quality standards. You can only take the Starter Assessment once, so read the instructions carefully and review before submitting.
Start your application at DataAnnotation today and keep your technical evaluation skills sharp during interview cycles.