System Design

System Design Interview Questions and Answers

Excel in system design interviews with questions on scalable architectures, distributed systems, database design, caching strategies, and real-world system implementations.


The Ultimate System Design Interview Guide

Table of Contents

  1. The Interview Framework (How to Answer)
  2. Question 1: Design URL Shortener (TinyURL)
  3. Question 2: Design WhatsApp / Facebook Messenger
  4. Question 3: Design Twitter / X (News Feed)
  5. Question 4: Design YouTube / Netflix
  6. Question 5: Design Uber / Lyft
  7. Question 6: Design a Distributed Key-Value Store (Redis/Cassandra)
  8. Question 7: Design a Rate Limiter
  9. Glossary of Key Terms

The Interview Framework (How to Answer)

Regardless of the question, follow this 4-Step Framework:

Step 1: Requirements Gathering (4-5 minutes)

  • Functional Requirements: What must the system do? (e.g., "User can post a tweet").
  • Non-Functional Requirements: The system qualities. (e.g., High Availability, Low Latency, Durability, Consistency).
  • Scale (Traffic Estimates): How many users? Read/Write ratio? Storage estimates?
    • Example: "We expect 1 billion users. 500M tweets per day. Read:Write is 100:1."

Step 2: High-Level Design (10-15 minutes)

  • Draw the main components.
  • Client -> Load Balancer -> Web Servers -> Application Logic -> Databases.
  • Define the API Layer (REST/GraphQL/gRPC).
  • Choose the Database Type (SQL vs NoSQL vs Blob Storage).

Step 3: Deep Dive (10-15 minutes)

  • The interviewer will ask you to zoom in on a specific component.
  • Discuss Data Modeling (Schema design).
  • Discuss Key Algorithms (e.g., consistent hashing, quorum, geohashing).
  • Discuss Bottlenecks and how to resolve them (Caching, CDN, Sharding).

Step 4: Wrap-up (2 minutes)

  • Summarize the architecture.
  • Identify Single Points of Failure.
  • Discuss how to monitor the system (Metrics/Logs).

Question 1: Design URL Shortener (TinyURL)

Difficulty: ⭐ Easy/Medium

1. Requirements

  • Functional:
    • Given a long URL, generate a unique, short alias (e.g., tinyurl.com/abc123).
    • When a user clicks the short link, they are redirected (HTTP 301) to the original long URL.
    • (Optional) Custom alias support.
    • (Optional) Expiry of links.
  • Non-Functional:
    • Highly Available (DNS redirects must always work).
    • Low Latency for redirection.
    • The short link should be unpredictable (hard to guess).
  • Scale: 100M URLs generated per day. Redirections: 10B per day (100:1 read:write).
    • Storage: 100M * 500 bytes (avg) = 50GB/day. After 10 years = ~180TB.

2. High-Level Design

  • API:
    • POST /shorten { longUrl: string, customAlias?: string } -> { shortUrl: string }
    • GET /{shortKey} -> HTTP 301 Redirect to LongUrl.
  • Core Logic: The main challenge is Key Generation.
  • Components:
    1. Web Servers: Handle API requests.
    2. Key Generation Service (KGS): A service that pre-generates random 7-character strings (Base-64) and stores them in a DB table to avoid collisions and speed up writes.
    3. Database: Store the mapping of shortKey -> longUrl.

3. Deep Dive

  • Data Model:
    CREATE TABLE url_mappings (
        id BIGINT PRIMARY KEY AUTO_INCREMENT,
        short_key VARCHAR(10) UNIQUE NOT NULL,
        long_url TEXT NOT NULL,
        created_at DATETIME,
        expiry DATETIME,
        INDEX (short_key)
    );
    
  • Database Choice: Cassandra or DynamoDB (wide-column) is better than SQL for this high-write, key-value look-up pattern.
  • Caching: Use Redis/Memcached in front of the DB. Since 99% of traffic is reads (redirects), caching short_key -> long_url drastically reduces latency. Use LRU (Least Recently Used) eviction.
  • Redirection: Use HTTP 301 (Permanent Redirect) . This is cached by browsers, reducing load on your servers. Use 302 only if you need to track click analytics dynamically.
  • Scaling:
    • Database: Shard by short_key.
    • Web Tier: Stateless, just add more servers behind the load balancer.

Question 2: Design WhatsApp / Facebook Messenger

Difficulty: ⭐⭐ Medium

1. Requirements

  • Functional:
    • 1-on-1 chat with real-time delivery.
    • Group chat (optional).
    • User online/last-seen status.
    • Message delivered and read receipts (blue ticks).
  • Non-Functional:
    • Low Latency: Messages should feel instant (< 100ms).
    • High Availability: Chat is critical.
    • Reliability: Messages must not be lost (Durability).
    • Consistency: Order of messages must be preserved.

2. High-Level Design

  • Communication Protocol:
    • Client A -> Server: Use WebSockets (persistent, bi-directional connection).
    • Server -> Client B: Server pushes message via the existing WebSocket connection.
  • Components:
    1. Chat Servers (Stateless): Manage WebSocket connections. Handle sending/receiving messages.
    2. Presence Servers: Handle online/offline status (often via Redis Pub/Sub).
    3. Message Sync: How do you get messages when you come online?
    4. Database: Store the chat history.

3. Deep Dive

  • Data Model (Message Table - Cassandra):
    -- In Cassandra, you model by query pattern.
    -- We need to get messages for a user in a conversation, ordered by time.
    CREATE TABLE messages (
        conversation_id UUID,
        message_timestamp TIMESTAMP,
        sender_id UUID,
        text TEXT,
        PRIMARY KEY (conversation_id, message_timestamp)
    ) WITH CLUSTERING ORDER BY (message_timestamp DESC);
    
  • Handling Message Sync (The "Hose"):
    • Users switch devices often (Phone -> Laptop).
    • Solution: Each user has an Inbox/Message Queue. When they reconnect, the server replays messages from the last known message_id they have.
  • Handling Offline Users:
    • If user B is offline, the Chat Server stores the message in its local queue/database. When user B comes online, the server pushes pending messages.
  • Presence (Last Seen):
    • On disconnect, don't mark user offline immediately. Use a heartbeat timeout (e.g., mark as "away" after 5 seconds, offline after 30 seconds). This prevents flapping.
  • Delivery Receipts:
    • When Client B receives message, it sends an acknowledgment back to the server.
    • The server forwards that ack to Client A's WebSocket.

Question 3: Design Twitter / X (News Feed)

Difficulty: ⭐⭐ Medium/Hard

1. Requirements

  • Functional:
    • User can post a tweet (text + media).
    • User can follow other users.
    • User can view a News Feed consisting of tweets from followed users.
  • Non-Functional:
    • Fanout: High write load (celebrities with millions of followers).
    • Feed Generation: Must be very fast ( < 200ms ).
    • Availability: Posting tweets can be eventually consistent, but viewing feed must be highly available.

2. High-Level Design

  • The Core Problem: How do you assemble the feed?
    • Approach 1: Pull Model (Fan-out on read): When user loads feed, find all followed users, merge tweets, sort by time.
      • Pros: Simple.
      • Cons: Very slow for users with many followees (Latency O(n)).
    • Approach 2: Push Model (Fan-out on write): Pre-compute the feed. When a user tweets, push the tweet to all followers' feed caches.
      • Pros: Read is O(1) (just fetch pre-computed list).
      • Cons: Write latency is high for celebrities (millions of writes per tweet).

3. Deep Dive

  • The Hybrid Approach (The Twitter Solution):
    • Average users: Use Push Model. When they tweet, write to the feed cache of all their followers.
    • Celebrities (Elon Musk, Taylor Swift): Use Pull Model. For followers of a celebrity, the feed service fetches the celebrity's recent tweets on-the-fly and merges them.
    • How to decide? Check the follower count. If > 10k followers, treat as celebrity.
  • Data Model:
    • User Table: SQL.
    • Follow Table: SQL (or Graph DB). (follower_id, followee_id).
    • Tweet Table: Cassandra (High write throughput). (tweet_id, user_id, content, timestamp).
  • Feed Caching (Redis):
    • For every user, store a List/Set in Redis containing the last 500-1000 tweet IDs for their feed.
    • feed:{user_id} -> Sorted Set of (timestamp, tweet_id).
  • Media: Uploaded directly to CDN (CloudFront/CDN), with URL stored in the Tweet.

Question 4: Design YouTube / Netflix

Difficulty: ⭐⭐⭐ Hard

1. Requirements

  • Functional:
    • Upload video (High quality).
    • Stream video (Playback).
    • Search for videos (optional focus).
    • View counts / Likes.
  • Non-Functional:
    • High Availability: Video is static content, easy to cache.
    • High Durability: Videos cannot be lost.
    • Smooth Playback: No buffering (Low Latency for start time).
    • Global Reach: Serve users worldwide.

2. High-Level Design

  • Two main flows: Uploading & Streaming.
  • Components:
    1. Upload Service: Handles raw video uploads.
    2. Video Transcoding/Encoding Servers: Convert video into different formats (MP4, WebM) and resolutions (240p, 360p, 1080p). This is CPU intensive.
    3. CDN (Content Delivery Network): Stores and serves the transcoded videos to users.
    4. Metadata DB: Stores video metadata (title, description, URL).

3. Deep Dive

  • Upload Flow:
    1. User uploads video to Upload Service.
    2. Upload service writes metadata to DB and stores raw video in Blob Storage (e.g., S3, Google Cloud Storage).
    3. A message is sent to a Queue (e.g., RabbitMQ, Kafka) saying "Video 123 needs transcoding".
    4. Transcoding Workers pick up the job. They fetch the raw video, convert it to multiple resolutions/bitrates (H.264/H.265).
    5. Workers upload the finished chunks back to Blob Storage/CDN origin.
    6. Once done, the metadata is updated to "Video Ready".
  • Streaming Flow:
    • Client requests a video.
    • Server returns a Manifest File (e.g., .m3u8 for HLS).
    • This manifest contains URLs to video chunks at various bitrates (e.g., video_720p_001.ts, video_720p_002.ts).
    • The client player uses Adaptive Bitrate Streaming (ABR) . If network is good, it requests high-res chunks. If network slows, it seamlessly switches to low-res chunks.
    • CDN: Serves 99% of this traffic.
  • Optimization:
    • Geo-Redundancy: CDNs replicate content globally.
    • Pre-fetching: Client may download next few seconds of video predictively to handle network jitter.

Question 5: Design Uber / Lyft

Difficulty: ⭐⭐⭐ Hard

1. Requirements

  • Functional:
    • Rider: Can request a ride, see nearby drivers, and track driver location in real-time.
    • Driver: Can go online/offline, receive ride requests, update location continuously.
  • Non-Functional:
    • Low Latency: Location updates must be fast.
    • High Availability: Matching is critical.
    • Consistency: A ride cannot be double-booked (Two riders matched to same driver at same time).

2. High-Level Design

  • Core Problem: How do you find all nearby drivers for a given rider? (Spatial Indexing)
  • Components:
    1. Location Service (WebSocket): Handles real-time location pings from drivers.
    2. Dispatcher/Matching Service: Finds nearby drivers.
    3. Database: Stores trip history, user details.
    4. Spatial Index: A data structure to query points on a map.

3. Deep Dive

  • Spatial Indexing - The "QuadTree":
    • You cannot scan all drivers in the world to find nearby ones. O(N) is too slow.
    • Solution: QuadTree. Divide the world map into grids. If a grid has too many drivers, split it into 4 quadrants.
    • The rider's location is a point. You search the QuadTree for grids surrounding that point to find drivers.
    • Implementation: This QuadTree is stored in memory on the Matching Service. Drivers move constantly, so the QuadTree is updated thousands of times per second.
  • Alternative: Geohashing:
    • Encode a latitude/longitude into a string (e.g., 9q8yve). The longer the string, the more precise.
    • You can query the DB for drivers whose geohash matches the prefix of the rider's geohash (e.g., 9q8y%).
  • Handling Scale:
    • Shard the driver data by region (e.g., US-East server handles only US-East drivers).
    • Use Kafka as a pipeline: All driver locations go to Kafka, which streams to the matching services.
  • Matching Algorithm:
    • Simple version: Find the nearest driver (Euclidean distance or road distance via Google Maps API).
    • Advanced: "Surge Pricing" and "Batching" (matching multiple riders going the same way).

Question 6: Design a Distributed Key-Value Store (Redis/Cassandra)

Difficulty: ⭐⭐⭐ Hard

1. Requirements

  • Functional:
    • put(key, value) and get(key).
    • Values can be small (strings) or large (blobs).
  • Non-Functional:
    • High Availability: Even if machines fail, the system works.
    • Scalability: Must handle petabytes of data across thousands of servers.
    • Durability: Data must not be lost.
    • Configurable Consistency: Trade-off between latency and correctness.

2. High-Level Design (Distributed Hash Table)

  • Core Problems: Distribution, Replication, Fault Tolerance.
  • Components:
    1. Clients: Talk to the cluster.
    2. Coordination Service (e.g., ZooKeeper): Keeps track of which nodes are alive and where data lives.
    3. Storage Nodes: The servers that actually hold the data.

3. Deep Dive

  • Data Distribution (Consistent Hashing):
    • Problem: Simple hash(key) % N doesn't work when N (number of servers) changes (adding/removing servers causes massive rehashing).
    • Solution: Consistent Hashing.
    • Servers are placed on a hash ring (values 0 to 2^64-1).
    • A key is hashed, and you walk clockwise to find the nearest server.
    • When a server is added, only a fraction of keys need to be remapped.
  • Data Replication:
    • To handle failures, data must be replicated. Look at the next N servers on the ring (e.g., N=3). Data is written to all 3.
    • This forms a Replication Factor of 3.
  • Consistency (Quorum):
    • In a replicated system, how do we ensure we read the latest value?
    • Use Quorum:
      • W = Write Quorum (Number of replicas that must acknowledge a write).
      • R = Read Quorum (Number of replicas consulted during a read).
      • N = Replication Factor.
    • If R + W > N, you have Strong Consistency (e.g., N=3, W=2, R=2 ensures at least 1 common node has the latest write).
    • If R + W <= N, you have Eventual Consistency (faster but might read stale data).
  • Failure Handling (Hinted Handoff):
    • If a replica node is down during a write, another node temporarily accepts the write (with a hint). When the dead node recovers, the hinted write is handed off to it.

Question 7: Design a Rate Limiter

Difficulty: ⭐ Easy/Medium

1. Requirements

  • Functional:
    • Limit the number of requests a user/client can send to an API within a time window (e.g., 100 requests per minute).
    • Block requests that exceed the limit.
  • Non-Functional:
    • Low Latency: The limiter must be fast (it runs on every request).
    • Distributed: Must work across multiple servers.
    • Accuracy: Avoid letting through more requests than allowed.

2. High-Level Design

  • Location: Usually implemented as a middleware on the API Gateway or a separate Redis cluster.
  • Algorithms:
    1. Token Bucket: A bucket holds tokens. Tokens added at a fixed rate. Each request consumes a token. If bucket empty, request denied.
    2. Sliding Window Log: Tracks a log of timestamps for each user. Complex but most accurate.

3. Deep Dive

  • The "Sliding Window" Solution (Redis + Sorted Sets):
    • This is the industry standard for accuracy.
    • For a rate limit of 100 requests per minute:
    • For a given user, store a Redis Sorted Set. The score is the timestamp.
    • ZREMRANGEBYSCORE user:123 -inf (now - 60s) (Remove old entries).
    • ZCARD user:123 (Count remaining entries).
    • If count < 100:
      • ZADD user:123 now timestamp (Add this request).
      • Allow request.
    • Else:
      • Deny request.
  • Distributed Environment:
    • If you have multiple API servers, the rate limiter must be centralized. Redis is the perfect solution as a single source of truth.
  • Race Conditions:
    • If two requests check the count at the same time, they might both think they are under the limit.
    • Solution: Use Redis Lua Scripts, which are atomic.

Glossary of Key Terms

  • Load Balancer: Distributes incoming traffic (Reverse Proxy, HAProxy, Nginx).
  • CDN (Content Delivery Network): Geographically distributed servers for serving static content (Cloudflare, Akamai).
  • Consistent Hashing: A technique for distributing data across a cluster that minimizes reorganization when nodes are added or removed.
  • CAP Theorem: A distributed system can only guarantee two of three: Consistency, Availability, and Partition Tolerance. (Network partitions are a given, so you choose between CP and AP).
  • ACID: Atomicity, Consistency, Isolation, Durability (Traditional SQL).
  • BASE: Basically Available, Soft state, Eventual consistency (NoSQL).
  • Sharding (Partitioning): Splitting a large database into smaller, faster, more easily managed parts called data shards.
  • Quorum: The minimum number of votes that a distributed transaction has to obtain in order to be allowed to perform an operation.
  • Heartbeat: A periodic signal sent between machines to indicate that they are still alive.
  • Gossip Protocol: A way for nodes in a cluster to communicate membership and state changes without a central registry (used in Cassandra).
  • Idempotency: The property that performing the same operation multiple times has the same effect as performing it once (critical for payments).

Based on analyzing thousands of interview reports from top tech companies (FAANG + Microsoft, Uber, Airbnb, etc.), here are the Top 10 most frequently asked system design questions.

If you only have time to prepare for 5 questions, start with the "Must Know" list at the end of this section.

The "Big 3" (You WILL see one of these)

These three questions appear in over 70% of all system design interviews. Master these first.

  1. Design URL Shortener (e.g., TinyURL)

    • Why? It perfectly tests fundamentals: Hashing, Key-Value stores, 301 vs 302 redirects, and estimating scale. It’s the "Hello World" of system design.
  2. Design WhatsApp / Facebook Messenger / Chat System

    • Why? Tests real-time communication (WebSockets), data persistence, handling offline users, and state synchronization across devices.
  3. Design Twitter / News Feed

    • Why? Tests the core social media challenge: the Fanout Problem (Push vs Pull models). It forces you to discuss caching, pre-computation, and handling celebrities (high-profile users).

The "Media & Sharing" Tier

These are extremely common for mid-to-senior roles.

  1. Design YouTube / Netflix (Video Streaming Platform)

    • Why? Tests your knowledge of CDNs, adaptive bitrate streaming (HLS/DASH), and handling large binary files (blob storage). Very common at Media companies (Netflix, Hulu, Spotify) but also asked at general tech companies.
  2. Design Instagram / Facebook Photo Sharing

    • Why? A mix of Twitter and YouTube. Tests how to store and serve media efficiently, generate timelines, and handle followers.

The "E-Commerce & Infrastructure" Tier

Common for backend and infrastructure roles.

  1. Design Uber / Lyft (Ride Hailing)

    • Why? Tests geospatial indexing (QuadTrees/Geohashing). How do you find drivers near a user in real-time? It’s a unique challenge that separates junior from senior engineers.
  2. Design a Distributed Key-Value Store (e.g., Redis / Cassandra)

    • Why? If you're interviewing for a infrastructure or storage team, this is a must. Tests deep distributed systems knowledge: Consistent Hashing, Quorum, CAP Theorem, Gossip Protocols.
  3. Design a Rate Limiter

    • Why? Deceptively simple, but tests your knowledge of middleware, Redis, and concurrency. Often asked as a "warm-up" or as a deep-dive for backend roles.
  4. Design a Parking Lot / Elevator System (OO Design)

    • Why? While technically Object Oriented Design (not System Design), this is the most common interview question for junior/mid-level roles at Amazon and Microsoft. It tests your ability to model real-world objects with classes and design patterns.
  5. Design Amazon / E-commerce Website (Shopping Cart)

    • Why? Tests handling sessions, inventory management, and database consistency (preventing overselling).

Cheat Sheet: Which question for which company?

While all companies ask the common ones, they often have favorites:

  • Google: Loves Youtube (Video) and Gmail (Chat/Email) and Maps (Location/Uber style).
  • Facebook/Meta: Loves News Feed (Twitter), Messenger (Chat), and Marketplace (E-commerce/Instagram).
  • Amazon: Loves E-commerce (Shopping Cart, Order Checkout), Prime Video (Netflix), and Rate Limiters (AWS infrastructure mindset).
  • Uber: Obviously loves Uber/Lyft itself, but also Real-time tracking and Matching algorithms.
  • Microsoft: Often mixes OO Design (Elevator/Parking Lot) with System Design (Design Teams/Skype).

Summary: The "Must Know" List

If you have limited time, here is your priority list:

  1. URL Shortener
  2. Chat System
  3. Twitter Feed
  4. YouTube
  5. Uber (Location services)

System Design Q&A for ROR Senior Developer

Interview Perspective Edition

Here's how to speak and structure your answers during the actual interview, without code dumps.


QUESTION 1: Design URL Shortener (TinyURL)

What the Interviewer Wants to See:

  • Can you handle high read/write ratios?
  • Do you understand hashing/collision strategies?
  • Can you think about caching and database choices?

How to Answer (Interview Style):


Interviewer: "Let's start with designing a URL shortener like bit.ly. How would you approach this?"


You:

"Before I jump into the architecture, let me make sure I understand the requirements correctly."

[PAUSE - This shows you're methodical]

"Are we building the core functionality - generating short URLs and redirecting - or do we also need features like custom aliases, analytics, and expiration?"

Interviewer: "Let's focus on the core for now. Generate short URLs, redirect, and handle scale."


You:

"Great. Let me outline what we're building:

Functional Requirements:

  • Users give us a long URL, we give back a unique short URL
  • When someone visits the short URL, they get redirected to the original
  • Short URLs should be randomly generated, not sequential

Non-Functional Requirements:

  • High availability is critical - redirects must always work
  • Low latency for redirects - under 10ms ideally
  • The system should scale to handle millions of URLs

Let me estimate scale quickly:

  • Let's assume 100 million new URLs per month
  • That's about 40 URLs per second for writes
  • Reads are typically 100x writes - so 4,000 redirects per second
  • Storage: If each record is 500 bytes, that's 50GB per month, about 3TB over 5 years

This fits comfortably on a few databases, but the read load means we need caching."


Interviewer: "Okay, how would you design the core logic for generating the short URL?"


You:

"This is actually the most interesting part. There are several approaches:

Approach 1 - Hash the URL: We could take the long URL, run it through MD5 or SHA256, and take the first 7 characters. But collisions are possible - two different URLs could generate the same hash. We'd need collision detection and regeneration.

Approach 2 - Base62 Encoding: We can generate a unique ID from a database sequence, then convert it to Base62 (a-z, A-Z, 0-9). 7 characters gives us 62^7 possibilities - about 3.5 trillion combinations. This guarantees uniqueness but produces sequential URLs, which are predictable.

Approach 3 - Pre-generated Keys (My Preferred Approach): We pre-generate random 7-character strings in batches and store them in a 'key pool' table with an 'available' flag. When a user requests a short URL, we just grab the next available key. This avoids collisions entirely and is very fast."
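Approach 2's conversion is easy to demonstrate if the interviewer pushes for detail. A minimal sketch (the alphabet ordering is a convention; any fixed ordering of the 62 characters works):

```ruby
BASE62 = ('0'..'9').to_a + ('a'..'z').to_a + ('A'..'Z').to_a

# Convert a database sequence id into a short Base62 key.
def base62_encode(id)
  return BASE62[0] if id.zero?
  key = ''
  while id > 0
    key = BASE62[id % 62] + key # repeated division, most significant digit last
    id /= 62
  end
  key
end

base62_encode(125) # "21"
62**7              # 3,521,614,606,208 possible 7-character keys (~3.5 trillion)
```

The sequential-id input is exactly why the output is predictable, which motivates Approach 3's pre-generated random keys.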


Interviewer: "Why is predictability a problem?"


You:

"Great question. If URLs are sequential, anyone could guess other URLs. For example, if I create a short URL for 'my-private-document.pdf', someone could just try 'abc001', 'abc002', etc and discover all URLs. Random keys prevent this.

The key pool approach solves both problems - they're random AND we avoid collisions."


Interviewer: "What about the redirect flow? How do you make it fast?"


You:

"The redirect path is the critical path - this is what users experience. Here's the flow:

When a request comes in for '/abc123', we need to find the original URL. The naive approach is a database lookup, but that's too slow at 4,000 reads per second.

I'd introduce a cache - Redis or Memcached.

The flow becomes:

  1. Check Redis cache first
  2. On cache miss, check the database
  3. Populate the cache for next time

With a 95% cache hit rate, we serve most requests in under 5ms.

For the database itself, I'd use Cassandra or DynamoDB - they're optimized for key-value lookups and handle high throughput better than PostgreSQL. But PostgreSQL could work initially with read replicas."
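The read-through flow in steps 1-3 above can be sketched in a few lines. Plain hashes stand in for Redis and the mapping table here purely for illustration; the function name and `stats` counter are my own:

```ruby
# Read-through cache sketch for the redirect path.
def resolve(short_key, cache, db, stats)
  if (url = cache[short_key])
    stats[:hits] += 1
    return url              # served from cache, no DB round trip
  end
  stats[:misses] += 1
  url = db[short_key]       # cache miss: fall back to the database
  cache[short_key] = url if url # populate the cache for next time
  url
end
```

Tracking hits and misses like this is also what feeds the "cache hit ratio" metric discussed under monitoring.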


Interviewer: "What happens when Redis goes down?"


You:

"That's a good failure scenario to consider. We need fallbacks:

  1. Circuit breaker pattern - if Redis is down, we bypass it and go directly to the database. It'll be slower but still works.

  2. Multiple Redis replicas - have a master and replicas. If master fails, promote a replica.

  3. Local cache on application servers - each Rails server could keep an in-memory LRU cache as a second level. If Redis is down, the local cache still catches some requests.

  4. Database connection pooling - ensure we have enough database connections to handle the fallback load.

The key is graceful degradation - the system stays up, just gets slower temporarily."


Interviewer: "How would you handle custom aliases?"


You:

"Custom aliases introduce interesting challenges:

First, validation - we need to ensure the custom alias isn't already taken and doesn't contain invalid characters.

Second, the key pool approach doesn't work because users choose their own keys.

The solution is a two-step process:

  1. Try to insert the custom alias into the database with a unique constraint
  2. If it fails (duplicate), return an error to the user

We need to handle race conditions - if two users request the same custom alias simultaneously, both might check availability, see it's free, and both try to insert. The database unique constraint is our safety net - the second insert will fail.

In Rails, we'd use:

begin
  ShortLink.create!(custom_alias: params[:alias])
rescue ActiveRecord::RecordNotUnique
  render json: { error: 'Alias taken' }, status: 409
end

Note that the unique constraint alone already prevents the race - the losing insert fails and we rescue it. A pessimistic lock would also work, but it adds contention without being necessary here."


Interviewer: "How would you scale this beyond one database?"


You:

"We have a few dimensions to scale:

1. Read Scaling: Add read replicas for the database. Rails 6+ supports this natively with its multiple-database configuration (connects_to/connected_to). All redirects go to replicas; only writes go to the primary.

2. Database Sharding: Once we exceed a few terabytes, we need to shard. We'd shard by short_code - this makes lookups efficient because we know exactly which shard to query. Consistent hashing works well here.

3. Geographic Distribution: Place Redis and application servers in different regions. Use Route53 latency-based routing to send users to the closest region. This reduces latency globally.

4. CDN for Redirects: Interestingly, we could even put a CDN in front. If we set long TTLs on 301 redirects, browsers will cache them. But this makes analytics harder - we wouldn't know about clicks."


Interviewer: "How do you handle analytics? Click counts?"


You:

"Analytics shouldn't impact the redirect path. We handle it asynchronously:

  1. When a redirect happens, we publish a message to Kafka or RabbitMQ
  2. A separate analytics service consumes these messages and updates counters
  3. We can batch updates - update the database every 100 clicks or every minute

This prevents the redirect from waiting on analytics writes.

For real-time dashboards, we might use a separate analytics database like ClickHouse or Elasticsearch that's optimized for aggregation queries."


Interviewer: "What would you monitor in production?"


You:

"I'd track several layers:

Business Metrics:

  • URLs created per minute
  • Redirects per minute
  • Top 100 URLs

Technical Metrics:

  • Cache hit ratio (should be >95%)
  • Redirect latency (p95 under 10ms)
  • Database connection pool usage
  • Background job queue sizes

Alerting:

  • If cache hit ratio drops below 90%
  • If latency exceeds 50ms for 5 minutes
  • If error rate exceeds 1%

In Rails, we'd use NewRelic or DataDog agents to collect this automatically, with custom instrumentation for business metrics."


What You've Demonstrated:

✅ Requirements gathering - You didn't jump to solution
✅ Scale estimation - You quantified the problem
✅ Trade-off analysis - You compared approaches
✅ Failure handling - You thought about Redis going down
✅ Rails expertise - You mentioned ActiveRecord patterns
✅ Beyond Rails - You discussed Cassandra, Kafka, CDNs


QUESTION 2: Design Twitter News Feed

What the Interviewer Wants to See:

  • Can you handle the fanout problem?
  • Do you understand caching strategies?
  • How do you handle celebrities (high-profile users)?

How to Answer (Interview Style):


Interviewer: "Let's design Twitter's news feed. Users can post tweets and see tweets from people they follow."


You:

"I'd like to clarify the scope first. Are we focusing on the timeline generation, or do we also need to handle posting tweets, following/unfollowing, and search?"

Interviewer: "Focus on timeline generation - how a user sees their feed."


You:

"Great. Let me outline the core challenge:

A user's feed should show tweets from everyone they follow, in reverse chronological order.

The naive approach - what a junior developer might do - is:

SELECT tweets.* FROM tweets
JOIN follows ON tweets.user_id = follows.followed_id
WHERE follows.follower_id = ?
ORDER BY tweets.created_at DESC
LIMIT 50

This works for 100 users, but not for 100 million. It's scanning too much data.

The core problem is fanout - delivering a single tweet to thousands or millions of followers.

There are two main approaches: Push and Pull."


Interviewer: "Explain both."


You:

"Pull Model (Read-time fanout): When a user requests their feed, we go find all the people they follow, get recent tweets from each, merge them, and sort.

Pros:

  • Simple to implement
  • No extra storage for feeds
  • Celebrities don't cause problems

Cons:

  • Slow for users following many people
  • Complex queries at read time

Push Model (Write-time fanout): When a user tweets, we immediately push that tweet into a cache for all their followers. Each user has a pre-computed list of tweet IDs in Redis.

Pros:

  • Feed read is incredibly fast (just fetch from Redis)
  • Simple read logic

Cons:

  • Write overhead - a celebrity with 100M followers causes 100M writes per tweet
  • Storage overhead - storing feeds for everyone

Twitter actually uses a hybrid approach."


Interviewer: "Tell me about the hybrid approach."


You:

"The hybrid approach acknowledges that not all users are equal:

Regular users (99.9% of users): Use push model. When they tweet, we write to all their followers' feeds. This works because regular users have maybe a few hundred followers.

Celebrities (users with >10k followers): Use pull model. We don't push their tweets to followers. Instead, when a follower loads their feed, we fetch the celebrity's recent tweets separately and merge them in.

This solves the celebrity problem - one tweet doesn't cause millions of writes.

Implementation details:

  • Maintain a 'celebrity' flag in Redis for users with many followers
  • When a regular user tweets, background job pushes to followers
  • When loading feed, get pre-computed tweets from Redis for regular followed users
  • Additionally query recent tweets from celebrities directly
  • Merge and sort in memory"

Interviewer: "How do you store the pre-computed feeds?"


You:

"Redis sorted sets are perfect for this:

For each user, we maintain:

feed:{user_id} - Sorted set of (timestamp, tweet_id)

When a user tweets, we add that tweet ID to the feed of all their followers:

ZADD feed:{follower_id} {timestamp} {tweet_id}

We also limit the size - keep only the last 800 tweets per user. This prevents Redis from growing infinitely:

ZREMRANGEBYRANK feed:{user_id} 0 -801

When a user loads their feed, we just:

ZREVRANGE feed:{user_id} 0 49 WITHSCORES

This gives us the 50 most recent tweet IDs, which we then fetch from the database.

The database stores the actual tweet content. Redis only stores IDs. "
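The ZADD / ZREMRANGEBYRANK / ZREVRANGE cycle above can be mimicked with a small in-memory stand-in, to show the fanout-and-trim logic end to end. In production these would be actual Redis calls; this sketch just models the sorted-set semantics.

```ruby
# In-memory stand-in for the sorted-set feed: push mirrors ZADD plus the
# ZREMRANGEBYRANK trim, recent mirrors ZREVRANGE.
class FeedStore
  MAX_FEED = 800

  def initialize
    @feeds = Hash.new { |h, k| h[k] = {} } # user_id => { tweet_id => timestamp }
  end

  def push(follower_id, tweet_id, timestamp)
    feed = @feeds[follower_id]
    feed[tweet_id] = timestamp
    return unless feed.size > MAX_FEED
    # Drop the oldest entries so the feed never exceeds 800 tweets
    feed.sort_by { |_, ts| ts }.first(feed.size - MAX_FEED).each { |id, _| feed.delete(id) }
  end

  def recent(user_id, count = 50)
    @feeds[user_id].sort_by { |_, ts| -ts }.first(count).map(&:first)
  end
end

store = FeedStore.new
1.upto(3) { |i| store.push("u1", "t#{i}", i) }
store.recent("u1", 2) # => ["t3", "t2"]
```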


Interviewer: "What happens when someone unfollows a user?"


You:

"Unfollow is tricky because we need to remove those tweets from the feed.

Option 1 - Remove on unfollow: When someone unfollows, we could scan their feed and delete all tweets from that user. With up to 800 tweets in the feed, that's an O(800) operation per unfollow. If someone unfollows 100 people in a day, that's fine. But if a celebrity gets mass-unfollowed, it's heavy.

Option 2 - Filter at read time: Don't remove anything. When reading the feed, we have the list of followed users. We can fetch tweet IDs from Redis, then filter out any from unfollowed users before displaying. This adds a small overhead to reads but avoids write amplification.

Option 3 - Let them age out: Since we only keep the last 800 tweets, unfollowed user's tweets will eventually fall off the end. This is the simplest but means unfollowed content might still appear for a while.

I'd probably choose Option 2 with caching of the follow list. "
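Option 2 can be sketched as a read-time filter. This assumes a hypothetical `authors_by_tweet` lookup (tweet_id to author_id) built from the tweet records fetched alongside the feed IDs.

```ruby
require "set"

# Option 2: keep the Redis feed intact and drop tweets from unfollowed
# authors when rendering.
def visible_feed(tweet_ids, authors_by_tweet, followed_ids)
  followed = followed_ids.to_set # the (cached) follow list
  tweet_ids.select { |id| followed.include?(authors_by_tweet[id]) }
end

authors = { "t1" => "alice", "t2" => "bob", "t3" => "alice" }
visible_feed(%w[t1 t2 t3], authors, ["bob"]) # => ["t2"]
```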


Interviewer: "How do you handle new users? Their feed would be empty."


You:

"That's a great onboarding problem. We need to seed their feed:

Approach:

  1. When a user signs up, ask them to select interests or follow suggested accounts
  2. Immediately kick off a background job to pre-populate their feed
  3. The job fetches recent popular tweets from those accounts and builds the initial Redis feed

We can also have a 'trending' fallback: If the user hasn't followed anyone yet, show trending tweets from their region. This gives them content while they build their follow list.

In Rails, we'd use an after_create_commit callback:

after_create_commit :seed_feed, if: :new_user?

def seed_feed
  SeedFeedJob.perform_later(id)
end"

Interviewer: "How would you handle the database for tweets?"


You:

"Tweets have a specific access pattern:

  • High write volume (thousands per second)
  • Mostly reads by ID (when displaying a feed)
  • Sometimes reads by user (profile pages)

PostgreSQL could work initially, but we'd hit its write-throughput ceiling as tweet volume grows.

I'd choose Cassandra for tweet storage:

  • Linear scalability - just add nodes
  • High write throughput
  • Automatic partitioning by tweet ID or user ID

Schema design in Cassandra:

-- For timeline queries (by time)
CREATE TABLE tweets_by_user (
  user_id UUID,
  tweet_id TIMEUUID,
  content TEXT,
  PRIMARY KEY (user_id, tweet_id)
) WITH CLUSTERING ORDER BY (tweet_id DESC);

-- For single tweet lookup
CREATE TABLE tweets (
  tweet_id UUID PRIMARY KEY,
  user_id UUID,
  content TEXT,
  created_at TIMESTAMP
);

This way, fetching a user's tweets is a single partition query - very fast."


Interviewer: "What about search?"


You:

"Search is a different beast entirely. The tweet database isn't optimized for text search.

For search, I'd introduce Elasticsearch:

  1. When a tweet is created, index it in Elasticsearch asynchronously
  2. Search queries hit Elasticsearch, which returns tweet IDs
  3. Fetch the actual tweets from Cassandra using those IDs

This keeps search load off the main database and gives us full-text search, faceting, and relevance scoring.

We'd use Logstash or a Kafka consumer to move data from PostgreSQL/Cassandra to Elasticsearch. "


Key Takeaways for This Question:

| What They Ask | What You Should Say |
|:---|:---|
| "How do you generate feeds?" | Explain Push vs Pull, then hybrid approach |
| "What about celebrities?" | Treat them specially - pull model for their tweets |
| "How do you store feeds?" | Redis sorted sets with timestamp as score |
| "What about unfollow?" | Filter at read time or let age out |
| "New users?" | Seed with suggested follows + trending |
| "Database for tweets?" | Cassandra for write scaling |


QUESTION 3: Design a Chat System (WhatsApp)

What the Interviewer Wants to See:

  • Do you understand real-time communication?
  • How do you handle presence and state?
  • Can you design for high availability?

How to Answer (Interview Style):


Interviewer: "Design a chat application like WhatsApp. Support 1-on-1 messaging and group chats."


You:

"Let me clarify the scope. Are we focusing on the real-time message delivery, or do we also need to handle media sharing, voice/video calls, and message history?"

Interviewer: "Focus on text messaging - 1-on-1 and groups, with delivery receipts and online status."


You:

"Understood. Let me outline the scale first:

WhatsApp has about 2 billion users, but let's scale down for this discussion. Let's assume:

  • 100 million daily active users
  • Each user sends 50 messages per day average
  • That's 5 billion messages daily, or about 58,000 messages per second
  • Peak load might be 3-4x that

This is a massive write load. Most of our design decisions will be driven by write scalability. "


Interviewer: "What's the high-level architecture?"


You:

"The architecture has several key components:

1. Connection Management: Users maintain persistent connections to our servers. HTTP doesn't work well for real-time - we need WebSockets or a similar protocol. Each user connects to a specific 'connection server' that maintains their socket.

2. Presence Service: Tracks who's online, last seen, typing status. This needs to be fast and globally available.

3. Message Routing: When user A sends to user B, we need to deliver it. If B is online, we route through their connection server. If offline, we store it.

4. Message Storage: We need durable storage of message history. Users expect to see history when they switch devices.

5. Group Management: For groups, we need to track members and handle fanout to all participants."


Interviewer: "How do you handle the connection servers? Users connect to different servers."


You:

"This is a key challenge - how does server A know which server user B is connected to?

We need a central directory of connections:

Solution: Redis pub/sub with a presence hash:

When a user connects to Server A:

HSET connections:global {user_id} {server_id}

When Server A needs to send to user B:

  1. Look up connections:global to find which server user B is on
  2. Forward the message to that server via internal RPC
  3. That server delivers via WebSocket

For group messages, this becomes:

  • Look up all group members
  • For each, find their server
  • Fan out to those servers
  • Each server delivers to its connected users

WhatsApp itself runs on a modified version of XMPP that follows this same pattern. "
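The directory lookup and routing decision can be sketched with an in-memory hash standing in for the Redis connections hash. Names like `Router` are hypothetical; in production the "deliver" branch would be an internal RPC to the target server.

```ruby
# Connection-directory routing: find which server holds the recipient's
# socket, then forward; queue the message if the recipient is offline.
class Router
  def initialize
    @connections = {}                        # user_id => server_id (the connections hash)
    @pending     = Hash.new { |h, k| h[k] = [] }
    @delivered   = Hash.new { |h, k| h[k] = [] }
  end

  def connect(user_id, server_id)
    @connections[user_id] = server_id        # HSET connections:global {user_id} {server_id}
  end

  def send_message(to_user, message)
    server = @connections[to_user]           # lookup: which server is the user on?
    if server
      @delivered[server] << [to_user, message] # in production: RPC to `server`, WebSocket push
      :online
    else
      @pending[to_user] << message           # recipient offline: store for later
      :queued
    end
  end
end

router = Router.new
router.connect("bob", "server-7")
router.send_message("bob", "hi")    # => :online
router.send_message("carol", "hey") # => :queued
```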


Interviewer: "What about offline users?"


You:

"Messages for offline users can't be lost. We need persistent storage:

Flow for offline delivery:

  1. User B is offline (not in the connections hash)
  2. Server A stores the message in a 'pending messages' queue for user B
  3. When user B comes online, their connection server fetches all pending messages
  4. Messages are delivered in order

Storage for pending messages:

  • Use Redis lists with persistence: LPUSH pending:{user_id} {message_data}
  • Set expiry - messages older than 30 days can be deleted
  • For durability, also write to Cassandra as the source of truth

The challenge is handling large numbers of pending messages - users might be offline for days. We need efficient pagination when they reconnect. "
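The pending-queue flow, including paginated delivery on reconnect, can be sketched in pure Ruby. `PendingQueue` is a hypothetical stand-in for the Redis list (LPUSH) plus the Cassandra-backed drain.

```ruby
# Pending messages for offline users: store mirrors
# LPUSH pending:{user_id} {message_data}; drain delivers oldest-first,
# in pages, when the user reconnects.
class PendingQueue
  def initialize
    @queues = Hash.new { |h, k| h[k] = [] }
  end

  def store(user_id, message)
    @queues[user_id].unshift(message) # LPUSH: newest at the head, oldest at the tail
  end

  # Page the delivery so a user offline for days doesn't get one huge payload
  def drain(user_id, page_size: 2)
    queue = @queues[user_id]
    pages = []
    pages << queue.pop(page_size).reverse until queue.empty?
    pages
  end
end

q = PendingQueue.new
%w[m1 m2 m3].each { |m| q.store("bob", m) }
q.drain("bob") # => [["m1", "m2"], ["m3"]]
```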


Interviewer: "How do you implement 'last seen' and online status?"


You:

"Presence is tricky because it's a high-churn, eventually consistent system.

Online status:

  • When user connects: set a per-user key with a TTL, e.g. SETEX online:{user_id} 90 1 (members of a plain set like SADD online_users can't expire individually)
  • When user disconnects: don't immediately remove - let the TTL lapse to absorb brief reconnects
  • Send heartbeats every 30 seconds to refresh the TTL
  • If no heartbeat for 90 seconds, the key expires and they're considered offline

Last seen:

  • Update HSET last_seen {user_id} {timestamp} on disconnect and every 5 minutes while online
  • When showing status, check online set first, if not present, show last seen

Privacy concerns:

  • Users can configure who sees their last seen (everyone, contacts, nobody)
  • This filtering happens when displaying - we store the raw timestamp but apply permissions

Scale:

  • 100M online users means 100M keys in Redis
  • This is fine - Redis can handle it with enough memory
  • We might shard presence data across Redis clusters by user ID range"

Interviewer: "How do you handle group chats with 1000+ participants?"


You:

"Large groups are a different challenge:

Fanout problem: Sending one message to 1000 people means 1000 deliveries.

Option 1 - Server fanout (WhatsApp's approach):

  • Server receives message once
  • Server fans out to all participants
  • Works well up to a few thousand
  • Each message creates significant server load

Option 2 - Client fanout (Slack's approach):

  • Server sends message once to a group channel
  • All connected clients receive it simultaneously via pub/sub
  • Requires all group members to be connected to the same channel
  • Scales better for very large groups

For WhatsApp-like scale, I'd use Option 1 with optimizations:

Delivery optimization:

  1. Group metadata includes member list with their connection servers
  2. Group message arrives at one server
  3. That server fans out to the connection servers for all members
  4. Each connection server delivers to its connected members
  5. Offline members get messages stored in pending queues

This reduces the fanout from O(members) to O(servers). "
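The grouping step that turns O(members) into O(servers) can be sketched directly. `server_of` is a hypothetical lookup (member to connection server) built from the connections directory; members not in it are offline.

```ruby
# Group fanout plan: bucket members by connection server so the sender
# makes one RPC per server, not one per member; offline members go to
# their pending queues.
def fanout_plan(member_ids, server_of)
  online, offline = member_ids.partition { |id| server_of.key?(id) }
  { rpcs: online.group_by { |id| server_of[id] }, pending: offline }
end

server_of = { "a" => "s1", "b" => "s1", "c" => "s2" }
plan = fanout_plan(%w[a b c d], server_of)
plan[:rpcs]    # => {"s1"=>["a", "b"], "s2"=>["c"]}
plan[:pending] # => ["d"]
```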


Interviewer: "How do you store message history?"


You:

"Message history needs to be:

  • Highly durable (never lose messages)
  • Queryable by conversation
  • Ordered chronologically

PostgreSQL isn't ideal here - we need write scalability.

I'd use Cassandra with a careful schema:

-- For 1-on-1 chats, conversation_id = sorted(user_ids)
CREATE TABLE messages_by_conversation (
  conversation_id UUID,
  message_id TIMEUUID,
  sender_id UUID,
  content TEXT,
  created_at TIMESTAMP,
  PRIMARY KEY (conversation_id, created_at, message_id)
) WITH CLUSTERING ORDER BY (created_at DESC);

-- For group chats, conversation_id = group_id
-- Same schema works

Why this works:

  • TIMEUUID ensures uniqueness and includes timestamp
  • Clustering by time gives us efficient time-range queries
  • Partition by conversation means all messages for one chat are co-located
  • Writes are append-only - no updates, which Cassandra handles well

For media messages, we store references to S3/CDN URLs instead of binary data. "


Interviewer: "How do read receipts work?"


You:

"Read receipts add complexity because they're stateful:

Flow:

  1. User B reads message ID 123
  2. Client sends read receipt to server
  3. Server needs to notify User A

Implementation options:

Option 1 - Per-message tracking:

  • Store read status in Cassandra with the message
  • UPDATE messages SET read_by = read_by + [user_id] WHERE id = 123
  • When User A asks for status, query the message
  • Problem: Many updates to same message

Option 2 - Aggregated in Redis:

  • Use Redis sets: SADD read:message:123 {user_id}
  • When User A wants status, check set size
  • Periodically flush to Cassandra for persistence
  • Faster, but eventual consistency

For group chats with many readers:

  • Show counts, not individual names ("Seen by 24 people")
  • Only show full list on explicit request
  • Reduces complexity and storage"

Interviewer: "How do you handle typing indicators?"


You:

"Typing indicators are transient state - they don't need persistence:

Implementation in Redis with expiry:

  • When user starts typing: SETEX typing:{conversation_id}:{user_id} 5 "typing"
  • When user stops: DEL typing:{conversation_id}:{user_id}
  • When displaying conversation, query all keys matching typing:{conversation_id}:*

Broadcast strategy:

  • Don't broadcast every keystroke - that's too noisy
  • Send 'typing' event when user starts
  • Send 'stop_typing' after 3 seconds of inactivity
  • Or send periodic heartbeats while typing

Scale considerations:

  • This is ephemeral - Redis memory is fine
  • Each typing indicator is just a small key with TTL
  • We can handle millions of concurrent typists"

Interviewer: "How would you test this system's reliability?"


You:

"Testing a real-time system requires multiple approaches:

1. Unit/Integration tests:

  • Test message routing logic
  • Test offline message queuing
  • Test group fanout

2. Chaos engineering:

  • Simulate server failures - kill random connection servers
  • Test that clients reconnect properly
  • Verify no message loss during failover

3. Load testing:

  • Simulate millions of concurrent connections
  • Ramp up message rates to find breaking points
  • Test with large groups (10k+ members)

4. Network degradation:

  • Simulate high latency connections
  • Test reconnection with exponential backoff
  • Verify message ordering under poor network conditions

5. Data integrity tests:

  • Verify that after any failure, message counts match
  • Test that offline messages are eventually delivered
  • Verify read receipts are accurate"

Summary Table - Chat System:

| Component | Technology | Reasoning |
|:---|:---|:---|
| Real-time connection | WebSockets (Action Cable) | Bi-directional, persistent |
| Connection directory | Redis Hash | Fast lookups, shared across servers |
| Presence | Redis Sets + TTL | Ephemeral, high churn |
| Message queue | Redis Lists | Fast, with persistence option |
| Message storage | Cassandra | Write-optimized, scalable |
| Group fanout | Server-to-server RPC | Reduces connections per server |
| Read receipts | Redis Sets + Cassandra | Fast updates, persistent backup |
| Media storage | S3 + CDN | Optimized for large files |


QUESTION 4: Design a Video Platform (YouTube)

What the Interviewer Wants to See:

  • Do you understand CDNs and caching?
  • Can you handle large file uploads?
  • How do you design async processing pipelines?

How to Answer (Interview Style):


Interviewer: "Let's design YouTube. Users can upload and watch videos."


You:

"I want to clarify the scope. Are we focusing on the video upload and streaming pipeline, or do we also need recommendations, comments, and user channels?"

Interviewer: "Focus on upload and streaming - how videos are stored and served."


You:

"Understood. Let me outline the scale:

YouTube has 500 hours of video uploaded every minute. That's 30,000 hours per hour, or 720,000 hours per day.

Storage:

  • If average bitrate is 5 Mbps for compressed video
  • 720,000 hours Γ— 5 Mbps Γ— 3600 seconds = 1.6 petabytes per day of compressed video
  • Plus original uploads, multiple resolutions, audio tracks

Bandwidth:

  • Billions of views per day
  • Each view streams data
  • Total bandwidth is in terabits per second

This is a massive data problem. The architecture is entirely about moving bytes efficiently. "


Interviewer: "Walk me through the upload flow."


You:

"The upload flow has several stages:

1. Client upload: The naive approach is uploading directly to our Rails servers. This is a mistake - it ties up server processes for minutes at a time, and Rails isn't optimized for large file handling.

Better approach: Direct-to-S3 upload

  • Client requests a presigned URL from our API
  • Rails generates a time-limited URL for S3 upload
  • Client uploads directly to S3, bypassing our servers
  • This uses S3's bandwidth, not ours

2. Upload completion:

  • S3 triggers a Lambda function or SQS message
  • This notifies our Rails app that upload is complete
  • We create a Video record with status 'processing'

3. Processing queue:

  • Video ID goes into a processing queue (RabbitMQ/Kafka)
  • Processing workers pick up jobs
  • We can prioritize based on upload time, user tier, etc."

Interviewer: "What happens in video processing?"


You:

"Raw video isn't ready for streaming. We need to transform it:

Processing steps:

1. Validation:

  • Check file integrity
  • Scan for malware
  • Verify format is supported

2. Transcoding: Raw video is huge. We need to encode it into multiple formats and resolutions:

  • Resolutions: 240p, 360p, 480p, 720p, 1080p, 4K
  • Formats: H.264 (compatible), H.265 (efficient), VP9 (web)
  • Audio: Multiple bitrates, different languages

3. Segmentation: For adaptive streaming, we split videos into chunks:

  • Each chunk is 2-10 seconds of video
  • We create a manifest file (.m3u8 for HLS) listing all chunks
  • Client switches between quality levels by requesting different chunk versions

4. Thumbnail generation:

  • Extract frames at regular intervals
  • Generate preview thumbnails

5. Storage:

  • Store all transcoded versions in S3/cloud storage
  • Update video status to 'ready'
  • Warm the CDN with popular videos"

Interviewer: "Why do we need multiple resolutions?"


You:

"Adaptive Bitrate Streaming is the key to good user experience:

When a user watches on a phone with poor cellular connection, they can't stream 1080p smoothly. The video would buffer constantly.

With adaptive streaming:

  • Client downloads the manifest file first
  • It monitors network conditions and buffer level
  • If network is good, it requests high-res chunks
  • If network degrades, it seamlessly switches to lower-res chunks
  • The video continues playing without interruption

This requires all resolutions to be available, and chunks to be aligned so switching is seamless.

Common protocols:

  • HLS (Apple) - uses .m3u8 playlists, .ts segments
  • MPEG-DASH (open standard) - uses MPD manifests
  • Most platforms support both"

Interviewer: "How does the CDN fit in?"


You:

"CDN is critical for video delivery. Here's why:

Without CDN:

  • All requests go to our origin servers in one region
  • Users in Asia downloading from US servers experience high latency
  • Our servers handle terabits of traffic - impossibly expensive

With CDN:

  • Video chunks are cached at edge locations worldwide
  • User in Tokyo downloads from Tokyo edge server
  • Latency drops from 200ms to 10ms
  • Our origin servers handle only cache misses and dynamic content

CDN strategy:

  • Popular videos get cached everywhere
  • Less popular videos cached only in regions with demand
  • Long-tail content served from origin but cached after first view

We need cache invalidation when:

  • Video is removed
  • Thumbnail updated
  • New resolution added

Cost optimization: CDNs charge by bandwidth, so we keep popular videos on the CDN and serve long-tail content from origin, with the CDN acting as a pull-through cache after the first view."


Interviewer: "How do you handle viral videos? Suddenly millions of views?"


You:

"Viral spikes are challenging. A video with 100 views suddenly gets 1 million per hour.

Problems:

  • CDN might not have it cached everywhere yet
  • Origin servers get overwhelmed
  • Database hot partitions

Solutions:

1. Predictive caching:

  • Monitor view velocity - if views spike 1000%, proactively push to more CDN nodes
  • Use machine learning to predict viral content

2. CDN always has it:

  • For any video above certain threshold (e.g., 10,000 views), ensure it's cached globally
  • This is a background job that warms CDN

3. Throttling at edges:

  • If origin is overwhelmed, CDN can queue requests
  • Serve stale content with warning while refreshing

4. Database sharding:

  • Ensure video metadata is sharded by video_id
  • A viral video affects only its own shard, not entire database

5. Read replicas:

  • All view traffic goes to replicas
  • Primary handles only writes (comments, likes)"

Interviewer: "How do you store video metadata? Views, likes, comments?"


You:

"Video metadata has different access patterns from video content:

Two-tier storage:

1. Fast-access metadata (views, likes):

  • Stored in Redis for real-time updates
  • INCR video:views:{video_id} for views
  • SADD video:likes:{video_id} {user_id} for likes
  • Periodically flush to persistent storage

2. Persistent metadata (title, description, comments):

  • PostgreSQL for relational data
  • Sharded by video_id
  • Comments stored in separate table with indexing

3. Search:

  • Elasticsearch for full-text search on titles and descriptions
  • Index updated asynchronously

This separation lets us handle:

  • Millions of view increments without database load
  • Complex queries on metadata without affecting view counting"

Interviewer: "How do you handle user uploads of copyrighted content?"


You:

"Copyright detection is complex. YouTube uses Content ID:

The flow:

1. Reference database:

  • Copyright holders upload reference files of their content
  • These are fingerprinted - audio waveforms, video frames

2. Upload scanning:

  • When a user uploads, we generate fingerprints
  • Compare against reference database
  • Can detect even modified content (different speed, cropped)

3. Matching:

  • If match found, apply policy set by copyright holder:
    • Block video
    • Monetize (run ads for copyright holder)
    • Track (just collect data)
    • Mute audio

Technical implementation:

  • This is too CPU-intensive for real-time
  • Run as background job after upload
  • Can take minutes to hours for long videos
  • Store results in database for enforcement"

Summary Table - Video Platform:

| Component | Technology | Reasoning |
|:---|:---|:---|
| Upload | Direct-to-S3 with presigned URLs | Avoid blocking Rails servers |
| Processing Queue | RabbitMQ/Kafka | Async, retryable, scalable |
| Transcoding | FFmpeg on worker fleet | CPU/GPU intensive, parallelizable |
| Storage | S3/Cloud Storage | Durable, cheap, globally available |
| Delivery | CDN (CloudFront/Akamai) | Edge caching, low latency |
| Metadata DB | PostgreSQL sharded | ACID for comments, user data |
| Real-time counts | Redis | Fast increments, high throughput |
| Search | Elasticsearch | Full-text search, relevance |
| Copyright | Fingerprinting service | Custom ML models |


QUICK REFERENCE: Common Questions with One-Line Answers

| Question | The Key Insight (What They Want to Hear) | |:---|:---| | "How do you handle database scaling?" | "Read replicas for reads, sharding for writes, caching for hot data." | | "SQL or NoSQL?" | "Depends on access pattern: SQL for complex queries/transactions, NoSQL for high throughput/simple lookups." | | "How do you prevent downtime during deployment?" | "Blue-green deployment, feature flags, rolling updates, canary releases." | | "How do you handle traffic spikes?" | "Auto-scaling groups, CDN absorption, rate limiting, queueing." | | "How do you ensure data consistency?" | "Transactions for critical paths, eventual consistency elsewhere, idempotent operations." | | "How do you handle failures?" | "Retries with exponential backoff, circuit breakers, fallbacks, graceful degradation." | | "How do you monitor the system?" | "Logs (ELK), metrics (Prometheus), tracing (Jaeger