

Course 3: E-commerce Flash Sale System Architecture Evolution Case Study

Goal: E-commerce flash sales are extremely challenging high-concurrency scenarios, often serving as a "touchstone" for a system architecture's capabilities. This course walks through the step-by-step optimization of a flash sale system, building a deep understanding and mastery of core architectural challenges such as handling instantaneous high concurrency, ensuring inventory consistency, processing distributed transactions, and applying rate limiting & anti-fraud measures.


Phase 0: Initial System (Simplest Flash Sale Attempt)

System Description

  • Very straightforward functionality:
    • Users can browse a specific product.
    • Users can participate in a limited-quantity sale (e.g., 100 iPhones) that starts at a specified time (e.g., 10:00 AM).
    • After clicking the "Buy Now" button, the system attempts to deduct stock; if successful, an order is generated.
  • Tech Stack: still simple:
    • Frontend: a static page plus a JavaScript countdown timer.
    • Backend: Spring Boot (monolithic application), with all logic mixed together.
    • Database: MySQL (a single instance and a single database holding product info, stock, and orders).

Current Architecture Diagram

[User Browser] → [Web Server (Monolith, handles flash sale logic)] → [MySQL (products/orders/stock)]
Characteristics at this point:
  • Flash sale logic is "naive": query stock → check stock > 0 → update stock (-1) → create order, all within a single database transaction.
  • Fatal flaw: imagine 1,000 or more users clicking "Buy Now" at exactly 10:00:00, competing for those 100 iPhones. A flood of requests instantly overwhelms the database connection pool, CPU spikes, and the database may crash, leaving everyone unable to buy.
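For concreteness, here is a minimal sketch of this naive flow (an illustration only, assuming a Spring JdbcTemplate and the products/orders tables described above; class, method, and column names are assumptions):

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class NaiveSeckillService {
    private final JdbcTemplate jdbcTemplate;

    public NaiveSeckillService(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Transactional
    public boolean buyNow(long userId, long productId) {
        // 1. Query stock
        Integer stock = jdbcTemplate.queryForObject(
                "SELECT stock FROM products WHERE id = ?", Integer.class, productId);
        // 2. Check stock > 0
        if (stock == null || stock <= 0) {
            return false; // sold out
        }
        // 3. Update stock (-1). Race window: many requests can pass the check above
        //    before any of them runs this UPDATE, so stock can be oversold.
        jdbcTemplate.update("UPDATE products SET stock = stock - 1 WHERE id = ?", productId);
        // 4. Create order (still inside the same transaction, holding row locks)
        jdbcTemplate.update("INSERT INTO orders (user_id, product_id) VALUES (?, ?)",
                userId, productId);
        return true;
    }
}

Under a burst of concurrent calls, every request holds a database connection and competes for the same product row, which is exactly the lock contention and connection-pool exhaustion described above.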


Phase 1: Facing Instantaneous High Concurrency → Static Pages + Cache Absorbs Traffic

Challenge Emerges

  • At the moment the flash sale starts, QPS (queries per second) can surge from a few dozen to tens of thousands or even hundreds of thousands. The database connection pool is instantly exhausted and stops responding.

❓ Architect's Thinking Moment: How to block the first wave of the flood? Can't let traffic hit the database directly!

(What can the frontend do? Where is the core bottleneck? Is cache a panacea?)

✅ Evolution Direction: Separate Static/Dynamic Content, Move Core Logic Forward to Cache

  1. Frontend Page Staticization:
    • The flash sale activity page (including product info, countdown, etc.) should be made as purely static HTML/CSS/JS as possible.
    • Push these static resources to CDN nodes in advance. This way, the vast majority of browsing requests are handled by the CDN and never reach the backend server.
  2. Pre-warm Stock to Cache:
    • Before the sale starts, load the stock quantity of the flash sale product into Redis (e.g., SET seckill:iphone:stock 100). Subsequent stock deduction operations happen directly in Redis.
  3. Utilize Redis Atomic Operations for Stock Deduction:
    • When a purchase request reaches the backend, it no longer queries the database stock but operates directly on the Redis stock.
    • Must use Redis atomic operations (like the DECR command or more complex Lua scripts) to deduct stock, ensuring atomicity and preventing overselling due to concurrency.
    • A simple Lua script example:
      -- KEYS[1] is the stock key, e.g., seckill:iphone:stock
      local stock = tonumber(redis.call('GET', KEYS[1]))
      if stock and stock > 0 then
          redis.call('DECR', KEYS[1])
          -- Optionally mark the user as having purchased here, to prevent duplicates:
          -- redis.call('SADD', KEYS[2], ARGV[1]) -- KEYS[2]: set of buyers, ARGV[1]: user ID
          return 1 -- 1 = purchase eligibility secured
      end
      return 0 -- 0 = sold out
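
On the application side, one way to invoke such a script is via Spring Data Redis. A minimal sketch (the key name and return-value convention match the script above; everything else is an assumption):

import java.util.Collections;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.data.redis.core.script.DefaultRedisScript;

public class StockDeductor {
    // The whole script executes as a single Redis command, so concurrent callers
    // cannot interleave between the GET and the DECR.
    private static final DefaultRedisScript<Long> DEDUCT = new DefaultRedisScript<>(
            "local stock = tonumber(redis.call('GET', KEYS[1])) " +
            "if stock and stock > 0 then redis.call('DECR', KEYS[1]) return 1 end " +
            "return 0",
            Long.class);

    private final StringRedisTemplate redis;

    public StockDeductor(StringRedisTemplate redis) {
        this.redis = redis;
    }

    public boolean tryDeduct(String stockKey) {
        // e.g., stockKey = "seckill:iphone:stock"
        Long result = redis.execute(DEDUCT, Collections.singletonList(stockKey));
        return result != null && result == 1L;
    }
}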
      

Architecture Adjustment:

[User Browser] ----> [CDN (Static Flash Sale Page)]
             ↘ [Web Server (Receives purchase requests)] → [Redis (Atomic Stock Deduction)] → [Subsequent Async Processing...]
                                                       ↳ [MySQL (For order persistence)]
(At this point, order creation might not write to MySQL synchronously. Only after successful stock deduction in Redis is the user considered to have secured eligibility.)


Phase 2: Encountering Scalpers and Bots → Rate Limiting & Anti-Fraud Become Key

New Challenge: Malicious Requests

  • Simple stock pre-warming cannot stop scalpers and bots, who use scripts to flood the flash sale API at frequencies far beyond those of normal users.
  • A single IP or user sending thousands of requests in a short time not only consumes server resources but also gives legitimate users almost no chance.

❓ Architect's Thinking Moment: How to identify and block this abnormal traffic?

(IP rate limiting? User rate limiting? Are CAPTCHAs useful? How to block most malicious requests at the gateway layer?)

✅ Evolution Direction: Multi-Dimensional Rate Limiting + Human-Machine Verification

  1. Interface Layer Rate Limiting:
    • Implement rate limiting based on token bucket or leaky bucket algorithms at the API gateway or application entry layer.
    • The key is the limiting dimension: limit not only by IP but, more importantly, by User ID (e.g., each User ID may call the flash sale API at most once every 10 seconds). This usually requires Redis (SETNX or Lua scripts); a minimal sketch follows this list.
  2. Add Human Verification on Frontend:
    • Before clicking the "Buy Now" button, require users to complete a graphical CAPTCHA or slider verification (like Google reCAPTCHA, hCaptcha). This effectively blocks most simple scripts.
  3. Hide Flash Sale API + Dynamic Tokens:
    • The flash sale API URL is not directly exposed in the frontend code. Instead, shortly before the sale starts, the server dynamically issues an encrypted, time-limited Token. Users must carry a valid token when purchasing, increasing the difficulty for scripts to simulate.
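
As an illustration of the per-User-ID limit in step 1, here is a minimal sketch using Spring Data Redis and SET NX EX (the key format and the 10-second window are assumptions):

import java.time.Duration;
import org.springframework.data.redis.core.StringRedisTemplate;

public class SeckillRateLimiter {
    private final StringRedisTemplate redis;

    public SeckillRateLimiter(StringRedisTemplate redis) {
        this.redis = redis;
    }

    // Allow at most one flash-sale request per user within a 10-second window.
    public boolean allow(long userId) {
        String key = "seckill:limit:" + userId; // hypothetical key format
        // SET key "1" NX EX 10 -- succeeds only if the key does not exist yet
        Boolean first = redis.opsForValue().setIfAbsent(key, "1", Duration.ofSeconds(10));
        return Boolean.TRUE.equals(first); // false => this user already requested recently
    }
}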

Architecture Adjustment (Focus Points):

[User Browser (Requires CAPTCHA, gets dynamic token)] → [API Gateway / Rate Limiting Service (Validates token, limits IP/User ID)] → [Web Server (Handles valid requests)]
                                                               ↳ [Illegal requests blocked directly]

Phase 3: Redis Stock Deduction Succeeded, but Order Lost? → Ensuring Eventual Consistency

The Consistency Dilemma

  • Traffic is blocked, stock is atomically deducted in Redis, but new problems arise:
    • If the server crashes after successful Redis stock deduction, or subsequent MySQL order creation fails, the stock is permanently reduced by one, but the user gets no order (data inconsistency).
    • If the flash sale system is deployed across multiple instances operating on Redis simultaneously (even with atomic operations), how is the subsequent order creation coordinated?

❓ Architect's Thinking Moment: How to ensure stock deduction and order creation either both succeed or both fail?

(Distributed transaction? 2PC is too heavy. Can message queues solve this? How to compensate for failures?)

✅ Evolution Direction: Asynchronous Order Placement + Message Queue + Compensation Mechanism

  1. Asynchronize Core Flow: Split the synchronous "deduct stock + create order" flow into asynchronous steps.
    • After successful atomic stock deduction in Redis (the key proof of purchase success), the Web Server no longer immediately creates the MySQL order.
    • Instead, it sends an "order request" message containing User ID, Product ID, etc., to a Message Queue (Kafka/RabbitMQ).
  2. Independent Order Service Consumes Messages:
    • Create a separate Order Service that subscribes to the order request messages in the queue.
    • The Order Service is responsible for pulling messages from the queue and asynchronously creating order records in MySQL.
  3. Ensure Reliability and Compensation:
    • Message Queue Persistence: Ensure messages are not lost.
    • Consumer Idempotency: The Order Service must guarantee that even if the same message is consumed multiple times, only one order is created (e.g., using a unique request ID from the message, or UserID+ProductID, as the idempotency key); a minimal producer/consumer sketch follows this list.
    • Failure Handling & Compensation: If the Order Service fails to create the order after consuming the message (due to DB issues, etc.), there needs to be a mechanism to record the failure and retry. If order creation ultimately fails, theoretically, stock should be replenished (INCR stock in Redis). However, this is complex in flash sales, potentially introducing new issues. Often, the priority is ensuring high availability and retry success for the Order Service, or accepting minor stock loss (from a business perspective).
  4. (Optional) Distributed Lock: For scenarios like "preventing duplicate purchases by the same user" (the SADD in the Lua script above is only a preliminary measure), if a stronger locking mechanism is needed, consider introducing a distributed lock (e.g., based on Redis or ZooKeeper). Use distributed locks in the core flash sale path with extreme caution, though, as they can easily become a new performance bottleneck.
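
A minimal sketch of the asynchronous order flow (assuming Spring Kafka with String serialization and a unique index on orders.request_id; the topic, table, and column names are assumptions, and the producer and consumer would normally live in separate services):

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.core.KafkaTemplate;

public class AsyncOrderFlow {
    private final KafkaTemplate<String, String> kafka;
    private final JdbcTemplate jdbcTemplate;

    public AsyncOrderFlow(KafkaTemplate<String, String> kafka, JdbcTemplate jdbcTemplate) {
        this.kafka = kafka;
        this.jdbcTemplate = jdbcTemplate;
    }

    // Producer side (web server): called only after the Redis deduction succeeded.
    public void publishOrderRequest(long userId, long productId) {
        String requestId = userId + ":" + productId; // doubles as the idempotency key
        kafka.send("seckill-orders", requestId, requestId);
    }

    // Consumer side (Order Service): idempotent order creation.
    @KafkaListener(topics = "seckill-orders", groupId = "order-service")
    public void onOrderRequest(String requestId) {
        String[] parts = requestId.split(":");
        // The unique index on request_id turns re-delivered messages into no-ops,
        // so consuming the same message twice still creates only one order.
        jdbcTemplate.update(
                "INSERT IGNORE INTO orders (request_id, user_id, product_id) VALUES (?, ?, ?)",
                requestId, Long.parseLong(parts[0]), Long.parseLong(parts[1]));
    }
}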

Architecture Adjustment:

graph TD
    subgraph "Purchase Flow (Web Server)"
        A[Request Arrives] --> B{Rate Limit/Token Check};
        B -- Pass --> C(Redis Atomic Stock Deduct Lua);
        C -- Success --> D("Send Order Msg to Kafka");
        C -- Fail/Sold Out --> E(Return Purchase Failed);
        D --> F(Return Queuing/Eligibility Secured);
    end
    subgraph "Order Flow (Order Service)"
        G["Consume Kafka Order Msg"] --> H{Idempotency Check};
        H -- New Request --> I("Create MySQL Order");
        I -- Success --> K(Update Order Status/Notify User);
        I -- Failure --> J("Log Failure/Delayed Retry");
        H -- Duplicate Request --> L(Ignore or Return Success);
    end
    subgraph "(Optional) Compensation Flow"
        J --> M("If ultimately fails, consider stock replenish");
    end

Phase 4: Order Volume Surges → Database Also Needs Sharding

Storage Bottleneck for Massive Orders

  • As flash sales become routine and business grows, the orders table data volume can exceed tens or even hundreds of millions. Query performance for a single table (especially querying historical orders by User ID) slows down significantly, affecting user experience and backend management.

❓ Architect's Thinking Moment: How to horizontally scale order data?

(What's the best sharding dimension? User ID? Order ID? Time? What about historical data?)

✅ Evolution Direction: Shard by User Dimension + Archive Historical Data

  1. Shard Database/Table by User ID:
    • For data strongly associated with users, like orders, sharding by User ID hash is usually a suitable strategy.
    • Example: hash(user_id) % 16 routes to 16 databases, and within each database a further hash or modulo on user_id splits the data into 32 or 64 tables. This keeps a single user's order data in one database and a few tables, making queries efficient (see the routing sketch after this list).
    • Requires database middleware like ShardingSphere to manage sharding rules.
  2. Archive Historical Order Data:
    • Flash sale orders are typically queried less frequently over time. Historical orders older than a certain period (e.g., 3 or 6 months) can be migrated from the online MySQL cluster to lower-cost storage.
    • Option 1: Migrate to Elasticsearch to provide search capability for historical orders.
    • Option 2: Migrate to Object Storage (S3/OSS) or a cheaper Data Warehouse (like Hive/ClickHouse) for offline analysis or specific queries.
  3. Consider CQRS Pattern:
    • If read/write loads differ significantly, or historical order query logic is complex, consider the CQRS (Command Query Responsibility Segregation) pattern. Writes still go to the sharded MySQL, but reads (especially complex queries) can go through Elasticsearch or other specialized read-optimized databases.
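
A minimal sketch of the routing rule from step 1 (in practice a middleware such as ShardingSphere evaluates an equivalent expression from its configuration; the database/table naming scheme here is an assumption):

public final class OrderShardRouter {
    private static final int DB_COUNT = 16;    // order_db_00 .. order_db_15
    private static final int TABLE_COUNT = 32; // orders_00 .. orders_31 in each database

    // All orders of one user hash to the same database and table,
    // so "query my orders" touches a single shard.
    public static String route(long userId) {
        int hash = Long.hashCode(userId) & Integer.MAX_VALUE; // non-negative hash
        int db = hash % DB_COUNT;
        int table = (hash / DB_COUNT) % TABLE_COUNT;
        return String.format("order_db_%02d.orders_%02d", db, table);
    }
}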

Architecture Adjustment (Data Storage Layer):

Online Order Storage: [MySQL Sharded Cluster (Managed by ShardingSphere)]
Historical Order Query: [Elasticsearch Cluster]
Historical Order Archive: [Object Storage (S3/OSS) / Data Warehouse]

Phase 5: Handling Peak Promotion Traffic → Elastic Scaling & Circuit Breaking/Degradation

Challenge of Traffic Peaks and Troughs

  • Flash sale traffic during major promotional events (like "Double Eleven") can be tens or hundreds of times higher than usual, while traffic is relatively stable during non-promotion periods.
  • How to dynamically adjust server resources to cope with drastic traffic fluctuations? How to ensure core services don't crash under extreme pressure?

❓ Architect's Thinking Moment: How to make the system elastic to handle big promotions gracefully?

(Is just adding machines enough? Which services need auto-scaling? What if the system can't handle the load?)

✅ Evolution Direction: Embrace Cloud Native for Auto-Scaling and Intelligent Fault Tolerance

  1. Containerization & Kubernetes Auto-Scaling (HPA):
    • Containerize (Docker) stateless applications (like Web servers, API gateways, Order Service).
    • Deploy to a Kubernetes (K8s) cluster.
    • Configure HPA (Horizontal Pod Autoscaler) to allow K8s to automatically increase or decrease the number of service instances (Pods) based on CPU utilization, memory usage, or custom metrics (like QPS, message queue backlog), achieving elastic scaling.
  2. Service Degradation Plan:
    • Prepare degradation plans in advance to handle unexpected traffic peaks.
    • Example: During peak promotion times, temporarily disable some non-core features like product review display, coupon recommendations, etc., preserving valuable system resources for the core transaction path.
  3. Circuit Breaking Mechanism:
    • Calls between services must have circuit breakers configured (using Hystrix, Sentinel, or Istio's capabilities).
    • When a downstream service (e.g., the inventory or user service) fails or responds too slowly, the circuit breaker fails fast, keeps requests from piling up on the struggling service, and executes predefined fallback logic (e.g., returning "Service busy, please try again later"), preventing cascading failures (the avalanche effect). A minimal fallback sketch follows this list.
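
A minimal fallback sketch, using Resilience4j as one concrete circuit-breaker library (the text mentions Hystrix/Sentinel/Istio, which provide equivalent mechanisms; the inventory client and the thresholds are assumptions):

import java.time.Duration;
import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;

public class InventoryFacade {
    private final CircuitBreaker breaker;
    private final InventoryClient inventoryClient; // hypothetical downstream client

    public InventoryFacade(InventoryClient inventoryClient) {
        CircuitBreakerConfig config = CircuitBreakerConfig.custom()
                .failureRateThreshold(50)                        // open when >50% of calls fail
                .slowCallDurationThreshold(Duration.ofMillis(500))
                .waitDurationInOpenState(Duration.ofSeconds(10)) // fail fast, then probe again
                .build();
        this.breaker = CircuitBreaker.of("inventory-service", config);
        this.inventoryClient = inventoryClient;
    }

    public int queryStock(long productId) {
        try {
            // When the breaker is open, this throws immediately without calling downstream.
            return breaker.executeSupplier(() -> inventoryClient.queryStock(productId));
        } catch (Exception e) {
            return -1; // fallback: treat as "service busy" and let the caller degrade gracefully
        }
    }

    public interface InventoryClient {
        int queryStock(long productId);
    }
}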

Architecture Adjustment (Deployment & Fault Tolerance Layer):

[User] → [CDN/WAF] → [K8s Ingress (Gateway Layer)] → [Auto-Scaling Core Flash Sale Service Pods (Web/Order...)]
                                                 ↑ (HPA adjusts Pod count based on load)
                                                 │ (Inter-service calls protected by circuit breakers)
                                                 └─→ [Other Dependent Services (Potentially degraded)]

Summary: The "Forged Through Fire" Path of Flash Sale System Architecture

| Phase | Core Challenge | Key Solution | Representative Tech/Pattern |
| --- | --- | --- | --- |
| 0. Monolith | Cannot handle concurrency | (None) | Single DB transaction, lock contention |
| 1. Traffic | Instant high QPS kills the DB | Static/dynamic separation + atomic deduction in cache | CDN, Redis (Lua/DECR) |
| 2. Anti-Fraud | Malicious scalper/bot requests | Multi-dimensional rate limiting + CAPTCHA + token | Nginx/Redis rate limit, CAPTCHA, encrypted token |
| 3. Consistency | Stock/order inconsistency | Async ordering + MQ + compensation | Kafka/RabbitMQ, idempotent consumer |
| 4. Data Scale | Order table capacity/performance | Shard by user + archive history | ShardingSphere, Elasticsearch/object storage, CQRS |
| 5. Elasticity | Traffic fluctuation/avalanche | Containers + K8s HPA + circuit breaking/degradation | Docker, K8s HPA, Sentinel/Hystrix/Istio |

Course Design Highlights and Reflections

  1. Real and Extreme Scenario: The flash sale scenario pushes challenges like high concurrency, consistency, and availability to the extreme, forcing us to consider various optimization techniques.
  2. Comprehensive Technology Application: Caching, message queues, distributed locks (use cautiously), database sharding, rate limiting, circuit breaking, degradation, elastic scaling are vividly demonstrated in this scenario.
  3. Synchronous vs. Asynchronous Trade-off: The evolution from initial synchronous transaction processing to introducing caching, and finally to asynchronous order placement, showcases the process of balancing performance and consistency.
  4. High Hands-on Value:
    • Try simulating high-concurrency traffic with tools like JMeter or k6 and observe how the system behaves under the different architectures.
    • Try implementing the Redis Lua script for atomic stock deduction.
    • Try using tools like Arthas to diagnose performance bottlenecks of Java applications under high concurrency in production.

Understanding the architectural evolution of a flash sale system not only helps master techniques for handling high concurrency but also provides deep insights into the core architectural thinking of step-by-step progression, constant trade-offs, and continuous optimization.