Course 2: Social Network (Twitter/Weibo-like) Architecture Evolution Case Study

Goal: Social networks are among the crown jewels of internet applications, and they involve many interesting architectural challenges. Using a Twitter/Weibo-like system as an example, this course walks through its architectural evolution step by step, helping you master solutions to classic problems such as feed stream design under highly concurrent reads and writes, storage and querying of the relationship graph, and real-time interactive push.


Phase 0: Initial System (Simple Weibo, Just Enough)

System Description

  • Core features are very basic:
    • User registration, login
    • Publish short text (e.g., limit 140 chars)
    • Follow other users
    • View posts from people you follow (Timeline or "feed")
  • Tech Stack: Still the familiar simple combination:
    • Frontend: React/Vue
    • Backend: Node.js or Spring Boot (starting in monolithic mode)
    • Database: MySQL (users, posts, follows tables - a relational database handles everything)

Current Architecture Diagram

[User App/Web] → [Web Server (Monolith)] → [MySQL (users, posts, follows tables)]
Characteristics at this point:
  • Direct functionality, simple implementation: a user's Timeline is generated by directly JOINing the posts and follows tables.
  • No cache, no pagination, everything is "primitive". This is perfectly fine when data volume is small.
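
To make "a relational database handles everything" concrete, below is a minimal sketch of the Phase 0 Timeline query. Python's built-in sqlite3 stands in for MySQL so the snippet is self-contained and runnable; the schema and sample data are illustrative assumptions.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts   (id INTEGER PRIMARY KEY, user_id INTEGER, content TEXT, created_at TEXT);
CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.execute("INSERT INTO follows VALUES (1, 2)")                       # alice follows bob
conn.execute("INSERT INTO posts VALUES (1, 2, 'hello feed', '2024-01-01T00:00:00')")

# Phase 0 Timeline: JOIN posts against follows on every request.
timeline = conn.execute("""
    SELECT p.id, p.user_id, p.content, p.created_at
    FROM posts p
    JOIN follows f ON f.followee_id = p.user_id
    WHERE f.follower_id = ?
    ORDER BY p.created_at DESC
    LIMIT 20
""", (1,)).fetchall()
print(timeline)  # fine at small scale; this exact query becomes the Phase 1 bottleneck
```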


Phase 1: More Users, Can't Load Timeline → Core Choice in Feed Stream Design

Challenge Emerges

  • As user and post volume increases, the core Timeline query (SELECT posts FROM posts JOIN follows WHERE follower_id=X ORDER BY created_at DESC LIMIT N) becomes extremely slow, putting immense pressure on the database.
  • Worse, when a "VIP" (user with many followers) posts, numerous followers refresh their Timeline simultaneously, potentially crashing the database instantly.

❓ Architect's Thinking Moment: Feed stream performance bottleneck, push or pull?

(This is the most classic problem in social feed design! Direct DB query is definitely out. Should it be handled when posting, or aggregated when viewing?)

✅ Evolution Direction: Push-Pull Hybrid, Cache is Key

There are two main approaches to solving feed stream performance issues: write diffusion (push) and read diffusion (pull).

  1. Write Diffusion (Push Model - Active Push):
    • Method: When a user posts, the system actively pushes the post ID to the Timeline cache of all their followers (typically using Redis Sorted Set, sorted by timestamp). When users read their Timeline, they just get the post ID list from their cache and then query the post content.
    • Pros: Reading Timeline is extremely fast because it reads directly from the cache.
    • Cons: For "VIPs" with a huge number of followers, one post can trigger millions or even tens of millions of cache write operations (write storm), putting massive pressure on Redis and the push system.
  2. Read Diffusion (Pull Model - Passive Pull):
    • Method: When a user accesses their Timeline, the system queries the latest posts of all the users they follow in real-time, then aggregates and sorts them before returning.
    • Pros: Posting operation is very lightweight. Especially suitable for "regular users" with few followers.
    • Cons: Reading Timeline puts heavy pressure on the system: the more people a user follows, the slower the aggregation, and hot VIP accounts are read by huge numbers of followers at the same time, which can degrade those followers' experience.
  3. Industry Common Practice: Hybrid Model (sketched in code after this list):
    • Combines the advantages of push and pull: Use write diffusion for regular users (followers < threshold, e.g., 10k) to ensure their followers see updates quickly.
    • Use read diffusion for VIP users (followers > threshold). When followers read the Timeline, they need to merge posts pulled from VIPs with regular user posts from their cache. Often, popular posts from VIPs are additionally cached.
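
As noted in item 3, here is a minimal sketch of the hybrid model using redis-py. The follower threshold, key names, the inbox size cap, and the "outbox" Sorted Set that VIP posts are pulled from are illustrative assumptions; post content itself would still live in MySQL or a dedicated store.

```python
import time
import redis  # pip install redis

r = redis.Redis()                  # assumes a reachable Redis instance
VIP_FOLLOWER_THRESHOLD = 10_000    # threshold from the text; tune per product

def publish_post(author_id: int, post_id: int, follower_ids: list[int]) -> None:
    """Write diffusion (push) for regular users; VIP posts are only written to the
    author's own outbox and pulled at read time."""
    ts = time.time()
    r.zadd(f"user:{author_id}:posts", {post_id: ts})           # author outbox
    if len(follower_ids) < VIP_FOLLOWER_THRESHOLD:             # regular user: fan out
        pipe = r.pipeline()
        for fid in follower_ids:
            pipe.zadd(f"timeline:{fid}", {post_id: ts})        # follower inbox (ZSet)
            pipe.zremrangebyrank(f"timeline:{fid}", 0, -1001)  # keep newest ~1000 entries
        pipe.execute()

def read_timeline(user_id: int, vip_followee_ids: list[int], limit: int = 20) -> list[int]:
    """Read diffusion (pull) for VIPs: merge the pushed inbox with posts pulled
    from the outboxes of followed VIPs, newest first."""
    entries = r.zrevrange(f"timeline:{user_id}", 0, limit - 1, withscores=True)
    for vid in vip_followee_ids:
        entries += r.zrevrange(f"user:{vid}:posts", 0, limit - 1, withscores=True)
    entries.sort(key=lambda kv: kv[1], reverse=True)
    return [int(post_id) for post_id, _ in entries[:limit]]
```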

Architecture Adjustment (Hybrid Model Example):

graph LR
    subgraph Posting Flow
        A["User posts"] --> B{"Check if VIP?"};
        B -- Yes --> C["Store post to DB"];
        B -- No --> D["Store post to DB"];
        D --> E["Get follower list"];
        E --> F["Push post ID to followers' Timeline cache (Redis ZSet)"];
    end
    subgraph Reading Timeline Flow
        G["User requests Timeline"] --> H{"Get following list (incl. VIPs)"};
        H --> I["Read Timeline cache from Redis (regular user posts)"];
        H --> J["Query recent posts of followed VIPs (DB/Cache)"];
        I & J --> K["Merge and sort post list"];
        K --> L["Query post content (DB/Cache)"];
        L --> M["Return Timeline to user"];
    end
(Simplified representation; the actual architecture is more complex and involves service splitting, etc.)
Core Dependencies: [Web Server] ↔ [Redis (Timeline Cache)] & [MySQL (Posts/Follows)]


Phase 2: Relationship Network Gets More Complex → Graph Database Appears

New Bottleneck

  • As user scale grows, the follows relationship table expands to billions or even tens of billions of records. Using MySQL for complex relationship queries, like "find common follows between me and A," or "calculate second-degree connections for follow recommendations," becomes extremely slow or even impossible.

❓ Architect's Thinking Moment: MySQL struggles with complex relationships, what now?

(Should graph-structured data use a specialized graph database? Neo4j or JanusGraph? Or are there workarounds?)

✅ Evolution Direction: Introduce Graph Database, Optimize Relationship Queries

  1. Migrate Core Relationships to Graph Database:
    • Model users as Nodes and follow relationships as Edges in a graph. Use a professional Graph Database (like Neo4j, JanusGraph) to store and manage this relationship data.
    • Graph databases excel at handling multi-level relationship queries, such as "find people followed by the people A follows, whom A hasn't followed yet" (N-degree friend recommendations), performing far better than recursive JOINs in relational databases (see the Cypher sketch after this list).
  2. Cache Common Relationships:
    • For follow/follower relationships of hot users (like celebrities, KOLs), use Redis Graph or other in-memory graph caching solutions to accelerate queries.
  3. Offline Recommendation Calculation:
    • More complex global relationship analysis (like community detection, large-scale friend recommendations) can be handled by offline computation frameworks. For example, use Spark GraphX to analyze the full relationship graph, pre-calculate recommendation results, and store them in cache or KV stores.
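
As referenced in item 1, here is a sketch of the second-degree ("friend of friend") recommendation query using the official neo4j Python driver. The User label, FOLLOWS relationship type, and connection settings are illustrative assumptions.

```python
from neo4j import GraphDatabase  # pip install neo4j

# Connection details and the (User)-[:FOLLOWS]->(User) model are assumptions.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# People followed by the people I follow, whom I don't follow yet,
# ranked by how many of my follows lead to them.
FOF_RECOMMENDATION = """
MATCH (me:User {id: $uid})-[:FOLLOWS]->(:User)-[:FOLLOWS]->(candidate:User)
WHERE candidate.id <> $uid
  AND NOT (me)-[:FOLLOWS]->(candidate)
RETURN candidate.id AS user_id, count(*) AS mutual_paths
ORDER BY mutual_paths DESC
LIMIT 10
"""

def recommend_follows(uid: int) -> list[dict]:
    with driver.session() as session:
        result = session.run(FOF_RECOMMENDATION, uid=uid)
        return [record.data() for record in result]
```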

Architecture Adjustment:

Data Storage Layer Evolution:
[MySQL] → Stores basic user info, post content, non-relationship-intensive data
[Neo4j / JanusGraph] → Stores follow relationships, friend relationships, graph-structured data
[Redis Graph / KV Store] → Caches hot relationships, pre-calculated recommendation results

Phase 3: Likes/Comments Need Real-time → Message Queue & WebSocket

Need for Real-time Interaction

  • Users expect real-time notifications like "Someone liked your post," "Someone commented on your post," instead of needing to refresh manually.
  • If clients use traditional polling APIs to check for notifications, it puts huge, unnecessary pressure on the server.

❓ Architect's Thinking Moment: How to push messages to online users in real-time?

(Polling is definitely out. WebSocket? SSE? How to handle large message volumes?)

✅ Evolution Direction: Persistent WebSocket Connections + Message Queue Decoupling

  1. Establish a Persistent WebSocket Connection:
    • Clients (App or Web) keep a long-lived WebSocket connection to the server. WebSocket is full-duplex, so the server can actively push messages to the client.
  2. Introduce Message Queue (Kafka) for Decoupling:
    • When events like likes, comments, @ mentions occur, the corresponding services (like Post Service, Comment Service) don't directly call the push logic. Instead, they send notification events to a Kafka message queue.
    • An independent Push Service consumes messages from Kafka, looks up which users should be notified, and checks whether they are online (see the sketch after this list).
  3. Push to Online Users:
    • If the user is online (i.e., has an active WebSocket connection), the Push Service pushes the notification to the user in real-time via the corresponding WebSocket connection.
  4. Offline Message Storage:
    • If the user is offline, the notification (or notification count) can be stored in MongoDB or another NoSQL database suitable for unstructured data. When the user logs in next time, the client actively pulls notifications from the offline period.
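
A simplified sketch of the Push Service referenced in item 2, using kafka-python and pymongo: it consumes notification events from Kafka and either pushes them over a live WebSocket connection or stores them for a later pull. The topic name, event schema, and the in-process connection registry (standing in for a real WebSocket gateway) are assumptions.

```python
import json
from kafka import KafkaConsumer   # pip install kafka-python
from pymongo import MongoClient   # pip install pymongo

# Assumption: user_id -> live WebSocket connection, maintained by the connection layer.
online_connections: dict[int, object] = {}
offline_store = MongoClient("mongodb://localhost:27017")["notifications"]["offline"]

consumer = KafkaConsumer(
    "notification-events",                                   # assumed topic name
    bootstrap_servers="localhost:9092",
    group_id="push-service",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

def deliver(event: dict) -> None:
    """Push to the live connection if the user is online, otherwise persist for later pull."""
    ws = online_connections.get(event["target_user_id"])
    if ws is not None:
        ws.send(json.dumps(event))       # hypothetical send() on the connection object
    else:
        offline_store.insert_one(event)  # client pulls these on next login

for message in consumer:
    deliver(message.value)
```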

Architecture Adjustment:

graph TD
    A["User action (like/comment)"] --> B("Business Service");
    B --> C("Send event to Kafka");
    C --> D["Push Service (consumes Kafka)"];
    D --> E{"Is user online?"};
    E -- Online --> F("Push via WebSocket");
    F --> G["Client receives real-time notification"];
    E -- Offline --> H["Store offline notification to MongoDB"];
    I["User logs in"] --> J("Client pulls offline notifications");
    J --> H;
    G <-.-> K("WebSocket Connection");
    I <-.-> K;

Phase 4: Hot Event Arrives → Rate Limiting, Degradation for Safety

Impact of Sudden Traffic

  • During major social events (like celebrity gossip, breaking news), platform traffic can instantly surge tens or even hundreds of times beyond normal load.
  • Without countermeasures, this can lead to slow system response or even service avalanche.
  • In such extreme situations, core functionalities (like posting, browsing Timeline) need to be prioritized for availability.

❓ Architect's Thinking Moment: How to gracefully handle traffic spikes and prevent system collapse?

(Can't just absorb it, need trade-offs. Rate limiting is necessary, which features can be degraded?)

✅ Evolution Direction: Multi-Layer Rate Limiting + Service Degradation + Elastic Scaling

  1. Entry Layer Rate Limiting:
    • Configure rate limiting policies at the API Gateway layer (e.g., Nginx, Kong). Limit QPS for non-core APIs (like "trending topics," "people nearby"), or even temporarily disable them, and prioritize traffic for core APIs (a minimal limiter sketch follows this list).
  2. Service Degradation Strategy:
    • Identify and differentiate core and non-core functions. When system load is too high, actively disable or simplify non-core functions.
    • Example: Temporarily disable the "personalized recommendation" module and serve all users the same cached popular content; temporarily hide the "real-time online status" display.
  3. Backend Service Circuit Breaking:
    • Calls between services need circuit-breaking mechanisms (using Hystrix, Sentinel, or Istio's capabilities). When a downstream service fails or responds too slowly, fail fast to prevent requests from piling up and causing cascading failures.
  4. Elastic Scaling is Fundamental:
    • For stateless services (like Web servers, API Gateways), deploy them on Kubernetes and configure HPA (Horizontal Pod Autoscaler) to automatically scale the number of instances based on metrics like CPU, memory, or QPS to handle increased traffic.
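
As mentioned in item 1, one common way to enforce per-API QPS limits is a shared counter in Redis (gateways such as Kong can back their rate-limiting counters with Redis). Below is a minimal fixed-window sketch; the key naming and limits are assumptions, and production setups often prefer sliding-window or token-bucket variants.

```python
import time
import redis  # pip install redis

r = redis.Redis()  # a shared Redis reachable by every gateway/service instance

def allow_request(api: str, user_id: int, limit: int = 100, window_s: int = 1) -> bool:
    """Fixed-window counter: at most `limit` calls per user per API per time window."""
    window = int(time.time() // window_s)
    key = f"ratelimit:{api}:{user_id}:{window}"
    count = r.incr(key)              # atomic increment shared across instances
    if count == 1:
        r.expire(key, window_s * 2)  # old windows clean themselves up
    return count <= limit

# Usage sketch: shed or degrade a non-core API when over its (tighter) limit.
if not allow_request("trending-topics", user_id=42, limit=20):
    print("429 Too Many Requests - serve cached/degraded content instead")
```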

Architecture Adjustment (Focus Points):

[User] → [CDN/WAF (Basic Protection)] → [API Gateway (Rate Limiting)] → [Core Service Cluster (K8s HPA)]
                                              ↘ [Non-Core Services (Possible Degradation/Limit)]
                                              ↘ [Static Cache / Degraded Data Source]

Phase 5: Going Global → Global Deployment and Data Synchronization

Business Globalization Needs

  • To serve global users and provide a better access experience, service nodes need to be deployed in multiple geographical regions worldwide.
  • Challenges arise: How do we reduce access latency for overseas users? How do we keep data consistent across regions (e.g., if a user follows someone from within China, is that new relationship immediately visible when accessed from the US)?

❓ Architect's Thinking Moment: Multi-active deployment is the way, but how to balance data consistency and latency?

(Cross-country network latency is a physical limitation. Pursue strong consistency or eventual consistency? Where to put user data?)

✅ Evolution Direction: Multi-Active Data Centers + Global Database/Eventual Consistency + CDN

  1. Multi-Active Data Center Deployment:
    • Set up independent data centers or use cloud providers' multi-Region capabilities in key target regions (like North America, Europe, Asia) to deploy complete services.
  2. Database Globalization Solutions:
    • Option 1: Global Database. Use distributed databases with native support for cross-region reads/writes and low-latency synchronization, like AWS Aurora Global Database, Google Spanner, CockroachDB. Usually provides strong consistency guarantees but comes at a higher cost.
    • Option 2: Eventual Consistency-Based Synchronization. Core user data (like account info) may require strongly consistent sync, but content such as posts and follow relationships can tolerate eventual consistency. Writes happen in the primary database of the user's local Region and are then asynchronously replicated to other Regions via message queues or other sync mechanisms; conflicts must be resolved so the data converges (see the sketch after this list).
  3. CDN Global Acceleration:
    • Static resources like images, videos, user avatars must be distributed globally via CDN to ensure users load from the nearest edge node.
  4. Intelligent Routing:
    • Use GSLB (Global Server Load Balancing) or Intelligent DNS to direct requests to the nearest or healthiest Region based on user's geographic location and network conditions.
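
A sketch of Option 2 from item 2 above: a replicator in another Region consumes change events and applies them with a last-write-wins (LWW) rule on a version timestamp. The topic, event shape, in-memory "local replica", and the LWW policy are all assumptions; a real applier also needs idempotency, ordering, and anti-entropy checks to guarantee convergence.

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

# Stand-in for the local Region's datastore; keyed by "follower:followee".
local_replica: dict[str, dict] = {}

consumer = KafkaConsumer(
    "follows-replication",                # assumed cross-Region replication topic
    bootstrap_servers="localhost:9092",
    group_id="replicator-us-east",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

def apply_event(event: dict) -> None:
    """Last-write-wins: apply a follow/unfollow change only if it is newer than our copy."""
    key = f'{event["follower_id"]}:{event["followee_id"]}'
    current = local_replica.get(key)
    if current is None or current["version"] < event["version"]:
        local_replica[key] = {"state": event["state"], "version": event["version"]}

for message in consumer:
    apply_event(message.value)
```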

Architecture Adjustment (Schematic):

graph TD
    subgraph User Access
        U_AS["Asia User"] --> R_AS("Asia Region GSLB/DNS");
        U_US["North America User"] --> R_US("NA Region GSLB/DNS");
        R_AS --> C_AS("Asia Service Cluster");
        R_US --> C_US("NA Service Cluster");
    end
    subgraph Data Layer
        C_AS --> DB_AS["Asia Datacenter (Primary Write or Partition)"];
        C_US --> DB_US["NA Datacenter (Primary Write or Partition)"];
        DB_AS <-->|"Global DB / Sync Mechanism"| DB_US;
    end
    subgraph Static Resources
        S3["Object Storage Origin"] --> CDN("Global CDN Edge Nodes");
        U_AS --> CDN;
        U_US --> CDN;
    end

Summary: The Evolution Path of Social Network Architecture

Phase | Core Challenge | Key Solution | Representative Tech/Pattern
------|----------------|--------------|----------------------------
0. Simple | Feature Implementation | Monolith + Relational DB | MySQL JOIN
1. Feed Stream | Timeline Performance | Push-Pull Hybrid Feed + Cache | Redis Sorted Set, Write/Read Diffusion Hybrid
2. Relation Graph | Slow Complex Queries | Introduce Graph Database | Neo4j / JanusGraph, Graph Storage
3. Real-time Push | Latency/Overhead | WebSocket + MQ Decoupling | Kafka/RabbitMQ, WebSocket, Offline Storage
4. Traffic Surge | Service Avalanche Risk | Entry Rate Limiting + Degradation + Scaling | Nginx/API Gateway Limits, K8s HPA, Circuit Breaking
5. Global | Latency/Consistency | Multi-Active + CDN + Sync | GSLB, Global DB / Eventual Sync, CDN

Course Design Philosophy and Insights

  1. Problem-Driven, Scenario-Based: Each evolution stage originates from a real business pain point (like a slow Timeline, the VIP effect, or the need for real-time notifications), which stimulates thinking far better than simply listing technologies.
  2. The Art of Trade-offs: Architecture design is full of trade-offs. The course compares push-pull models, different database choices, sync/async communication, guiding learners to understand "no silver bullet," only the best fit for the current scenario.
  3. Technology Interconnection: Shows how technologies like caching, message queues, graph databases, WebSockets are combined to solve complex problems.
  4. Simple to Complex: Follows the natural progression of system development, gradually introducing distributed, asynchronous, and real-time techniques on top of a simple monolith, making the material easier to understand and absorb.

Understanding the architectural evolution of social networks helps us grasp the core design principles and common techniques for building large-scale, high-concurrency, real-time interactive systems.