
Case 1: Monolithic Architecture


title: System Design Introduction: Building and Scaling High-Availability Web Applications - Course 1
description: Learn system design fundamentals, from monolithic applications to database read-write splitting and caching, understanding how to build and progressively scale a high-availability web service.


Evolutionary Learning Plan for System Architecture

Goal: Many complex systems evolve gradually from a simple starting point. This course uses a progressive scenario-driven approach to simulate the architectural challenges a typical web application (blog platform) might encounter during real business growth, guiding you step-by-step to master classic problem-solving approaches and think like an architect.


Phase 0: Initial System (Monolithic Architecture, The Starting Point)

System Description

  • Imagine starting with the simplest blog platform, basic but functional:
    • User registration and login
    • Publish, edit, and delete blogs
    • View blog lists and details
  • Tech Stack: the most common combination:
    • Frontend: HTML + JavaScript (maybe jQuery early on, later perhaps React/Vue)
    • Backend: Python Flask or Java Spring Boot (a classic monolithic application)
    • Database: MySQL (a single instance, sufficient initially)

Current Architecture Diagram

[User Browser] → [Web Server (Monolithic App)] → [MySQL (Single Instance)]
Characteristics at this point:
  • All code lives in one project, simple and direct.
  • The application connects directly to the database for reads and writes, with no middle layer.
  • Scalability? Not considered yet.


Phase 1: Traffic Arrives → Database is the First Bottleneck

Challenge Emerges

  • The platform is gaining traction, but as the user base grows, problems follow: database queries slow down, especially on the high-traffic blog list pages.
  • Server CPU spikes during peak hours, page response time deteriorates from tens of milliseconds to half a second or more.

❓ Architect's Thinking Moment: Bottleneck is the database, what to do?

(What's usually the first reaction for performance optimization? Add cache? Index? Or directly implement read-write splitting?)

✅ Evolution Direction: Cache First, Consider Indexing

  1. Introduce Cache for Rescue: Facing read pressure, introducing memory cache (like Redis) is the most cost-effective method.
    • Cache popular blog lists to significantly reduce database queries.
    • Initially adopt the common Cache-Aside pattern: application reads from cache first, if not found, queries the database, then writes back to the cache. (Cache consistency issues need attention later.)
  2. Database Self-Optimization: Don't forget the basics.
    • Check indexes on the blogs table, ensure fields used for sorting and querying like created_at have appropriate indexes.
  3. Minor Architecture Tweak:
    [User Browser] → [Web Server] → [Redis (Cache)] → [MySQL]
    

Phase 2: User Surge → Limits of a Single Monolithic Web Server

New Bottleneck

  • Traffic continues to rise, peak QPS jumps from hundreds to thousands, the single web server's CPU is maxed out.
  • Users start experiencing frequent access timeouts.

❓ Architect's Thinking Moment: Single point can't handle it, how to scale horizontally?

(Adding machines is certain, but how to make multiple machines work together? How to handle user state?)

✅ Evolution Direction: Load Balancing + Statelessness

  1. Horizontally Scale the Web Layer:
    • Deploy multiple web server instances.
    • Introduce Nginx at the front as a load balancer to distribute requests to multiple backend instances (common strategies include round-robin, least connections).
  2. Stateless Service is Key: For easy scaling, the web server itself should not store user session state.
    • Migrate session data to external shared storage, like Redis. This allows any web server to handle any user's request.
  3. Architecture Evolves To:
    [User Browser] → [Nginx (Load Balancer)] → [Web Server Instance x N] → [Redis (Cache+Session)] → [MySQL]
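The two Nginx distribution strategies named above reduce to very small selection rules. A toy Python sketch, with made-up server names (Nginx implements these natively; this only shows the logic):

```python
import itertools

# Hypothetical backend pool behind the load balancer.
servers = ["web-1", "web-2", "web-3"]

_rotation = itertools.cycle(servers)

def round_robin():
    # Hand requests to each server in turn, in a fixed rotation.
    return next(_rotation)

def least_connections(active):
    # Pick the server currently holding the fewest open connections.
    # `active` maps server name -> current connection count.
    return min(active, key=active.get)
```

Round-robin needs no state about the backends; least connections adapts to uneven request costs, which matters when some pages are much heavier than others. Either strategy only works cleanly because the web layer is stateless, as point 2 explains.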
    

Phase 3: Increased Write Pressure → Database Read-Write Splitting

Write Operations Become the New Focus

  • As user activity increases, blog publishing and editing become more frequent, putting visible write pressure on the MySQL primary; under heavy writes, master-slave replication lag also grows.
  • Users start complaining: "Just published a blog, refreshed several times but still can't see it!"

❓ Architect's Thinking Moment: Read performance solved, what about the write bottleneck?

(Is master-slave replication the standard answer? Is sharding too early? Is there an intermediate solution?)

✅ Evolution Direction: Implement Master-Slave Replication and Read-Write Splitting

  1. Enable MySQL Master-Slave Replication:
    • Configure one primary (Master) to handle all write operations.
    • Configure one or more replicas (Slaves) to handle read operations, distributing the read load.
    • Introduce database middleware (like ShardingSphere-JDBC/Proxy or ProxySQL) to automatically route read/write requests, transparently to the application layer.
  2. Addressing Master-Slave Lag:
    • After splitting reads/writes, master-slave lag needs attention. For read requests requiring high consistency (like viewing immediately after publishing), forcing routing to the master might be necessary.
    • Cache update strategies also need adjustment, e.g., invalidate cache after write instead of updating, reducing the inconsistency window.
  3. Architecture Adjusts Again:
    [User Browser] → [Nginx] → [Web Server x N] → [Redis]
                                               ↘ [DB Middleware/Proxy] → [MySQL Master] → [MySQL Slave x N]
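The routing decision the DB middleware makes can be sketched as follows. This is a simplification: connection handles are plain strings, and a real proxy would manage connection pools and parse SQL properly. The `force_master` flag models the "read immediately after publishing" case from point 2.

```python
# Minimal sketch of read/write routing as a DB middleware performs it.
class ReadWriteRouter:
    def __init__(self, master, replicas):
        self.master = master        # handles writes (and forced reads)
        self.replicas = replicas    # handle ordinary reads
        self._i = 0

    def route(self, sql, force_master=False):
        # Writes, and reads that need read-your-writes consistency,
        # always go to the master.
        is_write = sql.lstrip().split()[0].upper() in {"INSERT", "UPDATE", "DELETE"}
        if is_write or force_master:
            return self.master
        # Spread ordinary reads across replicas round-robin.
        conn = self.replicas[self._i % len(self.replicas)]
        self._i += 1
        return conn
```

This is also where the "just published but can't see it" complaint gets fixed: the request that renders the author's own fresh post is routed with `force_master=True`, sidestepping replication lag.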
    

Phase 4: Increasing Business Complexity → The Choice of Splitting the Monolith: Microservices

The "Growing Pains" of a Monolith

  • Business continues to grow, adding new modules like "comment system," "user recommendations," etc. The monolithic application's codebase becomes increasingly large and difficult to maintain.
  • Development and deployment of different features interfere with each other, reducing team collaboration efficiency and lengthening release cycles.

❓ Architect's Thinking Moment: Time to break up the monolith, but how to do it gracefully?

(Microservices are the trend, but how to define service boundaries? How should services communicate? How to introduce asynchronicity?)

✅ Evolution Direction: Embrace Microservices, Introduce Message Queues and API Gateway

  1. Split Services by Business Capability:
    • Decompose the monolith into independent services like Blog Service, User Service, Comment Service, etc. Each service can be developed, deployed, and scaled independently.
    • Service Communication: Initially, REST API can be used. For high performance or internal calls, consider RPC (gRPC/Dubbo).
  2. Introduce Message Queue for Asynchronous Decoupling:
    • For non-core, asynchronously processable flows (like "notify followers after blog publication," "trigger recommendation calculation"), introduce Kafka or RabbitMQ. Producers send messages, consumers process them asynchronously, improving system resilience and response speed.
  3. Build an API Gateway:
    • All external requests (from clients) enter through a unified API Gateway (like Kong, Spring Cloud Gateway, Nginx+Lua).
    • The gateway handles: routing, authentication/authorization, rate limiting/circuit breaking, logging/monitoring, and other common functions, simplifying backend services.
  4. Emerging Microservice Architecture Outline:
    graph TD
        UserBrowser --> APIGateway[API Gateway]
        APIGateway --> UserService[User Service] --> RedisCache[Redis]
        APIGateway --> BlogService[Blog Service] --> MySQLDB[MySQL]
        APIGateway --> CommentService[Comment Service] --> MySQLDB
        BlogService --> KafkaMQ[Kafka]
        CommentService --> KafkaMQ
        KafkaMQ --> RecommendationService[Recommendation Calc Service] --> Others[...]
        KafkaMQ --> NotificationService[Notification Service] --> Others[...]
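The asynchronous hand-off in point 2 works roughly like this sketch, where a `queue.Queue` stands in for Kafka and the follower notification is a hypothetical handler. The key property to notice: the publish path finishes without waiting for the notification work.

```python
import queue
import threading

# A queue.Queue stands in for Kafka; the event shape is made up.
events = queue.Queue()

def publish_blog(blog_id, log):
    # The synchronous path only persists the blog and emits an event.
    log.append(f"saved {blog_id}")
    events.put({"type": "blog_published", "blog_id": blog_id})

def notification_worker(log):
    # A consumer drains events off the critical path.
    while True:
        event = events.get()
        if event is None:           # shutdown sentinel
            break
        log.append(f"notified followers of {event['blog_id']}")

log = []
worker = threading.Thread(target=notification_worker, args=(log,))
worker.start()
publish_blog(42, log)
events.put(None)                    # tell the worker to stop
worker.join()
```

With a real broker the producer and consumer live in different services, and the broker additionally buffers bursts and retries failed deliveries, which is where the resilience claim comes from.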

Phase 5: Data Volume Explosion → Database Sharding and Search Engine

Challenge of Massive Data

  • Blog content and user data continue explosive growth. The core blogs table reaches TB levels. Even with read-write splitting, single-database or single-table query performance drops sharply.
  • User demand for content search increases; simple LIKE queries are no longer sufficient.

❓ Architect's Thinking Moment: Database capacity and query performance are critical again, what now?

(Is sharding inevitable? By what dimension? What technology for full-text search?)

✅ Evolution Direction: Data Sharding + Introduce Professional Search Engine

  1. Implement Database Sharding:
    • Horizontally split the largest tables (like blogs, users). Common strategies include sharding by User ID or Content ID hash.
    • Introduce database sharding middleware (like ShardingSphere, Vitess, MyCat) to manage sharding routing rules, shielding low-level details from the application layer.
  2. Introduce Elasticsearch for Full-Text Search:
    • Synchronize searchable blog content (title, body, etc.) into an Elasticsearch cluster.
    • Utilize ES's powerful inverted index and tokenization capabilities for efficient, accurate full-text search. Sync mechanisms could be CDC (Canal/Debezium) or dual writes.
  3. Consider Hot/Cold Data Separation:
    • For infrequently accessed old blog data, consider archiving from primary storage (MySQL/ES) to lower-cost object storage (like AWS S3, Alibaba Cloud OSS), reducing online storage pressure.
  4. Evolution of Data Storage Layer:
    Primary Online Storage: [MySQL (Sharded Cluster)] + [Elasticsearch (Search Cluster)]
    Archive Storage: [Object Storage (S3/OSS)]
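A minimal sketch of the hash-based sharding rule from point 1, assuming a fixed number of shards and made-up database names (middleware like ShardingSphere lets you declare the equivalent rule in configuration instead of code):

```python
# Hypothetical sharding rule: route a user's rows to one of
# NUM_SHARDS physical databases by user ID.
NUM_SHARDS = 4

def shard_for_user(user_id: int) -> str:
    # Simple modulo sharding: every row for a given user lands on
    # the same shard, so single-user queries stay on one database.
    return f"db_{user_id % NUM_SHARDS}"
```

The trade-off to keep in mind: plain modulo sharding makes changing `NUM_SHARDS` painful, since most keys remap; schemes like consistent hashing or range-based sharding exist largely to soften that resharding cost.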
    

Phase 6: Journey to Globalization → Challenge of Multi-Active Architecture

New Requirements for Business Expansion Abroad

  • The platform needs to serve global users, but access latency from overseas users to the original single-region data center is too high, hurting their experience.
  • Higher availability requirements: service must not be interrupted even if a single data center fails.

❓ Architect's Thinking Moment: How to achieve low global latency and cross-region disaster recovery?

(Multi-region deployment is necessary, but how to sync data? How to route user requests?)

✅ Evolution Direction: Build Multi-Active Data Centers + CDN Acceleration

  1. Multi-Region Deployment (Multi-Active Architecture):
    • Deploy independent, fully functional service clusters in different geographic regions (e.g., US East, Europe, Singapore).
    • The database layer needs solutions supporting cross-region replication and consistency, such as Global Databases (AWS Aurora Global Database, Google Spanner, CockroachDB) or self-built sync solutions (possibly sacrificing strong consistency).
  2. CDN Acceleration for Static Resources:
    • Deploy static assets like images, CSS, JavaScript to a CDN (Content Delivery Network). Utilize its global edge nodes to provide users with nearby access, significantly reducing latency.
  3. Global Traffic Management:
    • Use Intelligent DNS or Global Server Load Balancing (GSLB) services to route user requests to the nearest or healthiest regional data center based on user location, network latency, or service health.
  4. Final Architecture Form (Schematic):
    graph LR
        UserNA[North America User] --> GSLB --> DCNorthAmerica[NA Datacenter Cluster]
        UserAsia[Asia User] --> GSLB --> DCAsia[Asia Datacenter Cluster]
        UserEU[Europe User] --> GSLB --> DCEurope[Europe Datacenter Cluster]
        DCNorthAmerica <--> GlobalDB[(Global Database / Sync)]
        DCAsia <--> GlobalDB
        DCEurope <--> GlobalDB
        UserNA --> CDN[(CDN Edge)]
        UserAsia --> CDN
        UserEU --> CDN
        CDN --> ObjectStorage[Object Storage Origin]
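The GSLB decision in point 3 boils down to "nearest healthy datacenter." A toy sketch with made-up region and datacenter names; real GSLB services also weigh measured latency and capacity, not just a static mapping:

```python
# Hypothetical region -> nearest-datacenter mapping and failover order.
DATACENTERS = {"na": "us-east", "eu": "eu-west", "asia": "ap-southeast"}
FALLBACK_ORDER = ["us-east", "eu-west", "ap-southeast"]

def route_user(region, healthy):
    # Prefer the datacenter nearest the user's region, if it is healthy.
    primary = DATACENTERS.get(region)
    if primary in healthy:
        return primary
    # Otherwise fail over to the first healthy datacenter.
    for dc in FALLBACK_ORDER:
        if dc in healthy:
            return dc
    raise RuntimeError("no healthy datacenter")
```

The failover branch is what delivers the availability requirement above: when a whole region goes dark, its users are quietly routed to a surviving one.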

Summary: A Typical Architectural Evolution Path

| Phase | Core Problem | Key Solution | Representative Tech/Pattern |
| --- | --- | --- | --- |
| 0. Monolith | Simple business | Monolithic app + single DB | Flask/Spring Boot, MySQL |
| 1. Cache | Read performance bottleneck | Introduce cache + DB index optimization | Redis, Cache-Aside |
| 2. Horizontal Scale | Web server pressure | Load balancer + stateless services | Nginx, Redis (Session) |
| 3. R/W Split | DB write bottleneck | Master-slave replication + middleware | MySQL Replication, ProxySQL/ShardingSphere |
| 4. Microservices | Monolith complexity | Service split + async messaging | RPC/REST, Kafka/RabbitMQ, API Gateway |
| 5. Data Scale | Huge data / search needs | Sharding + search engine | ShardingSphere/Vitess, Elasticsearch |
| 6. Global | Low latency / high availability | Multi-active + CDN | Global DB, GSLB, CDN |

Learning Method Suggestions

  1. Hands-on Practice is Crucial: For each phase, try building a minimal demo using cloud services (AWS/Azure/Alibaba Cloud free tiers or small instances) or local Docker/K8s to experience configuration and effects firsthand.
  2. In-depth Comparative Thinking: Actively compare pros and cons of similar technologies, e.g., "What scenarios are Kafka vs. RabbitMQ suitable for?", "What's the difference between Redis Sentinel and Cluster modes?".
  3. Simulate Failure Scenarios: If possible, try using chaos engineering tools (like Chaos Mesh) or manual methods to simulate node failures, network latency, etc., and observe the system's reaction and recovery capabilities.

By simulating this real path of business growth and technological evolution, we can better understand the trade-offs behind various architectural design decisions, which is the core value of an architect.