Case 7: Recommendation System
title: Recommendation System Architecture Design: From Collaborative Filtering to Deep Learning - System Design Course 7
description: Analyze the core principles and architectural evolution of recommendation systems, covering recall, ranking, real-time recommendations, feature engineering, and the application of deep learning in recommendations.
Course 7: Recommendation System Architecture Evolution Case Study
Goal: Recommendation systems are the core engine for many internet products, and their architectural evolution profoundly reflects the close integration of algorithms and engineering. This course will guide you through the gradual optimization of a typical recommendation system, enabling you to master core architectural capabilities such as recall strategies, ranking models, real-time personalization, cold start handling, and collaborative design between engineering and algorithms.
Phase 0: The Primitive Era (Static Recommendation for Everyone)
System Description
- Scenario: A startup e-commerce website needs to display some products to users on the homepage.
- Functionality: Very simple. The homepage statically displays the Top 10 best-selling products, and all users see exactly the same thing.
- Tech Stack:
- Backend: Python/Java (Querying the database directly within business logic)
- Data: MySQL (a `products` table containing a `sales_count` field)
Current Architecture Diagram
```mermaid
graph LR
    Client --> API;
    API -- "SELECT * FROM products ORDER BY sales_count DESC LIMIT 10" --> DB[("MySQL")];
```
Pain Points at this moment:
- Low Conversion Rate: User interests vary greatly; a one-size-fits-all recommendation is ineffective.
- Monotonous Experience: Users see almost the same thing every time, lacking freshness.
- Resource Waste: Hot items get excessive exposure, while a large number of long-tail items (which might meet the needs of some users) are buried.
Phase 1: Early Personalization → Attempting Collaborative Filtering (CF)
Challenge Emerges
- Business demands require improving user conversion rates and experience, necessitating personalized recommendations based on users' historical behavior (like browsing and purchase history).
❓ Architect's Thinking Moment: How to take the first step towards personalization? What's the most classic method?
(Directly correlate user behavior? Based on user similarity or item similarity? What about computation load? Real-time requirements?)
✅ Evolution Direction: Introduce Item-Based Collaborative Filtering (Item-CF)
- Core Idea (Item-CF): "users who like item A often also like item B", so recommend items similar to those the user has already interacted with.
- Offline Similarity Calculation: Use MapReduce or Spark jobs to analyze user behavior logs (e.g., the user-item interaction matrix) and calculate similarities between items (e.g., cosine similarity, Jaccard similarity); a toy numeric sketch follows the architecture diagram below.
- Store Similarity: Store the calculated item similarity matrix in a high-speed cache (like Redis) for fast online querying, e.g., stored as `item_A -> {item_B: 0.8, item_C: 0.6, ...}`.
- Online Recommendation Logic:
- When a user visits, retrieve their list of recently interacted items.
- Query the list of similar items for these items from Redis.
- Aggregate and rank the similar items (e.g., weighted by similarity), remove items the user has already interacted with, and obtain the final recommendation list.
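A minimal sketch of this online aggregation step in Python, assuming the offline job has written each item's neighbors into a Redis hash under a hypothetical `item_sim:<item_id>` key:

```python
import redis

# Hypothetical key scheme: item_sim:<item_id> -> hash of {similar_item_id: similarity}
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def recommend(recent_items, seen_items, top_n=10):
    """Aggregate similar items for the user's recently interacted items."""
    scores = {}
    for item_id in recent_items:
        sims = r.hgetall(f"item_sim:{item_id}")  # {similar_id: similarity}
        for sim_id, sim in sims.items():
            if sim_id in seen_items:
                continue  # skip items the user has already interacted with
            # Weighted aggregation: accumulate similarity across all trigger items
            scores[sim_id] = scores.get(sim_id, 0.0) + float(sim)
    # Rank candidates by accumulated similarity score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```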
- Architecture Adjustment: Introduce offline calculation and online cache querying.
```mermaid
graph TD
    subgraph "Offline Calculation (Batch Processing)"
        Logs("User Behavior Logs") --> Hadoop("Hadoop/Spark Item-CF Calculation");
        Hadoop --> Redis("Redis Storing Item Similarity");
    end
    subgraph "Online Recommendation (Real-time Serving)"
        Client --> API("Recommendation API");
        API -- "Get user recent behavior" --> UserDB[("User Behavior DB")];
        API -- "Query similar items" --> Redis;
        API -- "Aggregate & Rank" --> Result("Recommendation Result");
        Result --> Client;
    end
```
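The toy numeric sketch referenced above: item-item cosine similarity computed from a small user-item interaction matrix. A production system would run the equivalent logic as a Spark/MapReduce job over the full behavior logs.

```python
import numpy as np

# Rows = users, columns = items; 1 means the user interacted with the item.
interactions = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
])

# Item-item cosine similarity: normalize each item (column) vector, then take dot products.
item_vectors = interactions.T.astype(float)
norms = np.linalg.norm(item_vectors, axis=1, keepdims=True)
norms[norms == 0] = 1.0
normalized = item_vectors / norms
similarity = normalized @ normalized.T
np.fill_diagonal(similarity, 0.0)  # an item is not a candidate for itself

# Produce the Redis-style payload shown above, e.g. item 0 -> {1: 0.82, 2: 0.41}
top_k = {i: dict(sorted(enumerate(row), key=lambda x: -x[1])[:2])
         for i, row in enumerate(similarity)}
print(top_k)
```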
New Problems Introduced: Item-CF relies on user history, performing poorly for new users or users with sparse behavior (cold start problem); recommendations might be limited to areas the user already knows, lacking diversity.
Phase 2: Single Strategy Isn't Enough → Multi-Recall Channels and Fusion
New Challenge: Insufficient Coverage and Diversity
- A single Item-CF strategy cannot cover all users (especially new ones).
- Recommendations tend to focus on a few popular or familiar areas, lacking surprise and diversity.
- Need to incorporate more information sources and strategies to enrich the results.
❓ Architect's Thinking Moment: How to expand the recommendation scope and introduce more possibilities?
(Besides Item-CF, what other information can be used? Global hotspots? User profiles? How to combine different recommendation sources?)
✅ Evolution Direction: Build a Multi-Recall Engine + Result Fusion
- Expand Recall Channels: Run multiple recall strategies in parallel, each responsible for filtering candidate items from different perspectives.
- Channel 1: Item-CF (Similar items based on user history, ensuring personalization depth)
- Channel 2: Global Hot (Best-selling/popular items based on global statistics, ensuring basic coverage, addressing cold start)
- Channel 3: User Profile Matching (Based on user tags like age, gender, location, interest tags, etc., matching items preferred by corresponding demographics)
- Channel 4: (Optional) LBS Recall (Recommend nearby relevant items or services based on user location)
- Channel 5: (Optional) New Item Recall (Recommend recently added new items, increasing diversity)
- Recall Result Fusion and Truncation:
- Each recall channel independently retrieves a batch of candidate items (e.g., each channel recalls Top N, N adjustable based on channel characteristics).
- Merge and deduplicate items recalled from all channels.
- Can use simple merging or weighted fusion based on channel confidence or business goals.
- Finally, truncate to keep a certain number (e.g., 500-1000) of candidate items for the subsequent ranking stage.
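A minimal sketch of the fusion-and-truncation step, with hypothetical channel names and per-channel weights:

```python
def fuse_recalls(channel_results, channel_weights, max_candidates=500):
    """Merge candidates from multiple recall channels.

    channel_results: {channel_name: [(item_id, channel_score), ...]}
    channel_weights: {channel_name: weight}  # hypothetical per-channel confidence
    """
    fused = {}
    for channel, items in channel_results.items():
        weight = channel_weights.get(channel, 1.0)
        for item_id, score in items:
            # Deduplicate: keep the best weighted score seen for each item.
            fused[item_id] = max(fused.get(item_id, 0.0), weight * score)
    ranked = sorted(fused.items(), key=lambda x: -x[1])
    return [item_id for item_id, _ in ranked[:max_candidates]]

# Example: two channels recall overlapping items; fusion dedupes and truncates.
candidates = fuse_recalls(
    {"item_cf": [("sku_1", 0.9), ("sku_2", 0.7)],
     "hot":     [("sku_2", 1.0), ("sku_3", 0.8)]},
    {"item_cf": 1.0, "hot": 0.5},
)
```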
Architecture Adjustment (Adding Recall Layer):
```mermaid
graph TD
    subgraph "Recall Layer (Recall Engine)"
        RecallService("Recall Service") --> ItemCF("Item-CF Channel");
        RecallService --> Hot("Hot Items Channel");
        RecallService --> Profile("User Profile Channel");
        RecallService --> LBS("LBS Channel");
        RecallService --> New("New Items Channel");
        ItemCF & Hot & Profile & LBS & New --> Fusion("Fusion Layer: Dedupe/Truncate");
    end
    Fusion --> Rank("Ranking Layer");
```
Effect: Significantly improves recommendation coverage and diversity, but the recall stage is only a coarse selection; item ranking is still rough, potentially leading to low click-through rates.
Phase 3: Coarse Ranking Isn't Precise Enough → Introduce Machine Learning Ranking Models
Challenge Escalates: Click-Through Rate (CTR) Needs Improvement Urgently
- Multi-recall expands the candidate set, but the order in which items are displayed greatly impacts user click intention. Simple sorting by recall score or popularity is ineffective.
- Need to more accurately predict the user's Click-Through Rate (CTR) for each recalled item, ranking the most likely clicked items higher.
❓ Architect's Thinking Moment: How to select the Top N items a user is most likely to click from hundreds of candidates?
(Need finer ranking. What features? Which model? Are online prediction performance requirements high?)
✅ Evolution Direction: Build a Ranking Model (Rank Model) for CTR Prediction
- Feature Engineering: Building rich features is fundamental to model performance.
- User Features: Age, gender, location, historical behavior stats (e.g., historical CTR, purchase preferences), real-time user behavior features (e.g., currently viewed product).
- Item Features: Category, price, brand, historical CTR, sales volume, rating.
- Context Features: Time (hour, weekday/weekend), device type, network environment, source page.
- Model Selection and Training:
- Initial options: Logistic Regression (LR) or GBDT+LR (Facebook's classic solution), which offer fast training and good interpretability.
- Advanced options: Deep Learning models such as Wide&Deep (Google), DeepFM, and DIN (Alibaba), which capture non-linear feature interactions better and usually yield better results, but are more complex to train and deploy.
- Models are trained offline using historical impression and click logs.
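A minimal offline training sketch for the LR baseline, assuming impression/click logs have already been joined into a dense feature matrix (the feature names are illustrative only):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative training samples: each row is one impression's features,
# label 1 = clicked, 0 = not clicked.
X = np.array([
    # [user_hist_ctr, item_hist_ctr, price_norm, is_weekend]
    [0.12, 0.30, 0.5, 1],
    [0.05, 0.10, 0.9, 0],
    [0.20, 0.25, 0.3, 1],
    [0.02, 0.05, 0.8, 0],
])
y = np.array([1, 0, 1, 0])

model = LogisticRegression()
model.fit(X, y)

# Predicted CTR for a new (user, item, context) feature vector
print(model.predict_proba([[0.15, 0.28, 0.4, 1]])[0, 1])
```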
- Online Inference:
- Deploy the trained model as a model service (e.g., using TensorFlow Serving, Triton Inference Server, ONNX Runtime, or a custom gRPC/HTTP service).
- In the online recommendation flow, after the recall layer gets candidate items, query relevant features in real-time (may require Feature Store support), call the model service to predict CTR scores for each candidate item.
- Perform descending sort based on the predicted CTR scores to get the final recommendation list shown to the user.
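A minimal sketch of the online scoring flow; `feature_store` and `model_client` stand in for a real Feature Store client and a TF Serving/Triton client, and `build_feature_vector` is a hypothetical helper:

```python
def build_feature_vector(user_features, item_features):
    # Hypothetical helper: concatenate user and item features into one model input row
    return list(user_features) + list(item_features)

def rank_candidates(user_id, candidate_items, feature_store, model_client, top_n=20):
    """Score recalled candidates with the CTR model and sort descending."""
    user_features = feature_store.get_user_features(user_id)      # hypothetical API
    rows = []
    for item_id in candidate_items:
        item_features = feature_store.get_item_features(item_id)  # hypothetical API
        rows.append(build_feature_vector(user_features, item_features))
    # One batched call to the model service (e.g. TF Serving / Triton behind this client)
    ctr_scores = model_client.predict(rows)
    ranked = sorted(zip(candidate_items, ctr_scores), key=lambda x: -x[1])
    return [item_id for item_id, _ in ranked[:top_n]]
```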
Architecture Adjustment (Adding Ranking Layer & Model Service):
```mermaid
graph TD
    Recall("Recall Layer Output: Candidate Set") --> FeatureEng("Feature Engineering: Real-time Feature Query");
    FeatureEng --> ModelServing("Model Service: TF Serving/Triton");
    ModelServing -- "CTR Prediction Score" --> RankSort("Ranking Module");
    RankSort --> API("API: Return Sorted Results");
    API --> Client;
    subgraph "Offline Training"
        Logs --> TrainData("Sample Joining/Feature Processing");
        TrainData --> ModelTrain("Model Training: GBDT/DeepFM");
        ModelTrain -- "Deploy" --> ModelServing;
    end
    subgraph "Online Dependencies"
        FeatureEng -- "Query" --> FeatureStore("Feature Store: Redis/HBase");
    end
```
Effect: Significantly improves recommendation precision and user CTR, but the model is trained offline and cannot quickly respond to real-time changes in user interest.
Phase 4: User Interests Change Quickly → Explore Real-time Feedback and Online Learning
New Challenge: Capturing User's "Instant Interest"
- User interest can change in a short time (e.g., suddenly wanting to buy something). Models relying on offline training (typically T+1 updates) cannot capture this real-time change, leading to recommendation lag.
- How to make the recommendation system adapt faster to user's real-time behavior feedback?
❓ Architect's Thinking Moment: How to enable the model to learn new user preferences faster?
(Offline training cycle is too long. Can the model be updated online? How to build real-time features? How to balance "exploring" new content and "exploiting" known preferences?)
✅ Evolution Direction: Introduce Real-time Features + Online Learning / EE Strategies
- Build Real-time Feature Pipeline:
- The user's real-time actions (clicks, views, add-to-carts, searches, etc.) are sent to Kafka via event tracking.
- Use a stream processing engine (like Flink or Spark Streaming) to consume Kafka data in real-time.
- Calculate the user's short-term behavior statistics (e.g., "product categories clicked in the last 5 minutes", "keywords searched in the current session") and write them to the online Feature Store for real-time use by the ranking model.
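A deliberately simplified stand-in for the Flink job, written as a plain Kafka consumer that keeps a 5-minute window of clicked categories per user and writes it to Redis (the topic name, event schema, and key scheme are all assumptions):

```python
import json
from collections import defaultdict, deque

import redis
from kafka import KafkaConsumer  # kafka-python

r = redis.Redis(decode_responses=True)
consumer = KafkaConsumer("user_events", value_deserializer=lambda v: json.loads(v))

WINDOW_SECONDS = 300  # "categories clicked in the last 5 minutes"
recent_clicks = defaultdict(deque)  # user_id -> deque of (timestamp, category)

for msg in consumer:
    event = msg.value  # e.g. {"user_id": "u1", "action": "click", "category": "shoes", "ts": ...}
    if event.get("action") != "click":
        continue
    q = recent_clicks[event["user_id"]]
    q.append((event["ts"], event["category"]))
    # Evict events that have fallen out of the sliding window
    while q and q[0][0] < event["ts"] - WINDOW_SECONDS:
        q.popleft()
    # Write the short-term feature to the online feature store (hypothetical key scheme)
    categories = sorted({cat for _, cat in q})
    r.set(f"rt_feature:recent_categories:{event['user_id']}",
          json.dumps(categories), ex=WINDOW_SECONDS)
```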
- Model Online Update (Online Learning):
- Adopt model training paradigms supporting online updates, stream-updating model parameters based on real-time user feedback (click/no-click).
- Common algorithms: FTRL (Follow The Regularized Leader), suitable for online LR models with large-scale sparse features.
- This usually requires more complex architectural support.
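A minimal FTRL-Proximal sketch for streaming logistic regression updates; the hyperparameters and the sparse-feature representation are illustrative:

```python
import math
from collections import defaultdict

class FTRLProximal:
    """Per-coordinate FTRL-Proximal for a streaming logistic regression (sketch)."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # accumulated adjusted gradients
        self.n = defaultdict(float)  # accumulated squared gradients

    def _weight(self, i):
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0  # L1 regularization keeps rarely-useful weights at exactly zero
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def predict(self, x):
        # x: sparse features as {feature_id: value}
        wx = sum(self._weight(i) * v for i, v in x.items())
        return 1.0 / (1.0 + math.exp(-max(min(wx, 35), -35)))

    def update(self, x, y):
        # y is the real-time feedback: 1 = click, 0 = no click
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self._weight(i)
            self.n[i] += g * g
```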
- Explore & Exploit (EE) Strategy:
- Introduce EE algorithms (Multi-Armed Bandit algorithms such as Thompson Sampling or LinUCB) during or after the ranking stage to dynamically adjust item display order.
- The goal is to balance "exploiting" known user preferences (recommending items with high predicted CTR) and "exploring" potentially interesting new content (giving new or low-CTR items some exposure opportunities), thus discovering new user interests faster and optimizing long-term benefits.
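A minimal Beta-Bernoulli Thompson Sampling sketch: each candidate item keeps a Beta posterior over its click rate, a plausible CTR is sampled per request (exploration), and real feedback sharpens the posterior (exploitation):

```python
import random

class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling over candidate items (sketch)."""

    def __init__(self):
        self.alpha = {}  # item_id -> clicks + 1
        self.beta = {}   # item_id -> non-clicks + 1

    def choose(self, candidates):
        # Sample a plausible CTR from each item's posterior; show the best sample.
        samples = {c: random.betavariate(self.alpha.get(c, 1), self.beta.get(c, 1))
                   for c in candidates}
        return max(samples, key=samples.get)

    def feedback(self, item_id, clicked):
        # Update the posterior with the observed click / no-click outcome.
        if clicked:
            self.alpha[item_id] = self.alpha.get(item_id, 1) + 1
        else:
            self.beta[item_id] = self.beta.get(item_id, 1) + 1
```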
Architecture Adjustment (Strengthening Real-time Pipeline):
```mermaid
graph TD
    subgraph "Real-time Feature Pipeline"
        Behavior("User Real-time Behavior") --> Kafka;
        Kafka --> Flink("Flink/Spark Streaming");
        Flink -- "Update" --> FeatureStore("Real-time Feature Store: Redis/HBase");
    end
    subgraph "Online Ranking Pipeline"
        Recall --> FeatureEng("Feature Engineering");
        FeatureEng -- "Query Real-time/Offline Features" --> FeatureStore;
        FeatureEng --> ModelServing("Online Model");
        ModelServing --> RankSort;
        RankSort --> EE("EE Strategy Adjustment");
        EE --> API;
    end
    subgraph "Online Learning Pipeline (Optional)"
        Feedback("User Feedback: Click/No-Click") --> Kafka;
        Kafka --> OnlineUpdate("Online Model Update Module: FTRL");
        OnlineUpdate -- "Update Parameters" --> ModelServing;
    end
```
Effect: Improves recommendation real-time responsiveness and adaptation to short-term user interest changes, but cold start and diversity issues still need dedicated solutions.
Phase 5: What About New Users and New Items? → Tackling Cold Start and Diversity
Persistent Problems: Cold Start and Filter Bubbles
- New User Cold Start: User just registered, no behavior data available; collaborative filtering and behavior-based ranking models fail.
- New Item Cold Start: New product just listed, no user interaction data; difficult to get recommended.
- Diversity and Exploration: Recommendation systems easily fall into the "narrower and narrower" trap (filter bubble), where users only see things they are familiar with. How to improve diversity and surprise?
❓ Architect's Thinking Moment: How to give new users and items a chance to be recommended? How to break the filter bubble?
(No behavior data, what else is available? Item content itself? User registration info? How to balance accuracy and diversity during ranking?)
✅ Evolution Direction: Utilize Content Features + Multi-Objective Optimization + Hybrid Strategies
- Introduce Content Features and Content Similarity:
- Use NLP techniques (like Word2Vec, BERT) or CV techniques (like CNN) to extract item Content Embeddings, e.g., from product titles, descriptions, images.
- New Item Cold Start: Calculate content similarity between new and old items, recommend the new item to users who like similar items.
- New User Cold Start: Based on information provided during registration (like selected interest tags), recommend content-matching items.
- Content features can also serve as input features for the ranking model.
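A minimal content-similarity sketch using TF-IDF over titles as a lightweight stand-in for the BERT/CNN embeddings mentioned above (a real system would index the embeddings in FAISS/Milvus):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Existing catalog titles plus one brand-new item with no interaction history
titles = [
    "wireless bluetooth earbuds",    # item 0
    "noise cancelling headphones",   # item 1
    "stainless steel water bottle",  # item 2
]
new_item_title = ["bluetooth over-ear headphones"]

vectorizer = TfidfVectorizer()
catalog_vectors = vectorizer.fit_transform(titles)
new_vector = vectorizer.transform(new_item_title)

# Content similarity between the new item and every existing item
sims = cosine_similarity(new_vector, catalog_vectors)[0]
most_similar = sims.argmax()
# Recommend the new item to users who liked the most similar existing item
print(most_similar, sims[most_similar])
```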
- Multi-Objective Optimization in Ranking:
- The ranking model optimizes multiple objectives simultaneously, not just CTR, such as: CTR, CVR (Conversion Rate), GMV (Gross Merchandise Volume), Recommendation Diversity, User Dwell Time, etc.
- Achieve multi-objective balance by adjusting weights for different objectives or using more complex model structures.
- Can perform Re-ranking after the initial sort, e.g., using the MMR (Maximal Marginal Relevance) algorithm to enhance diversity while maintaining relevance.
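A minimal MMR re-ranking sketch: at each step, pick the candidate that best trades off its relevance score against its maximum similarity to the items already selected; `lam` controls the accuracy/diversity balance:

```python
def mmr_rerank(candidates, relevance, similarity, k=10, lam=0.7):
    """candidates: list of item ids; relevance: {item: score};
    similarity: callable (item_a, item_b) -> value in [0, 1]."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(item):
            # Redundancy = similarity to the closest already-selected item
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```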
- Develop Specific Cold Start and Exploration Strategies:
- New User Strategy: Recommend global hot items or items preferred by similar demographics based on registration info, attributes, location, etc.
- New Item Strategy: Provide some forced exposure opportunities (traffic tilting) to quickly accumulate interaction data; or recommend based on content similarity.
- Exploration Mechanism: Combine with EE algorithms to proactively recommend some cross-domain, potentially interesting new items to users.
Architecture Adjustment (Improving Strategies & Objectives):
```mermaid
graph TD
    subgraph "Content Processing (Offline)"
        ItemContent("Product Content: Text/Image") --> NLP_CV("NLP/CV Models");
        NLP_CV -- "Generate Vectors" --> ContentEmbeddingDB("Content Embedding DB: FAISS/Milvus");
    end
    subgraph "Online Recommendation Pipeline"
        Recall -- "(Add Content Recall Channel)" --> Fusion;
        Fusion --> FeatureEng;
        FeatureEng -- "(Add Content Features)" --> ModelServing("Multi-Objective Ranking Model");
        ModelServing --> ReRank("Re-ranking Module: MMR/Diversity Adjust");
        ReRank --> API;
        %% Cold Start Strategy Fusion
        ColdStartStrategy("Cold Start/Exploration Strategy") --> Fusion;
        ColdStartStrategy --> ReRank;
    end
```
Summary: The Evolutionary Path of Recommendation System Architecture
| Phase | Core Challenge | Key Solution | Representative Tech/Pattern |
|---|---|---|---|
| 0. Static | No Personalization | Static Sorting by Popularity | SQL ORDER BY |
| 1. CF Trial | User Interest Variation | Item-Based Collaborative Filtering | MapReduce/Spark (Offline), Redis (Online Cache) |
| 2. Multi-Recall | Low Coverage/Diversity | Multi-Strategy Recall + Fusion | User Profile, LBS, Global Hot, Weighted Fusion |
| 3. Rank Model | Low CTR | ML Ranking Model (CTR Prediction) | LR, GBDT, DeepFM, Wide&Deep, TF Serving, Feature Store |
| 4. Real-time | Interest Change Lag | Real-time Features + Online Learning/EE | Kafka, Flink/Spark Streaming, FTRL, Bandit Algorithms |
| 5. Cold Start | New User/Item/Diversity | Content Features + Multi-Objective Opt + Exploration | NLP/CV Embedding, MMR, Cold Start Rules |