Architecture
System design and component overview for personalized video feeds
Architecture Overview
The system is designed around a request-driven architecture with intelligent caching and asynchronous event processing to meet stringent latency and scale requirements.
API Gateway
Entry point for feed requests, responsible for routing traffic to the appropriate feed path and enforcing basic traffic controls.
Key Details
• Rate limiting: 3,000 RPS peak capacity
• Feature flag evaluation (routes to personalized or non-personalized service)
• Request validation (user_id, tenant_id format)
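The gateway's responsibilities above can be sketched as a single routing function. This is an illustrative sketch under assumed names (`ID_PATTERN`, the flag key `personalization_enabled`, and the service names are not prescribed by this document):

```python
import re

# Accepts simple lowercase identifiers; the exact format rule is an assumption.
ID_PATTERN = re.compile(r"^[a-z0-9_-]{1,64}$")

def route_request(user_id: str, tenant_id: str, flags: dict) -> str:
    """Validate params, then pick the personalized or non-personalized path."""
    if not (ID_PATTERN.match(user_id) and ID_PATTERN.match(tenant_id)):
        raise ValueError("invalid user_id or tenant_id format")
    # Feature flag evaluation decides which downstream service gets the request.
    if flags.get("personalization_enabled", False):
        return "personalized-feed-service"
    return "non-personalized-feed-service"
```

Rate limiting (the 3,000 RPS cap) would typically sit in front of this function and is omitted here.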
Caching Strategy
Core Principle
Cache derived, decision-ready state rather than raw inputs or final outputs. Raw user_signals are never cached or read on the request path.
Non-Goals
- Never cache raw user_signals
- Do not default to per-user feed caching
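The "derived, decision-ready state" principle implies that raw signals are folded into a compact profile off the request path, by the async event pipeline. A minimal sketch, with assumed field names (`category`, `weight`, `completed`):

```python
from collections import Counter

def derive_profile(signals: list[dict]) -> dict:
    """Reduce raw user_signals events to category affinities and an engagement rate."""
    affinity = Counter()
    engaged = 0
    for s in signals:
        affinity[s["category"]] += s.get("weight", 1.0)
        engaged += 1 if s.get("completed") else 0
    total_weight = sum(affinity.values())
    return {
        # Normalized affinity per category, ready for scoring without any joins.
        "category_affinity": {c: w / total_weight for c, w in affinity.items()} if affinity else {},
        "engagement_rate": engaged / max(len(signals), 1),
    }
```

Only this derived dict is written to `profile:{user_id_hash}`; the raw events never reach Redis or the request path.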
| Cache Item | Key Pattern | TTL | Cardinality | Purpose |
|---|---|---|---|---|
| Tenant Config | config:{tenant_id} | 15 min | Low | Store tenant personalization settings and weights |
| Video Metadata | video:{video_id} | 60 s | Medium | Cache individual video attributes for ranking |
| Candidate Pool | candidates:{tenant_id} | 60 s | Low | Broad unranked video set per tenant for ranking |
| User Profile (derived) | profile:{user_id_hash} | 5 min | High | Precomputed preferences: category affinity, engagement patterns |
| Hot-user Feed (optional) | feed:hot:{tenant_id}:{user_id_hash} | 60 s | Medium | Optional full feed cache for very high-traffic users only |
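The key patterns and TTLs above can be centralized as helpers. The SHA-256 truncation for `user_id_hash` is an assumption; any stable one-way hash works:

```python
import hashlib

# TTLs in seconds, mirroring the table above.
TTLS = {"config": 900, "video": 60, "candidates": 60, "profile": 300, "feed_hot": 60}

def user_id_hash(user_id: str) -> str:
    return hashlib.sha256(user_id.encode()).hexdigest()[:16]

def config_key(tenant_id: str) -> str:
    return f"config:{tenant_id}"

def video_key(video_id: str) -> str:
    return f"video:{video_id}"

def candidates_key(tenant_id: str) -> str:
    return f"candidates:{tenant_id}"

def profile_key(user_id: str) -> str:
    return f"profile:{user_id_hash(user_id)}"

def hot_feed_key(tenant_id: str, user_id: str) -> str:
    return f"feed:hot:{tenant_id}:{user_id_hash(user_id)}"
```

Hashing the user_id keeps raw identifiers out of cache keys while remaining deterministic for lookups.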
Request Paths
Personalized Feed Request Path
1. API Gateway receives the request, validates params, and routes to the Personalized Feed Service
2. Service checks Redis for tenant config (cache miss → DB fallback)
3. Service checks Redis for the user profile, i.e. derived state (category affinity, engagement patterns)
4. Service checks Redis for the candidate pool (broad unranked videos for the tenant)
5. Ranking engine performs in-process scoring using the cached profile and video metadata
6. Service returns the ranked feed with metadata. No database reads occur on cache hits.
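The hot path above can be sketched end to end. The `redis` and `db` interfaces, the config fields (`affinity_weight`, `feed_size`), and the scoring formula are all assumptions for illustration, not the system's actual ranking logic:

```python
import json

def personalized_feed(redis, db, tenant_id: str, user_hash: str) -> list[dict]:
    """Single parallel MGET for all three keys, then in-process scoring."""
    cfg_raw, profile_raw, pool_raw = redis.mget(
        f"config:{tenant_id}", f"profile:{user_hash}", f"candidates:{tenant_id}"
    )
    config = json.loads(cfg_raw) if cfg_raw else db.load_config(tenant_id)      # <5% of requests
    profile = json.loads(profile_raw) if profile_raw else {"category_affinity": {}}
    pool = json.loads(pool_raw) if pool_raw else db.load_candidates(tenant_id)

    def score(video: dict) -> float:
        # In-process scoring only: cached profile + video metadata, no external calls.
        affinity = profile["category_affinity"].get(video["category"], 0.0)
        return config.get("affinity_weight", 1.0) * affinity + video.get("boost", 0.0)

    return sorted(pool, key=score, reverse=True)[: config.get("feed_size", 50)]
```

On full cache hits this touches only Redis, which is what keeps the hot path inside the latency budget below.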
Non-Personalized Feed Request Path
1. API Gateway receives the request and routes to the Non-Personalized Feed Service
2. Service checks Redis for tenant config (cache miss → DB fallback)
3. Service checks Redis for the candidate pool (tenant-level, shared across users)
4. Service sorts by editorial boost and recency (no user profile lookup needed)
5. Service returns the feed. The response may be HTTP/CDN-cacheable because it is shared at the tenant level.
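The editorial sort in step 4 is a plain two-key sort. Field names (`boost`, `published_at`) are assumptions:

```python
def editorial_sort(pool: list[dict]) -> list[dict]:
    """Order the shared candidate pool by editorial boost, then recency."""
    # Higher boost wins; ties broken by newer publish timestamp.
    return sorted(
        pool,
        key=lambda v: (v.get("boost", 0.0), v.get("published_at", 0)),
        reverse=True,
    )
```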
Latency Budget
Personalized Feed Latency Budget
| Component | p95 (ms) | p99 (ms) | Notes |
|---|---|---|---|
| API Gateway | 5 | 10 | Request validation and routing |
| Redis reads (3 keys) | 3 | 8 | Parallel MGET; bounded by network RTT |
| DB fallback (cache miss) | 20 | 50 | Occurs on <5% of requests; no joins |
| In-process ranking | 8 | 15 | Simple scoring, no external calls |
| Response serialization | 2 | 5 | JSON encoding |
| Total (cache hit) | 18 | 38 | Hot path: Redis-only |
| Total (cache miss) | 38 | 88 | Cold path: DB + Redis |
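The totals in the table are straight sums of the per-component budgets (a conservative convention, since percentiles are not strictly additive across stages):

```python
# Personalized path, cache hit: gateway + Redis reads + ranking + serialization.
hit_p95 = 5 + 3 + 8 + 2     # 18 ms
hit_p99 = 10 + 8 + 15 + 5   # 38 ms

# Cache miss adds only the DB fallback on top of the hit path.
miss_p95 = hit_p95 + 20     # 38 ms
miss_p99 = hit_p99 + 50     # 88 ms
```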
Non-Personalized Feed Latency Budget
| Component | p95 (ms) | p99 (ms) | Notes |
|---|---|---|---|
| API Gateway | 5 | 10 | Request validation and routing |
| Redis reads (2 keys) | 2 | 6 | Tenant-level keys; 99%+ hit rate |
| DB fallback (cache miss) | 15 | 40 | Occurs on <1% of requests |
| Editorial sort | 3 | 8 | Simple sort by boost + recency |
| Response serialization | 2 | 5 | JSON encoding |
| Total (cache hit) | 12 | 29 | Hot path: Redis-only |
| Total (cache miss) | 27 | 69 | Cold path: DB + Redis |
Why p99 stays bounded
- Bounded cache operations: Redis MGET latency is predictable (<8ms p99)
- No raw event aggregation: User profiles are precomputed; no DB scans on request path
- No DB joins: Candidate pools and profiles are denormalized in cache
- Graceful degradation: If Redis is unavailable, DB fallback adds ~50ms p99 overhead
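The graceful-degradation point can be made concrete: bound the cache call so an unhealthy Redis can only add the DB-fallback cost, never hang the request. The `redis`/`db` interfaces here are assumptions:

```python
def read_with_fallback(redis, db, keys: list[str], tenant_id: str):
    """Try a bounded Redis MGET; on miss or outage, fall back to a denormalized DB read."""
    try:
        # Client is assumed to be configured with a short socket timeout (~10ms),
        # so this call cannot blow the latency budget.
        values = redis.mget(*keys)
        if all(v is not None for v in values):
            return values
    except ConnectionError:
        pass  # Redis unavailable: fall through to the database.
    return db.load(tenant_id)  # Denormalized read, no joins, bounded cost.
```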
Freshness Budget
Personalized Feed Freshness Budget
| Freshness Dimension | Mechanism | Upper Bound | Meets Requirement |
|---|---|---|---|
| New content visible | Candidate pool TTL (60s) | ≤ 60 s | Yes |
| User-signal updates reflected | Event pipeline processes signals with ≤5 min lag; profiles refreshed after processing | ≤ 5 min | Yes |
| Tenant config changes | Config TTL (15 min) + manual invalidation | ≤ 15 min | Yes |
Non-Personalized Feed Freshness Budget
| Freshness Dimension | Mechanism | Upper Bound | Meets Requirement |
|---|---|---|---|
| New content visible | Candidate pool TTL (60s) | ≤ 60 s | Yes |
| Tenant config changes | Config TTL (15 min) + manual invalidation | ≤ 15 min | Yes |
| HTTP/CDN cache | Optional edge caching (30s TTL) | ≤ 30 s (if enabled) | Yes |
Scalability Considerations
Horizontal Scaling: Feed service is stateless and can scale horizontally behind a load balancer. Each instance connects to shared Redis and PostgreSQL.
Database Sharding: The user_signals table can be sharded by user_id_hash for write scalability; the videos table is sharded by tenant_id for isolation.
Redis Cluster: Redis can run in cluster mode for high availability and additional capacity. Separate read replicas for hot data.
CDN Edge Caching: For popular content, CDN can cache feed responses at the edge (30-second TTL), further reducing backend load.
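A hash-based shard router for the scheme above might look like the following. The shard counts and modulo mapping are assumptions for illustration, not prescribed here:

```python
import hashlib

def shard_for_user(user_id: str, num_shards: int = 8) -> int:
    """Route user_signals writes to a shard by a stable hash of user_id."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def shard_for_tenant(tenant_id: str, num_shards: int = 4) -> int:
    """Route videos reads/writes to a tenant-isolated shard."""
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

A stable hash keeps routing deterministic across stateless feed-service instances; changing `num_shards` would require a resharding migration.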