Architecture

System design and component overview for personalized video feeds

Architecture Diagram

The system is designed around a request-driven architecture with intelligent caching and asynchronous event processing to meet stringent latency and scale requirements.

API Gateway

Entry point for feed requests, responsible for routing traffic to the appropriate feed path and enforcing basic traffic controls.

Key Details

• Rate limiting: 3,000 RPS peak capacity

• Feature flag evaluation (routes to personalized or non-personalized service)

• Request validation (user_id, tenant_id format)
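The gateway's routing decision can be sketched as below. The exact ID formats and the feature-flag mechanism are assumptions for illustration; the document only states that user_id and tenant_id are format-validated and that a flag selects the personalized or non-personalized path.

```python
import re

# Hypothetical formats: the real validation rules are not specified in this document.
USER_ID_RE = re.compile(r"^[0-9a-f]{16}$")        # e.g. 16 hex characters
TENANT_ID_RE = re.compile(r"^[a-z0-9_-]{1,32}$")  # e.g. short tenant slug

def route_request(user_id: str, tenant_id: str, personalization_enabled: bool) -> str:
    """Validate params, then route on the feature flag (gateway responsibilities above)."""
    if not USER_ID_RE.fullmatch(user_id):
        raise ValueError("invalid user_id format")
    if not TENANT_ID_RE.fullmatch(tenant_id):
        raise ValueError("invalid tenant_id format")
    # Feature flag decides which downstream service handles the request.
    return "personalized-feed" if personalization_enabled else "non-personalized-feed"
```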

Legend: components cover services & storage and core processing; data flows distinguish the request path (real-time), synchronous calls (on-demand), async processing (background), and the fallback path (conditional).

Caching Strategy

Core Principle

Cache derived, decision-ready state rather than raw inputs or final outputs. Raw user_signals are never cached or read on the request path.

Non-Goals

  • Never cache raw user_signals
  • Do not default to per-user feed caching
Cache Item | Key Pattern | TTL | Cardinality | Purpose
Tenant Config | config:{tenant_id} | 15 min | Low | Store tenant personalization settings and weights
Video Metadata | video:{video_id} | 60 s | Medium | Cache individual video attributes for ranking
Candidate Pool | candidates:{tenant_id} | 60 s | Low | Broad unranked video set per tenant for ranking
User Profile (derived) | profile:{user_id_hash} | 5 min | High | Precomputed preferences: category affinity, engagement patterns
Hot-user Feed (optional) | feed:hot:{tenant_id}:{user_id_hash} | 60 s | Medium | Optional full feed cache for very high-traffic users only
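The key patterns and TTLs above translate directly into small helpers. This is a minimal sketch: the hash behind user_id_hash is not specified in the document, so SHA-256 truncated to 16 characters is an assumption.

```python
import hashlib

# TTLs in seconds, mirroring the table above.
CACHE_TTL_S = {
    "config": 15 * 60,   # tenant config
    "video": 60,         # video metadata
    "candidates": 60,    # candidate pool
    "profile": 5 * 60,   # derived user profile
    "feed:hot": 60,      # optional hot-user feed
}

def user_id_hash(user_id: str) -> str:
    # Assumption: the document names "user_id_hash" but not the hash function.
    return hashlib.sha256(user_id.encode()).hexdigest()[:16]

def config_key(tenant_id: str) -> str:
    return f"config:{tenant_id}"

def candidates_key(tenant_id: str) -> str:
    return f"candidates:{tenant_id}"

def profile_key(user_id: str) -> str:
    return f"profile:{user_id_hash(user_id)}"
```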

Request Paths

Personalized Feed Request Path

  1. API Gateway receives request, validates params, routes to Personalized Feed Service
  2. Service checks Redis for tenant config (cache miss → DB fallback)
  3. Service checks Redis for user profile—derived state (category affinity, engagement patterns)
  4. Service checks Redis for candidate pool (broad unranked videos for tenant)
  5. Ranking engine performs in-process scoring using cached profile and video metadata
  6. Service returns ranked feed with metadata. No database reads occur if cache hits.
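The steps above can be sketched as follows, with a plain dict standing in for Redis. The scoring formula and the weight names (affinity_weight, recency_weight) are illustrative assumptions; the document only says ranking combines the cached profile with video metadata in-process.

```python
def personalized_feed(cache: dict, db, tenant_id: str, user_hash: str) -> list[dict]:
    """Steps 2-6: three cache reads, then in-process ranking (no DB reads on hits)."""
    # Steps 2-4: issued as one parallel Redis MGET in practice; dict lookups here.
    config = cache.get(f"config:{tenant_id}") or db.load_config(tenant_id)              # step 2
    profile = cache.get(f"profile:{user_hash}") or {}                                   # step 3
    candidates = cache.get(f"candidates:{tenant_id}") or db.load_candidates(tenant_id)  # step 4

    # Step 5: in-process scoring using cached profile and metadata; no external calls.
    def score(video: dict) -> float:
        affinity = profile.get("category_affinity", {}).get(video["category"], 0.0)
        return config["affinity_weight"] * affinity + config["recency_weight"] * video["recency"]

    return sorted(candidates, key=score, reverse=True)  # step 6: ranked feed
```

On a full cache hit, the `db` argument is never touched, which is what keeps the hot path Redis-only.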

Non-Personalized Feed Request Path

  1. API Gateway receives request, routes to Non-Personalized Feed Service
  2. Service checks Redis for tenant config (cache miss → DB fallback)
  3. Service checks Redis for candidate pool (tenant-level, shared across users)
  4. Service sorts by editorial boost and recency (no user profile lookup needed)
  5. Service returns feed. Response may be HTTP/CDN cacheable due to tenant-level sharing.
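Step 4 reduces to a plain sort. The field names (editorial_boost, published_at) are assumptions for illustration; the point is that no per-user state is consulted, so the result is identical for every user of a tenant and therefore edge-cacheable.

```python
def non_personalized_feed(candidates: list[dict]) -> list[dict]:
    """Sort by editorial boost first, then recency; no user profile involved."""
    return sorted(
        candidates,
        key=lambda v: (v["editorial_boost"], v["published_at"]),
        reverse=True,
    )
```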

Latency Budget

Personalized Feed Latency Budget

Component | p95 (ms) | p99 (ms) | Notes
API Gateway | 5 | 10 | Request validation and routing
Redis reads (3 keys) | 3 | 8 | Parallel MGET; bounded by network RTT
DB fallback (cache miss) | 20 | 50 | Occurs in <5% of requests (no joins)
In-process ranking | 8 | 15 | Simple scoring, no external calls
Response serialization | 2 | 5 | JSON encoding
Total (cache hit) | 18 | 38 | Hot path: Redis-only
Total (cache miss) | 38 | 88 | Cold path: DB + Redis

Non-Personalized Feed Latency Budget

Component | p95 (ms) | p99 (ms) | Notes
API Gateway | 5 | 10 | Request validation and routing
Redis reads (2 keys) | 2 | 6 | Tenant-level keys; 99%+ hit rate
DB fallback (cache miss) | 15 | 40 | Occurs in <1% of requests
Editorial sort | 3 | 8 | Simple sort by boost + recency
Response serialization | 2 | 5 | JSON encoding
Total (cache hit) | 12 | 29 | Hot path: Redis-only
Total (cache miss) | 27 | 69 | Cold path: DB + Redis
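The "Total" rows in both tables are simple sums of the per-component figures, which is a budgeting convention rather than exact statistics (percentiles do not add; the sum is a conservative bound). A quick check of the arithmetic:

```python
# Per-component (p95, p99) values in ms, taken from the two tables above.
personalized = {"gateway": (5, 10), "redis": (3, 8), "ranking": (8, 15), "serialize": (2, 5)}
non_personalized = {"gateway": (5, 10), "redis": (2, 6), "sort": (3, 8), "serialize": (2, 5)}

def cache_hit_total(components: dict) -> tuple[int, int]:
    """Hot-path budget: sum each percentile across components (worst-case bound)."""
    p95 = sum(v[0] for v in components.values())
    p99 = sum(v[1] for v in components.values())
    return p95, p99
```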

Why p99 stays bounded

  • Bounded cache operations: Redis MGET latency is predictable (<8ms p99)
  • No raw event aggregation: User profiles are precomputed; no DB scans on request path
  • No DB joins: Candidate pools and profiles are denormalized in cache
  • Graceful degradation: If Redis is unavailable, DB fallback adds ~50ms p99 overhead
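The graceful-degradation bullet can be sketched as a bounded read with a DB fallback. The callable signatures here are hypothetical; the point is that a Redis timeout or error is swallowed and converted into a DB read rather than a request failure.

```python
def read_with_fallback(redis_get, db_get, key: str, timeout_ms: int = 10):
    """Bounded cache read: try Redis with a short timeout, fall back to the DB on
    a miss, timeout, or outage (adding roughly the DB-fallback budget above)."""
    try:
        value = redis_get(key, timeout_ms=timeout_ms)
        if value is not None:
            return value  # cache hit: hot path
    except Exception:
        pass  # Redis unavailable or slow: degrade to the cold path
    return db_get(key)
```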

Freshness Budget

Personalized Feed Freshness Budget

Freshness Dimension | Mechanism | Upper Bound | Meets Requirement
New content visible | Candidate pool TTL (60 s) | ≤ 60 s | Yes
User-signal updates reflected | Event pipeline processes with ≤ 5 min lag; profiles refreshed after processing | ≤ 5 min | Yes
Tenant config changes | Config TTL (15 min) + manual invalidation | ≤ 15 min | Yes

Non-Personalized Feed Freshness Budget

Freshness Dimension | Mechanism | Upper Bound | Meets Requirement
New content visible | Candidate pool TTL (60 s) | ≤ 60 s | Yes
Tenant config changes | Config TTL (15 min) + manual invalidation | ≤ 15 min | Yes
HTTP/CDN cache | Optional edge caching (30 s TTL) | ≤ 30 s (if enabled) | Yes

Scalability Considerations

1. Horizontal Scaling: The feed service is stateless and can scale horizontally behind a load balancer. Each instance connects to shared Redis and PostgreSQL.

2. Database Sharding: The user_signals table can be sharded by user_id_hash for write scalability; the videos table is sharded by tenant_id for isolation.

3. Redis Cluster: Redis can run in cluster mode for high availability and additional capacity, with separate read replicas for hot data.

4. CDN Edge Caching: For popular content, the CDN can cache feed responses at the edge (30-second TTL), further reducing backend load.
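The sharding scheme in point 2 amounts to a stable hash-mod routing function. This is a minimal sketch; the hash function and shard counts are assumptions, and the document does not specify how shards are assigned.

```python
import hashlib

def _stable_hash(value: str) -> int:
    # Assumption: a deterministic hash (SHA-256 here) so routing is stable
    # across processes, unlike Python's randomized built-in hash().
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

def shard_for_user(user_id: str, num_shards: int = 8) -> int:
    """Route user_signals writes by user_id_hash (write scalability)."""
    return _stable_hash(user_id) % num_shards

def shard_for_tenant(tenant_id: str, num_shards: int = 4) -> int:
    """Route videos reads/writes by tenant_id (per-tenant isolation)."""
    return _stable_hash(tenant_id) % num_shards
```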