Architecture
System design and component overview for personalized video feeds
Interactive Architecture Diagram
The system is designed around a request-driven architecture with intelligent caching and asynchronous event processing to meet stringent latency and scale requirements.
API Gateway
Entry point for all feed requests. Handles rate limiting, validation, and routing decisions.
Key Details
• Rate limiting: 3,000 RPS peak capacity
• Request validation (user_id, tenant_id format)
• Feature flag evaluation (per-tenant personalization)
• Authentication & authorization
• Routes to the personalized or fallback feed service (see the routing sketch below)
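A minimal sketch of that routing decision, assuming hashed user IDs and simple per-tenant feature flags; the validation patterns and function names below are illustrative, not the gateway's actual implementation.

import re

# Assumed formats: a hex-encoded user ID hash and a short lowercase tenant ID.
USER_ID_RE = re.compile(r"^[0-9a-f]{16,64}$")
TENANT_ID_RE = re.compile(r"^[a-z0-9_-]{1,64}$")

def route_feed_request(user_id_hash: str, tenant_id: str, flags: dict) -> str:
    """Validate the request, evaluate the per-tenant flag, and pick a backend."""
    if not USER_ID_RE.match(user_id_hash) or not TENANT_ID_RE.match(tenant_id):
        raise ValueError("invalid user_id or tenant_id format")
    # Per-tenant feature flag decides whether personalization is enabled.
    if flags.get(f"personalization:{tenant_id}", False):
        return "personalized-feed-service"
    return "fallback-feed-service"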
Component Descriptions
Mobile SDK
Embedded in host applications (fitness apps, cooking apps, etc.). Sends hashed user IDs and optional demographic hints. Tracks user events (views, completions, skips, likes) and sends them asynchronously to the event pipeline.
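The event payload below is a sketch of what the SDK might send; the field names and the SHA-256 hashing are assumptions, since this page only states that user IDs are hashed and events are delivered asynchronously.

import hashlib
import time

def make_event(raw_user_id: str, tenant_id: str, video_id: str, event_type: str) -> dict:
    """Build one tracking event; the raw user ID is hashed before leaving the device."""
    assert event_type in {"view", "completion", "skip", "like"}
    return {
        "user_id_hash": hashlib.sha256(raw_user_id.encode()).hexdigest(),
        "tenant_id": tenant_id,
        "video_id": video_id,
        "event_type": event_type,
        "timestamp": int(time.time()),
    }

# Events are queued locally and flushed to the event pipeline in batches, so a
# slow or failed upload never blocks the host app's UI.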
API Gateway
Entry point for all feed requests. Performs rate limiting (3k RPS peak), request validation, and feature flag checks. Routes to personalized or non-personalized service based on tenant configuration and feature flags.
Personalized Feed Service
Core personalization logic. Fetches user signals, tenant configs, and video metadata. Calls ranking engine to score and sort videos. Returns personalized feed with metadata explaining ranking reasons.
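A hypothetical response shape for such a feed; the field names are illustrative, and only the presence of per-video ranking reasons comes from the description above.

example_feed_response = {
    "tenant_id": "cooking-app",           # illustrative tenant
    "user_id_hash": "9f86d081884c7d65",   # shortened, fake hash
    "videos": [
        {"video_id": "v_101", "score": 0.87,
         "reason": "matches watch history: category=desserts"},
        {"video_id": "v_204", "score": 0.79,
         "reason": "editorial boost set in the CMS"},
    ],
    "cache_ttl_seconds": 60,
}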
Cache Layer (Redis)
High-performance caching using a shared Redis cluster. Shown as two nodes in the diagram for visual clarity, but uses the same infrastructure. Personalized feeds use user-specific keys (feed:{tenant_id}:{user_id_hash}) with ~95% hit rate. Non-personalized feeds use tenant-level keys (feed:non-personalized:{tenant_id}) with 99%+ hit rate since all users share the same feed. Feed results cached for 60 seconds, user signals for 5 minutes, tenant configs for 15 minutes.
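A read-through sketch of the two feed cache paths, using the key formats and 60-second TTL from this page; the redis-py usage is illustrative rather than the service's actual code.

import json
import redis

r = redis.Redis(decode_responses=True)

def get_feed(tenant_id: str, user_id_hash: str | None, compute_feed) -> list:
    """Personalized feeds use per-user keys; the fallback feed is shared per tenant."""
    if user_id_hash:
        key = f"feed:{tenant_id}:{user_id_hash}"        # ~95% hit rate
    else:
        key = f"feed:non-personalized:{tenant_id}"      # 99%+ hit rate
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    feed = compute_feed()                # cache miss: build the feed
    r.setex(key, 60, json.dumps(feed))   # 60 s TTL keeps content fresh
    return feed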
Ranking Engine
Scoring algorithm that combines multiple signals: watch history match (category affinity), engagement patterns (completion rate), editorial boosts (CMS-set), and demographic hints. Uses tenant-specific weights to calculate final scores.
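A minimal sketch of that weighted scoring, assuming each signal is pre-normalized to [0, 1]; the weight names and default values are illustrative, since real weights come from tenant configuration.

# Assumed default weights; in production these are per-tenant settings.
DEFAULT_WEIGHTS = {
    "category_affinity": 0.4,   # watch-history match
    "completion_rate": 0.3,     # engagement pattern
    "editorial_boost": 0.2,     # CMS-set boost
    "demographic_match": 0.1,   # optional demographic hints
}

def score_video(signals: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    """Final score is a weighted sum of the normalized signals."""
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)

# Example: a video in a category the user watches often, with a mild CMS boost.
score = score_video({"category_affinity": 0.9, "completion_rate": 0.6,
                     "editorial_boost": 0.5, "demographic_match": 0.0})
# -> 0.4*0.9 + 0.3*0.6 + 0.2*0.5 + 0.1*0.0 = 0.64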
Event Pipeline
Asynchronous event processing using Kafka or SQS. Batches user events and writes them to the user_signals database. A 5-minute lag is acceptable, which allows efficient batching and reduces write load on the database.
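The batching behaviour can be sketched independently of the queue technology: buffer events, then flush when the batch is full or older than the 5-minute lag budget. The batch size and in-memory buffer here are assumptions; in production the source is Kafka or SQS and the sink is the user_signals table.

import time

class EventBatcher:
    def __init__(self, max_size: int = 500, max_age_s: float = 300.0):
        self.max_size, self.max_age_s = max_size, max_age_s
        self.buffer, self.opened_at = [], time.monotonic()

    def add(self, event: dict) -> None:
        self.buffer.append(event)
        if (len(self.buffer) >= self.max_size
                or time.monotonic() - self.opened_at >= self.max_age_s):
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            # One bulk insert into user_signals instead of one write per event.
            print(f"writing {len(self.buffer)} events to user_signals")
            self.buffer, self.opened_at = [], time.monotonic()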
Databases
PostgreSQL for videos, user_signals, and tenant_configs. The videos table is indexed by (tenant_id, created_at) for fast lookups. user_signals is indexed by (user_id_hash, timestamp), with the 90-day retention enforced by a scheduled cleanup job or time-based partitioning (PostgreSQL has no native TTL).
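A sketch of that scheduled cleanup, assuming a psycopg2 connection and the table and column names described above; it is a one-line delete, not the production job.

import psycopg2

def purge_old_signals(dsn: str) -> int:
    """Delete user_signals rows older than 90 days; returns the number removed."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # A timestamp-only index or time-based partitioning makes this cheaper
        # than scanning via the (user_id_hash, timestamp) index.
        cur.execute(
            "DELETE FROM user_signals WHERE timestamp < now() - interval '90 days'"
        )
        return cur.rowcount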
Caching Strategy
Personalized Feed Results
TTL: 60 seconds | Key: feed:{tenant_id}:{user_id_hash}
Short TTL ensures content freshness (≤60s requirement). Handles most traffic at peak, reducing database load by ~95%. Each user has their own cache entry.
Non-Personalized Feed Results
TTL: 60 seconds | Key: feed:non-personalized:{tenant_id}
Tenant-level caching means all users without personalization share the same cached feed. Achieves 99%+ cache hit rate since many new users and fallback scenarios use this path. Critical for handling user onboarding spikes.
User Signals
TTL: 5 minutes | Key: signals:{user_id_hash}
Matches the acceptable lag for user signal updates. Balances freshness with cache efficiency.
Tenant Configs
TTL: 15 minutes | Key: config:{tenant_id}
Infrequently changed, so longer TTL is acceptable. Reduces config lookup overhead.
Video Metadata
TTL: 60 seconds | Key: video:{video_id}
Ensures new videos appear quickly. Individual video caching reduces repeated queries.
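The entries above, collected into one place as constants. The key templates and TTLs are taken from this page; the constant names and the helper function are illustrative.

CACHE_POLICY = {
    "personalized_feed":     {"key": "feed:{tenant_id}:{user_id_hash}",   "ttl_s": 60},
    "non_personalized_feed": {"key": "feed:non-personalized:{tenant_id}", "ttl_s": 60},
    "user_signals":          {"key": "signals:{user_id_hash}",            "ttl_s": 300},
    "tenant_config":         {"key": "config:{tenant_id}",                "ttl_s": 900},
    "video_metadata":        {"key": "video:{video_id}",                  "ttl_s": 60},
}

def cache_key(kind: str, **ids: str) -> str:
    """Render a cache key, e.g. cache_key("tenant_config", tenant_id="cooking-app")."""
    return CACHE_POLICY[kind]["key"].format(**ids)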
Scalability Considerations
Horizontal Scaling: Feed service is stateless and can scale horizontally behind a load balancer. Each instance connects to shared Redis and PostgreSQL.
Database Sharding: The user_signals table can be sharded by user_id_hash for write scalability; the videos table can be sharded by tenant_id for isolation.
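If user_signals were sharded that way, write routing could look like the sketch below; the shard count and shard naming are assumptions, not part of the current design.

NUM_SHARDS = 8  # assumed shard count

def signals_shard(user_id_hash: str) -> str:
    """Hash-based routing keeps all of one user's signals on the same shard."""
    return f"user_signals_shard_{int(user_id_hash, 16) % NUM_SHARDS}"

# Example: signals_shard("9f86d081884c7d65") -> "user_signals_shard_5"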
Redis Cluster: Redis can run in cluster mode for high availability and additional capacity. Separate read replicas for hot data.
CDN Edge Caching: For popular content, CDN can cache feed responses at the edge (30-second TTL), further reducing backend load.