Data Model
Database schemas, indexes, and retention policies
videos
Core table storing video content metadata for all tenants.
CREATE TABLE videos (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id VARCHAR(50) NOT NULL,
title VARCHAR(255) NOT NULL,
description TEXT,
thumbnail_url VARCHAR(500),
video_url VARCHAR(500) NOT NULL,
duration_seconds INTEGER NOT NULL,
category VARCHAR(50) NOT NULL,
editorial_boost FLOAT DEFAULT 1.0,
maturity_rating VARCHAR(20) DEFAULT 'general',
geo_restrictions JSONB,
metadata JSONB,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
deleted_at TIMESTAMP,
CONSTRAINT videos_tenant_id_fkey
FOREIGN KEY (tenant_id)
REFERENCES tenants(id)
ON DELETE CASCADE
);
-- Indexes for performance
CREATE INDEX idx_videos_tenant_created
ON videos(tenant_id, created_at DESC)
WHERE deleted_at IS NULL;
CREATE INDEX idx_videos_category
ON videos(tenant_id, category)
WHERE deleted_at IS NULL;
CREATE INDEX idx_videos_editorial_boost
ON videos(editorial_boost DESC)
WHERE deleted_at IS NULL;Key Fields
- editorial_boost: Multiplier set by CMS (1.0-2.0 range)
- maturity_rating: Content filtering (general, teen, mature)
- geo_restrictions: JSON array of allowed countries
- metadata: Flexible JSON for creator info, tags, etc.
Index Strategy
- idx_videos_tenant_created: Fast recent video lookups per tenant
- idx_videos_category: Category filtering optimization
- Partial index: Excludes deleted videos (soft delete pattern)
user_signals
Stores user interaction events for personalization. Auto-expires after 90 days.
CREATE TABLE user_signals (
id BIGSERIAL PRIMARY KEY,
user_id_hash VARCHAR(64) NOT NULL,
tenant_id VARCHAR(50) NOT NULL,
video_id UUID NOT NULL,
event_type VARCHAR(20) NOT NULL,
engagement_pct FLOAT,
timestamp TIMESTAMP DEFAULT NOW(),
demographics JSONB,
CONSTRAINT user_signals_video_fkey
FOREIGN KEY (video_id)
REFERENCES videos(id)
ON DELETE CASCADE,
CONSTRAINT valid_event_types
CHECK (event_type IN ('view', 'complete', 'skip', 'like', 'share'))
);
-- Composite index for user signal lookups
CREATE INDEX idx_user_signals_user_time
ON user_signals(user_id_hash, timestamp DESC);
CREATE INDEX idx_user_signals_video
ON user_signals(video_id, timestamp DESC);
-- Partition by month for efficient data expiration
CREATE TABLE user_signals_2026_01 PARTITION OF user_signals
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE user_signals_2026_02 PARTITION OF user_signals
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
-- Auto-delete old partitions with cron job
-- DROP TABLE user_signals_2025_10; -- 90 days agoEvent Types
- view: User started watching (threshold: 3+ seconds)
- complete: User watched >80% of video
- skip: User stopped watching before 20%
- like: Explicit positive signal
- share: Strong engagement indicator
Privacy & Retention
- user_id_hash: SHA-256 hash, never stores raw user IDs
- 90-day retention: Monthly partitions dropped automatically
- No PII: Demographics are aggregated only (age_group, not age)
Partitioning Strategy
- Monthly partitions for efficient data expiration
- Old partitions dropped instead of DELETE queries (much faster)
- Reduces table bloat and maintains consistent performance
tenant_configs
Per-tenant configuration for personalization weights and feature flags.
CREATE TABLE tenant_configs (
tenant_id VARCHAR(50) PRIMARY KEY,
personalization_enabled BOOLEAN DEFAULT true,
personalization_weights JSONB NOT NULL,
content_filters JSONB,
rate_limit_rps INTEGER DEFAULT 600,
cache_ttl_seconds INTEGER DEFAULT 60,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
CONSTRAINT tenant_configs_tenant_fkey
FOREIGN KEY (tenant_id)
REFERENCES tenants(id)
ON DELETE CASCADE
);
-- Example personalization_weights JSON:
-- {
-- "watch_history": 0.5,
-- "engagement": 0.3,
-- "editorial": 0.2
-- }
-- Example content_filters JSON:
-- {
-- "max_maturity": "teen",
-- "excluded_categories": ["violent", "horror"],
-- "geo_allowed": ["US", "CA", "UK"]
-- }Configuration Fields
- personalization_enabled: Master kill switch per tenant
- personalization_weights: Customizable scoring coefficients
- content_filters: Maturity, category, geo restrictions
- rate_limit_rps: Per-tenant rate limit override
Weight Constraints
- All weights must sum to 1.0 (validated by application layer)
- Default: watch_history=0.5, engagement=0.3, editorial=0.2
- Tenants can customize to emphasize different signals
tenants
Master table for tenant (host app) metadata.
CREATE TABLE tenants (
id VARCHAR(50) PRIMARY KEY,
name VARCHAR(100) NOT NULL,
api_key_hash VARCHAR(64) NOT NULL UNIQUE,
status VARCHAR(20) DEFAULT 'active',
tier VARCHAR(20) DEFAULT 'standard',
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
CONSTRAINT valid_status
CHECK (status IN ('active', 'suspended', 'deleted')),
CONSTRAINT valid_tier
CHECK (tier IN ('standard', 'premium', 'enterprise'))
);
CREATE INDEX idx_tenants_status
ON tenants(status)
WHERE status = 'active';Entity Relationship Diagram
Interactive diagram showing relationships between database tables. Drag to pan and scroll to zoom.
Relationships
- • Each tenant has many videos (1:N)
- • Each tenant has one config (1:1)
- • Each video has many user signals from different users (1:M)
- • Signals scale with user activity, not video count
- • Foreign keys enforce referential integrity
- • Cascade deletes clean up dependent records
Query Patterns & Performance
Fetch User Signals
-- Retrieves last 30 days of user activity
SELECT video_id, event_type, engagement_pct, timestamp
FROM user_signals
WHERE user_id_hash = '06d6cbdcfc221d2f4460c17193442b9db221f30950f1c17af4e73e6e1788002b'
AND timestamp > NOW() - INTERVAL '30 days'
ORDER BY timestamp DESC
LIMIT 100;
-- Performance: ~5ms with idx_user_signals_user_time
-- Returns: User's recent viewing history for scoringFetch Video Candidates
-- Get recent videos for a tenant
SELECT id, title, category, editorial_boost, created_at
FROM videos
WHERE tenant_id = 'tenant1'
AND deleted_at IS NULL
AND created_at > NOW() - INTERVAL '30 days'
ORDER BY created_at DESC
LIMIT 1000;
-- Performance: ~15ms with idx_videos_tenant_created
-- Returns: Pool of videos to score and rankCategory Affinity Analysis
-- Calculate user's category preferences
SELECT v.category,
COUNT(*) as view_count,
AVG(us.engagement_pct) as avg_engagement
FROM user_signals us
JOIN videos v ON us.video_id = v.id
WHERE us.user_id_hash = '06d6cbdcfc221d2f4460c17193442b9db221f30950f1c17af4e73e6e1788002b'
AND us.event_type IN ('view', 'complete')
AND us.timestamp > NOW() - INTERVAL '30 days'
GROUP BY v.category
ORDER BY view_count DESC, avg_engagement DESC;
-- Performance: ~10ms with proper indexes
-- Used by: Ranking algorithm for category matchingData Retention & Cleanup
90-Day User Signal Retention
Monthly partitions of user_signals table are automatically dropped after 90 days. Scheduled job runs daily to check and remove expired partitions.
Soft Delete Videos
Videos are soft-deleted (deleted_at timestamp) rather than hard-deleted to maintain referential integrity with user_signals. Hard delete after user_signals expire.
Vacuum & Analyze
PostgreSQL VACUUM and ANALYZE run weekly to reclaim space and update statistics for optimal query planning. Critical for maintaining sub-250ms query performance.