Data Model

Database schemas, indexes, and retention policies

videos

Core table storing video content metadata for all tenants.

CREATE TABLE videos (
  id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  tenant_id         VARCHAR(50) NOT NULL,
  title             VARCHAR(255) NOT NULL,
  description       TEXT,
  thumbnail_url     VARCHAR(500),
  video_url         VARCHAR(500) NOT NULL,
  duration_seconds  INTEGER NOT NULL,
  category          VARCHAR(50) NOT NULL,
  editorial_boost   FLOAT DEFAULT 1.0,
  maturity_rating   VARCHAR(20) DEFAULT 'general',
  geo_restrictions  JSONB,
  metadata          JSONB,
  created_at        TIMESTAMP DEFAULT NOW(),
  updated_at        TIMESTAMP DEFAULT NOW(),
  deleted_at        TIMESTAMP,
  
  CONSTRAINT videos_tenant_id_fkey 
    FOREIGN KEY (tenant_id) 
    REFERENCES tenants(id) 
    ON DELETE CASCADE
);

-- Indexes for performance
CREATE INDEX idx_videos_tenant_created 
  ON videos(tenant_id, created_at DESC) 
  WHERE deleted_at IS NULL;

CREATE INDEX idx_videos_category 
  ON videos(tenant_id, category) 
  WHERE deleted_at IS NULL;

CREATE INDEX idx_videos_editorial_boost 
  ON videos(editorial_boost DESC) 
  WHERE deleted_at IS NULL;

Key Fields

  • editorial_boost: Multiplier set by CMS (1.0-2.0 range)
  • maturity_rating: Content filtering (general, teen, mature)
  • geo_restrictions: JSON array of allowed countries
  • metadata: Flexible JSON for creator info, tags, etc.

Index Strategy

  • idx_videos_tenant_created: Fast recent video lookups per tenant
  • idx_videos_category: Category filtering optimization
  • Partial index: Excludes deleted videos (soft delete pattern)

user_signals

Stores user interaction events for personalization. Auto-expires after 90 days.

CREATE TABLE user_signals (
  id              BIGSERIAL PRIMARY KEY,
  user_id_hash    VARCHAR(64) NOT NULL,
  tenant_id       VARCHAR(50) NOT NULL,
  video_id        UUID NOT NULL,
  event_type      VARCHAR(20) NOT NULL,
  engagement_pct  FLOAT,
  timestamp       TIMESTAMP DEFAULT NOW(),
  demographics    JSONB,
  
  CONSTRAINT user_signals_video_fkey 
    FOREIGN KEY (video_id) 
    REFERENCES videos(id) 
    ON DELETE CASCADE,
    
  CONSTRAINT valid_event_types 
    CHECK (event_type IN ('view', 'complete', 'skip', 'like', 'share'))
);

-- Composite index for user signal lookups
CREATE INDEX idx_user_signals_user_time 
  ON user_signals(user_id_hash, timestamp DESC);

CREATE INDEX idx_user_signals_video 
  ON user_signals(video_id, timestamp DESC);

-- Partition by month for efficient data expiration
CREATE TABLE user_signals_2026_01 PARTITION OF user_signals
  FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

CREATE TABLE user_signals_2026_02 PARTITION OF user_signals
  FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

-- Auto-delete old partitions with cron job
-- DROP TABLE user_signals_2025_10; -- 90 days ago

Event Types

  • view: User started watching (threshold: 3+ seconds)
  • complete: User watched >80% of video
  • skip: User stopped watching before 20%
  • like: Explicit positive signal
  • share: Strong engagement indicator

Privacy & Retention

  • user_id_hash: SHA-256 hash, never stores raw user IDs
  • 90-day retention: Monthly partitions dropped automatically
  • No PII: Demographics are aggregated only (age_group, not age)

Partitioning Strategy

  • Monthly partitions for efficient data expiration
  • Old partitions dropped instead of DELETE queries (much faster)
  • Reduces table bloat and maintains consistent performance

tenant_configs

Per-tenant configuration for personalization weights and feature flags.

CREATE TABLE tenant_configs (
  tenant_id                 VARCHAR(50) PRIMARY KEY,
  personalization_enabled   BOOLEAN DEFAULT true,
  personalization_weights   JSONB NOT NULL,
  content_filters           JSONB,
  rate_limit_rps            INTEGER DEFAULT 600,
  cache_ttl_seconds         INTEGER DEFAULT 60,
  created_at                TIMESTAMP DEFAULT NOW(),
  updated_at                TIMESTAMP DEFAULT NOW(),
  
  CONSTRAINT tenant_configs_tenant_fkey 
    FOREIGN KEY (tenant_id) 
    REFERENCES tenants(id) 
    ON DELETE CASCADE
);

-- Example personalization_weights JSON:
-- {
--   "watch_history": 0.5,
--   "engagement": 0.3,
--   "editorial": 0.2
-- }

-- Example content_filters JSON:
-- {
--   "max_maturity": "teen",
--   "excluded_categories": ["violent", "horror"],
--   "geo_allowed": ["US", "CA", "UK"]
-- }

Configuration Fields

  • personalization_enabled: Master kill switch per tenant
  • personalization_weights: Customizable scoring coefficients
  • content_filters: Maturity, category, geo restrictions
  • rate_limit_rps: Per-tenant rate limit override

Weight Constraints

  • All weights must sum to 1.0 (validated by application layer)
  • Default: watch_history=0.5, engagement=0.3, editorial=0.2
  • Tenants can customize to emphasize different signals

tenants

Master table for tenant (host app) metadata.

CREATE TABLE tenants (
  id              VARCHAR(50) PRIMARY KEY,
  name            VARCHAR(100) NOT NULL,
  api_key_hash    VARCHAR(64) NOT NULL UNIQUE,
  status          VARCHAR(20) DEFAULT 'active',
  tier            VARCHAR(20) DEFAULT 'standard',
  created_at      TIMESTAMP DEFAULT NOW(),
  updated_at      TIMESTAMP DEFAULT NOW(),
  
  CONSTRAINT valid_status 
    CHECK (status IN ('active', 'suspended', 'deleted')),
    
  CONSTRAINT valid_tier 
    CHECK (tier IN ('standard', 'premium', 'enterprise'))
);

CREATE INDEX idx_tenants_status 
  ON tenants(status) 
  WHERE status = 'active';

Entity Relationship Diagram

Interactive diagram showing relationships between database tables. Drag to pan and scroll to zoom.

Relationships

  • • Each tenant has many videos (1:N)
  • • Each tenant has one config (1:1)
  • • Each video has many user signals from different users (1:M)
  • • Signals scale with user activity, not video count
  • • Foreign keys enforce referential integrity
  • • Cascade deletes clean up dependent records

Query Patterns & Performance

Fetch User Signals

-- Retrieves last 30 days of user activity
SELECT video_id, event_type, engagement_pct, timestamp
FROM user_signals
WHERE user_id_hash = '06d6cbdcfc221d2f4460c17193442b9db221f30950f1c17af4e73e6e1788002b'
  AND timestamp > NOW() - INTERVAL '30 days'
ORDER BY timestamp DESC
LIMIT 100;

-- Performance: ~5ms with idx_user_signals_user_time
-- Returns: User's recent viewing history for scoring

Fetch Video Candidates

-- Get recent videos for a tenant
SELECT id, title, category, editorial_boost, created_at
FROM videos
WHERE tenant_id = 'tenant1'
  AND deleted_at IS NULL
  AND created_at > NOW() - INTERVAL '30 days'
ORDER BY created_at DESC
LIMIT 1000;

-- Performance: ~15ms with idx_videos_tenant_created
-- Returns: Pool of videos to score and rank

Category Affinity Analysis

-- Calculate user's category preferences
SELECT v.category, 
       COUNT(*) as view_count,
       AVG(us.engagement_pct) as avg_engagement
FROM user_signals us
JOIN videos v ON us.video_id = v.id
WHERE us.user_id_hash = '06d6cbdcfc221d2f4460c17193442b9db221f30950f1c17af4e73e6e1788002b'
  AND us.event_type IN ('view', 'complete')
  AND us.timestamp > NOW() - INTERVAL '30 days'
GROUP BY v.category
ORDER BY view_count DESC, avg_engagement DESC;

-- Performance: ~10ms with proper indexes
-- Used by: Ranking algorithm for category matching

Data Retention & Cleanup

1

90-Day User Signal Retention

Monthly partitions of user_signals table are automatically dropped after 90 days. Scheduled job runs daily to check and remove expired partitions.

2

Soft Delete Videos

Videos are soft-deleted (deleted_at timestamp) rather than hard-deleted to maintain referential integrity with user_signals. Hard delete after user_signals expire.

3

Vacuum & Analyze

PostgreSQL VACUUM and ANALYZE run weekly to reclaim space and update statistics for optimal query planning. Critical for maintaining sub-250ms query performance.