Legionis: Architecture Plan v3 (Definitive)

0.1 The Four Layers

┌─────────────────────────────────────────────────────────────────┐
│  LAYER 4: AGENT LAYER                                           │
│  81 specialists | 10 departments | Modular provisioning         │
│  Cross-department intelligence                                   │
│  Defensibility: MEDIUM (deep SKILL.md, knowledge packs)          │
├─────────────────────────────────────────────────────────────────┤
│  LAYER 3: COMMUNICATION LAYER  ★ KEY DIFFERENTIATOR             │
│  Intelligent routing | Agent-to-agent cascade                    │
│  Cooperation profiles | Invisible presentation                   │
│  Conversation context continuity                                 │
│  "The nervous system of the AI workforce"                        │
│  Defensibility: HIGH (no competitor has this)                    │
├─────────────────────────────────────────────────────────────────┤
│  LAYER 2: CONTEXT LAYER                                         │
│  Decisions & strategic bets | Feedback loop                      │
│  Cross-reference graph | Auto-context injection                  │
│  Stored in USER's cloud | Compounding value                     │
│  Month 1: Useful → Month 3: Valuable → Month 6+: Irreplaceable │
│  Defensibility: VERY HIGH (organic switching cost)               │
├─────────────────────────────────────────────────────────────────┤
│  LAYER 1: COMPUTE LAYER                                         │
│  BYOT (zero markup) | Quality Toggle (Haiku/Sonnet/Opus)        │
│  Managed Tokens (15% prepaid Token Banks per DR-2026-004)        │
│  Defensibility: LOW (commoditized by design)                     │
└───────────┬──────────────────┬──────────────────┬───────────────┘
            │                  │                  │
            ▼                  ▼                  ▼
    ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
    │ LLM Providers │  │Cloud Storage │  │Cloud Services│
    │ Anthropic     │  │Google Drive  │  │Stripe, Clerk │
    │ OpenAI        │  │OneDrive      │  │Neon, R2      │
    │ (user's keys) │  │Dropbox       │  │Typesense     │
    └──────────────┘  └──────────────┘  └──────────────┘

0.2 Layer Descriptions

Layer	Role	Maps To (Implementation)
Agent	81 specialists across 10 departments with deep SKILL.md files, knowledge packs, and team personalities. Modular provisioning lets users assemble exactly the team they need. Cross-department intelligence means agents from different teams can be invoked together.	Sections 2 (Agent Runtime), 3 (Prompt Compilation), 8 (Team Personalities), 9 (Knowledge Packs), 14 (Agent Roster)
Communication	The nervous system. Intelligent routing sends requests to the right agent. Agent-to-agent cascade enables delegation (consultation, delegation, review, debate). Cooperation profiles define how agents interact. Invisible presentation means the user sees a unified team, not plumbing. Conversation context continuity ensures agents share the thread.	Sections 4 (Gateway Orchestration), 2.4 (Sub-Agent Spawning), 7 (API Routes)
Context	Organizational memory that compounds. Decisions, strategic bets, assumptions, feedback, and learnings are stored in the user's cloud and cross-referenced. Auto-context injection means agents automatically recall relevant history before producing deliverables.	Sections 5 (Context Layer), 10 (Cloud Storage)
Compute	The transparent foundation. BYOT means users bring their own API keys with zero markup. Quality Toggle lets users choose cost/quality tradeoff (Haiku for speed, Sonnet for balance, Opus for depth). Managed Tokens via 15% prepaid Token Banks (DR-2026-004) provide a convenience option.	Sections 2.2 (BYOT Routing), 1.2 (Technology Stack)

0.3 Three External Connections

The platform connects to three categories of external services:

Connection	Services	Layer	User Owns?
LLM Providers	Anthropic, OpenAI (more via Vercel AI SDK)	Compute	Yes (BYOT keys)
Cloud Storage	Google Drive (MVP), OneDrive, Dropbox (Growth)	Context	Yes (their cloud account)
Cloud Services	Stripe, Clerk, Neon, R2, Typesense, Sentry, PostHog	All layers	No (platform infrastructure)

0.4 Defensibility Assessment

Layer	Defensibility	Rationale
Compute	Low	Commoditized by design. BYOT and cloud storage are transparency features, not moats. We chose trust over lock-in.
Agent	Medium	81 agents with deep SKILL.md files (300-440 lines each), 34 knowledge packs, and team personalities. Significant effort to replicate, but ultimately copyable given time.
Communication	High	Intelligent routing, agent-to-agent cascade, cooperation profiles, and invisible presentation. No competitor has built this. "Other platforms give you 10 specialists in 10 separate rooms. Legionis puts them in the same room."
Context	Very High	Organizational memory compounds organically through usage. After 3+ months of decisions, bets, feedback, and learnings, switching cost is earned, not engineered. The cross-reference graph creates intelligence that is unique to each customer's org.

Core thesis: "Intelligence is a commodity. Coordination is the moat."

0.5 Compounding Flywheel

The four layers create a reinforcing cycle:

User starts with one team (Agent Layer)
  → Agents collaborate via Communication Layer
    → Deliverables save to user's cloud (Context Layer)
      → Context accumulates across interactions
        → Trust deepens (they own everything)
          → Pay only what they use via BYOT (Compute Layer)
            → User adds more teams → cycle deepens

Each turn of the flywheel increases the value of all layers. The longer a user stays, the more irreplaceable the Context Layer becomes, and the more valuable the Communication Layer's ability to leverage that context across agents.

1. System Architecture Overview

1.1 High-Level Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                              CLIENT (Browser)                               │
│  Next.js 14+ App Router │ Tailwind + Radix │ Zustand + TanStack Query      │
│  Tiptap Editor │ EventSource (SSE) │ Meeting Mode Renderer                  │
└────────────────────────────────┬────────────────────────────────────────────┘
                                 │ HTTPS / SSE
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        VERCEL (Next.js API Routes)                          │
│                                                                             │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐   │
│  │ /api/chat     │  │ /api/plt     │  │ /api/gateway │  │ /api/skill   │   │
│  │ maxDur: 60s   │  │ maxDur: 120s │  │ maxDur: 120s │  │ maxDur: 30s  │   │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘   │
│         │                  │                  │                  │           │
│         ▼                  ▼                  ▼                  ▼           │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │                     AGENT RUNTIME ENGINE                            │    │
│  │                                                                     │    │
│  │  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐  │    │
│  │  │ Prompt Compiler   │  │ Tool Registry     │  │ Delegation       │  │    │
│  │  │ (3-layer cached)  │  │ (Cloud-safe)      │  │ Engine           │  │    │
│  │  └──────────────────┘  └──────────────────┘  └──────────────────┘  │    │
│  │                                                                     │    │
│  │  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐  │    │
│  │  │ Vercel AI SDK     │  │ BYOT Key Router   │  │ Auto-Context     │  │    │
│  │  │ generateText()    │  │ Per-request keys   │  │ Injector         │  │    │
│  │  │ streamText()      │  │ 24+ providers      │  │ (DB-backed)      │  │    │
│  │  └──────────────────┘  └──────────────────┘  └──────────────────┘  │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
│                                                                             │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐       │
│  │ Clerk Auth   │  │ Stripe      │  │ Sentry      │  │ PostHog     │       │
│  │ JWT + Orgs   │  │ Billing     │  │ Errors      │  │ Analytics   │       │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘       │
└────────────────────────────────────┬────────────────────────────────────────┘
                                     │
                 ┌───────────────────┼───────────────────┐
                 │                   │                   │
                 ▼                   ▼                   ▼
┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
│   Neon PostgreSQL │  │  Cloudflare R2   │  │   Typesense      │
│   (Context layer) │  │  (Prompts/files) │  │   (Full-text)    │
│   RLS per tenant  │  │  Zero egress     │  │   (Search)       │
└──────────────────┘  └──────────────────┘  └──────────────────┘
                 │
                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                   CLOUD STORAGE (User's Data)                    │
│  Google Drive API v3  │  OneDrive Graph API  │  Dropbox API v2  │
│  (MVP)                │  (Growth)            │  (Growth)        │
└──────────────────────────────────────────────────────────────────┘

1.2 Technology Stack Summary

Layer	Technology	Version	Rationale
Frontend	Next.js (App Router)	14+	SSR, streaming, Vercel-native
Styling	Tailwind CSS + Radix UI	4.x	Utility-first, accessible
State	Zustand + TanStack Query	Latest	Minimal client, smart server
Editor	Tiptap (ProseMirror)	Latest	Markdown, collaborative-ready
Agent SDK	Vercel AI SDK (`ai`)	6.x	Pure API, multi-provider, 7.87M/wk
Providers	`@ai-sdk/anthropic` + `@ai-sdk/openai`	Latest	Claude primary, OpenAI secondary
Database	Neon PostgreSQL	16	Serverless, branching, scale-to-zero
ORM	Drizzle	Latest	Type-safe, minimal abstraction
Storage	Cloudflare R2	N/A	S3-compatible, zero egress
Search	Typesense Cloud	Latest	Typo-tolerant, faceted
Auth	Clerk	Latest	Pre-built UI, social login, orgs
Payments	Stripe Billing	Latest	Usage-based, checkout, portal
Hosting	Vercel (Fluid Compute)	Latest	Next.js native, 300s default timeout
Monitoring	Sentry + Better Stack	Latest	Errors + logs + uptime
Analytics	PostHog	Free tier	Product analytics, feature flags

1.3 Infrastructure Cost

Service	Plan	Monthly Cost
Vercel Pro	1 seat	$20
Neon Launch	PostgreSQL	$19
Cloudflare R2	~10GB	$0.15
Typesense Starter	0.5GB RAM	$40
Clerk Pro	10K MAU free	$25
Sentry Team	Error tracking	$29
PostHog	Free tier	$0
Better Stack	Starter	$29
Total		~$162/mo

AI costs: $0 (users bring their own API keys)

2. Agent Runtime Architecture

2.1 Core Runtime: Vercel AI SDK

The agent runtime is built on Vercel AI SDK v6.x. Each agent interaction is a generateText() or streamText() call with custom tools and system prompts.

// lib/agent-runtime.ts
import { generateText, streamText, tool } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
export async function invokeAgent(params: AgentInvocation): Promise {
  const { agentKey, userMessage, workspaceId, apiKey, provider } = params;
  // 1. Compile system prompt (3-layer cached)
  const systemPrompt = await compilePrompt(agentKey, workspaceId, userMessage);
  // 2. Select model based on user's provider + key
  const model = selectModel(provider, apiKey);
  // 3. Load cloud-safe tools for this agent
  const tools = loadTools(agentKey, workspaceId);
  // 4. Execute agent
  const result = await generateText({
    model,
    system: systemPrompt,
    tools,
    maxSteps: 10,
    prompt: userMessage,
    providerOptions: {
      anthropic: { cacheControl: { type: 'ephemeral' } },
    },
    onStepFinish({ text, toolCalls, toolResults, usage }) {
      // Track per-step metrics
      trackStepMetrics(params.traceContext, { toolCalls, usage });
    },
  });
  // 5. Post-processing
  await postProcess(result, params);  return formatResult(result);
}

2.2 Model Selection and BYOT Routing

// lib/model-router.ts
function selectModel(provider: string, apiKey: string) {
  switch (provider) {
    case 'anthropic':
      return anthropic({ apiKey })('claude-sonnet-4-5');
    case 'openai':
      return openai({ apiKey })('gpt-4o');
    default:
      throw new Error(Unsupported provider: ${provider});
  }
}

Users configure their API keys per-provider. The key is decrypted at runtime from the user_api_keys table (envelope encryption with KMS) and passed per-request. This means:

Zero token costs as COGS
Per-user rate limits (their Anthropic/OpenAI tier)
Model flexibility (users choose their price/quality tradeoff)
Natural multi-tenancy (no shared API key concerns)

2.3 Cloud-Safe Tool Definitions

Six custom tools replace the CLI's local filesystem operations:

// lib/tools/index.ts
import { tool } from 'ai';
import { z } from 'zod';
export function createAgentTools(workspace: Workspace) {
  return {
    readFile: tool({
      description: 'Read a file from the workspace cloud storage',
      parameters: z.object({
        filePath: z.string().describe('Relative path within workspace'),
        fileId: z.string().optional().describe('Cloud storage file ID'),
      }),
      execute: async ({ filePath, fileId }) => {
        const id = fileId || await resolvePathToId(filePath, workspace);
        return await cloudStorage.readFile(id, workspace);
      },
    }),
    writeFile: tool({
      description: 'Write a file to the workspace cloud storage',
      parameters: z.object({
        filePath: z.string(),
        content: z.string(),
      }),
      execute: async ({ filePath, content }) => {
        const fileId = await cloudStorage.upsertFile(filePath, content, workspace);
        await searchIndex.upsertDocument(fileId, content, workspace.id);
        return { fileId, path: filePath };
      },
    }),
    editFile: tool({
      description: 'Make string replacements in a workspace file',
      parameters: z.object({
        filePath: z.string(),
        oldString: z.string(),
        newString: z.string(),
      }),
      execute: async ({ filePath, oldString, newString }) => {
        const content = await cloudStorage.readFile(filePath, workspace);
        const updated = content.replace(oldString, newString);
        return await cloudStorage.updateFile(filePath, updated, workspace);
      },
    }),
    globFiles: tool({
      description: 'Find files matching a pattern in the workspace',
      parameters: z.object({
        pattern: z.string(),
      }),
      execute: async ({ pattern }) => {
        return await cloudStorage.listFiles(pattern, workspace);
      },
    }),
    grepContent: tool({
      description: 'Search file contents in the workspace',
      parameters: z.object({
        query: z.string(),
        filePattern: z.string().optional(),
      }),
      execute: async ({ query, filePattern }) => {
        return await typesense.search(query, workspace.id, filePattern);
      },
    }),    spawnSubAgent: tool({
      description: 'Spawn a sub-agent for specialized input',
      parameters: z.object({
        agentId: z.string(),
        task: z.string(),
        delegationPattern: z.enum([
          'consultation', 'delegation', 'review', 'debate'
        ]).optional(),
      }),
      execute: async ({ agentId, task, delegationPattern }, context) => {
        return await spawnSubAgent({
          agentId,
          task,
          pattern: delegationPattern || 'consultation',
          parentContext: context,
          workspace,
        });
      },
    }),
  };
}

2.4 Sub-Agent Spawning (Delegation Protocol)

The OS v3 delegation protocol maps directly to the SaaS runtime:

// lib/delegation.ts
async function spawnSubAgent(params: SubAgentParams): Promise {
  const { agentId, task, pattern, parentContext, workspace } = params;
  // Enforce depth limit (max 2 levels)
  if (parentContext.depth >= 2) {
    return { error: 'Max sub-agent depth reached. Provide analysis inline.' };
  }
  // Load sub-agent persona
  const persona = await loadAgentPersona(agentId);
  // Build system prompt with delegation context
  const systemPrompt = await compilePrompt(agentId, workspace.id, task, {
    delegationPattern: pattern,
    parentAgent: parentContext.agentKey,
  });
  // Execute sub-agent
  const result = await generateText({
    model: selectModel(parentContext.provider, parentContext.apiKey),
    system: systemPrompt,
    tools: createAgentTools(workspace),
    maxSteps: 5, // Sub-agents get fewer steps
    prompt: buildDelegationPrompt(pattern, task),
  });
  // Log sub-agent invocation with parent trace
  await logInvocation({
    type: 'agent',
    agentOrSkill: agentId,
    parentSpanId: parentContext.spanId,
    requestId: parentContext.requestId,
    ...result.usage,
  });
  return { response: result.text, agentId };
}function buildDelegationPrompt(pattern: string, task: string): string {
  const prefixes: Record = {
    consultation: task,
    delegation: [DELEGATION] ${task},
    review: [REVIEW] ${task},
    debate: [DEBATE] ${task},
  };
  return prefixes[pattern] || task;
}

3. System Prompt Compilation Pipeline

3.1 Three-Layer Cached Architecture

The system prompt is structured for maximum cache efficiency with Anthropic's prompt caching:

┌──────────────────────────────────────────────────────────────┐ │ Layer 1: CORE PROTOCOL (~1,500 tokens) │ │ - Always cached (cache_control: ephemeral) │ │ - Compiled from 10 core rules into single document │ │ - Identical across all agents │ │ - Changes only on OS version updates │ ├──────────────────────────────────────────────────────────────┤ │ Layer 2: AGENT PERSONA (~500 tokens) │ │ - Cached per agent type per session │ │ - Extracted from SKILL.md identity sections │ │ - Includes team personality injection point │ │ - Changes only on agent definition updates │ ├──────────────────────────────────────────────────────────────┤ │ Layer 3: TASK CONTEXT (~200-500 tokens) │ │ - Domain rules (conditional on skill being invoked) │ │ - Auto-injected context (decisions, feedback, bets) │ │ - User's org context (company name, product areas) │ │ - Team personality principles (if configured) │ │ - Changes per request │ └──────────────────────────────────────────────────────────────┘

Total per call: ~2,200-2,500 tokens (vs. ~19,650 naive) Cost with caching: $0.71/mo per user (Sonnet) vs $35/mo naive

3.2 Build Pipeline: compile-prompts.ts

The build pipeline reads canonical SKILL.md files and compiles them for SaaS deployment. This follows the "one source, two build targets" principle: the same SKILL.md files serve both CLI (full content) and SaaS (extracted/compressed).

// scripts/compile-prompts.ts
// Run at build time or on OS version update
import { readFile, writeFile } from 'fs/promises';
import { glob } from 'fast-glob';
interface CompiledPrompt {
  agentKey: string;
  layer1CoreProtocol: string;   // Shared across all agents
  layer2Persona: string;         // Per-agent identity
  domainRules: Record;  // Loaded conditionally
  metadata: AgentMetadata;
}
// Step 1: Compile Core Protocol from 10 Tier 1 rules
async function compileCoreProtocol(): Promise {
  const rules = [
    'agent-spawn-protocol.md',   // Response format, identity
    'no-estimates.md',           // No fabricated numbers
    'v2v-flow.md',               // 6-phase summary
    'context-management.md',     // Save/recall/capture
    'intelligent-routing.md',    // Domain routing
    'delegation-protocol.md',    // 4 delegation patterns
    'principles-enforcement.md', // 8 operating principles
    'meeting-mode.md',           // Multi-agent presentation
    'parallel-execution.md',     // When to parallelize
    'skill-awareness.md',        // Omitted (covered by persona)
  ];
  // Read and extract essential sections from each rule
  // Compile into a single ~1,500 token document
  const sections = await Promise.all(
    rules.map(r => extractEssentials(rules/${r}))
  );
  return ## Agent Operating Protocol\n\n${sections.join('\n\n')};
}
// Step 2: Extract agent persona from SKILL.md
async function extractPersona(skillPath: string): Promise {
  const content = await readFile(skillPath, 'utf-8');
  const parsed = parseSkillMd(content);
  // Extract ONLY identity-essential sections (~80-100 lines → ~500 tokens)
  return [
    # ${parsed.emoji} ${parsed.displayName},
    '',
    ## Identity,
    parsed.coreAccountability,
    '',
    ## How I Think,
    parsed.howIThink,       // 3-5 bullets, unique per agent
    '',
    ## RACI,
    parsed.raci,            // A/R/C items
    '',
    ## Key Deliverables,
    parsed.deliverables,    // Table of 4-5 items
    '',
    ## Collaboration,
    parsed.collaboration,   // 3-4 key relationships
    '',
    ## Primary Skills,
    parsed.skills,          // 5-7 skills with when-to-use
    '',
    ## V2V Phase,
    parsed.primaryPhases,
  ].join('\n');
}
// Step 3: Compile domain rules (Tier 2)
async function compileDomainRules(): Promise> {
  return {
    decisions: await condense('rules/decision-system.md', 300),
    strategy: await condense('rules/strategy-documents.md', 300),
    roadmaps: await condense('rules/roadmaps.md', 300),
    gtm: await condense('rules/gtm-documents.md', 300),
    requirements: await condense('rules/requirements.md', 300),
    context: await condense('rules/auto-context.md', 200)
      + '\n' + await condense('rules/context-graph.md', 200),
  };
}
// Step 4: Main compilation
async function main() {
  const coreProtocol = await compileCoreProtocol();
  // All OS agents
  const osAgentPaths = await glob('skills/*/SKILL.md');
  const osPersonas = await Promise.all(
    osAgentPaths.map(async path => ({
      key: extractAgentKey(path),
      persona: await extractPersona(path),
      metadata: await extractMetadata(path),
    }))
  );
  // All Extension Team agents
  const extAgentPaths = await glob('Extension Teams/*/SKILL.md');
  const extPersonas = await Promise.all(
    extAgentPaths.map(async path => ({
      key: extractAgentKey(path),
      persona: await extractPersona(path),
      metadata: await extractMetadata(path),
    }))
  );
  const domainRules = await compileDomainRules();
  // Output compiled prompts for SaaS deployment
  const output: CompiledPrompts = {
    coreProtocol,
    agents: [...osPersonas, ...extPersonas],
    domainRules,
    version: getOsVersion(),
    compiledAt: new Date().toISOString(),
  };
  // Write to R2-deployable format
  await writeFile('compiled/prompts.json', JSON.stringify(output, null, 2));
  // Also write SQL migration for prompt_templates table
  await generatePromptMigration(output);  console.log(Compiled ${output.agents.length} agents);
  console.log(Core protocol: ${countTokens(coreProtocol)} tokens);
  console.log(Domain rules: ${Object.keys(domainRules).length} domains);
}

3.3 Runtime Prompt Assembly

// lib/prompt-compiler.ts
import { getCompiledPrompts } from './compiled-prompts';
const promptCache = new Map();
export async function compilePrompt(
  agentKey: string,
  workspaceId: string,
  userMessage: string,
  options?: {
    delegationPattern?: string;
    parentAgent?: string;
    teamPersonality?: TeamPersonality;
  }
): Promise {
  const compiled = getCompiledPrompts();
  const messages: SystemMessage[] = [];
  // Layer 1: Core Protocol (always cached, identical across agents)
  messages.push({
    role: 'system',
    content: compiled.coreProtocol,
    providerOptions: {
      anthropic: { cacheControl: { type: 'ephemeral' } },
    },
  });
  // Layer 2: Agent Persona (cached per agent type)
  const agent = compiled.agents.find(a => a.key === agentKey);
  if (!agent) throw new Error(Unknown agent: ${agentKey});
  let personaContent = agent.persona;
  // Inject team personality if configured (see Section 8)
  if (options?.teamPersonality) {
    personaContent += \n\n## Team Operating Principles\n${options.teamPersonality.principles};
  }
  messages.push({
    role: 'system',
    content: personaContent,
    providerOptions: {
      anthropic: { cacheControl: { type: 'ephemeral' } },
    },
  });
  // Layer 3: Task Context (per-request, partially cached)
  const taskContext = await buildTaskContext(
    agentKey, workspaceId, userMessage, options
  );
  if (taskContext) {
    messages.push({
      role: 'system',
      content: taskContext,
      providerOptions: {
        anthropic: { cacheControl: { type: 'ephemeral' } },
      },
    });
  }
  return messages;
}
async function buildTaskContext(
  agentKey: string,
  workspaceId: string,
  userMessage: string,
  options?: any
): Promise {
  const parts: string[] = [];
  // Domain rules (based on detected skill)
  const skill = detectSkillFromMessage(userMessage);
  if (skill) {
    const domain = skillToDomain(skill);
    const compiled = getCompiledPrompts();
    if (compiled.domainRules[domain]) {
      parts.push(compiled.domainRules[domain]);
    }
  }
  // Auto-context injection (from database)
  const topics = extractTopics(userMessage);
  if (topics.length > 0) {
    const context = await queryAutoContext(workspaceId, topics);
    if (context) parts.push(context);
  }
  // Delegation context
  if (options?.delegationPattern) {
    parts.push(Delegation: [${options.delegationPattern.toUpperCase()}] from ${options.parentAgent});
  }  return parts.length > 0 ? parts.join('\n\n---\n\n') : null;
}

3.4 Token Budget Enforcement

Component	Target	Hard Limit	Notes
Core Protocol (L1)	1,500	2,000	Must exceed 1,024 for Sonnet caching
Agent Persona (L2)	500	700	Identity-essential content only
Domain Rules (L3)	300	500	Loaded conditionally per skill
Auto-Context (L3)	200	500	Max 5 context items
Team Personality (L3)	100	200	Principles injection
Total System Prompt	2,500	3,700	Per API call

3.5 Cost Impact (Anthropic Prompt Caching)

Scenario	Sonnet Monthly	vs. Naive
Full uncompressed, no caching	$35.37	Baseline
Compressed, no caching	$4.50	-87%
Compressed + cached	$0.71	-98%

4. Gateway Orchestration (PLT Meeting Mode)

4.1 Architecture

Gateways coordinate multi-agent sessions. The PLT (Product Leadership Team) is the most complex, spawning 3-4 agents in parallel and synthesizing their perspectives.

User Request: "@plt Should we delay launch for SSO?"
                    │
                    ▼
┌──────────────────────────────────────────────────────────────┐
│                    PLT GATEWAY HANDLER                        │
│                    maxDuration: 120s                          │
│                                                              │
│  1. Assess complexity → FULL PLT                             │
│  2. Select agents: vp-product, dir-pm, dir-pmm, prod-ops    │
│  3. Auto-context: query "launch", "SSO" from context DB     │
│                                                              │
│  4. PARALLEL EXECUTION (Promise.all)                         │
│     ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐    │
│     │ VP Prod   │ │ Dir PM   │ │ Dir PMM  │ │ ProdOps  │    │
│     │ maxSteps:5│ │ maxSteps:5│ │ maxSteps:5│ │ maxSteps:5│  │
│     │ ~15-25s   │ │ ~15-25s  │ │ ~15-25s  │ │ ~15-25s  │    │
│     └──────────┘ └──────────┘ └──────────┘ └──────────┘    │
│              Wall clock: max(agent times) ≈ 20-30s           │
│                                                              │
│  5. FORMAT: Meeting Mode (each agent speaks in first person) │
│  6. SYNTHESIZE: VP Product summarizes agreement/tension      │
│  7. LOG: agent_invocations table + ROI tracking              │
└──────────────────────────────────────────────────────────────┘

4.2 Implementation

// app/api/plt/route.ts
export const maxDuration = 120; // 2 minutes, generous for PLT
export async function POST(request: Request) {
  const { topic, workspaceId } = await request.json();
  const { apiKey, provider } = await getUserApiKey(request);
  const model = selectModel(provider, apiKey);
  // Assess complexity and select agents
  const agentIds = assessPLTComplexity(topic);
  // e.g., ['vp-product', 'director-product-management',
  //        'director-product-marketing', 'product-operations']
  // Auto-context injection
  const autoContext = await queryAutoContext(workspaceId, extractTopics(topic));
  // Phase 1: Parallel agent execution
  const agentResults = await Promise.all(
    agentIds.map(agentId => {
      const systemPrompt = compilePrompt(agentId, workspaceId, topic);
      return generateText({
        model,
        system: systemPrompt,
        tools: createAgentTools(workspace),
        maxSteps: 5,
        prompt: ${autoContext ? ## Auto-Context\n${autoContext}\n\n : ''}${topic},
      });
    })
  );
  // Phase 2: Format Meeting Mode
  const meetingMode = formatMeetingMode(agentIds, agentResults);
  // Phase 3: Synthesis (optional — can also stream)
  const synthesizer = agentIds[0]; // VP Product typically synthesizes
  const synthesis = await generateText({
    model,
    system: You are ${getAgentIdentity(synthesizer).displayName}. Synthesize the PLT discussion.,
    prompt: Synthesize these perspectives:\n\n${meetingMode},
    maxSteps: 1,
  });
  // Post-processing
  await logGatewaySession({
    type: 'gateway',
    gateway: 'plt',
    agentsSpawned: agentIds,
    requestId: createRequestId(),
    workspaceId,
  });  return Response.json({
    meetingMode,
    synthesis: synthesis.text,
    roi: calculatePLTRoi(agentResults),
  });
}

4.3 Vercel Constraints (Resolved)

Concern	Resolution
Timeout	Fluid Compute: 300s default, 800s max. PLT targets <60s p95. 5x headroom.
Memory	4GB / 2 vCPU on Pro. Adequate for 4 parallel agents.
Cost	~$0.00035/session (Active CPU billing pauses during I/O waits).
Payload	4.5MB limit is client→Vercel only. LLM calls are server-side.
Rate limits	Per-user BYOT keys. Max 1 PLT session at a time per user.

4.4 maxDuration Configuration

// Per-route timeout configuration
// app/api/chat/route.ts
export const maxDuration = 60;      // Single agent: 1 minute
// app/api/plt/route.ts
export const maxDuration = 120;     // PLT sessions: 2 minutes
// app/api/gateway/[gateway]/route.ts
export const maxDuration = 120;     // All gateways: 2 minutes// app/api/skill/[skill]/route.ts
export const maxDuration = 30;      // Skills: 30 seconds

4.5 maxSteps Configuration

Context	maxSteps	Rationale
Single agent (standalone)	10	Full tool loop capability
PLT-spawned agents	5	Bounds total PLT time
Skill execution	1	Skills are single-step
Sub-agent (delegation)	5	Focused tasks

5. Context Layer Implementation

5.1 CLI-to-Cloud Mapping

The context layer migrates from flat files to PostgreSQL while preserving the same semantics:

CLI Operation	CLI Implementation	Cloud Implementation
`/context-save`	Parse + update index.md + write file + update index.json	INSERT into decisions/bets/learnings + INSERT cross_references
`/context-recall`	Read index.json + filter topics	SELECT with GIN index on topics[] + Typesense
`/portfolio-status`	Read active-bets.md	SELECT portfolio_state JOIN strategic_bets
`/feedback-capture`	Write to context/feedback/	INSERT into feedback + auto-theme matching
Auto-registration	Write to documents/index.md	INSERT into documents table
Cross-references	Update crossReferences in index.json	INSERT into cross_references table

5.2 Auto-Context Injection (Database-Backed)

Before agents produce deliverables, relevant context is automatically injected:

// lib/auto-context.ts
export async function queryAutoContext(
  workspaceId: string,
  topics: string[]
): Promise {
  if (topics.length === 0) return null;
  // Query decisions, bets, and feedback in parallel
  const [decisions, bets, feedback] = await Promise.all([
    db.select()
      .from(schema.decisions)
      .where(and(
        eq(schema.decisions.workspaceId, workspaceId),
        arrayOverlaps(schema.decisions.topics, topics),
        isNull(schema.decisions.archivedAt),
      ))
      .orderBy(desc(schema.decisions.createdAt))
      .limit(5),
    db.select()
      .from(schema.strategicBets)
      .where(and(
        eq(schema.strategicBets.workspaceId, workspaceId),
        arrayOverlaps(schema.strategicBets.topics, topics),
        eq(schema.strategicBets.status, 'active'),
      ))
      .limit(3),
    db.select()
      .from(schema.feedback)
      .where(and(
        eq(schema.feedback.workspaceId, workspaceId),
        arrayOverlaps(schema.feedback.topics, topics),
      ))
      .orderBy(desc(schema.feedback.createdAt))
      .limit(5),
  ]);
  if (!decisions.length && !bets.length && !feedback.length) {
    return null;
  }  return formatAutoContext({ decisions, bets, feedback });
}

5.3 Cross-Reference Graph (Relational)

The CLI's JSON-based cross-references become a proper relational graph:

-- Find all connected items for a decision (bidirectional, 1 hop)
SELECT
  CASE
    WHEN cr.source_type = 'decision' AND cr.source_id = $1 THEN cr.target_type
    ELSE cr.source_type
  END as related_type,
  CASE
    WHEN cr.source_type = 'decision' AND cr.source_id = $1 THEN cr.target_id
    ELSE cr.source_id
  END as related_id,
  cr.relationship
FROM cross_references cr
WHERE cr.workspace_id = $2
  AND (
    (cr.source_type = 'decision' AND cr.source_id = $1)
    OR (cr.target_type = 'decision' AND cr.target_id = $1)
  );

5.4 Interaction Logging

Agent invocations are logged to the agent_invocations table with distributed tracing:

// lib/interaction-logger.ts
export async function logInvocation(params: InvocationLog) {
  await db.insert(schema.agentInvocations).values({
    workspaceId: params.workspaceId,
    userId: params.userId,
    conversationId: params.conversationId,
    invocationType: params.type,
    agentOrSkill: params.agentOrSkill,
    requestSummary: params.requestSummary,
    status: params.status,
    requestId: params.requestId,
    spanId: params.spanId,
    parentSpanId: params.parentSpanId,
    modelUsed: params.modelUsed,
    tokensIn: params.tokensIn,
    tokensOut: params.tokensOut,
    durationMs: params.durationMs,
    toolsUsed: params.toolsUsed,
    agentsSpawned: params.agentsSpawned,
    complexity: params.complexity,
    roiMinutesSaved: params.roiMinutesSaved,
    filesCreated: params.filesCreated,
    contextEntriesCreated: params.contextEntriesCreated,
  });
}

6. Data Model Updates (v3 Additions)

6.1 Team Personalities Table (NEW)

-- Team personality definitions CREATE TABLE team_personalities ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), slug VARCHAR(50) NOT NULL UNIQUE, -- 'v2v-operators', 'user-obsessed', etc. team VARCHAR(50) NOT NULL, -- 'product', 'design', 'architecture', 'marketing' name VARCHAR(255) NOT NULL, -- "Vision to Value Operators" personality_tag VARCHAR(100) NOT NULL, -- 2-3 word descriptor philosophy TEXT NOT NULL, -- 1-2 paragraph worldview principles JSONB NOT NULL DEFAULT '[]', -- Array of {id, name, statement, enforcement} version VARCHAR(20) NOT NULL DEFAULT '1.0.0', is_default BOOLEAN NOT NULL DEFAULT false, -- Default personality for the team created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() );

-- Default personalities for each team INSERT INTO team_personalities (slug, team, name, personality_tag, philosophy, principles, is_default) VALUES ('v2v-operators', 'product', 'V2V Operating System', 'Vision to Value Operators', 'Product organizations exist to convert strategic vision into measurable customer value...', '[{"id":"P1","name":"End-to-End Ownership","statement":"..."},...]', true), ('user-obsessed', 'design', 'Design Operating Principles', 'User-Obsessed Craftspeople', 'Design exists to make the complex simple and the simple delightful...', '[{"id":"D1","name":"User-First Always","statement":"..."},...]', true), ('pragmatic-thinkers', 'architecture', 'Architecture Operating Principles', 'Pragmatic System Thinkers', 'Architecture exists to enable business capability through technology...', '[{"id":"A1","name":"Simplicity Over Cleverness","statement":"..."},...]', true), ('data-storytellers', 'marketing', 'Marketing Operating Principles', 'Data-Driven Storytellers', 'Marketing exists to connect product value with customer need...', '[{"id":"M1","name":"Customer Truth","statement":"..."},...]', true);

6.2 Workspace Personality Configuration (NEW)

-- Workspace-level personality overrides
CREATE TABLE workspace_personalities (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  workspace_id    UUID NOT NULL REFERENCES workspaces(id) ON DELETE CASCADE,
  team            VARCHAR(50) NOT NULL,         -- 'product', 'design', etc.
  personality_id  UUID NOT NULL REFERENCES team_personalities(id),
  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  UNIQUE(workspace_id, team)
);-- RLS
ALTER TABLE workspace_personalities ENABLE ROW LEVEL SECURITY;
CREATE POLICY workspace_isolation ON workspace_personalities
  USING (workspace_id = current_setting('app.current_workspace_id')::uuid);
ALTER TABLE workspace_personalities FORCE ROW LEVEL SECURITY;

6.3 Knowledge Packs Table (Updated)

-- Knowledge packs with team attribution CREATE TABLE knowledge_packs ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), slug VARCHAR(50) NOT NULL UNIQUE, name VARCHAR(255) NOT NULL, description TEXT, team VARCHAR(50), -- 'product', 'design', 'architecture', 'marketing', NULL=cross-team content TEXT NOT NULL, -- Full markdown content primary_agents TEXT[] NOT NULL DEFAULT '{}', version VARCHAR(20) NOT NULL DEFAULT '1.0.0', token_count INTEGER, -- Pre-computed for budget enforcement updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() );

-- Seed with 22 knowledge packs (9 OS + 13 Extension Teams) -- OS packs: prioritization, pricing-frameworks, discovery-methods, -- metrics-frameworks, competitive-frameworks, gtm-playbooks, -- stakeholder-management, user-research, financial-modeling -- Design packs: design-systems, user-research-methods, accessibility, interaction-patterns -- Architecture packs: api-design, data-architecture, security-patterns, cloud-native -- Marketing packs: content-strategy, seo-frameworks, analytics-methodology, -- brand-management, campaign-optimization

6.4 Agent Registry Table (NEW)

-- Unified registry of all 39 agents + 5 gateways
CREATE TABLE agent_registry (
  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  agent_key       VARCHAR(100) NOT NULL UNIQUE,   -- 'product-manager', 'ui-designer', etc.
  emoji           VARCHAR(10) NOT NULL,            -- Agent emoji
  display_name    VARCHAR(255) NOT NULL,           -- "Product Manager"
  short_name      VARCHAR(50) NOT NULL,            -- "PM"
  team            VARCHAR(50) NOT NULL,            -- 'product', 'design', 'architecture', 'marketing'
  agent_type      VARCHAR(20) NOT NULL             -- 'agent', 'gateway'
                    CHECK (agent_type IN ('agent', 'gateway')),
  persona_template_id UUID REFERENCES prompt_templates(id),
  knowledge_packs TEXT[] NOT NULL DEFAULT '{}',     -- Slugs of applicable knowledge packs
  primary_skills  TEXT[] NOT NULL DEFAULT '{}',     -- Skills this agent primarily uses
  domain_routing  TEXT[] NOT NULL DEFAULT '{}',     -- Keywords for auto-routing
  is_active       BOOLEAN NOT NULL DEFAULT true,
  tier_required   VARCHAR(20) NOT NULL DEFAULT 'trial'  -- 'trial', 'individual', 'team', 'enterprise'
                    CHECK (tier_required IN ('trial', 'individual', 'team', 'enterprise')),
  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
);-- Indexes for routing
CREATE INDEX idx_agent_team ON agent_registry(team);
CREATE INDEX idx_agent_domain ON agent_registry USING GIN(domain_routing);
CREATE INDEX idx_agent_type ON agent_registry(agent_type);

6.5 Interaction History Extension (Updated for v3)

The agent_invocations table from the original data model already handles interaction logging. The v3 addition is the delegation pattern tracking:

-- Add delegation tracking to agent_invocations
ALTER TABLE agent_invocations
  ADD COLUMN delegation_pattern VARCHAR(20)
    CHECK (delegation_pattern IN ('consultation', 'delegation', 'review', 'debate')),
  ADD COLUMN delegation_from VARCHAR(100),  -- Parent agent key
  ADD COLUMN delegation_deliverable TEXT;   -- What was delegated

7. API Route Architecture

7.1 Route Map

app/
├── api/
│   ├── chat/
│   │   └── route.ts              maxDuration: 60   Single agent conversation
│   ├── plt/
│   │   └── route.ts              maxDuration: 120  PLT Meeting Mode
│   ├── gateway/
│   │   └── [gateway]/
│   │       └── route.ts          maxDuration: 120  Any gateway (@product, @design, etc.)
│   ├── skill/
│   │   └── [skill]/
│   │       └── route.ts          maxDuration: 30   Skill invocation
│   ├── context/
│   │   ├── save/route.ts                           /context-save
│   │   ├── recall/route.ts                         /context-recall
│   │   ├── portfolio/route.ts                      /portfolio-status
│   │   ├── feedback/
│   │   │   ├── capture/route.ts                    /feedback-capture
│   │   │   └── recall/route.ts                     /feedback-recall
│   │   └── graph/route.ts                          Cross-reference queries
│   ├── workspace/
│   │   ├── route.ts                                CRUD workspaces
│   │   ├── connect/
│   │   │   └── [provider]/route.ts                 OAuth flows (Google Drive, etc.)
│   │   └── files/
│   │       └── route.ts                            File browser
│   ├── agents/
│   │   └── route.ts                                List available agents
│   ├── keys/
│   │   └── route.ts                                BYOT key management
│   ├── usage/
│   │   └── route.ts                                Usage dashboard
│   └── webhooks/
│       ├── clerk/route.ts                          User sync
│       └── stripe/route.ts                         Billing events

7.2 Middleware Stack

// middleware.ts
import { clerkMiddleware } from '@clerk/nextjs/server';
import { NextResponse } from 'next/server';
export default clerkMiddleware(async (auth, req) => {
  // 1. Rate limiting (in-memory + Redis fallback)
  const rateLimitResult = await checkRateLimit(req);
  if (!rateLimitResult.allowed) {
    return NextResponse.json(
      { error: { code: 'rate_limited', message: 'Rate limit exceeded' } },
      { status: 429, headers: { 'Retry-After': rateLimitResult.retryAfter } }
    );
  }
  // 2. Workspace context injection
  const workspaceId = req.headers.get('X-Workspace-ID');
  if (workspaceId && req.nextUrl.pathname.startsWith('/api/')) {
    // Set PostgreSQL RLS context
    await setWorkspaceContext(db, workspaceId);
  }
  // 3. Distributed tracing
  const requestId = req.headers.get('X-Request-ID') || crypto.randomUUID();
  const response = NextResponse.next();
  response.headers.set('X-Request-ID', requestId);  return response;
});

7.3 SSE Event Contract

type SSEEvent =
  | { type: 'token'; data: { text: string } }
  | { type: 'tool_call_start'; data: { tool: string; id: string } }
  | { type: 'tool_call_result'; data: { id: string; result: unknown } }
  | { type: 'agent_start'; data: { agent: string; emoji: string; display_name: string } }
  | { type: 'agent_complete'; data: { agent: string; roi_minutes: number } }
  | { type: 'file_created'; data: { file_id: string; path: string; action: string } }
  | { type: 'context_saved'; data: { type: string; id: string } }
  | { type: 'error'; data: { code: string; message: string; recoverable: boolean } }
  | { type: 'done'; data: { tokens_in: number; tokens_out: number; duration_ms: number } };

8. Team Personalities Infrastructure

8.1 Design Decisions

Aspect	Decision	Rationale
Phase 1 UX	None — Infrastructure only	Reduce MVP scope, no settings UI needed
Data model	Separate `team_personalities` table	Clean separation, swappable at query time
Prompt injection	Appended to Layer 2 (Agent Persona)	Minimal token overhead (~100 tokens)
Defaults	One default personality per team	Works out of the box
Swapping	API layer supports it; UI deferred	Infrastructure ready for Phase 2 UX

8.2 How Personalities Flow Through the System

Agent Spawn Request
       │
       ▼
┌──────────────────────────────────────────────────────┐
│  1. Look up agent in agent_registry                  │
│     → Get team: 'product'                            │
│                                                      │
│  2. Look up workspace personality override            │
│     workspace_personalities WHERE team = 'product'   │
│     → If found: use override personality_id          │
│     → If not: use default from team_personalities    │
│                                                      │
│  3. Load personality principles                       │
│     team_personalities WHERE id = personality_id     │
│     → Get principles JSON array                      │
│                                                      │
│  4. Inject into Layer 2 of system prompt             │
│     Append to agent persona:                         │
│     "## Team Operating Principles                    │
│      Personality: Vision to Value Operators           │
│      P1: End-to-End Ownership — ...                  │
│      P2: Decision Quality — ..."                     │
└──────────────────────────────────────────────────────┘

8.3 API Endpoints (Infrastructure, No UX)

// Phase 1: Read-only API for personalities
// app/api/personalities/route.ts
export async function GET(req: Request) {
  // List all available personalities
  const personalities = await db.select()
    .from(schema.teamPersonalities)
    .orderBy(schema.teamPersonalities.team);
  return Response.json({ data: personalities });
}
// app/api/workspace/[id]/personality/route.ts
export async function GET(req: Request) {
  // Get current workspace personality config
  const config = await db.select()
    .from(schema.workspacePersonalities)
    .where(eq(schema.workspacePersonalities.workspaceId, workspaceId));
  return Response.json({ data: config });
}export async function PUT(req: Request) {
  // Set workspace personality (for future UI)
  const { team, personalityId } = await req.json();
  await db.insert(schema.workspacePersonalities)
    .values({ workspaceId, team, personalityId })
    .onConflictDoUpdate({ target: [schema.workspacePersonalities.workspaceId, schema.workspacePersonalities.team] });
  return Response.json({ data: { success: true } });
}

8.4 Prompt Injection Format

When a team personality is active, it adds ~100 tokens to the agent persona (Layer 2):

```markdown

Team Operating Principles

Personality: Vision to Value Operators

End-to-End Ownership — One person accountable from vision to value

Decision Quality — Structured decisions under pressure

Customer Obsession — Start with customer, trace back to feature

Strategic Clarity — Clear bets with explicit assumptions

Outcome Focus — Measure outcomes, not outputs

Collaborative Excellence — Right people, right input, right time

Continuous Learning — Every outcome teaches something

Scalable Systems — Processes that grow with the org

```

9. Knowledge Pack Loading Strategy

9.1 Loading Rules

Knowledge packs are loaded on-demand based on the agent and task:

// lib/knowledge-loader.ts
export async function loadKnowledgePacks(
  agentKey: string,
  detectedSkill?: string
): Promise {
  // Get agent's primary knowledge packs
  const agent = await db.select()
    .from(schema.agentRegistry)
    .where(eq(schema.agentRegistry.agentKey, agentKey))
    .limit(1);
  const packSlugs = agent[0]?.knowledgePacks || [];
  // Load packs from DB or R2 cache
  const packs = await db.select()
    .from(schema.knowledgePacks)
    .where(inArray(schema.knowledgePacks.slug, packSlugs));  // Budget enforcement: max 2 packs per invocation, prioritize by relevance
  const sorted = rankByRelevance(packs, detectedSkill);
  return sorted.slice(0, 2).map(p => p.content);
}

9.2 Pack Inventory

Pack	Team	Primary Agents	Est. Tokens
prioritization	product	@pm, @pm-dir	~1,200
pricing-frameworks	product	@bizops, @vp-product	~1,100
discovery-methods	product	@pm, @ux-lead	~1,000
metrics-frameworks	product	@bizops, @value-realization	~1,100
competitive-frameworks	product	@ci, @pmm-dir	~1,000
gtm-playbooks	product	@pmm, @pmm-dir	~1,200
stakeholder-management	product	@pm-dir, @prod-ops	~900
user-research	product	@ux-lead, @pm	~1,100
financial-modeling	product	@bizops, @bizdev	~1,000
design-systems	design	@ui-designer, @visual-designer	~1,000
user-research-methods	design	@user-researcher	~1,100
accessibility	design	@ui-designer	~800
interaction-patterns	design	@interaction-designer	~900
api-design	architecture	@api-architect	~1,000
data-architecture	architecture	@data-architect	~1,100
security-patterns	architecture	@security-architect	~900
cloud-native	architecture	@cloud-architect	~1,000
content-strategy	marketing	@content-strategist	~1,000
seo-frameworks	marketing	@seo-specialist	~900
analytics-methodology	marketing	@analytics-specialist	~1,000
brand-management	marketing	@brand-strategist	~900
campaign-optimization	marketing	@paid-media, @email-marketing	~1,000

Knowledge packs are NOT included in the cached system prompt (they'd break the cache key). Instead, they're loaded into the conversation context when the agent's task requires framework application.

10. Cloud Storage Abstraction

10.1 Provider Abstraction Layer

// lib/cloud-storage/interface.ts
export interface CloudStorageProvider {
  readFile(fileId: string): Promise;
  writeFile(path: string, content: string, parentFolderId: string): Promise;
  updateFile(fileId: string, content: string): Promise;
  deleteFile(fileId: string): Promise;
  listFiles(folderId: string, query?: string): Promise;
  getMetadata(fileId: string): Promise;
  resolvePathToId(path: string, rootFolderId: string): Promise;
  createFolder(name: string, parentId: string): Promise;
}
// lib/cloud-storage/google-drive.ts
export class GoogleDriveProvider implements CloudStorageProvider {
  constructor(private accessToken: string) {}
  async readFile(fileId: string): Promise {
    const response = await fetch(
      https://www.googleapis.com/drive/v3/files/${fileId}?alt=media,
      { headers: { Authorization: Bearer ${this.accessToken} } }
    );
    return response.text();
  }
  async writeFile(path: string, content: string, parentFolderId: string): Promise {
    const metadata = {
      name: path.split('/').pop(),
      parents: [parentFolderId],
      mimeType: 'text/markdown',
    };
    // Multipart upload
    const form = new FormData();
    form.append('metadata', new Blob([JSON.stringify(metadata)], { type: 'application/json' }));
    form.append('file', new Blob([content], { type: 'text/markdown' }));
    const response = await fetch(
      'https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart',
      { method: 'POST', headers: { Authorization: Bearer ${this.accessToken} }, body: form }
    );
    const result = await response.json();
    return result.id;
  }  // ... other methods
}

10.2 Token Management

OAuth tokens are encrypted with envelope encryption and auto-refreshed:

// lib/cloud-storage/token-manager.ts
export async function getProviderToken(
  workspaceId: string,
  provider: string
): Promise {
  const integration = await db.select()
    .from(schema.connectedIntegrations)
    .where(and(
      eq(schema.connectedIntegrations.workspaceId, workspaceId),
      eq(schema.connectedIntegrations.provider, provider),
    ))
    .limit(1);
  if (!integration[0]) throw new Error(No ${provider} connection);
  const token = await db.select()
    .from(schema.integrationTokens)
    .where(eq(schema.integrationTokens.integrationId, integration[0].id))
    .limit(1);
  // Decrypt access token
  let accessToken = await decrypt(token[0].encryptedAccessToken, token[0].encryptedDek);
  // Check expiry and refresh if needed
  if (token[0].expiresAt && new Date(token[0].expiresAt) < new Date()) {
    const refreshToken = await decrypt(token[0].encryptedRefreshToken, token[0].encryptedDek);
    accessToken = await refreshOAuthToken(provider, refreshToken);
    await updateEncryptedToken(token[0].id, accessToken);
  }  return accessToken;
}

10.3 Workspace File Structure

Files in the user's cloud storage follow the OS context structure:

User's Google Drive/
└── Legionis Workspace/           ← User-selected folder
    ├── context/
    │   ├── decisions/                 ← DR-YYYY-NNN.md files
    │   ├── bets/                      ← SB-YYYY-NNN.md files
    │   ├── feedback/                  ← FB-YYYY-NNN.md files
    │   ├── learnings/                 ← L-NNN entries
    │   ├── portfolio/                 ← Active bets tracking
    │   └── documents/                 ← Auto-registered deliverables
    ├── deliverables/                  ← PRDs, roadmaps, analyses
    └── .workspace.json                ← Workspace metadata

11. Caching Strategy

11.1 Multi-Level Cache Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    CACHE HIERARCHY                               │
│                                                                  │
│  Level 1: Anthropic API Prompt Cache (5 min TTL)                │
│  ─ System prompt layers cached by Anthropic                      │
│  ─ 90% cost reduction on cache hits                              │
│  ─ Requires exact prefix match                                   │
│                                                                  │
│  Level 2: In-Memory Compiled Prompts (per Vercel instance)      │
│  ─ Compiled core protocol and agent personas                     │
│  ─ Refreshed on version update or cold start                     │
│  ─ Near-zero latency                                             │
│                                                                  │
│  Level 3: R2 Object Cache (persistent)                           │
│  ─ Compiled prompt JSON                                          │
│  ─ Knowledge pack content                                        │
│  ─ Updated via build pipeline (compile-prompts.ts)               │
│                                                                  │
│  Level 4: PostgreSQL (source of truth)                           │
│  ─ prompt_templates table                                        │
│  ─ knowledge_packs table                                         │
│  ─ team_personalities table                                      │
└─────────────────────────────────────────────────────────────────┘

11.2 Cache Invalidation Strategy

Trigger	Action
OS version update	Re-run compile-prompts.ts, update R2, invalidate L2
Agent persona edit (A/B test)	Update prompt_templates, invalidate L2 for that agent
Team personality change	Update workspace_personalities, invalidate L2 for team agents
Knowledge pack update	Update knowledge_packs, no prompt cache impact (not in system prompt)

12. MCP Integration Approach for SaaS

12.1 CLI vs Cloud Integration Model

Aspect	CLI (MCP)	Cloud (OAuth)
Protocol	MCP (stdio/SSE)	OAuth 2.0 + REST API
Authentication	API keys in env vars	Encrypted tokens in DB
Availability	User configures locally	User connects via OAuth flow
Tool registration	`.mcp.json` config file	`connected_integrations` table
Runtime detection	Check available tool list	Query integrations for workspace

12.2 Supported Integrations (Cloud)

Integration	API	OAuth Scopes	Agent Use Cases
Google Drive	Drive API v3	`drive.file`	File read/write (primary storage)
Jira	Jira REST API v3	`read:jira-work`, `write:jira-work`	Create issues from user stories
Slack	Slack Web API	`chat:write`, `channels:read`	Post updates, share decisions
GitHub	GitHub REST/GraphQL	`repo`, `issues`	Link commits to features
Linear	Linear API	`issues:read`, `issues:write`	Project management sync

12.3 Graceful Degradation

Agents detect available integrations at runtime. If a tool is not connected, agents produce text output with actionable "Next Steps (Manual)" sections, exactly as the MCP integration framework specifies.

13. Security Architecture

13.1 Security Layers

┌─────────────────────────────────────────────────────────┐
│  Layer 1: Authentication (Clerk)                         │
│  ─ JWT validation on every request                       │
│  ─ Session management with refresh                       │
│  ─ Social login + email/password                         │
├─────────────────────────────────────────────────────────┤
│  Layer 2: Authorization (RLS + Middleware)                │
│  ─ PostgreSQL RLS on all workspace-scoped tables         │
│  ─ Middleware sets workspace context per request          │
│  ─ Tier-based feature gating                             │
├─────────────────────────────────────────────────────────┤
│  Layer 3: Data Encryption                                │
│  ─ API keys: AES-256-GCM + envelope encryption (KMS)    │
│  ─ OAuth tokens: Same envelope encryption                │
│  ─ Data at rest: Neon TLS + encryption                   │
│  ─ Data in transit: TLS 1.3 everywhere                   │
├─────────────────────────────────────────────────────────┤
│  Layer 4: Prompt Security                                │
│  ─ Injection detection (pattern matching)                │
│  ─ User content sandboxing (XML boundary markers)        │
│  ─ Output validation (system prompt leak detection)      │
│  ─ Rate limiting on suspected injection probing          │
├─────────────────────────────────────────────────────────┤
│  Layer 5: Audit & Monitoring                             │
│  ─ All agent invocations logged with trace context       │
│  ─ All file operations logged                            │
│  ─ Security events in Sentry                             │
│  ─ Rate limit violations tracked                         │
└─────────────────────────────────────────────────────────┘

13.2 No Bash in Cloud

The CLI's Bash tool is removed in cloud mode. Agents cannot execute arbitrary shell commands. All file operations go through the cloud storage abstraction layer, which enforces:

Workspace scoping (agents can only access their workspace folder)
File type validation
Content size limits
Rate limiting on file operations

14. Agent Roster (81 Agents + 11 Gateways)

14.1 Product Org OS Agents (13)

Agent Key	Emoji	Display Name	Team
product-manager	📝	Product Manager	product
cpo	👑	Chief Product Officer	product
vp-product	📈	VP Product	product
director-product-management	📋	Director of Product Management	product
director-product-marketing	📣	Director of Product Marketing	product
product-marketing-manager	🎯	Product Marketing Manager	product
product-mentor	🎓	Product Mentor	product
bizops	🧮	BizOps	product
bizdev	🤝	Business Development	product
competitive-intelligence	🔭	Competitive Intelligence	product
product-operations	⚙️	Product Operations	product
ux-lead	🎨	UX Lead	product
value-realization	💰	Value Realization	product

14.2 Extension Team Agents (26)

Design Team (6):

Agent Key	Emoji	Display Name
design-dir	🎨	Director of Design
ui-designer	🖼️	UI Designer
visual-designer	🎨	Visual Designer
interaction-designer	🔄	Interaction Designer
user-researcher	🔍	User Researcher
motion-designer	🎬	Motion Designer

Architecture Team (6):

Agent Key	Emoji	Display Name
architecture-dir	🏗️	Chief Architect
api-architect	🔌	API Architect
data-architect	🗄️	Data Architect
security-architect	🔒	Security Architect
cloud-architect	☁️	Cloud Architect
ai-architect	🧠	AI/ML Architect

Marketing Team (14):

Agent Key	Emoji	Display Name
marketing-dir	📢	Director of Marketing
content-strategist	✍️	Content Strategist
copywriter	📄	Copywriter
seo-specialist	🔍	SEO Specialist
cro-specialist	📊	CRO Specialist
paid-media	💰	Paid Media Specialist
email-marketing	📧	Email Marketing Specialist
social-media	📱	Social Media Manager
growth-hacker	🚀	Growth Hacker
market-researcher	📈	Market Researcher
video-producer	🎥	Video Producer
pr-specialist	📰	PR Specialist
brand-strategist	🏷️	Brand Strategist
analytics-specialist	📊	Analytics Specialist

14.3 Gateways (5)

Gateway Key	Emoji	Display Name	Behavior
product	🏛️	Product Gateway	Routes to relevant owners, orchestrates execution
product-leadership-team	👥	PLT	Meeting Mode with multiple leadership perspectives
design	🎨	Design Gateway	Routes to design specialists
architecture	🏗️	Architecture Gateway	Routes to architecture specialists
marketing	📢	Marketing Gateway	Routes to marketing specialists

15. Distributed Tracing

15.1 Trace Structure

Request ID: req_abc123
│
├── span_001: PLT Gateway (gateway:plt)
│   ├── span_002: VP Product (agent:vp-product)
│   │   └── span_003: BizOps (sub-agent, consultation)
│   ├── span_004: Dir PM (agent:director-product-management)
│   ├── span_005: Dir PMM (agent:director-product-marketing)
│   └── span_006: ProdOps (agent:product-operations)
│
└── Post-processing: ROI calculation, interaction logging

15.2 Implementation

Every request gets a trace context that propagates through sub-agent spawns:

// lib/tracing.ts
export interface TraceContext {
  requestId: string;
  spanId: string;
  parentSpanId?: string;
  userId: string;
  workspaceId: string;
  operation: string;
  startedAt: string;
  depth: number;
}
export function createTraceContext(req: Request, operation: string): TraceContext {
  return {
    requestId: req.headers.get('X-Request-ID') || crypto.randomUUID(),
    spanId: crypto.randomUUID(),
    parentSpanId: req.headers.get('X-Parent-Span-ID'),
    userId: auth().userId!,
    workspaceId: req.headers.get('X-Workspace-ID')!,
    operation,
    startedAt: new Date().toISOString(),
    depth: 0,
  };
}export function childSpan(parent: TraceContext, operation: string): TraceContext {
  return {
    ...parent,
    spanId: crypto.randomUUID(),
    parentSpanId: parent.spanId,
    operation,
    startedAt: new Date().toISOString(),
    depth: parent.depth + 1,
  };
}

16. Vercel Deployment Configuration

16.1 vercel.json

{
  "$schema": "https://openapi.vercel.sh/vercel.json",
  "fluid": true,
  "regions": ["iad1"],
  "functions": {
    "app/api/plt/**": { "maxDuration": 120 },
    "app/api/gateway/**": { "maxDuration": 120 },
    "app/api/chat/**": { "maxDuration": 60 },
    "app/api/skill/**": { "maxDuration": 30 }
  }
}

16.2 Environment Variables

Database
DATABASE_URL=postgresql://...@ep-xxx.us-east-2.aws.neon.tech/neondb?sslmode=require
Auth
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_live_...
CLERK_SECRET_KEY=sk_live_...
Billing
STRIPE_SECRET_KEY=sk_live_...
STRIPE_WEBHOOK_SECRET=whsec_...
Storage
R2_ACCOUNT_ID=...
R2_ACCESS_KEY_ID=...
R2_SECRET_ACCESS_KEY=...
R2_BUCKET_NAME=project-saas-prompts
Search
TYPESENSE_API_KEY=...
TYPESENSE_HOST=xxx.typesense.net
Monitoring
SENTRY_DSN=https://...@sentry.io/...
NEXT_PUBLIC_POSTHOG_KEY=phc_...
BETTERSTACK_SOURCE_TOKEN=...
Encryption
KMS_KEY_ARN=arn:aws:kms:us-east-1:...
Cloud Storage OAuth
GOOGLE_CLIENT_ID=...
GOOGLE_CLIENT_SECRET=...

17. Entity Relationship Diagram (Updated for v3)

users 1─────M workspace_memberships M──────1 workspaces │ │ │ ┌────────┼─────────────────────────┐ │ │ │ │ └──M user_api_keys decisions strategic_bets feedback │ │ │ │ │ │ │ │ │ ┌──────┼────────────┼──────────────┼──────┐ │ │ │ │ cross_references feedback_themes │ │ assumptions learnings documents feedback_theme_links │ portfolio_state workspaces ──M connected_integrations ──1 integration_tokens workspaces ──M conversations ──M messages workspaces ──M agent_invocations workspaces ──M usage_events workspaces ──M roi_sessions workspaces ──M workspace_personalities ──1 team_personalities (NEW)

agent_registry (global) ──1 prompt_templates (global) knowledge_packs (global) team_personalities (global)

18. Performance Targets

Metric	Target	Measurement
Single agent response (p50)	<10s	Time to last token
Single agent response (p95)	<30s	Time to last token
PLT session (p50)	<30s	Time to formatted response
PLT session (p95)	<60s	Time to formatted response
Skill invocation (p50)	<5s	Time to completion
Context recall query	<500ms	Database query + format
Auto-context injection	<200ms	Topic extraction + query
Time to first token (streaming)	<3s	First SSE event
Cache hit rate (Anthropic)	>80%	Provider metadata tracking

19. MVP Scope Checklist

Feature	Status	Notes
61 skills	MVP	All skills available
39 agents (13 OS + 26 Extension)	MVP	Full roster
5 gateways	MVP	@product, @plt, @design, @architecture, @marketing
PLT Meeting Mode	MVP	Parallel agents + synthesis
Delegation Protocol (4 patterns)	MVP	Consultation, Delegation, Review, Debate
Context layer (PostgreSQL)	MVP	All context tables
Cross-reference graph	MVP	Relational implementation
Auto-context injection	MVP	Topic-based, database-backed
Team Personalities (infrastructure)	MVP	Data model + API, no UX
Knowledge packs (22)	MVP	Loaded from DB/R2
Google Drive integration	MVP	OAuth + file tools
BYOT (Claude + OpenAI)	MVP	Per-request key routing
System prompt caching	MVP	3-layer with Anthropic cache
Interaction logging	MVP	agent_invocations table
Distributed tracing	MVP	Request ID propagation
Prompt versioning	MVP	Database + feature flags
SSE streaming	MVP	Token-by-token + agent events
$10/mo individual, $8/seat team pricing	MVP	Stripe integration (1-month trial, no free tier)
OneDrive/Dropbox	Growth	Phase 2
Team collaboration	Enterprise	Phase 3
Hybrid CLI/Cloud sync	Enterprise	Phase 3

20. Risk Matrix

Risk	Likelihood	Impact	Mitigation
PLT exceeds 60s p95	Medium	Low	Cap maxSteps=5; 300s Fluid Compute buffer
Prompt compression degrades quality	Medium	Medium	A/B test; keep originals; tune iteratively
Google OAuth verification delayed	Medium	High	Apply early; use test mode for beta
BYOT key abuse (shared keys)	Low	Medium	Rate limit per key; abuse detection
Cloud storage API latency spikes	Low	Medium	Per-tool timeouts; circuit breaker
Cache hit rate below 70%	Low	Medium	Monitor; extend to 1-hour TTL
Multi-agent token costs surprise users	Medium	Low	Cost estimator in UI; model routing

Document Status: Active (v3.1 — definitive architecture + conceptual platform model) Last Updated: 2026-02-18 Gate Owner: Chief Architect Next Review: Pre-development kickoff

v3.1 Change Log: Added Section 0 (Conceptual Platform Architecture) with 4-layer model, defensibility assessment, compounding flywheel, and 3 external connections. Maps conceptual layers to implementation sections. Sourced from Platform Architecture Deck (23-slide presentation, Feb 2026).