Document Version: 3.1 Date: 2026-02-18 Owner: Chief Architect (Architecture Team) Status: Active — Definitive Architecture Document Product: Legionis
Incorporates: OS v3.0.0, Extension Teams v3, Vercel AI SDK decision, Vercel Fluid Compute analysis, System Prompt Architecture deep dive, Team Personalities infrastructure, PLT scope expansion decision, Platform Architecture Deck (4-layer model, defensibility, flywheel)
The platform is organized into four conceptual layers. Each layer is independently valuable; together they form a compounding system. This model governs how we talk about the architecture externally (positioning, sales, investor conversations) and how we reason about defensibility internally.
┌─────────────────────────────────────────────────────────────────┐
│ LAYER 4: AGENT LAYER │
│ 81 specialists | 10 departments | Modular provisioning │
│ Cross-department intelligence │
│ Defensibility: MEDIUM (deep SKILL.md, knowledge packs) │
├─────────────────────────────────────────────────────────────────┤
│ LAYER 3: COMMUNICATION LAYER ★ KEY DIFFERENTIATOR │
│ Intelligent routing | Agent-to-agent cascade │
│ Cooperation profiles | Invisible presentation │
│ Conversation context continuity │
│ "The nervous system of the AI workforce" │
│ Defensibility: HIGH (no competitor has this) │
├─────────────────────────────────────────────────────────────────┤
│ LAYER 2: CONTEXT LAYER │
│ Decisions & strategic bets | Feedback loop │
│ Cross-reference graph | Auto-context injection │
│ Stored in USER's cloud | Compounding value │
│ Month 1: Useful → Month 3: Valuable → Month 6+: Irreplaceable │
│ Defensibility: VERY HIGH (organic switching cost) │
├─────────────────────────────────────────────────────────────────┤
│ LAYER 1: COMPUTE LAYER │
│ BYOT (zero markup) | Quality Toggle (Haiku/Sonnet/Opus) │
│ Managed Tokens (15% prepaid Token Banks per DR-2026-004) │
│ Defensibility: LOW (commoditized by design) │
└───────────┬──────────────────┬──────────────────┬───────────────┘
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ LLM Providers │ │Cloud Storage │ │Cloud Services│
│ Anthropic │ │Google Drive │ │Stripe, Clerk │
│ OpenAI │ │OneDrive │ │Neon, R2 │
│ (user's keys) │ │Dropbox │ │Typesense │
└──────────────┘ └──────────────┘ └──────────────┘
| Layer | Role | Maps To (Implementation) |
|---|---|---|
| Agent | 81 specialists across 10 departments with deep SKILL.md files, knowledge packs, and team personalities. Modular provisioning lets users assemble exactly the team they need. Cross-department intelligence means agents from different teams can be invoked together. | Sections 2 (Agent Runtime), 3 (Prompt Compilation), 8 (Team Personalities), 9 (Knowledge Packs), 14 (Agent Roster) |
| Communication | The nervous system. Intelligent routing sends requests to the right agent. Agent-to-agent cascade enables delegation (consultation, delegation, review, debate). Cooperation profiles define how agents interact. Invisible presentation means the user sees a unified team, not plumbing. Conversation context continuity ensures agents share the thread. | Sections 4 (Gateway Orchestration), 2.4 (Sub-Agent Spawning), 7 (API Routes) |
| Context | Organizational memory that compounds. Decisions, strategic bets, assumptions, feedback, and learnings are stored in the user's cloud and cross-referenced. Auto-context injection means agents automatically recall relevant history before producing deliverables. | Sections 5 (Context Layer), 10 (Cloud Storage) |
| Compute | The transparent foundation. BYOT means users bring their own API keys with zero markup. Quality Toggle lets users choose cost/quality tradeoff (Haiku for speed, Sonnet for balance, Opus for depth). Managed Tokens via 15% prepaid Token Banks (DR-2026-004) provide a convenience option. | Sections 2.2 (BYOT Routing), 1.2 (Technology Stack) |
The platform connects to three categories of external services:
| Connection | Services | Layer | User Owns? |
|---|---|---|---|
| LLM Providers | Anthropic, OpenAI (more via Vercel AI SDK) | Compute | Yes (BYOT keys) |
| Cloud Storage | Google Drive (MVP), OneDrive, Dropbox (Growth) | Context | Yes (their cloud account) |
| Cloud Services | Stripe, Clerk, Neon, R2, Typesense, Sentry, PostHog | All layers | No (platform infrastructure) |
| Layer | Defensibility | Rationale |
|---|---|---|
| Compute | Low | Commoditized by design. BYOT and cloud storage are transparency features, not moats. We chose trust over lock-in. |
| Agent | Medium | 81 agents with deep SKILL.md files (300-440 lines each), 34 knowledge packs, and team personalities. Significant effort to replicate, but ultimately copyable given time. |
| Communication | High | Intelligent routing, agent-to-agent cascade, cooperation profiles, and invisible presentation. No competitor has built this. "Other platforms give you 10 specialists in 10 separate rooms. Legionis puts them in the same room." |
| Context | Very High | Organizational memory compounds organically through usage. After 3+ months of decisions, bets, feedback, and learnings, switching cost is earned, not engineered. The cross-reference graph creates intelligence that is unique to each customer's org. |
Core thesis: "Intelligence is a commodity. Coordination is the moat."
The four layers create a reinforcing cycle:
User starts with one team (Agent Layer)
→ Agents collaborate via Communication Layer
→ Deliverables save to user's cloud (Context Layer)
→ Context accumulates across interactions
→ Trust deepens (they own everything)
→ Pay only what they use via BYOT (Compute Layer)
→ User adds more teams → cycle deepens
Each turn of the flywheel increases the value of all layers. The longer a user stays, the more irreplaceable the Context Layer becomes, and the more valuable the Communication Layer's ability to leverage that context across agents.
┌─────────────────────────────────────────────────────────────────────────────┐
│ CLIENT (Browser) │
│ Next.js 14+ App Router │ Tailwind + Radix │ Zustand + TanStack Query │
│ Tiptap Editor │ EventSource (SSE) │ Meeting Mode Renderer │
└────────────────────────────────┬────────────────────────────────────────────┘
│ HTTPS / SSE
▼
┌─────────────────────────────────────────────────────────────────────────────┐
│ VERCEL (Next.js API Routes) │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ /api/chat │ │ /api/plt │ │ /api/gateway │ │ /api/skill │ │
│ │ maxDur: 60s │ │ maxDur: 120s │ │ maxDur: 120s │ │ maxDur: 30s │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐ │
│ │ AGENT RUNTIME ENGINE │ │
│ │ │ │
│ │ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │ │
│ │ │ Prompt Compiler │ │ Tool Registry │ │ Delegation │ │ │
│ │ │ (3-layer cached) │ │ (Cloud-safe) │ │ Engine │ │ │
│ │ └──────────────────┘ └──────────────────┘ └──────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │ │
│ │ │ Vercel AI SDK │ │ BYOT Key Router │ │ Auto-Context │ │ │
│ │ │ generateText() │ │ Per-request keys │ │ Injector │ │ │
│ │ │ streamText() │ │ 24+ providers │ │ (DB-backed) │ │ │
│ │ └──────────────────┘ └──────────────────┘ └──────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Clerk Auth │ │ Stripe │ │ Sentry │ │ PostHog │ │
│ │ JWT + Orgs │ │ Billing │ │ Errors │ │ Analytics │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ │
└────────────────────────────────────┬────────────────────────────────────────┘
│
┌───────────────────┼───────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Neon PostgreSQL │ │ Cloudflare R2 │ │ Typesense │
│ (Context layer) │ │ (Prompts/files) │ │ (Full-text) │
│ RLS per tenant │ │ Zero egress │ │ (Search) │
└──────────────────┘ └──────────────────┘ └──────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ CLOUD STORAGE (User's Data) │
│ Google Drive API v3 │ OneDrive Graph API │ Dropbox API v2 │
│ (MVP) │ (Growth) │ (Growth) │
└──────────────────────────────────────────────────────────────────┘
| Layer | Technology | Version | Rationale |
|---|---|---|---|
| Frontend | Next.js (App Router) | 14+ | SSR, streaming, Vercel-native |
| Styling | Tailwind CSS + Radix UI | 4.x | Utility-first, accessible |
| State | Zustand + TanStack Query | Latest | Minimal client, smart server |
| Editor | Tiptap (ProseMirror) | Latest | Markdown, collaborative-ready |
| Agent SDK | Vercel AI SDK (ai) | 6.x | Pure API, multi-provider, 7.87M/wk |
| Providers | @ai-sdk/anthropic + @ai-sdk/openai | Latest | Claude primary, OpenAI secondary |
| Database | Neon PostgreSQL | 16 | Serverless, branching, scale-to-zero |
| ORM | Drizzle | Latest | Type-safe, minimal abstraction |
| Storage | Cloudflare R2 | N/A | S3-compatible, zero egress |
| Search | Typesense Cloud | Latest | Typo-tolerant, faceted |
| Auth | Clerk | Latest | Pre-built UI, social login, orgs |
| Payments | Stripe Billing | Latest | Usage-based, checkout, portal |
| Hosting | Vercel (Fluid Compute) | Latest | Next.js native, 300s default timeout |
| Monitoring | Sentry + Better Stack | Latest | Errors + logs + uptime |
| Analytics | PostHog | Free tier | Product analytics, feature flags |
| Service | Plan | Monthly Cost |
|---|---|---|
| Vercel Pro | 1 seat | $20 |
| Neon Launch | PostgreSQL | $19 |
| Cloudflare R2 | ~10GB | $0.15 |
| Typesense Starter | 0.5GB RAM | $40 |
| Clerk Pro | 10K MAU free | $25 |
| Sentry Team | Error tracking | $29 |
| PostHog | Free tier | $0 |
| Better Stack | Starter | $29 |
| Total | | ~$162 |
AI costs: $0 (users bring their own API keys)
The agent runtime is built on Vercel AI SDK v6.x. Each agent interaction is a generateText() or streamText() call with custom tools and system prompts.
// lib/agent-runtime.ts
import { generateText, streamText, tool } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';

export async function invokeAgent(params: AgentInvocation): Promise<AgentResult> {
const { agentKey, userMessage, workspaceId, apiKey, provider } = params;
// 1. Compile system prompt (3-layer cached)
const systemPrompt = await compilePrompt(agentKey, workspaceId, userMessage);
// 2. Select model based on user's provider + key
const model = selectModel(provider, apiKey);
// 3. Load cloud-safe tools for this agent
const tools = loadTools(agentKey, workspaceId);
// 4. Execute agent
const result = await generateText({
model,
system: systemPrompt,
tools,
maxSteps: 10,
prompt: userMessage,
providerOptions: {
anthropic: { cacheControl: { type: 'ephemeral' } },
},
onStepFinish({ text, toolCalls, toolResults, usage }) {
// Track per-step metrics
trackStepMetrics(params.traceContext, { toolCalls, usage });
},
});
// 5. Post-processing
await postProcess(result, params);
return formatResult(result);
}
// lib/model-router.ts
import { createAnthropic } from '@ai-sdk/anthropic';
import { createOpenAI } from '@ai-sdk/openai';

function selectModel(provider: string, apiKey: string) {
  switch (provider) {
    case 'anthropic':
      // Per-request provider instance carries the user's own key (BYOT)
      return createAnthropic({ apiKey })('claude-sonnet-4-5');
    case 'openai':
      return createOpenAI({ apiKey })('gpt-4o');
    default:
      throw new Error(`Unsupported provider: ${provider}`);
  }
}
Users configure their API keys per provider. The key is decrypted at runtime from the user_api_keys table (envelope encryption with KMS) and passed per request. This means the platform never marks up inference, plaintext keys never persist outside the request lifecycle, and provider rate limits are scoped to each user's own account.
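The local half of that envelope scheme can be sketched as follows. The row shape and helper names (`EncryptedApiKey`, `decryptApiKey`) are illustrative, and in production the per-user data key would be unwrapped by KMS per request rather than handled directly:

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from 'node:crypto';

// Hypothetical column shape for user_api_keys; the KMS-wrapped data key
// would be stored alongside these fields and unwrapped before decryption.
interface EncryptedApiKey {
  ciphertext: Buffer;
  iv: Buffer;
  authTag: Buffer;
}

// Used when the user saves a key: AES-256-GCM under the unwrapped data key.
export function encryptApiKey(apiKey: string, dataKey: Buffer): EncryptedApiKey {
  const iv = randomBytes(12);
  const cipher = createCipheriv('aes-256-gcm', dataKey, iv);
  const ciphertext = Buffer.concat([cipher.update(apiKey, 'utf8'), cipher.final()]);
  return { ciphertext, iv, authTag: cipher.getAuthTag() };
}

// Used at request time: recover the plaintext key, pass it to selectModel,
// and let it fall out of scope when the request ends.
export function decryptApiKey(row: EncryptedApiKey, dataKey: Buffer): string {
  const decipher = createDecipheriv('aes-256-gcm', dataKey, row.iv);
  decipher.setAuthTag(row.authTag);
  return Buffer.concat([decipher.update(row.ciphertext), decipher.final()]).toString('utf8');
}
```

GCM's auth tag means a tampered ciphertext fails loudly at decryption instead of yielding a corrupted key.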
Six custom tools replace the CLI's local filesystem operations:
// lib/tools/index.ts
import { tool } from 'ai';
import { z } from 'zod';

export function createAgentTools(workspace: Workspace) {
return {
readFile: tool({
description: 'Read a file from the workspace cloud storage',
parameters: z.object({
filePath: z.string().describe('Relative path within workspace'),
fileId: z.string().optional().describe('Cloud storage file ID'),
}),
execute: async ({ filePath, fileId }) => {
const id = fileId || await resolvePathToId(filePath, workspace);
return await cloudStorage.readFile(id, workspace);
},
}),
writeFile: tool({
description: 'Write a file to the workspace cloud storage',
parameters: z.object({
filePath: z.string(),
content: z.string(),
}),
execute: async ({ filePath, content }) => {
const fileId = await cloudStorage.upsertFile(filePath, content, workspace);
await searchIndex.upsertDocument(fileId, content, workspace.id);
return { fileId, path: filePath };
},
}),
editFile: tool({
description: 'Make string replacements in a workspace file',
parameters: z.object({
filePath: z.string(),
oldString: z.string(),
newString: z.string(),
}),
      execute: async ({ filePath, oldString, newString }) => {
        // Resolve the path to a cloud storage ID, as readFile expects one
        const fileId = await resolvePathToId(filePath, workspace);
        const content = await cloudStorage.readFile(fileId, workspace);
        const updated = content.replace(oldString, newString);
        return await cloudStorage.updateFile(filePath, updated, workspace);
      },
},
}),
globFiles: tool({
description: 'Find files matching a pattern in the workspace',
parameters: z.object({
pattern: z.string(),
}),
execute: async ({ pattern }) => {
return await cloudStorage.listFiles(pattern, workspace);
},
}),
grepContent: tool({
description: 'Search file contents in the workspace',
parameters: z.object({
query: z.string(),
filePattern: z.string().optional(),
}),
execute: async ({ query, filePattern }) => {
return await typesense.search(query, workspace.id, filePattern);
},
}),
spawnSubAgent: tool({
description: 'Spawn a sub-agent for specialized input',
parameters: z.object({
agentId: z.string(),
task: z.string(),
delegationPattern: z.enum([
'consultation', 'delegation', 'review', 'debate'
]).optional(),
}),
execute: async ({ agentId, task, delegationPattern }, context) => {
return await spawnSubAgent({
agentId,
task,
pattern: delegationPattern || 'consultation',
parentContext: context,
workspace,
});
},
}),
};
}
The OS v3 delegation protocol maps directly to the SaaS runtime:
// lib/delegation.ts
async function spawnSubAgent(params: SubAgentParams): Promise<SubAgentResult> {
  const { agentId, task, pattern, parentContext, workspace } = params;

  // Enforce depth limit (max 2 levels)
  if (parentContext.depth >= 2) {
return { error: 'Max sub-agent depth reached. Provide analysis inline.' };
}
// Load sub-agent persona
const persona = await loadAgentPersona(agentId);
// Build system prompt with delegation context
const systemPrompt = await compilePrompt(agentId, workspace.id, task, {
delegationPattern: pattern,
parentAgent: parentContext.agentKey,
});
// Execute sub-agent
const result = await generateText({
model: selectModel(parentContext.provider, parentContext.apiKey),
system: systemPrompt,
tools: createAgentTools(workspace),
maxSteps: 5, // Sub-agents get fewer steps
prompt: buildDelegationPrompt(pattern, task),
});
// Log sub-agent invocation with parent trace
await logInvocation({
type: 'agent',
agentOrSkill: agentId,
parentSpanId: parentContext.spanId,
requestId: parentContext.requestId,
...result.usage,
});
return { response: result.text, agentId };
}
function buildDelegationPrompt(pattern: string, task: string): string {
  const prefixes: Record<string, string> = {
    consultation: task,
    delegation: `[DELEGATION] ${task}`,
    review: `[REVIEW] ${task}`,
    debate: `[DEBATE] ${task}`,
  };
return prefixes[pattern] || task;
}
The system prompt is structured for maximum cache efficiency with Anthropic's prompt caching:
┌──────────────────────────────────────────────────────────────┐
│ Layer 1: CORE PROTOCOL (~1,500 tokens) │
│ - Always cached (cache_control: ephemeral) │
│ - Compiled from 10 core rules into single document │
│ - Identical across all agents │
│ - Changes only on OS version updates │
├──────────────────────────────────────────────────────────────┤
│ Layer 2: AGENT PERSONA (~500 tokens) │
│ - Cached per agent type per session │
│ - Extracted from SKILL.md identity sections │
│ - Includes team personality injection point │
│ - Changes only on agent definition updates │
├──────────────────────────────────────────────────────────────┤
│ Layer 3: TASK CONTEXT (~200-500 tokens) │
│ - Domain rules (conditional on skill being invoked) │
│ - Auto-injected context (decisions, feedback, bets) │
│ - User's org context (company name, product areas) │
│ - Team personality principles (if configured) │
│ - Changes per request │
└──────────────────────────────────────────────────────────────┘

Total per call: ~2,200-2,500 tokens (vs. ~19,650 naive)
Cost with caching: $0.71/mo per user (Sonnet) vs $35/mo naive
The build pipeline reads canonical SKILL.md files and compiles them for SaaS deployment. This follows the "one source, two build targets" principle: the same SKILL.md files serve both CLI (full content) and SaaS (extracted/compressed).
// scripts/compile-prompts.ts
// Run at build time or on OS version update

import { readFile, writeFile } from 'fs/promises';
import { glob } from 'fast-glob';
interface CompiledPrompt {
agentKey: string;
layer1CoreProtocol: string; // Shared across all agents
layer2Persona: string; // Per-agent identity
  domainRules: Record<string, string>; // Loaded conditionally
metadata: AgentMetadata;
}
// Step 1: Compile Core Protocol from 10 Tier 1 rules
async function compileCoreProtocol(): Promise<string> {
const rules = [
'agent-spawn-protocol.md', // Response format, identity
'no-estimates.md', // No fabricated numbers
'v2v-flow.md', // 6-phase summary
'context-management.md', // Save/recall/capture
'intelligent-routing.md', // Domain routing
'delegation-protocol.md', // 4 delegation patterns
'principles-enforcement.md', // 8 operating principles
'meeting-mode.md', // Multi-agent presentation
'parallel-execution.md', // When to parallelize
'skill-awareness.md', // Omitted (covered by persona)
];
// Read and extract essential sections from each rule
// Compile into a single ~1,500 token document
const sections = await Promise.all(
    rules.map(r => extractEssentials(`rules/${r}`))
);
  return `## Agent Operating Protocol\n\n${sections.join('\n\n')}`;
}
// Step 2: Extract agent persona from SKILL.md
async function extractPersona(skillPath: string): Promise<string> {
const content = await readFile(skillPath, 'utf-8');
const parsed = parseSkillMd(content);
// Extract ONLY identity-essential sections (~80-100 lines → ~500 tokens)
  return [
    `# ${parsed.emoji} ${parsed.displayName}`,
    '',
    '## Identity',
    parsed.coreAccountability,
    '',
    '## How I Think',
    parsed.howIThink, // 3-5 bullets, unique per agent
    '',
    '## RACI',
    parsed.raci, // A/R/C items
    '',
    '## Key Deliverables',
    parsed.deliverables, // Table of 4-5 items
    '',
    '## Collaboration',
    parsed.collaboration, // 3-4 key relationships
    '',
    '## Primary Skills',
    parsed.skills, // 5-7 skills with when-to-use
    '',
    '## V2V Phase',
    parsed.primaryPhases,
  ].join('\n');
}
// Step 3: Compile domain rules (Tier 2)
async function compileDomainRules(): Promise<Record<string, string>> {
return {
decisions: await condense('rules/decision-system.md', 300),
strategy: await condense('rules/strategy-documents.md', 300),
roadmaps: await condense('rules/roadmaps.md', 300),
gtm: await condense('rules/gtm-documents.md', 300),
requirements: await condense('rules/requirements.md', 300),
context: await condense('rules/auto-context.md', 200)
+ '\n' + await condense('rules/context-graph.md', 200),
};
}
// Step 4: Main compilation
async function main() {
const coreProtocol = await compileCoreProtocol();
// All OS agents
const osAgentPaths = await glob('skills/*/SKILL.md');
const osPersonas = await Promise.all(
osAgentPaths.map(async path => ({
key: extractAgentKey(path),
persona: await extractPersona(path),
metadata: await extractMetadata(path),
}))
);
// All Extension Team agents
const extAgentPaths = await glob('Extension Teams/*/SKILL.md');
const extPersonas = await Promise.all(
extAgentPaths.map(async path => ({
key: extractAgentKey(path),
persona: await extractPersona(path),
metadata: await extractMetadata(path),
}))
);
const domainRules = await compileDomainRules();
// Output compiled prompts for SaaS deployment
const output: CompiledPrompts = {
coreProtocol,
agents: [...osPersonas, ...extPersonas],
domainRules,
version: getOsVersion(),
compiledAt: new Date().toISOString(),
};
// Write to R2-deployable format
await writeFile('compiled/prompts.json', JSON.stringify(output, null, 2));
// Also write SQL migration for prompt_templates table
await generatePromptMigration(output);
  console.log(`Compiled ${output.agents.length} agents`);
  console.log(`Core protocol: ${countTokens(coreProtocol)} tokens`);
  console.log(`Domain rules: ${Object.keys(domainRules).length} domains`);
}
// lib/prompt-compiler.ts
import { getCompiledPrompts } from './compiled-prompts';

const promptCache = new Map();
export async function compilePrompt(
agentKey: string,
workspaceId: string,
userMessage: string,
options?: {
delegationPattern?: string;
parentAgent?: string;
teamPersonality?: TeamPersonality;
}
): Promise<SystemMessage[]> {
const compiled = getCompiledPrompts();
const messages: SystemMessage[] = [];
// Layer 1: Core Protocol (always cached, identical across agents)
messages.push({
role: 'system',
content: compiled.coreProtocol,
providerOptions: {
anthropic: { cacheControl: { type: 'ephemeral' } },
},
});
// Layer 2: Agent Persona (cached per agent type)
const agent = compiled.agents.find(a => a.key === agentKey);
  if (!agent) throw new Error(`Unknown agent: ${agentKey}`);
let personaContent = agent.persona;
// Inject team personality if configured (see Section 8)
if (options?.teamPersonality) {
    personaContent += `\n\n## Team Operating Principles\n${options.teamPersonality.principles}`;
}
messages.push({
role: 'system',
content: personaContent,
providerOptions: {
anthropic: { cacheControl: { type: 'ephemeral' } },
},
});
// Layer 3: Task Context (per-request, partially cached)
const taskContext = await buildTaskContext(
agentKey, workspaceId, userMessage, options
);
if (taskContext) {
messages.push({
role: 'system',
content: taskContext,
providerOptions: {
anthropic: { cacheControl: { type: 'ephemeral' } },
},
});
}
return messages;
}
async function buildTaskContext(
agentKey: string,
workspaceId: string,
userMessage: string,
options?: any
): Promise<string | null> {
const parts: string[] = [];
// Domain rules (based on detected skill)
const skill = detectSkillFromMessage(userMessage);
if (skill) {
const domain = skillToDomain(skill);
const compiled = getCompiledPrompts();
if (compiled.domainRules[domain]) {
parts.push(compiled.domainRules[domain]);
}
}
// Auto-context injection (from database)
const topics = extractTopics(userMessage);
if (topics.length > 0) {
const context = await queryAutoContext(workspaceId, topics);
if (context) parts.push(context);
}
// Delegation context
if (options?.delegationPattern) {
    parts.push(`Delegation: [${options.delegationPattern.toUpperCase()}] from ${options.parentAgent}`);
}
return parts.length > 0 ? parts.join('\n\n---\n\n') : null;
}
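The `extractTopics` helper referenced above is left undefined in this section. A minimal keyword-matching sketch follows; the vocabulary and the matching strategy are assumptions for illustration, and a production version might match against the workspace's stored `topics[]` values or use embeddings:

```typescript
// Hypothetical topic vocabulary; in production this could be derived from
// the distinct topics[] values already stored for the workspace.
const KNOWN_TOPICS = ['launch', 'sso', 'pricing', 'onboarding', 'roadmap'];

// Naive extraction: case-insensitive substring match against known topics.
export function extractTopics(message: string): string[] {
  const lower = message.toLowerCase();
  return KNOWN_TOPICS.filter(t => lower.includes(t));
}
```

Even this naive version is enough to drive the GIN-indexed `topics[]` overlap queries shown in Section 5.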
| Component | Target | Hard Limit | Notes |
|---|---|---|---|
| Core Protocol (L1) | 1,500 | 2,000 | Must exceed 1,024 for Sonnet caching |
| Agent Persona (L2) | 500 | 700 | Identity-essential content only |
| Domain Rules (L3) | 300 | 500 | Loaded conditionally per skill |
| Auto-Context (L3) | 200 | 500 | Max 5 context items |
| Team Personality (L3) | 100 | 200 | Principles injection |
| Total System Prompt | 2,500 | 3,700 | Per API call |
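The hard limits above can be enforced at compile time. A minimal sketch (component keys are illustrative; the token counts would come from the build pipeline's `countTokens`):

```typescript
// Hard limits copied from the token budget table (tokens per component).
const HARD_LIMITS: Record<string, number> = {
  coreProtocol: 2000,
  persona: 700,
  domainRules: 500,
  autoContext: 500,
  teamPersonality: 200,
};

// Fail the build rather than silently ship an over-budget prompt layer.
export function assertWithinBudget(component: string, tokenCount: number): void {
  const limit = HARD_LIMITS[component];
  if (limit === undefined) throw new Error(`Unknown component: ${component}`);
  if (tokenCount > limit) {
    throw new Error(`${component} is ${tokenCount} tokens (hard limit ${limit})`);
  }
}
```

Failing in the compile script keeps budget regressions out of production entirely, since compiled prompts only change on OS version updates.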
| Scenario | Sonnet Monthly | vs. Naive |
|---|---|---|
| Full uncompressed, no caching | $35.37 | Baseline |
| Compressed, no caching | $4.50 | -87% |
| Compressed + cached | $0.71 | -98% |
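The gap between these scenarios can be reproduced with a simple cost model. All prices and the cache-hit rate below are stand-in assumptions for illustration, not the actual inputs behind the $0.71 figure:

```typescript
// Illustrative cost model: cached prompt tokens bill at a fraction of the
// normal input rate on cache hits, full rate on misses.
interface PromptProfile {
  cachedTokens: number;   // Layers 1+2: eligible for prompt caching
  freshTokens: number;    // Layer 3: always billed at the full input rate
  callsPerMonth: number;
  cacheHitRate: number;   // 0..1
}

export function monthlyInputCost(
  p: PromptProfile,
  inputPricePerMTok: number, // assumed price, e.g. $3 per million input tokens
  cacheReadFraction: number, // assumed cache-read discount, e.g. 0.1x input price
): number {
  const billedCached =
    p.cachedTokens * (p.cacheHitRate * cacheReadFraction + (1 - p.cacheHitRate));
  const billedPerCall = billedCached + p.freshTokens;
  return (billedPerCall * p.callsPerMonth * inputPricePerMTok) / 1_000_000;
}
```

The model makes the structural point plain: once Layers 1 and 2 mostly hit the cache, per-call cost is dominated by the small Layer 3 context rather than the full compiled prompt.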
Gateways coordinate multi-agent sessions. The PLT (Product Leadership Team) is the most complex, spawning 3-4 agents in parallel and synthesizing their perspectives.
User Request: "@plt Should we delay launch for SSO?"
│
▼
┌──────────────────────────────────────────────────────────────┐
│ PLT GATEWAY HANDLER │
│ maxDuration: 120s │
│ │
│ 1. Assess complexity → FULL PLT │
│ 2. Select agents: vp-product, dir-pm, dir-pmm, prod-ops │
│ 3. Auto-context: query "launch", "SSO" from context DB │
│ │
│ 4. PARALLEL EXECUTION (Promise.all) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ VP Prod │ │ Dir PM │ │ Dir PMM │ │ ProdOps │ │
│ │ maxSteps:5│ │ maxSteps:5│ │ maxSteps:5│ │ maxSteps:5│ │
│ │ ~15-25s │ │ ~15-25s │ │ ~15-25s │ │ ~15-25s │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────┘ │
│ Wall clock: max(agent times) ≈ 20-30s │
│ │
│ 5. FORMAT: Meeting Mode (each agent speaks in first person) │
│ 6. SYNTHESIZE: VP Product summarizes agreement/tension │
│ 7. LOG: agent_invocations table + ROI tracking │
└──────────────────────────────────────────────────────────────┘
// app/api/plt/route.ts
export const maxDuration = 120; // 2 minutes, generous for PLT

export async function POST(request: Request) {
  const { topic, workspaceId } = await request.json();
  const workspace = await getWorkspace(workspaceId); // hypothetical lookup; createAgentTools below needs the full workspace
const { apiKey, provider } = await getUserApiKey(request);
const model = selectModel(provider, apiKey);
// Assess complexity and select agents
const agentIds = assessPLTComplexity(topic);
// e.g., ['vp-product', 'director-product-management',
// 'director-product-marketing', 'product-operations']
// Auto-context injection
const autoContext = await queryAutoContext(workspaceId, extractTopics(topic));
// Phase 1: Parallel agent execution
const agentResults = await Promise.all(
    agentIds.map(async agentId => {
      const systemPrompt = await compilePrompt(agentId, workspaceId, topic);
return generateText({
model,
system: systemPrompt,
tools: createAgentTools(workspace),
maxSteps: 5,
        prompt: `${autoContext ? `## Auto-Context\n${autoContext}\n\n` : ''}${topic}`,
});
})
);
// Phase 2: Format Meeting Mode
const meetingMode = formatMeetingMode(agentIds, agentResults);
// Phase 3: Synthesis (optional — can also stream)
const synthesizer = agentIds[0]; // VP Product typically synthesizes
const synthesis = await generateText({
model,
    system: `You are ${getAgentIdentity(synthesizer).displayName}. Synthesize the PLT discussion.`,
    prompt: `Synthesize these perspectives:\n\n${meetingMode}`,
maxSteps: 1,
});
// Post-processing
await logGatewaySession({
type: 'gateway',
gateway: 'plt',
agentsSpawned: agentIds,
requestId: createRequestId(),
workspaceId,
});
return Response.json({
meetingMode,
synthesis: synthesis.text,
roi: calculatePLTRoi(agentResults),
});
}
| Concern | Resolution |
|---|---|
| Timeout | Fluid Compute: 300s default, 800s max. PLT targets <60s p95. 5x headroom. |
| Memory | 4GB / 2 vCPU on Pro. Adequate for 4 parallel agents. |
| Cost | ~$0.00035/session (Active CPU billing pauses during I/O waits). |
| Payload | 4.5MB limit is client→Vercel only. LLM calls are server-side. |
| Rate limits | Per-user BYOT keys. Max 1 PLT session at a time per user. |
// Per-route timeout configuration
// app/api/chat/route.ts
export const maxDuration = 60; // Single agent: 1 minute

// app/api/plt/route.ts
export const maxDuration = 120; // PLT sessions: 2 minutes
// app/api/gateway/[gateway]/route.ts
export const maxDuration = 120; // All gateways: 2 minutes
// app/api/skill/[skill]/route.ts
export const maxDuration = 30; // Skills: 30 seconds
| Context | maxSteps | Rationale |
|---|---|---|
| Single agent (standalone) | 10 | Full tool loop capability |
| PLT-spawned agents | 5 | Bounds total PLT time |
| Skill execution | 1 | Skills are single-step |
| Sub-agent (delegation) | 5 | Focused tasks |
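The maxSteps policy above can be centralized in a single resolver rather than scattered across routes. A minimal sketch (the context names are illustrative):

```typescript
// Invocation contexts from the maxSteps table.
type InvocationContext = 'standalone' | 'plt' | 'skill' | 'sub-agent';

// Centralizes the step budget so routes and the delegation engine agree.
export function maxStepsFor(ctx: InvocationContext): number {
  switch (ctx) {
    case 'standalone': return 10; // full tool loop capability
    case 'skill': return 1;       // skills are single-step
    case 'plt':                   // bounds total PLT wall-clock time
    case 'sub-agent':             // focused delegated tasks
      return 5;
  }
}
```

A union type makes the policy exhaustive: adding a new invocation context forces a compile error until a budget is assigned.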
The context layer migrates from flat files to PostgreSQL while preserving the same semantics:
| CLI Operation | CLI Implementation | Cloud Implementation |
|---|---|---|
| /context-save | Parse + update index.md + write file + update index.json | INSERT into decisions/bets/learnings + INSERT cross_references |
| /context-recall | Read index.json + filter topics | SELECT with GIN index on topics[] + Typesense |
| /portfolio-status | Read active-bets.md | SELECT portfolio_state JOIN strategic_bets |
| /feedback-capture | Write to context/feedback/ | INSERT into feedback + auto-theme matching |
| Auto-registration | Write to documents/index.md | INSERT into documents table |
| Cross-references | Update crossReferences in index.json | INSERT into cross_references table |
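The /context-save cloud path maps one CLI operation to two inserts. A sketch of deriving the rows before insertion (column and field names mirror the tables in this section; the input shape, ID scheme, and relationship value are assumptions, and the actual Drizzle insert calls are omitted):

```typescript
// Hypothetical input shape for a decision being saved with related bets.
interface ContextSaveInput {
  workspaceId: string;
  title: string;
  topics: string[];
  relatedBetIds: string[];
}

// Derives the decisions row plus one cross_references row per related bet.
export function buildContextSaveRows(input: ContextSaveInput) {
  // Illustrative slug-based ID; production would likely use UUIDs.
  const decisionId = 'dec-' + input.title.toLowerCase().replace(/\W+/g, '-');
  const decision = {
    id: decisionId,
    workspaceId: input.workspaceId,
    title: input.title,
    topics: input.topics, // queried via GIN-indexed topics[] overlap
  };
  const crossReferences = input.relatedBetIds.map(betId => ({
    workspaceId: input.workspaceId,
    sourceType: 'decision',
    sourceId: decisionId,
    targetType: 'bet',
    targetId: betId,
    relationship: 'informs',
  }));
  return { decision, crossReferences };
}
```

Both inserts would run in one transaction so a decision never lands without its graph edges.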
Before agents produce deliverables, relevant context is automatically injected:
// lib/auto-context.ts
export async function queryAutoContext(
workspaceId: string,
topics: string[]
): Promise {
  if (topics.length === 0) return null;

  // Query decisions, bets, and feedback in parallel
const [decisions, bets, feedback] = await Promise.all([
db.select()
.from(schema.decisions)
.where(and(
eq(schema.decisions.workspaceId, workspaceId),
arrayOverlaps(schema.decisions.topics, topics),
isNull(schema.decisions.archivedAt),
))
.orderBy(desc(schema.decisions.createdAt))
.limit(5),
db.select()
.from(schema.strategicBets)
.where(and(
eq(schema.strategicBets.workspaceId, workspaceId),
arrayOverlaps(schema.strategicBets.topics, topics),
eq(schema.strategicBets.status, 'active'),
))
.limit(3),
db.select()
.from(schema.feedback)
.where(and(
eq(schema.feedback.workspaceId, workspaceId),
arrayOverlaps(schema.feedback.topics, topics),
))
.orderBy(desc(schema.feedback.createdAt))
.limit(5),
]);
if (!decisions.length && !bets.length && !feedback.length) {
return null;
}
return formatAutoContext({ decisions, bets, feedback });
}
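The `formatAutoContext` helper called above is not shown. One way it might render the three result sets into the Layer 3 markdown block (section headings and field names are assumptions):

```typescript
// Minimal shapes for the three query results used by formatAutoContext.
interface AutoContextRows {
  decisions: { title: string }[];
  bets: { title: string }[];
  feedback: { summary: string }[];
}

// Renders only non-empty sections, keeping the injected block small.
export function formatAutoContext(rows: AutoContextRows): string {
  const parts: string[] = ['## Auto-Context'];
  if (rows.decisions.length) {
    parts.push('### Recent Decisions', ...rows.decisions.map(d => `- ${d.title}`));
  }
  if (rows.bets.length) {
    parts.push('### Active Bets', ...rows.bets.map(b => `- ${b.title}`));
  }
  if (rows.feedback.length) {
    parts.push('### Related Feedback', ...rows.feedback.map(f => `- ${f.summary}`));
  }
  return parts.join('\n');
}
```

Skipping empty sections matters here: the auto-context budget is capped at roughly 200-500 tokens per the system prompt limits.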
The CLI's JSON-based cross-references become a proper relational graph:
-- Find all connected items for a decision (bidirectional, 1 hop)
SELECT
CASE
WHEN cr.source_type = 'decision' AND cr.source_id = $1 THEN cr.target_type
ELSE cr.source_type
END as related_type,
CASE
WHEN cr.source_type = 'decision' AND cr.source_id = $1 THEN cr.target_id
ELSE cr.source_id
END as related_id,
cr.relationship
FROM cross_references cr
WHERE cr.workspace_id = $2
AND (
(cr.source_type = 'decision' AND cr.source_id = $1)
OR (cr.target_type = 'decision' AND cr.target_id = $1)
);
Agent invocations are logged to the agent_invocations table with distributed tracing:
// lib/interaction-logger.ts
export async function logInvocation(params: InvocationLog) {
await db.insert(schema.agentInvocations).values({
workspaceId: params.workspaceId,
userId: params.userId,
conversationId: params.conversationId,
invocationType: params.type,
agentOrSkill: params.agentOrSkill,
requestSummary: params.requestSummary,
status: params.status,
requestId: params.requestId,
spanId: params.spanId,
parentSpanId: params.parentSpanId,
modelUsed: params.modelUsed,
tokensIn: params.tokensIn,
tokensOut: params.tokensOut,
durationMs: params.durationMs,
toolsUsed: params.toolsUsed,
agentsSpawned: params.agentsSpawned,
complexity: params.complexity,
roiMinutesSaved: params.roiMinutesSaved,
filesCreated: params.filesCreated,
contextEntriesCreated: params.contextEntriesCreated,
});
}
-- Team personality definitions
CREATE TABLE team_personalities (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
slug VARCHAR(50) NOT NULL UNIQUE, -- 'v2v-operators', 'user-obsessed', etc.
team VARCHAR(50) NOT NULL, -- 'product', 'design', 'architecture', 'marketing'
name VARCHAR(255) NOT NULL, -- "Vision to Value Operators"
personality_tag VARCHAR(100) NOT NULL, -- 2-3 word descriptor
philosophy TEXT NOT NULL, -- 1-2 paragraph worldview
principles JSONB NOT NULL DEFAULT '[]', -- Array of {id, name, statement, enforcement}
version VARCHAR(20) NOT NULL DEFAULT '1.0.0',
is_default BOOLEAN NOT NULL DEFAULT false, -- Default personality for the team
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Default personalities for each team
INSERT INTO team_personalities (slug, team, name, personality_tag, philosophy, principles, is_default)
VALUES
('v2v-operators', 'product', 'V2V Operating System', 'Vision to Value Operators',
'Product organizations exist to convert strategic vision into measurable customer value...',
'[{"id":"P1","name":"End-to-End Ownership","statement":"..."},...]',
true),
('user-obsessed', 'design', 'Design Operating Principles', 'User-Obsessed Craftspeople',
'Design exists to make the complex simple and the simple delightful...',
'[{"id":"D1","name":"User-First Always","statement":"..."},...]',
true),
('pragmatic-thinkers', 'architecture', 'Architecture Operating Principles', 'Pragmatic System Thinkers',
'Architecture exists to enable business capability through technology...',
'[{"id":"A1","name":"Simplicity Over Cleverness","statement":"..."},...]',
true),
('data-storytellers', 'marketing', 'Marketing Operating Principles', 'Data-Driven Storytellers',
'Marketing exists to connect product value with customer need...',
'[{"id":"M1","name":"Customer Truth","statement":"..."},...]',
true);
-- Workspace-level personality overrides
CREATE TABLE workspace_personalities (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
workspace_id UUID NOT NULL REFERENCES workspaces(id) ON DELETE CASCADE,
team VARCHAR(50) NOT NULL, -- 'product', 'design', etc.
personality_id UUID NOT NULL REFERENCES team_personalities(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(workspace_id, team)
);
-- RLS
ALTER TABLE workspace_personalities ENABLE ROW LEVEL SECURITY;
CREATE POLICY workspace_isolation ON workspace_personalities
USING (workspace_id = current_setting('app.current_workspace_id')::uuid);
ALTER TABLE workspace_personalities FORCE ROW LEVEL SECURITY;
-- Knowledge packs with team attribution
CREATE TABLE knowledge_packs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
slug VARCHAR(50) NOT NULL UNIQUE,
name VARCHAR(255) NOT NULL,
description TEXT,
team VARCHAR(50), -- 'product', 'design', 'architecture', 'marketing', NULL=cross-team
content TEXT NOT NULL, -- Full markdown content
primary_agents TEXT[] NOT NULL DEFAULT '{}',
version VARCHAR(20) NOT NULL DEFAULT '1.0.0',
token_count INTEGER, -- Pre-computed for budget enforcement
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Seed with 22 knowledge packs (9 OS + 13 Extension Teams)
-- OS packs: prioritization, pricing-frameworks, discovery-methods,
-- metrics-frameworks, competitive-frameworks, gtm-playbooks,
-- stakeholder-management, user-research, financial-modeling
-- Design packs: design-systems, user-research-methods, accessibility, interaction-patterns
-- Architecture packs: api-design, data-architecture, security-patterns, cloud-native
-- Marketing packs: content-strategy, seo-frameworks, analytics-methodology,
-- brand-management, campaign-optimization
-- Unified registry of all 39 agents + 5 gateways
CREATE TABLE agent_registry (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
agent_key VARCHAR(100) NOT NULL UNIQUE, -- 'product-manager', 'ui-designer', etc.
emoji VARCHAR(10) NOT NULL, -- Agent emoji
display_name VARCHAR(255) NOT NULL, -- "Product Manager"
short_name VARCHAR(50) NOT NULL, -- "PM"
team VARCHAR(50) NOT NULL, -- 'product', 'design', 'architecture', 'marketing'
agent_type VARCHAR(20) NOT NULL -- 'agent', 'gateway'
CHECK (agent_type IN ('agent', 'gateway')),
persona_template_id UUID REFERENCES prompt_templates(id),
knowledge_packs TEXT[] NOT NULL DEFAULT '{}', -- Slugs of applicable knowledge packs
primary_skills TEXT[] NOT NULL DEFAULT '{}', -- Skills this agent primarily uses
domain_routing TEXT[] NOT NULL DEFAULT '{}', -- Keywords for auto-routing
is_active BOOLEAN NOT NULL DEFAULT true,
tier_required VARCHAR(20) NOT NULL DEFAULT 'trial' -- 'trial', 'individual', 'team', 'enterprise'
CHECK (tier_required IN ('trial', 'individual', 'team', 'enterprise')),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Indexes for routing
CREATE INDEX idx_agent_team ON agent_registry(team);
CREATE INDEX idx_agent_domain ON agent_registry USING GIN(domain_routing);
CREATE INDEX idx_agent_type ON agent_registry(agent_type);
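The domain_routing keywords can drive auto-routing with a simple keyword-overlap score. A minimal sketch, assuming an in-memory view of agent_registry rows (the row shape and scoring here are illustrative, not the production router):

```typescript
// Illustrative auto-routing over agent_registry rows. The scoring rule
// (count of keyword matches) is an assumption for demonstration.
interface RegistryRow {
  agentKey: string;
  domainRouting: string[]; // keywords, as in the domain_routing column
}

export function routeByDomain(message: string, agents: RegistryRow[]): string | null {
  const text = message.toLowerCase();
  let best: { key: string; hits: number } | null = null;
  for (const agent of agents) {
    // Count how many of this agent's routing keywords appear in the message.
    const hits = agent.domainRouting.filter((kw) => text.includes(kw.toLowerCase())).length;
    if (hits > 0 && (!best || hits > best.hits)) {
      best = { key: agent.agentKey, hits };
    }
  }
  return best?.key ?? null; // null -> fall back to the gateway's default owner
}
```

In production the GIN index on domain_routing would do this matching in SQL; the in-memory version just makes the tie-breaking rule explicit.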
The agent_invocations table from the original data model already handles interaction logging. The v3 addition is the delegation pattern tracking:
-- Add delegation tracking to agent_invocations
ALTER TABLE agent_invocations
ADD COLUMN delegation_pattern VARCHAR(20)
CHECK (delegation_pattern IN ('consultation', 'delegation', 'review', 'debate')),
ADD COLUMN delegation_from VARCHAR(100), -- Parent agent key
ADD COLUMN delegation_deliverable TEXT; -- What was delegated
app/
├── api/
│ ├── chat/
│ │ └── route.ts maxDuration: 60 Single agent conversation
│ ├── plt/
│ │ └── route.ts maxDuration: 120 PLT Meeting Mode
│ ├── gateway/
│ │ └── [gateway]/
│ │ └── route.ts maxDuration: 120 Any gateway (@product, @design, etc.)
│ ├── skill/
│ │ └── [skill]/
│ │ └── route.ts maxDuration: 30 Skill invocation
│ ├── context/
│ │ ├── save/route.ts /context-save
│ │ ├── recall/route.ts /context-recall
│ │ ├── portfolio/route.ts /portfolio-status
│ │ ├── feedback/
│ │ │ ├── capture/route.ts /feedback-capture
│ │ │ └── recall/route.ts /feedback-recall
│ │ └── graph/route.ts Cross-reference queries
│ ├── workspace/
│ │ ├── route.ts CRUD workspaces
│ │ ├── connect/
│ │ │ └── [provider]/route.ts OAuth flows (Google Drive, etc.)
│ │ └── files/
│ │ └── route.ts File browser
│ ├── agents/
│ │ └── route.ts List available agents
│ ├── keys/
│ │ └── route.ts BYOT key management
│ ├── usage/
│ │ └── route.ts Usage dashboard
│ └── webhooks/
│ ├── clerk/route.ts User sync
│ └── stripe/route.ts Billing events
// middleware.ts
import { clerkMiddleware } from '@clerk/nextjs/server';
import { clerkMiddleware } from '@clerk/nextjs/server';
import { NextResponse } from 'next/server';

export default clerkMiddleware(async (auth, req) => {
// 1. Rate limiting (in-memory + Redis fallback)
const rateLimitResult = await checkRateLimit(req);
if (!rateLimitResult.allowed) {
return NextResponse.json(
{ error: { code: 'rate_limited', message: 'Rate limit exceeded' } },
{ status: 429, headers: { 'Retry-After': rateLimitResult.retryAfter } }
);
}
// 2. Workspace context injection
const workspaceId = req.headers.get('X-Workspace-ID');
if (workspaceId && req.nextUrl.pathname.startsWith('/api/')) {
// Set PostgreSQL RLS context
await setWorkspaceContext(db, workspaceId);
}
// 3. Distributed tracing
const requestId = req.headers.get('X-Request-ID') || crypto.randomUUID();
const response = NextResponse.next();
response.headers.set('X-Request-ID', requestId);
return response;
});
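The checkRateLimit helper referenced above is not shown; a minimal in-memory sketch follows, keyed by an explicit string rather than the Request object and omitting the Redis fallback (limits and window size are illustrative):

```typescript
// In-memory fixed-window rate limiter: one bucket per key, reset when the
// window elapses. Limit and window values are assumptions for illustration.
type Bucket = { count: number; resetAt: number };
const buckets = new Map<string, Bucket>();

export function checkRateLimit(
  key: string,
  limit = 60,
  windowMs = 60_000,
  now = Date.now()
): { allowed: boolean; retryAfter: string } {
  const bucket = buckets.get(key);
  if (!bucket || now >= bucket.resetAt) {
    // New window: first request always passes.
    buckets.set(key, { count: 1, resetAt: now + windowMs });
    return { allowed: true, retryAfter: '0' };
  }
  bucket.count += 1;
  if (bucket.count > limit) {
    // Seconds until the window resets, suitable for the Retry-After header.
    return { allowed: false, retryAfter: String(Math.ceil((bucket.resetAt - now) / 1000)) };
  }
  return { allowed: true, retryAfter: '0' };
}
```

Because each Vercel instance holds its own map, the in-memory layer is only a first line of defense; the Redis fallback gives a shared count across instances.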
type SSEEvent =
| { type: 'token'; data: { text: string } }
| { type: 'tool_call_start'; data: { tool: string; id: string } }
| { type: 'tool_call_result'; data: { id: string; result: unknown } }
| { type: 'agent_start'; data: { agent: string; emoji: string; display_name: string } }
| { type: 'agent_complete'; data: { agent: string; roi_minutes: number } }
| { type: 'file_created'; data: { file_id: string; path: string; action: string } }
| { type: 'context_saved'; data: { type: string; id: string } }
| { type: 'error'; data: { code: string; message: string; recoverable: boolean } }
| { type: 'done'; data: { tokens_in: number; tokens_out: number; duration_ms: number } };
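Serializing these events to the text/event-stream wire format is mechanical. A sketch, with the union trimmed to two variants for brevity (the same encoder applies to the rest):

```typescript
// WireEvent mirrors two variants of the SSEEvent union above.
type WireEvent =
  | { type: 'token'; data: { text: string } }
  | { type: 'done'; data: { tokens_in: number; tokens_out: number; duration_ms: number } };

export function encodeSSE(event: WireEvent): string {
  // One `event:` line naming the variant, one `data:` line of JSON,
  // and a blank line terminating the frame, per SSE framing rules.
  return `event: ${event.type}\ndata: ${JSON.stringify(event.data)}\n\n`;
}
```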
| Aspect | Decision | Rationale |
|---|---|---|
| Phase 1 UX | None — Infrastructure only | Reduce MVP scope, no settings UI needed |
| Data model | Separate team_personalities table | Clean separation, swappable at query time |
| Prompt injection | Appended to Layer 2 (Agent Persona) | Minimal token overhead (~100 tokens) |
| Defaults | One default personality per team | Works out of the box |
| Swapping | API layer supports it; UI deferred | Infrastructure ready for Phase 2 UX |
Agent Spawn Request
│
▼
┌──────────────────────────────────────────────────────┐
│ 1. Look up agent in agent_registry │
│ → Get team: 'product' │
│ │
│ 2. Look up workspace personality override │
│ workspace_personalities WHERE team = 'product' │
│ → If found: use override personality_id │
│ → If not: use default from team_personalities │
│ │
│ 3. Load personality principles │
│ team_personalities WHERE id = personality_id │
│ → Get principles JSON array │
│ │
│ 4. Inject into Layer 2 of system prompt │
│ Append to agent persona: │
│ "## Team Operating Principles │
│ Personality: Vision to Value Operators │
│ P1: End-to-End Ownership — ... │
│ P2: Decision Quality — ..." │
└──────────────────────────────────────────────────────┘
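Steps 2 and 3 of this flow reduce to a small fallback rule. A sketch with the database lookups replaced by in-memory structures (the row shape is simplified):

```typescript
// Resolution order from the diagram: workspace override first, then the
// team's default personality. DB queries are stand-ins here.
interface Personality {
  id: string;
  team: string;
  isDefault: boolean;
  principles: unknown[];
}

export function resolvePersonality(
  team: string,
  overrides: Map<string, string>,   // team -> personality_id (workspace_personalities)
  personalities: Personality[]      // team_personalities rows
): Personality | undefined {
  const overrideId = overrides.get(team);
  if (overrideId) {
    const found = personalities.find((p) => p.id === overrideId);
    if (found) return found;
  }
  // No override (or a dangling one): fall back to the team default.
  return personalities.find((p) => p.team === team && p.isDefault);
}
```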
// Phase 1: Read-only API for personalities
// app/api/personalities/route.ts
export async function GET(req: Request) {
// List all available personalities
const personalities = await db.select()
.from(schema.teamPersonalities)
.orderBy(schema.teamPersonalities.team);
return Response.json({ data: personalities });
}

// app/api/workspace/[id]/personality/route.ts
export async function GET(req: Request) {
// Get current workspace personality config
const config = await db.select()
.from(schema.workspacePersonalities)
.where(eq(schema.workspacePersonalities.workspaceId, workspaceId));
return Response.json({ data: config });
}
export async function PUT(req: Request) {
// Set workspace personality (for future UI)
const { team, personalityId } = await req.json();
await db.insert(schema.workspacePersonalities)
.values({ workspaceId, team, personalityId })
.onConflictDoUpdate({
target: [schema.workspacePersonalities.workspaceId, schema.workspacePersonalities.team],
set: { personalityId },
});
return Response.json({ data: { success: true } });
}
When a team personality is active, it adds ~100 tokens to the agent persona (Layer 2):
```markdown
## Team Operating Principles
Personality: Vision to Value Operators
P1: End-to-End Ownership — ...
P2: Decision Quality — ...
```
Knowledge packs are loaded on-demand based on the agent and task:
// lib/knowledge-loader.ts
export async function loadKnowledgePacks(
  agentKey: string,
  detectedSkill?: string
): Promise<string[]> {
  // Get the agent's primary knowledge packs
  const agent = await db.select()
    .from(schema.agentRegistry)
    .where(eq(schema.agentRegistry.agentKey, agentKey))
    .limit(1);
  const packSlugs = agent[0]?.knowledgePacks || [];
  // Load packs from DB or R2 cache
  const packs = await db.select()
    .from(schema.knowledgePacks)
    .where(inArray(schema.knowledgePacks.slug, packSlugs));
  // Budget enforcement: max 2 packs per invocation, prioritized by relevance
  const sorted = rankByRelevance(packs, detectedSkill);
  return sorted.slice(0, 2).map(p => p.content);
}
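rankByRelevance is referenced above but not defined. One plausible sketch, assuming relevance means the detected skill appears in the pack slug, with smaller token_count as a tiebreaker so the 2-pack budget favors cheaper packs:

```typescript
// Illustrative ranking: skill-matching packs first, then by ascending
// token count. Both rules are assumptions, not the production heuristic.
interface Pack {
  slug: string;
  content: string;
  tokenCount: number;
}

export function rankByRelevance(packs: Pack[], detectedSkill?: string): Pack[] {
  const score = (p: Pack) =>
    detectedSkill && p.slug.includes(detectedSkill) ? 1 : 0;
  // Sort a copy: relevance descending, then token count ascending.
  return [...packs].sort((a, b) => score(b) - score(a) || a.tokenCount - b.tokenCount);
}
```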
| Pack | Team | Primary Agents | Est. Tokens |
|---|---|---|---|
| prioritization | product | @pm, @pm-dir | ~1,200 |
| pricing-frameworks | product | @bizops, @vp-product | ~1,100 |
| discovery-methods | product | @pm, @ux-lead | ~1,000 |
| metrics-frameworks | product | @bizops, @value-realization | ~1,100 |
| competitive-frameworks | product | @ci, @pmm-dir | ~1,000 |
| gtm-playbooks | product | @pmm, @pmm-dir | ~1,200 |
| stakeholder-management | product | @pm-dir, @prod-ops | ~900 |
| user-research | product | @ux-lead, @pm | ~1,100 |
| financial-modeling | product | @bizops, @bizdev | ~1,000 |
| design-systems | design | @ui-designer, @visual-designer | ~1,000 |
| user-research-methods | design | @user-researcher | ~1,100 |
| accessibility | design | @ui-designer | ~800 |
| interaction-patterns | design | @interaction-designer | ~900 |
| api-design | architecture | @api-architect | ~1,000 |
| data-architecture | architecture | @data-architect | ~1,100 |
| security-patterns | architecture | @security-architect | ~900 |
| cloud-native | architecture | @cloud-architect | ~1,000 |
| content-strategy | marketing | @content-strategist | ~1,000 |
| seo-frameworks | marketing | @seo-specialist | ~900 |
| analytics-methodology | marketing | @analytics-specialist | ~1,000 |
| brand-management | marketing | @brand-strategist | ~900 |
| campaign-optimization | marketing | @paid-media, @email-marketing | ~1,000 |
Knowledge packs are NOT included in the cached system prompt (they'd break the cache key). Instead, they're loaded into the conversation context when the agent's task requires framework application.
// lib/cloud-storage/interface.ts
// FileMetadata is the provider-agnostic metadata shape (fields elided here).
export interface CloudStorageProvider {
  readFile(fileId: string): Promise<string>;
  writeFile(path: string, content: string, parentFolderId: string): Promise<string>; // returns new file ID
  updateFile(fileId: string, content: string): Promise<void>;
  deleteFile(fileId: string): Promise<void>;
  listFiles(folderId: string, query?: string): Promise<FileMetadata[]>;
  getMetadata(fileId: string): Promise<FileMetadata>;
  resolvePathToId(path: string, rootFolderId: string): Promise<string>;
  createFolder(name: string, parentId: string): Promise<string>;
}

// lib/cloud-storage/google-drive.ts
export class GoogleDriveProvider implements CloudStorageProvider {
  constructor(private accessToken: string) {}

  async readFile(fileId: string): Promise<string> {
    const response = await fetch(
      `https://www.googleapis.com/drive/v3/files/${fileId}?alt=media`,
      { headers: { Authorization: `Bearer ${this.accessToken}` } }
    );
    return response.text();
  }

  async writeFile(path: string, content: string, parentFolderId: string): Promise<string> {
    const metadata = {
      name: path.split('/').pop(),
      parents: [parentFolderId],
      mimeType: 'text/markdown',
    };
    // Multipart upload: metadata part + file content part
    const form = new FormData();
    form.append('metadata', new Blob([JSON.stringify(metadata)], { type: 'application/json' }));
    form.append('file', new Blob([content], { type: 'text/markdown' }));
    const response = await fetch(
      'https://www.googleapis.com/upload/drive/v3/files?uploadType=multipart',
      { method: 'POST', headers: { Authorization: `Bearer ${this.accessToken}` }, body: form }
    );
    const result = await response.json();
    return result.id;
  }

  // ... other methods
}
OAuth tokens are encrypted with envelope encryption and auto-refreshed:
// lib/cloud-storage/token-manager.ts
export async function getProviderToken(
  workspaceId: string,
  provider: string
): Promise<string> {
  const integration = await db.select()
    .from(schema.connectedIntegrations)
    .where(and(
      eq(schema.connectedIntegrations.workspaceId, workspaceId),
      eq(schema.connectedIntegrations.provider, provider),
    ))
    .limit(1);
  if (!integration[0]) throw new Error(`No ${provider} connection`);

  const token = await db.select()
    .from(schema.integrationTokens)
    .where(eq(schema.integrationTokens.integrationId, integration[0].id))
    .limit(1);
  // Decrypt access token
  let accessToken = await decrypt(token[0].encryptedAccessToken, token[0].encryptedDek);
  // Check expiry and refresh if needed
  if (token[0].expiresAt && new Date(token[0].expiresAt) < new Date()) {
    const refreshToken = await decrypt(token[0].encryptedRefreshToken, token[0].encryptedDek);
    accessToken = await refreshOAuthToken(provider, refreshToken);
    await updateEncryptedToken(token[0].id, accessToken);
  }
  return accessToken;
}
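The expiry check above refreshes only after the token has already lapsed. A common hardening, shown here as a sketch (the skew buffer is an addition, not part of the flow above), is to refresh slightly early so a request doesn't fail mid-flight on an almost-expired token:

```typescript
// Expiry check with a clock-skew buffer: treat the token as expired
// `skewMs` before its real expiry. Buffer size is an assumption.
export function needsRefresh(
  expiresAt: string | null,
  skewMs = 60_000,
  now = Date.now()
): boolean {
  if (!expiresAt) return false; // no expiry recorded: assume long-lived
  return new Date(expiresAt).getTime() - skewMs <= now;
}
```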
Files in the user's cloud storage follow the OS context structure:
User's Google Drive/
└── Legionis Workspace/ ← User-selected folder
├── context/
│ ├── decisions/ ← DR-YYYY-NNN.md files
│ ├── bets/ ← SB-YYYY-NNN.md files
│ ├── feedback/ ← FB-YYYY-NNN.md files
│ ├── learnings/ ← L-NNN entries
│ ├── portfolio/ ← Active bets tracking
│ └── documents/ ← Auto-registered deliverables
├── deliverables/ ← PRDs, roadmaps, analyses
└── .workspace.json ← Workspace metadata
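The ID prefixes in this layout (DR-YYYY-NNN, SB-YYYY-NNN, FB-YYYY-NNN, L-NNN) make context files self-describing. A sketch of classifying a file name by prefix, under the assumption that the patterns above are exhaustive:

```typescript
// Map a context/ file name to its record type using the ID conventions
// shown in the folder layout. Regexes mirror DR/SB/FB-YYYY-NNN and L-NNN.
export function classifyContextFile(
  name: string
): 'decision' | 'bet' | 'feedback' | 'learning' | 'unknown' {
  if (/^DR-\d{4}-\d{3}\.md$/.test(name)) return 'decision';
  if (/^SB-\d{4}-\d{3}\.md$/.test(name)) return 'bet';
  if (/^FB-\d{4}-\d{3}\.md$/.test(name)) return 'feedback';
  if (/^L-\d{3}/.test(name)) return 'learning';
  return 'unknown';
}
```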
┌─────────────────────────────────────────────────────────────────┐
│ CACHE HIERARCHY │
│ │
│ Level 1: Anthropic API Prompt Cache (5 min TTL) │
│ ─ System prompt layers cached by Anthropic │
│ ─ 90% cost reduction on cache hits │
│ ─ Requires exact prefix match │
│ │
│ Level 2: In-Memory Compiled Prompts (per Vercel instance) │
│ ─ Compiled core protocol and agent personas │
│ ─ Refreshed on version update or cold start │
│ ─ Near-zero latency │
│ │
│ Level 3: R2 Object Cache (persistent) │
│ ─ Compiled prompt JSON │
│ ─ Knowledge pack content │
│ ─ Updated via build pipeline (compile-prompts.ts) │
│ │
│ Level 4: PostgreSQL (source of truth) │
│ ─ prompt_templates table │
│ ─ knowledge_packs table │
│ ─ team_personalities table │
└─────────────────────────────────────────────────────────────────┘
| Trigger | Action |
|---|---|
| OS version update | Re-run compile-prompts.ts, update R2, invalidate L2 |
| Agent persona edit (A/B test) | Update prompt_templates, invalidate L2 for that agent |
| Team personality change | Update workspace_personalities, invalidate L2 for team agents |
| Knowledge pack update | Update knowledge_packs, no prompt cache impact (not in system prompt) |
| Aspect | CLI (MCP) | Cloud (OAuth) |
|---|---|---|
| Protocol | MCP (stdio/SSE) | OAuth 2.0 + REST API |
| Authentication | API keys in env vars | Encrypted tokens in DB |
| Availability | User configures locally | User connects via OAuth flow |
| Tool registration | .mcp.json config file | connected_integrations table |
| Runtime detection | Check available tool list | Query integrations for workspace |
| Integration | API | OAuth Scopes | Agent Use Cases |
|---|---|---|---|
| Google Drive | Drive API v3 | drive.file | File read/write (primary storage) |
| Jira | Jira REST API v3 | read:jira-work, write:jira-work | Create issues from user stories |
| Slack | Slack Web API | chat:write, channels:read | Post updates, share decisions |
| GitHub | GitHub REST/GraphQL | repo, issues | Link commits to features |
| Linear | Linear API | issues:read, issues:write | Project management sync |
Agents detect available integrations at runtime. If a tool is not connected, agents produce text output with actionable "Next Steps (Manual)" sections, exactly as the MCP integration framework specifies.
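That degradation rule can be made concrete with a small decision function. A sketch, where the connected set stands in for a connected_integrations query and the fallback text shape is illustrative:

```typescript
// Decide between calling a connected tool and emitting a manual fallback.
// The "Next Steps (Manual)" wording follows the convention described above.
export function planToolUse(
  connected: Set<string>,   // providers connected for this workspace
  provider: string,
  action: string
): { mode: 'tool' } | { mode: 'manual'; text: string } {
  if (connected.has(provider)) return { mode: 'tool' };
  return {
    mode: 'manual',
    text: `## Next Steps (Manual)\n1. ${action} in ${provider} yourself, or\n2. Connect ${provider} via OAuth to automate this.`,
  };
}
```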
┌─────────────────────────────────────────────────────────┐
│ Layer 1: Authentication (Clerk) │
│ ─ JWT validation on every request │
│ ─ Session management with refresh │
│ ─ Social login + email/password │
├─────────────────────────────────────────────────────────┤
│ Layer 2: Authorization (RLS + Middleware) │
│ ─ PostgreSQL RLS on all workspace-scoped tables │
│ ─ Middleware sets workspace context per request │
│ ─ Tier-based feature gating │
├─────────────────────────────────────────────────────────┤
│ Layer 3: Data Encryption │
│ ─ API keys: AES-256-GCM + envelope encryption (KMS) │
│ ─ OAuth tokens: Same envelope encryption │
│ ─ Data at rest: Neon TLS + encryption │
│ ─ Data in transit: TLS 1.3 everywhere │
├─────────────────────────────────────────────────────────┤
│ Layer 4: Prompt Security │
│ ─ Injection detection (pattern matching) │
│ ─ User content sandboxing (XML boundary markers) │
│ ─ Output validation (system prompt leak detection) │
│ ─ Rate limiting on suspected injection probing │
├─────────────────────────────────────────────────────────┤
│ Layer 5: Audit & Monitoring │
│ ─ All agent invocations logged with trace context │
│ ─ All file operations logged │
│ ─ Security events in Sentry │
│ ─ Rate limit violations tracked │
└─────────────────────────────────────────────────────────┘
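Layer 4's pattern-matching injection detection can be sketched as a first-pass filter. The pattern list below is illustrative, not the production ruleset; real deployments layer this with sandboxing and output validation as the diagram shows:

```typescript
// First-pass injection screen: flag user content matching known probe
// phrasings. Patterns are examples only; a static list is never complete.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior|above) instructions/i,
  /reveal (your )?system prompt/i,
  /you are now [a-z]/i,
  /disregard .{0,40}(rules|guidelines|instructions)/i,
];

export function looksLikeInjection(userContent: string): boolean {
  return INJECTION_PATTERNS.some((p) => p.test(userContent));
}
```

A hit would feed the rate limiter for suspected probing rather than hard-blocking, since legitimate text can match.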
The CLI's Bash tool is removed in cloud mode: agents cannot execute arbitrary shell commands, and all file operations go through the cloud storage abstraction layer.
Product Team (13):
| Agent Key | Emoji | Display Name | Team |
|---|---|---|---|
| product-manager | 📝 | Product Manager | product |
| cpo | 👑 | Chief Product Officer | product |
| vp-product | 📈 | VP Product | product |
| director-product-management | 📋 | Director of Product Management | product |
| director-product-marketing | 📣 | Director of Product Marketing | product |
| product-marketing-manager | 🎯 | Product Marketing Manager | product |
| product-mentor | 🎓 | Product Mentor | product |
| bizops | 🧮 | BizOps | product |
| bizdev | 🤝 | Business Development | product |
| competitive-intelligence | 🔭 | Competitive Intelligence | product |
| product-operations | ⚙️ | Product Operations | product |
| ux-lead | 🎨 | UX Lead | product |
| value-realization | 💰 | Value Realization | product |
Design Team (6):
| Agent Key | Emoji | Display Name |
|---|---|---|
| design-dir | 🎨 | Director of Design |
| ui-designer | 🖼️ | UI Designer |
| visual-designer | 🎨 | Visual Designer |
| interaction-designer | 🔄 | Interaction Designer |
| user-researcher | 🔍 | User Researcher |
| motion-designer | 🎬 | Motion Designer |
Architecture Team (6):
| Agent Key | Emoji | Display Name |
|---|---|---|
| architecture-dir | 🏗️ | Chief Architect |
| api-architect | 🔌 | API Architect |
| data-architect | 🗄️ | Data Architect |
| security-architect | 🔒 | Security Architect |
| cloud-architect | ☁️ | Cloud Architect |
| ai-architect | 🧠 | AI/ML Architect |
Marketing Team (14):
| Agent Key | Emoji | Display Name |
|---|---|---|
| marketing-dir | 📢 | Director of Marketing |
| content-strategist | ✍️ | Content Strategist |
| copywriter | 📄 | Copywriter |
| seo-specialist | 🔍 | SEO Specialist |
| cro-specialist | 📊 | CRO Specialist |
| paid-media | 💰 | Paid Media Specialist |
| email-marketing | 📧 | Email Marketing Specialist |
| social-media | 📱 | Social Media Manager |
| growth-hacker | 🚀 | Growth Hacker |
| market-researcher | 📈 | Market Researcher |
| video-producer | 🎥 | Video Producer |
| pr-specialist | 📰 | PR Specialist |
| brand-strategist | 🏷️ | Brand Strategist |
| analytics-specialist | 📊 | Analytics Specialist |
| Gateway Key | Emoji | Display Name | Behavior |
|---|---|---|---|
| product | 🏛️ | Product Gateway | Routes to relevant owners, orchestrates execution |
| product-leadership-team | 👥 | PLT | Meeting Mode with multiple leadership perspectives |
| design | 🎨 | Design Gateway | Routes to design specialists |
| architecture | 🏗️ | Architecture Gateway | Routes to architecture specialists |
| marketing | 📢 | Marketing Gateway | Routes to marketing specialists |
Request ID: req_abc123
│
├── span_001: PLT Gateway (gateway:plt)
│ ├── span_002: VP Product (agent:vp-product)
│ │ └── span_003: BizOps (sub-agent, consultation)
│ ├── span_004: Dir PM (agent:director-product-management)
│ ├── span_005: Dir PMM (agent:director-product-marketing)
│ └── span_006: ProdOps (agent:product-operations)
│
└── Post-processing: ROI calculation, interaction logging
Every request gets a trace context that propagates through sub-agent spawns:
// lib/tracing.ts
export interface TraceContext {
requestId: string;
spanId: string;
parentSpanId?: string;
userId: string;
workspaceId: string;
operation: string;
startedAt: string;
depth: number;
}

export function createTraceContext(req: Request, operation: string): TraceContext {
return {
requestId: req.headers.get('X-Request-ID') || crypto.randomUUID(),
spanId: crypto.randomUUID(),
parentSpanId: req.headers.get('X-Parent-Span-ID') ?? undefined,
userId: auth().userId!,
workspaceId: req.headers.get('X-Workspace-ID')!,
operation,
startedAt: new Date().toISOString(),
depth: 0,
};
}
export function childSpan(parent: TraceContext, operation: string): TraceContext {
return {
...parent,
spanId: crypto.randomUUID(),
parentSpanId: parent.spanId,
operation,
startedAt: new Date().toISOString(),
depth: parent.depth + 1,
};
}
{
"$schema": "https://openapi.vercel.sh/vercel.json",
"fluid": true,
"regions": ["iad1"],
"functions": {
"app/api/plt/**": { "maxDuration": 120 },
"app/api/gateway/**": { "maxDuration": 120 },
"app/api/chat/**": { "maxDuration": 60 },
"app/api/skill/**": { "maxDuration": 30 }
}
}
# Database
DATABASE_URL=postgresql://...@ep-xxx.us-east-2.aws.neon.tech/neondb?sslmode=require

# Auth
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_live_...
CLERK_SECRET_KEY=sk_live_...

# Billing
STRIPE_SECRET_KEY=sk_live_...
STRIPE_WEBHOOK_SECRET=whsec_...

# Storage
R2_ACCOUNT_ID=...
R2_ACCESS_KEY_ID=...
R2_SECRET_ACCESS_KEY=...
R2_BUCKET_NAME=project-saas-prompts

# Search
TYPESENSE_API_KEY=...
TYPESENSE_HOST=xxx.typesense.net

# Monitoring
SENTRY_DSN=https://...@sentry.io/...
NEXT_PUBLIC_POSTHOG_KEY=phc_...
BETTERSTACK_SOURCE_TOKEN=...

# Encryption
KMS_KEY_ARN=arn:aws:kms:us-east-1:...

# Cloud Storage OAuth
GOOGLE_CLIENT_ID=...
GOOGLE_CLIENT_SECRET=...
users 1──M workspace_memberships M──1 workspaces
users ──M user_api_keys
workspaces ──M decisions
workspaces ──M strategic_bets
workspaces ──M feedback ──M feedback_theme_links M──1 feedback_themes
workspaces ──M assumptions
workspaces ──M learnings
workspaces ──M documents
workspaces ──M cross_references (links decisions, strategic_bets, feedback, documents)
workspaces ──1 portfolio_state
workspaces ──M connected_integrations ──1 integration_tokens
workspaces ──M conversations ──M messages
workspaces ──M agent_invocations
workspaces ──M usage_events
workspaces ──M roi_sessions
workspaces ──M workspace_personalities ──1 team_personalities (NEW)
agent_registry (global) ──1 prompt_templates (global)
knowledge_packs (global)
team_personalities (global)
| Metric | Target | Measurement |
|---|---|---|
| Single agent response (p50) | <10s | Time to last token |
| Single agent response (p95) | <30s | Time to last token |
| PLT session (p50) | <30s | Time to formatted response |
| PLT session (p95) | <60s | Time to formatted response |
| Skill invocation (p50) | <5s | Time to completion |
| Context recall query | <500ms | Database query + format |
| Auto-context injection | <200ms | Topic extraction + query |
| Time to first token (streaming) | <3s | First SSE event |
| Cache hit rate (Anthropic) | >80% | Provider metadata tracking |
| Feature | Status | Notes |
|---|---|---|
| 61 skills | MVP | All skills available |
| 39 agents (13 OS + 26 Extension) | MVP | Full roster |
| 5 gateways | MVP | @product, @plt, @design, @architecture, @marketing |
| PLT Meeting Mode | MVP | Parallel agents + synthesis |
| Delegation Protocol (4 patterns) | MVP | Consultation, Delegation, Review, Debate |
| Context layer (PostgreSQL) | MVP | All context tables |
| Cross-reference graph | MVP | Relational implementation |
| Auto-context injection | MVP | Topic-based, database-backed |
| Team Personalities (infrastructure) | MVP | Data model + API, no UX |
| Knowledge packs (22) | MVP | Loaded from DB/R2 |
| Google Drive integration | MVP | OAuth + file tools |
| BYOT (Claude + OpenAI) | MVP | Per-request key routing |
| System prompt caching | MVP | 3-layer with Anthropic cache |
| Interaction logging | MVP | agent_invocations table |
| Distributed tracing | MVP | Request ID propagation |
| Prompt versioning | MVP | Database + feature flags |
| SSE streaming | MVP | Token-by-token + agent events |
| $10/mo individual, $8/seat team pricing | MVP | Stripe integration (1-month trial, no free tier) |
| OneDrive/Dropbox | Growth | Phase 2 |
| Team collaboration | Enterprise | Phase 3 |
| Hybrid CLI/Cloud sync | Enterprise | Phase 3 |
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| PLT exceeds 60s p95 | Medium | Low | Cap maxSteps=5; 300s Fluid Compute buffer |
| Prompt compression degrades quality | Medium | Medium | A/B test; keep originals; tune iteratively |
| Google OAuth verification delayed | Medium | High | Apply early; use test mode for beta |
| BYOT key abuse (shared keys) | Low | Medium | Rate limit per key; abuse detection |
| Cloud storage API latency spikes | Low | Medium | Per-tool timeouts; circuit breaker |
| Cache hit rate below 70% | Low | Medium | Monitor; extend to 1-hour TTL |
| Multi-agent token costs surprise users | Medium | Low | Cost estimator in UI; model routing |
Document Status: Active (v3.1 — definitive architecture + conceptual platform model) Last Updated: 2026-02-18 Gate Owner: Chief Architect Next Review: Pre-development kickoff
v3.1 Change Log: Added Section 0 (Conceptual Platform Architecture) with 4-layer model, defensibility assessment, compounding flywheel, and 3 external connections. Maps conceptual layers to implementation sections. Sourced from Platform Architecture Deck (23-slide presentation, Feb 2026).