Canon Engine
Architecture

Four sources.
One governed platform.

Canon Engine ingests knowledge from four source types, enriches non-vault sources with LLM processing, and serves everything through a unified query interface with source-type filtering.

Knowledge Source Types

Dual-layer storage model.

Every knowledge source is stored in two layers: a structured record with full metadata and enrichment output, and chunks in a shared pgvector table for semantic search. A single system can therefore answer both direct lookups (“get transcript by ID”) and cross-source discovery (“find knowledge about X across all sources”).
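
The two layers can be pictured with a minimal in-memory sketch. Everything named here (`SourceRecord`, `Chunk`, `KnowledgeStore`, the 40-character chunk size) is a hypothetical illustration rather than Canon Engine's actual schema, and embeddings are left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRecord:
    """Layer 1: structured record with full metadata and enrichment output."""
    id: str
    source_type: str  # "canon", "transcript", "email", or "document"
    text: str
    metadata: dict = field(default_factory=dict)
    enrichment: dict = field(default_factory=dict)


@dataclass
class Chunk:
    """Layer 2: one row in the shared chunk table (embedding omitted here)."""
    record_id: str
    source_type: str
    position: int
    text: str


def split_into_chunks(text: str, size: int = 40) -> list[str]:
    """Fixed-size character windows; a placeholder for real chunking."""
    return [text[i:i + size] for i in range(0, len(text), size)]


class KnowledgeStore:
    """Hypothetical in-memory stand-in for the two storage layers."""

    def __init__(self) -> None:
        self.records: dict[str, SourceRecord] = {}
        self.chunks: list[Chunk] = []

    def ingest(self, record: SourceRecord) -> None:
        # Write the structured record, then its chunks, so both layers stay in step.
        self.records[record.id] = record
        for position, piece in enumerate(split_into_chunks(record.text)):
            self.chunks.append(Chunk(record.id, record.source_type, position, piece))

    def get(self, record_id: str) -> SourceRecord:
        """Direct retrieval by ID from the structured layer."""
        return self.records[record_id]

    def chunks_of(self, record_id: str) -> list[Chunk]:
        """All chunks that point back to one structured record."""
        return [c for c in self.chunks if c.record_id == record_id]
```

Because every chunk carries its `record_id` and `source_type`, a semantic hit can always be traced back to the full structured record it came from.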

Canon: Vault documents
Ingestion: File watcher / git hook
Enrichment: None; human-authored content is authoritative
Governance: Full lifecycle (draft → in-review → accepted → superseded)

Transcripts: Meeting recordings
Ingestion: Webhook (Gong, Zoom, Google Meet)
Enrichment: Summary, key decisions, action items, topics
Governance: Factual record; no governance lifecycle

Email: Email threads
Ingestion: Gmail push / Pub/Sub
Enrichment: Thread summary, decisions, action items, topics
Governance: Factual record; no governance lifecycle

Documents: Google Docs, PDFs, uploads
Ingestion: Drive watcher / upload API
Enrichment: Summary, key points, topics
Governance: Factual record; no governance lifecycle

Governance Lifecycle

Knowledge matures.
It doesn't just accumulate.

Canon vault documents follow a formal lifecycle. Only accepted documents are authoritative. Draft and in-review documents are provisional. Superseded documents point to their replacements. Non-vault knowledge sources are factual records — they don't need governance because they describe events, not positions.

draft
Provisional. Written but not yet reviewed.
in-review
Under steward evaluation. Review checks may be automated, but promotion to accepted requires human approval.
accepted
Authoritative. The only state that drives execution.
superseded
Historical. Points to its replacement.

Stewardship rule: Humans steward Canon. No automated system may promote a document to accepted status without human approval.
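
The lifecycle and the stewardship rule can be sketched as a small state machine. This is an illustration under assumptions: the names `Status`, `advance`, and `human_approved` are hypothetical, and only the forward path listed above is modeled (the source does not describe rejection or rollback paths, so none are included).

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in-review"
    ACCEPTED = "accepted"
    SUPERSEDED = "superseded"


# Forward-only transitions, taken from the lifecycle as listed above.
NEXT_STATUS = {
    Status.DRAFT: Status.IN_REVIEW,
    Status.IN_REVIEW: Status.ACCEPTED,
    Status.ACCEPTED: Status.SUPERSEDED,
}


class TransitionError(Exception):
    """Raised when a requested lifecycle transition is not allowed."""


def advance(current: Status, target: Status, *, human_approved: bool = False) -> Status:
    """Return the new status, or raise if the step breaks the lifecycle."""
    if NEXT_STATUS.get(current) is not target:
        raise TransitionError(f"{current.value} -> {target.value} is not a lifecycle step")
    # Stewardship rule: only a human approval may promote to accepted.
    if target is Status.ACCEPTED and not human_approved:
        raise TransitionError("promotion to accepted requires human approval")
    return target


def is_authoritative(status: Status) -> bool:
    """Only accepted documents are authoritative."""
    return status is Status.ACCEPTED
```

An automated reviewer could call `advance(Status.DRAFT, Status.IN_REVIEW)` freely, but any attempt to reach `Status.ACCEPTED` without `human_approved=True` raises `TransitionError`.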

Query Interfaces

Three ways to ask.

File-Based

Direct file reads at known vault paths, with no network round trip and no API cost. The primary interface for agent session protocols and governance checks.

use when: Session start, obligation registers, architecture references
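
A file-based read can be as small as the sketch below. The function name `read_vault_document` and the idea of a single vault root directory are assumptions made for illustration; the source does not describe the vault layout.

```python
from pathlib import Path


def read_vault_document(vault_root: Path, relative_path: str) -> str:
    """Read one vault file by its known path, refusing paths outside the vault."""
    root = vault_root.resolve()
    target = (root / relative_path).resolve()
    # Guard against "../" segments that would escape the vault directory.
    if not target.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"{relative_path!r} is outside the vault")
    return target.read_text(encoding="utf-8")
```

An agent would call this at session start with whatever paths its protocol names; no database or embedding call is involved.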

Vector

Supabase pgvector with source-type filtering. Semantic search across all knowledge sources through a single function call.

use when: "What do we know about X?" — discovery across all sources
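
The idea of semantic search with source-type filtering can be sketched in plain Python. This is an in-memory stand-in, not the pgvector query itself: in the real system the ranking would happen in the database, and the `search` function, the dictionary shape, and the toy two-dimensional embeddings are all hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(chunks, query_embedding, source_types=None, limit=5):
    """Rank chunks by similarity, optionally restricted to some source types."""
    candidates = [
        c for c in chunks
        if source_types is None or c["source_type"] in source_types
    ]
    ranked = sorted(
        candidates,
        key=lambda c: cosine_similarity(c["embedding"], query_embedding),
        reverse=True,
    )
    return ranked[:limit]
```

Passing `source_types=None` searches every source at once; passing a set such as `{"transcript", "email"}` narrows discovery to those sources without changing the call shape.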

Structured

Direct Supabase queries by ID, account, or date range. Full metadata on every record.

use when: "Get the transcript from Tuesday's client call" — specific retrieval
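
A structured lookup is an exact filter over record metadata rather than a similarity ranking. The sketch below shows that shape over an in-memory list; the `Record` fields and the `find_records` function are hypothetical, and the real queries would run against Supabase tables whose columns are not described in the source.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Record:
    id: str
    source_type: str
    account: str
    occurred_on: date
    title: str


def find_records(
    records: list[Record],
    *,
    source_type: Optional[str] = None,
    account: Optional[str] = None,
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> list[Record]:
    """Return records matching every filter that was supplied (inclusive dates)."""
    matches = []
    for record in records:
        if source_type is not None and record.source_type != source_type:
            continue
        if account is not None and record.account != account:
            continue
        if start is not None and record.occurred_on < start:
            continue
        if end is not None and record.occurred_on > end:
            continue
        matches.append(record)
    return matches
```

A request like the one above would translate to a call with `source_type="transcript"`, the client's account, and `start` and `end` both set to that Tuesday's date.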

LLM Enrichment

Enrich what's raw.
Preserve what's governed.

Transcripts, emails, and documents receive automatic LLM enrichment on ingest: a summary and topics for every source, plus key decisions and action items for transcripts and emails, and key points for documents. Canon vault documents receive no enrichment. They are human-authored governed content, preserved exactly as written. This distinction is a governance invariant, not a feature toggle.

// enrichment profiles
transcript → summary, decisions, actions, topics
email → thread summary, decisions, actions, topics
document → summary, key points, topics
canon → none (human-authored = authoritative)
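
The profiles above can be expressed as a lookup table that an ingest step consults. This is a sketch under assumptions: the field identifiers, the `enrich` function, and the `run_llm` callback are hypothetical names, and the callback stands in for whatever model call the platform actually makes.

```python
from typing import Callable

# Enrichment fields per source type; canon is deliberately empty.
ENRICHMENT_PROFILES: dict[str, tuple[str, ...]] = {
    "transcript": ("summary", "decisions", "actions", "topics"),
    "email": ("thread_summary", "decisions", "actions", "topics"),
    "document": ("summary", "key_points", "topics"),
    "canon": (),
}


def enrich(source_type: str, text: str, run_llm: Callable[[str, str], str]) -> dict[str, str]:
    """Produce one enrichment value per profile field; canon yields nothing."""
    fields = ENRICHMENT_PROFILES[source_type]  # unknown source types raise KeyError
    # For canon the tuple is empty, so run_llm is never called and the
    # human-authored text is left exactly as written.
    return {name: run_llm(name, text) for name in fields}
```

Keeping the canon entry as an explicit empty profile, rather than omitting it, makes the "no enrichment" rule visible in the table itself instead of depending on a missing key.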

Built to be replaced.

The Knowledge Platform Contract defines every interface. Read how substitutability is designed in from day one.
