Architecture

Four sources.
One governed platform.

Canon Engine ingests knowledge from four source types, enriches non-vault sources with LLM processing, and serves everything through a unified query interface with source-type filtering.

Knowledge Source Types

Dual-layer storage model.

Every knowledge source is stored in two layers: a structured record with full metadata and enrichment output, and chunks in a shared pgvector table for semantic search. This allows both “get transcript by ID” and “find knowledge about X across all sources” through a single system.
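As a rough illustration of the dual-layer model, here is a minimal in-memory sketch. The class names, the fixed-size chunking, and the `embed` callable are hypothetical stand-ins for the structured table, the shared pgvector table, and whatever embedding model the platform uses. The point it shows is that one ingest writes both layers, and every chunk carries its record ID and source type.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class KnowledgeRecord:
    """Structured layer: one row per source item, with full metadata."""
    id: str
    source_type: str                  # "canon", "transcript", "email", or "document"
    body: str
    metadata: dict = field(default_factory=dict)
    enrichment: dict | None = None    # stays None for canon vault documents


@dataclass
class Chunk:
    """Vector layer: one row in the shared chunk table, linked to its record."""
    record_id: str
    source_type: str                  # copied from the record so search can filter on it
    text: str
    embedding: list[float]


class DualLayerStore:
    """Hypothetical in-memory stand-in for the structured table plus the pgvector table."""

    def __init__(self, embed: Callable[[str], list[float]], chunk_size: int = 500):
        self.embed = embed            # hypothetical embedding function
        self.chunk_size = chunk_size  # illustrative fixed-size chunking
        self.records: dict[str, KnowledgeRecord] = {}
        self.chunks: list[Chunk] = []

    def ingest(self, record: KnowledgeRecord) -> None:
        # Layer 1: store the full record under its ID.
        self.records[record.id] = record
        # Layer 2: split the body into chunks and embed each one.
        for start in range(0, len(record.body), self.chunk_size):
            text = record.body[start:start + self.chunk_size]
            self.chunks.append(
                Chunk(record.id, record.source_type, text, self.embed(text))
            )

    def get(self, record_id: str) -> KnowledgeRecord:
        # "Get transcript by ID" is a plain key lookup in the structured layer.
        return self.records[record_id]

    def chunks_for(self, record_id: str) -> list[Chunk]:
        # Every chunk carries record_id, so a search hit can be traced back to its record.
        return [c for c in self.chunks if c.record_id == record_id]
```

Copying `source_type` onto each chunk is what lets a single vector table serve every source while still supporting per-type filtering.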

Canon: Vault documents

Ingestion: File watcher / git hook

Enrichment: None (human-authored content is authoritative)

Governance: Full lifecycle (draft → in-review → accepted → superseded)

Transcripts: Meeting recordings

Ingestion: Webhook (Gong, Zoom, Google Meet)

Enrichment: Summary, key decisions, action items, topics

Governance: Factual record; no governance lifecycle

Email: Email threads

Ingestion: Gmail push / Pub/Sub

Enrichment: Thread summary, decisions, action items, topics

Governance: Factual record; no governance lifecycle

Documents: Google Docs, PDFs, uploads

Ingestion: Drive watcher / upload API

Enrichment: Summary, key points, topics

Governance: Factual record; no governance lifecycle
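The four source types can be captured as a small registry, sketched below under the assumption that each type is described by its ingestion channels and whether it has a governance lifecycle. `SourceType`, `SOURCE_TYPES`, and `is_governed` are hypothetical names; the values mirror the list of source types above.

```python
from __future__ import annotations

from dataclasses import dataclass

CANON_LIFECYCLE = ("draft", "in-review", "accepted", "superseded")


@dataclass(frozen=True)
class SourceType:
    name: str
    description: str
    ingestion: tuple[str, ...]       # ingestion channels for this source type
    lifecycle: tuple[str, ...] = ()  # empty tuple = factual record, no governance


SOURCE_TYPES = {
    s.name: s
    for s in (
        SourceType("canon", "Vault documents",
                   ("file watcher", "git hook"), CANON_LIFECYCLE),
        SourceType("transcript", "Meeting recordings",
                   ("Gong webhook", "Zoom webhook", "Google Meet webhook")),
        SourceType("email", "Email threads", ("Gmail push / Pub/Sub",)),
        SourceType("document", "Google Docs, PDFs, uploads",
                   ("Drive watcher", "upload API")),
    )
}


def is_governed(source_type: str) -> bool:
    # Only source types with a lifecycle go through steward review.
    return bool(SOURCE_TYPES[source_type].lifecycle)
```

Keeping governance as data on the source type, rather than a per-record flag, makes it impossible for an individual transcript or email to opt into (or out of) the lifecycle.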

Governance Lifecycle

Knowledge matures.
It doesn't just accumulate.

Canon vault documents follow a formal lifecycle. Only accepted documents are authoritative. Draft and in-review documents are provisional. Superseded documents point to their replacements. Non-vault knowledge sources are factual records — they don't need governance because they describe events, not positions.

draft

Provisional. Written but not yet reviewed.

in-review

Under steward evaluation. Review checks may be automated; promotion to accepted still requires human approval.

accepted

Authoritative. The only state that drives execution.

superseded

Historical. Points to its replacement.

Stewardship rule: Humans steward Canon. No automated system may promote a document to accepted status without human approval.
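The lifecycle and the stewardship rule can be read as a small state machine. The sketch below is illustrative only: it assumes the transitions run strictly forward (draft → in-review → accepted → superseded), and `CanonDoc`, `transition`, and the `approved_by_human` flag are hypothetical names. It enforces the two rules stated above: nothing reaches accepted without human approval, and a superseded document must point to its replacement.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed forward-only transitions; any move not listed here is rejected.
TRANSITIONS: dict[str, set[str]] = {
    "draft": {"in-review"},
    "in-review": {"accepted"},
    "accepted": {"superseded"},
    "superseded": set(),
}


@dataclass
class CanonDoc:
    path: str
    status: str = "draft"
    superseded_by: str | None = None


class GovernanceError(Exception):
    pass


def transition(doc: CanonDoc, new_status: str, *,
               approved_by_human: bool = False,
               replacement: str | None = None) -> CanonDoc:
    if new_status not in TRANSITIONS[doc.status]:
        raise GovernanceError(f"{doc.status} -> {new_status} is not allowed")
    # Stewardship rule: automation may run reviews but never promotes to accepted.
    if new_status == "accepted" and not approved_by_human:
        raise GovernanceError("promotion to accepted requires human approval")
    # A superseded document must point to its replacement.
    if new_status == "superseded":
        if replacement is None:
            raise GovernanceError("superseded documents must name a replacement")
        doc.superseded_by = replacement
    doc.status = new_status
    return doc


def is_authoritative(doc: CanonDoc) -> bool:
    # Only accepted documents drive execution.
    return doc.status == "accepted"
```

Usage: an automated reviewer can call `transition(doc, "in-review")`, but the step to `"accepted"` fails unless a human approval is recorded.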

Query Interfaces

Three ways to ask.

File-Based

Direct file reads at known vault paths. No network round trip and no API cost. The primary interface for agent session protocols and governance checks.

use when: Session start, obligation registers, architecture references
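A file-based read is just a local file read at a known path. The sketch below assumes a hypothetical vault layout (`obligations.md`, `architecture.md` at the vault root) and hypothetical helper names; it also refuses paths that would escape the vault directory, which is a reasonable guard when agents supply the path.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical session-start files; the real paths are defined by the vault.
SESSION_START_FILES = ("obligations.md", "architecture.md")


def read_vault_file(vault_root: Path, relative_path: str) -> str:
    path = (vault_root / relative_path).resolve()
    # Refuse paths that resolve outside the vault (e.g. "../secrets.md").
    if vault_root.resolve() not in path.parents:
        raise ValueError(f"{relative_path!r} is outside the vault")
    return path.read_text(encoding="utf-8")


def load_session_context(vault_root: Path) -> dict[str, str]:
    # Session start: read a fixed set of known files; no network, no embeddings.
    return {name: read_vault_file(vault_root, name) for name in SESSION_START_FILES}
```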

Vector

Supabase pgvector with source-type filtering. Semantic search across all knowledge sources through a single function call.

use when: "What do we know about X?" — discovery across all sources
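Source-type filtering amounts to "filter, then rank by similarity". In pgvector this would typically be a SQL function with a `WHERE source_type = ANY(...)` clause ordered by the cosine-distance operator `<=>`, called through Supabase's `rpc`; the exact function name and columns are not specified here. The in-memory sketch below shows the same logic in plain Python, with `ChunkRow` and `match_chunks` as hypothetical names.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class ChunkRow:
    record_id: str
    source_type: str
    text: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_chunks(rows: list[ChunkRow], query_embedding: list[float],
                 source_types: set[str] | None = None, k: int = 5) -> list[ChunkRow]:
    # Filter first (the SQL WHERE clause); None means "all sources".
    candidates = [r for r in rows
                  if source_types is None or r.source_type in source_types]
    # Then rank by similarity, highest first (ORDER BY distance ascending in SQL).
    candidates.sort(key=lambda r: cosine_similarity(r.embedding, query_embedding),
                    reverse=True)
    return candidates[:k]
```

Because every source shares one table, "search everything" and "search only emails" are the same call with a different `source_types` argument.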

Structured

Direct Supabase queries by ID, account, or date range. Full metadata on every record.

use when: "Get the transcript from Tuesday's client call" — specific retrieval
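Structured retrieval is ordinary filtering on record columns. With the Supabase Python client this would usually be a chain like `table(...).select("*").eq(...).gte(...).lte(...)`; the table and column names are not given here. The sketch below mirrors those filters in memory, assuming a hypothetical `TranscriptRecord` with `account` and `recorded_on` fields.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class TranscriptRecord:
    id: str
    account: str
    recorded_on: date
    title: str
    summary: str   # enrichment output stored alongside the record


def find_records(records: list[TranscriptRecord], *,
                 account: str | None = None,
                 start: date | None = None,
                 end: date | None = None) -> list[TranscriptRecord]:
    result = []
    for r in records:
        if account is not None and r.account != account:      # .eq("account", ...)
            continue
        if start is not None and r.recorded_on < start:         # .gte("recorded_on", ...)
            continue
        if end is not None and r.recorded_on > end:             # .lte("recorded_on", ...)
            continue
        result.append(r)
    # Newest first, so the most recent call surfaces at the top.
    return sorted(result, key=lambda r: r.recorded_on, reverse=True)
```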

LLM Enrichment

Enrich what's raw.
Preserve what's governed.

Transcripts, emails, and documents receive automatic LLM enrichment on ingest — summaries, key decisions, action items, and topics. Canon vault documents receive no enrichment. They are human-authored governed content, preserved exactly as written. This distinction is a governance invariant, not a feature toggle.

// enrichment profiles
transcript → summary, decisions, actions, topics
email → thread summary, decisions, actions, topics
document → summary, key points, topics
canon → none (human-authored = authoritative)
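One way to make the "invariant, not a feature toggle" distinction concrete is to hard-code the canon check in the enrichment path instead of reading it from configuration. In this sketch `ENRICHMENT_PROFILES` mirrors the profile listing, while `enrich` and the `llm(field, text)` callable are hypothetical; a real pipeline would call its model with a prompt per field.

```python
from __future__ import annotations

from typing import Callable

# Field names mirror the enrichment profiles listed above.
ENRICHMENT_PROFILES: dict[str, tuple[str, ...]] = {
    "transcript": ("summary", "decisions", "actions", "topics"),
    "email": ("thread_summary", "decisions", "actions", "topics"),
    "document": ("summary", "key_points", "topics"),
    "canon": (),  # human-authored = authoritative; never enriched
}


def enrich(source_type: str, text: str,
           llm: Callable[[str, str], str]) -> dict[str, str] | None:
    if source_type == "canon":
        # Governance invariant, not a setting: canon content never reaches the LLM.
        return None
    fields = ENRICHMENT_PROFILES[source_type]  # KeyError for unknown source types
    return {name: llm(name, text) for name in fields}
```

There is deliberately no parameter that could turn canon enrichment on, so changing the rule requires a code change rather than a config change.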

Built to be replaced.

The Knowledge Platform Contract defines every interface. Read how substitutability is designed in from day one.

