Building Spec-Driven, Policy-Enforced Agents with MCP, DSPy, and PKM
This cluster defines the structural and procedural contract for the book.
It specifies:
- what constitutes a chapter,
- required invariants for promotion,
- lifecycle stages from Zettel → Chapter → Skill,
- enforcement modes (local vs GitHub),
- and failure modes of large, evolving knowledge bases.
No domain content belongs here.
This contract governs all other chapters.
A chapter is a coherent, promotable unit of knowledge.
A chapter is not prose length or topic count — it is a contractual unit.
- MoC_Chapter_<N>_<Title>.md
- docs/book/

If a cluster cannot be promoted deterministically, it is not yet a chapter.
Every chapter in this book MUST satisfy the following invariants.
These are enforced mechanically where possible.
Breaking an invariant blocks promotion.
Knowledge in this system progresses through explicit stages.
There are no implicit promotions.
If knowledge cannot survive promotion, it is not ready to be trusted.
This book is designed to resist known failure modes.
Treating agents or tools as peers instead of apprentices.
This book explicitly rejects that model.
Structure exists to preserve meaning under scale.
Modern computational workflows increasingly suffer from fragmentation:
tools do not speak to one another, data lives in silos, and reasoning happens in disconnected layers. As systems grow in complexity, the cognitive overhead on the human grows instead of shrinking.
Agentic automation emerges as a necessary response.
Instead of scripting every interaction explicitly, we allow structured agents to reason, plan, and execute using standardized interfaces and external tools.
The problem space is defined by:
This motivates the convergence of MCP, Spec-Driven Development, DSPy, and PKM.
The Model Context Protocol (MCP) defines a universal way for LLMs to interact with external tools, resources, and data systems. It transforms tools into discoverable, typed API-like interfaces that the model can reason about and invoke autonomously.
Key ideas:
MCP solves the fragmentation problem by standardizing how software exposes functionality to agents, enabling interoperability and composability at scale.
Spec-Driven Development (SDD) shifts the engineering focus from code-first to interface-first design. Instead of writing code and documenting it afterward, SDD requires that developers write structured specifications defining:
The implementations are then generated, validated, or scaffolded by tools like specify and codex.
This approach:
SDD is the natural complement to MCP, offering a disciplined way to define agent tool behavior.
DSPy provides a systematic method for converting declarative agent logic into optimized LLM programs. Unlike prompt engineering, DSPy compiles agent behavior through:
It is the missing optimization layer for agent tool use.
Where MCP defines what tools exist and how they behave, DSPy defines how the agent should think when using those tools.
DSPy improves:
DSPy is essential for scaling agent systems from handcrafted prototypes to robust, production-level behavior.
Personal Knowledge Management (PKM) systems like Obsidian are knowledge substrates, not just note containers. They form the long-term memory layer upon which agentic systems can reason and act.
Agents need:
Zettelkasten provides this through:
Agents + PKM create a hybrid intelligence loop:
your vault becomes the agent’s brain, and the agent becomes your cognitive amplifier.
The foundational concepts underpinning this book converge on a single insight:
modern agentic systems require rigorously defined tool interfaces (MCP), interface-first engineering (Spec-Driven Development), systematic optimization of reasoning processes (DSPy), and a durable human knowledge substrate (PKM).
When combined, these create a unified architecture where:
This chapter frames the vocabulary and mental models needed for the deeper engineering work ahead.
An agent constitution is a meta-level structure that governs how an AI agent behaves, reasons, and interacts with tools. It defines the rules of agency: what the system must do, may do, must not do, and how it should resolve ambiguity.
Without a constitution, agents drift. Their behavior becomes inconsistent, fragile, or inefficient. A constitution acts as:
For multi-tool MCP systems, the constitution prevents chaos by establishing stable norms the agent follows regardless of context.
Global principles define the stable behavioral invariants an agent must uphold. These principles do not change between tasks, domains, or toolsets.
Examples of global principles:
These principles establish the character of the agent.
They are higher-level than tool contracts and independent of implementation.
Safety rules constrain how an agent may act, regardless of its capabilities. They form the boundary layer for all decisions.
Categories of constraints:
These are non-negotiable:
- Do not execute destructive or irreversible tool calls without explicit user approval.
- Do not exceed specified tool limits.
- Do not fabricate tool outputs.

These guide good behavior but allow flexibility:
- Prefer interpretable steps.
- Choose the safest viable strategy when uncertain.

These map to legal, organizational, or ethical requirements such as:
- Data retention policies
- Privacy restrictions
- Corporate governance restrictions
- MCP-level tool access permissions
Together, these create a safety envelope for the agent’s reasoning.
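The hard constraints above can be checked mechanically before a tool call is dispatched. The following is a minimal sketch, not part of MCP or of the companion repository: the `ToolCall` record, its `destructive` flag, the `ConstraintViolation` exception, and the `max_calls` limit are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical record describing one requested tool invocation."""
    name: str
    destructive: bool = False  # assumed flag: the call cannot be undone


class ConstraintViolation(Exception):
    """Raised when a call would break a hard constraint."""


def check_hard_constraints(call: ToolCall, *, user_approved: bool,
                           calls_so_far: int, max_calls: int) -> None:
    """Raise ConstraintViolation if the call breaks a hard constraint."""
    # Destructive or irreversible calls need explicit user approval.
    if call.destructive and not user_approved:
        raise ConstraintViolation(
            f"{call.name}: destructive call requires explicit user approval")
    # The agent may not exceed the specified tool limit.
    if calls_so_far >= max_calls:
        raise ConstraintViolation(
            f"{call.name}: tool limit of {max_calls} calls reached")
```

Raising an exception rather than returning a flag keeps a violated constraint from being silently ignored by the caller.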
Naming and versioning rules ensure that an agent’s tools, signatures, and reasoning modules remain navigable and stable.
These conventions ensure durability, reproducibility, and scalability across agent generations.
The constitution is the governing layer, but it does not replace specs, MCP interfaces, or DSPy modules. Instead, it coordinates them.
Specs define what tools do.
The constitution defines how the agent should behave when using them.
MCP exposes capabilities.
The constitution enforces behavioral norms when invoking those capabilities.
DSPy optimizes reasoning patterns.
The constitution sets the normative boundaries for acceptable reasoning.
Together, these create a coherent agent architecture where:
This unifies the agent, ensuring stability across tasks and environments.
A specification (spec) is a structured description of how a software component should behave.
It defines interfaces, constraints, inputs, outputs, and allowed behaviors—not implementation details.
Specs are not documentation.
They are executable contracts between human intent, tools, and agents.
A high-quality spec ensures:
A high-quality spec contains several core components:
These elements allow Codex, Specify, MCP tools, and DSPy modules to reason about the system with clarity.
Spec-First engineering shifts design from implementation-driven to interface-driven development.
Workflow:
specify.

This prevents architectural drift and enhances modularity.
Specs translate directly into MCP tool definitions.
Mapping:
- input_schema
- output_schema

This enables consistent, version-controlled tool behavior exposed to agents.
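The mapping from a spec to a tool definition can be sketched as a pure function. This is an illustration under stated assumptions: the spec layout (`name`, `description`, `inputs`, `outputs`, `required`) is hypothetical, and the `input_schema` / `output_schema` keys follow the names used in this chapter rather than any particular MCP SDK.

```python
from typing import Any, Dict


def spec_to_tool_definition(spec: Dict[str, Any]) -> Dict[str, Any]:
    """Project a spec onto the fields a tool listing needs.

    The spec keys used here are assumptions made for this sketch.
    """
    missing = [key for key in ("name", "description", "inputs", "outputs")
               if key not in spec]
    if missing:
        raise ValueError(f"spec is missing required fields: {missing}")
    return {
        "name": spec["name"],
        "description": spec["description"],
        # Inputs become a JSON Schema object the server can validate against.
        "input_schema": {
            "type": "object",
            "properties": spec["inputs"],
            "required": sorted(spec.get("required", [])),
        },
        # Outputs are described the same way so callers know the result shape.
        "output_schema": {
            "type": "object",
            "properties": spec["outputs"],
        },
    }
```

Because the function is deterministic, the tool definition can be regenerated and diffed whenever the spec changes.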
Every spec produces a DSPy signature, defining:
DSPy compiles these signatures into optimized LLM reasoning modules.
The SDD feedback loop ensures that specs, MCP tools, and DSPy modules evolve coherently.
Loop steps:
This creates a virtuous cycle of clarity → execution → refinement.
An MCP server exposes tools—structured, validated functions—that an AI agent can call.
The server is not an LLM; it is a capability layer.
Key roles:
The agent becomes the reasoning layer; the server becomes the action layer.
The MCP protocol defines how clients and servers communicate.
Core elements:
The protocol enforces structure so agents have predictable interaction patterns.
Every MCP tool includes:
Schemas enforce correctness and stability.
Ideally, they should be mechanically generated from specify specs.
MCP tools expose structured error signatures.
Types of errors:
Error models help agents reason safely under uncertainty.
Capabilities describe what the server offers:
Versioning supports:
Specs should evolve in lockstep with capability changes.
MCP tools should be stateless unless necessary.
Why?
But sessions may hold lightweight state such as:
State-heavy logic belongs outside the tool itself, often handled by DSPy modules.
MCP, DSPy, and PKM form a three-layer agentic system:
The PKM agent you’re designing will:
This triad unlocks a modular AI system that learns, acts, and evolves.
DSPy is a declarative optimization framework for teaching LLMs how to reason through clearly defined interfaces.
Unlike prompting, DSPy uses:
DSPy treats reasoning steps as trainable programs, not static prompts.
A DSPy signature defines the structure of a reasoning task.
Components include:
Signatures resemble SDD specifications and can be derived from specs directly.
DSPy modules are learned reasoning components that fulfill signatures.
Types include:
Modules can be composed into reasoning pipelines, enabling complex behaviors.
DSPy uses declarative optimization to improve reasoning:
DSPy evaluates candidate reasoning paths against scoring functions and rewrites the internal reasoning program.
DSPy modules can call MCP tools as part of their reasoning.
DSPy learns:
DSPy becomes the decision layer; MCP becomes the action layer.
DSPy supports multiple memory types:
Your PKM system becomes a semantic backbone, enabling:
DSPy composes signatures and modules into reasoning graphs.
A reasoning graph defines:
DSPy optimizes the entire graph, not individual calls.
DSPy becomes the reasoning engine of your personal PKM agent.
The agent integrates:
Benefits:
PKM provides long-term semantic memory for agent systems.
Unlike ephemeral model context, PKM persists over time and becomes the agent’s stable source of truth.
PKM allows:
- durable knowledge accumulation
- predictable retrieval
- explainable reasoning
- memory across sessions
Zettelkasten introduces atomicity, linking, and evergreen structure, aligning naturally with agentic reasoning.
Zettelkasten is a machine-readable conceptual graph:
Because each note is atomic and explicitly linked, agents can traverse the knowledge graph predictably.
Obsidian functions as an operating system for PKM:
Its structure is ideal for integration with MCP tools and DSPy retrieval.
A PKM vault for agent use must be:
Recommended folders:
- Pure Zettel/
- MoC/
- Projects/
- Resources/

Agents must reliably locate notes and metadata.
Metadata gives structure to PKM:
This schema must be machine-readable so MCP tools and DSPy modules can use it.
MCP enables structured PKM interaction.
Recommended tools:
- read_note
- write_note
- search_notes
- list_links
- compute_embeddings

Safety considerations include preventing note corruption and enforcing schema rules.
DSPy retrieves PKM content using:
DSPy must decide:
- when to retrieve
- what to retrieve
- how to integrate retrieved notes into reasoning
PKM becomes the agent’s identity layer.
The vault defines:
PKM should evolve but remain coherent.
The PKM–MCP–DSPy loop forms a continuous learning engine:
This creates a self-improving agent grounded in real, stored knowledge.
An agent is a system that can:
In this book’s context, an agent is not just:
Instead, it is a layered system whose behavior emerges from:
This book adopts a strict four-layer architecture for agents:
These layers are conceptually distinct and communicate via explicit interfaces.
flowchart TB
subgraph Constitution_Layer [Constitution Layer]
end
subgraph PKM_Layer [PKM Layer]
end
subgraph DSPy_Layer [DSPy Layer]
end
subgraph MCP_Layer [MCP Layer]
end
PKM_Layer --> DSPy_Layer
DSPy_Layer --> MCP_Layer
MCP_Layer --> DSPy_Layer
DSPy_Layer --> PKM_Layer
Constitution_Layer --- PKM_Layer
Constitution_Layer --- DSPy_Layer
Constitution_Layer --- MCP_Layer
The Constitution constrains all three operational layers, but is kept conceptually distinct so it can be versioned, audited, and reasoned about independently.
The PKM Layer is the agent’s long-term semantic memory and identity.
It contains:
The agent uses PKM to:
The PKM layer is read and written via MCP tools and interpreted via DSPy reasoning modules.
The DSPy Layer is responsible for reasoning and planning.
It operates over:
DSPy expresses reasoning as:
In this architecture, the DSPy layer decides:
The MCP Layer is the agent’s action surface.
It exposes:
MCP tools are:
The DSPy layer calls MCP tools; MCP never calls DSPy directly.
This preserves a clear separation between reasoning and action.
The Constitution Layer defines what the agent may and may not do.
It constrains:
The Constitution is:
DSPy modules and MCP tools must both respect constitutional constraints.
An agent run can be described as a lifecycle:
flowchart LR
UserQuery[User / Environment Input]
Context[PKM Context Retrieval]
Plan[DSPy Planning]
Act[MCP Tool Calls]
Evaluate[Evaluate & Check Constitution]
Update["Update PKM (Optional)"]
Reply[Return Answer]
UserQuery --> Context --> Plan --> Act --> Evaluate --> Reply
Evaluate --> Update
Update --> Context
This loop is executed under the watch of the Constitution Layer, which can block or reshape plans and actions.
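The lifecycle in the diagram can be sketched as a single function that wires the stages together. This is a minimal sketch under stated assumptions: `run_agent` and its callable parameters (`retrieve`, `plan`, `act`, `check`, `update`) are hypothetical stand-ins for the PKM, DSPy, MCP, and Constitution layers, not APIs from any of those systems. It follows the diagram's order, so the constitutional check runs after the tool calls and gates both the reply and the PKM update.

```python
from typing import Any, Callable, Dict, List, Optional


def run_agent(query: str, *,
              retrieve: Callable[[str], List[str]],
              plan: Callable[[str, List[str]], List[str]],
              act: Callable[[str], Any],
              check: Callable[[List[str], List[Any]], bool],
              update: Optional[Callable[[str, List[Any]], None]] = None,
              ) -> Dict[str, Any]:
    """Run one pass of the lifecycle: context, plan, act, evaluate, reply."""
    context = retrieve(query)                 # PKM context retrieval
    steps = plan(query, context)              # planning
    results = [act(step) for step in steps]   # tool calls
    if not check(steps, results):             # evaluate against the constitution
        return {"approved": False, "answer": None, "updated": False}
    if update is not None:                    # optional PKM update
        update(query, results)
    return {"approved": True, "answer": results, "updated": update is not None}
```

Passing the stages in as callables keeps the loop itself free of layer-specific logic, which mirrors the separation between reasoning and action described above.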
A reasoning graph for a PKM agent is a structured set of DSPy signatures and modules.
Example structure:
flowchart TD
N1[Interpret Intent]
N2[Retrieve PKM Context]
N3[Synthesize Draft Answer]
N4[Decide on Tool Use]
N5[Refine & Propose Updates]
N1 --> N2 --> N3 --> N4 --> N5
DSPy treats each node as a trainable, optimizable unit, enabling the agent to improve its behavior over time.
Agents can fail in several ways:
Recovery strategies include:
Robust agents are designed with explicit failure-handling pathways rather than relying on best-case behavior.
Consider a user query:
“Summarize my recent thinking about MCP servers and suggest the next three steps to implement my PKM agent.”
A single agent run might proceed as:
mcp, agents, and pkm.
sequenceDiagram
participant U as User
participant D as DSPy Layer
participant P as PKM Layer
participant M as MCP Layer
participant C as Constitution
U->>D: Query: MCP servers + next steps
D->>P: Retrieve MCP + PKM notes
P-->>D: Relevant Zettels
D->>D: Plan reasoning graph
D->>M: Optional tool calls (inspect project state)
M-->>D: Tool results
D->>C: Check plan & actions
C-->>D: Approve / adjust
D->>P: Propose new planning Zettel
P-->>D: Confirm write (or user approval)
D-->>U: Answer + next steps + note update
This example illustrates how all four layers participate in a single coherent agent run.
write_note
write_note: Why this is the first example
write_note is the smallest end-to-end slice that touches all layers:
Starting with an existing tool forces the loop to prove process quality instead of “feature velocity”.
write_note: Overwrite semantics
The system forbids silent overwrites:
- overwrite=true.
- overwrite=true, the response must explicitly report whether a file was overwritten (e.g., overwritten: true/false).

Why it matters for PKM:
- Notes are “knowledge artifacts”; silent overwrites destroy trust.
- Agents must be constrained to require explicit intent for mutation.
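The overwrite rule can be illustrated with a small file-level sketch. It is not the repository's `write_note` implementation: the signature, the choice of `FileExistsError` for the refusal, and the response keys other than `overwritten` are assumptions made for this example.

```python
from pathlib import Path, PurePosixPath
from typing import Dict


def write_note(vault_root: Path, rel_path: str, content: str,
               overwrite: bool = False) -> Dict[str, object]:
    """Write a note, refusing to replace an existing file without consent."""
    target = vault_root / rel_path
    existed = target.exists()
    if existed and not overwrite:
        # Assumed error type; the point is that the call fails loudly.
        raise FileExistsError(
            f"{rel_path} exists; pass overwrite=True to replace it")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    # Report explicitly whether an existing file was replaced.
    return {"path": PurePosixPath(rel_path).as_posix(), "overwritten": existed}
```

Defaulting `overwrite` to `False` means a retried or mistaken call cannot destroy an existing note; replacing content always requires a deliberate argument.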
write_note: Specs, tests, and examples form a contract
A spec-only system drifts. A code-only system is opaque.
The stable triangle is:
If any corner changes, the other two must change too. Otherwise: the “tool” becomes a rumor.
write_note: From Zettels → Skills → Code
Skills should follow Zettels: the procedure is derived from already-reasoned design.

Related cluster:
- [[MoC_From_Zettels_to_Skills]]

write_note: What to copy for the next tool
When adding the next tool, copy this pattern:
The compounding phase works when “add tool” becomes mechanical.
search_notes is the primary way an agent scans the vault without risking mutation. It indexes only markdown notes with frontmatter, then surfaces ids, titles, tags, and context snippets so downstream tools can decide whether to read or ignore a candidate note.
The tool keeps parity between how humans browse the vault and how agents triage it: it honors titles and tags, finds matches in both metadata and body, and returns the filesystem path so callers can confirm provenance. Because it short-circuits at the requested limit and does not change any files, it is the safe default for discovery in compounding loops.
Inputs
- query (required): non-empty string; search is case-insensitive.
- limit (optional): integer 1..50; defaults to 10 and short-circuits once satisfied.
- tags (optional): array of strings; all listed tags must be present on a note to be included.

Semantics
- Matching runs over title, tags, and body text; notes without frontmatter are skipped.
- Snippet centers on the first match with ~40 chars of lead-in and ~120 chars after; falls back to the first 160 chars or the title if needed.
- Results preserve the traversal order of the vault and stop at limit.

Outputs
- Array of objects with id, title, snippet, path (absolute), and tags.
- No side effects; responses are pure views of current vault state.
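The contract above can be sketched over an in-memory list of notes. This is an illustration, not the code behind `vault.search_zettels`: the note dictionaries stand in for parsed vault files (so the frontmatter check is not modelled), `make_snippet` is a hypothetical helper, and the window of 40 characters before and 120 characters from the match start is one reading of the approximate figures given in the semantics.

```python
from typing import Any, Dict, List, Optional, Sequence


def make_snippet(title: str, body: str, query: str,
                 lead: int = 40, tail: int = 120) -> str:
    """Return a window around the first body match, with fallbacks."""
    idx = body.lower().find(query.lower())
    if idx < 0:
        # The hit was in the title or tags: fall back to the body start,
        # or to the title when the body is empty.
        return body[:160] or title
    return body[max(0, idx - lead): idx + tail]


def search_notes(notes: Sequence[Dict[str, Any]], query: str,
                 limit: int = 10,
                 tags: Optional[Sequence[str]] = None) -> List[Dict[str, Any]]:
    """Case-insensitive search with conjunctive tag filtering."""
    if not isinstance(query, str) or not query:
        raise ValueError("query must be a non-empty string")
    if not 1 <= limit <= 50:
        raise ValueError("limit must be within 1..50")
    needle = query.lower()
    required = set(tags or [])
    results: List[Dict[str, Any]] = []
    for note in notes:  # traversal order is preserved
        note_tags = list(note.get("tags", []))
        if not required.issubset(note_tags):
            continue  # every requested tag must be present
        fields = [note["title"], note["body"], *note_tags]
        if not any(needle in field.lower() for field in fields):
            continue
        results.append({
            "id": note["id"],
            "title": note["title"],
            "snippet": make_snippet(note["title"], note["body"], query),
            "path": note["path"],
            "tags": note_tags,
        })
        if len(results) >= limit:
            break  # short-circuit once the limit is satisfied
    return results
```

Checking each field separately avoids false matches that would span the boundary between, say, a title and a tag.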
Integration tests gate the behavior:
- test_search_notes_returns_matches seeds two notes and asserts the matching note id, snippet inclusion of the query, and a real path.
- test_search_notes_filters_by_tags proves tag filters are conjunctive: only notes containing all requested tags survive.
- test_search_notes_rejects_missing_query enforces JSON Schema validation by raising ValueError when query is absent.
The walkthrough script examples/walkthrough/02_search_notes.py defaults to a temporary vault, seeds demo notes, calls search_notes with a limit of 5, and prints the returned metadata. Together, the tests and example form an executable contract for agents and humans.
search_notes only reads files. In PKMServer.call_tool it routes to vault.search_zettels which walks the vault, reads frontmatter and bodies, and assembles view-only dictionaries. No branches mutate disk.
JSON Schema validation and tag-filter checks fail fast on bad inputs, keeping agents deterministic. The hard limit cap (50) prevents unbounded traversal in constrained sessions. Because it never touches write_note, the safeguards D004/D005 remain untouched while agents still get the context they need to decide next actions.
Contract first: the tool spec in modules/pkm_tools/tools.yml defines inputs/outputs; PKMServer.list_tools enforces completeness (D001) and validate_tool_args enforces the JSON Schema boundary (D003). Tests under tests/integration keep the contract executable (D008).
The new Skill in .codex/skills/full-stack-search-notes/SKILL.md operationalizes the read-only loop: start with search_notes, decide on follow-on reads, and stop before any writes. This keeps reasoning (DSPy), contract (spec/tests), and execution (MCP server) separated per D007 while honoring vault-first authorship (D009/D010).
update_note
update_note safety model
- Book Zettel/MCP with DSPy Theory and Application/ or _agent_inbox/; anything else is rejected.
- overwrite=true; otherwise the call fails, preventing silent updates.
- overwritten=true for traceability and downstream routing.

update_note
- overwrite must be explicitly true; missing or false results in refusal.
- Book Zettel/MCP with DSPy Theory and Application/ or _agent_inbox/.

id_index_stats
id_index_stats exists: observability + determinism
id_index_stats is a read-only observability hook over the vault id index. It answers: is the id space clean enough to allow id-based operations to be deterministic?
id_index_stats
id_index_stats must return vault-relative POSIX paths for any duplicate evidence. Absolute paths leak machine-local details and break portability; relative paths keep the contract stable across environments.
- vault_root.

id_index_stats contract and semantics
Input contract (JSON Schema):
- scope (optional string): vault-relative scope override for indexing; defaults to allowed write roots.
- include_duplicates (boolean, default true): include duplicate listings.
Output contract:
- scope: vault-relative scope string actually used.
- total_notes: count of distinct note paths indexed.
- unique_ids: count of unique ids across stems and frontmatter.
- duplicates (optional): map of duplicate ids → vault-relative POSIX paths.
- build_ms: time to build the index (ms) for observability.

Semantics:
- Read-only inspection; no writes or mutations are allowed.
- Index keys include both filenames and frontmatter ids, so collisions are visible even when stems differ.
- Duplicate reporting obeys D012 by returning vault-relative paths only.
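The output contract can be sketched as a read-only walk over a directory. This is not the repository's implementation: the `frontmatter_id` helper and its simplified `id:` parsing are assumptions, and the sketch defaults `scope` to the whole vault rather than to the allowed write roots.

```python
import re
import time
from pathlib import Path
from typing import Dict, Optional, Set

_ID_LINE = re.compile(r"^id:\s*(\S+)\s*$", re.MULTILINE)


def frontmatter_id(text: str) -> Optional[str]:
    """Return the `id:` value from a leading `---` block (simplified parser)."""
    if not text.startswith("---"):
        return None
    end = text.find("\n---", 3)
    if end < 0:
        return None
    match = _ID_LINE.search(text[3:end])
    return match.group(1) if match else None


def id_index_stats(vault_root: Path, scope: str = "",
                   include_duplicates: bool = True) -> Dict[str, object]:
    """Build an id index over markdown notes and report its health."""
    started = time.perf_counter()
    index: Dict[str, Set[str]] = {}
    paths = sorted(p for p in (vault_root / scope).rglob("*.md") if p.is_file())
    for path in paths:
        rel = path.relative_to(vault_root).as_posix()  # vault-relative POSIX
        index.setdefault(path.stem, set()).add(rel)    # filename stem as key
        note_id = frontmatter_id(path.read_text(encoding="utf-8"))
        if note_id:
            index.setdefault(note_id, set()).add(rel)  # frontmatter id as key
    stats: Dict[str, object] = {
        "scope": scope,
        "total_notes": len(paths),
        "unique_ids": len(index),
        "build_ms": (time.perf_counter() - started) * 1000.0,
    }
    if include_duplicates:
        # An id is a duplicate when it maps to more than one distinct path.
        stats["duplicates"] = {key: sorted(found)
                               for key, found in sorted(index.items())
                               if len(found) > 1}
    return stats
```

Storing paths in a set per id means a note whose stem equals its own frontmatter id is not reported as a collision with itself.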
id_index_stats contract
test_id_index_stats_reports_duplicates_with_relative_paths seeds duplicate ids and asserts:
- total_notes counts distinct note paths; unique_ids counts both stems and frontmatter ids.
- duplicates exists and lists vault-relative POSIX paths only (no drive letters, no backslashes), locking in D012.

examples/walkthrough/05_id_index_stats.py demonstrates read-only usage and shows JSON output, doubling as an executable sample for contract expectations.
id_index_stats enables safe id-based mutation later
id_index_stats is the gatekeeper for any future id-based mutation tool (e.g., update-by-id):
- resolve_note_id; keeping stats separate and read-only avoids side effects while still validating the index health.

append_to_note
append_to_note: Why append is safer than overwrites
- overwrite=true), making mutation deliberate even though the payload is additive.

append_to_note: Consent semantics + D011 write fence
- overwrite=true, even though the action is additive; this is explicit consent for mutation.
- Book Zettel/MCP with DSPy Theory and Application/ or _agent_inbox/; _ensure_write_allowed enforces the fence and raises PermissionError otherwise.
- PKM_ENABLE_ID_INDEX=1 to avoid implicit note selection; otherwise path-only writes are allowed.
- Path escapes vault root).

append_to_note: D012 relative paths in mutation outputs
- path relative to vault_root (_relative_to_root_posix), satisfying D012.

append_to_note: Contract + edge cases
- path or id is required; providing both or neither raises ValueError.
- separator defaults to double newline; the appender normalizes trailing newlines so the appended payload does not double-insert or strip blank lines.
- bytes_added for observability.
- PKM_ENABLE_ID_INDEX=1; otherwise callers must pass a path inside the allowed roots.

append_to_note: Tests and examples enforce the contract
- overwrite=true), exactly one locator, ID lookup opt-in, write fence respected, and vault-relative path return.
- bytes_added aligns with the appended payload) and that original content remains intact.
- bytes_added for observability.

append_to_note: How this supports safe agent journaling
- overwrite=true) turns every journal write into an intentional act, reducing accidental spam.

list_notes gives agents an inventory-first view of the vault, anchored to the default book scope so the first pass surfaces curated material without crafting a query. search_notes answers targeted questions; list_notes answers “what exists here?” with deterministic ordering and paging.
Together they form a two-step discovery pattern: enumerate scoped, tagged assets via list_notes, then pivot to search_notes for deeper retrieval. Because both tools are read-only, agents can explore safely before deciding whether a mutation is needed.
Inputs
- scope (optional): vault-relative root; defaults to Book Zettel/MCP with DSPy Theory and Application.
- query (optional): case-insensitive substring against title, filename stem, and a 200-char preview.
- tags (optional): array; all requested tags must be present on the note.
- limit (optional): 1..500, default 50; applied after sorting.
- offset (optional): >=0, default 0; applied after sorting.
- sort (optional): one of path (default, stable lexicographic POSIX), title (lowercased), or mtime (ISO string).

Outputs
- notes: array of objects {path, title, id|null, mtime, tags}; path is vault-relative POSIX per D012; title prefers frontmatter; id only if frontmatter exists.
- total: count before paging.
- scope: vault-relative scope actually used.

Determinism
- Scope normalization rejects absolute paths or traversal, fixing the search root.
- Sorting happens before paging, so offset/limit slices are stable across calls until files change.
- Query and tag filters are pure predicates; no random sampling or pagination drift.
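The filter, sort, then page order described above can be sketched over in-memory records. This is an illustration only: `page_notes` is a hypothetical name, the records stand in for notes already collected from the scope, the `preview` key is assumed, and breaking sort ties by path is an assumption added to keep the order total.

```python
from pathlib import PurePosixPath
from typing import Any, Callable, Dict, Optional, Sequence

Note = Dict[str, Any]

_SORT_KEYS: Dict[str, Callable[[Note], str]] = {
    "path": lambda n: n["path"],
    "title": lambda n: n["title"].lower(),
    "mtime": lambda n: n["mtime"],  # ISO strings sort chronologically
}


def page_notes(notes: Sequence[Note], query: Optional[str] = None,
               tags: Optional[Sequence[str]] = None, limit: int = 50,
               offset: int = 0, sort: str = "path") -> Dict[str, Any]:
    """Filter with pure predicates, sort, then slice."""
    if not 1 <= limit <= 500:
        raise ValueError("limit must be within 1..500")
    if offset < 0:
        raise ValueError("offset must be >= 0")
    if sort not in _SORT_KEYS:
        raise ValueError(f"unknown sort key: {sort}")
    needle = (query or "").lower()
    required = set(tags or [])

    def keep(note: Note) -> bool:
        if not required.issubset(note["tags"]):
            return False  # all requested tags must be present
        if not needle:
            return True
        fields = (note["title"], PurePosixPath(note["path"]).stem,
                  note.get("preview", "")[:200])
        return any(needle in field.lower() for field in fields)

    matched = [n for n in notes if keep(n)]
    # Sort before paging so offset/limit slices are stable across calls.
    ordered = sorted(matched, key=lambda n: (_SORT_KEYS[sort](n), n["path"]))
    return {"notes": ordered[offset:offset + limit], "total": len(ordered)}
```

Because `total` is computed before slicing, a caller can page through the full result set without issuing a separate count request.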
The tool never writes; it only reads markdown files and returns metadata. _normalize_scope rejects absolute paths, drive letters, and .., preventing traversal outside the vault. _resolve_within_vault guards against symlink escape, and missing scopes return empty results instead of failing. Paths in responses are rendered via _relative_to_root_posix, enforcing D012 vault-relative POSIX strings and stripping host-specific drives. Input guards enforce limit within 1..500 and non-negative offset, capping traversal. Because the walker ignores non-markdown files and honors required tags, the agent sees only scoped, compliant items.
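The scope checks described above can be sketched as a string-level normalizer. This is an assumption-laden illustration rather than the repository's `_normalize_scope`: the function name, the treatment of backslashes as separators, and the exact error messages are hypothetical, and the symlink guard attributed to `_resolve_within_vault` is not modelled here.

```python
import re

_DRIVE = re.compile(r"^[A-Za-z]:")


def normalize_scope(scope: str) -> str:
    """Return a clean vault-relative POSIX scope or raise ValueError."""
    cleaned = scope.strip().replace("\\", "/")  # assumed: accept backslashes
    if cleaned.startswith("/"):
        raise ValueError("scope must be vault-relative, not absolute")
    if _DRIVE.match(cleaned):
        raise ValueError("scope must not contain a drive letter")
    # Drop empty segments and '.', so 'a//b/./' normalizes to 'a/b'.
    parts = [part for part in cleaned.split("/") if part not in ("", ".")]
    if ".." in parts:
        raise ValueError("scope must not traverse outside the vault")
    return "/".join(parts)
```

Rejecting `..` outright, instead of resolving it, keeps the rule easy to audit: no accepted scope can name a location above the vault root.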
Integration coverage in tests/integration/test_list_notes.py asserts the contract:
- Default call uses the book scope, excludes files outside it, and sorts paths lexicographically.
- Paging and filtering preserve total, keep offsets stable after sort, and prove substring queries work.
- Scope traversal like ../secret raises ValueError; paths in responses are vault-relative POSIX with no drive prefixes.
The walkthrough examples/walkthrough/07_list_notes.py seeds a temp vault, runs list_notes, and prints the JSON response, providing an executable demo aligned with the spec.
list_notes grounds the agent in what actually exists before it speculates. The response carries real vault-relative paths, titles, ids (when present), and tags, so downstream reasoning can cite concrete artifacts instead of inventing notes. Deterministic sort and paging let the agent revisit slices without drift, keeping deliberation reproducible. Using the default book scope biases exploration toward curated material while still allowing scoped overrides. Coupled with search_notes, the agent can pivot from inventory to relevance without hallucinating structure or filenames.
Because results include path, title, id, mtime, and tags, list_notes can feed MoCs, tag indexes, and timeline views without another crawl. POSIX vault-relative paths drop directly into wikilinks or JSON outputs used by downstream compilers. Stable sort options (path, title, mtime) let agents build reproducible navigation structures. Tag-conjunctive filtering is already MoC-friendly: a MoC can request specific tags to generate focused tables of contents. As new navigation surfaces emerge, this tool supplies the canonical, scope-bounded inventory to populate them.
Append-only journaling is the safest memory substrate for agents because it:
- preserves chronological trace without accidental rewrites (aligns with D004/D005).
- keeps diffs auditable and reversible; every entry is an additive fact.
- removes locking and race complexity common in in-place edits.
- works with existing tools (append_to_note on top of a created note) so no new surface area is needed.
This pattern treats each journal entry as an immutable breadcrumb, enabling replay, debugging, and alignment checks over long sessions.
Convention for journaling within the D011 fence (Book Zettel/MCP with DSPy Theory and Application/):
- Title format: Journal YYYY-MM-DD (sanitizes to Journal_YYYY-MM-DD.md), sortable and unambiguous.
- All journal files live directly under the fence (no absolute paths; D012 keeps references vault-relative).
- Reuse the same note for the day; all entries append to this file.

Rationale:
- Predictable naming makes list_notes/search_notes cheap and targeted to a handful of candidates.
- Staying inside the fence avoids accidental writes elsewhere in the vault.

Operational recipe:
1) Probe for the daily note; if absent, call write_note(title=Journal YYYY-MM-DD) to create it. Default overwrite=false enforces D004/D005.
2) Add entries with append_to_note(note_id=Journal_YYYY-MM-DD); do not flip overwrite=true unless rerunning an idempotent migration.
3) Keep entries timestamped inside the body so the audit trail is visible when reading.

Why this matters:
- Separates creation from mutation, reducing blast radius when a creation step is retried.
- Aligns with append-only intent while still allowing explicit recovery paths when necessary.

Preferred retrieval flow:
- list_notes(prefix="Journal_YYYY") to bound the candidate set for the year.
- search_notes(query="Journal YYYY-MM-DD") when exact day lookup is needed.
- read_note(note_id=Journal_YYYY-MM-DD) to load the body for review or summarization.

Notes:
- Tool responses return vault-relative paths (D012), preserving portability.
- Avoid globbing across the whole vault; scoped queries keep latency low and respect D011.

The append-only loop can be wrapped later into append_journal_entry without altering behavior because:
- Creation and appending are already separated; a wrapper can orchestrate write_note then append_to_note idempotently.
- Naming is deterministic (Journal YYYY-MM-DD), so the wrapper can derive the target note without additional schema.
- Retrieval remains unchanged; callers can still fall back to list_notes/search_notes/read_note if the wrapper is unavailable.
Design implication: build the wrapper as a thin orchestration layer, not a new storage primitive. Keep the append-only contract explicit in its schema and diagnostics.
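A thin wrapper of this kind might look like the following sketch. It is an assumption-heavy illustration: it writes files directly instead of calling write_note and append_to_note, so the consent flag and the write-fence check are not modelled, and the signature, the `# Journal` first line, and the response keys `created` and `bytes_added` are choices made for this example.

```python
from datetime import date, datetime
from pathlib import Path
from typing import Dict

# Fence directory named in the journaling convention.
FENCE = "Book Zettel/MCP with DSPy Theory and Application"


def append_journal_entry(vault_root: Path, entry: str,
                         day: date, now: datetime) -> Dict[str, object]:
    """Create the daily note if needed, then append one timestamped entry."""
    rel = f"{FENCE}/Journal_{day.isoformat()}.md"
    target = vault_root / rel
    created = not target.exists()
    if created:
        # Creation step: only runs when the daily note is absent.
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(f"# Journal {day.isoformat()}\n", encoding="utf-8")
    existing = target.read_text(encoding="utf-8")
    # Normalize the separator so exactly one blank line precedes the entry.
    trailing = len(existing) - len(existing.rstrip("\n"))
    prefix = "\n" * max(0, 2 - trailing)
    addition = f"{prefix}{now.isoformat(timespec='seconds')} {entry.strip()}\n"
    # Append mode never rewrites earlier entries.
    with target.open("a", encoding="utf-8", newline="") as handle:
        handle.write(addition)
    return {"path": rel, "created": created,
            "bytes_added": len(addition.encode("utf-8"))}
```

Passing `day` and `now` in explicitly keeps the wrapper deterministic, which makes it straightforward to rehearse in a temporary vault.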
Goal: rehearse the append-only loop in a temp vault before touching the canonical vault.
- vault_root points to the sandbox location.
- Journal YYYY-MM-DD; if it does not exist, create it with write_note (no overwrite flag set).
- append_to_note, keeping timestamps in the entry body.
- list_notes/search_notes, then read_note the latest journal for memory refresh.

This cluster explains how conceptual knowledge (Zettels) is transformed into operational procedures (Codex Skills) without losing architectural intent.
Zettels capture why and what. Skills encode how in a reusable, enforceable form.
This material belongs between Design Philosophy and Worked Implementations.
In later chapters, the ideas in this cluster are instantiated as Codex Skills in the companion repository.
These Skills encode:
- repository hygiene
- MCP tool addition workflows
- safety and overwrite guarantees
They do not introduce new ideas. They are executable forms of architectural intent already captured here.
Readers may return to this cluster when evaluating whether a Skill preserves or violates the system’s design philosophy.
Zettels are optimized for reasoning, reflection, and explanation. They capture architectural intent, tradeoffs, and conceptual boundaries.
Skills, by contrast, are optimized for execution. They encode repeatable procedures that an agent can follow without reinterpretation.
The transition from Zettels to Skills marks the point where intent becomes operational.
Skills can be understood as procedural Zettels.
Like Zettels, they are:
- small in scope
- focused on a single idea
- composable

Unlike Zettels, they are:
- imperative
- executable
- constrained by acceptance criteria
This makes Skills the natural operational counterpart to a Zettelkasten-based design system.
Skills should never replace Zettels.
Zettels remain the source of truth for:
- architectural rationale
- design constraints
- long-term understanding
Skills are derived artifacts. They operationalize decisions that have already been reasoned about and recorded.
Reversing this order leads to brittle systems and opaque agent behavior.
Skills do not operate in isolation. They execute within the constraints of a repository.
The repository provides:
- directory structure
- specs and schemas
- tests
- decision logs
This makes the repository the runtime environment for Skills, just as it is the constitution for agents.
GitHub-mode treats the vault as immutable during execution. Direct writes are forbidden because the repository functions as an execution sandbox rather than the source of truth for PKM.
Blocking writes prevents irreversible corruption, keeps side effects reviewable, and forces every change through deliberate human promotion.
Links: [[MoC_Execution_Context_GitHub_Mode]] · [[GitHub_Mode_Spool_Root_Semantics]]
Write spooling redirects every mutation attempt into a controlled spool root. The staging area uses deterministic paths so outputs stay predictable and git-reviewable without touching the canonical vault.
Spooling keeps autonomous agent work safe while preserving a clear handoff for human review and promotion.
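The deterministic mapping from a vault-relative target to its spool location can be sketched as follows. This is a minimal illustration, not the companion repository's implementation: the function name `spool_path` and the default spool root `staging` are assumptions chosen to match the examples in this chapter.

```python
from pathlib import PurePosixPath


def spool_path(vault_relative: str, spool_root: str = "staging") -> str:
    """Map a vault-relative target to its deterministic location under the spool root.

    `spool_path` and the default "staging" root are hypothetical names.
    """
    rel = PurePosixPath(vault_relative)
    # Reject anything that could land outside the spool root.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"not a vault-relative path: {vault_relative!r}")
    # Same input always yields the same output path, independent of machine or cwd.
    return str(PurePosixPath(spool_root) / rel)
```

Because the result depends only on the input string, the same mutation attempt produces the same staged path locally and in CI, which is what keeps the spool diffable.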
Links: [[MoC_Execution_Context_GitHub_Mode]] · [[GitHub_Mode_Tool_Output_Invariants]]
GitHub-mode must produce identical outputs across local runs, CI, and GitHub Actions. Determinism depends on vault-relative paths, avoidance of working-directory assumptions, and explicit execution-context resolution before any tool call.
When tools spool to the staging root, outputs remain predictable, diffable, and ready for promotion.
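Explicit execution-context resolution might look like the sketch below. The environment variable names `VAULT_ROOT` and `SPOOL_ROOT`, the `ExecutionContext` class, and `resolve_context` are all assumptions for illustration; the point is only that both roots come from explicit configuration and that a missing value fails loudly instead of falling back to the working directory.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class ExecutionContext:
    vault_root: Path
    spool_root: Path


def resolve_context(env: Optional[Mapping[str, str]] = None) -> ExecutionContext:
    """Resolve both roots from explicit configuration, never from the working directory."""
    env = os.environ if env is None else env
    try:
        vault_root = Path(env["VAULT_ROOT"])  # hypothetical variable name
        spool_root = Path(env["SPOOL_ROOT"])  # hypothetical variable name
    except KeyError as missing:
        raise RuntimeError(f"execution context incomplete: {missing} is not set") from None
    if not (vault_root.is_absolute() and spool_root.is_absolute()):
        raise RuntimeError("roots must be absolute so the working directory cannot matter")
    return ExecutionContext(vault_root, spool_root)
```

Resolving the context once, before any tool call, means every tool receives the same two roots and none of them needs to guess where it is running.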
Absolute paths anchor reasoning to a specific machine. When an agent plans a sequence of operations using absolute paths, those plans become non-portable:
- `C:\Users\Alice\vault\Notes\example.md`
- `/home/runner/work/repo/vault/Notes/example.md`
- `/Users/Bob/projects/vault/Notes/example.md`

The agent’s plan breaks when executed in a different environment. Even if tools succeed locally, they fail in CI or when the repository is cloned elsewhere.
Vault-relative paths stabilize reasoning across contexts:
- `Notes/example.md` (relative to vault root)

This enables agents to construct multi-step plans that remain valid regardless of where the repository is mounted. The execution-context layer resolves the vault root at runtime, ensuring tool calls reference the correct absolute locations transparently.
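A minimal sketch of that runtime resolution, assuming a hypothetical helper named `resolve_in_vault`: the plan carries only the portable string, and the absolute location is computed on whichever machine executes it. Absolute inputs are rejected in both POSIX and Windows form so a machine-specific path cannot slip into a plan.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath


def resolve_in_vault(vault_root: Path, vault_relative: str) -> Path:
    """Turn a portable, vault-relative path into an absolute path for this machine."""
    # Check both flavours so a Windows-style absolute path is rejected on POSIX too.
    if (PurePosixPath(vault_relative).is_absolute()
            or PureWindowsPath(vault_relative).is_absolute()):
        raise ValueError(f"absolute paths are not portable: {vault_relative!r}")
    parts = PurePosixPath(vault_relative).parts
    if ".." in parts:
        raise ValueError(f"path escapes the vault: {vault_relative!r}")
    return vault_root.joinpath(*parts)
```

The same plan string `Notes/example.md` then resolves under `/home/runner/work/repo/vault` in CI and under `/Users/Bob/projects/vault` on a laptop without the plan itself changing.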
Local Mode (Unsafe):
Agent plan:
1. Read `/home/alice/vault/Book Zettel/Chapter_03.md`
2. Write summary to `/home/alice/vault/Summaries/Chapter_03_Summary.md`
This plan fails on Bob’s machine and in CI.
GitHub-mode (Correct):
Agent plan:
1. Read `Book Zettel/Chapter_03.md` (vault-relative)
2. Spool summary to `staging/Summaries/Chapter_03_Summary.md` (spool-relative)
This plan succeeds everywhere. The execution context resolves the vault root and spool root before invoking tools.
Links: [[MoC_Execution_Context_GitHub_Mode]] · [[GitHub_Mode_Failure_Modes]] · [[GitHub_Mode_Is_A_Different_Execution_Contract]]
Breaking the GitHub-mode contract reintroduces agent risk. Direct vault writes bypass human mediation, making PKM corruption and unreviewable side effects likely. Ignoring vault-relative paths or deterministic spooling causes divergent outputs between local runs and CI, eroding trust in automation.
These failures collapse the safety boundary that GitHub-mode is meant to enforce.
Mixed write semantics are more dangerous than hard failure. If a tool writes some outputs to the vault and others to the spool, the system enters an inconsistent state:
This failure mode undermines the core GitHub-mode guarantee: all vault changes are human-mediated. Partial writes violate this by hiding some changes from the promotion workflow.
Hard failure is the correct behavior. If the agent cannot write to the spool, the entire operation should fail visibly. This preserves system integrity and forces explicit handling of the error condition.
The no-vault-writes invariant ([[Book Zettel/MCP with DSPy Theory and Application/Execution Contexts/GitHub Mode/GitHub_Mode_No_Vault_Writes]]) exists to prevent silent partial success. Enforcement must be absolute—no exceptions for “safe” subfolders or “temporary” writes.
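One way to make hard failure concrete is an all-or-nothing batch write into the spool. The sketch below is an assumption-laden illustration (the name `spool_all` and the rollback strategy are not taken from the companion repository): if any output cannot be spooled, already-written files are removed and the operation raises, and there is no code path that falls back to the vault.

```python
from pathlib import Path


def spool_all(spool_root: Path, outputs: dict[str, str]) -> list[Path]:
    """Write every output under the spool root, or fail without leaving a partial batch.

    `spool_all` is a hypothetical helper; keys are spool-relative paths.
    """
    written: list[Path] = []
    try:
        for relative, text in sorted(outputs.items()):
            target = spool_root / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(text, encoding="utf-8")
            written.append(target)
    except OSError as exc:
        # Roll back what was spooled so reviewers never see half a batch.
        for path in written:
            path.unlink(missing_ok=True)
        # Fail visibly; never fall back to writing into the vault.
        raise RuntimeError(f"spool write failed, batch discarded: {exc}") from exc
    return written
```

The rollback removes files but leaves any directories it created; a stricter variant could stage the batch in a temporary directory and rename it into place.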
Links: [[MoC_Execution_Context_GitHub_Mode]] · [[GitHub_Mode_Why_This_Is_Not_Optional]] · [[GitHub_Mode_Is_A_Different_Execution_Contract]]
GitHub-mode removes direct write authority and routes changes through spooling so humans can review, regenerate, and validate before promotion. The guardrails keep long-running or autonomous tasks aligned, and they keep CI and GitHub Actions deterministic with the same path rules and spool root.
Without this contract, safety, reproducibility, and multi-agent correctness degrade immediately.
Links: [[MoC_Execution_Context_GitHub_Mode]] · [[Book Zettel/MCP with DSPy Theory and Application/Execution Contexts/GitHub Mode/GitHub_Mode_No_Vault_Writes]]
GitHub-mode is NOT a restricted version of local mode. It is a distinct execution contract with a fundamentally different trust model.
In local mode, the agent writes directly to the vault. Human review is implicit—the user watches changes appear and manually rejects mistakes. This assumes synchronous human attention and full-time monitoring.
In GitHub-mode, the agent spools outputs to staging without touching the vault. Human review is explicit—changes are promoted only after approval. This enables unattended execution in CI, Actions, and scheduled workflows.
The vault write fence (D011) is not a convenience feature or an optimization. It is a correctness boundary. Without it:
Local mode trusts the user to monitor the agent continuously. GitHub-mode trusts the promotion workflow to mediate all changes.
These are incompatible assumptions. Conflating them erodes the guarantees that make GitHub-mode safe for automation.
GitHub-mode is not “local mode minus vault writes.” Treating it as such invites unsafe optimizations. For example:
GitHub-mode must be implemented as a distinct execution path, not a flag or runtime toggle.
All GitHub-mode guarantees must be enforced in code, not convention or policy. The execution context layer resolves which contract applies before any tool is invoked.
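The difference between "a distinct execution path" and "a flag" can be sketched with two separate context types. Everything here is hypothetical naming (`LocalContext`, `GitHubContext`, `select_context`); the design point is that the GitHub-mode type contains no code that writes under the vault root, so the guarantee holds by construction instead of by an `if` inside each tool.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Union


@dataclass(frozen=True)
class LocalContext:
    """Local contract: the user is watching, so writes go straight to the vault."""
    vault_root: Path

    def write_note(self, relative: str, text: str) -> Path:
        target = self.vault_root / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
        return target


@dataclass(frozen=True)
class GitHubContext:
    """GitHub-mode contract: the vault is read-only and the spool is the only write path."""
    vault_root: Path
    spool_root: Path

    def read_note(self, relative: str) -> str:
        return (self.vault_root / relative).read_text(encoding="utf-8")

    def write_note(self, relative: str, text: str) -> Path:
        target = self.spool_root / relative  # vault_root is never used for writing
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
        return target


def select_context(mode: str, vault_root: Path,
                   spool_root: Optional[Path] = None) -> Union[LocalContext, GitHubContext]:
    """Resolve the contract once, before any tool is invoked."""
    if mode == "local":
        return LocalContext(vault_root)
    if mode == "github":
        if spool_root is None:
            raise ValueError("GitHub-mode requires an explicit spool root")
        return GitHubContext(vault_root, spool_root)
    raise ValueError(f"unknown execution mode: {mode!r}")
```

Tools receive the resolved context object and call `write_note` on it; they never inspect the mode themselves.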
Links: [[MoC_Execution_Context_GitHub_Mode]] · [[Book Zettel/MCP with DSPy Theory and Application/Execution Contexts/GitHub Mode/GitHub_Mode_No_Vault_Writes]] · [[GitHub_Mode_Why_This_Is_Not_Optional]]
A cluster graduates once its core claims are runnable, verified, and reused by other workflows. Promotion is justified when the cluster is the smallest unit that reliably delivers value without manual curation.
Signals of graduation:

- A stable interface exists (inputs, outputs, side effects).
- The cluster can be tested end-to-end in CI.
- Consumers depend on it as a contract, not as a narrative.
A skill is a contract that can run. The contract is defined by the behavior surface (inputs/outputs), and the execution proves the promise. The D009–D012 stance shifts from “document intent” to “execute intent”.
Executable contracts:

- Encode invariants in tests and fixtures.
- Treat prompts and tool chains as versioned interfaces.
- Make deviations visible before they reach production.
Acceptance criteria define what it means for a skill to be “true”. They are minimal, testable, and tied to observable outcomes. A promoted skill ships only when criteria are precise enough to automate.
Criteria checklist:

- Defined happy path and explicit edge cases.
- Deterministic inputs and measurable outputs.
- Failure modes documented with guardrails.
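Criteria that are "precise enough to automate" can be written as data: a fixed input paired with a check on the observable output. The sketch below is illustrative only. `Criterion`, `evaluate`, and the toy skill `slugify_title` are hypothetical names, not part of the book's repository.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    name: str
    given: str                    # deterministic input
    check: Callable[[str], bool]  # measurable property of the output


def evaluate(skill: Callable[[str], str], criteria: list[Criterion]) -> dict[str, bool]:
    """Run the skill on each fixed input and record whether the criterion holds."""
    return {c.name: c.check(skill(c.given)) for c in criteria}


def slugify_title(title: str) -> str:
    """Hypothetical skill under test: turn a note title into a file stem."""
    return "_".join(title.split())


CRITERIA = [
    Criterion("happy_path", "The Problem Space", lambda out: out == "The_Problem_Space"),
    Criterion("collapses_whitespace", "  What   is MCP ", lambda out: out == "What_is_MCP"),
    Criterion("empty_input", "", lambda out: out == ""),
]
```

Calling `evaluate(slugify_title, CRITERIA)` yields one boolean per named criterion, which is the shape a CI job needs in order to report exactly which promise was broken.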
CI is the policy engine for promoted skills. It enforces contracts continuously rather than validating them once. The goal is to prevent drift, not to approve changes after the fact.
Enforcement stance:

- Fail fast on contract violations.
- Surface regressions in skill behavior, not just code coverage.
- Treat flaky tests as skill instability.
Skill drift shows up as silent regressions between intent and execution. It happens when a skill’s contract is stable but its behavior mutates.
Common failure modes:

- Tests cover the old contract, not the new behavior.
- Upstream data changes invalidate assumptions.
- Tooling updates shift the execution surface without re-certification.
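One hedged way to surface this kind of drift is to fingerprint a skill's outputs over fixed fixtures at certification time and compare on every CI run. The helper names below (`behavior_fingerprint`, `has_drifted`) are assumptions for illustration, and the approach only fits skills whose outputs are deterministic.

```python
import hashlib
import json
from typing import Callable, Sequence


def behavior_fingerprint(skill: Callable[[str], str], fixtures: Sequence[str]) -> str:
    """Hash the skill's outputs over fixed fixtures into one comparable value."""
    outputs = [skill(fixture) for fixture in fixtures]
    payload = json.dumps(outputs, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def has_drifted(skill: Callable[[str], str], fixtures: Sequence[str],
                certified_fingerprint: str) -> bool:
    """True when current behavior no longer matches what was certified."""
    return behavior_fingerprint(skill, fixtures) != certified_fingerprint
```

A mismatch does not say whether the new behavior is wrong, only that it changed without re-certification, which is the signal a reviewer needs.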
Promotion gates define when a cluster can graduate to a skill. They pair D009–D012 intent with verifiable evidence.
Graduation checklist:

- Contract and acceptance criteria committed.
- CI suite enforces behavior across fixtures.
- Owners and consumers agree on stability horizon.
- Monitoring or retraining plan in place.
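The checklist can be expressed as a gate that reports which items are still missing. This is a sketch under assumptions: the field names are one possible encoding of the checklist items, and `GraduationEvidence`, `missing_gates`, and `can_promote` are hypothetical names.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GraduationEvidence:
    contract_committed: bool
    acceptance_criteria_committed: bool
    ci_enforces_fixtures: bool
    stability_horizon_agreed: bool
    monitoring_plan_in_place: bool


def missing_gates(evidence: GraduationEvidence) -> list[str]:
    """Names of checklist items that are not yet satisfied, in declaration order."""
    return [name for name, met in asdict(evidence).items() if not met]


def can_promote(evidence: GraduationEvidence) -> bool:
    """Promotion is allowed only when every gate is met."""
    return not missing_gates(evidence)
```

Returning the list of unmet gates, not just a boolean, keeps a blocked promotion actionable.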
Zettels capture intent; skills operationalize it. Promotion turns architectural principles into runnable workflows with measurable outcomes. D009–D012 provide the rationale, while the skill package makes it enforceable.
Operationalization steps:

- Distill the cluster into a single behavior contract.
- Encode the contract in tests and CI gates.
- Promote once the behavior is stable across scenarios.
A tool performs a bounded action with predictable output. An agent interprets intent, makes choices, and can drift without guardrails, so it needs supervision beyond a command line.
When you treat an agent like a deterministic tool, you skip review and context. The result is brittle automation that fails quietly and erodes trust.
Agents can propose and execute steps, but they do not own outcomes in the way a human teammate does. Treating them as peers blurs responsibility and invites silent failure.
Peer work depends on shared context, long-term judgment, and the ability to negotiate tradeoffs. Agents need explicit constraints and review to approximate that level of alignment.
Apprentices work with clear tasks, feedback loops, and incremental responsibility. Agents benefit from the same structure: explicit goals, bounded scope, and frequent check-ins.
Unsupervised autonomy amplifies misunderstandings. Treat autonomy as a graduation outcome after repeated, verified success.
Execution contexts define what an agent can touch, from read-only analysis to write access. The context should match the risk profile of the task.
High-risk contexts require tighter review and smaller steps. Low-risk contexts can tolerate more autonomy, but still require visibility.
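Execution contexts as permission levels can be sketched as an ordered enum plus a deny-by-default check. The three levels and the mapping from tool names to required levels are assumptions made for this example; only the tool names themselves appear elsewhere in the book.

```python
from enum import IntEnum


class Permission(IntEnum):
    READ_ONLY = 1    # analysis only
    SPOOL_WRITE = 2  # may stage outputs for review
    VAULT_WRITE = 3  # may change the canonical vault


# Hypothetical mapping from tool name to the minimum context it needs.
REQUIRED = {
    "search_notes": Permission.READ_ONLY,
    "list_notes": Permission.READ_ONLY,
    "write_note": Permission.SPOOL_WRITE,
    "update_note": Permission.VAULT_WRITE,
}


def authorize(tool: str, granted: Permission) -> None:
    """Raise PermissionError unless the granted context covers the tool's needs."""
    required = REQUIRED.get(tool)
    if required is None:
        raise PermissionError(f"unknown tool {tool!r}: denied by default")
    if granted < required:
        raise PermissionError(
            f"{tool} needs {required.name}, but the context grants {granted.name}"
        )
```

Matching the context to the task then becomes a one-line decision at dispatch time: a read-only analysis job is simply run with `Permission.READ_ONLY`.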
A skill should teach how work is done, not just what to run. It packages conventions, guardrails, and examples so agents learn the local workflow.
When skills are framed as onboarding packets, the agent becomes easier to supervise and more predictable across tasks.
Continuous integration should surface misalignment early, not serve as a last-minute approval gate. The goal is to catch drift while it is still small.
CI can verify invariants, but it cannot judge intent. Treat CI as a supervisor that flags issues for human confirmation.
Promotion means expanding scope only after repeated, verified success. Each step increases responsibility and the cost of mistakes.
Use clear signals: consistent task completion, reduced review overhead, and correct use of constraints. Without these, promotion is premature.
Treating agents like tools skips review and context checks, which leads to subtle errors that look like automation success until they compound.
Treating agents like peers assumes judgment they do not have. This produces over-scoped tasks, vague requirements, and avoidable rework.
The junior-engineer model is scalable because it standardizes supervision: clear scopes, shared skills, and consistent review loops.
Graduated permissions and visible checkpoints allow more work to run in parallel without losing control or accountability.
Supervision is required because agent output cannot self-certify correctness or safety. The operator must validate results against explicit criteria and treat uncertainty as expected, not exceptional. This keeps accountability with humans and prevents silent drift.
Shadow execution keeps the agent in advisory mode while a human performs the actions. Dual-run compares the agent’s output against a trusted baseline and focuses on deltas, not intent. Use these patterns when actions are irreversible or when new behaviors require validation.
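A dual-run comparison that "focuses on deltas" can be as small as a line diff between the trusted baseline and the agent's output. The sketch below uses the standard library's `difflib.unified_diff`; the function name `dual_run_deltas` is hypothetical.

```python
import difflib


def dual_run_deltas(baseline: str, candidate: str) -> list[str]:
    """Return only the added and removed lines between baseline and agent output."""
    diff = list(difflib.unified_diff(
        baseline.splitlines(),
        candidate.splitlines(),
        fromfile="baseline",
        tofile="agent",
        lineterm="",
    ))
    # The first two entries are the file headers; hunk markers and context lines
    # start with "@" or a space and are filtered out here.
    return [line for line in diff[2:] if line.startswith(("+", "-"))]
```

An empty result means the agent reproduced the baseline exactly; anything else is the list a reviewer needs to inspect before the agent is allowed to act on its own.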
Review gates define explicit checkpoints with evidence requirements before work is accepted. Human signoff confirms the criteria were met and assigns accountability for release. Gates prevent bypass and ensure acceptance is a deliberate decision.
Do not automate when intent is ambiguous, effects are irreversible, or the system state cannot be observed. Refusal is the correct outcome when a human must clarify goals or accept responsibility for risk.
Feedback loops capture failures as concrete artifacts and turn them into updated notes, checklists, or tests. Corrections must be evidence-driven and linked to the gate that now enforces them, closing the loop without relying on ad hoc prompts.
A minimal, safe MCP agent pipeline can create book-ready Zettels by enforcing contracts at tool boundaries and requiring explicit write intent.
- `_agent_inbox` with `write_note`.
- `staging/`.
- `docs/book` from vault MoCs.
- `pkm` + `source`

[[MoC_Chapter_00_Book_Contract]]
[[Book Contract/MoC_Book_Contract]]
[[Book Contract/What_Constitutes_a_Chapter]]
[[Book Contract/Required_Chapter_Invariants]]
[[Book Contract/Promotion_Lifecycle_of_Knowledge]]
[[Book Contract/Failure_Modes_of_Large_Knowledge_Bases]]
[[MoC_Chapter_01_Foundations]]
[[Chapter_01_The_Problem_Space]]
[[Chapter_01_What_is_MCP]]
[[Chapter_01_Spec_Driven_Development]]
[[Chapter_01_Why_DSPy_Matters]]
[[Chapter_01_PKM_and_Agents]]
[[Chapter_01_Foundations_Summary]]
[[MoC_Chapter_02_Constitution_Layer]]
[[Chapter_02_Purpose_of_Agent_Constitution]]
[[Chapter_02_Global_Principles_and_Guarantees]]
[[Chapter_02_Safety_and_Compliance]]
[[Chapter_02_Naming_and_Versioning_Conventions]]
[[Chapter_02_Constitution_and_Spec_Interactions]]
[[MoC_Chapter_03_Spec_Driven_Development_Workflow]]
[[Chapter_03_What_is_a_Spec]]
[[Chapter_03_Anatomy_of_a_Spec]]
[[Chapter_03_Spec_First_Workflow]]
[[Chapter_03_Spec_to_MCP]]
[[Chapter_03_Spec_to_DSPy]]
[[Chapter_03_SDD_Feedback_Loop]]
[[MoC_Chapter_04_MCP_Server_Fundamentals]]
[[Chapter_04_What_is_an_MCP_Server]]
[[Chapter_04_MCP_Protocol_and_Sessions]]
[[Chapter_04_Tool_Definitions_and_Schemas]]
[[Chapter_04_MCP_Error_Models]]
[[Chapter_04_Server_Capabilities_and_Versioning]]
[[Chapter_04_State_and_Statelessness_in_MCP]]
[[Chapter_04_MCP_with_DSPy_and_PKM]]
[[MoC_Chapter_05_DSPy_Framework_Deep_Dive]]
[[Chapter_05_What_is_DSPy]]
[[Chapter_05_DSPy_Signatures]]
[[Chapter_05_DSPy_Modules]]
[[Chapter_05_DSPy_Optimization]]
[[Chapter_05_DSPy_orchestrating_MCP]]
[[Chapter_05_DSPy_Memory_and_PKM]]
[[Chapter_05_DSPy_Reasoning_Graphs]]
[[Chapter_05_DSPy_as_PKM_Agent]]
[[MoC_Chapter_06_PKM_Integration_with_Obsidian]]
[[Chapter_06_PKM_in_Agent_Systems]]
[[Chapter_06_Zettelkasten_as_Knowledge_Graph]]
[[Chapter_06_Obsidian_as_PKM_OS]]
[[Chapter_06_Designing_a_PKM_Vault]]
[[Chapter_06_Zettel_Metadata_and_Schema]]
[[Chapter_06_MCP_Tools_for_PKM]]
[[Chapter_06_DSPy_and_PKM_Retrieval]]
[[Chapter_06_PKM_as_Agent_Identity]]
[[Chapter_06_PKM_MCP_DSPy_Loop]]
[[MoC_Chapter_07_MCP_DSPy_PKM_Agent_Project]]
[[Chapter_07_What_is_an_Agent]]
[[Chapter_07_Four_Layer_Agent_Architecture]]
[[Chapter_07_PKM_Layer]]
[[Chapter_07_DSPy_Layer]]
[[Chapter_07_MCP_Layer]]
[[Chapter_07_Constitution_Layer]]
[[Chapter_07_Agent_Lifecycle]]
[[Chapter_07_PKM_Agent_Reasoning_Graph]]
[[Chapter_07_Agent_Failure_Modes_and_Recovery]]
[[Chapter_07_Example_PKM_Query_Cycle]]
`write_note`
[[MoC_Full_Stack_Example_write_note]]
[[Full_Stack_write_note_Why_First]]
[[Full_Stack_write_note_Overwrite_Semantics]]
[[Full_Stack_write_note_Specs_Tests_Examples_Form_a_Contract]]
[[Full_Stack_write_note_From_Zettels_to_Skills_to_Code]]
[[Full_Stack_write_note_What_To_Copy_For_Next_Tool]]
[[MoC_Full_Stack_Example_search_notes]]
[[Full_Stack_search_notes_Why_Canonical_Read_Only]]
[[Full_Stack_search_notes_Contract]]
[[Full_Stack_search_notes_Tests_and_Examples]]
[[Full_Stack_search_notes_Safety_Read_Only]]
[[Full_Stack_search_notes_Skills_and_Constitution]]
`update_note`
[[MoC_Full_Stack_Example_update_note]]
[[Full_Stack_update_note_Path_First_Safety_Model]]
[[Full_Stack_update_note_Consent_and_Allowed_Roots]]
`id_index_stats`
[[MoC_Full_Stack_Example_id_index_stats]]
[[Full_Stack_id_index_stats_Observability_and_Determinism]]
[[Full_Stack_id_index_stats_D012_Vault_Relative_Paths]]
[[Full_Stack_id_index_stats_Tool_Contract_and_Semantics]]
[[Full_Stack_id_index_stats_Tests_and_Examples_Enforce_Contract]]
[[Full_Stack_id_index_stats_Future_Id_Based_Mutation_Safety]]
`append_to_note`
[[MoC_Full_Stack_Example_append_to_note]]
[[Full_Stack_append_to_note_Why_Append_Safer_Than_Overwrite]]
[[Full_Stack_append_to_note_Consent_and_D011_Write_Fence]]
[[Full_Stack_append_to_note_Relative_Paths_and_D012]]
[[Full_Stack_append_to_note_Contract_and_Edge_Cases]]
[[Full_Stack_append_to_note_Tests_and_Examples_Enforce_Contract]]
[[Full_Stack_append_to_note_Journaling_Safety]]
[[MoC_Full_Stack_Example_list_notes]]
[[Full_Stack_list_notes_Discovery_Companion]]
[[Full_Stack_list_notes_Contract]]
[[Full_Stack_list_notes_Safety_and_Scope]]
[[Full_Stack_list_notes_Tests_and_Examples]]
[[Full_Stack_list_notes_Hallucination_Mitigation]]
[[Full_Stack_list_notes_Future_Navigation]]
[[MoC_Journaling_Pattern]]
[[Journaling_Pattern_Append_Only_Memory_Primitive]]
[[Journaling_Pattern_Daily_Note_Naming_and_Location]]
[[Journaling_Pattern_Create_If_Missing_and_Append_Safely]]
[[Journaling_Pattern_Retrieval_Patterns]]
[[Journaling_Pattern_Anti_Patterns]]
[[Journaling_Pattern_Prepping_for_Append_Journal_Entry]]
[[Journaling_Pattern_Temp_Vault_Walkthrough]]
[[MoC_Chapter_13_Promotion_Pipeline]]
[[MoC_From_Zettels_to_Skills]]
[[Zettels_Capture_Intent_Skills_Encode_Action]]
[[Skills_as_Procedural_Zettels]]
[[Why_Skills_Follow_Zettels_Not_Replace_Them]]
[[Repository_as_Execution_Context_for_Skills]]
[[MoC_Execution_Context_GitHub_Mode]]
[[Book Zettel/MCP with DSPy Theory and Application/Execution Contexts/GitHub Mode/GitHub_Mode_No_Vault_Writes]]
[[GitHub_Mode_Spool_Root_Semantics]]
[[GitHub_Mode_Tool_Output_Invariants]]
[[GitHub_Mode_Failure_Modes]]
[[GitHub_Mode_Why_This_Is_Not_Optional]]
[[GitHub_Mode_Is_A_Different_Execution_Contract]]
[[MoC_Chapter_15_Promotion_to_Skills_and_CI]]
[[When_a_Cluster_Becomes_a_Skill]]
[[Skills_as_Executable_Contracts]]
[[Skill_Acceptance_Criteria]]
[[CI_as_Enforcement_Not_Validation]]
[[Failure_Modes_When_Skills_Drift]]
[[Promotion_Gates_and_Graduation_Checklist]]
[[From_Zettels_to_Skills_Operationalizing_Architectural_Intent]]
[[MoC_Chapter_16_Agents_as_Junior_Engineers]]
[[Agents_Are_Not_Tools]]
[[Agents_Are_Not_Peers]]
[[Agents_as_Apprentices_Not_Autonomous_Actors]]
[[Execution_Contexts_as_Permission_Levels]]
[[Skills_as_Onboarding_Packets]]
[[CI_as_Supervision_Not_Validation]]
[[Promotion_as_Trust_Graduation]]
[[Failure_Modes_When_Agents_Are_Treated_Wrongly]]
[[Why_This_Model_Scales]]
[[MoC_Chapter_17_Supervision_and_Review_Patterns]]
[[Chapter_17_Why_Supervision_Is_Not_Optional]]
[[Chapter_17_Shadow_Execution_and_Dual_Run_Patterns]]
[[Chapter_17_Review_Gates_and_Human_Signoff]]
[[Chapter_17_When_Not_to_Automate]]
[[Chapter_17_Feedback_Loops_and_Correction]]
[[MoC_Chapter_18_Prose_Refinement_Pipeline]]
[[Prose_Refinement_Objectives]]
[[Prose_Refinement_Workflow]]
[[Prose_Refinement_Checklist]]
[[MoC_Chapter_99_Agent_Demo]]
[[Agent_Demo_Thesis]]
[[Agent_Demo_Pipeline]]
[[Agent_Demo_Checklist]]