The Code Intelligence Landscape: Context, Memory, and Proofs

An exploration of the emerging code intelligence space: context packers, code graphs, memory systems, and why versioned provability is the missing layer. How content-addressing transforms code intelligence from ephemeral search into permanent infrastructure.

Recent landscape analyses of AI memory tools have started mapping the space between “thin memory” (platform tools that forget) and “deep memory” (systems that track belief evolution over time). Comparative benchmarks distinguish “notebooks” (flat fact storage) from “brains” (systems that track what changed and when).

But these analyses focus on general-purpose AI memory: remembering conversations, preferences, and context across chat sessions. The code intelligence problem is related but structurally different. Code has properties that general knowledge doesn’t: it’s versioned (git), it’s deterministic (same source = same behavior), it has formal structure (AST, types, call graphs), and it changes in ways that invalidate prior understanding.

This article explores the code-specific intelligence landscape: what exists, what’s missing, and why the answer requires something none of the current tools provide.

The Problem All These Tools Are Solving

When an AI coding agent (Claude Code, Cursor, Copilot) works on your codebase, it does this:

  1. Reads the file you’re editing
  2. Greps for related symbols
  3. Reads those files
  4. Greps again for callers
  5. Reads more files
  6. Builds a mental model from fragments
  7. Writes code
  8. Next turn: forgets everything and starts over

This costs tokens, time, and accuracy. The agent spends 60% of its context window re-reading files it saw last turn. It misses relationships that span multiple files or repos. It has no memory of what was useful last time.

The codebase is 2M+ tokens. The context window is maybe 200K. The agent needs to select the right 1% to read, every single turn, with no history of what worked before.

The market response has been four categories of tools, each solving one facet.

Category 1: Context Packers

What they do: Analyze your repository and produce a condensed representation for the agent’s context window. Run at query time, produce text output.

Examples: Repo maps, tree-sitter based summarizers, file-importance rankers.

Strengths:

  • Fast to adopt (just generate text and paste it)
  • No persistent state to manage
  • Work with any agent that accepts text context

What they miss:

  • Stateless: they don’t remember what was useful last time
  • No versioning: can’t diff two outputs or prove anything about them
  • Flat: treat all symbols equally regardless of graph relationships
  • Rebuild from scratch every time (no caching)

Context packers are the simplest answer to “give the agent more relevant context.” They’re the grep-but-smarter layer. They don’t build understanding; they compress information.

Category 2: Code Graphs / Indexers

What they do: Build a queryable index of code relationships: call graphs, type hierarchies, imports, implementations. Expose this via API or MCP tools.

Strengths:

  • Rich structural queries (“who calls this function?”, “what implements this interface?”)
  • Language-aware (understand semantics, not just text)
  • Can answer blast-radius and impact questions
  • Persist across sessions (unlike context packers)

What they miss:

  • Mutable state: use auto-increment IDs or UUIDs. No content-addressing.
  • No history: “who called this function last Tuesday?” is unanswerable.
  • No proofs: “prove no one calls this function” is impossible (absence of results != proof of absence).
  • No learning: query results don’t improve with feedback.
  • Regenerated from scratch: can’t incrementally update without full re-index.

Code graphs are the structural intelligence layer. They understand relationships. But they’re ephemeral snapshots with no temporal dimension, no integrity guarantees, and no learning mechanism.

Category 3: Agent Memory Systems

What they do: Persist information across agent sessions. Remember what was discussed, what was useful, what the user’s preferences are.

Strengths:

  • Cross-session continuity (agent doesn’t start cold every time)
  • Can track what worked and what didn’t
  • Some support temporal reasoning (“what did I know when?”)

What they miss:

  • Remember conversations, not code structure. “You asked about auth yesterday” ≠ “auth’s blast radius grew by 3 callers since yesterday.”
  • No formal connection to source code. Memory drifts from reality.
  • Stale memory problem: if code changes, old memory about that code is now wrong.
  • No expiration mechanism tied to code state. Manual cleanup or TTLs, both flawed.

The memory landscape analysis correctly identifies the “drift problem”: information across systems becomes misaligned. For code, drift is even worse because code changes constantly and old understanding becomes actively misleading.

Category 4: Runtime Observability

What they do: Track what actually happens in production: which services call which, traffic patterns, error rates, latency.

Strengths:

  • Ground truth (what actually happens, not what’s declared)
  • Temporal (can say “this started happening on Tuesday”)
  • Quantitative (call frequencies, error rates, latency percentiles)

What they miss:

  • Disconnected from source code. Knows “service A called service B 10K times” but not “the function enabling this is at file X, line 42.”
  • Can’t prove negatives. “Service A never called service B this week” is observational (maybe next week it will).
  • No static understanding. Can’t tell you “this call is possible but hasn’t happened yet.”

Runtime observability is the empirical layer. It tells you what DID happen. It cannot tell you what CAN happen or what’s IMPOSSIBLE.

The Compressed View

ToolWhat it knowsWhat it misses
Context packersImportant files for this queryHistory, learning, proofs, relationships
Code graphsReferences within one workspaceCross-repo callers, history, runtime behavior
Agent memoryWhat you discussed last sessionCode structure, staleness detection
Runtime observabilityWhat happened in productionStatic structure, blast radius, proofs

Each solves one facet. None addresses what ties them together: a versioned, provable identity for the intelligence itself.

The Gap: Nothing Versions The Intelligence

Here’s what no current tool provides:

1. Deterministic identity. Same source code indexed by two people should produce the same result. Not “similar.” Identical. Bit-for-bit. This is what makes results comparable, shareable, and verifiable.

2. Structural history. Not “what files changed” (git does that) but “what relationships changed.” Three new callers appeared. A dependency was removed. Runtime traffic disagrees with static analysis. These are intelligence changes, not file changes.

3. Proofs. Not “I searched and didn’t find it.” Proof. Cryptographic, offline-verifiable, tied to a specific git commit. “This relationship exists” (inclusion proof). “This relationship does NOT exist” (absence proof). “This graph is intact” (integrity verification).

4. Self-healing memory. Not “remember everything forever” (leads to noise). Not “forget after 7 days” (arbitrary). Memory that is tied to the state of the code it references, and automatically becomes invisible when that code changes.

5. Incremental everything. Not “re-index the whole repo on every change.” Change 3 files in a 10,000-file codebase? Process 3 files. Know exactly what’s stale without scanning.

To make this concrete: when you ask “what’s the blast radius of changing BuildHierarchicalTree?”, the answer isn’t in any file. It’s in the set of edges pointing TO that function:

BlastRadius(BuildHierarchicalTree) = {
  SnapshotManager.ComputeSnapshot       --calls-->  BuildHierarchicalTree
  cmd/knowing/prove.cmdProve            --calls-->  BuildHierarchicalTree
  cmd/knowing/prove_absent.cmdProveAbsent --calls--> BuildHierarchicalTree
  mcp/feedback.computeNeighborhoodRoot  --calls-->  BuildHierarchicalTree
  TestBuildHierarchicalTree_Deterministic --calls--> BuildHierarchicalTree
  cache.TestInvalidatePackages          --calls-->  BuildHierarchicalTree
  ... (10+ callers across 5 packages)
}

Change its signature? Every one of these breaks. No single file contains this information. The edges do. Edges are the intelligence; nodes are just vocabulary.

These properties are not optimizations. They’re structural consequences of one architectural choice: content-addressing the intelligence itself.

Content-Addressing as Architecture

Git content-addresses file contents. This gives it deterministic identity, integrity verification, efficient equality checks, and structural history. These aren’t features bolted onto git; they’re free consequences of the identity model.

The same principle applied to code relationships (not files) gives you all five missing properties:

Deterministic identity: Every node, edge, and snapshot is SHA-256 of its content. Same source = same hashes. Always.

Structural history: Each snapshot is a Merkle root over all edges. The chain of snapshots IS the history of relationships. Diff any two in O(packages).

Proofs: A Merkle path from leaf to root proves a relationship exists. Two adjacent sorted leaves prove a gap (absence). Both verify offline with just SHA-256.

Self-healing memory: Feedback stores the Merkle root of the relevant package at recording time. When code changes, the root changes, old feedback becomes invisible. No TTLs. No garbage collection. The hash IS the validity check.

Incremental updates: Changed file = new content hash = known stale nodes = scoped re-extraction. Unchanged code keeps its hashes. Caches remain valid.

This isn’t a feature list. It’s a single design choice (content-address the edges) with five structural consequences.

The Hierarchical Structure

A flat content-addressed system (hash all edges into one root) tells you “something changed” but not “what changed.” The move that makes this practical: organize the tree by semantic boundaries.

repo root
  package root [auth]        <- compare just this to know if auth changed
    edge-type root [calls]   <- compare just this to know if call edges changed
      edge leaves            <- the actual relationships
  package root [store]       <- unchanged? everything cached for store is still valid

This means:

  • “Which packages changed?” is O(packages), not O(edges). Benchmarked at 565x faster for 100K edges.
  • “Is my cached blast-radius for package X still valid?” is one 32-byte comparison. 42 nanoseconds.
  • “Did call edges change independently of import edges?” is one root comparison per type.
  • A proof path goes: edge → edge-type root → package root → repo root. Three levels, ~16 hash steps, ~660 bytes.

The tree doesn’t just prove state. It organizes computation. Same structure serves integrity, caching, diffing, proofs, and feedback expiration.

What Proofs Enable That Search Cannot

Any code graph can tell you “I found that A calls B.” No code graph except a content-addressed one can tell you “I prove that A does NOT call B, cryptographically, verifiable offline by anyone with SHA-256.”

This matters for:

Compliance. “Certify that payment processing has no direct dependency on user data storage.” Not a grep result. A mathematical proof. Tied to a specific git commit. An auditor verifies it independently.

CI gates. “Block this PR if it introduces a new cross-service dependency.” Deterministic: CI produces the same snapshot hash as any developer. The gate is a diff of two Merkle roots, not a heuristic.

Architecture enforcement. “Prove that the billing module is isolated from the user module.” An absence proof that automatically invalidates when someone violates the boundary (because the snapshot root changes).

Temporal reasoning. “When did this cross-service call first appear?” Walk the snapshot chain. Each snapshot has a generation number and a git commit. Binary search is possible.

What Learning Enables That Static Analysis Cannot

Static analysis is a point-in-time snapshot. It tells you what the code looks like NOW. It doesn’t know what matters.

In a 500K LOC codebase with 200K symbols, most are irrelevant for any given task. Static analysis treats them equally. A learning system that accumulates feedback across sessions knows: “When working on auth tasks, these 50 symbols out of 200,000 are consistently useful.”

But persistent feedback has a poisoning problem. Code changes. The function you marked “useful” gets rewritten. The feedback now references code that no longer exists in the same form. Over time, stale feedback dominates fresh signal.

The solution: tie feedback to the cryptographic state of the code it references. When you mark a symbol as useful, record the Merkle root of its package. When you query feedback later, compare the stored root against the current root. If they differ (code changed), the feedback is invisible. If they match (code is the same), the feedback applies.

This is content-addressed memory: the hash IS the validity check. No TTLs. No manual cleanup. No “maybe this is stale.” Mathematical certainty.

Measured impact: precision improves from 16% to 50% over five feedback rounds. The improvement is immediate and sustained. No degradation over time because stale feedback self-expires.

Bringing It Together

The code intelligence landscape is converging on a layered architecture:

LayerWhat it providesExisting tools
ExtractionParse code into structured relationshipsTree-sitter, LSP, SCIP
Graph storageQuery relationshipsCode graphs, indexers
Context packingSelect relevant symbols for agentsContext packers, RWR/HITS
Feedback/memoryLearn what’s useful over timeAgent memory systems
RuntimeObserve what actually happensOTLP, APM
VersioningTrack how intelligence changesGap
ProofsVerify claims about code structureGap

The bottom two layers are where content-addressing lives. They’re the foundation that makes the layers above trustworthy: context packing becomes cacheable, feedback becomes self-healing, runtime observations become comparable against static analysis, and the entire system becomes auditable.

No tool that uses mutable state (auto-increment IDs, UUIDs, ephemeral in-memory graphs) can provide these properties. They require content-addressing to be architectural from the start. You can’t bolt proofs onto a mutable database. You can’t add self-healing memory to a system that doesn’t content-address its feedback. The identity model IS the architecture.

The Practical Stack

For a developer adopting this today, the practical architecture is:

  1. Index your repo into a content-addressed graph (tree-sitter for speed, LSP/SCIP for precision)
  2. Serve via MCP so any agent (Claude Code, Cursor, Copilot) gets ranked context in one call
  3. Watch for changes and re-index incrementally (changed files only, seconds not minutes)
  4. Record feedback when the agent marks symbols as useful/not-useful
  5. Let feedback expire when code changes (merkleized validity, zero maintenance)
  6. Prove claims when compliance or architecture enforcement requires it

The first three give you better context than grep. The fourth gives you learning. The fifth prevents learning from becoming poison. The sixth gives you audit.

Each layer builds on content-addressing. Remove it and you get: uncacheable context (layer 1-3), feedback that drifts (layer 4-5), and assertions without proofs (layer 6).

Where This Goes

The memory tools landscape is evolving from “notebooks” (flat storage) to “brains” (temporal reasoning). The code intelligence landscape will follow the same arc: from “indexers” (point-in-time snapshots) to “versioned intelligence” (temporal, provable, learning systems).

The tools that win will be the ones where:

  • History is structural, not bolted on
  • Proofs are primitive operations, not features
  • Learning is self-healing, not accumulating noise
  • Incremental is the default, not an optimization

Content-addressing makes all four architectural rather than aspirational. It’s the same bet git made for files, applied to code relationships.


knowing is an open-source implementation of these ideas: content-addressed code graph with hierarchical Merkle trees, cryptographic proofs, merkleized feedback, and 26 MCP tools. MIT licensed.

The hierarchical Merkle tree is available as a standalone library: merkle-forest . Zero dependencies. Absence proofs. Scoped queries.