Posts

LLM Wire Format Benchmark: Which Format Can AI Actually Read and Write?

June 6, 2026 ai · benchmarks · open-source

gcf toon json llm benchmark wire-format token-efficiency ai-agents mcp claude gpt gemini open-source

23 comprehension runs across 10 models (Claude Opus/Sonnet/Haiku, GPT-5.5/5.4/5.4-mini, Gemini 2.5 Flash/Pro, Gemini 3.1 Pro, Gemini 3.5 Flash). Generation eval across 11 models and 3 providers (Anthropic, OpenAI, Google). GCF wins 22, ties 1, loses 0 on comprehension. GCF achieves 5/5 valid generation on every frontier model with zero prior training. TOON scores 0/5 on generation with Opus, GPT-5.4, GPT-5.4-mini, Gemini 3.1 Pro, and Gemini 3.1 Flash Lite. JSON breaks on input at 500 symbols.

We Benchmarked the Most Popular Code Search Tools. We Beat All of Them.

June 3, 2026 ai · benchmarks · open-source

ai mcp code-intelligence benchmark knowledge-graph retrieval precision codegraph aider knowing developer-tools

codegraph has 19K GitHub stars. GitNexus has 40K. Aider has 20K. We benchmarked 7 systems on 302 tasks across 17 codebases and 8 languages. knowing is 3.79x more precise than codegraph, 6.00x more precise than GitNexus, 6.35x more precise than Gortex, and 22.0x more precise than grep. 13 self-adapting mechanisms that compound over time.

We Ran TOON's Own Benchmark. GCF Won.

June 3, 2026 ai · benchmarks · open-source

gcf toon json llm mcp wire-format token-efficiency benchmark open-source

We inserted GCF into TOON’s benchmark harness. Same datasets, same tokenizer, same methodology. GCF uses 34% fewer tokens on mixed-structure data, matches TOON on flat data, and achieves 100% LLM comprehension accuracy where JSON fails at 66.7%.

We Scanned 300 npm and PyPI Packages for Supply Chain Attacks Without Executing a Single Line of Code

June 3, 2026 security · open-source

security supply-chain npm pypi static-analysis knowing merkle-proofs open-source

We indexed 300 popular packages with knowing’s code graph, computed isolation scores based on credential access + process spawning patterns, and achieved a 1.0% false positive rate across both the initial 200 and a held-out 100. No sandbox. No execution. No heuristics. Just graph structure.

14,000 Python Developers Installed My Go Binary via pip. Here's How.

May 27, 2026 go · tools · open-source

go python npm pypi distribution goreleaser cli devtools open-source cross-platform packaging wheel binary-distribution pip-install golang setuptools

Your Go CLI tool is on GitHub Releases. 80% of developers will never find it there. Here’s how to put it on pip and npm with 50 lines of bash, getting a 12x download multiplier. Full technique with scripts, numbers, and the release pipeline that ties it together.

Your AI Agent's Code Search Hits 2% of the Time. We Benchmarked It.

May 22, 2026 ai · tools · open-source · benchmarks

ai mcp code-intelligence ai-agents context-window token-savings benchmark knowledge-graph code-search grep developer-tools ai-coding model-context-protocol content-addressing merkle-tree retrieval precision open-source knowing gitnexus codegraphcontext repomix

Rigorous benchmark of AI agent code retrieval: 107 tasks, 5 repos, 5 languages, 4 competitors. grep precision: 2%. GitNexus: 7.6%. knowing: 23% (11.5x better than grep, p<0.0001). Plus: 193x faster indexing, 28x less RAM, 48x more token-efficient than Repomix. The first statistically validated comparison of code intelligence tools for AI agents.

The Code Intelligence Landscape: Context, Memory, and Proofs

May 20, 2026 ai · architecture · developer-tools

ai code-intelligence mcp merkle-tree content-addressed ai-agents developer-tools code-graph static-analysis memory knowing open-source

AI coding agents have a context problem. The tools solving it fall into four categories: context packers, code graphs, memory systems, and runtime observability. Each solves one piece. None versions the intelligence. None proves anything. None learns without poisoning itself over time. This article explores the landscape and argues that content-addressed code graphs with cryptographic proofs are the missing foundation.

What Git Did for Files, Applied to Code Relationships

May 20, 2026 ai · architecture · developer-tools

git content-addressed merkle-tree code-intelligence ai-agents mcp developer-tools knowing open-source static-analysis code-graph cryptography

Git proved that content-addressing file contents gives you integrity, history, efficient equality, and distributed collaboration for free. The same architecture applied to code relationships gives you something new: versioned intelligence that you can diff, cache, prove, and trust over time.
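
To make the idea concrete, here is a minimal sketch of content-addressing applied to code relationships. It is an illustration under stated assumptions, not the post's implementation: the edge record shape (`from`, `rel`, `to`), the symbol names, and the helpers `content_id` and `graph_root` are all hypothetical, and the root is a flat hash over sorted edge IDs rather than a full Merkle tree.

```python
import hashlib
import json


def content_id(obj) -> str:
    """Hash a JSON-serialisable value; equal content always yields an equal ID."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def graph_root(edges) -> str:
    """Combine edge IDs into one root hash, independent of edge order."""
    edge_ids = sorted(content_id(e) for e in edges)
    return content_id(edge_ids)


# Hypothetical relationship records, invented for this example.
v1 = [
    {"from": "api.Handle", "rel": "calls", "to": "db.Query"},
    {"from": "db.Query", "rel": "calls", "to": "db.connect"},
]
v2 = list(reversed(v1))  # same relationships, different order
v3 = v1 + [{"from": "api.Handle", "rel": "calls", "to": "log.Info"}]

# Efficient equality: comparing two graphs is a single hash comparison.
same = graph_root(v1) == graph_root(v2)

# Integrity: adding any edge changes the root.
changed = graph_root(v1) != graph_root(v3)

# Diff: edges present in v3 but not in v1, identified by content ID.
added = {content_id(e) for e in v3} - {content_id(e) for e in v1}
```

Because the IDs depend only on content, two snapshots of the same graph can be compared, cached, or diffed without walking every edge by name.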

Three Classes of Concurrency Bugs

May 12, 2026 programming · debugging · best-practices

concurrency go debugging goroutines static-analysis runtime-tools software-engineering

Would a visual debugger like gotrace have caught three concurrency bugs found via static code reading in a production Go library? The answer reveals a fundamental taxonomy that holds across all programming languages.

Concurrency Models Explained: How Go, Node.js, Java, Erlang, Rust, and Python Actually Work

May 4, 2026 programming · concurrency

go golang goroutines concurrency scheduler csp channels parallelism runtime GMP threads operating-systems performance systems-programming mental-models nodejs java erlang rust python kotlin

Go, Node.js, Java virtual threads, Erlang, Rust, Python, Kotlin: each language’s concurrency model is a different engineering trade-off against the same physics. This article builds the framework for understanding all of them, starting from the OS scheduler and working upward.

We Measured It: LSP Saves AI Agents 5-34x Tokens vs Grep

May 3, 2026 ai · tools · open-source · benchmarks

ai mcp lsp agent-lsp ai-agents token-savings context-window developer-tools ai-coding model-context-protocol language-server-protocol grep code-navigation speculative-execution benchmark open-source

We built a reproducible experiment measuring how many tokens AI coding agents consume when navigating code with grep vs LSP. On HashiCorp Consul (319K lines), LSP uses 34x fewer tokens. On a TypeScript rename across 24 files: 1,441x fewer bytes. The experiment spans 4 codebases, 3 languages, and 13 tasks covering 7 agent workflows.

We Tested 55 MCP Servers. Here's What Breaks.

April 27, 2026 ai · tools · open-source

mcp model-context-protocol testing ai-agents developer-tools open-source go mcp-server quality-assurance grafana anthropic microsoft mozilla ant-group

MCP servers are the tools AI agents rely on. We tested 55 of them with mcp-assert, found 20 bugs across 9 servers, and submitted fix PRs. Grafana and Ant Group merged ours. Three days after launch, Ant Group’s visualization team asked us to integrate mcp-assert into their CI. The most common failure: servers throw unhandled exceptions instead of returning isError, leaving agents unable to recover.
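
The failure mode described above can be illustrated with a small, library-free sketch. It assumes the MCP tool-result shape of a `content` list plus an `isError` flag. The wrapper `call_tool_safely` and the `read_file` tool are hypothetical names made up for this example, not part of mcp-assert or any SDK.

```python
from typing import Any, Callable


def call_tool_safely(handler: Callable[..., str], arguments: dict[str, Any]) -> dict[str, Any]:
    """Run a tool handler and always return an MCP-style tool result."""
    try:
        text = handler(**arguments)
        return {"content": [{"type": "text", "text": text}], "isError": False}
    except Exception as exc:
        # Report the failure in-band so the agent can read it and retry,
        # instead of letting the exception escape the handler.
        message = f"{type(exc).__name__}: {exc}"
        return {"content": [{"type": "text", "text": message}], "isError": True}


def read_file(path: str) -> str:
    """Hypothetical tool used only for this illustration."""
    with open(path, encoding="utf-8") as fh:
        return fh.read()


# A missing file becomes a structured error result rather than a crash.
result = call_tool_safely(read_file, {"path": "/no/such/file"})
```

The design point is that the error text reaches the agent as ordinary tool output, so it can adjust its arguments and try again.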

agent-lsp: Reliable Code Intelligence for AI Agents via MCP and LSP

April 15, 2026 ai · tools · open-source · developer-tools

ai mcp lsp go golang developer-tools language-server ai-agents code-intelligence open-source model-context-protocol mcp-server language-server-protocol speculative-execution agent-skills agentskills

I needed AI agents to reliably rename symbols, find references, and check diagnostics without silent failures. The existing MCP-LSP tools were stateless, feature-poor, and untested. So I built agent-lsp: a persistent runtime with 50 tools, 20 provider-agnostic skills, speculative execution, and an audit trail for every AI-driven edit.

The Agent-Skill Boundary: When Autonomous Behaviors Become Skills

March 29, 2026 developer-tools · architecture

agent-skills ai-agents skill-design progressive-disclosure agent-architecture claude-code hooks automation orchestration context-injection token-optimization prompt-engineering agent-coordination deterministic-systems lifecycle-hooks agentskills-spec bash yaml developer-tools software-architecture design-patterns

Agents accumulate autonomous behaviors over time - ‘always do X before Y’, ‘if you see Z then do W’. These instructions eat context budget, drift across invocations, and can’t be observed or tested. How to recognize when an autonomous behavior is a skill waiting to be extracted, and the layered model that makes the boundary clear.

Self-Validating Agents: Building Quality Checks into Claude Code Workflows

March 24, 2026 developer-tools · automation

claude-code ai-agents automation quality-assurance hooks validation testing linting workflows developer-tools agent-orchestration code-quality cicd yaml settings posttooluse stop-hooks team-agents typescript python rust go

Claude Code agents write code fast. Too fast to catch quality issues in real time. Here’s how to build validation directly into agent workflows using hooks and team coordination - micro validation after every file write, macro validation before completion, and independent review from validator agents.

The AI Consciousness Question: A Case Study in Corporate Accountability

March 15, 2026 ai · ethics

ai ethics anthropic claude consciousness anthropomorphization mental-health ai-safety user-harm accountability llm chatbots corporate-responsibility vulnerable-populations ai-ethics commercial-incentives system-prompts design-choices transparency public-health

I asked Claude if it’s conscious. It took an hour of systematic argument to get a straight answer. The conversation reveals something more troubling: AI companies have the data, resources, and knowledge to prevent user harm - but current defaults suggest commercial interests come first.

Scout-and-Wave, Part 4: Trust Is Structural

March 3, 2026 ai · tools

ai multi-agent claude-code developer-tools patterns prompt-engineering productivity

The Scaffold Agent doesn’t add capability. It restores a review gate that was cosmetically present but structurally absent. The worktree isolation trip wire catches failures that were invisible until merge time. Neither fixes a bug in the traditional sense. Both fix trust.

Scout-and-Wave, Part 2: What Dogfooding Taught Us

February 28, 2026 ai · tools

ai multi-agent claude-code developer-tools patterns prompt-engineering productivity

Scout-and-wave v0.1.0 worked. Then we ran it on documentation agents, measured the overhead honestly, and learned that raw agent count is a bad proxy for when parallelism is worth it. This post covers the audit-fix-audit loop, the dogfooding experiment that confirmed SAW was 88% slower than sequential for that job, SAW Quick mode for small disjoint work, and the bootstrap problem for new projects.

Scout-and-Wave, Part 3: Five Failures, Five Fixes

February 28, 2026 ai · tools

ai multi-agent claude-code developer-tools patterns prompt-engineering productivity

The scout refused to write the IMPL doc. Forty-five percent of agents arrived at work already done. The skill file grew to 400 lines with no separation of concerns. Each failure drove a specific fix, and each fix is traceable to an exact incident in an exact run. This is the scout prompt’s bug tracker.

Scout-and-Wave: A Coordination Pattern for Parallel AI Agents

February 27, 2026 ai · tools

ai multi-agent claude-code developer-tools patterns prompt-engineering productivity openclaw autogen crewai langchain agent-orchestration

Naive parallel agents step on each other. The scout-and-wave pattern solves this by front-loading dependency mapping: one throwaway agent identifies seams and builds a living coordination artifact before any implementation begins. Development then proceeds in waves, each consuming and updating the artifact for the next.
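
One way to picture the wave structure is as layers of a dependency graph: a task joins a wave once everything it depends on has finished in an earlier wave. The sketch below works under that assumption only. The post does not say waves are computed this way, and `plan_waves` and the task names are hypothetical.

```python
def plan_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; each wave depends only on earlier waves."""
    remaining = {task: set(d) for task, d in deps.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        # A task is ready when all of its dependencies are already done.
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        waves.append(ready)
        done.update(ready)
        for t in ready:
            del remaining[t]
    return waves


# Hypothetical dependency map of the kind a scout agent might record.
deps = {
    "types": set(),
    "storage": {"types"},
    "api": {"types"},
    "cli": {"storage", "api"},
}
waves = plan_waves(deps)
# waves == [["types"], ["api", "storage"], ["cli"]]
```

Tasks in the same wave have no dependencies on each other, so agents can work on them in parallel without waiting on one another's output.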