Token-Efficiency
LLM Wire Format Benchmark: Which Format Can AI Actually Read and Write?
23 comprehension runs across 10 models (Claude Opus/Sonnet/Haiku, GPT-5.5/5.4/5.4-mini, Gemini 2.5 Flash/Pro, Gemini 3.1 Pro, Gemini 3.5 Flash). Generation eval across 11 models and 3 providers (Anthropic, OpenAI, Google). On comprehension, GCF wins 22, ties 1, and loses 0. GCF achieves 5/5 valid generations on every frontier model with zero prior training. TOON scores 0/5 valid generations with Opus, GPT-5.4, GPT-5.4-mini, Gemini 3.1 Pro, and Gemini 3.1 Flash Lite. JSON breaks on input at 500 symbols.
We Ran TOON's Own Benchmark. GCF Won.
We inserted GCF into TOON’s benchmark harness. Same datasets, same tokenizer, same methodology. GCF uses 34% fewer tokens on mixed-structure data, matches TOON on flat data, and achieves 100% LLM comprehension accuracy where JSON reaches only 66.7%.
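For readers who want to check a "percent fewer tokens" figure against their own data, the arithmetic can be sketched as below. This is a minimal illustration, not the benchmark harness: the function name `percent_fewer_tokens` is hypothetical, and the token counts are made-up placeholders chosen only to show the calculation, not measured results.

```python
def percent_fewer_tokens(baseline_tokens: int, candidate_tokens: int) -> float:
    """Return how many percent fewer tokens the candidate uses than the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - candidate_tokens) / baseline_tokens


# Placeholder counts (NOT benchmark data): the same payload encoded in a
# baseline format and in a candidate format, counted with the same tokenizer.
example_baseline = 1000
example_candidate = 660

print(f"{percent_fewer_tokens(example_baseline, example_candidate):.1f}% fewer tokens")
```

The comparison is only meaningful when both counts come from the same tokenizer over the same dataset, which is why the harness holds those fixed.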