Json

Why LLMs Struggle with JSON at Scale: A Tokenization Analysis
We benchmarked 8 tokenizers from 6 providers and performed exhaustive vocabulary scans. 15 of the most common JSON field names (id, name, type, value, title, time, text, url, path, description) exist as merged vocabulary entries (quote+field in one token) on GPT-4 (#32586 for ‘“name’), LLaMA, and Qwen. Claude and Gemma have zero such entries. These merges are hardcoded, deterministic, and irrecoverable without retraining the tokenizer. On real eval data: JSON boundary merge rate 8.93% vs GCF 1.00% (88.8% fewer). GPT-4 has 114 quote+letter vocabulary entries vs 17 pipe+letter (6.7:1 ratio). JSON overhead is 81% at scale. This structural ambiguity compounds per row and explains model-dependent comprehension failures across 2,400+ evaluations.
LLM Wire Format Benchmark: Which Format Can AI Actually Read and Write?
23 comprehension runs across 10 models (Claude Opus/Sonnet/Haiku, GPT-5.5/5.4/5.4-mini, Gemini 2.5 Flash/Pro, Gemini 3.1 Pro, Gemini 3.5 Flash). Generation eval across 11 models and 3 providers (Anthropic, OpenAI, Google). GCF wins 22, ties 1, loses 0 on comprehension. GCF achieves 5/5 valid generation on every frontier model with zero prior training. TOON fails 0/5 on generation with Opus, GPT-5.4, GPT-5.4-mini, Gemini 3.1 Pro, and Gemini 3.1 Flash Lite. JSON breaks on input at 500 symbols.
We Ran TOON's Own Benchmark. GCF Won.
We inserted GCF into TOON’s benchmark harness. Same datasets, same tokenizer, same methodology. GCF uses 34% fewer tokens on mixed-structure data, matches TOON on flat data, and achieves 100% LLM comprehension accuracy where JSON fails at 66.7%.
You Don't Know JSON: Part 8 - Lessons from the JSON Revolution
JSON recreated XML’s entire ecosystem modularly. JSX brought back XML’s syntax. What does this teach us about technology evolution? Explore the architectural zeitgeist, pattern survival, and the modularity paradox: choice vs. discoverability.
You Don't Know JSON: Part 1 - Origins, Evolution, and the Cracks in the Foundation
Everyone thinks they know JSON. But do you know why it was created, what problems it solved, and more importantly - what problems it created? Part 1 explores JSON’s origins, its triumph over XML, and the fundamental weaknesses that spawned an entire ecosystem of extensions.
You Don't Know JSON: Part 2 - JSON Schema and the Art of Validation
JSON lacks types and validation - any structure parses successfully. JSON Schema solves this by adding a validation layer without changing JSON itself. Learn how to define schemas, validate at runtime, generate code, and build type-safe APIs.
You Don't Know JSON: Part 3 - Binary JSON in Databases
Database-managed binary JSON formats solve storage and query performance problems. JSONB enables fast PostgreSQL queries with indexing, while BSON adds extended types for MongoDB. Learn when databases beat text JSON.
You Don't Know JSON: Part 4 - Binary JSON for APIs and Data Transfer
Beyond database storage, binary JSON formats optimize API data transfer. MessagePack provides universal serialization with 30-40% size reduction. CBOR adds IETF standardization for IoT and security. Learn when binary beats JSON for network efficiency.
You Don't Know JSON: Part 5 - JSON-RPC: When REST Isn't Enough
REST is great for resources, but what about actions? JSON-RPC provides a simple, transport-agnostic protocol for calling remote functions. Learn the spec, implementation patterns, and why major projects like Ethereum and VS Code chose JSON-RPC over REST.
You Don't Know JSON: Part 6 - JSON Lines: Processing Gigabytes Without Running Out of Memory
Standard JSON can’t stream - you must parse the entire document. JSON Lines solves this with one JSON object per line, enabling streaming processing, log aggregation, Unix pipelines, and handling gigabyte-scale datasets with constant memory usage.
You Don't Know JSON: Part 7 - Security: Authentication, Signatures, and Attacks
JSON has no built-in security. The ecosystem response: JWT for authentication, JWS for signing, JWE for encryption. Learn how these work, common attacks (algorithm confusion, injection, timing), and how to secure JSON-based systems.