How Continuous Fuzzing Finds Bugs Traditional Testing Misses
Coverage-guided fuzzing runs continuously in CI, exploring millions of input combinations and evolving test cases over time. Learn how to set up continuous fuzzing in Go with GitHub Actions, understand corpus evolution, and see real bugs discovered through automated fuzzing.
You wrote comprehensive tests. Your code has 80% test coverage. All 200 assertions pass. Ship it?
I shipped twice with that confidence. Continuous fuzzing found two bugs in the first hour — bugs my test suite would never have exercised.
Traditional testing has a problem: you only test what you think to test. Empty strings, negative numbers, boundary values - these are good. But what about:
- Japanese field names with empty JSON tags triggering UTF-8 byte slicing bugs
- Regex patterns containing newline characters producing broken JavaScript output
These aren’t bugs you’d write tests for. They’re bugs you discover by exploring the input space automatically.
This is what fuzzing does. And when you run it continuously in CI — generating millions of test cases every day, building on discoveries from previous runs — it finds bugs traditional testing misses.
Fuzzing Fundamentals
Before diving into continuous fuzzing setup, let’s establish what fuzzing is and how it differs from traditional testing.
What Fuzzing Is
Fuzzing is automated testing that generates random inputs to find bugs. Instead of writing specific test cases, you write fuzz targets — functions that accept random inputs and verify properties (invariants) about your code.
Coverage-Guided Fuzzing
Coverage-guided fuzzing uses code coverage feedback to guide input generation toward unexplored code paths. When an input triggers a new branch, it’s saved to the corpus (collection of interesting inputs) for future mutation.
Continuous Fuzzing
Continuous fuzzing runs fuzzing 24/7 in CI, with the corpus persisting across runs. Each run builds on previous discoveries, creating compound growth in test effectiveness.
Why Continuous Fuzzing Works
Traditional tests stay static — you write 200 assertions and coverage plateaus at 80%. Continuous fuzzing improves over time:
- Day 1: Corpus has 10 seed inputs, finds obvious bugs
- Week 1: Corpus grows to 500+ inputs covering edge cases
- Month 1: Corpus reaches 2,000+ inputs, coverage increases from 80% → 85%
- Ongoing: Every run explores from a larger, smarter starting point
The fuzzer runs when you’re sleeping, exploring combinations humans wouldn’t think to test. In goldenthread, the schema compiler examined later in this article, it found both bugs within the first hour of running.
What This Article Covers
This is a technical deep-dive into continuous fuzzing: how coverage-guided fuzzing works, how corpus evolution compounds over time, and how to set up continuous fuzzing in GitHub Actions. We’ll examine two real bugs discovered by fuzzing before they reached production, with technical details and reproduction steps.
If you’re familiar with property-based testing (QuickCheck, Hypothesis, proptest), fuzzing is similar but runs continuously in CI with automatic corpus growth.
What Fuzzing Is (And Isn’t)
Traditional Testing: Explicit Examples
Traditional testing is example-based: you write specific test cases for scenarios you anticipate.
| |
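For illustration, an example-based suite for a hypothetical validateEmail (a stand-in, not goldenthread’s code) might look like this:

```go
package main

import (
	"fmt"
	"strings"
)

// validateEmail is a deliberately naive, hypothetical validator.
func validateEmail(s string) bool {
	parts := strings.Split(s, "@")
	return len(parts) == 2 && parts[0] != "" && parts[1] != ""
}

func main() {
	// Example-based testing: five cases the author thought of.
	fmt.Println(validateEmail("alice@example.com")) // true
	fmt.Println(validateEmail("bob@test.org"))      // true
	fmt.Println(validateEmail(""))                  // false
	fmt.Println(validateEmail("@example.com"))      // false
	fmt.Println(validateEmail("no-at-sign"))        // false
}
```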
What you test: 5 examples you thought of
What you don’t test:
- Unicode characters in local part
- Very long email addresses (> 254 characters)
- Multiple @ symbols
- Special characters (#, !, $, %)
- Whitespace variations
- Null bytes
- Control characters
- Internationalized domain names
Fuzzing: Automated Exploration
Fuzzing is exploration-based: the fuzzer generates thousands of inputs automatically, mutating them to explore code paths.
| |
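A Go fuzz target for a hypothetical validateEmail might look like this sketch. Note that it asserts properties about whatever input arrives, never exact outputs:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
	"unicode/utf8"
)

// validateEmail is a hypothetical validator standing in for the real code under test.
func validateEmail(s string) bool {
	parts := strings.Split(s, "@")
	return len(parts) == 2 && parts[0] != "" && parts[1] != ""
}

// invariant: whatever the input, an accepted address must be valid UTF-8
// and free of control characters. Returning an error (instead of calling
// t.Fatal directly) lets us reuse the check outside the fuzzer.
func invariant(s string) error {
	if validateEmail(s) && (!utf8.ValidString(s) || strings.ContainsAny(s, "\x00\n\r")) {
		return fmt.Errorf("accepted suspicious input: %q", s)
	}
	return nil // rejecting an input is always acceptable
}

// FuzzValidateEmail is the fuzz target: seed corpus plus property check.
// Run with: go test -fuzz=FuzzValidateEmail
func FuzzValidateEmail(f *testing.F) {
	f.Add("alice@example.com") // seed corpus
	f.Fuzz(func(t *testing.T, s string) {
		if err := invariant(s); err != nil {
			t.Fatal(err)
		}
	})
}

func main() {
	fmt.Println(invariant("alice@example.com")) // <nil>
}
```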
What gets tested: Potentially millions of inputs:
- "\x00alice@example.com" (null byte)
- "alice@exampl\ne.com" (newline in domain)
- "フィールド@example.com" (UTF-8)
- "alice@" + strings.Repeat("a", 1000) + ".com" (very long)
- And thousands more combinations the fuzzer discovers
How Coverage-Guided Fuzzing Works
Not all fuzzing is equally effective. Coverage-guided fuzzing uses code coverage feedback to guide input generation toward unexplored code paths.
The Fuzzing Loop
Each iteration picks an input from the corpus, mutates it, and executes the fuzz target. A crash or failed invariant is reported as a bug. If the input passes but discovered new coverage, it is added to the corpus; otherwise it is discarded. Either way the loop continues, and the corpus steadily accumulates inputs that reach new code.
Instrumentation: Tracking Coverage
Go’s fuzzer instruments your code to track which branches execute:
| |
Instrumented execution tracks:
- Branch 1: Taken (len > 0) or not taken (len == 0)
- Branch 2: Taken (invalid UTF-8) or not taken (valid UTF-8)
- Branch 3: Taken (parts != 2) or not taken (parts == 2)
- Branch 4: Taken (empty local) or not taken (has local)
- Branch 5: Taken (empty domain) or not taken (has domain)
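The five branches above could correspond to a validator shaped like this (hypothetical code, written so each branch is a distinct coverage point):

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// validateEmail: each if-statement is one instrumented branch.
func validateEmail(s string) error {
	if len(s) == 0 { // branch 1
		return errors.New("empty")
	}
	if !utf8.ValidString(s) { // branch 2
		return errors.New("invalid UTF-8")
	}
	parts := strings.Split(s, "@")
	if len(parts) != 2 { // branch 3
		return errors.New("need exactly one @")
	}
	if parts[0] == "" { // branch 4
		return errors.New("empty local part")
	}
	if parts[1] == "" { // branch 5
		return errors.New("empty domain")
	}
	return nil
}

func main() {
	fmt.Println(validateEmail("alice@example.com")) // <nil>
	fmt.Println(validateEmail("alice@"))            // empty domain
}
```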
Mutation: Generating Inputs
The fuzzer mutates inputs from the corpus to create new test cases:
Seed: "alice@example.com"
Mutations:
→ "alice@example.com\x00" (append null byte)
→ "Alice@example.com" (flip case)
→ "alice@example.co" (delete byte)
→ "aalice@example.com" (duplicate byte)
→ "alice@exampl\ne.com" (inject newline)
→ "alice@" + repeat("a", 100) (arithmetic - extend)
→ "フィールド@example.com" (dictionary - splice UTF-8)
... millions more
Each mutation runs through the fuzz function. If it discovers a new code path (branch not previously executed), it’s added to the corpus for future mutations.
Example: Discovering a Branch
| |
If name starts with a multi-byte UTF-8 character, name[:1] produces an invalid byte prefix. In practice, this corrupts output (often via replacement characters), even if it doesn’t always produce an “invalid string” at the end — either way, it’s a bug.
Fuzzing execution:
Run 1: "Alice"
Branches: len > 0, return camelCase
Result: "alice" (pass)
Coverage: 2/2 branches
Run 2: "" (mutation: delete all bytes)
Branches: len == 0, return "Anonymous"
Result: "Anonymous" (pass)
Coverage: 2/2 branches (no new coverage)
Run 3: "Alice\x00" (mutation: append null)
Branches: len > 0, return camelCase
Result: "alice\x00" (pass)
Coverage: 2/2 branches (no new coverage)
Run 444,553: "フィールド" (mutation: splice UTF-8 from dictionary)
Branches: len > 0, return camelCase
Result: CORRUPTED OUTPUT (FAIL)
Coverage: New execution path (UTF-8 edge case)
BUG FOUND!
The fuzzer discovered that name[:1] slices bytes, not characters. For multi-byte UTF-8 characters, [:1] returns an incomplete byte sequence, corrupting the output.
Corpus Evolution: Compound Growth
The killer feature of continuous fuzzing: the corpus grows over time, compounding discoveries from previous runs.
Initial State (Day 1, Run 1)
Seed corpus:
FuzzEmit:
- ("User", "username", "email")
- ("Task", "title", "description")
- ("日本語", "フィールド", "") // Explicitly added for UTF-8 testing
Total: 8 seeds across all targets
After 24 Hours (48 runs × 10 minutes)
Corpus growth:
FuzzEmit: 10 → 87 inputs (+770%)
FuzzEmitPattern: 8 → 52 inputs (+550%)
FuzzComputeSchemaHash: 6 → 134 inputs (+2133%)
Total: 8 → 542 inputs (+6675%)
Coverage improvement:
Emitter: 89.4% → 91.7%
Parser: 75.1% → 78.3%
Hash: 47.6% → 52.8%
After 1 Month (1,440 runs)
Corpus growth:
Total inputs: 2,847
Total executions: Millions per target across all runs
Coverage:
Emitter: 94.8%
Parser: 84.2%
Hash: 58.1%
Bugs found: 2 (both in first week)
Corpus and coverage trajectory:
- Day 1: 53% coverage
- Day 7: 542 inputs, 61% coverage
- Day 30: 2,847 inputs, 69% coverage
Why this works: Each run starts with an improved corpus from the previous run. Inputs that triggered new branches in run N become seeds for run N+1. The fuzzer doesn’t start from scratch every time - it builds on past discoveries.
The time advantage:
Human test writer:
20 test cases × 1 minute each = 20 minutes
Tests check known edge cases only
Continuous fuzzing (based on goldenthread's observed CI performance):
Single target (10m): 52 million executions
12 targets total: ~250 million executions per run
If schedules run reliably: billions of executions per day
Compounds over time as corpus grows
The fuzzer runs when you’re sleeping, exploring edge cases automatically.
Real Bug Discovery: UTF-8 Corruption
Let’s examine an actual bug found by fuzzing in the goldenthread schema compiler.
The Bug
Discovered: 2026-01-25 at 02:34 UTC
Fuzz target: FuzzEmit
Executions to discovery: 444,553
Time to discovery: ~10 seconds
Failing input:
| |
Buggy code:
| |
Why This Failed
In Go, strings are byte sequences; indexing and slicing operate on bytes, not characters (runes). Runes are Unicode code points.
Japanese text uses multi-byte UTF-8 encoding. "フィールド" is 5 runes but 15 bytes in UTF-8:
"フィールド" in UTF-8:
[0xE3, 0x83, 0x95] [0xE3, 0x82, 0xA3] [0xE3, 0x83, 0xBC] [0xE3, 0x83, 0xAB] [0xE3, 0x83, 0x89]
└─ "フ" (3 bytes) └─ "ィ" (3 bytes) └─ "ー" (3 bytes) └─ "ル" (3 bytes) └─ "ド" (3 bytes)
s[:1] returns [0xE3] — the first byte of a 3-byte character — producing a broken prefix and corrupting the output. In goldenthread, that corruption surfaced as invalid UTF-8 in the emitted output and failed utf8.ValidString().
How Fuzzing Caught It
The FuzzEmit target includes an invariant check:
| |
The fuzzer mutated seed inputs, eventually splicing UTF-8 characters into field names. After 444,553 executions, it generated the specific combination (Japanese field name + empty JSON name) that triggered the bug.
The Fix
| |
Why Manual Testing Missed This
No human test writer thinks: “Let me test Japanese field names with empty JSON names to verify UTF-8 handling in camelCase conversion.”
Three Independent Factors
This bug required the intersection of three separate conditions:
- Multi-byte UTF-8 input - Field name starts with Japanese character
- Empty JSON name - Triggers fallback to camelCase conversion
- Byte slicing in implementation - Code uses s[:1] instead of rune slicing
Any two of these alone wouldn’t trigger the bug. All three together = corrupted output.
Manual testing would likely never discover this specific intersection.
Real Bug Discovery: Regex Escaping
Discovered: 2026-01-25 at 02:41 UTC
Fuzz target: FuzzEmitPattern
Executions to discovery: 180
Time to discovery: < 1 second
Failing input:
| |
Buggy code:
| |
Only backslashes were escaped. Pattern "\n" produced broken JavaScript:
| |
Escaping for JavaScript regex literal context requires more than just backslashes.
How Fuzzing Caught It
FuzzEmitPattern tests random regex patterns:
| |
The fuzzer tried control characters within 180 executions (< 1 second). Pattern "\n" broke JavaScript syntax immediately.
The Fix
| |
We now escape backslashes, the delimiter (/ - required because we’re emitting .regex(/pattern/) literals), and control characters (\n, \r, \t). This handles the common cases for JavaScript regex literal context. Other embedding contexts (like new RegExp("...")) have different escaping requirements.
Why Manual Testing Missed This
Developers test regex patterns like ^[a-z]+$ (alphanumeric), not literal control characters. Fuzzing tried "\n" after just 180 executions.
Setting Up Continuous Fuzzing in GitHub Actions
Here’s the complete workflow for running fuzzing 24/7 in CI.
Workflow Configuration
.github/workflows/fuzz.yml:
| |
Key Configuration Elements
1. Schedule: Continuous fuzzing
| |
Runs continuously on a schedule. Each run builds on the previous corpus.
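A schedule block along these lines (the cron value is illustrative):

```yaml
on:
  schedule:
    - cron: "*/30 * * * *"   # every 30 minutes; adjust to your budget
  workflow_dispatch:          # manual trigger — handy when schedules stall
```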
GitHub Actions Scheduled Workflows: Reliability Note
In my experience, GitHub Actions scheduled workflows can be less reliable than push-triggered workflows:
- Schedules may be delayed or skipped during high platform load
- Repositories with infrequent activity sometimes have schedules paused
- No notifications when schedules fail to run
If your scheduled workflow stops running:
- Manually trigger via workflow_dispatch (often reactivates it)
- Check Settings → Actions → General to ensure workflows are enabled
- For production-critical fuzzing, consider self-hosted runners or OSS-Fuzz
The workflow configuration shown here is correct - the limitation is with GitHub’s scheduling infrastructure, not the workflow itself.
2. Parallel execution
| |
Runs 12 targets simultaneously. Wall-clock time: ~10 minutes (not 120 minutes).
3. Corpus caching (critical for continuous growth)
| |
Cache key uses branch name (not commit SHA) so the corpus persists across commits. This is what enables compound growth - each run builds on the previous corpus, even after you push new code.
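One common shape for this, assuming actions/cache (the path and key names are illustrative, not goldenthread’s exact workflow):

```yaml
- name: Restore fuzz corpus
  uses: actions/cache@v4
  with:
    path: fuzz-corpus   # wherever the workflow stores the corpus
    key: fuzz-corpus-${{ github.ref_name }}-${{ github.run_id }}
    restore-keys: |
      fuzz-corpus-${{ github.ref_name }}-
```

Saving under a unique key per run while restoring the newest branch-prefixed entry lets the corpus grow monotonically; reusing one fixed key would never re-save after the first cache hit.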
4. Exit code capture (critical for reliability)
| |
Using PIPESTATUS[0] captures the exit code of go test, not tee. Without this, the workflow would always see exit code 0 from tee even when fuzzing fails.
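The mechanism in isolation (bash), with false standing in for a failing go test run:

```shell
# tee exits 0 even when the command feeding it failed, so checking $?
# after the pipeline would always see success. PIPESTATUS[0] (bash)
# preserves the exit code of the first command in the pipeline.
false | tee fuzz.log        # `false` plays the part of a failing fuzz run
status="${PIPESTATUS[0]}"
echo "fuzz exit code: ${status}"   # fuzz exit code: 1
```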
5. Automatic issue creation
| |
Only creates issues for scheduled runs (not PRs). Includes:
- Exact reproduction command
- Failing test case ID
- Last 100 lines of output
- Links to artifacts
Understanding Fuzz Target Design
Good fuzz targets test properties (invariants), not specific outputs.
Bad: Testing Exact Output
| |
This fails for any input except “ alice@example.com ”. Fuzzing generates random inputs - exact output tests don’t work.
Good: Testing Properties
| |
Common Property Patterns
1. Roundtrip properties
| |
2. Idempotence (f(f(x)) = f(x))
| |
3. Invariants (properties that always hold)
| |
4. Inverse operations
| |
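As one concrete instance, the roundtrip pattern applied to encoding/json (a sketch using stdlib types, not goldenthread’s):

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"testing"
)

// checkRoundtrip: any input that unmarshals must survive
// marshal → unmarshal unchanged.
func checkRoundtrip(data []byte) error {
	var v any
	if err := json.Unmarshal(data, &v); err != nil {
		return nil // invalid JSON: nothing to round-trip, not a bug
	}
	out, err := json.Marshal(v)
	if err != nil {
		return fmt.Errorf("marshal failed on accepted input: %w", err)
	}
	var v2 any
	if err := json.Unmarshal(out, &v2); err != nil {
		return fmt.Errorf("re-unmarshal failed: %w", err)
	}
	if !reflect.DeepEqual(v, v2) {
		return fmt.Errorf("roundtrip changed value: %v != %v", v, v2)
	}
	return nil
}

// FuzzJSONRoundtrip wires the property into the fuzzer.
func FuzzJSONRoundtrip(f *testing.F) {
	f.Add([]byte(`{"a":1}`))
	f.Fuzz(func(t *testing.T, data []byte) {
		if err := checkRoundtrip(data); err != nil {
			t.Fatal(err)
		}
	})
}

func main() {
	fmt.Println(checkRoundtrip([]byte(`{"a":1}`))) // <nil>
}
```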
Debugging Fuzzing Failures
When fuzzing finds a bug, here’s how to reproduce and debug it locally.
Step 1: Download Failing Test Case
GitHub Actions uploads the failing test case as an artifact. Download it from the workflow run.
Step 2: Reproduce Locally
| |
This runs the exact input that caused the failure. Fully deterministic.
Step 3: Debug
| |
The failing input is small and focused (fuzzer minimizes it automatically), making debugging straightforward.
Step 4: Fix and Verify
| |
Step 5: Add Regression Test
| |
This prevents regression and documents the fix.
Cost and Resource Management
GitHub Actions Costs
Public repositories:
GitHub-hosted runners for public repos are generally generous enough that continuous fuzzing is often feasible at no cost. However, fair-use policies apply and specifics can change.
Private repositories:
For private repositories, continuous fuzzing can become expensive. Example calculation with 12 targets running for 10 minutes every 30 minutes:
5,760 minutes/day × 30 days = ~172,000 minutes/month
At ~$0.008/minute (Linux runners, rates vary) = ~$1,400/month
Note: Rates and policies change. Check current GitHub Actions pricing for accurate costs.
This is why production continuous fuzzing often requires:
- Self-hosted GitHub Actions runners
- Dedicated fuzzing infrastructure (OSS-Fuzz)
- GitHub Enterprise with higher quotas
- Reduced frequency/duration (trade-offs below)
Optimization Strategies
1. Reduce frequency:
| |
Reduces cost by 6× (still runs 8 times per day).
2. Limit fuzz time:
| |
Halves cost, still runs frequently.
3. Selective fuzzing:
| |
Eliminates cost from PR builds.
When Fuzzing Finds Nothing
After a month of continuous fuzzing, no new bugs. Is fuzzing working?
Signs of Healthy Fuzzing
1. Corpus is growing:
| |
If no new corpus entries for weeks, fuzzing may have plateaued.
2. Coverage is increasing:
| |
Coverage should increase as corpus grows (but will plateau eventually).
3. Executions are consistent:
Check GitHub Actions logs for execution counts. Here’s what I observed on goldenthread’s CI:
Recent goldenthread CI run (GitHub-hosted runners):
FuzzEmit (10m): 52,278,168 executions
FuzzEmitFieldName (5m): 24,345,196 executions
FuzzEmitValidation (5m): 23,007,683 executions
Average observed rate: ~87,000 executions/second
Per 10-minute run: ~50 million executions per target
Your mileage will vary based on test complexity, corpus size, and runner specifications. GitHub Actions runners typically provide more workers (22 in my case) than local machines, resulting in higher throughput.
When Finding Nothing Means Success
Week 1: 2 bugs found
Week 2-4: 0 bugs found
Month 2: 0 bugs found
Month 3: 0 bugs found
This is success - your code is stable. Continuous fuzzing acts as insurance: it keeps running to catch regressions from future changes.
Adding More Fuzz Targets
If fuzzing plateaus, add more targets to explore different code paths:
| |
Fuzzing vs Property-Based Testing
If you’re familiar with property-based testing (QuickCheck, Hypothesis, proptest), fuzzing is similar with three key differences:
1. Coverage guidance - Fuzzing uses coverage feedback to explore new code paths. Property-based testing generates pure random inputs without feedback.
2. Persistent corpus - Fuzzing saves inputs that trigger new branches. Property-based testing generates fresh random inputs each run.
3. Scale - Fuzzing runs continuously in CI (millions of executions over time). Property-based testing runs 100-10,000 cases per test suite execution.
Use both: property-based tests catch bugs during development, fuzzing catches edge cases over time in production.
Conclusion
Traditional testing checks examples you think of. Fuzzing explores combinations you don’t.
What we covered:
- Coverage-guided fuzzing uses instrumentation to guide input generation toward unexplored code paths
- Corpus evolution compounds over time - each run builds on previous discoveries
- Continuous fuzzing runs 24/7 in CI, exploring billions of input combinations
- Real bugs: UTF-8 corruption (444,553 executions) and regex escaping (180 executions)
- GitHub Actions workflow runs on a recurring schedule with automatic issue creation
- Fuzz targets test properties (invariants), not exact outputs
When to use fuzzing:
- Parsers, serializers, encoders (lots of edge cases)
- String processing (UTF-8, escape sequences, control characters)
- Format validation (emails, URLs, regex patterns)
- Mathematical operations (overflow, division by zero)
- Anything with complex input space
When to skip fuzzing:
- Simple business logic (example-based tests are clearer)
- Code with no invariants to test
- UI interactions (fuzzing doesn’t work well with stateful UIs)
- Database migrations (specific sequences matter)
The best testing strategy uses multiple approaches: unit tests for known cases, integration tests for workflows, property-based tests for algorithmic properties, and fuzzing for continuous exploration.
Fuzzing found two production bugs in goldenthread before release. Both were edge cases no human test writer would think to check. This is what continuous fuzzing does - it explores the input space automatically, finding bugs you didn’t know existed.
Further Reading
Official Documentation:
Related Articles on This Blog:
- The Complete Guide to Rust Testing - Property-based testing with proptest
- How Multicore CPUs Changed Object-Oriented Programming - Why value semantics matter for concurrent code
Real-World Examples:
- goldenthread Fuzzing Bug Log - Detailed analysis of both bugs found by fuzzing, including trigger conditions, root cause analysis, and fixes
- goldenthread Continuous Fuzzing Setup - Complete implementation guide for the fuzzing system described in this article
Tools and Resources:
- go-fuzz - Alternative Go fuzzing tool
- AFL (American Fuzzy Lop) - Industry-standard fuzzer
- libFuzzer - LLVM’s fuzzing library
- OSS-Fuzz - Google’s continuous fuzzing for open source
Found an error or have questions? Open an issue or reach out on Twitter/X .