# Chaos Testing
Deterministic failure-mode testing framework for the AgenC runtime's replay and storage subsystems.
Source: `runtime/src/eval/chaos-matrix.ts`
## Overview
The Chaos Testing framework provides a structured approach to validating how the runtime behaves under adverse conditions. Rather than relying on random fault injection, it defines a Chaos Scenario Matrix -- a catalog of named failure scenarios, each with a known trigger, an expected anomaly, and a classification that determines whether the outcome is fully reproducible or statistically bounded.
Every scenario targets one of the runtime's critical subsystems: replay comparison, persistent storage, state transitions, or network connectivity. Running the full matrix produces a pass/fail report that maps directly to the anomaly codes used by the replay engine and alerting pipeline.
## Scenario Structure
Each entry in the chaos matrix implements the `ChaosScenario` interface:
```typescript
interface ChaosScenario {
  id: string;                  // Namespaced identifier, e.g. "chaos.store_write_failure"
  category: ChaosScenarioCategory;
  trigger: string;             // Human-readable description of what causes this scenario
  expectedAnomaly: {
    code?: ReplayAnomalyCode;  // Expected anomaly code (replay scenarios)
    alertCode?: string;        // Expected alert code (non-replay scenarios)
    severity: 'error' | 'warning' | 'info';
    kind: ReplayAlertKind;
  };
  classification: 'deterministic' | 'probabilistic';
  fixture: string;             // Path to fixture data used by the test
}
```

### Field Reference
| Field | Type | Description |
|---|---|---|
| `id` | `string` | Dot-namespaced identifier. Convention: `chaos.<category>.<failure_mode>`. |
| `category` | `ChaosScenarioCategory` | One of `replay`, `store`, `transition`, `network`, or `partial_write`. |
| `trigger` | `string` | Plain-language description of the condition that activates the scenario. |
| `expectedAnomaly.code` | `ReplayAnomalyCode` | The anomaly code the runtime should emit. Present for replay-category scenarios. |
| `expectedAnomaly.alertCode` | `string` | The alert code the runtime should emit. Present for non-replay-category scenarios. |
| `expectedAnomaly.severity` | `'error' \| 'warning' \| 'info'` | Expected severity level of the resulting anomaly. |
| `expectedAnomaly.kind` | `ReplayAlertKind` | Classification of the alert for routing and escalation. |
| `classification` | `'deterministic' \| 'probabilistic'` | Whether the scenario produces identical results on every run. |
| `fixture` | `string` | Relative path to the JSON or binary fixture consumed by the test harness. |
## Classification Model

Each scenario is classified as either `deterministic` or `probabilistic`. Deterministic scenarios produce identical results on every run, so any failure indicates a regression. Probabilistic scenarios depend on timing and I/O scheduling, so they are executed repeatedly and evaluated statistically against a configurable fail-rate threshold.
## Scenario Categories
### `replay` -- Replay Comparison Anomalies
Tests that validate the replay comparator's ability to detect divergences between on-chain event projections and locally recorded traces.
| ID | Trigger | Severity | Classification |
|---|---|---|---|
| `chaos.comparator.hash_mismatch` | Projected hash differs from local replay hash | error | deterministic |
| `chaos.comparator.missing_event` | On-chain event exists but not in local trace | error | deterministic |
| `chaos.comparator.unexpected_event` | Local trace has event not in on-chain projection | warning | deterministic |
| `chaos.comparator.type_mismatch` | Event type changed between projection and local | error | deterministic |
`chaos.comparator.hash_mismatch` -- The most common replay failure. The fixture contains a local trace whose content hash diverges from the projected on-chain hash. The comparator must detect this and emit an error-severity anomaly with the corresponding ReplayAnomalyCode.
`chaos.comparator.missing_event` -- Simulates a gap in the local trace. An event recorded on-chain has no corresponding entry in the local replay log. This typically indicates a dropped write or a subscriber lag.
`chaos.comparator.unexpected_event` -- The inverse of missing_event. The local trace contains an event that does not appear in the on-chain projection. Classified as warning because the extra event may be an optimistic pre-commit that was later rolled back.
`chaos.comparator.type_mismatch` -- The local trace and on-chain projection both contain an event at the same index, but the event types differ. This is a corruption indicator and is treated as an error.
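To make the comparator checks concrete, here is a hedged sketch of the kind of divergence detection the `hash_mismatch` scenario exercises. The `TraceEntry` shape, field names, and `findHashMismatches` helper are illustrative assumptions, not the runtime's actual fixture schema or comparator API.

```typescript
// Hypothetical minimal trace-entry shape; the real fixture format may differ.
interface TraceEntry {
  index: number;
  type: string;
  hash: string; // content hash of the recorded event
}

const localTrace: TraceEntry[] = [
  { index: 0, type: 'task_claimed', hash: 'a1b2' },
  { index: 1, type: 'task_completed', hash: 'ffff' }, // diverges from projection
];

const onChainProjection: TraceEntry[] = [
  { index: 0, type: 'task_claimed', hash: 'a1b2' },
  { index: 1, type: 'task_completed', hash: 'c3d4' },
];

// Report the index positions where local and projected hashes differ.
function findHashMismatches(local: TraceEntry[], projected: TraceEntry[]): number[] {
  const mismatches: number[] = [];
  for (let i = 0; i < Math.min(local.length, projected.length); i++) {
    if (local[i].hash !== projected[i].hash) mismatches.push(i);
  }
  return mismatches;
}

console.log(findHashMismatches(localTrace, onChainProjection)); // [1]
```

A length difference between the two traces would correspond to the `missing_event` / `unexpected_event` scenarios rather than a hash mismatch.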
### `store` -- Storage Failures
Tests that validate the runtime's behavior when the underlying storage layer fails or returns corrupt data.
| ID | Trigger | Severity | Classification |
|---|---|---|---|
| `chaos.store.write_failure` | Storage write operation fails mid-transaction | error | deterministic |
| `chaos.store.read_corruption` | Storage returns corrupted data on read | error | deterministic |
`chaos.store.write_failure` -- The fixture configures a storage backend that rejects writes after a specific byte offset. The runtime must detect the incomplete write, avoid committing partial state, and surface the failure through the alerting pipeline.
`chaos.store.read_corruption` -- The fixture provides a storage backend that returns altered bytes on read. The runtime must detect the integrity violation (typically via checksum mismatch) and refuse to use the corrupted data.
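The write-failure behavior above can be sketched with a tiny faulty-backend stub. `FaultyStore` and its API are hypothetical stand-ins for whatever backend the fixture actually configures; the point is that a rejected write leaves no partial bytes behind.

```typescript
// Illustrative stub: a storage backend that rejects writes crossing a
// configured byte offset, without committing any partial data.
class FaultyStore {
  private buf: number[] = [];
  constructor(private failAfterByte: number) {}

  write(bytes: number[]): void {
    // Reject atomically before mutating the buffer.
    if (this.buf.length + bytes.length > this.failAfterByte) {
      throw new Error('simulated write failure');
    }
    this.buf.push(...bytes);
  }

  size(): number {
    return this.buf.length;
  }
}

const store = new FaultyStore(4);
store.write([1, 2, 3]); // succeeds
try {
  store.write([4, 5]); // would cross the offset: rejected
} catch {
  // the runtime must surface this instead of committing partial state
}
console.log(store.size()); // 3 -- no partial bytes were committed
```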
### `transition` -- State Machine Violations
Tests that validate the runtime's state machine enforcement layer.
| ID | Trigger | Severity | Classification |
|---|---|---|---|
| `chaos.transition.impossible_state` | Invalid state transition attempted | error | deterministic |
| `chaos.transition.timing_violation` | Event ordering constraint violated | warning | deterministic |
`chaos.transition.impossible_state` -- The fixture forces a state transition that violates the protocol's state machine (e.g., moving a task from Completed back to Claimed). The runtime must reject the transition and emit an error-level alert.
`chaos.transition.timing_violation` -- The fixture delivers events with timestamps that violate the expected ordering constraint (e.g., a completion event timestamped before the corresponding claim event). Classified as warning because clock skew can cause benign ordering anomalies in production.
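A transition guard of the kind `impossible_state` exercises can be sketched as a whitelist lookup. The state names follow the Completed/Claimed example above, but the transition table itself is an assumption, not the protocol's actual state machine.

```typescript
// Hypothetical transition whitelist; the real protocol state machine
// lives in the runtime's enforcement layer.
type TaskState = 'Open' | 'Claimed' | 'Completed';

const allowed: Record<TaskState, TaskState[]> = {
  Open: ['Claimed'],
  Claimed: ['Completed', 'Open'],
  Completed: [], // terminal: no transitions out
};

function canTransition(from: TaskState, to: TaskState): boolean {
  return allowed[from].includes(to);
}

console.log(canTransition('Claimed', 'Completed')); // true
console.log(canTransition('Completed', 'Claimed')); // false -- impossible state
```

On a `false` result, the runtime rejects the transition and emits the error-level alert described above.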
### `network` -- Network Connectivity Failures
Tests that validate the runtime's behavior when network operations fail.
| ID | Trigger | Severity | Classification |
|---|---|---|---|
| `chaos.network.rpc_timeout` | RPC endpoint does not respond within timeout | error | probabilistic |
| `chaos.network.ws_disconnect` | WebSocket connection drops mid-stream | warning | probabilistic |
Network scenarios are classified as probabilistic because they depend on timing and I/O scheduling.
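Because a single run of a probabilistic scenario proves little, such scenarios are evaluated statistically: run the scenario many times (cf. `CHAOS_PROBABILISTIC_RUNS`) and compare the observed fail rate against `CHAOS_FAIL_THRESHOLD`. This is a hedged sketch of that evaluation; `evaluateProbabilistic` and `runOnce` are illustrative names, not the harness's actual API.

```typescript
// Run a scenario `runs` times and compare the observed fail rate to a threshold.
function evaluateProbabilistic(
  runOnce: () => boolean, // true = scenario passed this iteration
  runs: number,
  failThreshold: number,
): { failRate: number; ok: boolean } {
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    if (!runOnce()) failures++;
  }
  const failRate = failures / runs;
  return { failRate, ok: failRate <= failThreshold };
}

// Example: 4 failures across 100 runs stays under a 5% threshold.
const outcomes = Array.from({ length: 100 }, (_, i) => i % 25 !== 0); // fails at i = 0, 25, 50, 75
let idx = 0;
const result = evaluateProbabilistic(() => outcomes[idx++], 100, 0.05);
console.log(result.failRate); // 0.04
console.log(result.ok);       // true
```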
### `partial_write` -- Incomplete Write Operations
Tests that validate recovery from writes that succeed partially before failing.
| ID | Trigger | Severity | Classification |
|---|---|---|---|
| `chaos.partial_write.replay_log` | Replay log write truncated mid-entry | error | probabilistic |
| `chaos.partial_write.checkpoint` | Checkpoint file written incompletely | error | probabilistic |
Partial-write scenarios verify that the runtime can detect and recover from incomplete persistence operations without data loss or silent corruption.
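One common way to detect a truncated entry is to frame each record with its length and a checksum, then verify both on read. This sketch illustrates the technique the partial-write scenarios test against; the framing layout and helper names are assumptions, not the runtime's actual on-disk format.

```typescript
// Toy additive checksum for illustration only.
function checksum(bytes: number[]): number {
  return bytes.reduce((a, b) => (a + b) % 256, 0);
}

// Hypothetical record layout: [length, ...payload, checksum]
function encodeRecord(payload: number[]): number[] {
  return [payload.length, ...payload, checksum(payload)];
}

// Returns the payload, or null if the record is truncated or corrupted.
function decodeRecord(bytes: number[]): number[] | null {
  if (bytes.length < 2) return null; // truncated before the header
  const len = bytes[0];
  if (bytes.length < len + 2) return null; // truncated mid-entry
  const payload = bytes.slice(1, 1 + len);
  if (bytes[1 + len] !== checksum(payload)) return null; // corrupted
  return payload;
}

const record = encodeRecord([10, 20, 30]);
console.log(decodeRecord(record));              // [10, 20, 30]
console.log(decodeRecord(record.slice(0, 3)));  // null -- truncation detected
```

A `null` result is what lets the runtime refuse the damaged record instead of silently replaying corrupt data.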
## Running Chaos Tests
### Command Line
```shell
# Run all chaos tests
npm run test:chaos

# Run chaos tests via grep filter
npm run test -- --grep chaos

# Run a specific scenario category
npm run test -- --grep "chaos.comparator"
npm run test -- --grep "chaos.store"
npm run test -- --grep "chaos.transition"

# Run with verbose output
npm run test:chaos -- --reporter verbose
```

### Programmatic Execution
```typescript
import { chaosMatrix } from '@agenc/runtime/eval/chaos-matrix';

// Iterate all scenarios
for (const scenario of chaosMatrix) {
  console.log(`${scenario.id} [${scenario.classification}] -- ${scenario.trigger}`);
}

// Filter by category
const replayScenarios = chaosMatrix.filter(s => s.category === 'replay');
const deterministicOnly = chaosMatrix.filter(s => s.classification === 'deterministic');
```

### Writing a Chaos Test
Each chaos test loads a fixture, executes the trigger condition, and asserts that the runtime produces the expected anomaly:
```typescript
import { describe, it, expect } from 'vitest';
import { chaosMatrix } from '../eval/chaos-matrix.js';
import { ReplayComparator } from '../eval/replay-comparator.js';
// loadFixture is a local test helper that reads and parses the fixture file.

describe('chaos.comparator', () => {
  const scenarios = chaosMatrix.filter(s => s.category === 'replay');

  for (const scenario of scenarios) {
    it(scenario.id, async () => {
      // Load fixture
      const fixture = await loadFixture(scenario.fixture);

      // Execute trigger
      const comparator = new ReplayComparator(fixture.config);
      const result = await comparator.compare(fixture.localTrace, fixture.onChainProjection);

      // Assert expected anomaly
      expect(result.anomalies).toContainEqual(
        expect.objectContaining({
          code: scenario.expectedAnomaly.code,
          severity: scenario.expectedAnomaly.severity,
        })
      );
    });
  }
});
```

## CI Integration
Chaos tests run as a dedicated step in the CI pipeline, separate from unit and integration tests.
### Pipeline Configuration
```yaml
chaos_tests:
  runs-on: ubuntu-latest
  needs: [unit_tests]
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: '20'
    - name: Install dependencies
      run: npm ci
    - name: Run chaos tests
      run: npm run test:chaos
      env:
        CHAOS_FAIL_THRESHOLD: 0.05
```

### Key CI Behaviors
| Behavior | Details |
|---|---|
| Execution order | Chaos tests run after unit tests pass. |
| Auto-triggering | Disabled. Chaos failures require manual review before blocking the pipeline. |
| Fail-rate thresholds | The gate accepts a configurable threshold (e.g., `CHAOS_FAIL_THRESHOLD=0.05` allows up to 5% of probabilistic scenarios to fail). |
| Deterministic failures | Always block the pipeline. A deterministic scenario that fails indicates a regression. |
| Probabilistic failures | Evaluated against the threshold. If the observed fail rate exceeds the threshold, the step fails. |
| Artifact collection | On failure, fixture data and anomaly reports are uploaded as CI artifacts for offline analysis. |
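The gating rules in the table above can be sketched as a single decision function: any deterministic failure blocks outright, while probabilistic failures are aggregated into a fail rate and compared against the threshold. The `Result` shape and `ciGate` function are illustrative, not the pipeline's actual implementation.

```typescript
// Hypothetical per-scenario outcome shape.
interface Result {
  classification: 'deterministic' | 'probabilistic';
  passed: boolean;
}

function ciGate(results: Result[], failThreshold: number): boolean {
  // A deterministic failure indicates a regression and always blocks.
  if (results.some(r => r.classification === 'deterministic' && !r.passed)) {
    return false;
  }
  // Probabilistic failures are evaluated against the threshold.
  const prob = results.filter(r => r.classification === 'probabilistic');
  if (prob.length === 0) return true;
  const failRate = prob.filter(r => !r.passed).length / prob.length;
  return failRate <= failThreshold;
}

console.log(
  ciGate(
    [
      { classification: 'deterministic', passed: true },
      { classification: 'probabilistic', passed: false },
      { classification: 'probabilistic', passed: true },
    ],
    0.5,
  ),
); // true -- a 50% probabilistic fail rate is within the 0.5 threshold
```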
### Environment Variables
| Variable | Default | Description |
|---|---|---|
| `CHAOS_FAIL_THRESHOLD` | `0.05` | Maximum acceptable failure rate for probabilistic scenarios (0.0 -- 1.0). |
| `CHAOS_PROBABILISTIC_RUNS` | `100` | Number of iterations for each probabilistic scenario. |
| `CHAOS_FIXTURE_DIR` | `./fixtures/chaos` | Directory containing chaos test fixtures. |
| `CHAOS_VERBOSE` | `0` | Set to `1` for detailed anomaly output during test runs. |
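As a sketch, the variables above could be read with their documented defaults like this (Node.js `process.env`); the `envNumber` helper and the `config` object are illustrative, not the harness's actual configuration code.

```typescript
// Runtime environment access (Node.js provides `process` at runtime).
declare const process: { env: Record<string, string | undefined> };

// Parse a numeric env var, falling back to the documented default.
function envNumber(name: string, fallback: number): number {
  const raw = process.env[name];
  const n = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(n) ? n : fallback;
}

const config = {
  failThreshold: envNumber('CHAOS_FAIL_THRESHOLD', 0.05),
  probabilisticRuns: envNumber('CHAOS_PROBABILISTIC_RUNS', 100),
  fixtureDir: process.env.CHAOS_FIXTURE_DIR ?? './fixtures/chaos',
  verbose: process.env.CHAOS_VERBOSE === '1',
};
```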
## Adding a New Scenario
- Define the fixture. Create a JSON file in `fixtures/chaos/` that contains the data needed to trigger the failure condition.
- Add the scenario to the matrix. Append a new entry to the `chaosMatrix` array in `runtime/src/eval/chaos-matrix.ts`:

  ```typescript
  {
    id: 'chaos.store.index_corruption',
    category: 'store',
    trigger: 'Storage index points to wrong offset after compaction',
    expectedAnomaly: {
      alertCode: 'STORE_INDEX_CORRUPT',
      severity: 'error',
      kind: ReplayAlertKind.StorageIntegrity,
    },
    classification: 'deterministic',
    fixture: './fixtures/chaos/store-index-corruption.json',
  }
  ```

- Write the test. If the new scenario falls under an existing category, the category-level test loop picks it up automatically. If it introduces a new category, create a new test file following the pattern in `runtime/src/eval/__tests__/`.
- Validate locally. Run the full chaos suite and confirm the new scenario passes:

  ```shell
  npm run test:chaos -- --reporter verbose
  ```

- Update the threshold (if probabilistic). If the new scenario is probabilistic, verify that the CI fail-rate threshold still holds with the expanded matrix.
## Relationship to Other Test Layers
The chaos testing framework occupies a specific position in the overall test strategy:
| Layer | Scope | Tool | Documentation |
|---|---|---|---|
| Unit tests | Individual functions and classes | Vitest | Testing Patterns |
| Fuzz tests | On-chain instruction invariants | proptest | Fuzz Testing |
| Chaos tests | Runtime failure modes (replay, storage, state) | Vitest + chaos-matrix | This page |
| Smoke tests | End-to-end deployment verification | Custom harness | Smoke Tests |
| Integration tests | Cross-module interactions (LiteSVM) | Vitest + LiteSVM | Testing Patterns |
Chaos tests complement fuzz tests: fuzz tests explore the input space of on-chain instructions to find invariant violations, while chaos tests exercise known failure modes in the off-chain runtime to verify that detection and recovery mechanisms work correctly.
## Troubleshooting
### Deterministic scenario fails intermittently
A deterministic scenario should never fail intermittently. If it does:
- Run `git diff fixtures/chaos/` to check for unintended changes.

### Probabilistic scenario always fails

- Increase `CHAOS_PROBABILISTIC_RUNS` to get a more reliable failure rate estimate.
- Reclassify the scenario as `deterministic` if the underlying condition is now reliably reproducible.

### Fixture not found

`Error: ENOENT: no such file or directory, open './fixtures/chaos/...'`

- Verify that the `fixture` path in the scenario entry is relative to the project root.
- Ensure the fixture file is committed and not excluded by `.gitignore`.

### CI threshold exceeded
If the CI step fails with a threshold violation:
- Review the failure report artifact to identify which probabilistic scenarios failed.
- Determine whether the failures indicate a real regression or environmental noise (e.g., CI runner I/O latency).
- If the failures are environmental, consider increasing `CHAOS_FAIL_THRESHOLD` temporarily and filing a follow-up issue to stabilize the scenario.
- If the failures indicate a regression, fix the underlying issue before merging.