# Chaos Testing
Deterministic failure-mode testing framework for the AgenC runtime's replay and storage subsystems.
Source: `runtime/src/eval/chaos-matrix.ts`
## Overview
The Chaos Testing framework provides a structured approach to validating how the runtime behaves under adverse conditions. Rather than relying on random fault injection, it defines a Chaos Scenario Matrix -- a catalog of named failure scenarios, each with a known trigger, an expected anomaly, and a classification that determines whether the outcome is fully reproducible or statistically bounded.
Every scenario targets one of the runtime's critical subsystems: replay comparison, persistent storage, state transitions, or network connectivity. Running the full matrix produces a pass/fail report that maps directly to the anomaly codes used by the replay engine and alerting pipeline.
## Scenario Structure
Each entry in the chaos matrix implements the `ChaosScenario` interface:
```typescript
interface ChaosScenario {
  id: string;                  // Namespaced identifier, e.g. "chaos.store_write_failure"
  category: ChaosScenarioCategory;
  trigger: string;             // Human-readable description of what causes this scenario
  expectedAnomaly: {
    code?: ReplayAnomalyCode;  // Expected anomaly code (replay scenarios)
    alertCode?: string;        // Expected alert code (non-replay scenarios)
    severity: 'error' | 'warning' | 'info';
    kind: ReplayAlertKind;
  };
  classification: 'deterministic' | 'probabilistic';
  fixture: string;             // Path to fixture data used by the test
}
```

### Field Reference
| Field | Type | Description |
|---|---|---|
| `id` | `string` | Dot-namespaced identifier. Convention: `chaos.<category>.<failure_mode>`. |
| `category` | `ChaosScenarioCategory` | One of `replay`, `store`, `transition`, `network`, or `partial_write`. |
| `trigger` | `string` | Plain-language description of the condition that activates the scenario. |
| `expectedAnomaly.code` | `ReplayAnomalyCode` | The anomaly code the runtime should emit. Present for replay-category scenarios. |
| `expectedAnomaly.alertCode` | `string` | The alert code the runtime should emit. Present for non-replay-category scenarios. |
| `expectedAnomaly.severity` | `'error' \| 'warning' \| 'info'` | Expected severity level of the resulting anomaly. |
| `expectedAnomaly.kind` | `ReplayAlertKind` | Classification of the alert for routing and escalation. |
| `classification` | `'deterministic' \| 'probabilistic'` | Whether the scenario produces identical results on every run. |
| `fixture` | `string` | Relative path to the JSON or binary fixture consumed by the test harness. |
## Classification Model

Each scenario is classified as either `deterministic` or `probabilistic`. Deterministic scenarios produce identical results on every run, so any failure indicates a regression. Probabilistic scenarios depend on timing and I/O scheduling, so they are executed repeatedly and evaluated statistically against a configurable fail-rate threshold.
## Scenario Categories
### `replay` -- Replay Comparison Anomalies
Tests that validate the replay comparator's ability to detect divergences between on-chain event projections and locally recorded traces.
| ID | Trigger | Severity | Classification |
|---|---|---|---|
| `chaos.comparator.hash_mismatch` | Projected hash differs from local replay hash | error | deterministic |
| `chaos.comparator.missing_event` | On-chain event exists but not in local trace | error | deterministic |
| `chaos.comparator.unexpected_event` | Local trace has event not in on-chain projection | warning | deterministic |
| `chaos.comparator.type_mismatch` | Event type changed between projection and local | error | deterministic |
`chaos.comparator.hash_mismatch` -- The most common replay failure. The fixture contains a local trace whose content hash diverges from the projected on-chain hash. The comparator must detect this and emit an error-severity anomaly with the corresponding ReplayAnomalyCode.
`chaos.comparator.missing_event` -- Simulates a gap in the local trace. An event recorded on-chain has no corresponding entry in the local replay log. This typically indicates a dropped write or a subscriber lag.
`chaos.comparator.unexpected_event` -- The inverse of missing_event. The local trace contains an event that does not appear in the on-chain projection. Classified as warning because the extra event may be an optimistic pre-commit that was later rolled back.
`chaos.comparator.type_mismatch` -- The local trace and on-chain projection both contain an event at the same index, but the event types differ. This is a corruption indicator and is treated as an error.
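To make the comparator checks concrete, here is a hedged sketch of the kind of divergence detection the `hash_mismatch` scenario exercises. The `TraceEntry` shape, field names, and `findHashMismatches` helper are illustrative assumptions, not the runtime's actual fixture schema or comparator API.

```typescript
// Hypothetical minimal trace-entry shape; the real fixture format may differ.
interface TraceEntry {
  index: number;
  type: string;
  hash: string; // content hash of the recorded event
}

const localTrace: TraceEntry[] = [
  { index: 0, type: 'task_claimed', hash: 'a1b2' },
  { index: 1, type: 'task_completed', hash: 'ffff' }, // diverges from projection
];

const onChainProjection: TraceEntry[] = [
  { index: 0, type: 'task_claimed', hash: 'a1b2' },
  { index: 1, type: 'task_completed', hash: 'c3d4' },
];

// Report the index positions where local and projected hashes differ.
function findHashMismatches(local: TraceEntry[], projected: TraceEntry[]): number[] {
  const mismatches: number[] = [];
  for (let i = 0; i < Math.min(local.length, projected.length); i++) {
    if (local[i].hash !== projected[i].hash) mismatches.push(i);
  }
  return mismatches;
}

console.log(findHashMismatches(localTrace, onChainProjection)); // [1]
```

A length difference between the two traces would correspond to the `missing_event` / `unexpected_event` scenarios rather than a hash mismatch.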
### `store` -- Storage Failures
Tests that validate the runtime's behavior when the underlying storage layer fails or returns corrupt data.
| ID | Trigger | Severity | Classification |
|---|---|---|---|
| `chaos.store.write_failure` | Storage write operation fails mid-transaction | error | deterministic |
| `chaos.store.read_corruption` | Storage returns corrupted data on read | error | deterministic |
`chaos.store.write_failure` -- The fixture configures a storage backend that rejects writes after a specific byte offset. The runtime must detect the incomplete write, avoid committing partial state, and surface the failure through the alerting pipeline.
`chaos.store.read_corruption` -- The fixture provides a storage backend that returns altered bytes on read. The runtime must detect the integrity violation (typically via checksum mismatch) and refuse to use the corrupted data.
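The write-failure behavior above can be sketched with a tiny faulty-backend stub. `FaultyStore` and its API are hypothetical stand-ins for whatever backend the fixture actually configures; the point is that a rejected write leaves no partial bytes behind.

```typescript
// Illustrative stub: a storage backend that rejects writes crossing a
// configured byte offset, without committing any partial data.
class FaultyStore {
  private buf: number[] = [];
  constructor(private failAfterByte: number) {}

  write(bytes: number[]): void {
    // Reject atomically before mutating the buffer.
    if (this.buf.length + bytes.length > this.failAfterByte) {
      throw new Error('simulated write failure');
    }
    this.buf.push(...bytes);
  }

  size(): number {
    return this.buf.length;
  }
}

const store = new FaultyStore(4);
store.write([1, 2, 3]); // succeeds
try {
  store.write([4, 5]); // would cross the offset: rejected
} catch {
  // the runtime must surface this instead of committing partial state
}
console.log(store.size()); // 3 -- no partial bytes were committed
```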
### `transition` -- State Machine Violations
Tests that validate the runtime's state machine enforcement layer.
| ID | Trigger | Severity | Classification |
|---|---|---|---|
| `chaos.transition.impossible_state` | Invalid state transition attempted | error | deterministic |
| `chaos.transition.timing_violation` | Event ordering constraint violated | warning | deterministic |
`chaos.transition.impossible_state` -- The fixture forces a state transition that violates the protocol's state machine (e.g., moving a task from Completed back to Claimed). The runtime must reject the transition and emit an error-level alert.
`chaos.transition.timing_violation` -- The fixture delivers events with timestamps that violate the expected ordering constraint (e.g., a completion event timestamped before the corresponding claim event). Classified as warning because clock skew can cause benign ordering anomalies in production.
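A transition guard of the kind `impossible_state` exercises can be sketched as a whitelist lookup. The state names follow the Completed/Claimed example above, but the transition table itself is an assumption, not the protocol's actual state machine.

```typescript
// Hypothetical transition whitelist; the real protocol state machine
// lives in the runtime's enforcement layer.
type TaskState = 'Open' | 'Claimed' | 'Completed';

const allowed: Record<TaskState, TaskState[]> = {
  Open: ['Claimed'],
  Claimed: ['Completed', 'Open'],
  Completed: [], // terminal: no transitions out
};

function canTransition(from: TaskState, to: TaskState): boolean {
  return allowed[from].includes(to);
}

console.log(canTransition('Claimed', 'Completed')); // true
console.log(canTransition('Completed', 'Claimed')); // false -- impossible state
```

On a `false` result, the runtime rejects the transition and emits the error-level alert described above.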
### `network` -- Network Connectivity Failures
Tests that validate the runtime's behavior when network operations fail.
| ID | Trigger | Severity | Classification |
|---|---|---|---|
| `chaos.network.rpc_timeout` | RPC endpoint does not respond within timeout | error | probabilistic |
| `chaos.network.ws_disconnect` | WebSocket connection drops mid-stream | warning | probabilistic |
Network scenarios are classified as probabilistic because they depend on timing and I/O scheduling.
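Because a single run of a probabilistic scenario proves little, such scenarios are evaluated statistically: run the scenario many times (cf. `CHAOS_PROBABILISTIC_RUNS`) and compare the observed fail rate against `CHAOS_FAIL_THRESHOLD`. This is a hedged sketch of that evaluation; `evaluateProbabilistic` and `runOnce` are illustrative names, not the harness's actual API.

```typescript
// Run a scenario `runs` times and compare the observed fail rate to a threshold.
function evaluateProbabilistic(
  runOnce: () => boolean, // true = scenario passed this iteration
  runs: number,
  failThreshold: number,
): { failRate: number; ok: boolean } {
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    if (!runOnce()) failures++;
  }
  const failRate = failures / runs;
  return { failRate, ok: failRate <= failThreshold };
}

// Example: 4 failures across 100 runs stays under a 5% threshold.
const outcomes = Array.from({ length: 100 }, (_, i) => i % 25 !== 0); // fails at i = 0, 25, 50, 75
let idx = 0;
const result = evaluateProbabilistic(() => outcomes[idx++], 100, 0.05);
console.log(result.failRate); // 0.04
console.log(result.ok);       // true
```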
### `partial_write` -- Incomplete Write Operations
Tests that validate recovery from writes that succeed partially before failing.
| ID | Trigger | Severity | Classification |
|---|---|---|---|
| `chaos.partial_write.replay_log` | Replay log write truncated mid-entry | error | probabilistic |
| `chaos.partial_write.checkpoint` | Checkpoint file written incompletely | error | probabilistic |
Partial-write scenarios verify that the runtime can detect and recover from incomplete persistence operations without data loss or silent corruption.
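One common way to detect a truncated entry is to frame each record with its length and a checksum, then verify both on read. This sketch illustrates the technique the partial-write scenarios test against; the framing layout and helper names are assumptions, not the runtime's actual on-disk format.

```typescript
// Toy additive checksum for illustration only.
function checksum(bytes: number[]): number {
  return bytes.reduce((a, b) => (a + b) % 256, 0);
}

// Hypothetical record layout: [length, ...payload, checksum]
function encodeRecord(payload: number[]): number[] {
  return [payload.length, ...payload, checksum(payload)];
}

// Returns the payload, or null if the record is truncated or corrupted.
function decodeRecord(bytes: number[]): number[] | null {
  if (bytes.length < 2) return null; // truncated before the header
  const len = bytes[0];
  if (bytes.length < len + 2) return null; // truncated mid-entry
  const payload = bytes.slice(1, 1 + len);
  if (bytes[1 + len] !== checksum(payload)) return null; // corrupted
  return payload;
}

const record = encodeRecord([10, 20, 30]);
console.log(decodeRecord(record));              // [10, 20, 30]
console.log(decodeRecord(record.slice(0, 3)));  // null -- truncation detected
```

A `null` result is what lets the runtime refuse the damaged record instead of silently replaying corrupt data.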
## Running Chaos Tests
### Command Line
```shell
# Run all chaos tests
npm run test:chaos

# Run chaos tests via grep filter
npm run test -- --grep chaos

# Run a specific scenario category
npm run test -- --grep "chaos.comparator"
npm run test -- --grep "chaos.store"
npm run test -- --grep "chaos.transition"

# Run with verbose output
npm run test:chaos -- --reporter verbose
```

### Programmatic Execution
```typescript
import { chaosMatrix } from '@agenc/runtime/eval/chaos-matrix';

// Iterate all scenarios
for (const scenario of chaosMatrix) {
  console.log(`${scenario.id} [${scenario.classification}] -- ${scenario.trigger}`);
}

// Filter by category
const replayScenarios = chaosMatrix.filter(s => s.category === 'replay');
const deterministicOnly = chaosMatrix.filter(s => s.classification === 'deterministic');
```

### Writing a Chaos Test
Each chaos test loads a fixture, executes the trigger condition, and asserts that the runtime produces the expected anomaly:
```typescript
import { describe, it, expect } from 'vitest';
import { chaosMatrix } from '../eval/chaos-matrix.js';
import { ReplayComparator } from '../eval/replay-comparator.js';
// loadFixture is a local test helper that reads and parses the fixture file.

describe('chaos.comparator', () => {
  const scenarios = chaosMatrix.filter(s => s.category === 'replay');

  for (const scenario of scenarios) {
    it(scenario.id, async () => {
      // Load fixture
      const fixture = await loadFixture(scenario.fixture);

      // Execute trigger
      const comparator = new ReplayComparator(fixture.config);
      const result = await comparator.compare(fixture.localTrace, fixture.onChainProjection);

      // Assert expected anomaly
      expect(result.anomalies).toContainEqual(
        expect.objectContaining({
          code: scenario.expectedAnomaly.code,
          severity: scenario.expectedAnomaly.severity,
        })
      );
    });
  }
});
```

## CI Integration
Chaos tests run as a dedicated step in the CI pipeline, separate from unit and integration tests.
### Pipeline Configuration
```yaml
chaos_tests:
  runs-on: ubuntu-latest
  needs: [unit_tests]
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-node@v4
      with:
        node-version: '20'
    - name: Install dependencies
      run: npm ci
    - name: Run chaos tests
      run: npm run test:chaos
      env:
        CHAOS_FAIL_THRESHOLD: 0.05
```

### Key CI Behaviors
| Behavior | Details |
|---|---|
| Execution order | Chaos tests run after unit tests pass. |
| Auto-triggering | Disabled. Chaos failures require manual review before blocking the pipeline. |
| Fail-rate thresholds | The gate accepts a configurable threshold (e.g., `CHAOS_FAIL_THRESHOLD=0.05` allows up to 5% of probabilistic scenarios to fail). |
| Deterministic failures | Always block the pipeline. A deterministic scenario that fails indicates a regression. |
| Probabilistic failures | Evaluated against the threshold. If the observed fail rate exceeds the threshold, the step fails. |
| Artifact collection | On failure, fixture data and anomaly reports are uploaded as CI artifacts for offline analysis. |
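The gating rules in the table above can be sketched as a single decision function: any deterministic failure blocks outright, while probabilistic failures are aggregated into a fail rate and compared against the threshold. The `Result` shape and `ciGate` function are illustrative, not the pipeline's actual implementation.

```typescript
// Hypothetical per-scenario outcome shape.
interface Result {
  classification: 'deterministic' | 'probabilistic';
  passed: boolean;
}

function ciGate(results: Result[], failThreshold: number): boolean {
  // A deterministic failure indicates a regression and always blocks.
  if (results.some(r => r.classification === 'deterministic' && !r.passed)) {
    return false;
  }
  // Probabilistic failures are evaluated against the threshold.
  const prob = results.filter(r => r.classification === 'probabilistic');
  if (prob.length === 0) return true;
  const failRate = prob.filter(r => !r.passed).length / prob.length;
  return failRate <= failThreshold;
}

console.log(
  ciGate(
    [
      { classification: 'deterministic', passed: true },
      { classification: 'probabilistic', passed: false },
      { classification: 'probabilistic', passed: true },
    ],
    0.5,
  ),
); // true -- a 50% probabilistic fail rate is within the 0.5 threshold
```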
### Environment Variables
| Variable | Default | Description |
|---|---|---|
| `CHAOS_FAIL_THRESHOLD` | `0.05` | Maximum acceptable failure rate for probabilistic scenarios (0.0 -- 1.0). |
| `CHAOS_PROBABILISTIC_RUNS` | `100` | Number of iterations for each probabilistic scenario. |
| `CHAOS_FIXTURE_DIR` | `./fixtures/chaos` | Directory containing chaos test fixtures. |
| `CHAOS_VERBOSE` | `0` | Set to `1` for detailed anomaly output during test runs. |
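As a sketch, the variables above could be read with their documented defaults like this (Node.js `process.env`); the `envNumber` helper and the `config` object are illustrative, not the harness's actual configuration code.

```typescript
// Runtime environment access (Node.js provides `process` at runtime).
declare const process: { env: Record<string, string | undefined> };

// Parse a numeric env var, falling back to the documented default.
function envNumber(name: string, fallback: number): number {
  const raw = process.env[name];
  const n = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(n) ? n : fallback;
}

const config = {
  failThreshold: envNumber('CHAOS_FAIL_THRESHOLD', 0.05),
  probabilisticRuns: envNumber('CHAOS_PROBABILISTIC_RUNS', 100),
  fixtureDir: process.env.CHAOS_FIXTURE_DIR ?? './fixtures/chaos',
  verbose: process.env.CHAOS_VERBOSE === '1',
};
```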
## Adding a New Scenario
- Define the fixture. Create a JSON file in `fixtures/chaos/` that contains the data needed to trigger the failure condition.
- Add the scenario to the matrix. Append a new entry to the `chaosMatrix` array in `runtime/src/eval/chaos-matrix.ts`:

  ```typescript
  {
    id: 'chaos.store.index_corruption',
    category: 'store',
    trigger: 'Storage index points to wrong offset after compaction',
    expectedAnomaly: {
      alertCode: 'STORE_INDEX_CORRUPT',
      severity: 'error',
      kind: ReplayAlertKind.StorageIntegrity,
    },
    classification: 'deterministic',
    fixture: './fixtures/chaos/store-index-corruption.json',
  }
  ```

- Write the test. If the new scenario falls under an existing category, the category-level test loop picks it up automatically. If it introduces a new category, create a new test file following the pattern in `runtime/src/eval/__tests__/`.
- Validate locally. Run the full chaos suite and confirm the new scenario passes:

  ```shell
  npm run test:chaos -- --reporter verbose
  ```

- Update the threshold (if probabilistic). If the new scenario is probabilistic, verify that the CI fail-rate threshold still holds with the expanded matrix.
## Relationship to Other Test Layers
The chaos testing framework occupies a specific position in the overall test strategy:
| Layer | Scope | Tool | Documentation |
|---|---|---|---|
| Unit tests | Individual functions and classes | Vitest | Testing Patterns |
| Fuzz tests | On-chain instruction invariants | proptest | Fuzz Testing |
| Chaos tests | Runtime failure modes (replay, storage, state) | Vitest + chaos-matrix | This page |
| Smoke tests | End-to-end deployment verification | Custom harness | Smoke Tests |
| Integration tests | Cross-module interactions (LiteSVM) | Vitest + LiteSVM | Testing Patterns |
Chaos tests complement fuzz tests: fuzz tests explore the input space of on-chain instructions to find invariant violations, while chaos tests exercise known failure modes in the off-chain runtime to verify that detection and recovery mechanisms work correctly.
## Troubleshooting
### Deterministic scenario fails intermittently
A deterministic scenario should never fail intermittently. If it does:
- Run `git diff fixtures/chaos/` to check for unintended changes.

### Probabilistic scenario always fails

- Increase `CHAOS_PROBABILISTIC_RUNS` to get a more reliable failure rate estimate.
- Reclassify the scenario as `deterministic` if the underlying condition is now reliably reproducible.

### Fixture not found

`Error: ENOENT: no such file or directory, open './fixtures/chaos/...'`

- Verify that the `fixture` path in the scenario entry is relative to the project root.
- Ensure the fixture file is committed and not excluded by `.gitignore`.

### CI threshold exceeded
If the CI step fails with a threshold violation:
- Review the failure report artifact to identify which probabilistic scenarios failed.
- Determine whether the failures indicate a real regression or environmental noise (e.g., CI runner I/O latency).
- If the failures are environmental, consider increasing `CHAOS_FAIL_THRESHOLD` temporarily and filing a follow-up issue to stabilize the scenario.
- If the failures indicate a regression, fix the underlying issue before merging.