1 Install

Terminal
npm install -D replayci

2 Initialize

Terminal
npx replayci init

This creates:

  • .replayci.yml — config file (provider, model, pack path)
  • packs/starter/ — a working contract pack you can run immediately

3 Configure

Set your API keys as environment variables:

Terminal
export REPLAYCI_PROVIDER_KEY="sk-..."
export REPLAYCI_API_KEY="rci_live_..."

Edit .replayci.yml to set your provider and model:

.replayci.yml
pack: "./packs/starter"
provider: openai
model: gpt-4o-mini

Settings in .replayci.yml can be overridden with CLI flags. API keys always come from environment variables — never in the config file.

4 Run

Terminal
npx replayci

You'll see:

zsh
  replayci · openai/gpt-4o-mini · 1 contract

   tool_call              Pass  a3f82c91

  1/1 passed

  results → https://app.replayci.com/runs/r_8f3a2c

The results → line links to your dashboard where you can see full run details. This appears automatically when REPLAYCI_API_KEY is set.

What's in a contract?

A contract is a YAML file that defines one tool-call behavior to test:

tool_call.yaml
# Does my AI call the right tool?

tool: "get_weather"

assertions:
  output_invariants:
    - path: "$.tool_calls[0].name"
      equals: "get_weather"
    - path: "$.tool_calls[0].arguments"
      exists: true

golden_cases:
  - id: "tool_call_success"
    input_ref: "tool_call.success.json"
    expect_ok: true

  • tool — the tool name the model should call.
  • assertions — JSON path checks on the model's response.
  • golden_cases — specific input/output pairs to test.
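A runner might evaluate these assertions roughly as in the sketch below. This is illustrative only: `resolve_path` and the sample `response` are assumptions for the example, not ReplayCI's actual API or response shape.

```python
import re

def resolve_path(doc, path):
    """Resolve a simple JSONPath like $.tool_calls[0].name (illustrative helper)."""
    value = doc
    # Walk each ".key" or "[index]" segment of the path in order.
    for key, index in re.findall(r"\.(\w+)|\[(\d+)\]", path):
        value = value[key] if key else value[int(index)]
    return value

# A response shaped like an OpenAI-style tool call (hypothetical sample data).
response = {"tool_calls": [{"name": "get_weather", "arguments": '{"city": "Oslo"}'}]}

assert resolve_path(response, "$.tool_calls[0].name") == "get_weather"  # equals check
assert resolve_path(response, "$.tool_calls[0].arguments") is not None  # exists check
```

The `equals` assertion compares the resolved value to a literal; `exists` only requires that the path resolves to something.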

What happens when a test fails?

ReplayCI classifies every failure with a taxonomy:

  Classification        Meaning
  tool_not_invoked      Model returned text instead of calling the tool
  malformed_arguments   Tool arguments aren't valid JSON
  schema_violation      Arguments don't match the expected schema
  wrong_tool            Model called a different tool than expected
  unexpected_error      Provider returned an error

Each failure gets a fingerprint. If the same failure happens again, it gets the same fingerprint — so you can track whether a problem is new or recurring.
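One way to get this stable-fingerprint behavior is to hash only the failure's stable fields, so identical failures always map to the same digest. A minimal sketch, assuming a scheme like the one below (not ReplayCI's actual algorithm):

```python
import hashlib

def fingerprint(classification, contract_id, detail):
    """Derive a stable failure fingerprint (illustrative scheme, not ReplayCI's)."""
    # Hash only stable fields: same failure in, same fingerprint out.
    digest = hashlib.sha256(f"{classification}|{contract_id}|{detail}".encode()).hexdigest()
    return digest[:8]

# Repeating the same failure reproduces the same fingerprint.
a = fingerprint("wrong_tool", "tool_call", "called get_forecast instead of get_weather")
b = fingerprint("wrong_tool", "tool_call", "called get_forecast instead of get_weather")
assert a == b
```

Because the fingerprint is a pure function of the failure's fields, a recurring problem shows up as a repeated fingerprint rather than a stream of seemingly new errors.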

Prove determinism

Run the same contracts multiple times and verify fingerprints match:

Terminal
npx replayci --repeat 3

Output includes a determinism_proof showing whether each step produced identical fingerprints across runs.
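The check behind such a proof can be sketched as: collect each step's fingerprint from every run and confirm only one distinct value was seen per step. The `runs` data shape here is an assumption for illustration.

```python
from collections import defaultdict

def determinism_proof(runs):
    """Report, per step, whether fingerprints were identical across all runs.

    `runs` is a list of {step_id: fingerprint} dicts, one per run (illustrative shape).
    """
    seen = defaultdict(set)
    for run in runs:
        for step, fp in run.items():
            seen[step].add(fp)
    # A step is deterministic if exactly one distinct fingerprint appeared.
    return {step: len(fps) == 1 for step, fps in seen.items()}

runs = [
    {"tool_call": "a3f82c91"},
    {"tool_call": "a3f82c91"},
    {"tool_call": "a3f82c91"},
]
print(determinism_proof(runs))  # {'tool_call': True}
```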

Try a different provider

Change provider and model in .replayci.yml — same contracts, different model:

.replayci.yml
provider: anthropic
model: claude-sonnet-4-6

Then run npx replayci again. Same contracts, different provider, and the results are compared automatically.

Next steps

  • View your dashboard: Go to app.replayci.com to see run results, contract history, and failure trends.
  • Write your own contracts: See Writing Tests for the full YAML format and examples.
  • CLI flags: See CLI Reference for all flags, modes, and output formats.
  • Set up CI gating: Add npx replayci to your CI pipeline to block merges on test failures.
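For the CI gating step, a minimal GitHub Actions job might look like the sketch below, assuming replayci exits non-zero when a contract fails. The workflow name and file path are illustrative; store keys as repository secrets rather than in the config file.

```yaml
# .github/workflows/replayci.yml — hypothetical workflow file
name: replayci
on: [pull_request]
jobs:
  contracts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      # A non-zero exit code here fails the job and blocks the merge.
      - run: npx replayci
        env:
          REPLAYCI_PROVIDER_KEY: ${{ secrets.REPLAYCI_PROVIDER_KEY }}
          REPLAYCI_API_KEY: ${{ secrets.REPLAYCI_API_KEY }}
```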