## Prerequisites
- Node.js 20+
- An OpenAI or Anthropic API key
- A ReplayCI API key (get one at app.replayci.com/signup)
## 1. Install

```shell
npm install -D replayci
```
## 2. Initialize

```shell
npx replayci init
```
This creates:

- `.replayci.yml` — config file (provider, model, pack path)
- `packs/starter/` — a working contract pack you can run immediately
## 3. Configure

Set your API keys as environment variables:

```shell
export REPLAYCI_PROVIDER_KEY="sk-..."
export REPLAYCI_API_KEY="rci_live_..."
```
Edit `.replayci.yml` to set your provider and model:

```yaml
pack: "./packs/starter"
provider: openai
model: gpt-4o-mini
```
Settings in .replayci.yml can be overridden with CLI flags. API keys always come from environment variables — never in the config file.
## 4. Run

```shell
npx replayci
```
You'll see:

```
replayci · openai/gpt-4o-mini · 1 contract
✓ tool_call    Pass    a3f82c91    1/1 passed
results → https://app.replayci.com/runs/r_8f3a2c
```
The `results →` line links to your dashboard, where you can see full run details. It appears automatically when `REPLAYCI_API_KEY` is set.
## What's in a contract?
A contract is a YAML file that defines one tool-call behavior to test:
```yaml
# Does my AI call the right tool?
tool: "get_weather"
assertions:
  output_invariants:
    - path: "$.tool_calls[0].name"
      equals: "get_weather"
    - path: "$.tool_calls[0].arguments"
      exists: true
golden_cases:
  - id: "tool_call_success"
    input_ref: "tool_call.success.json"
    expect_ok: true
```
- `tool` — the tool name the model should call.
- `assertions` — JSONPath checks on the model's response.
- `golden_cases` — specific input/output pairs to test.
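Here is a second sketch using the same fields. The `search_flights` tool, its argument paths, and the file names are hypothetical; only the contract schema follows the starter example above:

```yaml
# Does my AI call search_flights with a date argument? (hypothetical tool)
tool: "search_flights"
assertions:
  output_invariants:
    - path: "$.tool_calls[0].name"
      equals: "search_flights"
    - path: "$.tool_calls[0].arguments.date"
      exists: true
golden_cases:
  - id: "flights_basic"
    input_ref: "flights.basic.json"
    expect_ok: true
```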
## What happens when a test fails?
ReplayCI classifies every failure with a taxonomy:
| Classification | Meaning |
|---|---|
| `tool_not_invoked` | Model returned text instead of calling the tool |
| `malformed_arguments` | Tool arguments aren't valid JSON |
| `schema_violation` | Arguments don't match the expected schema |
| `wrong_tool` | Model called a different tool than expected |
| `unexpected_error` | Provider returned an error |
Each failure gets a fingerprint. If the same failure happens again, it gets the same fingerprint — so you can track whether a problem is new or recurring.
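ReplayCI's actual fingerprinting is internal to the tool, but the idea can be sketched: hash the stable facts of a failure (classification, contract, assertion path) and ignore volatile details like timestamps, so the same failure always maps to the same short ID. The field names below are illustrative, not ReplayCI's real data model:

```typescript
import { createHash } from "node:crypto";

// Conceptual sketch only: derive a stable 8-hex-char fingerprint
// from the parts of a failure that don't change between runs.
function fingerprint(failure: {
  classification: string;
  contract: string;
  path: string;
}): string {
  const canonical = `${failure.classification}|${failure.contract}|${failure.path}`;
  return createHash("sha256").update(canonical).digest("hex").slice(0, 8);
}

const a = fingerprint({
  classification: "wrong_tool",
  contract: "get_weather",
  path: "$.tool_calls[0].name",
});
const b = fingerprint({
  classification: "wrong_tool",
  contract: "get_weather",
  path: "$.tool_calls[0].name",
});
console.log(a === b); // the same failure yields the same fingerprint
```

Because the hash input excludes anything run-specific, a recurring failure is recognizable across runs while a new failure produces a new ID.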
## Prove determinism
Run the same contracts multiple times and verify fingerprints match:
```shell
npx replayci --repeat 3
```
Output includes a `determinism_proof` showing whether each step produced identical fingerprints across runs.
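The check itself is just a cross-run comparison. Conceptually (the data below is hypothetical, not ReplayCI's output format):

```typescript
// Conceptual sketch: a run is deterministic when every repeat
// produced the same ordered list of fingerprints.
function isDeterministic(runs: string[][]): boolean {
  const first = JSON.stringify(runs[0]);
  return runs.every((run) => JSON.stringify(run) === first);
}

console.log(isDeterministic([["a3f82c91"], ["a3f82c91"], ["a3f82c91"]])); // true
console.log(isDeterministic([["a3f82c91"], ["b1d0e422"], ["a3f82c91"]])); // false
```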
## Try a different provider
Change the provider and model in `.replayci.yml` — same contracts, different model:

```yaml
provider: anthropic
model: claude-sonnet-4-6
```
Then run `npx replayci` again. Same tests, different provider, with results compared automatically.
## Next steps
- View your dashboard: Go to app.replayci.com to see run results, contract history, and failure trends.
- Write your own contracts: See Writing Tests for the full YAML format and examples.
- CLI flags: See CLI Reference for all flags, modes, and output formats.
- Set up CI gating: Add `npx replayci` to your CI pipeline to block merges on test failures.
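One way to wire up CI gating, assuming GitHub Actions with repository secrets named `OPENAI_API_KEY` and `REPLAYCI_API_KEY` (the workflow file and secret names are yours to choose; adapt the pattern to your CI system):

```yaml
# .github/workflows/replayci.yml (illustrative)
name: replayci
on: [pull_request]
jobs:
  contracts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx replayci
        env:
          REPLAYCI_PROVIDER_KEY: ${{ secrets.OPENAI_API_KEY }}
          REPLAYCI_API_KEY: ${{ secrets.REPLAYCI_API_KEY }}
```

Assuming `replayci` exits non-zero on contract failures (typical for test runners), the job fails and the merge is blocked.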