## Prerequisites
- Node.js 20+
- An OpenAI or Anthropic API key
- A ReplayCI API key (get one at app.replayci.com/signup)
## 1. Install

```shell
npm install -D replayci
```
## 2. Initialize

```shell
npx replayci init
```
This creates:

- `.replayci.yml` — config file (provider, model, pack path)
- `packs/starter/` — a working contract pack you can run immediately
## 3. Configure

Set your API keys as environment variables:

```shell
export REPLAYCI_PROVIDER_KEY="sk-..."
export REPLAYCI_API_KEY="rci_live_..."
```
Edit `.replayci.yml` to set your provider and model:

```yaml
pack: "./packs/starter"
provider: openai
model: gpt-4o-mini
```
Settings in .replayci.yml can be overridden with CLI flags. API keys always come from environment variables — never in the config file.
## 4. Run

```shell
npx replayci
```
You'll see:

```
replayci · openai/gpt-4o-mini · 1 contract
✓ tool_call    Pass    a3f82c91    1/1 passed
results → https://app.replayci.com/runs/r_8f3a2c
```
The `results →` line links to your dashboard, where you can see full run details. It appears automatically when `REPLAYCI_API_KEY` is set.
## What's in a contract?
A contract is a YAML file that defines one tool-call behavior to test:
```yaml
# Does my AI call the right tool?
tool: "get_weather"
assertions:
  output_invariants:
    - path: "$.tool_calls[0].name"
      equals: "get_weather"
    - path: "$.tool_calls[0].arguments"
      exists: true
golden_cases:
  - id: "tool_call_success"
    input_ref: "tool_call.success.json"
    expect_ok: true
```
- `tool` — the tool name the model should call.
- `assertions` — JSONPath checks on the model's response.
- `golden_cases` — specific input/output pairs to test.
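Here is a second sketch using the same fields. The `search_flights` tool, its argument paths, and the file names are hypothetical; only the contract schema follows the starter example above:

```yaml
# Does my AI call search_flights with a date argument? (hypothetical tool)
tool: "search_flights"
assertions:
  output_invariants:
    - path: "$.tool_calls[0].name"
      equals: "search_flights"
    - path: "$.tool_calls[0].arguments.date"
      exists: true
golden_cases:
  - id: "flights_basic"
    input_ref: "flights.basic.json"
    expect_ok: true
```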
## What happens when a test fails?
ReplayCI classifies every failure with a taxonomy:
| Classification | Meaning |
|---|---|
| `tool_not_invoked` | Model returned text instead of calling the tool |
| `malformed_arguments` | Tool arguments aren't valid JSON |
| `schema_violation` | Arguments don't match the expected schema |
| `wrong_tool` | Model called a different tool than expected |
| `unexpected_error` | Provider returned an error |
Each failure gets a fingerprint. If the same failure happens again, it gets the same fingerprint — so you can track whether a problem is new or recurring.
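ReplayCI's actual fingerprinting is internal to the tool, but the idea can be sketched: hash the stable facts of a failure (classification, contract, assertion path) and ignore volatile details like timestamps, so the same failure always maps to the same short ID. The field names below are illustrative, not ReplayCI's real data model:

```typescript
import { createHash } from "node:crypto";

// Conceptual sketch only: derive a stable 8-hex-char fingerprint
// from the parts of a failure that don't change between runs.
function fingerprint(failure: {
  classification: string;
  contract: string;
  path: string;
}): string {
  const canonical = `${failure.classification}|${failure.contract}|${failure.path}`;
  return createHash("sha256").update(canonical).digest("hex").slice(0, 8);
}

const a = fingerprint({
  classification: "wrong_tool",
  contract: "get_weather",
  path: "$.tool_calls[0].name",
});
const b = fingerprint({
  classification: "wrong_tool",
  contract: "get_weather",
  path: "$.tool_calls[0].name",
});
console.log(a === b); // the same failure yields the same fingerprint
```

Because the hash input excludes anything run-specific, a recurring failure is recognizable across runs while a new failure produces a new ID.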
## Prove determinism
Run the same contracts multiple times and verify fingerprints match:
```shell
npx replayci --repeat 3
```
Output includes a `determinism_proof` showing whether each step produced identical fingerprints across runs.
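The check itself is just a cross-run comparison. Conceptually (the data below is hypothetical, not ReplayCI's output format):

```typescript
// Conceptual sketch: a run is deterministic when every repeat
// produced the same ordered list of fingerprints.
function isDeterministic(runs: string[][]): boolean {
  const first = JSON.stringify(runs[0]);
  return runs.every((run) => JSON.stringify(run) === first);
}

console.log(isDeterministic([["a3f82c91"], ["a3f82c91"], ["a3f82c91"]])); // true
console.log(isDeterministic([["a3f82c91"], ["b1d0e422"], ["a3f82c91"]])); // false
```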
## Try a different provider
Change the provider and model in `.replayci.yml` — same contracts, different model:

```yaml
provider: anthropic
model: claude-sonnet-4-6
```
Then run `npx replayci` again. Same tests, different provider, with results compared automatically.
## Next steps
- View your dashboard: Go to app.replayci.com to see run results, contract history, and failure trends.
- Write your own contracts: See Writing Tests for the full YAML format and examples.
- CLI flags: See CLI Reference for all flags, modes, and output formats.
- Set up CI gating: Add `npx replayci` to your CI pipeline to block merges on test failures.
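One way to wire up CI gating, assuming GitHub Actions with repository secrets named `OPENAI_API_KEY` and `REPLAYCI_API_KEY` (the workflow file and secret names are yours to choose; adapt the pattern to your CI system):

```yaml
# .github/workflows/replayci.yml (illustrative)
name: replayci
on: [pull_request]
jobs:
  contracts:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: npx replayci
        env:
          REPLAYCI_PROVIDER_KEY: ${{ secrets.OPENAI_API_KEY }}
          REPLAYCI_API_KEY: ${{ secrets.REPLAYCI_API_KEY }}
```

Assuming `replayci` exits non-zero on contract failures (typical for test runners), the job fails and the merge is blocked.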