Quick Start Guide
Get started with the EvalGate SDK in under 5 minutes
EvalGate is CI for AI behavior. LLMs drift silently — EvalGate turns evaluations into CI gates so regressions never reach production.
```yaml
# Add this to .github/workflows/evalai.yml
name: EvalGate CI
on: [push, pull_request]
jobs:
  evalai:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx evalgate ci --format github --write-results --base main
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: evalai-results
          path: .evalai/
```

That's it! Your CI now automatically discovers specs, runs only impacted tests, compares against baseline, and posts rich summaries in PRs.
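The baseline comparison above can be sketched conceptually. This is a minimal illustration of the idea of gating on a stored baseline, not EvalGate's actual implementation; the metric names, baseline values, and tolerance below are hypothetical.

```python
# Conceptual sketch of a regression gate (not EvalGate's real logic):
# compare the current run's scores against a stored baseline and collect
# any metric that regressed beyond a small tolerance.
BASELINE = {"accuracy": 0.92, "pass_rate": 0.98}  # captured on the main branch
TOLERANCE = 0.01                                  # allowed slack per metric

def gate(current):
    """Return the metrics that regressed past the tolerance."""
    return [
        name for name, base in BASELINE.items()
        if current.get(name, 0.0) < base - TOLERANCE
    ]

regressions = gate({"accuracy": 0.90, "pass_rate": 0.98})
print(regressions)  # ['accuracy'] -> this run would fail the gate
```

If the returned list is non-empty, a real gate would exit non-zero so the CI job, and therefore the PR, is blocked.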
```shell
npx @evalgate/sdk init   # detects repo, creates baseline, installs CI workflow
git add evals/ .github/workflows/evalai-gate.yml evalai.config.json
git commit -m "chore: add EvalGate regression gate"
git push                 # open a PR → CI blocks regressions
```
That's it. `evalai init` detects your package manager, runs your tests to capture a baseline, and scaffolds everything. No account is required for local gating.
Run gate locally
```shell
npx evalgate gate
```

Update baseline

```shell
npx evalgate baseline update
```

Upgrade to full gate

```shell
npx evalgate upgrade --full
```

```shell
pip install pauly4010-evalgate-sdk
```
```python
from evalgate_sdk import expect

result = expect("The capital of France is Paris.").to_contain("Paris")
print(result.passed)  # True
```

No API key is needed for local assertions. For platform traces and evaluations, use AIEvalClient. See the SDK page for full Python examples.
- Node.js 18.0.0 or higher
- npm, yarn, or pnpm package manager
- Python 3.8+ (for Python SDK)
- An EvalGate account (only needed for platform features such as traces and evaluations; local gating and assertions work without one)
Create an API Key
1. Navigate to the Developer Dashboard
2. Scroll down to the "API Keys" section
3. Click "Create API Key"
4. Enter a name (e.g., "Development Key")
5. Select the scopes you need (start with all for testing)
6. Click "Create Key"
7. Copy and save your API key immediately - you won't see it again!
Important: Your API key is shown only once. Store it securely!
Install the SDK
Install the EvalGate SDK in your project using your preferred package manager:
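A sketch of the install commands, assuming the package names that appear elsewhere in this guide (`@evalgate/sdk` for Node.js, `pauly4010-evalgate-sdk` for Python):

```shell
# Node.js (pick your package manager)
npm install @evalgate/sdk
# yarn add @evalgate/sdk
# pnpm add @evalgate/sdk

# Python
pip install pauly4010-evalgate-sdk
```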
Configure Environment Variables
Create a .env file in your project root:
```
EVALAI_API_KEY=sk_test_your_api_key_here
EVALAI_ORGANIZATION_ID=your_org_id_here
```
Where to find these values:
- `EVALAI_API_KEY` - from the API key creation dialog
- `EVALAI_ORGANIZATION_ID` - shown in the API key creation dialog
Security Tip: Add .env to your .gitignore file to prevent committing secrets!
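To fail fast when these variables are missing, here is a minimal Python sketch. The `load_evalgate_env` helper is hypothetical, not part of the SDK; in practice a loader such as python-dotenv would populate the environment from `.env` first.

```python
import os

# Hypothetical helper: read the required EvalGate settings from the
# environment and fail with a clear error when one is missing.
def load_evalgate_env():
    required = ["EVALAI_API_KEY", "EVALAI_ORGANIZATION_ID"]
    missing = [name for name in required if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "api_key": os.environ["EVALAI_API_KEY"],
        "organization_id": int(os.environ["EVALAI_ORGANIZATION_ID"]),
    }

# Demo values only; normally these come from your .env loader or CI secrets.
os.environ["EVALAI_API_KEY"] = "sk_test_example"
os.environ["EVALAI_ORGANIZATION_ID"] = "42"
config = load_evalgate_env()
print(config["organization_id"])  # 42
```

Parsing the organization ID to an integer here mirrors the `int(...)`/`parseInt(...)` calls in the client-initialization examples below.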
Initialize the Client
Import and initialize the SDK in your code:
TypeScript
```typescript
import { AIEvalClient } from '@evalgate/sdk'

// Auto-loads from environment variables
const client = AIEvalClient.init()

// Or with explicit configuration
const client = new AIEvalClient({
  apiKey: process.env.EVALAI_API_KEY,
  organizationId: parseInt(process.env.EVALAI_ORGANIZATION_ID!),
  debug: true // Enable debug logging
})
```

Python
```python
import os

from evalgate_sdk import AIEvalClient

# Auto-loads from environment variables
client = AIEvalClient.init()

# Or with explicit configuration
client = AIEvalClient(
    api_key=os.environ["EVALAI_API_KEY"],
    organization_id=int(os.environ["EVALAI_ORGANIZATION_ID"]),
    debug=True
)
```

Create Your First Trace
Track your first LLM call:
TypeScript
```typescript
// Create a trace
const trace = await client.traces.create({
  name: 'Chat Completion',
  traceId: 'trace-' + Date.now(),
  metadata: {
    userId: 'user-123',
    model: 'gpt-4'
  }
})
console.log('Trace created:', trace.id)

// Add a span to track the LLM call
const span = await client.traces.createSpan(trace.id, {
  name: 'OpenAI API Call',
  spanId: 'span-' + Date.now(),
  type: 'llm',
  startTime: new Date().toISOString(),
  input: 'What is AI?',
  output: 'AI is artificial intelligence...',
  metadata: {
    model: 'gpt-4',
    tokens: 150,
    latency: 1200
  }
})
console.log('Span created:', span.id)
```

Python
```python
import time
from datetime import datetime

from evalgate_sdk.types import CreateTraceParams, CreateSpanParams

# Create a trace
trace = await client.traces.create(CreateTraceParams(
    name="Chat Completion",
    trace_id=f"trace-{int(time.time() * 1000)}",
    metadata={"userId": "user-123", "model": "gpt-4"}
))
print(f"Trace created: {trace.id}")

# Add a span to track the LLM call
span = await client.traces.create_span(trace.id, CreateSpanParams(
    name="OpenAI API Call",
    span_id=f"span-{int(time.time() * 1000)}",
    type="llm",
    start_time=datetime.now().isoformat(),
    input="What is AI?",
    output="AI is artificial intelligence...",
    metadata={"model": "gpt-4", "tokens": 150, "latency": 1200}
))
print(f"Span created: {span.id}")
```

Write Your First Eval
Now that you can trace, let's evaluate. The SDK includes a test suite runner with 20+ built-in assertions designed for LLM outputs.
TypeScript
```typescript
import { createTestSuite, expect } from '@evalgate/sdk';

const suite = createTestSuite('My First Eval', {
  executor: async (input) => await myLLM(input),
  cases: [{
    input: 'Summarize this document...',
    assertions: [
      (output) => expect(output).toHaveLength({ min: 50, max: 500 }),
      (output) => expect(output).toNotContainPII(),
      (output) => expect(output).toHaveSentiment('neutral'),
    ]
  }]
});

const { total, passed, failed } = await suite.run();
console.log(`Results: ${passed}/${total} passed`);
```

Python
```python
from evalgate_sdk import create_test_suite, expect

suite = create_test_suite("My First Eval",
    executor=lambda input: my_llm(input),
    cases=[{
        "input": "Summarize this document...",
        "assertions": [
            lambda output: expect(output).to_have_length(min=50, max=500),
            lambda output: expect(output).to_not_contain_pii(),
            lambda output: expect(output).to_have_sentiment("neutral"),
        ]
    }]
)

results = await suite.run()
print(f"Results: {results.passed}/{results.total} passed")
```

Explore all 20+ assertions including hallucination detection, JSON validation, and profanity checks. View the full assertion library →