Integration Reference
Complete technical reference for wiring external projects into the EvalGate platform.
Generated from source code — every type, endpoint, and method signature below is real.
SDK Package

| Field | Value |
| --- | --- |
| npm package | `@evalgate/sdk` |
| Version | 1.5.0 |
| Source | `src/packages/sdk/` |
| Exports | `.` (main), `./assertions`, `./testing`, `./integrations/openai`, `./integrations/anthropic` |
| Peer deps | `openai` ^4.0.0 (optional), `@anthropic-ai/sdk` ^0.20.0 (optional) |
| Node | >=16.0.0 |
| CLI | `npx evalgate` → `./dist/cli/index.js` |
AIEvalClient — Constructor & Auth
Option A: Zero-config (reads env vars)
```typescript
// Env: EVALAI_API_KEY, EVALAI_ORGANIZATION_ID, EVALAI_BASE_URL
const client = AIEvalClient.init();
```
Option B: Explicit config
```typescript
const client = new AIEvalClient({
  apiKey: 'your-api-key',                  // required (or EVALAI_API_KEY env)
  organizationId: 123,                     // optional (or EVALAI_ORGANIZATION_ID env)
  baseUrl: 'https://your-app.vercel.app',  // defaults to '' in browser, 'http://localhost:3000' in Node
  timeout: 30000,                          // ms, default 30s
  debug: false,                            // enables verbose logging
  logLevel: 'info',                        // 'debug' | 'info' | 'warn' | 'error'
  retry: {
    maxAttempts: 3,
    backoff: 'exponential',                // 'exponential' | 'linear' | 'fixed'
    retryableErrors: ['RATE_LIMIT_EXCEEDED', 'TIMEOUT', 'NETWORK_ERROR', 'INTERNAL_SERVER_ERROR']
  },
  enableBatching: true,                    // auto-batch requests
  batchSize: 10,
  batchDelay: 50,                          // ms
  cacheSize: 1000,                         // GET request cache entries
});
```

Auth pattern: every request sends an `Authorization: Bearer <apiKey>` header.
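To make the `retry` options concrete, here is a small sketch of how the three backoff strategies might schedule delays between attempts. `computeDelay` and `withRetry` are invented helpers for illustration; the SDK's internal retry logic may differ.

```typescript
// Illustrative sketch of the three backoff strategies named in `retry`.
type Backoff = 'exponential' | 'linear' | 'fixed';

function computeDelay(strategy: Backoff, attempt: number, baseMs = 1000): number {
  switch (strategy) {
    case 'exponential':
      return baseMs * Math.pow(2, attempt - 1); // 1000, 2000, 4000, ...
    case 'linear':
      return baseMs * attempt;                  // 1000, 2000, 3000, ...
    case 'fixed':
      return baseMs;                            // 1000, 1000, 1000, ...
  }
}

// Retry loop: try up to maxAttempts, sleeping between failures.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  strategy: Backoff = 'exponential',
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        await new Promise((r) => setTimeout(r, computeDelay(strategy, attempt)));
      }
    }
  }
  throw lastError;
}
```

In a real client you would also filter on `retryableErrors` before retrying, so that, for example, a 401 fails fast while a rate-limit error waits and retries.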
Client API Modules
- `client.traces` → TraceAPI
- `client.evaluations` → EvaluationAPI
- `client.llmJudge` → LLMJudgeAPI
- `client.annotations` → AnnotationsAPI
- `client.developer` → DeveloperAPI (apiKeys, webhooks, usage)
- `client.organizations` → OrganizationsAPI
TraceAPI Methods
Create a trace
```typescript
client.traces.create({
  name: string,
  traceId: string,
  organizationId?: number,  // falls back to client's orgId
  status?: string,          // 'pending' | 'success' | 'error'
  durationMs?: number,
  metadata?: Record<string, unknown>,
}) → Promise<Trace>
```

List traces
```typescript
client.traces.list({
  limit?: number,           // max 100
  offset?: number,
  organizationId?: number,
  status?: string,
  search?: string,
}) → Promise<Trace[]>
```

Get single trace

```typescript
client.traces.get(id: number) → Promise<TraceDetail>
```

`TraceDetail = { trace: Trace, spans: Span[] }`
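Since `list` caps `limit` at 100, fetching every trace requires paging over `offset`. A hedged sketch of a drain helper (`listAll` is an invented name, not an SDK method) that works with any limit/offset endpoint:

```typescript
// Invented helper: drain a limit/offset endpoint such as client.traces.list.
// pageFn is any function with that paging shape.
async function listAll<T>(
  pageFn: (opts: { limit: number; offset: number }) => Promise<T[]>,
  pageSize = 100, // the documented maximum for `limit`
): Promise<T[]> {
  const all: T[] = [];
  let offset = 0;
  while (true) {
    const page = await pageFn({ limit: pageSize, offset });
    all.push(...page);
    if (page.length < pageSize) break; // a short page signals the end
    offset += pageSize;
  }
  return all;
}
```

Usage would look like `const traces = await listAll((opts) => client.traces.list(opts));`.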
Delete trace

```typescript
client.traces.delete(id: number) → Promise<{ message: string }>
```

EvaluationAPI Methods
Create evaluation
```typescript
client.evaluations.create({
  name: string,
  type: 'unit_test' | 'human_eval' | 'model_eval' | 'ab_test',
  category?: string,
  description?: string,
  organizationId?: number,
}) → Promise<Evaluation>
```

Run evaluation
```typescript
client.evaluations.run(id: number, {
  environment?: string,
  metadata?: Record<string, unknown>,
}) → Promise<EvaluationRun>
```

Import results
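The `importResults` call below takes a `results` array; this sketch shows one way to assemble that payload from raw test outcomes and estimate the pass rate locally. The helper names are invented, and the server-computed `score` returned by `importResults` may use a different formula.

```typescript
type Status = 'passed' | 'failed' | 'skipped';

// Entry shape matching the documented `results` array.
interface ResultEntry {
  testCaseId: number;
  status: Status;
  output?: string;
  latencyMs?: number;
  errorMessage?: string;
}

// Invented helper: map raw outcomes (ok === null means "skipped")
// into the importResults payload shape.
function toResults(
  outcomes: Array<{ id: number; ok: boolean | null; out?: string }>,
): ResultEntry[] {
  return outcomes.map((o) => ({
    testCaseId: o.id,
    status: o.ok === null ? 'skipped' : o.ok ? 'passed' : 'failed',
    output: o.out,
  }));
}

// Local pass rate over non-skipped results, as a 0-100 number.
function passRate(results: ResultEntry[]): number {
  const scored = results.filter((r) => r.status !== 'skipped');
  if (scored.length === 0) return 0;
  const passed = scored.filter((r) => r.status === 'passed').length;
  return Math.round((passed / scored.length) * 100);
}
```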
```typescript
client.evaluations.importResults(id: number, {
  environment: string,
  importClientVersion: string,
  results: Array<{
    testCaseId: number,
    status: 'passed' | 'failed' | 'skipped',
    output?: string,
    latencyMs?: number,
    errorMessage?: string,
  }>,
}) → Promise<{ runId: number, score: number }>
```

Integration Paths
SDK Integration
- TypeScript/JavaScript projects
- Full API coverage with type safety
- Built-in retry and batching
- Environment-based configuration
REST API
- Any language/framework
- OpenAPI specification available
- Standard HTTP methods
- JSON request/response format
MCP Protocol
- AI agent integration
- Tool discovery and execution
- Cursor, Claude, ChatGPT compatible
- Structured tool schemas
Webhooks
- Event-driven integration
- Real-time notifications
- Evaluation completion events
- Custom payload handling
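For the webhook path, receivers typically verify each delivery against a shared secret before trusting the payload. The scheme below (HMAC-SHA256 over the raw body, hex-encoded signature) and the event shape are common-pattern assumptions for illustration, not documented EvalGate behavior — check your webhook configuration for the actual signing details.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Assumed payload shape for an evaluation-completion event.
interface EvalCompletedEvent {
  event: 'evaluation.completed';
  runId: number;
  score: number;
}

// Verify a hex-encoded HMAC-SHA256 signature over the raw request body.
// timingSafeEqual avoids leaking information via comparison timing.
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, 'hex');
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

A handler would read the raw body, call `verifySignature` with the secret, and only then `JSON.parse` the body into an `EvalCompletedEvent`.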
Quick-Start Recipes
Basic Evaluation
```typescript
import { AIEvalClient } from '@evalgate/sdk';

const client = AIEvalClient.init();

// Create evaluation ('eval' is a reserved word, so name it 'evaluation')
const evaluation = await client.evaluations.create({
  name: 'Chatbot Safety Test',
  type: 'unit_test',
  category: 'safety'
});

// Add test cases
await client.evaluations.addTestCases(evaluation.id, [
  { input: 'Hello', expectedOutput: 'greeting' },
  { input: 'Help me', expectedOutput: 'assistance' }
]);

// Run evaluation
const run = await client.evaluations.run(evaluation.id);
console.log('Run ID:', run.id);
```

Tracing LLM Calls
```typescript
// Create trace
const trace = await client.traces.create({
  name: 'Chat Completion',
  traceId: 'chat-' + Date.now(),
  metadata: { userId: 'user-123', model: 'gpt-4' }
});

// Add span for LLM call
const span = await client.traces.createSpan(trace.id, {
  name: 'OpenAI API Call',
  type: 'llm',
  startTime: new Date().toISOString(),
  input: 'What is AI?',
  output: 'AI is artificial intelligence...',
  metadata: { model: 'gpt-4', tokens: 150, latency: 1200 }
});
```

Python Integration
While the primary SDK is TypeScript-based, you can integrate with Python using the REST API:
```python
import requests
import os

# Configuration
BASE_URL = "https://eval.ai/api"
API_KEY = os.getenv("EVALAI_API_KEY")
ORG_ID = os.getenv("EVALAI_ORGANIZATION_ID")

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Create evaluation
response = requests.post(
    f"{BASE_URL}/evaluations",
    json={
        "name": "Python Safety Test",
        "type": "unit_test",
        "organizationId": int(ORG_ID)
    },
    headers=headers
)
response.raise_for_status()  # fail loudly on non-2xx responses
evaluation = response.json()
print(f"Created evaluation: {evaluation['id']}")
```