Integration Reference
Complete technical reference for wiring external projects into the EvalGate platform.
Generated from source code — every type, endpoint, and method signature below is real.
SDK Package

| Field | Value |
| --- | --- |
| npm package | `@evalgate/sdk` |
| Version | 1.5.0 |
| Source | `src/packages/sdk/` |
| Exports | `.` (main), `./assertions`, `./testing`, `./integrations/openai`, `./integrations/anthropic` |
| Peer deps | `openai` ^4.0.0 (optional), `@anthropic-ai/sdk` ^0.20.0 (optional) |
| Node | >=16.0.0 |
| CLI | `npx evalgate` → `./dist/cli/index.js` |
AIEvalClient — Constructor & Auth
Option A: Zero-config (reads env vars)
```typescript
// Env: EVALAI_API_KEY, EVALAI_ORGANIZATION_ID, EVALAI_BASE_URL
const client = AIEvalClient.init();
```
Option B: Explicit config
```typescript
const client = new AIEvalClient({
  apiKey: 'your-api-key',                  // required (or EVALAI_API_KEY env)
  organizationId: 123,                     // optional (or EVALAI_ORGANIZATION_ID env)
  baseUrl: 'https://your-app.vercel.app',  // defaults to '' in browser, 'http://localhost:3000' in Node
  timeout: 30000,                          // ms, default 30s
  debug: false,                            // enables verbose logging
  logLevel: 'info',                        // 'debug' | 'info' | 'warn' | 'error'
  retry: {
    maxAttempts: 3,
    backoff: 'exponential',                // 'exponential' | 'linear' | 'fixed'
    retryableErrors: ['RATE_LIMIT_EXCEEDED', 'TIMEOUT', 'NETWORK_ERROR', 'INTERNAL_SERVER_ERROR']
  },
  enableBatching: true,                    // auto-batch requests
  batchSize: 10,
  batchDelay: 50,                          // ms
  cacheSize: 1000,                         // GET request cache entries
});
```

Auth pattern: every request sends an `Authorization: Bearer <apiKey>` header.
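To make the `retry` options concrete, here is a small sketch of how the three backoff strategies might schedule delays between attempts. `computeDelay` and `withRetry` are invented helpers for illustration; the SDK's internal retry logic may differ.

```typescript
// Illustrative sketch of the three backoff strategies named in `retry`.
type Backoff = 'exponential' | 'linear' | 'fixed';

function computeDelay(strategy: Backoff, attempt: number, baseMs = 1000): number {
  switch (strategy) {
    case 'exponential':
      return baseMs * Math.pow(2, attempt - 1); // 1000, 2000, 4000, ...
    case 'linear':
      return baseMs * attempt;                  // 1000, 2000, 3000, ...
    case 'fixed':
      return baseMs;                            // 1000, 1000, 1000, ...
  }
}

// Retry loop: try up to maxAttempts, sleeping between failures.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  strategy: Backoff = 'exponential',
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        await new Promise((r) => setTimeout(r, computeDelay(strategy, attempt)));
      }
    }
  }
  throw lastError;
}
```

In a real client you would also filter on `retryableErrors` before retrying, so that, for example, a 401 fails fast while a rate-limit error waits and retries.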
Client API Modules
- `client.traces` → TraceAPI
- `client.evaluations` → EvaluationAPI
- `client.llmJudge` → LLMJudgeAPI
- `client.annotations` → AnnotationsAPI
- `client.developer` → DeveloperAPI (apiKeys, webhooks, usage)
- `client.organizations` → OrganizationsAPI
TraceAPI Methods
Create a trace
```typescript
client.traces.create({
  name: string,
  traceId: string,
  organizationId?: number,  // falls back to client's orgId
  status?: string,          // 'pending' | 'success' | 'error'
  durationMs?: number,
  metadata?: Record<string, unknown>,
}) → Promise<Trace>
```

List traces
```typescript
client.traces.list({
  limit?: number,           // max 100
  offset?: number,
  organizationId?: number,
  status?: string,
  search?: string,
}) → Promise<Trace[]>
```

Get single trace

```typescript
client.traces.get(id: number) → Promise<TraceDetail>
```

`TraceDetail = { trace: Trace, spans: Span[] }`
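Since `list` caps `limit` at 100, fetching every trace requires paging over `offset`. A hedged sketch of a drain helper (`listAll` is an invented name, not an SDK method) that works with any limit/offset endpoint:

```typescript
// Invented helper: drain a limit/offset endpoint such as client.traces.list.
// pageFn is any function with that paging shape.
async function listAll<T>(
  pageFn: (opts: { limit: number; offset: number }) => Promise<T[]>,
  pageSize = 100, // the documented maximum for `limit`
): Promise<T[]> {
  const all: T[] = [];
  let offset = 0;
  while (true) {
    const page = await pageFn({ limit: pageSize, offset });
    all.push(...page);
    if (page.length < pageSize) break; // a short page signals the end
    offset += pageSize;
  }
  return all;
}
```

Usage would look like `const traces = await listAll((opts) => client.traces.list(opts));`.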
Delete trace

```typescript
client.traces.delete(id: number) → Promise<{ message: string }>
```

EvaluationAPI Methods
Create evaluation
```typescript
client.evaluations.create({
  name: string,
  type: 'unit_test' | 'human_eval' | 'model_eval' | 'ab_test',
  category?: string,
  description?: string,
  organizationId?: number,
}) → Promise<Evaluation>
```

Run evaluation
```typescript
client.evaluations.run(id: number, {
  environment?: string,
  metadata?: Record<string, unknown>,
}) → Promise<EvaluationRun>
```

Import results
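The `importResults` call below takes a `results` array; this sketch shows one way to assemble that payload from raw test outcomes and estimate the pass rate locally. The helper names are invented, and the server-computed `score` returned by `importResults` may use a different formula.

```typescript
type Status = 'passed' | 'failed' | 'skipped';

// Entry shape matching the documented `results` array.
interface ResultEntry {
  testCaseId: number;
  status: Status;
  output?: string;
  latencyMs?: number;
  errorMessage?: string;
}

// Invented helper: map raw outcomes (ok === null means "skipped")
// into the importResults payload shape.
function toResults(
  outcomes: Array<{ id: number; ok: boolean | null; out?: string }>,
): ResultEntry[] {
  return outcomes.map((o) => ({
    testCaseId: o.id,
    status: o.ok === null ? 'skipped' : o.ok ? 'passed' : 'failed',
    output: o.out,
  }));
}

// Local pass rate over non-skipped results, as a 0-100 number.
function passRate(results: ResultEntry[]): number {
  const scored = results.filter((r) => r.status !== 'skipped');
  if (scored.length === 0) return 0;
  const passed = scored.filter((r) => r.status === 'passed').length;
  return Math.round((passed / scored.length) * 100);
}
```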
```typescript
client.evaluations.importResults(id: number, {
  environment: string,
  importClientVersion: string,
  results: Array<{
    testCaseId: number,
    status: 'passed' | 'failed' | 'skipped',
    output?: string,
    latencyMs?: number,
    errorMessage?: string,
  }>,
}) → Promise<{ runId: number, score: number }>
```

Integration Paths
SDK Integration
- TypeScript/JavaScript projects
- Full API coverage with type safety
- Built-in retry and batching
- Environment-based configuration
REST API
- Any language/framework
- OpenAPI specification available
- Standard HTTP methods
- JSON request/response format
MCP Protocol
- AI agent integration
- Tool discovery and execution
- Cursor, Claude, ChatGPT compatible
- Structured tool schemas
Webhooks
- Event-driven integration
- Real-time notifications
- Evaluation completion events
- Custom payload handling
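For the webhook path, receivers typically verify each delivery against a shared secret before trusting the payload. The scheme below (HMAC-SHA256 over the raw body, hex-encoded signature) and the event shape are common-pattern assumptions for illustration, not documented EvalGate behavior — check your webhook configuration for the actual signing details.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Assumed payload shape for an evaluation-completion event.
interface EvalCompletedEvent {
  event: 'evaluation.completed';
  runId: number;
  score: number;
}

// Verify a hex-encoded HMAC-SHA256 signature over the raw request body.
// timingSafeEqual avoids leaking information via comparison timing.
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, 'hex');
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

A handler would read the raw body, call `verifySignature` with the secret, and only then `JSON.parse` the body into an `EvalCompletedEvent`.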
Quick-Start Recipes
Basic Evaluation
```typescript
import { AIEvalClient } from '@evalgate/sdk';

const client = AIEvalClient.init();

// Create evaluation ('eval' is a reserved word, so name it 'evaluation')
const evaluation = await client.evaluations.create({
  name: 'Chatbot Safety Test',
  type: 'unit_test',
  category: 'safety'
});

// Add test cases
await client.evaluations.addTestCases(evaluation.id, [
  { input: 'Hello', expectedOutput: 'greeting' },
  { input: 'Help me', expectedOutput: 'assistance' }
]);

// Run evaluation
const run = await client.evaluations.run(evaluation.id);
console.log('Run ID:', run.id);
```

Tracing LLM Calls
```typescript
// Create trace
const trace = await client.traces.create({
  name: 'Chat Completion',
  traceId: 'chat-' + Date.now(),
  metadata: { userId: 'user-123', model: 'gpt-4' }
});

// Add span for LLM call
const span = await client.traces.createSpan(trace.id, {
  name: 'OpenAI API Call',
  type: 'llm',
  startTime: new Date().toISOString(),
  input: 'What is AI?',
  output: 'AI is artificial intelligence...',
  metadata: { model: 'gpt-4', tokens: 150, latency: 1200 }
});
```

Python Integration
While the primary SDK is TypeScript-based, you can integrate with Python using the REST API:
```python
import requests
import os

# Configuration
BASE_URL = "https://eval.ai/api"
API_KEY = os.getenv("EVALAI_API_KEY")
ORG_ID = os.getenv("EVALAI_ORGANIZATION_ID")

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

# Create evaluation
response = requests.post(
    f"{BASE_URL}/evaluations",
    json={
        "name": "Python Safety Test",
        "type": "unit_test",
        "organizationId": int(ORG_ID)
    },
    headers=headers
)
response.raise_for_status()  # fail loudly on non-2xx responses
evaluation = response.json()
print(f"Created evaluation: {evaluation['id']}")
```