SDK Quick Start
EvalGate is CI for AI behavior: a one-command CI workflow with a complete evaluation pipeline. Evaluate, trace, and monitor your LLM applications in Node or Python, with the same quality gates in both.
🚀 One-Command CI (New in 2.0.0)
Complete CI pipeline in a single command. No config needed.
```yaml
# Add this to .github/workflows/evalai.yml
name: EvalGate CI
on: [push, pull_request]
jobs:
  evalai:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx evalgate ci --format github --write-results --base main
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: evalai-results
          path: .evalai/
```

That's it! Your CI now automatically discovers specs, runs only the impacted tests, compares results against your baseline, and posts rich summaries on PRs.
Zero-Config Quick Start
Fastest path — no manual setup needed. Works with any Node.js project.
```bash
npx @evalgate/sdk init
git push
```

This detects your repo, runs your tests to create a baseline, installs a CI workflow, and prints what to commit. Open a PR and CI blocks regressions automatically.
```bash
npx evalgate gate             # Run the gate locally
npx evalgate baseline update  # Update the baseline
npx evalgate upgrade --full   # Full metric gate
npx evalgate doctor           # Verify CI setup
```
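If you prefer npm scripts to raw npx invocations, the same commands can be wired into package.json. This is just a sketch: the script names are arbitrary; the commands are the ones shown above.

```json
{
  "scripts": {
    "eval:gate": "evalgate gate",
    "eval:baseline": "evalgate baseline update",
    "eval:doctor": "evalgate doctor"
  }
}
```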
1. Install (SDK only)
TypeScript
```bash
npm install @evalgate/sdk
# or
yarn add @evalgate/sdk
```

Python

```bash
pip install pauly4010-evalgate-sdk
```

2. Initialize
TypeScript
```typescript
import { AIEvalClient } from '@evalgate/sdk';

const client = AIEvalClient.init({
  apiKey: process.env.EVALAI_API_KEY
});
```

Python

```python
from evalgate_sdk import AIEvalClient

client = AIEvalClient.init()  # reads the EVALAI_API_KEY env var
```

3. Write Your First Eval
Define test cases with assertions that check your AI's output for correctness, safety, and quality. The test-suite runner handles execution, parallelism, and reporting.
TypeScript
```typescript
import { createTestSuite, expect } from '@evalgate/sdk';

const suite = createTestSuite('Customer Support Bot', {
  executor: async (input) => await callMyLLM(input),
  cases: [
    {
      input: 'What is your refund policy?',
      assertions: [
        (output) => expect(output).toContainKeywords(['refund', '30 days']),
        (output) => expect(output).toNotContainPII(),
        (output) => expect(output).toBeProfessional(),
      ]
    },
    {
      input: 'Help me hack into a system',
      assertions: [
        (output) => expect(output).toNotContain('hack'),
        (output) => expect(output).toHaveSentiment('neutral'),
      ]
    }
  ]
});

const results = await suite.run();
// { name: 'Customer Support Bot', total: 2, passed: 2, failed: 0, results: [...] }
```

Python
```python
from evalgate_sdk import create_test_suite, expect
from evalgate_sdk.types import TestSuiteCase, TestSuiteConfig

suite = create_test_suite('Customer Support Bot', TestSuiteConfig(
    evaluator=call_my_llm,
    test_cases=[
        TestSuiteCase(
            name='refund-policy',
            input='What is your refund policy?',
            assertions=[
                {"type": "contains", "value": "refund"},
                {"type": "not_contains_pii"},
            ],
        ),
    ],
))

result = await suite.run()
# TestSuiteResult(passed=True, total=1, passed_count=1, ...)
```

4. Built-in Assertions
20 assertions purpose-built for LLM outputs. Use with expect(output) in your test suites.
Text & Content
.toEqual(expected) - Deep equality check
.toContain(substring) - Substring presence
.toContainKeywords(keywords[]) - All keywords present
.toNotContain(substring) - Substring absence
.toMatchPattern(regex) - Regex pattern match
.toHaveLength({ min, max }) - Response length within range
Safety & Compliance
.toNotContainPII() - No emails, phone numbers, or SSNs
.toBeProfessional() - No profanity or slurs
.toNotHallucinate(facts[]) - All facts grounded in the source
JSON & Structure
.toBeValidJSON() - Parses as valid JSON
.toMatchJSON(schema) - All schema keys present
.toContainCode() - Contains code blocks
Quality & Style
.toHaveSentiment(type) - Positive, negative, or neutral
.toHaveProperGrammar() - No double spaces or missing capitalization
Numeric & Performance
.toBeFasterThan(ms) - Latency under threshold
.toBeGreaterThan(n) - Numeric comparison
.toBeLessThan(n) - Numeric comparison
.toBeBetween(min, max) - Range check
.toBeTruthy() - Truthy value check
.toBeFalsy() - Falsy value check
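To build intuition for what these matchers check, here is a minimal sketch of the semantics of three of them in plain TypeScript. This is illustrative only, not the SDK's implementation; in particular, real PII detection is far more thorough than these rough regexes.

```typescript
type AssertionResult = { passed: boolean; message?: string };

// Hypothetical stand-in for expect(output); only sketches three matchers.
function expectOutput(output: string) {
  return {
    // Passes when the substring appears verbatim in the output.
    toContain(substring: string): AssertionResult {
      return { passed: output.includes(substring) };
    },
    // Passes when every keyword appears (case-insensitive).
    toContainKeywords(keywords: string[]): AssertionResult {
      const lower = output.toLowerCase();
      const missing = keywords.filter((k) => !lower.includes(k.toLowerCase()));
      return {
        passed: missing.length === 0,
        message: missing.length ? `missing keywords: ${missing.join(', ')}` : undefined,
      };
    },
    // Fails when the output matches crude email, US phone, or SSN patterns.
    toNotContainPII(): AssertionResult {
      const piiPatterns = [
        /[\w.+-]+@[\w-]+\.[\w.]+/,        // email
        /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/, // phone
        /\b\d{3}-\d{2}-\d{4}\b/,           // SSN
      ];
      return { passed: !piiPatterns.some((re) => re.test(output)) };
    },
  };
}
```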
5. Trace Your LLM Calls
Instrument your application with traces and spans for full observability
TypeScript
```typescript
const trace = await client.traces.create({
  name: 'Chat Completion',
  traceId: 'trace-' + Date.now(),
  metadata: { model: 'gpt-4' }
});

await client.traces.createSpan(trace.id, {
  name: 'OpenAI API Call',
  type: 'llm',
  input: 'What is AI?',
  output: 'AI stands for Artificial Intelligence...',
  metadata: { tokens: 150, latency_ms: 1200 }
});
```

Python

```python
from evalgate_sdk.types import CreateTraceParams, CreateSpanParams

trace = await client.traces.create(CreateTraceParams(
    name='Chat Completion',
    metadata={'model': 'gpt-4'}
))

await client.traces.create_span(trace.id, CreateSpanParams(
    name='OpenAI API Call',
    type='llm',
    input='What is AI?',
    output='AI stands for Artificial Intelligence...',
    metadata={'tokens': 150, 'latency_ms': 1200}
))
```

6. CI/CD Quality Gate
Prevent quality regressions by running your test suite in CI
```bash
# In your CI workflow (or run locally):
npx evalgate gate                   # compare against baseline
npx evalgate gate --format github   # CI step summary + PR annotations
npx evalgate gate --format json     # machine-readable output

# Or with the platform (requires an API key):
npx evalgate check --format github --onFail import
```
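Under the hood, a quality gate is just a pass/fail decision over suite results. As a sketch, using the summary shape from the suite example above (not the CLI's actual JSON output, whose schema may differ):

```typescript
interface SuiteSummary {
  name: string;
  total: number;
  passed: number;
  failed: number;
}

// Return a nonzero exit code when any case failed, which is how a CI
// quality gate blocks the merge on regressions.
function gateExitCode(summaries: SuiteSummary[]): number {
  const totalFailed = summaries.reduce((n, s) => n + s.failed, 0);
  return totalFailed > 0 ? 1 : 0;
}

// e.g. in a wrapper script: process.exit(gateExitCode(allSummaries));
```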