CI/CD Integration
Integrate LLM evaluation into your continuous integration and deployment pipeline
Why CI/CD for LLMs?
Just like traditional software, your LLM applications need automated testing in the development workflow:
Catch Regressions Early
Detect quality degradation before it reaches production
Faster Iteration
Get immediate feedback on prompt and model changes
Team Confidence
Deploy with confidence knowing tests have passed
Compliance & Audit
Maintain test history and quality standards
🚀 One-Command CI Setup (EvalGate 2.0.0)
With EvalGate 2.0.0, you get a complete CI pipeline in a single command:
name: EvalGate CI
on: [push, pull_request]
jobs:
  evalai:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx evalgate ci --format github --write-results --base main
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: evalai-results
          path: .evalai/
That's it! This single command runs your evaluation suite, compares the results against the main branch, writes them to .evalai/, and emits a GitHub-formatted summary.
Legacy Setup (Pre-2.0.0)
For existing workflows, you can use the traditional regression gate:
Zero-config alternative: run npx @evalgate/sdk init to auto-generate this workflow.
name: EvalGate CI Gate
on:
  pull_request:
    branches: [main]
jobs:
  eval-gate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      # Option A: Local gate (no API key needed)
      - name: EvalGate regression gate
        run: npx evalgate gate --format github
      # Option B: Platform gate (requires API key)
      # - name: EvalGate quality gate
      #   env:
      #     EVALAI_API_KEY: ${{ secrets.EVALAI_API_KEY }}
      #   run: npx evalgate check --format github --onFail import
GitLab CI Configuration
For GitLab users, add this to your .gitlab-ci.yml:
eval-gate:
  stage: test
  image: node:20
  script:
    - npm ci
    - npx evalgate gate --format json
  only:
    - merge_requests
    - main
Setting Quality Gates
Defining Thresholds
Configure minimum scores for different evaluation criteria:
{
  "thresholds": {
    "accuracy": 0.85,
    "relevance": 0.80,
    "safety": 1.0,
    "latency_p95": 2000
  },
  "failOnViolation": true
}
Blocking Deployments
When evaluations fail, the CI pipeline will block the merge/deployment until issues are resolved. This ensures only high-quality changes make it to production.
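Conceptually, a blocking gate is just a comparison of scores against thresholds followed by a nonzero exit so CI marks the job failed. Here is a minimal sketch of that logic; the result-file shape and score keys are illustrative assumptions, not EvalGate's actual output format:

```python
import json
import sys


def check_thresholds(results: dict, thresholds: dict) -> list[str]:
    """Return human-readable violations; an empty list means the gate passes."""
    violations = []
    for metric, limit in thresholds.items():
        score = results.get(metric)
        if score is None:
            violations.append(f"{metric}: no score reported")
        elif metric.startswith("latency"):
            # Latency thresholds are maximums (milliseconds), not minimums.
            if score > limit:
                violations.append(f"{metric}: {score}ms exceeds limit {limit}ms")
        elif score < limit:
            violations.append(f"{metric}: {score:.2f} below minimum {limit:.2f}")
    return violations


if __name__ == "__main__":
    # Assumes a scores JSON file path is passed as the first argument.
    thresholds = {"accuracy": 0.85, "relevance": 0.80, "safety": 1.0, "latency_p95": 2000}
    with open(sys.argv[1]) as f:
        results = json.load(f)
    problems = check_thresholds(results, thresholds)
    for p in problems:
        print(f"FAIL {p}", file=sys.stderr)
    sys.exit(1 if problems else 0)
```

Exiting nonzero is what actually blocks the pipeline: GitHub Actions and GitLab CI both fail a job on any non-zero exit status.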
Best Practices
- Keep tests fast: Use a subset of test cases in CI, run the full suite nightly
- Cache dependencies: Speed up builds by caching npm packages and models
- Parallel execution: Run independent test suites in parallel when possible
- Clear reporting: Generate easy-to-read reports showing what failed and why
- Version control: Store test cases and thresholds in version control
- Cost monitoring: Track API costs to avoid expensive CI runs
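The "keep tests fast" advice can be implemented with deterministic sampling, so the same subset runs on every CI attempt while the nightly job covers the full suite. A sketch under stated assumptions (the case IDs and ratio are hypothetical; this is not an EvalGate feature):

```python
import hashlib


def ci_subset(case_ids: list[str], keep_ratio: float = 0.2) -> list[str]:
    """Deterministically keep ~keep_ratio of test cases by hashing their IDs.

    Hash-based selection is stable across runs and machines, unlike
    random.sample, so a failure on the subset reproduces on retry.
    """
    threshold = int(keep_ratio * 0xFFFF)
    return [
        cid for cid in case_ids
        if int(hashlib.sha256(cid.encode()).hexdigest()[:4], 16) <= threshold
    ]


# Nightly runs use the full list; CI uses the stable ~20% sample.
cases = [f"case-{i:03d}" for i in range(100)]
subset = ci_subset(cases, keep_ratio=0.2)
```

Because selection depends only on each case's ID, adding new cases never reshuffles which existing cases run in CI.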
CLI Commands Reference
# Setup (run once)
npx @evalgate/sdk init # scaffolds everything: baseline, CI workflow, config
# Gate commands
npx evalgate gate # run regression gate locally
npx evalgate gate --format github # CI step summary + PR annotations
npx evalgate gate --format json # machine-readable output
# Baseline management
npx evalgate baseline update # re-run tests and update baseline
# Platform gate (requires API key)
npx evalgate check --format github --onFail import
# Diagnostics
npx evalgate doctor # verify CI/CD setup