Eval API
Anatomy of an evaluation
The `Eval` function defines a complete test suite for your capability. It is configured through the key parameters below.
Key parameters
- `data`: A function that returns an array of test cases. Each test case has an `input` (what you send to your capability) and an `expected` output (the ground truth).
- `task`: An async function that executes your capability for a given input and returns the output.
- `scorers`: An array of scorer functions that evaluate the output against the expected result.
- `metadata`: Optional metadata, such as a description or tags.
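The sketch below shows how these parameters fit together, using a toy capability that uppercases a string. The `TestCase`, `ScorerFn`, and `EvalOptions` types are local stand-ins written for illustration and are not the Axiom SDK's own definitions. Passing the input to `task` as an `{ input }` object is also an assumption.

```typescript
// Local stand-in types mirroring the parameters above (illustrative assumptions,
// not the Axiom SDK's type definitions).
type TestCase<I, E> = { input: I; expected: E };

type ScorerFn<I, O, E> = (args: { input: I; output: O; expected: E }) => number | boolean;

interface EvalOptions<I, O, E> {
  data: () => TestCase<I, E>[] | Promise<TestCase<I, E>[]>; // the collection
  task: (args: { input: I }) => Promise<O>; // runs the capability on one input
  scorers: ScorerFn<I, O, E>[]; // grade each output against `expected`
  metadata?: { description?: string; tags?: string[] }; // optional extras
}

// A tiny evaluation for a toy "uppercase" capability.
const uppercaseEval: EvalOptions<string, string, string> = {
  data: () => [
    { input: "hello", expected: "HELLO" },
    { input: "axiom", expected: "AXIOM" },
  ],
  task: async ({ input }) => input.toUpperCase(),
  scorers: [({ output, expected }) => output === expected],
  metadata: { description: "Uppercases a string", tags: ["example"] },
};
```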
Creating collections
The `data` parameter defines your collection of test cases. Start with a small set of examples and grow it over time as you discover edge cases.
Inline collections
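In the simplest case, an inline collection is just an array literal returned from `data`. The sketch below uses a hypothetical ticket shape and `spam`/`not_spam` labels, chosen for illustration.

```typescript
// Hypothetical shape for a support-ticket classifier's test cases.
type Ticket = { subject: string; body: string };
type Label = "spam" | "not_spam";
type TestCase = { input: Ticket; expected: Label };

// An inline collection: the test cases live directly in the evaluation file.
const data = (): TestCase[] => [
  {
    input: { subject: "You won a prize!", body: "Click here to claim your reward." },
    expected: "spam",
  },
  {
    input: { subject: "Login issue", body: "I can't sign in since the last update." },
    expected: "not_spam",
  },
  // Append new cases here as you discover edge cases.
];
```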
For small collections, define test cases directly in the evaluation.
External collections
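The following sketch loads a collection from an external source by parsing test cases stored as JSON Lines, with one object per line. The JSONL format, the `parseJsonlCollection` helper, and the `tickets.jsonl` file name are assumptions for illustration. The string literal stands in for text you would read from a file or a database.

```typescript
type TestCase = { input: string; expected: string };

// Parses a JSON Lines collection: one {"input": ..., "expected": ...} object per line.
// Blank lines are skipped; malformed rows throw so bad data fails loudly.
function parseJsonlCollection(text: string): TestCase[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line, i) => {
      const row = JSON.parse(line) as Partial<TestCase>;
      if (typeof row.input !== "string" || typeof row.expected !== "string") {
        throw new Error(`Invalid test case on non-empty line ${i + 1}`);
      }
      return { input: row.input, expected: row.expected };
    });
}

// Stand-in for the contents of a hypothetical tickets.jsonl file.
const collectionText = `
{"input": "Win a free cruise now!", "expected": "spam"}
{"input": "How do I reset my password?", "expected": "not_spam"}
`;

const data = (): TestCase[] => parseJsonlCollection(collectionText);
```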
For larger collections, load test cases from external files or databases.
Defining the task
The `task` function executes your AI capability for each test case. It receives the input from the test case and should return the output your capability produces.
The `task` function should generally call the same code your capability runs in production. This ensures your evaluations accurately reflect real-world behavior.
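Here is a minimal sketch of that pattern, in which the evaluation's `task` is a thin wrapper around the function the application already calls. `classifyTicket` is hypothetical. Its keyword heuristic stands in for what would usually be a model call.

```typescript
type Label = "spam" | "not_spam";

// Capability code: in a real project this is imported from your app, not copied.
// Hypothetical keyword heuristic standing in for a model call.
async function classifyTicket(text: string): Promise<Label> {
  const spamSignals = ["free", "winner", "click here", "prize"];
  const lower = text.toLowerCase();
  return spamSignals.some((s) => lower.includes(s)) ? "spam" : "not_spam";
}

// Evaluation task: calls the production function unchanged, so the evaluation
// measures the same behavior users get.
const task = async ({ input }: { input: string }): Promise<Label> => classifyTicket(input);
```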
Creating scorers
Scorers evaluate your capability’s output. They receive the `input`, `output`, and `expected` values and return a score, typically a number between 0 and 1 or a boolean.
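The two hand-written scorers below sketch both return styles: a boolean pass/fail check and a numeric score between 0 and 1. Both functions are illustrative and are not part of any library.

```typescript
type ScorerArgs = { input: string; output: string; expected: string };

// Boolean scorer: pass/fail on a case-insensitive exact match.
const exactMatch = ({ output, expected }: ScorerArgs): boolean =>
  output.trim().toLowerCase() === expected.trim().toLowerCase();

// Numeric scorer in [0, 1]: fraction of expected keywords present in the output.
const keywordCoverage = ({ output, expected }: ScorerArgs): number => {
  const keywords = expected.toLowerCase().split(/\s+/).filter(Boolean);
  if (keywords.length === 0) return 1; // nothing required, so full credit
  const text = output.toLowerCase();
  const hits = keywords.filter((k) => text.includes(k)).length;
  return hits / keywords.length;
};
```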
Custom scorers
Create custom scorers using the `Scorer` wrapper.
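A scoring wrapper pairs a scoring function with a name so results can be reported per scorer. The sketch below illustrates that idea with a hypothetical local helper, `namedScorer`. It is not the Axiom SDK's `Scorer`, whose signature may differ.

```typescript
type ScoreArgs<O, E> = { input: unknown; output: O; expected: E };
type ScoreFn<O, E> = (args: ScoreArgs<O, E>) => number | boolean;

// Hypothetical helper: attaches a name to a scoring function. Uses `scorerName`
// because a function's built-in `name` property is read-only.
function namedScorer<O, E>(name: string, fn: ScoreFn<O, E>) {
  return Object.assign(fn, { scorerName: name });
}

// A custom scorer: does the classifier's label match the expected label?
const labelMatch = namedScorer<string, string>("label-match", ({ output, expected }) => output === expected);
```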
Using autoevals
The `autoevals` library provides prebuilt scorers for common tasks.
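To build intuition for what a prebuilt string-similarity scorer such as autoevals' `Levenshtein` computes, here is a self-contained normalized edit-distance score. It sketches the general technique and is not autoevals' implementation. The exact normalization autoevals uses may differ.

```typescript
// Classic dynamic-programming edit distance (insertions, deletions, substitutions).
function levenshtein(a: string, b: string): number {
  // Before processing row i, prev[j] is the distance between a[0..i-1) and b[0..j).
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]: 1 minus distance over the longer string's length.
function similarity({ output, expected }: { output: string; expected: string }): number {
  const maxLen = Math.max(output.length, expected.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(output, expected) / maxLen;
}
```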
Complete example
Consider a complete evaluation for a support ticket classification system, defined in `src/lib/capabilities/classify-ticket/evaluations/spam-classification.eval.ts`.
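Here is a self-contained sketch of such an evaluation. The `EvalDef` type and the small `runEval` runner are local stand-ins for the Axiom SDK and CLI and are assumptions. `classifyTicket` is a hypothetical keyword heuristic used in place of a model call.

```typescript
type Label = "spam" | "not_spam";
type TestCase = { input: string; expected: Label };
type Scorer = (args: { input: string; output: Label; expected: Label }) => number | boolean;

// Local stand-in for the evaluation definition (assumption, not the SDK's type).
interface EvalDef {
  data: () => TestCase[];
  task: (args: { input: string }) => Promise<Label>;
  scorers: Scorer[];
  metadata?: { description?: string; tags?: string[] };
}

// Capability code (hypothetical keyword heuristic standing in for a model call).
async function classifyTicket(text: string): Promise<Label> {
  const signals = ["free", "winner", "click here", "prize"];
  const lower = text.toLowerCase();
  return signals.some((s) => lower.includes(s)) ? "spam" : "not_spam";
}

const spamClassification: EvalDef = {
  data: () => [
    { input: "Click here to claim your free prize!", expected: "spam" },
    { input: "I was charged twice this month.", expected: "not_spam" },
    { input: "Congratulations, winner! Reply now.", expected: "spam" },
    { input: "Is there a free tier for small teams?", expected: "not_spam" },
  ],
  task: async ({ input }) => classifyTicket(input),
  scorers: [({ output, expected }) => output === expected],
  metadata: { description: "Classifies support tickets as spam or not", tags: ["classification"] },
};

// Minimal runner: runs the task per case, applies every scorer, averages the scores.
async function runEval(ev: EvalDef): Promise<number> {
  const cases = ev.data();
  let total = 0;
  for (const c of cases) {
    const output = await ev.task({ input: c.input });
    for (const score of ev.scorers) {
      total += Number(score({ input: c.input, output, expected: c.expected }));
    }
  }
  return total / (cases.length * ev.scorers.length);
}
```

`runEval(spamClassification)` resolves to `0.75`. The last case asks about a free tier, and the word "free" trips the keyword heuristic. This is the kind of edge case worth keeping in the collection.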
File naming conventions
Name your evaluation files with the `.eval.ts` extension so the Axiom CLI discovers them automatically. Based on your `axiom.config.ts` configuration, the CLI looks for files matching `**/*.eval.{ts,js,mts,mjs,cts,cjs}`.
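The following sketch makes the pattern concrete by checking paths against a regular expression that approximates the glob's file-name part. It is an illustration only and not the Axiom CLI's matcher, which is driven by your `axiom.config.ts`.

```typescript
// Approximates **/*.eval.{ts,js,mts,mjs,cts,cjs}: any directory, with a file name
// ending in ".eval." followed by one of the listed extensions.
const EVAL_FILE = /\.eval\.(ts|js|mts|mjs|cts|cjs)$/;

function isEvalFile(path: string): boolean {
  return EVAL_FILE.test(path);
}
```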
What’s next?
- To parameterize your capabilities and run experiments, see Flags and experiments.
- To run evaluations using the CLI, see Run evaluations.