✓ spam-classification (4/4 passed)
  ✓ Test case 1: spam detection
  ✓ Test case 2: legitimate question
Scorers:
  category-match: 100% (4/4)
  high-confidence: 75% (3/4)
Results:
  Total: 4 test cases
  Passed: 4 (100%)
  Duration: 3.2s
  Cost: $0.0024
View full report:
https://app.axiom.co/your-org/ai-engineering/evaluations?runId=ABC123
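
The percentages in the Scorers block appear to be pass counts divided by the number of test cases. The sketch below reproduces that arithmetic from the counts in the sample output. It is an illustration only: `pass_rate` and the `scorers` dictionary are hypothetical names, not part of the CLI or any Axiom SDK, and rounding to a whole percent is an assumption based on how the sample prints its figures.

```python
def pass_rate(passed: int, total: int) -> str:
    """Format a count the way the sample summary prints it, e.g. '75% (3/4)'."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Assumption: the report rounds to a whole percent.
    return f"{round(100 * passed / total)}% ({passed}/{total})"


# Counts copied from the sample run above (scorer name -> (passed, total)).
scorers = {"category-match": (4, 4), "high-confidence": (3, 4)}

summary = {name: pass_rate(p, t) for name, (p, t) in scorers.items()}

for name, line in summary.items():
    print(f"{name}: {line}")
```

Running it prints one line per scorer in the same `name: percent (passed/total)` shape as the Scorers block.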