AI engineering lifecycle
The concepts in AI engineering are best understood within the context of the development lifecycle. While AI capabilities can become highly sophisticated, they typically start simple and evolve through a disciplined, iterative process:
1. Prototype a capability
Development starts by defining a task and prototyping a capability with a prompt to solve it.
2. Evaluate with ground truth
The prototype is then tested against a collection of reference examples (so-called “ground truth”) to measure its quality and effectiveness using scorers. This process is known as an evaluation.
3. Observe in production
Once a capability meets quality benchmarks, it’s deployed. In production, scorers can be applied to live traffic to monitor performance and cost in real time.
4. Iterate with new insights
Insights from production monitoring reveal edge cases and opportunities for improvement. These new examples are used to refine the capability, expand the ground truth collection, and begin the cycle anew.
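The evaluation and iteration steps can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `classify_intent` is a rule-based stand-in for a prompt-based capability (no model is called), `exact_match` is one simple possible scorer, and `GROUND_TRUTH` is made-up example data, not a real dataset or API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Example:
    """One ground truth record: an input and its reference output."""
    input: str
    expected: str


def classify_intent(ticket: str) -> str:
    """Hypothetical capability: a rule-based stand-in for a prompt sent to a model."""
    text = ticket.lower()
    if "refund" in text:
        return "billing"
    if "password" in text:
        return "account"
    return "other"


def exact_match(output: str, expected: str) -> float:
    """Hypothetical scorer: 1.0 when the output equals the reference, else 0.0."""
    return 1.0 if output == expected else 0.0


def run_eval(
    capability: Callable[[str], str],
    ground_truth: Sequence[Example],
    scorer: Callable[[str, str], float],
) -> float:
    """Run the capability on every example and return the mean score."""
    if not ground_truth:
        raise ValueError("ground truth collection is empty")
    scores = [scorer(capability(ex.input), ex.expected) for ex in ground_truth]
    return sum(scores) / len(scores)


def failures(
    capability: Callable[[str], str],
    ground_truth: Sequence[Example],
) -> List[Example]:
    """Return the examples the capability gets wrong, as input for the next iteration."""
    return [ex for ex in ground_truth if capability(ex.input) != ex.expected]


# Illustrative ground truth collection (made-up data).
GROUND_TRUTH = [
    Example("I want a refund for my last invoice", "billing"),
    Example("I forgot my password", "account"),
    Example("My package arrived damaged", "shipping"),
    Example("How do I export my data?", "other"),
]

mean_score = run_eval(classify_intent, GROUND_TRUTH, exact_match)
missed = failures(classify_intent, GROUND_TRUTH)
print(f"mean score: {mean_score:.2f}, failing examples: {len(missed)}")
```

In this sketch the shipping example fails, which is the kind of edge case that would prompt a refinement of the capability. The same scorer function could in principle be applied to sampled production inputs, although reference outputs are usually unavailable there, so production scorers often need to judge outputs without an expected value.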
AI engineering terms
Capability
A generative AI capability is a system that uses large language models to perform a specific task by transforming inputs into desired outputs. Capabilities exist on a spectrum of complexity, ranging from simple to sophisticated architectures:
- Single-turn model interactions: A single prompt and response, such as classifying a support ticket’s intent or summarizing a document.
- Workflows: Multi-step processes where each step’s output feeds into the next, such as research → analysis → report generation.
- Single-agent: An agent that can reason and make decisions to accomplish a goal, such as a customer support agent that can search documentation, check order status, and draft responses.
- Multi-agent: Multiple specialized agents collaborating to solve complex problems, such as software engineering through architectural planning, coding, testing, and review.
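The first three points on this spectrum can be contrasted in a short sketch. It rests on loudly stated assumptions: `fake_model` is a hypothetical stub that echoes its prompt instead of calling a language model, the tool functions return canned strings, and the agent's tool choice is a keyword rule standing in for a decision a model would make.

```python
from typing import Callable, Dict, Sequence


def fake_model(prompt: str) -> str:
    """Hypothetical stub for a model call; it only echoes the prompt."""
    return f"<response to: {prompt}>"


# Single-turn interaction: one prompt in, one response out.
def summarize(document: str) -> str:
    return fake_model(f"Summarize: {document}")


# Workflow: a fixed sequence of steps where each output feeds the next step.
Step = Callable[[str], str]


def run_workflow(steps: Sequence[Step], initial_input: str) -> str:
    result = initial_input
    for step in steps:
        result = step(result)
    return result


def research(topic: str) -> str:
    return fake_model(f"Research: {topic}")


def analyze(notes: str) -> str:
    return fake_model(f"Analyze: {notes}")


def write_report(analysis: str) -> str:
    return fake_model(f"Write report: {analysis}")


# Single agent: chooses a tool at run time, then drafts a response from the result.
# Both tools are hypothetical and return canned strings.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search_docs": lambda query: f"docs matching '{query}' (stubbed)",
    "check_order": lambda query: "order status: shipped (stubbed)",
}


def support_agent(question: str) -> str:
    # A keyword rule stands in for the model's decision about which tool to use.
    tool_name = "check_order" if "order" in question.lower() else "search_docs"
    observation = TOOLS[tool_name](question)
    return fake_model(f"Draft a reply to '{question}' using: {observation}")


report = run_workflow([research, analyze, write_report], "vector databases")
reply = support_agent("Where is my order 123?")
print(summarize("quarterly results"))
print(report)
print(reply)
```

The structural difference is where control flow lives: in the workflow the order of steps is fixed in code, while in the agent the next action depends on a decision made at run time. A multi-agent system would compose several such agents, which this sketch leaves out.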