evals.runner
EvalRunner — orchestrates evaluation suites against registered agents.
EvalReport Objects
@dataclass
class EvalReport()
Public-facing eval report returned by EvalRunner.run().
EvalRunner Objects
class EvalRunner()
Executes evaluation suites against registered agents.
Parameters
agent_id : str
Identifier of the agent to evaluate.
agent_fn : Callable[[dict], Awaitable[str]] | None
Async function that takes a dict of the form {"prompt": ...} and returns the agent's output as a string.
If omitted or None, a no-op stub is used.
output_dir : str
Directory for storing eval reports.
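As a rough illustration of the agent_fn contract described above, the sketch below defines a toy coroutine with the documented shape, Callable[[dict], Awaitable[str]]. The name echo_agent and its echo behaviour are hypothetical. The commented-out EvalRunner(...) call is an assumption inferred only from the parameter list, not a confirmed constructor signature.

```python
import asyncio
from typing import Awaitable, Callable


# Hypothetical agent used only for illustration: it echoes the prompt back.
# A real agent_fn would typically call a model or an agent framework here.
async def echo_agent(payload: dict) -> str:
    prompt = payload.get("prompt", "")
    return f"echo: {prompt}"


# The annotation mirrors the documented agent_fn type.
agent_fn: Callable[[dict], Awaitable[str]] = echo_agent

# Assumed constructor call, based only on the documented parameters:
# runner = EvalRunner(agent_id="demo-agent", agent_fn=agent_fn,
#                     output_dir="eval_reports")

if __name__ == "__main__":
    # Exercise the coroutine directly to check that it honours the contract.
    print(asyncio.run(agent_fn({"prompt": "hello"})))
```

Because agent_fn is awaited, a synchronous agent would need to be wrapped in an async def before being passed in.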
run
async def run(eval_spec: list[dict[str, Any]],
              agent_system_prompt: str | None = None,
              tool_schemas: list[dict[str, Any]] | None = None) -> EvalReport
Run the full evaluation suite.
Parameters
eval_spec : list[dict]
The evals section from the agentspec.
agent_system_prompt : str | None
The agent's system prompt, used when generating synthetic eval cases.
tool_schemas : list[dict] | None
JSON schemas of the agent's tools, used for grounding.