
evals.runner

EvalRunner — orchestrates evaluation suites against registered agents.

EvalReport Objects

@dataclass
class EvalReport()

Public-facing eval report returned by EvalRunner.run().

EvalRunner Objects

class EvalRunner()

Executes evaluation suites against registered agents.

Parameters

agent_id : str
    Identifier of the agent to evaluate.
agent_fn : Callable[[dict], Awaitable[str]] | None
    Async function that takes {"prompt": ...} and returns agent output. If not provided, a no-op stub is used.
output_dir : str
    Directory for storing eval reports.
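The `agent_fn` contract above can be illustrated with a minimal sketch. It assumes only what the parameter description states: an async callable that receives a dict with a `"prompt"` key and returns a string. `echo_agent`, `noop_stub` and `resolve_agent_fn` are hypothetical names; in particular, the real no-op stub's return value is not documented, so an empty string is assumed here.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable

# Type alias matching the documented agent_fn signature:
# an async callable that takes {"prompt": ...} and returns the agent's output.
AgentFn = Callable[[dict[str, Any]], Awaitable[str]]


async def echo_agent(request: dict[str, Any]) -> str:
    """Hypothetical agent_fn that echoes the prompt back.

    A real agent_fn would call a model or agent framework here.
    """
    prompt = request["prompt"]
    await asyncio.sleep(0)  # stand-in for real async I/O, e.g. a model API call
    return f"echo: {prompt}"


async def noop_stub(request: dict[str, Any]) -> str:
    """Hypothetical stand-in for the no-op stub used when agent_fn is None.

    The actual stub's behaviour is not documented; returning "" is an assumption.
    """
    return ""


def resolve_agent_fn(agent_fn: AgentFn | None) -> AgentFn:
    # Mirrors the documented fallback: use a stub when no agent_fn is given.
    return agent_fn if agent_fn is not None else noop_stub


if __name__ == "__main__":
    fn = resolve_agent_fn(echo_agent)
    print(asyncio.run(fn({"prompt": "What is 2 + 2?"})))  # prints "echo: What is 2 + 2?"
```

With `evals.runner` importable, a function like `echo_agent` would be passed as the `agent_fn` argument alongside `agent_id` and `output_dir` when constructing `EvalRunner`.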

run

async def run(eval_spec: list[dict[str, Any]],
              agent_system_prompt: str | None = None,
              tool_schemas: list[dict[str, Any]] | None = None) -> EvalReport

Run the full evaluation suite.

Parameters

eval_spec : list[dict]
    The evals section from the agentspec.
agent_system_prompt : str | None
    System prompt for synthetic case generation.
tool_schemas : list[dict] | None
    Tool JSON schemas for grounding.
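Because `run` is declared `async`, it has to be awaited or driven by an event loop such as `asyncio.run`. The sketch below shows one way to call it from synchronous code. It is built on several loud assumptions: `StubRunner` is a hypothetical stand-in with the same `run` signature that returns a plain dict instead of an `EvalReport`, the keys inside the `eval_spec` entries are placeholders rather than the agentspec schema, and the tool schema uses an OpenAI-style function layout that this reference does not confirm.

```python
from __future__ import annotations

import asyncio
from typing import Any

# Hypothetical eval_spec: the real entry structure comes from the agentspec
# "evals" section; these keys are placeholders, not a documented schema.
eval_spec: list[dict[str, Any]] = [
    {"name": "arithmetic", "cases": [{"prompt": "What is 2 + 2?", "expected": "4"}]},
]

agent_system_prompt = "You are a helpful calculator assistant."

# Assumed OpenAI-style function schema; the format EvalRunner expects is not specified here.
tool_schemas: list[dict[str, Any]] = [
    {
        "name": "add",
        "description": "Add two numbers.",
        "parameters": {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
    },
]


class StubRunner:
    """Hypothetical stand-in with the same run() signature as EvalRunner.run().

    Returns a plain dict instead of an EvalReport so the sketch runs without evals.runner.
    """

    async def run(
        self,
        eval_spec: list[dict[str, Any]],
        agent_system_prompt: str | None = None,
        tool_schemas: list[dict[str, Any]] | None = None,
    ) -> dict[str, Any]:
        await asyncio.sleep(0)  # stand-in for running the suite
        return {"suites": len(eval_spec), "tools": len(tool_schemas or [])}


async def main(runner: StubRunner) -> dict[str, Any]:
    # run() is a coroutine function, so its result must be awaited.
    return await runner.run(
        eval_spec,
        agent_system_prompt=agent_system_prompt,
        tool_schemas=tool_schemas,
    )


if __name__ == "__main__":
    report = asyncio.run(main(StubRunner()))
    print(report)  # prints {'suites': 1, 'tools': 1}
```

In real use, `StubRunner()` would be replaced by an `EvalRunner` instance, and the awaited value would be the `EvalReport` described above.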