evals.spec_adapter

Convert agentspec eval configuration into pydantic-evals Datasets.

EvalCaseSpec Objects

@dataclass
class EvalCaseSpec()

A single evaluation case derived from the agentspec.

EvalSuiteSpec Objects

@dataclass
class EvalSuiteSpec()

A suite of eval cases for one named evaluation.

parse_eval_spec

def parse_eval_spec(eval_spec: list[dict[str, Any]]) -> list[EvalSuiteSpec]

Parse the agentspec evals YAML list into strongly-typed suite specs.

Parameters

eval_spec : list[dict]
    The evals section from the agentspec, e.g.::

        [
            {"name": "KPI Accuracy", "category": "coding", "task_count": 400},
            {"name": "Variance Quality", "category": "reasoning", "task_count": 200},
        ]
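The parsing step can be illustrated with a minimal, self-contained sketch. The fields of EvalSuiteSpec are not documented on this page, so `SuiteSketch` and `parse_eval_spec_sketch` below are hypothetical stand-ins whose fields are inferred only from the keys in the example above. The real classes may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class SuiteSketch:
    """Hypothetical stand-in for EvalSuiteSpec (real fields are not documented here)."""

    name: str
    category: str
    task_count: int


def parse_eval_spec_sketch(eval_spec: list[dict[str, Any]]) -> list[SuiteSketch]:
    """Illustrative parser: check required keys, then build one typed record per entry."""
    required = {"name", "category", "task_count"}
    suites: list[SuiteSketch] = []
    for index, entry in enumerate(eval_spec):
        missing = required - entry.keys()
        if missing:
            raise ValueError(f"evals[{index}] is missing keys: {sorted(missing)}")
        suites.append(
            SuiteSketch(
                name=str(entry["name"]),
                category=str(entry["category"]),
                task_count=int(entry["task_count"]),
            )
        )
    return suites


suites = parse_eval_spec_sketch(
    [
        {"name": "KPI Accuracy", "category": "coding", "task_count": 400},
        {"name": "Variance Quality", "category": "reasoning", "task_count": 200},
    ]
)
total_tasks = sum(suite.task_count for suite in suites)
```

Failing early on a missing key, with the entry index in the message, tends to make a malformed evals section easier to locate than a later attribute error would.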

build_dataset_from_spec

async def build_dataset_from_spec(
eval_spec: list[dict[str, Any]],
agent_system_prompt: str | None = None,
tool_schemas: list[dict[str, Any]] | None = None) -> Any

Convert agentspec eval config into a pydantic-evals Dataset.

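This function is a coroutine and must be awaited. If pydantic_evals is installed, it returns a Dataset instance; otherwise it returns a list of EvalSuiteSpec for manual processing. Callers should therefore check the type of the result before using it.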

Parameters

eval_spec : list[dict]
    The evals config from the agentspec.
agent_system_prompt : str | None
    System prompt of the agent (used for synthetic case generation).
tool_schemas : list[dict] | None
    JSON schemas of the agent's tools (for grounding case generation).
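Because the function is async and its return type is Any, calling code has to await it and then branch on what came back. The sketch below shows one way to do that. `fake_build_dataset_from_spec` is a hypothetical stand-in that always takes the fallback path (returning a plain list), so the example runs without this package or pydantic_evals installed; `describe_result` is likewise an illustrative helper, not part of the module.

```python
from __future__ import annotations

import asyncio
from typing import Any


async def fake_build_dataset_from_spec(
    eval_spec: list[dict[str, Any]],
    agent_system_prompt: str | None = None,
    tool_schemas: list[dict[str, Any]] | None = None,
) -> Any:
    """Hypothetical stand-in: mimics the documented fallback by returning a list."""
    return [dict(entry) for entry in eval_spec]


def describe_result(result: Any) -> str:
    """Branch on the result type, since the real function may return either form."""
    if isinstance(result, list):
        return f"fallback: {len(result)} suite spec(s) to process manually"
    return f"dataset: {type(result).__name__}"


eval_spec = [
    {"name": "KPI Accuracy", "category": "coding", "task_count": 400},
    {"name": "Variance Quality", "category": "reasoning", "task_count": 200},
]

# asyncio.run drives the coroutine from synchronous code.
result = asyncio.run(
    fake_build_dataset_from_spec(eval_spec, agent_system_prompt="You are a KPI analyst.")
)
summary = describe_result(result)
```

With the real function, the same `isinstance(result, list)` check would separate the fallback list from a Dataset instance.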