evals.report

Evaluation report formatting and persistence.

CaseResult Objects

@dataclass
class CaseResult()

Result of a single evaluation case.

ReportSummary Objects

@dataclass
class ReportSummary()

Aggregated summary of an eval report.

EvalReportData Objects

@dataclass
class EvalReportData()

Full evaluation report data.

format_report

def format_report(eval_id: str,
                  agent_id: str,
                  case_results: list[CaseResult],
                  started_at: datetime | None = None,
                  metadata: dict[str, Any] | None = None) -> EvalReportData

Build a structured eval report from individual case results.

Parameters

eval_id : str
    Unique identifier for this eval run.
agent_id : str
    The agent that was evaluated.
case_results : list[CaseResult]
    Results for each evaluation case.
started_at : datetime | None
    When the eval run started.
metadata : dict | None
    Additional metadata (spec version, model, etc.).
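
Example

The fields of CaseResult, ReportSummary and EvalReportData are not listed on this page, so the following is only a rough sketch of how a function with this signature might aggregate case results. The dataclass fields (case_id, passed, total, pass_rate and so on) and the name format_report_sketch are assumptions made for illustration, not the module's actual definitions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


# Hypothetical stand-ins: the real field names are not documented here.
@dataclass
class CaseResult:
    case_id: str
    passed: bool


@dataclass
class ReportSummary:
    total: int
    passed: int
    failed: int
    pass_rate: float


@dataclass
class EvalReportData:
    eval_id: str
    agent_id: str
    started_at: Optional[datetime]
    summary: ReportSummary
    case_results: list[CaseResult]
    metadata: dict[str, Any] = field(default_factory=dict)


def format_report_sketch(
    eval_id: str,
    agent_id: str,
    case_results: list[CaseResult],
    started_at: Optional[datetime] = None,
    metadata: Optional[dict[str, Any]] = None,
) -> EvalReportData:
    """Illustrative aggregation only; not the library's implementation."""
    total = len(case_results)
    passed = sum(1 for case in case_results if case.passed)
    summary = ReportSummary(
        total=total,
        passed=passed,
        failed=total - passed,
        # Avoid dividing by zero when no cases were run.
        pass_rate=passed / total if total else 0.0,
    )
    return EvalReportData(
        eval_id=eval_id,
        agent_id=agent_id,
        started_at=started_at,
        summary=summary,
        case_results=list(case_results),
        metadata=dict(metadata or {}),
    )


report = format_report_sketch(
    eval_id="eval-001",
    agent_id="agent-a",
    case_results=[CaseResult("c1", True), CaseResult("c2", False)],
    started_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
    metadata={"model": "example-model"},
)
```

Copying metadata into a fresh dict keeps the report independent of the caller's dictionary, which is one reasonable choice when None is the default.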

save_report_json

def save_report_json(report: EvalReportData,
                     output_dir: str = _DEFAULT_EVALS_DIR) -> str

Persist an eval report as JSON. Returns the file path.
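
Example

This page does not specify the file naming scheme or the value of _DEFAULT_EVALS_DIR. The sketch below shows one plausible way to persist a dataclass report as JSON and return its path. The file name pattern (eval_id plus ".json"), the minimal EvalReportData stand-in, and the name save_report_json_sketch are all assumptions for illustration.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Optional


# Hypothetical, minimal stand-in for the real EvalReportData.
@dataclass
class EvalReportData:
    eval_id: str
    agent_id: str
    started_at: Optional[datetime] = None
    metadata: dict[str, Any] = field(default_factory=dict)


def save_report_json_sketch(report: EvalReportData, output_dir: str) -> str:
    """Illustrative only: write the report as JSON and return the file path."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: one file per eval run, keyed by eval_id.
    path = out_dir / f"{report.eval_id}.json"
    with path.open("w", encoding="utf-8") as fh:
        # default=str turns datetime values into strings, since json
        # cannot serialize datetime objects directly.
        json.dump(asdict(report), fh, indent=2, default=str)
    return str(path)


with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_report_json_sketch(
        EvalReportData(
            eval_id="eval-001",
            agent_id="agent-a",
            started_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
        ),
        output_dir=tmp,
    )
    with open(saved_path, encoding="utf-8") as fh:
        loaded = json.load(fh)
```

dataclasses.asdict recurses into nested dataclasses, so a report that embeds per-case results and a summary would serialize in a single call under the same approach.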