Agent Runtimes


Interface: EvalSpec

Defined in: types/evals.ts:9

Evaluation benchmark specification.

Properties

category

category: "Coding" | "Knowledge" | "Reasoning" | "Agentic" | "Safety"

Defined in: types/evals.ts:19

Benchmark category. One of: Coding, Knowledge, Reasoning, Agentic, or Safety
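
Because category is a closed string-literal union, values read from untyped input (JSON files, API responses) need a runtime check before they can be treated as a valid category. The sketch below shows one way to do that. The names EVAL_CATEGORIES, EvalCategory, and isEvalCategory are hypothetical helpers introduced for illustration and are not part of the documented API.

```typescript
// Hypothetical helper: the five literals mirror the documented `category` union.
const EVAL_CATEGORIES = ["Coding", "Knowledge", "Reasoning", "Agentic", "Safety"] as const;

// Derive the union type from the array so the two cannot drift apart.
type EvalCategory = (typeof EVAL_CATEGORIES)[number];

// Type guard: narrows an unknown value to EvalCategory when it matches exactly.
function isEvalCategory(value: unknown): value is EvalCategory {
  return (
    typeof value === "string" &&
    (EVAL_CATEGORIES as readonly string[]).includes(value)
  );
}
```

The comparison is case-sensitive, so "coding" would be rejected while "Coding" is accepted.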


description

description: string

Defined in: types/evals.ts:17

Description of the evaluation


difficulty

difficulty: "easy" | "medium" | "hard" | "expert"

Defined in: types/evals.ts:27

Difficulty level
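
If evals need to be sorted or filtered by difficulty, the string literals have to be mapped to numbers first. The following sketch assumes the levels are ordered as they appear in the union (easy < medium < hard < expert); this page does not state that ordering explicitly. Difficulty, DIFFICULTY_RANK, and compareDifficulty are hypothetical names used only for this example.

```typescript
// Mirrors the documented `difficulty` union.
type Difficulty = "easy" | "medium" | "hard" | "expert";

// Assumed ordering: the order in which the literals are listed in the union.
const DIFFICULTY_RANK: Record<Difficulty, number> = {
  easy: 0,
  medium: 1,
  hard: 2,
  expert: 3,
};

// Comparator suitable for Array.prototype.sort (ascending difficulty).
function compareDifficulty(a: Difficulty, b: Difficulty): number {
  return DIFFICULTY_RANK[a] - DIFFICULTY_RANK[b];
}

const levels: Difficulty[] = ["expert", "easy", "hard", "medium"];
const sortedLevels = [...levels].sort(compareDifficulty);
```

Using a Record keyed by the union means the compiler reports an error if a new difficulty level is added without a rank.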


id

id: string

Defined in: types/evals.ts:11

Unique eval identifier


languages

languages: string[]

Defined in: types/evals.ts:29

Languages relevant to the benchmark


metric

metric: string

Defined in: types/evals.ts:23

Primary metric (e.g., 'pass@1', 'accuracy', 'success_rate')


name

name: string

Defined in: types/evals.ts:15

Display name


source

source: string

Defined in: types/evals.ts:25

Source URL or repository


task_count

task_count: number

Defined in: types/evals.ts:21

Number of tasks in the benchmark


version?

optional version: string

Defined in: types/evals.ts:13

Version of the eval. Optional; may be omitted.
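
Putting the properties together, a value of this type might look like the sketch below. The interface is re-declared locally from the properties listed on this page so the example is self-contained; in a real project it would be imported from the package instead. The benchmark itself, including its id, name, URL, and task count, is made up for illustration, and summarize is a hypothetical helper.

```typescript
// Local reconstruction of the interface from the properties documented above.
interface EvalSpec {
  id: string;
  version?: string;
  name: string;
  description: string;
  category: "Coding" | "Knowledge" | "Reasoning" | "Agentic" | "Safety";
  task_count: number;
  metric: string;
  source: string;
  difficulty: "easy" | "medium" | "hard" | "expert";
  languages: string[];
}

// Hypothetical benchmark: every value here is illustrative, not real data.
const exampleSpec: EvalSpec = {
  id: "example-coding-eval",
  name: "Example Coding Eval",
  description: "Illustrative benchmark used only to show the shape of an EvalSpec.",
  category: "Coding",
  task_count: 100,
  metric: "pass@1",
  source: "https://example.com/example-coding-eval",
  difficulty: "medium",
  languages: ["python", "typescript"],
  // `version` is optional and deliberately omitted.
};

// Hypothetical helper: builds a one-line summary, handling the missing version.
function summarize(spec: EvalSpec): string {
  const version = spec.version ?? "unversioned";
  return `${spec.name} (${version}): ${spec.task_count} tasks, scored by ${spec.metric}`;
}
```

Since version is the only optional property, it is the only field a consumer needs to guard against being undefined.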