EvalSpec
Interface: EvalSpec
Defined in: types/evals.ts:9
Evaluation benchmark specification.
Properties
category
category: "Coding" | "Knowledge" | "Reasoning" | "Agentic" | "Safety"
Defined in: types/evals.ts:19
Category: Coding, Knowledge, Reasoning, Agentic, or Safety
description
description: string
Defined in: types/evals.ts:17
Description of the evaluation
difficulty
difficulty: "easy" | "medium" | "hard" | "expert"
Defined in: types/evals.ts:27
Difficulty level
id
id: string
Defined in: types/evals.ts:11
Unique eval identifier
languages
languages: string[]
Defined in: types/evals.ts:29
Relevant languages
metric
metric: string
Defined in: types/evals.ts:23
Primary metric (e.g., 'pass@1', 'accuracy', 'success_rate')
name
name: string
Defined in: types/evals.ts:15
Display name
source
source: string
Defined in: types/evals.ts:25
Source URL or repository
task_count
task_count: number
Defined in: types/evals.ts:21
Number of tasks in the benchmark
version?
optional version?: string
Defined in: types/evals.ts:13
Optional version string
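Example
The properties above can be read together as a single object shape. The sketch below reconstructs the interface from the listed properties (ordered by their "Defined in" line numbers) and builds one value of that type. The benchmark values and the `formatEvalLabel` helper are hypothetical, made up for illustration, and are not part of the documented API.

```typescript
// Reconstructed from the property list; the real declaration lives in
// types/evals.ts and may carry additional doc comments.
interface EvalSpec {
  id: string;
  version?: string; // the only optional property
  name: string;
  description: string;
  category: "Coding" | "Knowledge" | "Reasoning" | "Agentic" | "Safety";
  task_count: number;
  metric: string;
  source: string;
  difficulty: "easy" | "medium" | "hard" | "expert";
  languages: string[];
}

// Hypothetical benchmark: every value here is invented for illustration.
const exampleSpec: EvalSpec = {
  id: "example-coding-eval",
  name: "Example Coding Eval",
  description: "Small set of function-completion tasks",
  category: "Coding",
  task_count: 10,
  metric: "pass@1",
  source: "https://example.com/example-coding-eval",
  difficulty: "medium",
  languages: ["typescript", "python"],
};

// Hypothetical helper (an assumption, not a library function): builds a
// one-line label and includes the version only when it is present.
function formatEvalLabel(spec: EvalSpec): string {
  const version = spec.version ? ` v${spec.version}` : "";
  return (
    `${spec.name}${version} [${spec.category}, ${spec.difficulty}]: ` +
    `${spec.task_count} tasks, scored by ${spec.metric}`
  );
}

console.log(formatEvalLabel(exampleSpec));
```

Because `category` and `difficulty` are string-literal unions, the compiler rejects a value such as `difficulty: "trivial"`, while `version` can be left out entirely.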