specs/evals
Variables
- AGENTBENCH_EVAL_SPEC_0_0_1
- EVAL_CATALOG
- GPQA_DIAMOND_EVAL_SPEC_0_0_1
- HUMANEVAL_EVAL_SPEC_0_0_1
- MMLU_EVAL_SPEC_0_0_1
- SWE_BENCH_EVAL_SPEC_0_0_1
- SWE_BENCH_VERIFIED_EVAL_SPEC_0_0_1
- TOOLBENCH_EVAL_SPEC_0_0_1
- TRUTHFULQA_EVAL_SPEC_0_0_1