Agent Leaderboard
Last updated:
Model | Score | Cost ($) | Runtime (s) |
|---|---|---|---|
Shown are scores averaged over first across all instances per task category (Radar chart to the right). Averaged scores per task are then aggregated across all tasks to get the overall score (table to the left).
Best Accuracy per Dollar on DRB
Best Accuracy per Second on DRB
Commercial AI products
The following heatmap shows the performance of commercial AI products on Deep Research Bench.

