WebVoyager Leaderboard · Rank 01 · Jun 10 2026

browser-control

browser-control — a shell-native CDP harness — driving Fable 5.

Run

Om Labs’ submissions to the WebVoyager benchmark — 615 live-website tasks across 15 real sites, run with browser-control. Judged by GPT-5.5 (Alumnium vanilla).

99.19%

SOTA · New · Jun 10 2026

610/615 passed

Baseline 98.5% · Judge GPT-5.5 (Alumnium vanilla) · Model claude-fable-5

Run stats

Duration 15h 53m · avg 1m 33s

Tokens 439.3M · avg 714K

Steps 3976 · avg 6.5
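
The averages in the run stats can be re-derived from the totals. The sketch below assumes each average is the run total divided by the 615 tasks in scope; the page does not state the denominator, and the helper names are illustrative only.

```python
# Totals as listed under "Run stats" on this page.
TASKS = 615                          # assumed denominator for every average
DURATION_S = 15 * 3600 + 53 * 60     # 15h 53m expressed in seconds
TOKENS = 439.3e6                     # 439.3M tokens
STEPS = 3976


def per_task(total, tasks=TASKS):
    """Return the mean of a run-level total over the task count."""
    return total / tasks


def fmt_duration(seconds):
    """Format a number of seconds as 'Xm YYs', rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs:02d}s"


avg_duration = fmt_duration(per_task(DURATION_S))
avg_tokens_k = round(per_task(TOKENS) / 1e3)   # thousands of tokens
avg_steps = round(per_task(STEPS), 1)

print(avg_duration, f"{avg_tokens_k}K", avg_steps)
```

Under that assumption the computed values agree with the listed averages of 1m 33s, 714K tokens, and 6.5 steps.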

About

Methodology

WebVoyager benchmarks agents that browse real, live websites and return natural-language answers. Example: “Search for women’s hiking boots on Amazon filtered to waterproof, 4★+, size 6.” Other notable entrants include OpenAI’s Computer-Using Agent, Google DeepMind’s Project Mariner, and H Company’s Surfer 2, all of which have reported results on this benchmark.

Introduced by He et al. 2024. The browser-control runs (Fable 5, Opus 4.8) share one harness — browser-control, a shell-native CDP tool — and differ only in the driving model. The Jina run used a separate, custom harness that was not open-sourced.

Note — Judged by GPT-5.5 (Alumnium vanilla) on screenshot + response. 615-task scope (643 minus the 24-task Alumnium-removed set and 4 currently-impossible tasks). Self-reported.
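
The scope arithmetic in the note, and the headline pass rate that follows from it, can be checked with a few lines. This is a minimal sketch using only figures stated on this page; the variable names are illustrative.

```python
# Figures taken from the note and the headline result on this page.
FULL_SET = 643           # size of the original WebVoyager task set
ALUMNIUM_REMOVED = 24    # tasks in the Alumnium-removed set
IMPOSSIBLE = 4           # tasks currently impossible to complete
PASSED = 610             # tasks judged as passed in this run

scope = FULL_SET - ALUMNIUM_REMOVED - IMPOSSIBLE
failed = scope - PASSED
pass_rate = round(100 * PASSED / scope, 2)   # percentage, two decimals

print(scope, failed, pass_rate)
```

The result matches the 615-task scope, the 5 failures, and the 99.19% score reported here.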

Per-site results

Tasks · 615

Print Summary

Total 615 · Passed 610 · Failed 5

The full task log is available on the interactive web page.

Citation

Attribution

Please cite this page as an Om Labs WebVoyager leaderboard submission. For questions, contact Keon Woo Kim <keon@omlabs.xyz>.

@misc{omlabs_webvoyager_2026,
  title = {Om Labs WebVoyager Leaderboard Submission},
  author = {Kim, Keon Woo},
  year = {2026},
  howpublished = {https://webvoyager.omlabs.xyz},
  note = {Contact: keon@omlabs.xyz}
}