
This market predicts what the highest score on the SWE-Bench Pro public dataset leaderboard will be as of January 1, 2027.
Current top performers on the SWE-Bench Pro public dataset (as of October 21, 2025):
claude-4-5-Sonnet: 43.6%
claude-4-Sonnet: 42.7%
Resolution Criteria: This market will resolve to the score range that contains the highest score on the official SWE-Bench Pro public dataset leaderboard (https://scale.com/leaderboard/swe_bench_pro_public) on January 1, 2027.
Claude 4 Sonnet and Claude Opus 4.5 were released half a year apart, and in that time the score increased by 3.1%.
GPT-5.2 and GPT-5.3 Codex were released 2 months apart, and the score increased by 0.4% (per OpenAI's reporting). Given this data, how on Earth does this market think there is a 78% chance the score is going to increase by at least 25% in the next 10 months?
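
To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It assumes progress is linear, that "half a year" means 6 months, and that all the percentages above are percentage points on the leaderboard score. The function name `projected_gain` is made up for illustration, and none of this is an official forecast method.

```python
def projected_gain(gain_points: float, months_elapsed: float, horizon_months: float) -> float:
    """Linearly extrapolate an observed score gain over a longer horizon.

    Assumption: the monthly rate of improvement stays constant.
    """
    monthly_rate = gain_points / months_elapsed
    return monthly_rate * horizon_months


HORIZON_MONTHS = 10     # time left until the market resolves, per the comment
REQUIRED_GAIN = 25.0    # gain the market is said to price at 78%

# Observed gains quoted in the comment (treated as percentage points).
claude_projection = projected_gain(3.1, 6, HORIZON_MONTHS)  # Claude 4 Sonnet -> Claude Opus 4.5
gpt_projection = projected_gain(0.4, 2, HORIZON_MONTHS)     # GPT-5.2 -> GPT-5.3 Codex

for label, projection in [("Claude pace", claude_projection), ("GPT pace", gpt_projection)]:
    shortfall = REQUIRED_GAIN - projection
    print(f"{label}: +{projection:.1f} points in {HORIZON_MONTHS} months, "
          f"{shortfall:.1f} short of +{REQUIRED_GAIN:.0f}")
```

Under those assumptions, the Claude pace projects to roughly +5.2 points and the GPT pace to +2.0 points over 10 months, both far below +25. A linear extrapolation from two data points is crude, so this only shows that the market must be pricing in a sharp acceleration.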