| 1 | o1-mini | 1237 (Coding) | Coding |
| 2 | o3-mini | 1137 (Coding), 40.09/39.89 (MultiChallenge) | Coding, MultiChallenge |
| 3 | GPT-4o (November 2024) | 1132 (Coding) | Coding |
| 4 | o1-preview | 1123 (Coding) | Coding |
| 5 | Gemini 2.0 Flash Experimental (December 2024) | 1111 (Coding) | Coding |
| 6 | Gemini 2.0 Pro (December 2024) | 1109 (Coding) | Coding |
| 7 | Gemini 2.0 Flash Thinking (January 2025) | 1108 (Coding), 37.78 (MultiChallenge) | Coding, MultiChallenge |
| 8 | DeepSeek R1 | 1100 (Coding) | Coding |
| 9 | o1 (December 2024) | 1083 (Coding), 44.93 (MultiChallenge) | Coding, MultiChallenge |
| 10 | Claude 3.5 Sonnet (June 2024) | 1079 (Coding), 96.60 (Math) | Coding, Math |
| 11 | GPT-4o (August 2024) | 1045 (Coding), 95.68 (Math) | Coding, Math |
| 12 | GPT-4o (May 2024) | 1036 (Coding) | Coding |
| 13 | GPT-4 Turbo Preview | 1034 (Coding), 95.10 (Math) | Coding, Math |
| 14 | Mistral Large 2 | 1029 (Coding), 93.94 (Math) | Coding, Math |
| 15 | Llama 3.1 405B Instruct | 1022 (Coding), 95.60 (Math) | Coding, Math |
| 16 | Gemini 1.5 Pro (August 27, 2024) | 1007 (Coding), 94.69 (Math) | Coding, Math |
| 17 | Claude 3 Opus | 95.19 (Math) | Math |
| 18 | Claude 3 Sonnet | 93.28 (Math) | Math |
| 19 | Claude 3.7 Sonnet Thinking (February 2025) | 51.58 (MultiChallenge) | MultiChallenge |
| 20 | GPT-4.5 Preview (February 2025) | 43.77 (MultiChallenge) | MultiChallenge |