Compare DeepSeek-V3 with Other Models
DeepSeek-V3 tops the leaderboard among open-source models and rivals the most advanced closed-source models globally.
| | Benchmark (Metric) | DeepSeek-V3 | DeepSeek-V2.5-0905 | Qwen2.5-72B-Inst | Llama-3.1-405B-Inst | Claude-3.5-Sonnet-1022 | GPT-4o-0513 |
|---|---|---|---|---|---|---|---|
| | Architecture | MoE | MoE | Dense | Dense | - | - |
| | # Activated Params | 37B | 21B | 72B | 405B | - | - |
| | # Total Params | 671B | 236B | 72B | 405B | - | - |
| English | MMLU (EM) | 88.5 | 80.6 | 85.3 | 88.6 | 88.3 | 87.2 |
| | MMLU-Redux (EM) | 89.1 | 80.3 | 85.6 | 86.2 | 88.9 | 88.0 |
| | MMLU-Pro (EM) | 75.9 | 66.2 | 71.6 | 73.3 | 78.0 | 72.6 |
| | DROP (3-shot F1) | 91.6 | 87.8 | 76.7 | 88.7 | 88.3 | 83.7 |
| | IF-Eval (Prompt Strict) | 86.1 | 80.6 | 84.1 | 86.0 | 86.5 | 84.3 |
| | GPQA-Diamond (Pass@1) | 59.1 | 41.3 | 49.0 | 51.1 | 65.0 | 49.9 |
| | SimpleQA (Correct) | 24.9 | 10.2 | 9.1 | 17.1 | 28.4 | 38.2 |
| | FRAMES (Acc.) | 73.3 | 65.4 | 69.8 | 70.0 | 72.5 | 80.5 |
| | LongBench v2 (Acc.) | 48.7 | 35.4 | 39.4 | 36.1 | 41.0 | 48.1 |
| Code | HumanEval-Mul (Pass@1) | 82.6 | 77.4 | 77.3 | 77.2 | 81.7 | 80.5 |
| | LiveCodeBench (Pass@1-COT) | 40.5 | 29.2 | 31.1 | 28.4 | 36.3 | 33.4 |
| | LiveCodeBench (Pass@1) | 37.6 | 28.4 | 28.7 | 30.1 | 32.8 | 34.2 |
| | Codeforces (Percentile) | 51.6 | 35.6 | 24.8 | 25.3 | 20.3 | 23.6 |
| | SWE Verified (Resolved) | 42.0 | 22.6 | 23.8 | 24.5 | 50.8 | 38.8 |
| | Aider-Edit (Acc.) | 79.7 | 71.6 | 65.4 | 63.9 | 84.2 | 72.9 |
| | Aider-Polyglot (Acc.) | 49.6 | 18.2 | 7.6 | 5.8 | 45.3 | 16.0 |
| Math | AIME 2024 (Pass@1) | 39.2 | 16.7 | 23.3 | 23.3 | 16.0 | 9.3 |
| | MATH-500 (EM) | 90.2 | 74.7 | 80.0 | 73.8 | 78.3 | 74.6 |
| | CNMO 2024 (Pass@1) | 43.2 | 10.8 | 15.9 | 6.8 | 13.1 | 10.8 |
| Chinese | CLUEWSC (EM) | 90.9 | 90.4 | 91.4 | 84.7 | 85.4 | 87.9 |
| | C-Eval (EM) | 86.5 | 79.5 | 86.1 | 61.5 | 76.7 | 76.0 |
| | C-SimpleQA (Correct) | 64.1 | 54.1 | 48.4 | 50.4 | 51.3 | 59.3 |
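Several of the code and math rows above report Pass@1, i.e. the probability that a single sampled completion solves the problem. The exact sampling setup behind these particular numbers is not given here; as a general reference, the sketch below shows the standard unbiased pass@k estimator (Chen et al., 2021), of which Pass@1 is the k = 1 case. The function name and the example counts are illustrative, not taken from this evaluation.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator for a single problem.

    n: total completions sampled for the problem
    c: completions that pass all tests (judged correct)
    k: evaluation budget (k = 1 gives Pass@1)
    """
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 10 samples per problem, 3 of them correct.
print(pass_at_k(n=10, c=3, k=1))  # 0.3
```

A benchmark-level Pass@1 score is this per-problem quantity averaged over all problems and reported on a 0-100 scale.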