DeepSeek V3 Benchmark Comparison

DeepSeek V3 tops the leaderboard among open-source models and rivals the most advanced closed-source models globally.

| Benchmark (Metric) | DeepSeek V3 | DeepSeek V2.5 (0905) | Qwen2.5 (72B-Inst) | Llama3.1 (405B-Inst) | Claude-3.5 (Sonnet-1022) | GPT-4o (0513) |
|---|---|---|---|---|---|---|
| Architecture | MoE | MoE | Dense | Dense | - | - |
| # Activated Params | 37B | 21B | 72B | 405B | - | - |
| # Total Params | 671B | 236B | 72B | 405B | - | - |
| **English** | | | | | | |
| MMLU (EM) | 88.5 | 80.6 | 85.3 | 88.6 | 88.3 | 87.2 |
| MMLU-Redux (EM) | 89.1 | 80.3 | 85.6 | 86.2 | 88.9 | 88.0 |
| MMLU-Pro (EM) | 75.9 | 66.2 | 71.6 | 73.3 | 78.0 | 72.6 |
| DROP (3-shot F1) | 91.6 | 87.8 | 76.7 | 88.7 | 88.3 | 83.7 |
| IF-Eval (Prompt Strict) | 86.1 | 80.6 | 84.1 | 86.0 | 86.5 | 84.3 |
| GPQA-Diamond (Pass@1) | 59.1 | 41.3 | 49.0 | 51.1 | 65.0 | 49.9 |
| SimpleQA (Correct) | 24.9 | 10.2 | 9.1 | 17.1 | 28.4 | 38.2 |
| FRAMES (Acc.) | 73.3 | 65.4 | 69.8 | 70.0 | 72.5 | 80.5 |
| LongBench v2 (Acc.) | 48.7 | 35.4 | 39.4 | 36.1 | 41.0 | 48.1 |
| **Code** | | | | | | |
| HumanEval-Mul (Pass@1) | 82.6 | 77.4 | 77.3 | 77.2 | 81.7 | 80.5 |
| LiveCodeBench (Pass@1-COT) | 40.5 | 29.2 | 31.1 | 28.4 | 36.3 | 33.4 |
| LiveCodeBench (Pass@1) | 37.6 | 28.4 | 28.7 | 30.1 | 32.8 | 34.2 |
| Codeforces (Percentile) | 51.6 | 35.6 | 24.8 | 25.3 | 20.3 | 23.6 |
| SWE Verified (Resolved) | 42.0 | 22.6 | 23.8 | 24.5 | 50.8 | 38.8 |
| Aider-Edit (Acc.) | 79.7 | 71.6 | 65.4 | 63.9 | 84.2 | 72.9 |
| Aider-Polyglot (Acc.) | 49.6 | 18.2 | 7.6 | 5.8 | 45.3 | 16.0 |
| **Math** | | | | | | |
| AIME 2024 (Pass@1) | 39.2 | 16.7 | 23.3 | 23.3 | 16.0 | 9.3 |
| MATH-500 (EM) | 90.2 | 74.7 | 80.0 | 73.8 | 78.3 | 74.6 |
| CNMO 2024 (Pass@1) | 43.2 | 10.8 | 15.9 | 6.8 | 13.1 | 10.8 |
| **Chinese** | | | | | | |
| CLUEWSC (EM) | 90.9 | 90.4 | 91.4 | 84.7 | 85.4 | 87.9 |
| C-Eval (EM) | 86.5 | 79.5 | 86.1 | 61.5 | 76.7 | 76.0 |
| C-SimpleQA (Correct) | 64.1 | 54.1 | 48.4 | 50.4 | 51.3 | 59.3 |
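Several rows above report Pass@1 (e.g. GPQA-Diamond, HumanEval-Mul, LiveCodeBench, AIME 2024, CNMO 2024). As a rough illustration of what that metric means, the sketch below uses the standard unbiased pass@k estimator (1 minus the chance that k draws contain no passing sample); the function name and the toy numbers are illustrative assumptions, not data from this page, and each benchmark's exact sampling setup may differ.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: total samples generated for the problem
    c: number of samples that passed (correct answer / all tests)
    k: the k in pass@k (requires k <= n)
    """
    if n - c < k:
        return 1.0  # not enough failures to fill k draws, so a pass is guaranteed
    # 1 - probability that all k drawn samples are failures
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 3 problems, 20 samples each, with varying pass counts.
results = [(20, 4), (20, 0), (20, 11)]
score = sum(pass_at_k(n, c, k=1) for n, c in results) / len(results)
print(f"pass@1 = {score:.3f}")  # averaged over problems; tables report it as a percentage
```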
