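Bai Xiang's team at Huazhong University of Science and Technology, together with South China University of Technology, the University of Adelaide, and ByteDance, has released OCRBench v2, a new-generation OCR evaluation benchmark, along with its latest private-data leaderboards (September 2025).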

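Seed1.6-vision, Qwen3-Omni-30B-A3B-Instruct, and Gemini-2.5-Pro took the top three places on both the Chinese and English leaderboards. Yet even these state-of-the-art models only hover around the 60-point "passing mark" on average, and they still cannot fully meet the complex and varied demands of real-world applications.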

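The OCRBench v2 leaderboards cover a broad range: 58 mainstream large multimodal models (LMMs) released worldwide between 2023 and 2025. They include closed-source models from Google (Gemini 2.5 Pro), ByteDance (Seed1.6-vision), and OpenAI (GPT-5), as well as strong open-source LMMs from Alibaba (Qwen-VL) and Shanghai AI Laboratory (InternVL). The results show that LMMs have improved markedly on OCR tasks.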

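The leaderboards report per-dimension scores across eight core capabilities, such as text referring (localization) and knowledge reasoning, giving a direct, quantitative view of how each model actually performs across application scenarios.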

OCRBench v2 private-data English leaderboard (September 2025)

| Rank | Method | Venue | Open-source | LLM Size | Average | Recognition | Referring | Spotting | Extraction | Parsing | Calculation | Understanding | Reasoning |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Seed1.6-vision | - | No | - | 62.2 | 70.1 | 59.8 | 38.0 | 89.0 | 22.3 | 84.0 | 76.4 | 58.0 |
| 2 | Qwen3-Omni-30B-A3B-Instruct | arXiv 2025 | Yes | 30B | 61.3 | 72.3 | 62.0 | 45.6 | 93.5 | 20.8 | 67.0 | 74.1 | 55.3 |
| 3 | Gemini-2.5-Pro | - | No | - | 59.3 | 70.9 | 45.8 | 13.4 | 93.7 | 26.9 | 84.6 | 75.8 | 63.0 |
| 4 | Llama-3.1-Nemotron-Nano-VL-8B-V1 | - | Yes | 8B | 56.4 | 62.9 | 61.3 | 68.6 | 88.2 | 10.0 | 44.1 | 75.3 | 41.0 |
| 5 | GPT-5-2025-08-07 | - | No | - | 55.5 | 69.3 | 28.2 | 6.5 | 90.1 | 24.8 | 83.7 | 76.4 | 65.4 |
| 6 | Ovis2.5-8B | arXiv 2025 | Yes | 8B | 54.1 | 63.6 | 31.2 | 1.7 | 89.8 | 24.8 | 85.4 | 75.3 | 61.0 |
| 7 | Gemini1.5-Pro | arXiv 2024 | No | - | 51.6 | 59.1 | 41.2 | 6.6 | 89.5 | 22.4 | 54.7 | 78.8 | 60.3 |
| 8 | SAIL-VL2-8B | arXiv 2025 | Yes | 8B | 49.3 | 67.4 | 30.4 | 2.8 | 91.0 | 23.7 | 51.6 | 75.6 | 52.0 |
| 9 | MiniCPM-V-4.5-8B | arXiv 2025 | Yes | 8B | 48.4 | 60.3 | 25.7 | 2.2 | 91.1 | 23.4 | 54.1 | 78.6 | 52.0 |
| 10 | GPT-4o | arXiv 2024 | No | - | 47.6 | 58.6 | 23.4 | 0.0 | 87.4 | 23.1 | 51.6 | 74.4 | 62.3 |
| 11 | Claude3.5-sonnet | - | No | - | 47.5 | 52.9 | 24.9 | 2.5 | 86.9 | 23.8 | 61.4 | 74.4 | 53.0 |
| 12 | InternVL3.5-14B | arXiv 2025 | Yes | 14B | 47.1 | 50.4 | 24.2 | 1.7 | 88.0 | 23.2 | 61.0 | 72.6 | 55.8 |
| 13 | Step-1V | - | No | - | 46.8 | 56.7 | 27.4 | 2.6 | 86.3 | 33.3 | 42.6 | 76.6 | 48.7 |
| 14 | InternVL3-14B | - | Yes | 14B | 46.8 | 55.8 | 24.5 | 2.1 | 89.3 | 21.0 | 59.5 | 72.0 | 50.0 |
| 15 | Ovis2-8B | - | Yes | 7B | 46.1 | 54.2 | 20.9 | 0.0 | 83.6 | 24.2 | 54.7 | 74.1 | 57.3 |
| 16 | InternVL3.5-8B | arXiv 2025 | Yes | 8B | 46.0 | 49.6 | 25.1 | 0.8 | 86.0 | 21.5 | 63.9 | 75.5 | 45.5 |
| 17 | LLaVA-OneVision-1.5-8B-Instruct | - | Yes | 8B | 46.0 | 54.9 | 22.6 | 0 | 84.4 | 23.5 | 58.0 | 74.5 | 50.3 |
| 18 | InternVL3-8B | - | Yes | 8B | 45.3 | 49.7 | 22.3 | 0.2 | 86.8 | 22.4 | 57.0 | 70.7 | 53.0 |
| 19 | Grok4 | - | No | - | 45.0 | 49.7 | 20.4 | 3.6 | 63.0 | 22.4 | 77.8 | 71.0 | 52.0 |
| 20 | GPT-4o-mini | - | No | - | 44.1 | 55.3 | 21.8 | 0.0 | 85.4 | 20.6 | 45.2 | 75.5 | 49.0 |
| 21 | SAIL-VL-1.6-8B | arXiv 2025 | Yes | 8B | 43.1 | 56.7 | 24.1 | 2.2 | 79.3 | 22.8 | 45.4 | 69.2 | 45.3 |
| 22 | InternVL2.5-26B | arXiv 2024 | Yes | 20B | 42.6 | 53.5 | 21.4 | 0 | 84.0 | 21.4 | 51.5 | 67.5 | 41.5 |
| 23 | WeThink-Qwen2.5VL-7B | arXiv 2025 | Yes | 7B | 42.5 | 59.0 | 23.7 | 3.6 | 34.0 | 25.2 | 71.9 | 73.4 | 49.0 |
| 24 | Claude-sonnet4-20250514 | - | No | - | 42.4 | 56.2 | 24.6 | 1.4 | 78.1 | 24.7 | 36.8 | 78.1 | 39.5 |
| 25 | Qwen2-VL-7B | arXiv 2024 | Yes | 8B | 42.3 | 47.0 | 42.0 | 1.5 | 90.2 | 13.7 | 36.4 | 71.1 | 36.6 |
| 26 | Qwen2.5-VL-7B | arXiv 2025 | Yes | 8B | 41.8 | 51.5 | 24.5 | 3.1 | 64.8 | 13.1 | 53.3 | 78.6 | 45.5 |
| 27 | InternVL2-26B | SCIS 2024 | Yes | 20B | 41.8 | 56.0 | 21.2 | 0 | 80.5 | 23.9 | 40.3 | 72.1 | 40.7 |
| 28 | MiniCPM-o-2.6 | - | Yes | 8B | 41.6 | 54.1 | 24.7 | 0.3 | 74.4 | 17.6 | 39.2 | 75.7 | 47.0 |
| 29 | DeepSeek-VL2-Small | arXiv 2024 | Yes | 16B | 41.0 | 56.6 | 23.7 | 0 | 86.4 | 18.9 | 30.6 | 72.2 | 39.5 |
| 30 | InternVL2.5-8B | arXiv 2024 | Yes | 8B | 40.5 | 48.9 | 21.2 | 0 | 82.1 | 20.3 | 41.2 | 67.8 | 42.3 |
| 31 | Pixtral-12B | arXiv 2024 | Yes | 12B | 38.4 | 45.1 | 21.8 | 0 | 71.6 | 21.7 | 30.4 | 77.3 | 39.5 |
| 32 | Phi-4-MultiModal | arXiv 2025 | Yes | 5.6B | 38.1 | 58.4 | 19.0 | 0 | 53.5 | 38.7 | 28.7 | 66.8 | 39.8 |
| 33 | Ovis1.6-3B | arXiv 2024 | Yes | 3B | 38.0 | 48.5 | 19.5 | 0 | 69.2 | 20.7 | 22.1 | 74.6 | 49.5 |
| 34 | GLM-4v-9B | arXiv 2024 | Yes | 9B | 37.1 | 52.7 | 20.6 | 0 | 79.4 | 15.9 | 21.5 | 74.7 | 32.0 |
| 35 | InternVL2-8B | SCIS 2024 | Yes | 8B | 36.1 | 43.0 | 21.6 | 0 | 70.2 | 19.2 | 35.6 | 65.9 | 33.6 |
| 36 | Molmo-7B | CVPR 2025 | Yes | 8B | 33.9 | 40.8 | 19.5 | 0 | 51.7 | 10.0 | 33.9 | 67.0 | 48.0 |
| 37 | XComposer2-4KHD | NeurIPS 2025 | Yes | 7B | 33.9 | 39.5 | 12.0 | 0 | 69.7 | 26.0 | 20.2 | 68.2 | 35.8 |
| 38 | LLaVA-OV-7B | arXiv 2024 | Yes | 8B | 33.7 | 45.4 | 18.5 | 0 | 60.0 | 15.5 | 32.0 | 59.0 | 39.3 |
| 39 | MiniCPM-V-2.6 | arXiv 2024 | Yes | 8B | 33.0 | 52.2 | 18.6 | 0.3 | 45.8 | 19.6 | 20.9 | 68.9 | 37.3 |
| 40 | Cambrian-1-8B | NeurIPS 2025 | Yes | 8B | 32.3 | 44.0 | 19.0 | 0 | 52.3 | 19.0 | 20.7 | 64.0 | 39.3 |
| 41 | Kimi-VL-A3B-16B | arXiv 2025 | Yes | 16B | 32.1 | 49.1 | 13.5 | 0 | 28.8 | 21.9 | 37.6 | 69.4 | 36.2 |
| 42 | LLaVA-Next-8B | - | Yes | 8B | 28.5 | 41.4 | 17.0 | 0 | 49.0 | 12.9 | 16.1 | 60.9 | 30.5 |
| 43 | Idefics3-8B | NeurIPS 2024 Workshop | Yes | 8B | 26.0 | 37.4 | 13.0 | 0 | 28.9 | 19.4 | 21.1 | 65.4 | 21.8 |
| 44 | Eagle-X5-7B | ICLR 2025 | Yes | 8B | 25.7 | 34.6 | 18.5 | - | 9.7 | 18.5 | 24.0 | 63.1 | 37.0 |
| 45 | Qwen-VL-chat | arXiv 2023 | Yes | 8B | 25.7 | 34.1 | 12.6 | 0.1 | 42.6 | 19.5 | 18.4 | 58.3 | 20.3 |
| 46 | Qwen-VL | arXiv 2023 | Yes | 8B | 24.8 | 35.9 | 4.2 | 0 | 38.7 | 28.5 | 13.8 | 60.1 | 16.9 |
| 47 | DeepSeek-VL-7B | arXiv 2024 | Yes | 7B | 24.5 | 33.5 | 13.7 | 0 | 19.1 | 11.7 | 24.8 | 60.5 | 32.5 |
| 48 | Monkey | CVPR 2024 | Yes | 8B | 24.2 | 31.5 | 0.1 | 0 | 34.4 | 26.3 | 17.7 | 61.4 | 22.4 |
| 49 | DocOwl2 | arXiv 2024 | Yes | 7B | 23.4 | 25.4 | 7.5 | 0 | 47.1 | 26.2 | 8.3 | 52.8 | 19.5 |
| 50 | TextMonkey | arXiv 2024 | Yes | 8B | 23.4 | 39.8 | 1.6 | 0 | 27.6 | 24.8 | 10.2 | 62.3 | 21.2 |
| 51 | VILA1.5-8B | CVPR 2024 | Yes | 8B | 23.2 | 36.0 | 14.5 | 0 | 26.0 | 17.4 | 20.3 | 44.7 | 27.0 |
| 52 | EMU2-chat | CVPR 2024 | Yes | 37B | 20.2 | 34.3 | 0 | 0 | 20.4 | 21.3 | 20.3 | 47.1 | 18.3 |
| 53 | CogVLM-chat | NeurIPS 2024 | Yes | 7B | 19.9 | 40.8 | 0 | 0 | 1.6 | 18.6 | 10.9 | 60.2 | 26.8 |
| 54 | Yi-VL-6B | arXiv 2024 | Yes | 6B | 19.7 | 31.1 | 4.0 | 0 | 23.4 | 22.5 | 18.1 | 43.0 | 15.5 |
| 55 | mPLUG-Owl3 | arXiv 2024 | Yes | 8B | 16.5 | 34.9 | 17.0 | 0 | 12.0 | 14.9 | 24.1 | 50.7 | 25.5 |
| 56 | Janus-1.3B | CVPR 2025 | Yes | 1.3B | 14.3 | 32.6 | 0 | 0 | 12.0 | 14.9 | 24.1 | 50.7 | 25.5 |
| 57 | UReader | EMNLP Findings 2023 | Yes | 7B | 14.1 | 20.9 | 0 | 0 | 0 | 20.7 | 11.3 | 39.0 | 20.8 |
| 58 | LLaVAR | arXiv 2023 | Yes | 13B | 12.4 | 13.8 | 0 | 0 | 8.3 | 15.2 | 4.4 | 42.4 | 15.0 |

OCRBench v2 private-data Chinese leaderboard (September 2025)

| Rank | Method | Venue | Open-source | LLM Size | Average | Recognition | Extraction | Parsing | Understanding | Reasoning |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Gemini-2.5-Pro | - | No | - | 62.2 | 72.0 | 74.0 | 35.2 | 90.0 | 39.7 |
| 2 | Seed1.6-vision | - | No | - | 60.5 | 68.1 | 74.1 | 34.0 | 80.0 | 46.6 |
| 3 | Qwen3-Omni-30B-A3B-Instruct | arXiv 2025 | Yes | 30B | 60.0 | 71.0 | 80.3 | 9 | 92.0 | 47.0 |
| 4 | MiniCPM-V-4.5-8B | arXiv 2025 | Yes | 8B | 58.8 | 70.1 | 70.1 | 32.5 | 92.0 | 29.4 |
| 5 | SAIL-VL2-8B | arXiv 2025 | Yes | 8B | 57.6 | 63.4 | 66.4 | 38.9 | 88.0 | 31.1 |
| 6 | Ovis2-8B | - | Yes | 7B | 56.0 | 61.0 | 67.7 | 43.6 | 82.0 | 25.6 |
| 7 | WeThink-Qwen2.5VL-7B | arXiv 2025 | Yes | 7B | 55.8 | 31.9 | 79.6 | 55.5 | 80.0 | 31.8 |
| 8 | Gemini1.5-Pro | arXiv 2024 | No | - | 55.5 | 71.4 | 63.8 | 30.5 | 82.0 | 29.9 |
| 9 | Kimi-VL-A3B-16B | arXiv 2025 | Yes | 16B | 54.1 | 54.0 | 71.1 | 32.5 | 84.0 | 28.7 |
| 10 | Step-1V | - | No | - | 53.4 | 65.2 | 64.9 | 33.1 | 78.0 | 25.5 |
| 11 | InternVL3-14B | arXiv 2025 | Yes | 14B | 52.8 | 62.1 | 59.5 | 33.2 | 80.0 | 29.2 |
| 12 | InternVL3.5-14B | arXiv 2025 | Yes | 14B | 51.9 | 61.1 | 57.6 | 31.5 | 82.0 | 27.1 |
| 13 | GLM-4v-9B | arXiv 2024 | Yes | 9B | 51.7 | 60.6 | 65.2 | 32.4 | 82.0 | 18.2 |
| 14 | InternVL3.5-8B | arXiv 2025 | Yes | 8B | 50.3 | 62.4 | 58.7 | 28.8 | 72.0 | 29.6 |
| 15 | Qwen2.5-VL-7B | arXiv 2025 | Yes | 8B | 49.5 | 24.4 | 78.9 | 33.1 | 82.0 | 29.0 |
| 16 | InternVL3-8B | - | Yes | 8B | 49.0 | 57.7 | 55.8 | 29.9 | 72.0 | 29.4 |
| 17 | Claude3.5-sonnet | - | No | - | 48.4 | 34.2 | 62.5 | 35.2 | 78.0 | 32.2 |
| 18 | DeepSeek-VL2-Small | arXiv 2024 | Yes | 16B | 48.1 | 51.6 | 56.3 | 27.8 | 79.6 | 25.3 |
| 19 | MiniCPM-V-2.6 | arXiv 2024 | Yes | 8B | 47.7 | 53.1 | 53.2 | 32.8 | 76.0 | 23.4 |
| 20 | MiniCPM-o-2.6 | - | Yes | 8B | 47.7 | 54.0 | 62.4 | 24.1 | 68.0 | 29.8 |
| 21 | Claude-sonnet4-20250514 | - | No | - | 47.3 | 40.0 | 58.9 | 34.2 | 76.0 | 27.3 |
| 22 | GPT-5-2025-08-07 | - | No | - | 45.7 | 39.2 | 55.7 | 30.2 | 70.0 | 33.3 |
| 23 | GPT-4o | arXiv 2024 | No | - | 45.7 | 41.7 | 52.1 | 29.0 | 76.0 | 29.4 |
| 24 | Qwen2-VL-7B | arXiv 2024 | Yes | 8B | 44.7 | 23.7 | 63.5 | 27.9 | 80.0 | 28.5 |
| 25 | LLaVA-OneVision-1.5-8B-Instruct | - | Yes | 8B | 43.8 | 46.5 | 42.8 | 35.8 | 76.0 | 17.9 |
| 26 | InternVL2.5-8B | arXiv 2024 | Yes | 8B | 42.8 | 42.8 | 47.9 | 27.3 | 80.0 | 23.5 |
| 27 | SAIL-VL-1.6-8B | arXiv 2025 | Yes | 8B | 42.6 | 35.8 | 41.5 | 35.7 | 76.0 | 23.9 |
| 28 | InternVL2.5-26B | arXiv 2024 | Yes | 20B | 41.9 | 40.2 | 42.7 | 25.6 | 74.0 | 27.0 |
| 29 | InternVL2-8B | SCIS 2024 | Yes | 8B | 41.3 | 35.2 | 42.8 | 26.1 | 78.0 | 24.4 |
| 30 | Llama-3.1-Nemotron-Nano-VL-8B-V1 | - | Yes | 8B | 40.1 | 38.2 | 54.9 | 26.6 | 66.0 | 14.8 |
| 31 | InternVL2-26B | SCIS 2024 | Yes | 20B | 38.1 | 20.4 | 50.7 | 29.0 | 76.0 | 14.5 |
| 32 | GPT-4o-mini | - | No | - | 37.4 | 20.0 | 53.6 | 27.9 | 66.0 | 19.6 |
| 33 | Phi-4-MultiModal | arXiv 2025 | Yes | 5.6B | 37.3 | 30.5 | 40.5 | 42.7 | 56.0 | 16.9 |
| 34 | XComposer2-4KHD | NeurIPS 2025 | Yes | 8B | 32.4 | 12.9 | 38.6 | 37.5 | 60.0 | 13.1 |
| 35 | Ovis1.6-3B | arXiv 2024 | Yes | 3B | 31.7 | 22.5 | 33.3 | 31.5 | 54.0 | 17.0 |
| 36 | Grok4 | - | No | - | 22.7 | 8.1 | 33.3 | 16.1 | 40.0 | 16.1 |
| 37 | Monkey | CVPR 2024 | Yes | 8B | 21.5 | 1.5 | 28.4 | 29.1 | 40.0 | 8.3 |
| 38 | TextMonkey | arXiv 2024 | Yes | 8B | 21.5 | 10.5 | 15.2 | 30.2 | 44.0 | 7.6 |
| 39 | Cambrian-1-8B | NeurIPS 2025 | Yes | 8B | 18.5 | 2.4 | 19.8 | 26.7 | 36.0 | 7.6 |
| 40 | LLaVA-OV-7B | arXiv 2024 | Yes | 8B | 17.4 | 5.4 | 13.6 | 20.3 | 34.0 | 13.6 |
| 41 | mPLUG-Owl3 | arXiv 2024 | Yes | 8B | 16.5 | 1.6 | 27.4 | 27.3 | 16.0 | 10.0 |
| 42 | Pixtral-12B | arXiv 2024 | Yes | 12B | 16.0 | 6.2 | 22.3 | 11.4 | 26.0 | 14.0 |
| 43 | Qwen-VL-chat | arXiv 2023 | Yes | 8B | 16.5 | 9.1 | 3.6 | 18.9 | 44.0 | 7.1 |
| 44 | Idefics3-8B | NeurIPS 2024 Workshop | Yes | 8B | 15.6 | 2.9 | 29.0 | 12.3 | 26.0 | 7.9 |
| 45 | Qwen-VL | arXiv 2023 | Yes | 8B | 15.6 | 4.3 | 0 | 30.6 | 38.0 | 5.1 |
| 46 | Molmo-7B | CVPR 2025 | Yes | 8B | 15.0 | 3.4 | 29.8 | 6.6 | 24.0 | 11.1 |
| 47 | DocOwl2 | arXiv 2024 | Yes | 7B | 14.4 | 1.0 | 17.8 | 29.4 | 20.0 | 3.9 |
| 48 | DeepSeek-VL-7B | arXiv 2024 | Yes | 7B | 13.7 | 3.2 | 14.7 | 10.7 | 30.0 | 9.8 |
| 49 | CogVLM-chat | NeurIPS 2024 | Yes | 7B | 12.8 | 2.4 | 16.2 | 22.5 | 20.0 | 3.1 |
| 50 | Eagle-X5-7B | ICLR 2025 | Yes | 8B | 12.3 | 1.9 | 16.1 | 13.6 | 22.0 | 8.1 |
| 51 | VILA1.5-8B | CVPR 2024 | Yes | 8B | 11.0 | 1.4 | 9.1 | 22.2 | 16.0 | 6.4 |
| 52 | Yi-VL-6B | arXiv 2024 | Yes | 6B | 10.4 | 1.6 | 6.4 | 28.8 | 10.0 | 5.3 |
| 53 | LLaVA-Next-8B | - | Yes | 8B | 9.2 | 2.8 | 0.9 | 14.9 | 20.0 | 7.4 |
| 54 | UReader | EMNLP Findings 2023 | Yes | 7B | 9.0 | 0.3 | 2.0 | 28.1 | 12.0 | 2.4 |
| 55 | LLaVAR | arXiv 2023 | Yes | 13B | 8.6 | 2.2 | 2.0 | 27.1 | 10.0 | 1.9 |
| 56 | EMU2-chat | CVPR 2024 | Yes | 37B | 8.2 | 1.2 | 3.0 | 29.3 | 4.0 | 3.6 |
| 57 | Janus-1.3B | CVPR 2025 | Yes | 1.3B | 7.5 | 4.1 | 2.2 | 10.4 | 14.0 | 6.7 |
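
In the rows we spot-checked, the Average column in both leaderboards matches the unweighted arithmetic mean of the per-dimension scores (eight dimensions in the English table, five in the Chinese one). For example, Seed1.6-vision's eight English scores sum to 497.6, and 497.6 / 8 = 62.2. The Python sketch below reproduces that calculation. It is an assumption inferred from the published numbers, not the official OCRBench v2 scoring code, and `average_score`, `EN_DIMENSIONS`, and `ZH_DIMENSIONS` are hypothetical names.

```python
from statistics import fmean
from typing import Mapping, Sequence

# Column names as they appear in the two leaderboards above.
EN_DIMENSIONS = (
    "Recognition", "Referring", "Spotting", "Extraction",
    "Parsing", "Calculation", "Understanding", "Reasoning",
)
ZH_DIMENSIONS = ("Recognition", "Extraction", "Parsing", "Understanding", "Reasoning")


def average_score(scores: Mapping[str, float], dimensions: Sequence[str]) -> float:
    """Unweighted mean of the per-dimension scores, rounded to one decimal.

    Assumption: this mirrors how the leaderboard's Average column appears to be
    derived; it is not taken from the official OCRBench v2 code.
    """
    missing = [d for d in dimensions if d not in scores]
    if missing:
        raise KeyError(f"missing dimension scores: {missing}")
    # fmean sums with math.fsum, which keeps float rounding error negligible here.
    return round(fmean(scores[d] for d in dimensions), 1)


if __name__ == "__main__":
    # Seed1.6-vision, English leaderboard row (rank 1).
    seed = dict(zip(EN_DIMENSIONS, [70.1, 59.8, 38.0, 89.0, 22.3, 84.0, 76.4, 58.0]))
    # Gemini-2.5-Pro, Chinese leaderboard row (rank 1).
    gemini_zh = dict(zip(ZH_DIMENSIONS, [72.0, 74.0, 35.2, 90.0, 39.7]))
    print(average_score(seed, EN_DIMENSIONS))       # 62.2
    print(average_score(gemini_zh, ZH_DIMENSIONS))  # 62.2
```

Because every dimension carries equal weight, a very low score in one dimension (such as Spotting, where most open-source models score close to 0) pulls the Average down as much as an equally large gain anywhere else would lift it.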

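Previous OCR benchmarks for LMMs focused mainly on basic text recognition. As large models have become widespread, however, real-world OCR needs have moved well beyond simply "reading out the text": tables, charts, handwritten notes, and complex layouts in documents, text localization in images, and text-based reasoning all pose challenges for LMMs.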

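Most existing benchmarks cover a narrow set of tasks and scenarios, so model scores saturate quickly and no longer reflect real capability in complex applications. OCRBench v2 is a comprehensive OCR benchmark designed to measure how LMMs actually perform across a broad range of OCR tasks, including text localization and understanding and reasoning in complex scenes.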

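OCRBench v2 contains 23 fine-grained tasks covering the OCR needs commonly encountered in practice. It groups these tasks into eight core capability dimensions: text recognition (Recognition), text referring (Referring), text spotting (Spotting), relation extraction (Extraction), element parsing (Parsing), mathematical calculation (Calculation), visual text understanding (Understanding), and knowledge reasoning (Reasoning). The leaderboards report each model's score on every dimension.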

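The public dataset of OCRBench v2 contains 10,000 high-quality QA pairs drawn from more than 80 academic datasets plus some in-house data. All pairs were manually reviewed to ensure coverage of the diverse scenarios found in real OCR applications.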

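OCRBench v2 also includes a separate private dataset of 1,500 manually collected and annotated QA pairs, whose task settings and scenario coverage match those of the public data.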

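Experiments show that model rankings on the public and private data are highly consistent (see the paper for the analysis). This supports the soundness of OCRBench v2's task design, data construction, and evaluation metrics, and underscores its value for measuring the current limitations of LMMs.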
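
The article does not say how ranking consistency between the public and private splits is measured. One common choice is Spearman's rank correlation coefficient, sketched below in Python under that assumption: rank the models by their Average score on each split, then take the Pearson correlation of the two rank vectors, with tied scores sharing the mean of their positions. A value near 1 means the two leaderboards order the models almost identically. The function names `average_ranks` and `spearman_rho` and the example scores are hypothetical, not taken from the paper.

```python
from math import sqrt
from typing import List, Mapping, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank scores from highest (rank 1) to lowest; ties get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next model has exactly the same score.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(public: Mapping[str, float], private: Mapping[str, float]) -> float:
    """Spearman correlation between two leaderboards over the models they share."""
    models = sorted(public.keys() & private.keys())
    if len(models) < 2:
        raise ValueError("need at least two models present on both leaderboards")
    x = average_ranks([public[m] for m in models])
    y = average_ranks([private[m] for m in models])
    mean = (len(models) + 1) / 2  # average ranking keeps the mean rank at (n + 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(x, y))
    var_x = sum((a - mean) ** 2 for a in x)
    var_y = sum((b - mean) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("all scores tied on one leaderboard; correlation undefined")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical Average scores, for illustration only.
    public = {"model_a": 62.0, "model_b": 55.0, "model_c": 48.0, "model_d": 40.0}
    private = {"model_a": 60.0, "model_b": 57.0, "model_c": 41.0, "model_d": 44.0}
    print(round(spearman_rho(public, private), 2))  # 0.8: only the last two swap places
```

Computing Pearson correlation on averaged ranks, rather than using the shortcut 1 - 6Σd²/(n(n²-1)), keeps the result correct when several models share the same score.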