OCRBench v2 is a large-scale bilingual, text-centric benchmark with the most comprehensive set of tasks to date (4× more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios (31 diverse scenarios, including street scenes, receipts, formulas, diagrams, and more), and thorough evaluation metrics. It contains a total of 10,000 human-verified question-answering pairs with a high proportion of difficult samples.
English subsets

Rank | Name | Open Source | Text Recognition | Text Referring | Text Spotting | Relation Extraction | Element Parsing | Mathematical Calculation | Visual Text Understanding | Knowledge Reasoning | Average Score |
---|---|---|---|---|---|---|---|---|---|---|---|
10 | | Yes | 70.2 | 69.1 | 61.8 | 81.4 | 39.2 | 31.9 | 73.1 | 54.7 | 60.2 |
Chinese subsets

Rank | Name | Open Source | Text Recognition | Relation Extraction | Element Parsing | Visual Text Understanding | Knowledge Reasoning | Average Score |
---|---|---|---|---|---|---|---|---|
10 | | Yes | 66.2 | 64.8 | 33.5 | 63.4 | 50.6 | 55.7 |
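In both tables, the Average Score is consistent with the unweighted arithmetic mean of the per-task scores, rounded to one decimal. A minimal sketch checking this against the two rows above (the averaging rule is inferred from the numbers, not stated by the benchmark; the helper name is ours):

```python
# Values copied from the two leaderboard rows above.
english_scores = [70.2, 69.1, 61.8, 81.4, 39.2, 31.9, 73.1, 54.7]
chinese_scores = [66.2, 64.8, 33.5, 63.4, 50.6]

def average(scores):
    """Unweighted arithmetic mean, rounded to one decimal as in the tables."""
    return round(sum(scores) / len(scores), 1)

assert average(english_scores) == 60.2  # matches the English-subsets row
assert average(chinese_scores) == 55.7  # matches the Chinese-subsets row
```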
Notice
Sometimes, API calls to closed-source models fail. In such cases, we repeat the calls for the unsuccessful samples until it is clear that no successful response can be obtained. If you would like your model to be included in the OCRBench leaderboard, please follow the evaluation instructions provided on GitHub and feel free to contact us via email at ling_fu@hust.edu.cn. We will update the leaderboard promptly.
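A minimal sketch of the retry policy described above, assuming a hypothetical `call_model` helper and a fixed retry budget with backoff (both are our assumptions, not part of the official evaluation scripts):

```python
import time

MAX_RETRIES = 5  # assumed budget; the actual evaluation may differ

def query_with_retries(call_model, sample):
    """Retry a failed API call until it succeeds or the budget is exhausted.

    `call_model` is a hypothetical callable that raises on an unsuccessful
    API response and returns the model's answer otherwise.
    """
    for attempt in range(MAX_RETRIES):
        try:
            return call_model(sample)
        except Exception:
            time.sleep(2 ** attempt)  # back off before the next attempt
    return None  # mark the sample as permanently unsuccessful
```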