OCRBench v2 is a large-scale bilingual, text-centric benchmark with the most comprehensive set of tasks to date (4× more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios (31 diverse scenarios, including street scenes, receipts, formulas, diagrams, and more), and thorough evaluation metrics. It contains a total of 10,000 human-verified question-answering pairs with a high proportion of difficult samples.
English subsets

Rank | Name | Open Source | Text Recognition | Text Referring | Text Spotting | Relation Extraction | Element Parsing | Mathematical Calculation | Visual Text Understanding | Knowledge Reasoning | Average Score |
---|---|---|---|---|---|---|---|---|---|---|---|
10 | | Yes | 70.2 | 69.1 | 61.8 | 81.4 | 39.2 | 31.9 | 73.1 | 54.7 | 60.2 |
Chinese subsets

Rank | Name | Open Source | Text Recognition | Relation Extraction | Element Parsing | Visual Text Understanding | Knowledge Reasoning | Average Score |
---|---|---|---|---|---|---|---|---|
10 | | Yes | 66.2 | 64.8 | 33.5 | 63.4 | 50.6 | 55.7 |
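In both tables, the Average Score is consistent with the unweighted arithmetic mean of the per-task scores, rounded to one decimal. A minimal sketch checking this against the two rows above (the averaging rule is inferred from the numbers, not stated by the benchmark; the helper name is ours):

```python
# Values copied from the two leaderboard rows above.
english_scores = [70.2, 69.1, 61.8, 81.4, 39.2, 31.9, 73.1, 54.7]
chinese_scores = [66.2, 64.8, 33.5, 63.4, 50.6]

def average(scores):
    """Unweighted arithmetic mean, rounded to one decimal as in the tables."""
    return round(sum(scores) / len(scores), 1)

assert average(english_scores) == 60.2  # matches the English-subsets row
assert average(chinese_scores) == 55.7  # matches the Chinese-subsets row
```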
Notice
Sometimes, API calls to closed-source models fail. In such cases, we repeat the calls for the unsuccessful samples until it is clear that no successful response can be obtained. If you would like your model to be included in the OCRBench leaderboard, please follow the evaluation instructions provided on GitHub and feel free to contact us via email at ling_fu@hust.edu.cn. We will update the leaderboard promptly.
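A minimal sketch of the retry policy described above, assuming a hypothetical `call_model` helper and a fixed retry budget with backoff (both are our assumptions, not part of the official evaluation scripts):

```python
import time

MAX_RETRIES = 5  # assumed budget; the actual evaluation may differ

def query_with_retries(call_model, sample):
    """Retry a failed API call until it succeeds or the budget is exhausted.

    `call_model` is a hypothetical callable that raises on an unsuccessful
    API response and returns the model's answer otherwise.
    """
    for attempt in range(MAX_RETRIES):
        try:
            return call_model(sample)
        except Exception:
            time.sleep(2 ** attempt)  # back off before the next attempt
    return None  # mark the sample as permanently unsuccessful
```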