Show HN: Qwen-2.5-32B is now the best open source OCR model

Last week was big for open source LLMs. We got:

- Qwen 2.5 VL (72b and 32b)

- Gemma-3 (27b)

- DeepSeek-v3-0324

And a couple weeks ago we got the new mistral-ocr model. We updated our OCR benchmark to include the new models.

We evaluated 1,000 documents for JSON extraction accuracy. Major takeaways:

- Qwen 2.5 VL (72b and 32b) are by far the most impressive. Both landed right around 75% accuracy (equivalent to GPT-4o’s performance). Qwen 72b was only 0.4% above 32b, which is within the margin of error.

- Both Qwen models passed mistral-ocr (72.2%), which is specifically trained for OCR.

- Gemma-3 (27B) only scored 42.9%. That is particularly surprising given that its architecture is based on Gemini 2.0, which still tops the accuracy chart.
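
For readers wondering what "JSON extraction accuracy" and "within the margin of error" could mean in practice, here is a rough sketch. It is not the benchmark's actual scoring code (see the repo for that). It assumes accuracy is the fraction of ground-truth leaf fields that the model reproduces exactly, and it estimates the margin of error by treating each of the 1,000 documents as an independent pass/fail trial, which is a simplification. The names `flatten`, `json_accuracy`, and `margin_of_error` are made up for illustration.

```python
import math


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into a {path: leaf_value} mapping."""
    if isinstance(value, dict):
        items = {}
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            items.update(flatten(child, path))
        return items
    if isinstance(value, list):
        items = {}
        for index, child in enumerate(value):
            items.update(flatten(child, f"{prefix}[{index}]"))
        return items
    return {prefix: value}


def json_accuracy(expected, predicted):
    """Fraction of ground-truth leaf fields reproduced exactly (assumed metric)."""
    truth = flatten(expected)
    guess = flatten(predicted)
    if not truth:
        return 1.0
    matched = sum(
        1 for path, val in truth.items() if path in guess and guess[path] == val
    )
    return matched / len(truth)


def margin_of_error(accuracy, n_documents, z=1.96):
    """Approximate 95% margin of error under a simple binomial assumption."""
    return z * math.sqrt(accuracy * (1 - accuracy) / n_documents)


# Hypothetical document: one of four leaf fields is extracted incorrectly.
expected = {"vendor": "Acme", "invoice": {"total": 42.5, "items": ["a", "b"]}}
predicted = {"vendor": "Acme", "invoice": {"total": 42.5, "items": ["a", "x"]}}
score = json_accuracy(expected, predicted)

# Rough uncertainty for a ~75% score measured on 1,000 documents.
moe = margin_of_error(0.75, 1000)
```

Under these assumptions the margin of error comes out to roughly 2.7 percentage points, so a 0.4% gap between the 72b and 32b models would not be a meaningful difference.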

The dataset and benchmark runner are fully open source. You can check out the code and reproduction steps here:

- https://getomni.ai/blog/benchmarking-open-source-models-for-...

- https://github.com/getomni-ai/benchmark

- https://huggingface.co/datasets/getomni-ai/ocr-benchmark

Comments URL: https://news.ycombinator.com/item?id=43549072

Points: 61

# Comments: 13

https://github.com/getomni-ai/benchmark/blob/main/README.md

Created 27d ago | 1. 4. 2025 21:40:16

