Why LLMs still have problems with OCR

Document ingestion and the launch of Gemini 2.0 caused a lot of buzz this week. As a team building in this space, we have researched this thoroughly. Here's our take: ingestion is a multistep pipeline, and maintaining confidence in nondeterministic LLM outputs across millions of pages is a problem.


Comments URL: https://news.ycombinator.com/item?id=42966958

Points: 103

# Comments: 75

https://www.runpulse.com/blog/why-llms-suck-at-ocr

Created: Feb 8, 2025, 08:20:04
