This started out as a weekend hack with gpt-4-mini, using the very basic strategy of "just ask the ai to ocr the document".
But this turned out to be better performing than our current implementation of Unstructured/Textract. At pretty much the same cost.
I've tested almost every variant of document OCR over the past year, especially trying things like table / chart extraction. I've found the rules based extraction has always been lacking. Documents are meant to be a visual representation after all. With weird layouts, tables, charts, etc. Using a vision model just make sense!
In general, I'd categorize this solution as slow, expensive, and non deterministic. But 6 months ago it was impossible. And 6 months from now it'll be fast, cheap, and probably more reliable!
Comments URL: https://news.ycombinator.com/item?id=41048194
Points: 17
# Comments: 8
Login to add comment
Other posts in this group
Article URL: https://github.com/Tencent/Hunyuan3D-2
Comments URL: https://news.y
Article URL: https://bearingsonly.net/
Comments URL: https://news.ycombinator.com/item?id=427
Article URL: https://jreleaser.org/
Comments URL: https://news.ycombinator.com/item?id=42784880
Article URL: https://www.quistgaardpepper.com
Comments URL: https://news.ycombinator.c