Show HN: Zerox – document OCR with GPT-mini

This started out as a weekend hack with gpt-4-mini, using the very basic strategy of "just ask the ai to ocr the document".

But this turned out to be better performing than our current implementation of Unstructured/Textract. At pretty much the same cost.

I've tested almost every variant of document OCR over the past year, especially trying things like table / chart extraction. I've found the rules based extraction has always been lacking. Documents are meant to be a visual representation after all. With weird layouts, tables, charts, etc. Using a vision model just make sense!

In general, I'd categorize this solution as slow, expensive, and non deterministic. But 6 months ago it was impossible. And 6 months from now it'll be fast, cheap, and probably more reliable!

Comments URL: https://news.ycombinator.com/item?id=41048194

Points: 17

# Comments: 8

https://github.com/getomni-ai/zerox

Created 6mo | Jul 23, 2024, 7:40:10 PM

Login to add comment

Other posts in this group

Stargate Project: SoftBank, OpenAI, Oracle, MGX to build data centers

Stargate Project: SoftBank, OpenAI, Oracle, MGX to build data centers

Article URL: https://apnews.com/article/trump-ai-openai-oracle-softbank-son

Jan 22, 2025, 1:20:09 AM | Hacker news

Hunyuan3D 2.0 – High-Resolution 3D Assets Generation

Hunyuan3D 2.0 – High-Resolution 3D Assets Generation

Article URL: https://github.com/Tencent/Hunyuan3D-2

Comments URL: https://news.y

Jan 22, 2025, 1:20:08 AM | Hacker news

Ross Ulbricht was just granted a full pardon

Ross Ulbricht was just granted a full pardon

Article URL: https://twitter.com/Free_Ross/status/1881851923005165704

Comments URL:

Jan 22, 2025, 1:20:07 AM | Hacker news

Show HN: A submarine combat game in the browser

Show HN: A submarine combat game in the browser

Article URL: https://bearingsonly.net/

Comments URL: https://news.ycombinator.com/item?id=427

Jan 21, 2025, 10:50:14 PM | Hacker news

JReleaser: quick and effortless way to release your project

JReleaser: quick and effortless way to release your project

Article URL: https://jreleaser.org/

Comments URL: https://news.ycombinator.com/item?id=42784880

Jan 21, 2025, 10:50:13 PM | Hacker news

Remembering Nan Shepherd

Remembering Nan Shepherd

Article URL: https://www.lrb.co.uk/the-paper/v47/n01/fraser-macdonald/diary

Comments URL:

Jan 21, 2025, 10:50:12 PM | Hacker news

The Peppermills of Jens Quistgaard

The Peppermills of Jens Quistgaard

Article URL: https://www.quistgaardpepper.com

Comments URL: https://news.ycombinator.c

Jan 21, 2025, 10:50:09 PM | Hacker news

Techie