I'm excited to showcase Kreuzberg!
Kreuzberg is a modern Python library built from the ground up with async/await, type hints, and optimized I/O handling.
It provides a unified interface for extracting text from documents (PDFs, images, office files) without external API dependencies.
Key technical features:
- Built with modern Python best practices (async/await, type hints, functional-first)
- Optimized async I/O with anyio for multi-loop compatibility
- Smart worker process pool for CPU-bound tasks (OCR, doc conversion)
- Efficient batch processing with concurrent extractions
- Clean error handling with context-rich exceptions
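
To make the worker-pool and batch-processing points concrete, here is a minimal sketch of the general pattern, not Kreuzberg's actual API. It uses only the standard library (asyncio rather than anyio), and every name in it (`fake_ocr`, `extract_one`, `batch_extract`) is hypothetical. Blocking work is pushed to an executor so the event loop stays free, and a batch runs concurrently.

```python
import asyncio
from concurrent.futures import Executor, ThreadPoolExecutor
from pathlib import Path


def fake_ocr(data: bytes) -> str:
    """Hypothetical stand-in for a CPU-bound step such as OCR."""
    return data.decode("utf-8", errors="replace").upper()


async def extract_one(path: Path, pool: Executor) -> str:
    loop = asyncio.get_running_loop()
    # Reading the file blocks, so run it in the pool instead of the event loop.
    data = await loop.run_in_executor(pool, path.read_bytes)
    # Offload the (pretend) CPU-bound step to the pool as well.
    return await loop.run_in_executor(pool, fake_ocr, data)


async def batch_extract(paths: list[Path], pool: Executor) -> list[str]:
    # Schedule all extractions concurrently; gather keeps the input order.
    return await asyncio.gather(*(extract_one(p, pool) for p in paths))


async def main() -> None:
    # A thread pool keeps the sketch simple; for truly CPU-bound work a
    # ProcessPoolExecutor could be passed in instead.
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = await batch_extract([Path(__file__)], pool)
    print(len(results), "document(s) processed")


if __name__ == "__main__":
    asyncio.run(main())
```

Passing the executor in as a parameter keeps the choice between threads and processes with the caller, which is one way to tune such a design for I/O-heavy versus CPU-heavy workloads.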
I built this after struggling with existing solutions that were either synchronous-only, required complex deployments, or had poor async support. The goal was to create something that works well in modern async Python applications, can be easily dockerized or used in serverless contexts, and relies only on permissive OSS.
Key advantages over alternatives:
- True async support with optimized I/O
- Minimal dependencies (much smaller than alternatives)
- Perfect for serverless and async web apps
- Local processing without API calls
- Built for modern Python codebases with rigorous typing and testing
I would love feedback!
The library is MIT licensed and open to contributions.
Here is the repo: https://github.com/Goldziher/kreuzberg
Starring is caring
Comments URL: https://news.ycombinator.com/item?id=43057375
Points: 10
# Comments: 5