I'm excited to showcase Kreuzberg!
Kreuzberg is a modern Python library built from the ground up with async/await, type hints, and optimized I/O handling.
It provides a unified interface for extracting text from documents (PDFs, images, office files) without external API dependencies.
Key technical features:
- Built with modern Python best practices (async/await, type hints, functional-first)
- Optimized async I/O with anyio for multi-loop compatibility
- Smart worker process pool for CPU-bound tasks (OCR, doc conversion)
- Efficient batch processing with concurrent extractions
- Clean error handling with context-rich exceptions
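To give a feel for the pattern behind the worker pool, batch processing, and context-rich exceptions, here is a minimal sketch. It is not Kreuzberg's API or implementation: `ExtractionError`, `fake_ocr`, and `extract_many` are hypothetical names invented for illustration, and it uses the standard library's `asyncio` with a thread pool instead of anyio and a process pool so that it runs without extra dependencies.

```python
import asyncio
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Dict, Union


class ExtractionError(Exception):
    """Hypothetical exception that carries structured context for debugging."""

    def __init__(self, message: str, context: dict) -> None:
        super().__init__(message, context)
        self.message = message
        self.context = context

    def __str__(self) -> str:
        return f"{self.message} (context: {self.context})"


def fake_ocr(name: str, data: bytes) -> str:
    """Stand-in for a blocking, CPU-bound step such as OCR; it only decodes bytes."""
    try:
        return data.decode("utf-8").strip()
    except UnicodeDecodeError as exc:
        raise ExtractionError(
            "could not decode document", {"name": name, "size": len(data)}
        ) from exc


async def extract_many(
    docs: Dict[str, bytes], pool: Executor
) -> Dict[str, Union[str, ExtractionError]]:
    """Run every extraction in the pool concurrently without blocking the event loop."""
    loop = asyncio.get_running_loop()
    tasks = [
        loop.run_in_executor(pool, fake_ocr, name, data)
        for name, data in docs.items()
    ]
    # return_exceptions=True keeps one bad document from failing the whole batch.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return dict(zip(docs, results))


if __name__ == "__main__":
    sample = {"a.txt": b" hello ", "b.bin": b"\xff\xfe"}
    with ThreadPoolExecutor(max_workers=2) as executor:
        for doc_name, outcome in asyncio.run(extract_many(sample, executor)).items():
            print(doc_name, "->", outcome)
```

The event loop stays free while the blocking work runs in the pool, and each failure comes back as an exception object that names the offending document. For work that is truly CPU-bound, a process pool would be the closer analogue to what the list above describes.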
I built this after struggling with existing solutions that were either synchronous-only, required complex deployments, or had poor async support. The goal was to create something that works well in modern async Python applications, can be easily dockerized or used in serverless contexts, and relies only on permissive OSS.
Key advantages over alternatives:
- True async support with optimized I/O
- Minimal dependencies (much smaller than alternatives)
- Perfect for serverless and async web apps
- Local processing without API calls
- Built for modern Python codebases with rigorous typing and testing
I would love feedback!
The library is MIT licensed and open to contributions.
Here is the repo: https://github.com/Goldziher/kreuzberg
Starring is caring
Comments URL: https://news.ycombinator.com/item?id=43057375