Show HN: Kreuzberg – Modern async Python library for document text extraction

I'm excited to showcase Kreuzberg!

Kreuzberg is a modern Python library built from the ground up with async/await, type hints, and optimized I/O handling.

It provides a unified interface for extracting text from documents (PDFs, images, office files) without external API dependencies.

Key technical features:
- Built with modern Python best practices (async/await, type hints, functional-first)
- Optimized async I/O with anyio for multi-loop compatibility
- Smart worker process pool for CPU-bound tasks (OCR, doc conversion)
- Efficient batch processing with concurrent extractions
- Clean error handling with context-rich exceptions
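
To illustrate the pattern behind the worker pool and batch processing points, here is a minimal sketch of offloading blocking work from an event loop and running several extractions concurrently. It is not Kreuzberg's API or implementation: `fake_ocr`, `extract_one`, and `extract_batch` are hypothetical names, the standard library's asyncio stands in for anyio, and a thread pool stands in for the process pool so that the example runs anywhere.

```python
import asyncio
from concurrent.futures import Executor, ThreadPoolExecutor


def fake_ocr(data: bytes) -> str:
    """Hypothetical stand-in for a CPU-bound step such as OCR."""
    return data.decode("utf-8", errors="replace").upper()


async def extract_one(data: bytes, executor: Executor) -> str:
    # Hand the blocking call to the pool so the event loop stays responsive.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, fake_ocr, data)


async def extract_batch(items: list[bytes], max_workers: int = 4) -> list[str]:
    # Assumption: threads keep this sketch portable. For truly CPU-bound
    # work a ProcessPoolExecutor would usually replace the thread pool.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # gather() runs the extractions concurrently and keeps input order.
        return await asyncio.gather(*(extract_one(d, executor) for d in items))


if __name__ == "__main__":
    print(asyncio.run(extract_batch([b"first page", b"second page"])))
```

Because `gather()` preserves input order, each result lines up with the document it came from, which is what makes a batch call convenient to consume.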

I built this after struggling with existing solutions that were either synchronous-only, required complex deployments, or had poor async support. The goal was to create something that works well in modern async Python applications, can be easily dockerized or used in serverless contexts, and relies only on permissive OSS.

Key advantages over alternatives:
- True async support with optimized I/O
- Minimal dependencies (much smaller than alternatives)
- Perfect for serverless and async web apps
- Local processing without API calls
- Built for modern Python codebases with rigorous typing and testing

I would love feedback!

The library is MIT licensed and open to contributions.

Here is the repo: https://github.com/Goldziher/kreuzberg

Starring is caring


Comments URL: https://news.ycombinator.com/item?id=43057375

Points: 10

# Comments: 5


Created 5mo ago | 15. 2. 2025, 11:50:08


