I built Chonkie because I was tired of rewriting chunking code for RAG applications. Existing libraries were either too bloated (80MB+) or too basic, with no middle ground.
Core features:
- 21MB default install vs. 80-171MB for alternatives
- 33x faster token chunking than popular alternatives
- Supports multiple chunking strategies: token, word, sentence, and semantic (usage sketch below)
- Works with all major tokenizers (transformers, tokenizers, tiktoken)
- Zero external dependencies for basic functionality
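For the strategy list above, here's a minimal usage sketch. The `TokenChunker` class and its constructor arguments are assumptions inferred from the strategy names, not a confirmed API; treat the exact signatures as illustrative:

```python
# Minimal usage sketch, assuming the strategies above map to chunker
# classes; constructor arguments here are assumptions, not the real API.
from chonkie import TokenChunker

long_document_text = "Some long document text ..."  # placeholder input

chunker = TokenChunker(
    tokenizer="gpt2",   # assumed: accepts a tokenizer name or object
    chunk_size=512,     # max tokens per chunk
    chunk_overlap=64,   # tokens shared between consecutive chunks
)

for chunk in chunker.chunk(long_document_text):
    print(chunk.token_count, chunk.text[:40])
```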
Technical optimizations:
- Uses tiktoken with multi-threading for faster tokenization (first sketch after this list)
- Implements aggressive caching and precomputation
- Running mean pooling for efficient semantic chunking (second sketch after this list)
- Modular dependency system (install only what you need)
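To make the tiktoken and caching points concrete, here's a generic sketch of those two techniques, not Chonkie's internals. tiktoken's `encode_batch` takes a `num_threads` argument (its Rust core releases the GIL, so threads give a real speedup), and the memoized `count_tokens` helper is hypothetical, illustrating how repeated spans can avoid re-tokenization:

```python
import functools
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# tiktoken's Rust core releases the GIL, so encode_batch can fan work
# out across threads for a real speedup on large batches.
def encode_many(texts: list[str]) -> list[list[int]]:
    return enc.encode_batch(texts, num_threads=8)

# Hypothetical cache: memoizing token counts means overlapping or
# re-visited spans are only tokenized once.
@functools.lru_cache(maxsize=100_000)
def count_tokens(text: str) -> int:
    return len(enc.encode(text))
```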
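And a sketch of running mean pooling for semantic chunking, again a generic illustration of the technique rather than Chonkie's code: the chunk centroid is updated incrementally in O(d) per sentence instead of re-averaging all n embeddings from scratch, and `embed` stands in for whatever sentence-embedding model is used:

```python
import numpy as np

def semantic_chunks(sentences, embed, threshold=0.7):
    """Group consecutive sentences while each new sentence stays
    similar to the running mean of the current chunk's embeddings.

    `embed` is assumed to map a sentence to a numpy vector."""
    chunks, current, mean, n = [], [], None, 0
    for sent in sentences:
        v = embed(sent)
        if mean is None:
            current, mean, n = [sent], v.copy(), 1
            continue
        sim = float(mean @ v) / (np.linalg.norm(mean) * np.linalg.norm(v))
        if sim >= threshold:
            current.append(sent)
            n += 1
            # Incremental mean update: O(d) per sentence instead of
            # re-averaging all n embeddings (O(n * d)).
            mean += (v - mean) / n
        else:
            chunks.append(" ".join(current))
            current, mean, n = [sent], v.copy(), 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```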
Benchmarks and code: https://github.com/bhavnicksm/chonkie
Looking for feedback on the architecture and performance optimizations. What other chunking strategies would be useful for RAG applications?
Comments URL: https://news.ycombinator.com/item?id=42100819
Points: 51
# Comments: 18