I built Chonkie because I was tired of rewriting chunking code for RAG applications. Existing libraries were either too bloated (80MB+) or too basic, with no middle ground.
Core features:
- 21MB default install vs. 80-171MB for alternatives
- 33x faster token chunking than popular alternatives
- Supports multiple chunking strategies: token, word, sentence, and semantic (usage sketch below)
- Works with all major tokenizers (transformers, tokenizers, tiktoken)
- Zero external dependencies for basic functionality
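For the strategy list above, here's a minimal usage sketch. The `TokenChunker` class and its constructor arguments are assumptions inferred from the strategy names, not a confirmed API; treat the exact signatures as illustrative:

```python
# Minimal usage sketch, assuming the strategies above map to chunker
# classes; constructor arguments here are assumptions, not the real API.
from chonkie import TokenChunker

long_document_text = "Some long document text ..."  # placeholder input

chunker = TokenChunker(
    tokenizer="gpt2",   # assumed: accepts a tokenizer name or object
    chunk_size=512,     # max tokens per chunk
    chunk_overlap=64,   # tokens shared between consecutive chunks
)

for chunk in chunker.chunk(long_document_text):
    print(chunk.token_count, chunk.text[:40])
```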
Technical optimizations:
- Uses tiktoken with multi-threading for faster tokenization (first sketch after this list)
- Implements aggressive caching and precomputation
- Running mean pooling for efficient semantic chunking (second sketch after this list)
- Modular dependency system (install only what you need)
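To make the tiktoken and caching points concrete, here's a generic sketch of those two techniques, not Chonkie's internals. tiktoken's `encode_batch` takes a `num_threads` argument (its Rust core releases the GIL, so threads give a real speedup), and the memoized `count_tokens` helper is hypothetical, illustrating how repeated spans can avoid re-tokenization:

```python
import functools
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# tiktoken's Rust core releases the GIL, so encode_batch can fan work
# out across threads for a real speedup on large batches.
def encode_many(texts: list[str]) -> list[list[int]]:
    return enc.encode_batch(texts, num_threads=8)

# Hypothetical cache: memoizing token counts means overlapping or
# re-visited spans are only tokenized once.
@functools.lru_cache(maxsize=100_000)
def count_tokens(text: str) -> int:
    return len(enc.encode(text))
```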
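And a sketch of running mean pooling for semantic chunking, again a generic illustration of the technique rather than Chonkie's code: the chunk centroid is updated incrementally in O(d) per sentence instead of re-averaging all n embeddings from scratch, and `embed` stands in for whatever sentence-embedding model is used:

```python
import numpy as np

def semantic_chunks(sentences, embed, threshold=0.7):
    """Group consecutive sentences while each new sentence stays
    similar to the running mean of the current chunk's embeddings.

    `embed` is assumed to map a sentence to a numpy vector."""
    chunks, current, mean, n = [], [], None, 0
    for sent in sentences:
        v = embed(sent)
        if mean is None:
            current, mean, n = [sent], v.copy(), 1
            continue
        sim = float(mean @ v) / (np.linalg.norm(mean) * np.linalg.norm(v))
        if sim >= threshold:
            current.append(sent)
            n += 1
            # Incremental mean update: O(d) per sentence instead of
            # re-averaging all n embeddings (O(n * d)).
            mean += (v - mean) / n
        else:
            chunks.append(" ".join(current))
            current, mean, n = [sent], v.copy(), 1
    if current:
        chunks.append(" ".join(current))
    return chunks
```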
Benchmarks and code: https://github.com/bhavnicksm/chonkie
Looking for feedback on the architecture and performance optimizations. What other chunking strategies would be useful for RAG applications?
Comments URL: https://news.ycombinator.com/item?id=42100819
Points: 51
# Comments: 18