Show HN: I made a website to semantically search arXiv papers

As a grad student (and an ADHDer), I had trouble doing literature reviews systematically. To combat this, I made a website that finds similar papers based on the meaning of what I am looking for.

I used MixedBread's embedding model [1] to generate vectors from the abstracts. I store and search the vectors with Milvus [2] and serve the frontend with Gradio [3]. I update the vector database weekly by pulling the metadata dataset from Kaggle [4].
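
For anyone curious, the weekly update step could look roughly like the sketch below. It is an illustration only, not the site's actual code: it assumes the Kaggle snapshot is a JSON Lines file whose records carry `id`, `title`, `abstract`, and `update_date` fields, and the helper name `iter_new_papers` and the cutoff date logic are hypothetical. The selected abstracts would then go to the embedding model and into the vector store.

```python
import json
from datetime import date
from typing import Iterable, Iterator


def iter_new_papers(lines: Iterable[str], since: date) -> Iterator[dict]:
    """Yield papers updated on or after `since` from JSON Lines metadata.

    Assumes each line is a JSON object with `id`, `title`, `abstract`
    and an ISO-formatted `update_date` (assumed field names).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        updated = date.fromisoformat(record["update_date"])
        if updated >= since:
            yield {
                "id": record["id"],
                # Collapse newlines and repeated spaces inside the text fields.
                "title": " ".join(record["title"].split()),
                "abstract": " ".join(record["abstract"].split()),
            }


# Tiny made-up sample standing in for the real snapshot file.
sample = [
    '{"id": "0001.0001", "title": "Old paper", "abstract": "Old text.", "update_date": "2020-01-01"}',
    '{"id": "2412.0001", "title": "New\\n paper", "abstract": "  New text. ", "update_date": "2024-12-20"}',
]
new_papers = list(iter_new_papers(sample, since=date(2024, 12, 18)))
```

With a real file, `lines` could simply be the open file handle, so the snapshot is streamed rather than loaded into memory at once.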

To speed up search on my free Oracle instance, I binarise the embeddings and use Hamming distance as the metric.
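
A minimal pure-Python sketch of the idea, assuming binarisation means keeping one bit per dimension (1 if the value is positive, 0 otherwise). The threshold, the function names, and the toy vectors are all assumptions for illustration; on the site the vector database performs the actual search.

```python
from typing import Sequence


def binarise(vector: Sequence[float]) -> int:
    """Pack the sign pattern of a float vector into one integer (1 bit per dimension)."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions where two binary codes differ."""
    return bin(a ^ b).count("1")


def search(query: Sequence[float], corpus: dict[str, int], k: int = 2) -> list[tuple[str, int]]:
    """Return the k corpus entries closest to the query in Hamming distance."""
    q = binarise(query)
    ranked = sorted(corpus.items(), key=lambda item: hamming(q, item[1]))
    return [(name, hamming(q, code)) for name, code in ranked[:k]]


# Made-up 8-dimensional "embeddings"; real ones have far more dimensions.
corpus = {
    "paper_a": binarise([0.9, -0.2, 0.4, -0.7, 0.1, 0.3, -0.5, 0.8]),
    "paper_b": binarise([-0.9, 0.2, -0.4, 0.7, -0.1, -0.3, 0.5, -0.8]),
    "paper_c": binarise([0.8, -0.1, 0.5, -0.6, -0.2, 0.4, -0.4, 0.7]),
}
results = search([1.0, -0.3, 0.2, -0.9, 0.2, 0.1, -0.6, 0.9], corpus, k=2)
```

The speed-up comes from two things: each dimension shrinks from a 32-bit float to a single bit, and comparing two codes is just an XOR followed by a bit count.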

I would love your feedback on the site :) Happy Holidays!

[1]: https://www.mixedbread.ai/docs/embeddings/mxbai-embed-large-...
[2]: https://milvus.io/
[3]: https://www.gradio.app/
[4]: https://www.kaggle.com/datasets/Cornell-University/arxiv


Comments URL: https://news.ycombinator.com/item?id=42507116

Points: 14

# Comments: 0

https://papermatch.mitanshu.tech/

Created 3mo ago | 25 Dec 2024, 10:10:08


