Show HN: I made a website to semantically search arXiv papers

As a grad student (and an ADHDer), I had trouble doing literature reviews systematically. To help with this, I made a website that finds papers by the meaning of what I'm looking for rather than by matching exact keywords.

I used Mixedbread's embedding model [1] to generate vectors from the paper abstracts, which I store and search with Milvus [2]. The frontend is served with Gradio [3]. Every week I update the vector database by pulling the arXiv metadata dataset from Kaggle [4].
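
The post doesn't say how the weekly refresh picks out new papers. One possible approach, sketched here under stated assumptions, is to stream the Kaggle snapshot and keep only records whose update date is newer than the previous run. It assumes the dump is JSON Lines with `id`, `abstract`, and `update_date` (YYYY-MM-DD) fields; `new_abstracts` is a hypothetical helper, not part of any library, and the sample records are made up.

```python
import json
from datetime import date
from typing import Iterable, Iterator, Tuple


def new_abstracts(lines: Iterable[str], since: date) -> Iterator[Tuple[str, str]]:
    """Yield (arxiv_id, abstract) for records updated after `since`.

    Assumes one JSON object per line with "id", "abstract" and
    "update_date" (YYYY-MM-DD) fields.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if date.fromisoformat(record["update_date"]) > since:
            # Collapse hard line breaks and padding inside the abstract text.
            abstract = " ".join(record["abstract"].split())
            yield record["id"], abstract


if __name__ == "__main__":
    # A tiny in-memory stand-in for the weekly snapshot file (made-up records).
    sample = [
        '{"id": "2401.00001", "abstract": "  An older\\n paper.  ", "update_date": "2024-12-10"}',
        '{"id": "2412.12345", "abstract": "  A newer\\n paper.  ", "update_date": "2024-12-20"}',
    ]
    for paper_id, abstract in new_abstracts(sample, since=date(2024, 12, 18)):
        print(paper_id, abstract)  # prints: 2412.12345 A newer paper.
```

Each yielded pair would then be embedded and written to the vector database. Depending on how `update_date` is maintained, this may also re-select revised papers, so the write step should overwrite existing entries rather than add duplicates.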

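To speed up search on my free Oracle instance, I binarise the embeddings and use Hamming distance as the metric. Binarising keeps a single bit per dimension instead of a (typically 32-bit) float, which shrinks the vectors about 32x and turns each distance computation into an XOR followed by a bit count.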
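
Here is a minimal, brute-force sketch of what binary embeddings plus Hamming distance compute, assuming sign-based binarisation (positive values become 1); the post doesn't say which threshold is used. `binarise`, `hamming`, and `search` are hypothetical names. On the site itself Milvus performs the search (it supports binary vector fields with a Hamming metric), so this only illustrates the arithmetic.

```python
from typing import List, Sequence, Tuple


def binarise(embedding: Sequence[float]) -> bytes:
    """Keep one bit per dimension: 1 if the value is > 0, else 0.

    Bits are packed most-significant-first, 8 dimensions per byte, so a
    1024-dim float32 vector (4096 bytes) becomes 128 bytes.
    """
    if len(embedding) % 8 != 0:
        raise ValueError("dimension must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(embedding), 8):
        byte = 0
        for value in embedding[start:start + 8]:
            byte = (byte << 1) | (1 if value > 0 else 0)
        packed.append(byte)
    return bytes(packed)


def hamming(a: bytes, b: bytes) -> int:
    """Count differing bits: XOR the two bit strings, then count the 1s."""
    diff = int.from_bytes(a, "big") ^ int.from_bytes(b, "big")
    return bin(diff).count("1")


def search(query: Sequence[float], corpus: List[bytes], k: int = 5) -> List[Tuple[int, int]]:
    """Brute-force top-k: return (index, distance) pairs, smallest distance first."""
    q = binarise(query)
    scored = [(i, hamming(q, doc)) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1])  # stable sort keeps ties in index order
    return scored[:k]


if __name__ == "__main__":
    docs = [
        [0.3, -0.1, 0.2, 0.5, -0.7, 0.1, 0.9, -0.2],
        [-0.3, 0.1, -0.2, -0.5, 0.7, -0.1, -0.9, 0.2],
    ]
    corpus = [binarise(d) for d in docs]
    query = [0.2, -0.4, 0.1, 0.3, -0.2, 0.6, 0.4, 0.1]
    print(search(query, corpus, k=2))  # prints [(0, 1), (1, 7)]
```

Because binarisation throws away magnitudes, the ranking can differ somewhat from cosine similarity on the original float vectors; that loss of precision is the price of the smaller, faster index.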

I would love your feedback on the site :) Happy Holidays!

[1]: https://www.mixedbread.ai/docs/embeddings/mxbai-embed-large-...
[2]: https://milvus.io/
[3]: https://www.gradio.app/
[4]: https://www.kaggle.com/datasets/Cornell-University/arxiv


Comments URL: https://news.ycombinator.com/item?id=42507116

Points: 14

# Comments: 0

https://papermatch.mitanshu.tech/

Created: Dec 25, 2024, 10:10:08 AM
