Show HN: I created a PoC for live descriptions of the surroundings for the blind

The difference in cost between products that are developed as accessibility tools compared to consumer products is huge. One example is camera glasses where the accessibility product costs ~$3000 (Envision Glasses), and the consumer product costs ~$300 (Ray-Ban Meta).

In this case the Ray-Ban Meta is getting accessibility features. The functionality is promising according to reviews, but requires the user to say "Hey meta what am I looking at" every time a scene is to be described. The battery life seem underwhelming as well.

It would be nice to have an cheap and open source alternative to the currently available products, where the user gets fed information rather than continuously requesting it. This is where I got interested to see if I could create a solution using an ESP32 WiFi camera, and learn some arduino development in the process.

I managed to create a solution where the camera connects to the phone "personal hotspot", and publishes an image every 7 seconds to an online server, which then uses the gpt-4o-mini model to describe the image and update a web page, that is read back to the user using voice synthesis. The latency for this is less than 2 seconds, and is generally faster.

I am happy with the result and learnt a lot, but I think I will pause this project for now. At least until some shiny new tech emerges (cheaper open source camera glasses).


Comments URL: https://news.ycombinator.com/item?id=42593919

Points: 14

# Comments: 2

https://github.com/o40/seesay

Created 1mo | Jan 5, 2025, 1:20:06 AM


Login to add comment

Other posts in this group

Ask HN: Physics PhD at Stanford or Berkeley

What should one consider while signing up for a Physics PhD program, with an focus on experimental Quantum/Molecular optics, program at either university if both of them offer?

I understand ther

Feb 10, 2025, 6:20:06 PM | Hacker news
Show HN: Global 3D Topography Explorer

I made a web app to generate 3D models of real places on earth from land cover and elevation data.

Click anywhere on the map to get a polygon, and then click "generate".

It should work at most

Feb 10, 2025, 6:20:04 PM | Hacker news
Show HN: Seen: rendering 1,000,000+ notes in <1s. speed, by default

Hello HN! I've been working on creating a new note-taking app called Seen. Right now, it's really just a preview: virtual-list rendering ~1,000,000 notes in a masonry layout, while trying to minim

Feb 10, 2025, 3:50:13 PM | Hacker news
What about K?
Feb 10, 2025, 3:50:12 PM | Hacker news
Show HN: HTML visualization of a PDF file's internal structure

Hi, I've just finished a rebuild of this function and added a lot of new features: info, page index, minimap, inverted index,... I think it may be useful for inspection, debugging or just as a le

Feb 10, 2025, 3:50:11 PM | Hacker news