The difference in cost between products that are developed as accessibility tools compared to consumer products is huge. One example is camera glasses where the accessibility product costs ~$3000 (Envision Glasses), and the consumer product costs ~$300 (Ray-Ban Meta).
In this case the Ray-Ban Meta is getting accessibility features. The functionality is promising according to reviews, but requires the user to say "Hey meta what am I looking at" every time a scene is to be described. The battery life seem underwhelming as well.
It would be nice to have an cheap and open source alternative to the currently available products, where the user gets fed information rather than continuously requesting it. This is where I got interested to see if I could create a solution using an ESP32 WiFi camera, and learn some arduino development in the process.
I managed to create a solution where the camera connects to the phone "personal hotspot", and publishes an image every 7 seconds to an online server, which then uses the gpt-4o-mini model to describe the image and update a web page, that is read back to the user using voice synthesis. The latency for this is less than 2 seconds, and is generally faster.
I am happy with the result and learnt a lot, but I think I will pause this project for now. At least until some shiny new tech emerges (cheaper open source camera glasses).
Comments URL: https://news.ycombinator.com/item?id=42593919
Points: 14
# Comments: 2
Connectez-vous pour ajouter un commentaire
Autres messages de ce groupe
Article URL: https://sidhion.com/blog/28h_days_update_1/
Comments URL: http
Article URL: https://aide.dev/blog/sota-bitter-lesson
Comments URL: https://ne