Show HN: Voice bots with 500ms response times

Last year, when GPT-4 was released, I started making lots of little voice + LLM experiments. Voice interfaces are fun; there are several interesting new problem spaces to explore.

I'm convinced that voice is going to be a bigger and bigger part of how we all interact with generative AI. But one thing that's hard, today, is building voice bots that respond as quickly as humans do in conversation. A 500ms voice-to-voice response time is just barely possible with today's AI models.

You can get down to 500ms if you: host transcription, LLM inference, and voice generation all together in one place; are careful about how you route and pipeline all the data; and the gods of both Wi-Fi and VRAM caching smile on you.
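
To make the pipelining point concrete, here is a minimal sketch of one such step: buffering streamed LLM output into whole sentences so that speech generation can start before the full reply has arrived. This is an illustration, not the code behind the demo. The function name `aggregate_sentences` and the boundary rule (a `.`, `!`, or `?` followed by whitespace) are assumptions made for the example.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: sentence-ending punctuation followed by whitespace.
# Requiring the whitespace avoids splitting inside tokens like "3.5".
_SENTENCE_END = re.compile(r"[.!?]\s")


def aggregate_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        # A single token may complete more than one sentence, so keep scanning.
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            sentence = buffer[:match.end()].strip()
            buffer = buffer[match.end():]
            if sentence:
                yield sentence
    # Flush whatever is left once the stream ends.
    tail = buffer.strip()
    if tail:
        yield tail


if __name__ == "__main__":
    stream = ["Hi", " there", ".", " How", " are", " you", "?", " Good"]
    for sentence in aggregate_sentences(stream):
        print(sentence)  # each sentence could be handed to TTS right here
```

The trade-off in this sketch is that a sentence is only released once the token after its punctuation arrives, which adds a little delay in exchange for fewer false splits.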

Here's a demo of a 500ms-capable voice bot, plus a container you can deploy to run it yourself on an A10/A100/H100 if you want to:

https://fastvoiceagent.cerebrium.ai/

We've been collecting lots of metrics. Here are typical numbers (in milliseconds) for all the easily measurable parts of the voice-to-voice response cycle.

  macOS mic input                 40
  opus encoding                   30
  network stack and transit       10
  packet handling                  2
  jitter buffer                   40
  opus decoding                   30
  transcription and endpointing  200
  llm ttfb                       100
  sentence aggregation          100
  tts ttfb                        80
  opus encoding                   30
  packet handling                  2
  network stack and transit       10
  jitter buffer                   40
  opus decoding                   30
  macOS speaker output           15
  ----------------------------------
  total ms                       759
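
As a quick check on the table above, the sketch below re-adds the rows and splits the total into two groups. The grouping into "model" stages versus everything else is my own assumption for illustration, not a breakdown from the measurements themselves; the numbers are copied from the table.

```python
# Typical per-stage latencies in milliseconds, copied from the table above.
STAGES = [
    ("macOS mic input", 40),
    ("opus encoding", 30),
    ("network stack and transit", 10),
    ("packet handling", 2),
    ("jitter buffer", 40),
    ("opus decoding", 30),
    ("transcription and endpointing", 200),
    ("llm ttfb", 100),
    ("sentence aggregation", 100),
    ("tts ttfb", 80),
    ("opus encoding", 30),
    ("packet handling", 2),
    ("network stack and transit", 10),
    ("jitter buffer", 40),
    ("opus decoding", 30),
    ("macOS speaker output", 15),
]

# Assumed grouping: stages that involve running or waiting on a model.
MODEL_STAGES = {
    "transcription and endpointing",
    "llm ttfb",
    "sentence aggregation",
    "tts ttfb",
}

total_ms = sum(ms for _, ms in STAGES)
model_ms = sum(ms for name, ms in STAGES if name in MODEL_STAGES)
audio_and_network_ms = total_ms - model_ms

if __name__ == "__main__":
    print(f"total:             {total_ms} ms")
    print(f"model stages:      {model_ms} ms")
    print(f"audio and network: {audio_and_network_ms} ms")
```

Under that grouping, the model stages account for 480 of the 759 ms and audio capture, codecs, buffering, and transit for the remaining 279 ms.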

Everything in AI is changing all the time. LLMs with native audio input and output capabilities will likely make it easier to build fast-responding voice bots soon. But for the moment, I think this is the fastest possible approach/tech stack.

Comments URL: https://news.ycombinator.com/item?id=40805010

Points: 77

# Comments: 22

Created 3d ago | June 27, 2024, 8:30:04

