STT Accuracy Issues with Single-Word Answers

I am using Deepgram Nova-3 for STT in my pipeline. Overall transcription accuracy is quite impressive, but it occasionally transcribes answers incorrectly.

The pipeline performs very well for longer sentences and conversations. However, it struggles more with short, single-word responses, where the transcription quality is noticeably lower.

Do you have any recommendations regarding model selection or parameter tuning that could help improve accuracy for short utterances and single-word answers?

@Umer_Usman bhai, short-utterance dropouts on Nova-3 usually come from one of two things: the model has no context to disambiguate (deep-context models lean on surrounding tokens, which a single word doesn’t give them), or endpointing is cutting the audio before the word finishes.

For domain-bounded vocabulary (yes/no, names, products, status words), the biggest lever is keyterm. Nova-3 supports up to 100 terms with documented confidence lifts on isolated words ("tretinoin" 0.712 → 0.965, "escalation" 0.765 → 0.981 per Deepgram's published numbers) [ developers.deepgram.com/docs/keyterm ]. The LK Deepgram plugin exposes it directly [ docs.livekit.io/agents/models/stt/deepgram/ ].
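To show roughly what keyterm prompting sends to Deepgram, here is a minimal sketch that builds a Nova-3 streaming URL with one `keyterm` query parameter per term. It assumes you are calling the raw `wss://api.deepgram.com/v1/listen` endpoint rather than going through the LK plugin, which handles this for you. The helper name `build_keyterm_url` and the sample terms are hypothetical, and the 100-term cap is the figure quoted above, so check the keyterm docs for current limits.

```python
from urllib.parse import quote, urlencode

# Deepgram streaming endpoint (assumed; the LK plugin builds this for you).
DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"
MAX_KEYTERMS = 100  # cap quoted in the reply above for Nova-3


def build_keyterm_url(keyterms, language="en"):
    """Hypothetical helper: Nova-3 listen URL with one keyterm param per term."""
    # Strip whitespace, drop blanks and duplicates, keep the original order.
    cleaned = list(dict.fromkeys(t.strip() for t in keyterms if t.strip()))
    if len(cleaned) > MAX_KEYTERMS:
        raise ValueError(f"{len(cleaned)} keyterms given; cap is {MAX_KEYTERMS}")
    params = [("model", "nova-3"), ("language", language)]
    # Repeat the keyterm parameter once per term.
    params += [("keyterm", term) for term in cleaned]
    # quote (not quote_plus) encodes spaces in multi-word terms as %20.
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    print(build_keyterm_url(["yes", "no", "escalation", "tretinoin"]))
```

Keeping the list short and specific to the answers you actually expect (status words, product names, yes/no) is what gives isolated words the context they otherwise lack.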

If your expected answers are open-ended, check endpointing (LK default 25ms, worth bumping to 50-100ms if waveforms show words getting cut mid-syllable) and set language explicitly, since auto-detection has less to work with on one-word audio.
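As a sketch of those two settings together, the helper below pins the language and raises `endpointing` (the silence duration, in ms, before Deepgram finalizes a segment) to 75 ms, inside the 50-100 ms range suggested above. Like the previous example, it assumes direct use of the raw streaming endpoint, and the helper name `build_short_answer_url` is hypothetical. Tune the value against your own recordings rather than treating 75 ms as a recommendation.

```python
from urllib.parse import quote, urlencode

# Deepgram streaming endpoint (assumed; the LK plugin builds this for you).
DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_short_answer_url(language="en", endpointing_ms=75):
    """Hypothetical helper: Nova-3 streaming URL tuned for one-word answers."""
    # Pin the language instead of relying on auto-detection, which has
    # little audio to work with when the whole utterance is one word.
    if not language:
        raise ValueError("pass an explicit language code, e.g. 'en'")
    # Endpointing is the silence duration (ms) before a segment is finalized.
    # 75 ms sits inside the 50-100 ms range suggested above (LK default: 25 ms).
    if not isinstance(endpointing_ms, int) or endpointing_ms <= 0:
        raise ValueError("endpointing_ms must be a positive integer")
    params = {
        "model": "nova-3",
        "language": language,
        "endpointing": endpointing_ms,
    }
    return f"{DEEPGRAM_LISTEN_URL}?{urlencode(params, quote_via=quote)}"


if __name__ == "__main__":
    print(build_short_answer_url())
```

A longer endpointing window adds a little latency before each final transcript, so the trade-off is worth checking against recordings where words were previously cut off.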

If the use case is heavily turn-based and you can switch, Flux is the alternative: its phrase-endpointing model uses acoustic and semantic cues, and it supports keyterm too.

To add to the above, you can find more information about Flux custom endpointing, including how to enable it, at: