Allow configuring or replacing the SIP play_dialtone audio

play_dialtone: true plays tones.ETSIRinging — a hardcoded 425 Hz sine wave at 1s on / 4s off. For US callers this clashes with the carrier’s dual-tone 440+480 Hz ringback, producing an abrupt, jarring tone change the moment LiveKit answers the invite.

Would love a way to either:

  • Point play_dialtone at a custom audio file/URL, or
  • Pick from regional presets (US, UK, JP, ETSI) via the same field.

Use case: forward/bridge flows where the caller waits in the LiveKit room while we dial out to a human. Today the only workaround is play_dialtone: false plus a custom audio track published by a participant — much more setup than a config flag.

Let me know if there is some easier configuration I am overlooking, thanks!

@marc, your read of the source matches. tones.ETSIRinging is defined as

[]Tone{{Freq: []Hz{425}, Dur: time.Second, Silence: 4 * time.Second}}

in livekit/media-sdk/tones/tones.go, and livekit/sip calls it via tones.Play(rctx, dst, ringVolume, tones.ETSIRinging) in both the outbound dial flow and the transfer flow (livekit/sip/pkg/sip/outbound.go). The only other presets in that file are ETSIDial and ETSIBusy; there is no USRinging, UKRinging, or JPRinging.

So no, there’s no configuration you’re overlooking. play_dialtone: false plus publishing your own track is currently the only path, exactly as you wrote.

The Tone struct is {Freq []Hz, Dur time.Duration, Silence time.Duration} [same file], so adding regional presets to the media-sdk tones package is a small PR (constants only). The URL-pointer option is the larger ask: tones.Play consumes []Tone, not a media file, so pointing play_dialtone at an arbitrary audio URL needs a new code path in pkg/sip/outbound.go that loads and streams a file alongside (or instead of) the synthesized tone generator.

If you file a feature request in livekit/sip, splitting it into the two asks helps triage. The regional-preset version is mergeable as a media-sdk constants PR; the URL-pointer version is architecturally larger.
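One way the larger URL-pointer version could be structured is to put both the synthesized tones and a decoded file behind a common frame source, so the dial flow only pulls PCM frames. The `RingbackSource` interface and both implementations below are hypothetical, and fetching and decoding a file from a URL are deliberately left out.

```go
package main

import (
	"fmt"
	"math"
)

// RingbackSource is a hypothetical abstraction: the dial flow would pull
// fixed-size PCM frames from it instead of calling a tone generator directly.
type RingbackSource interface {
	// Fill writes the next len(frame) mono samples into frame.
	Fill(frame []int16)
}

// toneSource synthesizes a single on/off cadence of summed sine waves.
type toneSource struct {
	rate      int       // samples per second
	freqs     []float64 // Hz
	onN, offN int       // on/off lengths in samples
	pos       int       // absolute sample position
}

func (s *toneSource) Fill(frame []int16) {
	period := s.onN + s.offN
	for i := range frame {
		var v float64
		if s.pos%period < s.onN {
			t := float64(s.pos) / float64(s.rate)
			for _, f := range s.freqs {
				v += math.Sin(2*math.Pi*f*t) / float64(len(s.freqs))
			}
		}
		frame[i] = int16(v * 0.5 * math.MaxInt16)
		s.pos++
	}
}

// pcmSource loops pre-decoded PCM, e.g. audio fetched from a URL.
type pcmSource struct {
	pcm []int16
	pos int
}

func (s *pcmSource) Fill(frame []int16) {
	if len(s.pcm) == 0 { // nothing decoded: emit silence
		for i := range frame {
			frame[i] = 0
		}
		return
	}
	for i := range frame {
		frame[i] = s.pcm[s.pos]
		s.pos = (s.pos + 1) % len(s.pcm)
	}
}

func main() {
	const rate = 8000
	frame := make([]int16, rate/50) // 20 ms frames
	sources := map[string]RingbackSource{
		"tone": &toneSource{rate: rate, freqs: []float64{440, 480}, onN: 2 * rate, offN: 4 * rate},
		"file": &pcmSource{pcm: []int16{100, -100, 200}},
	}
	for name, src := range sources {
		src.Fill(frame)
		fmt.Println(name, frame[:3])
	}
}
```

Keeping the existing tone generator as one implementation means play_dialtone: true could keep its current behavior while a file source is added next to it.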

Are you looking for full support of TR 101 041-2? Or do you want to be able to produce arbitrary frequency ranges and congestion tones?

Sounds like option one is just a workaround and option two would be preferred?

We’ve filed an internal enhancement request for a country-based enum (the second option).
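For reference, a country-based enum could be as simple as the sketch below: a hypothetical `Region` type whose zero value keeps today's ETSI behavior, a parser for config strings, and a lookup that falls back to ETSI. None of these names exist in livekit/sip; the actual field and values are up to the enhancement request.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Local stand-ins for the media-sdk types quoted earlier in the thread.
type Hz float64

type Tone struct {
	Freq    []Hz
	Dur     time.Duration
	Silence time.Duration
}

// Region is a hypothetical enum selecting a ringback preset.
type Region int

const (
	RegionETSI Region = iota // zero value keeps today's behavior
	RegionUS
)

var ringing = map[Region][]Tone{
	RegionETSI: {{Freq: []Hz{425}, Dur: time.Second, Silence: 4 * time.Second}},
	RegionUS:   {{Freq: []Hz{440, 480}, Dur: 2 * time.Second, Silence: 4 * time.Second}},
}

// ParseRegion maps a config string such as "us" to a Region, falling back
// to ETSI so an unknown value never disables ringback entirely.
func ParseRegion(s string) Region {
	switch strings.ToUpper(strings.TrimSpace(s)) {
	case "US":
		return RegionUS
	default:
		return RegionETSI
	}
}

// Ringing returns the tone sequence to play for r.
func Ringing(r Region) []Tone {
	if seq, ok := ringing[r]; ok {
		return seq
	}
	return ringing[RegionETSI]
}

func main() {
	for _, in := range []string{"us", "", "jp"} {
		fmt.Printf("%q -> %v\n", in, Ringing(ParseRegion(in))[0].Freq)
	}
}
```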

Yeah, I think option 2 would be preferred. Passing in a US option (which I’m assuming would play the standard US ringback tone) would fix my issue.