ESP32 RTC error

ESP32-S3 + LiveKit Cloud: signaling succeeds, RTC fails after DTLS/TURN

I am testing livekit/livekit=0.3.6 on an ESP32-S3 Lichuang board.

Current setup:

  • Board: lichuang_esp32s3
  • ESP-IDF: v5.5.3
  • SDK dependency: livekit/livekit=0.3.6
  • LiveKit Cloud URL: wss://bo-996enr6g.livekit.cloud
  • Auth during development: LiveKit sandbox token server
  • Agent dispatch: Jordan-19de
  • ICE setting: relay-only enabled

I already confirmed these parts are working:

  • Wi-Fi connects successfully
  • The device gets an IP address, for example 10.0.0.44
  • LiveKit signaling connects successfully
  • TLS certificate validation succeeds
  • 401 Unauthorized is no longer happening after switching away from an embedded expiring JWT
  • A desktop client on the same network can join the same LiveKit project and talk to the same agent normally

The remaining problem is that the ESP32 reaches the RTC phase, then disconnects and starts reconnecting.

Representative serial log sequence:

livekit_signaling: Connecting to server: wss://bo-996enr6g.livekit.cloud/rtc?...
esp-x509-crt-bundle: Certificate validated
DTLS: Init SRTP OK
livekit_engine: Subscribing to audio track
AGENT: Start agent as Controlling/Controlled
Get candidate success ...
TURN relay candidates acquired from 140.238.63.214:...
Skip TCP stun server turns:otokyo1b.turn.livekit.cloud:443?transport=tcp
Skip TCP stun server turns:bo-996enr6g.turn.livekit.cloud:443?transport=tcp
publisher offer generated
subscriber answer generated
peer state: 1 -> 4
transport_ws: esp_transport_ws_poll_connection_closed: unexpected data readable on socket=54
websocket_client: Connection terminated while waiting for clean TCP close
UDP: Failed to select: Bad file number
livekit_engine: Reconnect ... reason=4
livekit_room_v2: Failure reason: RTC

What seems important to me:

  • This is no longer an auth problem.
  • The device gets past signaling and into DTLS/SRTP and ICE/TURN work.
  • TURN relay candidates are acquired successfully.
  • The failure seems to happen after SDP exchange begins and peer state advances.
  • Then the WebSocket side closes unexpectedly, and sometimes the UDP path reports Bad file number.

My questions:

  1. Is this a known issue or limitation for livekit/livekit=0.3.6 on ESP32-S3?
  2. Are there recommended esp_peer / ICE settings for LiveKit Cloud on ESP32-S3, especially for relay-only mode?
  3. Is skipping TCP TURN candidates expected here, or could that be related?
  4. Does transport_ws: unexpected data readable on socket usually indicate a signaling transport problem, or is it often secondary to an RTC teardown?
  5. Has anyone seen UDP: Failed to select: Bad file number in the ESP32 client during reconnect or peer restart?

Any pointers on where to inspect next in esp_peer, ICE transport, or the LiveKit ESP32 client would help.

Thanks for the detailed logs — this is helpful. Here’s what I can see from the SDK source and your trace:

Are you syncing the device’s time once Wi-Fi is up? In my projects, the first thing the device does after Wi-Fi is established is contact an NTP server and synchronize its clock. An accurate clock matters for encrypted connections such as TLS. Since you did establish a connection, I guess you are already doing that, but a 401 can be a symptom of clock skew (among other things).
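If you want a quick way to confirm the clock is actually set before connecting, something like the following might do. It is only a sketch in standard C: the cutoff date and the function name are my own choices, not anything from the LiveKit SDK, and it does not start SNTP for you.

```c
#include <stdbool.h>
#include <time.h>

// Hypothetical cutoff: 2024-01-01T00:00:00Z as a Unix timestamp.
// Any date safely before your firmware build date would work as well.
#define CLOCK_SANE_AFTER_EPOCH ((time_t)1704067200)

// Returns true if `now` looks like real wall-clock time rather than the
// near-1970 value a device reports before its first time sync.
bool clock_looks_synced(time_t now)
{
    return now >= CLOCK_SANE_AFTER_EPOCH;
}
```

You would call it as clock_looks_synced(time(NULL)) after Wi-Fi comes up, and hold off on fetching a token or opening the TLS connection until it returns true.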

What peer state: 1 -> 4 means: state 1 is CONNECTING and state 4 is FAILED. When either the publisher or subscriber peer connection reaches FAILED, the engine sets failure_reason=RTC and enters the reconnect backoff loop. The reason=4 in your reconnect log corresponds to LIVEKIT_FAILURE_REASON_RTC, confirming that the WebRTC peer connection is failing, not the signaling layer.
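To make that rule concrete, here is a rough model of it in plain C. The type and function names are made up for illustration; only the values 1, 4, and reason=4 come from the description above, and the real SDK and esp_peer enums may well be laid out differently.

```c
#include <stdbool.h>

// Hypothetical names. Only the numeric values 1 (connecting) and 4 (failed)
// are taken from the explanation in this thread.
typedef enum {
    PEER_STATE_CONNECTING = 1,
    PEER_STATE_FAILED = 4,
} peer_state_t;

typedef enum {
    FAILURE_REASON_NONE = 0,
    FAILURE_REASON_RTC = 4,  // matches "reason=4" in the reconnect log
} failure_reason_t;

// If either peer connection has failed, the whole session is treated as an
// RTC failure, regardless of the state of the other peer.
failure_reason_t classify_peer_states(int publisher_state, int subscriber_state)
{
    if (publisher_state == PEER_STATE_FAILED ||
        subscriber_state == PEER_STATE_FAILED) {
        return FAILURE_REASON_RTC;
    }
    return FAILURE_REASON_NONE;
}
```

The point is that a single "1 -> 4" transition on either the publisher or the subscriber is enough to produce the "Failure reason: RTC" line, even if signaling itself is healthy.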

On your specific questions:

  1. Known issue with 0.3.6? — There’s no known issue with 0.3.6 specifically for this failure pattern. The one documented known issue is that a remote participant leaving can occasionally trigger a disconnect. However, the underlying esp_peer library is evolving quickly, and the DTLS/ICE layer is where most edge-case behavior lives.

  2. Recommended ICE settings — The SDK passes ESP_PEER_ICE_TRANS_POLICY_RELAY when force_relay is enabled (which your Cloud project appears to be doing via client_configuration). The default CONFIG_LK_MAX_ICE_SERVERS is 3 — verify your join response isn’t providing more than 3 TURN servers, because extras would be silently dropped.

  3. TCP TURN skipping: Yes, esp_peer currently skips TCP-based TURN candidates. This is expected behavior in the Espressif WebRTC stack, which only uses UDP TURN relays. This is fine as long as UDP TURN is reachable, which your logs confirm (TURN relay candidates were acquired). The skipping itself is not the cause of the failure.

  4. transport_ws: unexpected data readable on socket — This comes from esp_websocket_client, not the LiveKit SDK. It typically means the WebSocket received data at an unexpected time (often during or right after the peer connection teardown). It’s usually a symptom rather than the root cause — the RTC peer failed first, which triggered cleanup that raced with the signaling transport.

  5. UDP: Failed to select: Bad file number — This is EBADF from the FreeRTOS/lwIP network stack, meaning a socket file descriptor was invalidated while another task was still selecting on it. This is a race condition in esp_peer’s transport layer during reconnect/teardown. It’s a secondary symptom of the same underlying failure.
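On point 5, the EBADF behavior is easy to reproduce off-target. The sketch below uses ordinary POSIX sockets on a desktop, not esp_peer, and the function name is invented: it closes a UDP socket and then calls select() on the stale descriptor, which is roughly what happens when one task tears the transport down while another is still waiting on it.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

// Opens a UDP socket, closes it, then select()s on the stale descriptor.
// Returns the errno reported by select(), 0 if select() did not fail,
// or -1 if the socket could not be created in the first place.
int select_on_closed_socket(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) {
        return -1;
    }
    close(fd);  // stands in for another task tearing the transport down

    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    struct timeval timeout = { .tv_sec = 0, .tv_usec = 0 };

    int rc = select(fd + 1, &readfds, NULL, NULL, &timeout);
    return (rc < 0) ? errno : 0;
}
```

On a POSIX host this returns EBADF. The wording of the error string depends on the C library, so "Bad file number" on the device and "Bad file descriptor" on a desktop are the same error code.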

Where I’d look next:

  • Memory pressure. The DTLS handshake and SRTP context allocation require a significant amount of heap. Check esp_get_free_heap_size() and esp_get_free_internal_heap_size() right before the peer state transitions. If the internal heap is below roughly 40-50 KB at that point, you may be running out of memory during the DTLS handshake. Make sure PSRAM is properly configured (octal mode, CONFIG_SPIRAM_MODE_OCT=y, CONFIG_SPIRAM_MALLOC_ALWAYSINTERNAL=256).

  • ESP-IDF version. You’re on v5.5.3, while the SDK is tested against v5.4.x. There may be esp_peer or esp_websocket_client behavioral changes in 5.5.x. If possible, try building against ESP-IDF 5.4 to rule this out.

  • Timing. Add a log right before the peer connection is created, showing free heap. The DTLS handshake can fail silently if mbedTLS can’t allocate its buffers, which then manifests as a peer state → FAILED transition.

  • Board BSP. The Lichuang board isn’t in the SDK’s tested set. Can you confirm the PSRAM, flash size, and cache settings in your sdkconfig? The reference configs in the SDK examples use 8MB octal PSRAM, 16MB QIO flash, and a 64KB data cache with a 64B line size — these matter for real-time WebRTC.
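For the heap check, a small helper along these lines may be enough. Treat it as a sketch: the helper names and the 40 KB warning threshold are my assumptions (the threshold is just the low end of the rough range mentioned above). The two esp_get_free_* calls are the ESP-IDF functions already named, and the #else branch only exists so the snippet also compiles on a desktop with made-up numbers.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#ifdef ESP_PLATFORM
#include "esp_system.h"  // esp_get_free_heap_size(), esp_get_free_internal_heap_size()
#else
// Host-side stand-ins so the sketch compiles off-target. Values are made up.
uint32_t esp_get_free_heap_size(void) { return 4u * 1024u * 1024u; }
uint32_t esp_get_free_internal_heap_size(void) { return 30u * 1024u; }
#endif

// Assumed warning threshold, not an SDK constant.
#define INTERNAL_HEAP_WARN_BYTES (40u * 1024u)

// True when free internal RAM is below the assumed warning threshold.
bool internal_heap_is_low(uint32_t free_internal_bytes)
{
    return free_internal_bytes < INTERNAL_HEAP_WARN_BYTES;
}

// Prints total and internal free heap with a label, and returns true if the
// internal heap looks low at that point.
bool log_heap(const char *where)
{
    uint32_t total = esp_get_free_heap_size();
    uint32_t internal = esp_get_free_internal_heap_size();
    bool low = internal_heap_is_low(internal);
    printf("[heap] %s: free=%u internal=%u%s\n", where,
           (unsigned)total, (unsigned)internal, low ? " (LOW)" : "");
    return low;
}
```

Calling log_heap("before peer create") just before the room connects, and again when the peer state changes, should show whether internal RAM is being exhausted around the DTLS handshake.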

Hope this helps narrow it down. If you can share your sdkconfig.defaults and heap numbers right before the failure, that would help diagnose further. Be sure to remove any passwords or keys.

The logs you shared appear to cover the important parts, but sometimes there is an indication of what goes wrong before the actual problems start. If you can share the full monitor output from reset through the failure, that may be helpful too, on the off chance something was missed in the initial logs you shared.

I found the problem: my device was connecting to LiveKit Cloud through a VPN, and the VPN uses a fake-IP rule, so the IP address the device got for LiveKit was different from the real one. The ICE support in esp_peer is also limited. When I changed the network, it worked. Thanks!!
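For anyone who lands here with the same symptom, a check like the one below could help spot this early. It is only a sketch with a big assumption: that the VPN hands out fake addresses from 198.18.0.0/15, a reserved benchmarking range that some fake-IP DNS setups use by default. Your VPN may use a different range, and the function name is made up.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>

// Returns true if `ipv4_text` parses as an IPv4 address inside 198.18.0.0/15.
// ASSUMPTION: that is the fake-IP range in use; adjust it for your VPN.
bool looks_like_fake_ip(const char *ipv4_text)
{
    struct in_addr addr;
    if (inet_pton(AF_INET, ipv4_text, &addr) != 1) {
        return false;  // not a valid dotted-quad IPv4 address
    }
    uint32_t host_order = ntohl(addr.s_addr);
    return (host_order & 0xFFFE0000u) == 0xC6120000u;  // 198.18.0.0/15
}
```

Logging the address that the LiveKit hostname resolves to on the device and running it through a check like this would flag a fake-IP DNS answer before the ICE phase starts.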

Great, glad you sorted it out.