Multiple rooms reconnection fails

GitHub Issue #858
Hi, I have an Android application that connects to multiple rooms at the same time.

The issue occurs during reconnection (I simulated a connection drop by turning WiFi off and on). While some rooms reconnect successfully, others fail. The room that fails to reconnect is not always the same, and the issue happens frequently, although not every time.

I am logging the Room events, and this is what I see when I start my app (6 rooms connected):

Room Status: onConnected: 1002
Room Status: onConnected: 1006
Room Status: onConnected: 1004
Room Status: onConnected: 1003
Room Status: onConnected: 1005
Room Status: onConnected: 1001

Then I turn off the Wi-Fi, and I don’t receive the onReconnecting event for room 1002:

Room Status: onReconnecting: 1005
Room Status: onReconnecting: 1006
Room Status: onReconnecting: 1001
Room Status: onReconnecting: 1003
Room Status: onReconnecting: 1004

Finally, when I turn the Wi-Fi back on, I get:

Room Status: onReconnected: 1006
Room Status: onReconnected: 1001
Room Status: onReconnected: 1003
Room Status: onReconnected: 1005
Room Status: onReconnected: 1004

I also have a loop that checks the Room status every 5 seconds. When I turn off the Wi-Fi, it logs the following:

All Room Status every 5 seconds: 
1006: RECONNECTING
1005: RECONNECTING
1004: RECONNECTING
1003: RECONNECTING
1002: CONNECTED
1001: RECONNECTING

So room 1002 stays in the CONNECTED state, but it no longer works, even after all the other rooms have reconnected.

In other cases, a Room remains in the RECONNECTING state and never transitions to CONNECTED or DISCONNECTED.
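One way to spot the affected rooms in event logs like the ones above is to compare which rooms logged each callback. The following is only a minimal Python sketch: it assumes the log lines keep the `Room Status: <event>: <room>` shape shown above, and the function name `rooms_missing_events` is hypothetical, not part of any SDK.

```python
import re
from collections import defaultdict

# Matches lines such as "Room Status: onReconnecting: 1005".
EVENT_LINE = re.compile(r"Room Status: (on\w+): (\S+)")


def rooms_missing_events(log_text):
    """Return, per reconnection event, the rooms that connected but never logged it."""
    seen = defaultdict(set)
    for match in EVENT_LINE.finditer(log_text):
        event, room = match.groups()
        seen[event].add(room)
    connected = seen["onConnected"]
    return {
        event: sorted(connected - seen[event])
        for event in ("onReconnecting", "onReconnected")
    }


# Shortened, made-up sample in the same format as the logs above.
sample = """\
Room Status: onConnected: 1001
Room Status: onConnected: 1002
Room Status: onReconnecting: 1001
Room Status: onReconnected: 1001
"""
print(rooms_missing_events(sample))
# prints {'onReconnecting': ['1002'], 'onReconnected': ['1002']}
```

A room listed under `onReconnecting` corresponds to the "stays CONNECTED" case, while a room listed only under `onReconnected` corresponds to the "stuck in RECONNECTING" case.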

Server

  • Version: 1.9.11
  • Environment: running the exe locally

Client

  • SDK: Android
  • Version: 2.23.2

I see no activity on your open issue: Multiple rooms reconnection fails · Issue #858 · livekit/client-sdk-android · GitHub

Do you have a test harness for this?
Also, what device(s) / emulator(s) did you test this on? It feels like a timing issue somewhere so I wonder if it’s more prevalent on certain (older?) devices?

I’m currently using a Samsung XCover 6 Pro (2022) with our app. Unfortunately, I cannot share it for testing.

I understand you can’t share the app. I thought maybe you had a test harness that you had used to isolate the root cause to the reconnection logic.

I don’t, but if it’s useful I can create a small test app. I just need some details about your environment, i.e., whether you can generate the tokens, whether you can run the app from source code (with Android Studio), etc…

When this happens, it can be helpful to capture device logs with adb:

adb logcat -v threadtime

and pipe that to a file. From there, analyze what may have happened.

If you share those logs, we can have a look to see if anything stands out.

Is it okay if I capture logcat filtered by my app’s PID while enabling your SDK logs at the VERBOSE level, or do I need to use a specific tag or leave the logs unfiltered?

I usually look at them unfiltered and filter while viewing, so I can drill in and see whether something at the radio or OS layer may be causing an issue.
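As a rough illustration of the "capture unfiltered, filter while viewing" approach, the Python sketch below parses lines from a saved `adb logcat -v threadtime` capture and filters them by PID or tag afterwards. It assumes the usual threadtime layout (date, time, PID, TID, level, tag, message); the function names, the sample lines, and the PID are placeholders made up for the example.

```python
import re

# Assumed threadtime layout: "MM-DD HH:MM:SS.mmm  PID  TID LEVEL TAG: message"
THREADTIME = re.compile(
    r"^(?P<date>\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<pid>\d+)\s+(?P<tid>\d+)\s+(?P<level>[VDIWEFA])\s+"
    r"(?P<tag>.*?)\s*: (?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one threadtime line as a dict, or None if it does not match."""
    match = THREADTIME.match(line.rstrip("\n"))
    return match.groupdict() if match else None


def filter_entries(lines, pid=None, tag_contains=None):
    """Yield parsed entries, optionally restricted to one PID and/or a tag substring."""
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # e.g. "--------- beginning of main" separators
        if pid is not None and entry["pid"] != str(pid):
            continue
        if tag_contains is not None and tag_contains not in entry["tag"]:
            continue
        yield entry


# Made-up sample lines; with a real capture, pass an open file object instead.
sample = [
    "--------- beginning of main",
    "01-02 12:34:56.789  1234  5678 D ExampleTag: reconnect attempt",
    "01-02 12:34:57.001   999   999 I OtherTag: unrelated",
]
for entry in filter_entries(sample, pid=1234):
    print(entry["time"], entry["tag"], entry["message"])
# prints 12:34:56.789 ExampleTag reconnect attempt
```

Keeping the capture unfiltered means the same file can be re-read with different filters, for example once for the app's PID and once without a PID to look at system-level network messages from the same time window.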

Here are the logs.

Just to clarify what I did:

  • I started the app which connected successfully to 6 rooms, with names from 1001 to 1006
  • Then I turned off the WiFi, waited for 5-6 seconds and turned it back on.
  • After this, all rooms except 1004 reconnected successfully.
  • The room 1004 is stuck in RECONNECTING status.

logs.txt (4.4 MB)

One interesting thing is, I do not see a refresh token for the 1004 reconnect. Not sure what this means yet.

Are you connecting to the LiveKit cloud or a self-hosted server? I did not find those sessions in our server logs. I think the answers are in the server logs, so check there.

It’s a self-hosted server. These are the Android and server logs.
logs_android.txt (4.1 MB)
logs_server.txt (1.4 MB)

In this case, both issues occurred. After turning off the WiFi, room 1004 did not change to RECONNECTING state and remained CONNECTED.
After turning the WiFi back on, room 1005 remained stuck in the RECONNECTING state.

Even though room 1004 appears as CONNECTED in the logs, from the server’s point of view the app is actually disconnected.

Thanks, we are looking into it.


I would be curious whether you see the same issue if you test against LiveKit Cloud.

Hi, I switched my app to LiveKit Cloud and I’m having the same problems. Here are the logs from the Android app:
logs.txt (4.1 MB)

The issues are the same:

  • room 1004 stuck in RECONNECTING
  • room 1005 shows CONNECTED, but is actually DISCONNECTED

Our team has been trying to reproduce this, but we have not been able to. Can you please try 2.23.4?

Thank you for looking into this.
I tested version 2.23.4, but the bug is still present.

We are not sure how to reproduce this. Can you put together a simple app that we can run to demonstrate the issue? If it can log what happens when the issue occurs, that would be great, though we can probably see it in the logs too.

There is a PR that may fix it, but without being able to reproduce the issue it is basically throwing darts in the dark.

I created this simple app. The README should contain all the information needed to run the app and reproduce the issue. If you have any questions, I’m here. Thank you.
