M5Stack CoreS3 + AW88298/ES7210 shared I2S — no speaker output (capture works)

Hi,

I’m using the **LiveKit ESP32 SDK (v0.3.6)** voice_agent example on **M5Stack CoreS3** with the **tempotian/codec_board** component. The board has:

- **Playback:** AW88298 (DAC) on shared I2S TX

- **Capture:** ES7210 (ADC) on shared I2S RX

- **Same I2S port** for both (full-duplex, one I2S_NUM_0 with tx_handle + rx_handle)

**Problem:** Microphone capture works (Room connects, we send audio, agent responds). Subscribed audio is received, Opus is decoded, and PCM is written to the codec (we see `I2S_RENDER: write #N size=1280 max_abs=…` with normal levels). **But there is no sound from the speaker.**

**What we’ve tried:**

1. **Init order:** We tried opening the playback device early in `media_init` (before capture). Then capture opens when the Room connects and reconfigures the shared I2S for input (16k, 4ch TDM). When the first RTP frame arrives, `esp_codec_dev_open(play_handle, &fs)` was returning early because the handle was “already open”, so the output path was never reconfigured after capture had changed the I2S.

2. **Fix in `esp_codec_dev_open`:**

  • For a **shared IN_OUT handle**: if only one direction is open, we now open the other direction (set_fmt, enable, codec set_fs/enable for that direction).

  • For **OUT-only handle already open**: we re-apply output config (set_fmt OUT, enable OUT, codec set_fs, _update_codec_setting) so that after capture has opened and reconfigured the shared I2S, playback is configured again.

3. **No early open:** We removed the early open of the playback device in `media_init`. So capture opens first (when the pipeline starts), then playback is opened on the first received audio frame. We still get no speaker output.

4. **Mute/volume:** We do a mute toggle (mute true → false) and set volume to 100 when opening playback, similar to other working BSPs.

So far the issue persists: data is decoded and written to I2S with reasonable PCM levels, but the physical speaker stays silent. We suspect the **shared I2S + dual-codec (AW88298 + ES7210) init/order** or **AW88298-specific** behaviour.

**Questions:**

1. Is there a **recommended open order or pattern** for boards where capture and playback share one I2S (full-duplex) and use two separate codecs (e.g. AW88298 + ES7210)?

2. Has anyone successfully run the voice_agent example (or minimal with playback) on **M5Stack CoreS3** with speaker output?

3. Any known quirks with **AW88298** in this SDK/codec_board (e.g. mute, format, or enable sequence)?

Happy to share more logs or code snippets if useful. Thanks.

---

## Logs

```

# Codec / I2S init (at boot)

I (728) CODEC_INIT: Init i2s 0 type: 3 mclk:0 bclk:34 ws:33 din:14 dout:13

I (735) CODEC_INIT: tx:0x3c28102c rx:0x3c2811e8

I (740) CODEC_INIT: output init std ret 0

W (743) i2s_tdm: the current mclk multiple is too small, adjust the mclk multiple to 384

I (751) CODEC_INIT: Input init tdm ret 0

I (806) I2S_IF: channel mode 0 bits:16/16 channel:2 mask:3

I (806) I2S_IF: STD Mode 1 bits:16/16 channel:2 sample_rate:16000 mask:3

I (811) Adev_Codec: Open codec device OK

# After Room connects — capture opens (record path)

I (6932) livekit_example: Room state changed: Connected

I (6935) I2S_IF: channel mode 2 bits:16/16 channel:4 mask:3

I (6946) I2S_IF: TDM Mode 0 bits:16/16 channel:4 sample_rate:16000 mask:3

I (6952) I2S_IF: channel mode 0 bits:16/32 channel:2 mask:3

I (6958) I2S_IF: STD Mode 1 bits:16/32 channel:2 sample_rate:16000 mask:3

I (6965) ES7210: Bits 16

I (6973) ES7210: Enable ES7210_INPUT_MIC1

I (6985) ES7210: Enable TDM mode

I (6991) ES7210: Unmuted

I (6991) Adev_Codec: Open codec device OK

I (6904) AUD_PIPE_NEGO: Negotiate return 0 src_format:541934416 sample_rate:16000

I (6914) AUD_PIPE_NEGO: Path mask 1 select sink:0 format 1398100047

I (6920) AUD_SRC: Get rate:16000, ch:1, bits:16

I (6931) I2S_IF: Mode 1 need extend bits 32 to 64

I (7092) ESP_GMF_AENC: Open, type:OPUS, acquire in frame: 640, out frame: 100

# When agent sends audio we also see (not in this snippet):

# I (xxxx) I2S_RENDER: open channel:2 sample_rate:16000 bits:16

# I (xxxx) I2S_RENDER: write #1 size=1280 max_abs=… peak=…

# (PCM levels look normal but speaker stays silent)

```

This is a great question — I can see exactly what’s happening from the logs. Let me break it down.

The Smoking Gun: I2S Slot Width Change

Look at your TX path before and after capture opens:

```
# Boot (playback configured):
I2S_IF: STD Mode 1 bits:16/16 channel:2 sample_rate:16000 mask:3

# After capture opens (TX reconfigured as side-effect):
I2S_IF: STD Mode 1 bits:16/32 channel:2 sample_rate:16000 mask:3
```

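The TX slot width changed from 16-bit to 32-bit when capture opened its TDM path. Both directions share one BCLK/WS pair (bclk:34, ws:33 in your init log), so TX and RX have to use the same frame length. The ES7210 TDM path needs 4 slots x 16 bits = 64 BCLK cycles per frame, and a 2-slot STD TX can only match that with 32-bit slots. Judging by the `I2S_IF` lines, the codec layer most likely widened the TX slots for exactly that reason when it reconfigured the port for capture.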

The AW88298 was initialized for 16/16 at boot. After capture reconfigures the shared I2S, the AW88298 is still expecting 16-bit slots but the I2S is now sending 16-bit samples packed into 32-bit slots. The AW88298 sees the wrong data alignment — effectively silence or garbage.
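The frame-length arithmetic behind this can be written out as a small host-side sketch. It assumes TX and RX share BCLK and WS, so both must clock the same number of bits per frame. The function names are illustrative and are not part of ESP-IDF or codec_board.

```c
#include <assert.h>
#include <stdint.h>

/* Bits clocked per WS frame: number of slots times the slot width. */
static unsigned frame_bits(unsigned slots, unsigned slot_bits)
{
    return slots * slot_bits;
}

/*
 * Slot width the TX side needs so that its frame is as long as the RX frame.
 * Returns 0 when the RX frame cannot be split evenly across the TX slots.
 * Assumption: both directions run from the same BCLK/WS signals.
 */
static unsigned tx_slot_bits_for_shared_frame(unsigned rx_slots,
                                              unsigned rx_slot_bits,
                                              unsigned tx_slots)
{
    assert(tx_slots > 0);
    unsigned total = frame_bits(rx_slots, rx_slot_bits);
    return (total % tx_slots == 0) ? total / tx_slots : 0;
}
```

With the values from the logs, `tx_slot_bits_for_shared_frame(4, 16, 2)` gives 32, which matches the `bits:16/32` line that appears once capture is open.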

The Fix

The core issue is open order + missing re-init of the playback codec after capture changes the I2S. Here’s the recommended pattern for shared-I2S dual-codec boards:

Option A: Open capture first, playback second (recommended)

Don’t open the playback device early in media_init(). Let the pipeline handle it:

  1. Room connects → capture opens (ES7210 TDM, reconfigures I2S including TX path)

  2. First subscribed audio arrives → playback opens (AW88298 configures itself for the current I2S state, which is now 16/32)

The key is that esp_codec_dev_open(play_handle, &fs) must be called after capture has finished reconfiguring the I2S. If it’s called before, the AW88298 is configured for the pre-capture format which then gets invalidated.

You said you tried this (“No early open”) and it still didn’t work. That suggests the AW88298 codec driver itself may not handle 32-bit slots. Check the codec_board’s AW88298 driver — look for where it calls i2s_set_fmt or sets the codec’s I2S format register. The AW88298 has an I2S format register (typically at 0x06 or similar) that controls slot width. If the driver hardcodes 16-bit, it won’t match the 32-bit slots from the shared I2S.

Option B: Force TX to stay 16/16

If you can configure the I2S TX channel independently of RX, set the TX slot width explicitly to 16 bits after capture opens:

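```c
#include "driver/i2s_std.h"
#include "esp_log.h"

// After capture opens and reconfigures the shared I2S.
// tx_handle is the driver-level i2s_chan_handle_t for the TX channel.
// Slot settings can only be changed while the channel is disabled.
i2s_channel_disable(tx_handle);

// Reconfigure TX to 16-bit data in 16-bit slots (STD, Philips, stereo)
i2s_std_slot_config_t tx_slot_cfg =
    I2S_STD_PHILIPS_SLOT_DEFAULT_CONFIG(I2S_DATA_BIT_WIDTH_16BIT,
                                        I2S_SLOT_MODE_STEREO);
esp_err_t err = i2s_channel_reconfig_std_slot(tx_handle, &tx_slot_cfg);
if (err != ESP_OK) {
    ESP_LOGW(TAG, "TX slot reconfig rejected: %s", esp_err_to_name(err));
}
i2s_channel_enable(tx_handle);
```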

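But this may not work. TX and RX share the BCLK and WS pins on this board, so a 16-bit-slot TX frame (32 BCLK cycles) does not fit the 64-cycle frame the TDM capture path needs. Depending on the ESP-IDF version, the driver may reject the call, or capture may stop working once TX changes the frame.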

Option C: Close and reopen playback after capture

The most reliable approach:

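```c
// In your on_state_changed callback. Note: in the logs above, the capture
// I2S_IF lines appear just after "Connected" is reported, so this may need
// to be deferred until the capture path has finished opening.
if (state == LIVEKIT_CONNECTION_STATE_CONNECTED) {
    // Force-close and reopen playback so the AW88298 is configured for the
    // I2S format that is in effect once capture is running
    esp_codec_dev_close(play_handle);

    esp_codec_dev_sample_info_t fs = {
        .sample_rate = 16000,
        .channel = 2,
        .bits_per_sample = 16,
    };
    int ret = esp_codec_dev_open(play_handle, &fs);
    if (ret != ESP_CODEC_DEV_OK) {
        ESP_LOGE(TAG, "Playback reopen failed: %d", ret);
    } else {
        esp_codec_dev_set_out_mute(play_handle, false);
        esp_codec_dev_set_out_vol(play_handle, 100);
    }
}
```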

AW88298-Specific Things to Check

  1. I2S format register: The AW88298 has a register that configures its expected I2S format (slot width, data width, left/right justification). After the I2S reconfiguration, this register needs to match the new 32-bit slot width. Check the codec_board’s aw88298.c — look for the set_fs or set_fmt function and verify it handles 32-bit slots.

  2. Mute register: The AW88298 has a system control register with a mute bit. Some driver implementations leave it muted after init. After opening, explicitly:

    esp_codec_dev_set_out_mute(play_handle, true);   // mute
    vTaskDelay(pdMS_TO_TICKS(50));
    esp_codec_dev_set_out_mute(play_handle, false);  // unmute
    
    
  3. Amplifier enable GPIO: The M5Stack CoreS3 may have a GPIO that enables the AW88298’s amplifier stage (separate from I2C control). Check the M5Stack schematic for an AMP_EN or SPK_EN pin. If it exists, it needs to be driven high.

  4. I2C address: The AW88298 on CoreS3 is typically at 0x36. Verify the codec_board is using the right address.
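For item 1, matching the codec to the bus is usually a read-modify-write of one register field. The sketch below assumes the slot-width selector is a 2-bit field at bits [5:4] of register 0x06, with codes 0/1/2 meaning 16/24/32-bit slots. That layout and all names here are illustrative assumptions, not taken from the codec_board driver, so check them against the AW88298 datasheet before using any value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED register layout, verify against the AW88298 datasheet. */
#define AW_REG_I2SCTRL   0x06u                      /* I2S format register */
#define AW_I2SBCK_SHIFT  4u                         /* assumed field position */
#define AW_I2SBCK_MASK   (0x3u << AW_I2SBCK_SHIFT)  /* assumed 2-bit field */

/* Map a slot width to the assumed field code; -1 if unsupported. */
static int aw_bck_code_for_slot_bits(unsigned slot_bits)
{
    switch (slot_bits) {
    case 16: return 0;
    case 24: return 1;
    case 32: return 2;
    default: return -1;
    }
}

/*
 * Compute the new register value for a given slot width, keeping all other
 * bits of the current value. Returns 0 and stores the result in *out, or
 * returns -1 and leaves *out untouched for an unsupported width.
 */
static int aw_i2sctrl_set_slot_bits(uint16_t current, unsigned slot_bits,
                                    uint16_t *out)
{
    assert(out != NULL);
    int code = aw_bck_code_for_slot_bits(slot_bits);
    if (code < 0) {
        return -1;
    }
    uint32_t cleared = (uint32_t)current & ~AW_I2SBCK_MASK;
    *out = (uint16_t)(cleared | ((uint32_t)code << AW_I2SBCK_SHIFT));
    return 0;
}
```

In a real driver the `current` value would come from an I2C read of the register and the result would be written back over I2C. Keeping the bit manipulation in a pure function makes it easy to log and compare against the slot width reported by `I2S_IF`.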

Diagnostic Step

Add this after playback opens to confirm the I2S TX format matches what the AW88298 expects:

```c
ESP_LOGI(TAG, "Play codec opened — verify I2S TX is 16/16 or that AW88298 is configured for 16/32");
```

And in the AW88298 driver’s set_fs function, log the slot width it’s configuring:

```c
ESP_LOGI("AW88298", "set_fs: rate=%d bits=%d slots=%d", rate, bits, slot_width);
```

If you see bits=16 slots=32 and the AW88298 register write still sets 16-bit slots, that’s your bug.

TL;DR

The shared I2S reconfiguration changes TX from 16/16 to 16/32 when capture opens. The AW88298 is never told about this change. Either (a) close/reopen the playback codec after capture opens, or (b) fix the AW88298 driver to handle 32-bit slots, or (c) find a way to keep TX at 16/16 independently of the RX TDM configuration.

Hi,

I wanted to say thank you for your help debugging the M5Stack CoreS3 setup with shared I2S (ES7210 TDM + AW88298) and the LiveKit voice_agent path.

Your explanation about TX slot width changing after capture opens (16/16 → 16/32) and the need to realign the playback codec / REG06 was exactly what we needed. On top of that, we had to fix sample-rate mismatch (SDP/decoding vs 16 kHz hardware) and channel mismatch (SDP mono vs Opus decoder stereo), which were causing the slow, low-pitched “male” voice.
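For anyone who hits the same slow, low-pitched playback: it helps to compare how many samples per second the decoder produces with how many the render path consumes. The sketch below is a host-side illustration with hypothetical function names. The rates in the usage note are examples only, since the exact mismatched values are not stated here, and the downmix is one possible remedy rather than the fix that was actually applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Playback speed in per-mille (1000 = normal speed).
 * src_*  : what the decoder actually emits (rate in Hz, channel count)
 * sink_* : what the render path assumes when it plays the data
 */
static uint32_t playback_speed_permille(uint32_t src_rate, uint32_t src_ch,
                                        uint32_t sink_rate, uint32_t sink_ch)
{
    assert(src_rate > 0 && src_ch > 0);
    uint64_t produced = (uint64_t)src_rate * src_ch;   /* samples per second of audio */
    uint64_t consumed = (uint64_t)sink_rate * sink_ch; /* samples played per second */
    return (uint32_t)(consumed * 1000u / produced);
}

/* Average interleaved L/R pairs into mono; `frames` is the number of L/R pairs. */
static void downmix_stereo_to_mono(const int16_t *in, size_t frames, int16_t *out)
{
    for (size_t i = 0; i < frames; i++) {
        int32_t sum = (int32_t)in[2 * i] + (int32_t)in[2 * i + 1];
        out[i] = (int16_t)(sum / 2);
    }
}
```

For example, stereo decoder output treated as mono at the same rate, `playback_speed_permille(16000, 2, 16000, 1)`, plays at half speed, which is one octave lower. Audio decoded at 48 kHz but rendered at 16 kHz plays at roughly a third of normal speed.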

Playback now sounds normal end-to-end.

Thanks again for your time and clear guidance — it made a big difference.

Wow, thanks for checking in. It will be great for the next person who has a similar situation.