
Issue with Quantizing EEGNet Model on ESP32-S3 (AIV-744) #194

Open · manhdatbn93 opened this issue Jan 21, 2025 · 16 comments
manhdatbn93 commented Jan 21, 2025

Checklist

  • Checked the issue tracker for similar issues to ensure this is not a duplicate.
  • Provided a clear description of your suggestion.
  • Included any relevant context or examples.

Issue or Suggestion Description

Hi Team, I am currently working on quantizing the EEGNet model for deployment on my custom board based on the ESP32-S3. My goal is to use this model for a simple classification task: detecting eye blinks versus non-blinks from EEG data.

Here is the EEGNet model I am using: https://github.com/YuDongPan/DL_Classifier/blob/main/Model/EEGNet.py

Issue
When I attempt to quantize this model using ESPDL, I encounter an error in the avg_pool2d function. The error details are as follows:

File "~\Python\Python310\lib\site-packages\ppq\executor\op\torch\default.py", line 800, in AveragePool_forward
output = F.avg_pool2d(
RuntimeError: Given input size: (96x1x1). Calculated output size: (96x1x0). Output size is too small

The shape of one batch from my dataloader is

torch.Size([32, 1, 8, 500])
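For reference, here is a minimal shape-tracing sketch (not part of the original report, and assuming the EEGNet definition linked above); it prints every intermediate output shape for a single [1, 1, 8, 500] input, which makes it easier to see which pooling stage shrinks the feature map to the point where the next pool would produce a zero-sized output:

import torch
import torch.nn as nn

def trace_shapes(model: nn.Module, input_shape=(1, 1, 8, 500)):
    # Print the output shape of every leaf module during one dummy forward pass.
    hooks = []

    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor):
                print(f"{name:30s} -> {tuple(output.shape)}")
        return hook

    for name, module in model.named_modules():
        if not list(module.children()):  # leaf modules only
            hooks.append(module.register_forward_hook(make_hook(name)))

    model.eval()
    with torch.no_grad():
        model(torch.randn(*input_shape))

    for h in hooks:
        h.remove()

# Usage sketch (assumes the EEGNet class from the linked repository):
# trace_shapes(eeg.EEGNet(8, 500, 2))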

Here is the quantization call:

quant_ppq_graph = espdl_quantize_torch(
        model=model,
        espdl_export_file=ESPDL_MODLE_PATH,
        calib_dataloader=dataloader,
        calib_steps=32,
        input_shape=[1] + [1] + INPUT_SHAPE,
        target=TARGET,
        num_of_bits=NUM_OF_BITS,
        collate_fn=collate_fn2,
        setting=None,
        device=DEVICE,
        error_report=True,
        skip_export=False,
        export_test_values=False,
        verbose=1,
    )

I would appreciate it if you could help identify the root cause of this issue and provide guidance on:

  1. Whether the pooling operations in EEGNet require modification for compatibility with ESPDL quantization.
  2. Any specific changes needed in the model or data pipeline for successful quantization.
  3. Do you have any example of quantizing a model with a 2-D input such as an EEG signal, i.e. an input shape of [batch_size, channels, samples]?

Thank you for your support. Please let me know if you need additional details about the setup or the error logs.

Additional Details
Data Input Shape: [batch_size, 1, 8, 500]
Target Board: ESP32-S3
Quantization Tool: ESPDL
Framework: PyTorch

Full modified code (the rest of the code is the same as the example):

def collate_eeg_fn(x: Tuple) -> torch.Tensor:
    # Each dataset sample is assumed to be (data, label) with data of shape [8, 500];
    # add batch and channel dims -> [1, 1, 8, 500], then concatenate into [N, 1, 8, 500].
    return torch.cat([sample[0].unsqueeze(0).unsqueeze(1) for sample in x], dim=0)

if __name__ == "__main__":
    BATCH_SIZE = 32
    INPUT_SHAPE = [8, 500]
    DEVICE = "cpu"  #  'cuda' or 'cpu', if you use cuda, please make sure that cuda is available
    TARGET = "esp32s3"
    NUM_OF_BITS = 8
    ESPDL_MODLE_PATH = "models/torch/eegnet.espdl"

    model = eeg.EEGNet(8, 500, 2)
    model = model.to(DEVICE)
    # Load the pretrained weights
    pretrained_weights_path = "eyeblink_eegnet.pt"
    model.load_state_dict(torch.load(pretrained_weights_path))

    loaded_dataset = torch.load("eyeblink_dataset.pt", weights_only=True)
    print(f"Loaded dataset with {len(loaded_dataset)} samples.")

    dataloader = DataLoader(
            dataset=loaded_dataset,
            batch_size=BATCH_SIZE,
            shuffle=False,
            num_workers=1,
            pin_memory=False,
            collate_fn=collate_eeg_fn,
        )

    for batch in dataloader:
        print(batch.shape)
        break
    
   
    # quant_setting, model = quant_setting_eegnet(
    #     model, ["LayerwiseEqualization_quantization"]
    # )

    quant_ppq_graph = espdl_quantize_torch(
        model=model,
        espdl_export_file=ESPDL_MODLE_PATH,
        calib_dataloader=dataloader,
        calib_steps=32,
        input_shape=[1] + [1] + INPUT_SHAPE,
        target=TARGET,
        num_of_bits=NUM_OF_BITS,
        collate_fn=collate_fn2,
        setting=None,
        device=DEVICE,
        error_report=True,
        skip_export=False,
        export_test_values=False,
        verbose=1,
    )

Attached is the ONNX model (*.onnx) that was produced even though the run failed.
Image

@github-actions github-actions bot changed the title Issue with Quantizing EEGNet Model on ESP32-S3 Issue with Quantizing EEGNet Model on ESP32-S3 (AIV-744) Jan 21, 2025
BlueSkyB (Collaborator) commented:

We will download the code to analyze the issue. If it's convenient for you to upload the ONNX model file, it will accelerate our debugging process.

manhdatbn93 (Author) commented Jan 23, 2025:

Hi @BlueSkyB, yes. I have uploaded my trained PyTorch model (*.pt) and the ONNX model produced by the quantization script in the zip file below:

eyeblink_trained_model.zip

manhdatbn93 (Author) commented:

Is it feasible to run a small model like mine on an ESP32-S3 module without PSRAM? The ESP32-S3-WROOM-1-N16 I received does not have the PSRAM option.
When I flash an example from GitHub, it always triggers a CPU reset because of PSRAM.

BlueSkyB (Collaborator) commented:

You can disable the configuration of CONFIG_SPIRAM by using idf.py menuconfig. If there is no PSRAM, the system will default to using internal RAM. In this case, you must ensure that the model is small enough so that the memory required for its parameters and feature maps is less than the available system memory; otherwise, it will not function properly.
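For reference, a minimal sketch of the corresponding project setting (assuming ESP-IDF's standard Kconfig option name; confirm it in idf.py menuconfig for your IDF version):

# sdkconfig.defaults: build for a module without external PSRAM
CONFIG_SPIRAM=n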

Alternatively, you can refer to the document https://docs.espressif.com/projects/esp-dl/en/latest/tutorials/how_to_load_model.html to disable param_copy, but this will significantly reduce performance.

Additionally, your model contains the Elu operator, which is currently not supported by ESP-DL. You can check the supported operators at https://github.com/espressif/esp-dl/blob/master/operator_support_state.md. We will support it in the future.
You can also add the operator yourself by referring to https://docs.espressif.com/projects/esp-dl/en/latest/tutorials/how_to_add_a_new_module%28operator%29.html
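As one possible workaround (a sketch that is not from this thread, and assuming a ReLU-family activation is acceptable for the task), the unsupported activation can be swapped out in the PyTorch model before re-training and re-quantizing:

import torch.nn as nn

def replace_elu(module: nn.Module, new_act=nn.ReLU) -> None:
    # Recursively replace every nn.ELU with another activation
    # (e.g. nn.ReLU, or nn.PReLU as used later in this thread).
    for name, child in module.named_children():
        if isinstance(child, nn.ELU):
            setattr(module, name, new_act())
        else:
            replace_elu(child, new_act)

# Usage sketch:
# model = eeg.EEGNet(8, 500, 2)
# replace_elu(model, nn.PReLU)  # fine-tune the model again before quantizing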

manhdatbn93 (Author) commented Jan 23, 2025:

Hi @BlueSkyB, thank you so much for your information.

I have modified the model to make it simpler for testing, and it successfully passes the quantization script.

I imported the eegnet.espdl (18 KB) in the same way as demonstrated in the mobilenet_v2 example, but I encountered the following issue:
Warning:

calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT

Update (2025-01-24): this warning disappeared once I switched to my ESP32-S3 EVK with PSRAM support; the allocations appear to fall back to the PSRAM heap. However, the error below still occurs.

Error:

assert failed: virtual std::vector<std::vector> dl::module::Gemm::get_output_shape(std::vector<std::vector>& ) dl_module_gemm.hpp:71 (input_shapes[0][input_shapes[0].size() - 1] == filter-

Do you know how to resolve the warning and the error?
I added some debug code to dl_module_gemm.hpp, as shown below:

ESP_LOGI("input_shapes", "get_output_shape");
ESP_LOGI("input_shapes", "input_shapes size: %d", input_shapes.size());
ESP_LOGI("input_shapes", "input_shapes[0] shape: %s", shape_to_string(input_shapes[0]).c_str());
ESP_LOGI("filter", "filter shape: %s", shape_to_string(filter->shape).c_str());
ESP_LOGI("filter", "filter->shape[2]: %d", filter->shape[2]);
ESP_LOGI("test", "input_shapes[0][input_shapes[0].size() - 1]: %d", input_shapes[0][input_shapes[0].size() - 1]);
assert(input_shapes.size() == 1);
assert(filter->shape.size() == 4);
assert(filter->shape[0] == 1);
assert(filter->shape[1] == 1);
assert(input_shapes[0][input_shapes[0].size() - 1] == filter->shape[2]);

And here is the log

I (536) input_shapes: get_output_shape
I (536) input_shapes: input_shapes size: 1
I (546) input_shapes: input_shapes[0] shape: [1, 176]
I (546) filter: filter shape: [1, 1, 352, 16]
I (556) filter: filter->shape[2]: 352
I (556) test: input_shapes[0][input_shapes[0].size() - 1]: 176

It seems something is missing in my code, because 176 = 352/2.
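One way to cross-check this on the host side (a hedged sketch; the ONNX file name is illustrative) is to dump the initializer shapes from the exported graph and compare the Gemm weight's input dimension with the flattened feature size:

import onnx

m = onnx.load("eegnet.onnx")  # hypothetical name of the exported graph
for init in m.graph.initializer:
    print(init.name, list(init.dims))  # weights and biases with their shapes
for node in m.graph.node:
    if node.op_type in ("Flatten", "Gemm"):
        print(node.op_type, list(node.input), list(node.output))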

For reference, I have included the log from the monitor below.

I (476) dl::Model: PPQ_Operation_0: Transpose
I (476) dl::Model: /flatten/Flatten: Flatten
I (486) dl::Model: /fc1/Gemm: Gemm
W (486) calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT
W (496) calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT
I (506) dl::Model: /prelu3/PRelu: PRelu
W (506) calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT
I (516) dl::Model: /fc2/Gemm: Gemm
W (516) calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT
W (526) calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT

assert failed: virtual std::vector<std::vector<int> > dl::module::Gemm::get_output_shape(std::vector<std::vector<int> >&) dl_module_gemm.hpp:71 (input_shapes[0][input_shapes[0].size() - 1] == filter-


Backtrace: 0x40375a8d:0x3fc99560 0x4037b071:0x3fc99580 0x40381a4d:0x3fc995a0 0x42021979:0x3fc996c0 0x4203b631:0x3fc996f0 0x4203c706:0x3fc998f0 0x4202f3b8:0x3fc99960 0x4202f857:0x3fc999f0 0x42009dd2:0x3fc99a20 0x4209251f:0x3fc99a50 0x4037b899:0x3fc99a80
--- To exit from IDF monitor please use "Ctrl+]". Alternatively, you can use Ctrl+T Ctrl+X to exit.
I (23) boot: ESP-IDF HEAD-HASH-NOTFOUND 2nd stage bootloader
I (23) boot: compile time Jan 23 2025 23:01:11
I (23) boot: Multicore bootloader
I (24) boot: chip revision: v0.2
I (27) boot: efuse block revision: v1.3
I (30) boot.esp32s3: Boot SPI Speed : 80MHz
I (34) boot.esp32s3: SPI Mode       : DIO
I (38) boot.esp32s3: SPI Flash Size : 16MB
I (42) boot: Enabling RNG early entropy source...
I (46) boot: Partition Table:
I (49) boot: ## Label            Usage          Type ST Offset   Length
I (55) boot:  0 factory          factory app      00 00 00010000 003e8000
I (62) boot:  1 model            Unknown data     01 82 003f8000 000fa000
I (68) boot: End of partition table
I (72) esp_image: segment 0: paddr=00010020 vaddr=3c0a0020 size=10708h ( 67336) map
I (91) esp_image: segment 1: paddr=00020730 vaddr=3fc93b00 size=02af0h ( 10992) load
I (94) esp_image: segment 2: paddr=00023228 vaddr=40374000 size=0cdf0h ( 52720) load
I (106) esp_image: segment 3: paddr=00030020 vaddr=42000020 size=930d8h (602328) map
I (213) esp_image: segment 4: paddr=000c3100 vaddr=40380df0 size=02c04h ( 11268) load
I (215) esp_image: segment 5: paddr=000c5d0c vaddr=600fe100 size=0001ch (    28) load
I (222) boot: Loaded app from partition at offset 0x10000
I (223) boot: Disabling RNG early entropy source...
I (238) cpu_start: Multicore app
I (247) cpu_start: Pro cpu start user code
I (247) cpu_start: cpu freq: 240000000 Hz
I (247) app_init: Application information:
I (250) app_init: Project name:     eyeblink_v2
I (255) app_init: App version:      1
I (260) app_init: Compile time:     Jan 23 2025 23:00:21
I (266) app_init: ELF file SHA256:  84eb69f48...
I (271) app_init: ESP-IDF:          HEAD-HASH-NOTFOUND
I (277) efuse_init: Min chip rev:     v0.0
I (281) efuse_init: Max chip rev:     v0.99
I (286) efuse_init: Chip rev:         v0.2
I (291) heap_init: Initializing. RAM available for dynamic allocation:
I (298) heap_init: At 3FC97108 len 00052608 (329 KiB): RAM
I (304) heap_init: At 3FCE9710 len 00005724 (21 KiB): RAM
I (311) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (317) heap_init: At 600FE11C len 00001ECC (7 KiB): RTCRAM
I (324) spi_flash: detected chip: gd
I (327) spi_flash: flash io: dio
I (331) sleep_gpio: Configure to isolate all GPIO pins in sleep state
I (339) sleep_gpio: Enable automatic switching of GPIO sleep configuration
I (346) main_task: Started on CPU0
I (376) main_task: Calling app_main()
I (376) EYEBLINK_DETECTION_TEST: get into app_main
I (376) FbsLoader: The storage free size is 32000 KB
I (376) FbsLoader: The partition size is 1000 KB
I (386) dl::Model: model:main_graph, version:0

I (396) dl::Model: /temporal_conv/Conv: Conv
W (396) calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT
W (406) calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT
I (416) dl::Model: /prelu1/PRelu: PRelu
W (416) calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT
I (426) dl::Model: /pool1/AveragePool: AveragePool
I (436) dl::Model: /spatial_conv/Conv: Conv
W (436) calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT
W (446) calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT
I (456) dl::Model: /prelu2/PRelu: PRelu
W (456) calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT
I (466) dl::Model: /pool2/AveragePool: AveragePool
I (476) dl::Model: PPQ_Operation_0: Transpose
I (476) dl::Model: /flatten/Flatten: Flatten
I (486) dl::Model: /fc1/Gemm: Gemm
W (486) calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT
W (496) calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT
I (506) dl::Model: /prelu3/PRelu: PRelu
W (506) calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT
I (516) dl::Model: /fc2/Gemm: Gemm
W (516) calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT
W (526) calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT

assert failed: virtual std::vector<std::vector<int> > dl::module::Gemm::get_output_shape(std::vector<std::vector<int> >&) dl_module_gemm.hpp:71 (input_shapes[0][input_shapes[0].size() - 1] == filter-


Backtrace: 0x40375a8d:0x3fc99560 0x4037b071:0x3fc99580 0x40381a4d:0x3fc995a0 0x42021979:0x3fc996c0 0x4203b631:0x3fc996f0 0x4203c706:0x3fc998f0 0x4202f3b8:0x3fc99960 0x4202f857:0x3fc999f0 0x42009dd2:0x3fc99a20 0x4209251f:0x3fc99a50 0x4037b899:0x3fc99a80
--- 0x40375a8d: panic_abort at ~/esp/v5.4/esp-idf/components/esp_system/panic.c:454
0x4037b071: esp_system_abort at ~/esp/v5.4/esp-idf/components/esp_system/port/esp_system_chip.c:92
0x40381a4d: __assert_func at ~/esp/v5.4/esp-idf/components/newlib/assert.c:80
0x42021979: dl::module::Gemm::get_output_shape(std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > >&) at ~/Firmware/eyeblink_v2/esp-dl/dl/module/include/dl_module_gemm.hpp:71 (discriminator 1)
0x4203b631: dl::memory::MemoryManagerGreedy::get_tensor_info_from_fbs(fbs::FbsModel*, std::vector<dl::module::Module*, std::allocator<dl::module::Module*> >, std::vector<dl::memory::TensorInfo*, std::allocator<dl::memory::TensorInfo*> >&) at ~/Firmware/eyeblink_v2/esp-dl/dl/model/src/dl_memory_manager_greedy.cpp:140
0x4203c706: dl::memory::MemoryManagerGreedy::alloc(fbs::FbsModel*, std::vector<dl::module::Module*, std::allocator<dl::module::Module*> >&) at ~/Firmware/eyeblink_v2/esp-dl/dl/model/src/dl_memory_manager_greedy.cpp:41 (discriminator 1)
0x4202f3b8: dl::Model::build(unsigned int, dl::memory_manager_t, bool) at ~/Firmware/eyeblink_v2/esp-dl/dl/model/src/dl_model_base.cpp:171
0x4202f857: dl::Model::Model(char const*, fbs::model_location_type_t, int, dl::memory_manager_t, unsigned char*, bool) at ~/Firmware/eyeblink_v2/esp-dl/dl/model/src/dl_model_base.cpp:20
 (inlined by) dl::Model::Model(char const*, fbs::model_location_type_t, int, dl::memory_manager_t, unsigned char*, bool) at ~/Firmware/eyeblink_v2/esp-dl/dl/model/src/dl_model_base.cpp:12
0x42009dd2: app_main at~/Firmware/eyeblink_v2/main/main.cpp:103 (discriminator 1)
0x4209251f: main_task at ~/esp/v5.4/esp-idf/components/freertos/app_startup.c:208
0x4037b899: vPortTaskWrapper at ~/esp/v5.4/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:139





ELF file SHA256: 84eb69f48

Rebooting...

And here is the code in app_main function

extern "C" void app_main(void)
{
    ESP_LOGI(TAG, "get into app_main");
    Model *model = new Model("model", fbs::MODEL_LOCATION_IN_FLASH_PARTITION, 0, MEMORY_MANAGER_GREEDY, nullptr, false);
    delete model;
    ESP_LOGI(TAG, "exit app_main");
}

My ONNX model (with Relu changed to PRelu):

Image

BlueSkyB (Collaborator) commented Jan 26, 2025:

> Hi @BlueSkyB, yes. I have uploaded my trained PyTorch model (*.pt) and the ONNX model produced by the quantization script in the zip file below:
> eyeblink_trained_model.zip

Hi @manhdatbn93. We have fixed the bug and the model is now running normally. Please update esp-ppq and esp-dl as well.
After the update, if there are still other issues with the network you want to deploy, you can open another issue. The network you mentioned later can also be tried again after the update.

manhdatbn93 (Author) commented:

> Hi @BlueSkyB, yes. I have uploaded my trained PyTorch model (*.pt) and the ONNX model produced by the quantization script in the zip file below:
> eyeblink_trained_model.zip

> Hi @manhdatbn93. We have fixed the bug and the model is now running normally. Please update esp-ppq and esp-dl as well. After the update, if there are still other issues with the network you want to deploy, you can open another issue. The network you mentioned later can also be tried again after the update.

Hi @BlueSkyB, Thank you very much for the update. I just wanted to clarify your comment, "the model is now running normally." Does this mean the model is successfully running on the ESP32S3 series chip, or are you referring to the quantization script working correctly for this model?

manhdatbn93 (Author) commented:

Hi @BlueSkyB, after updating, the quantization script works and I successfully obtained the model.espdl. However, when deploying this model on the ESP32-S3-DevKitC-1 v1.1, I hit an assertion failure in the Reshape module.

Here is the error log

assert failed: virtual std::vector<std::vector > dl::module::Reshape::get_output_shape(std::vector<std::vector >&) dl_module_reshape.hpp:69 (input_size % shape_param_size == 0)

I added some code to print the input shape in the get_output_shape() function of the Reshape module (dl_module_reshape.hpp), as shown below:

std::vector<std::vector<int>> get_output_shape(std::vector<std::vector<int>> &input_shapes)
    {
        assert(input_shapes.size() == 1);

        int input_size = 1;
        std::cout << "Input Shapes: ";
        for (const auto &dim : input_shapes[0]) {
            std::cout << dim << " ";
        }
        std::cout << std::endl;

        for (int i = 0; i < input_shapes[0].size(); i++) {
            assert(input_shapes[0][i] > 0);
            input_size *= input_shapes[0][i];
            std::cout << "Input Size: " << input_size << std::endl;
        }

        int64_t *shape_param = static_cast<int64_t *>(m_shape->get_element_ptr());
        int negative_index = -1;
        int shape_param_size = 1;
        for (int i = 0; i < m_shape->get_size(); i++) {
            if (negative_index == -1 && shape_param[i] == -1) {
                negative_index = i;
            } else if (shape_param[i] > 0) {
                shape_param_size *= shape_param[i];
            } else {
                assert(false);
            }
        }

        std::cout << "\nNegative Index: " << negative_index << std::endl;
        std::cout << "Shape Param Size: " << shape_param_size << std::endl;

        std::vector<int> output(m_shape->get_size());
        if (negative_index == -1) {
            assert(shape_param_size == input_size);
            for (int i = 0; i < m_shape->get_size(); i++) {
                output[i] = static_cast<int>(shape_param[i]);
            }
        } else {
            assert(input_size % shape_param_size == 0);
            for (int i = 0; i < m_shape->get_size(); i++) {
                if (i == negative_index) {
                    output[i] = input_size / shape_param_size;
                } else {
                    output[i] = static_cast<int>(shape_param[i]);
                }
            }
        }
        std::vector<std::vector<int>> output_shapes(1, output);
        return output_shapes;
    }

Below is the log after adding that code.

I (37) boot: ESP-IDF v5.4-dirty 2nd stage bootloader
I (38) boot: compile time Jan 28 2025 01:26:59
I (38) boot: Multicore bootloader
I (38) boot: chip revision: v0.1
I (41) boot: efuse block revision: v1.2
I (45) boot.esp32s3: Boot SPI Speed : 80MHz
I (48) boot.esp32s3: SPI Mode : SLOW READ
I (53) boot.esp32s3: SPI Flash Size : 32MB
I (57) boot: Enabling RNG early entropy source...
I (61) boot: Partition Table:
I (64) boot: ## Label Usage Type ST Offset Length
I (70) boot: 0 factory factory app 00 00 00010000 003e8000
I (77) boot: 1 model Unknown data 01 82 003f8000 001f4000
I (83) boot: End of partition table
I (86) esp_image: segment 0: paddr=00010020 vaddr=3c0d0020 size=14970h ( 84336) map
I (114) esp_image: segment 1: paddr=00024998 vaddr=3fc98800 size=030d8h ( 12504) load
I (117) esp_image: segment 2: paddr=00027a78 vaddr=40378000 size=085a0h ( 34208) load
I (127) esp_image: segment 3: paddr=00030020 vaddr=42000020 size=c6200h (811520) map
I (317) esp_image: segment 4: paddr=000f6228 vaddr=403805a0 size=081dch ( 33244) load
I (327) esp_image: segment 5: paddr=000fe40c vaddr=600fe100 size=0001ch ( 28) load
I (333) boot: Loaded app from partition at offset 0x10000
I (334) boot: Disabling RNG early entropy source...
I (345) octal_psram: vendor id : 0x0d (AP)
I (345) octal_psram: dev id : 0x02 (generation 3)
I (345) octal_psram: density : 0x03 (64 Mbit)
I (347) octal_psram: good-die : 0x01 (Pass)
I (351) octal_psram: Latency : 0x01 (Fixed)
I (356) octal_psram: VCC : 0x00 (1.8V)
I (360) octal_psram: SRF : 0x01 (Fast Refresh)
I (365) octal_psram: BurstType : 0x01 (Hybrid Wrap)
I (370) octal_psram: BurstLen : 0x01 (32 Byte)
I (374) octal_psram: Readlatency : 0x02 (10 cycles@Fixed)
I (379) octal_psram: DriveStrength: 0x00 (1/1)
I (384) esp_psram: Found 8MB PSRAM device
I (387) esp_psram: Speed: 40MHz
I (390) cpu_start: Multicore app
I (1124) esp_psram: SPI SRAM memory test OK
I (1133) cpu_start: Pro cpu start user code
I (1133) cpu_start: cpu freq: 240000000 Hz
I (1133) app_init: Application information:
I (1133) app_init: Project name: eyeblink_v2
I (1138) app_init: App version: ab8969b-dirty
I (1142) app_init: Compile time: Jan 28 2025 10:51:04
I (1147) app_init: ELF file SHA256: 9d74a2e1c...
I (1152) app_init: ESP-IDF: v5.4-dirty
I (1156) efuse_init: Min chip rev: v0.0
I (1160) efuse_init: Max chip rev: v0.99
I (1164) efuse_init: Chip rev: v0.1
I (1168) heap_init: Initializing. RAM available for dynamic allocation:
I (1174) heap_init: At 3FC9DA78 len 0004BC98 (303 KiB): RAM
I (1179) heap_init: At 3FCE9710 len 00005724 (21 KiB): RAM
I (1185) heap_init: At 600FE11C len 00001ECC (7 KiB): RTCRAM
I (1190) esp_psram: Adding pool of 8192K of PSRAM memory to heap allocator
I (1197) spi_flash: detected chip: mxic (opi)
I (1201) spi_flash: flash io: opi_str
I (1205) sleep_gpio: Configure to isolate all GPIO pins in sleep state
I (1211) sleep_gpio: Enable automatic switching of GPIO sleep configuration
I (1217) main_task: Started on CPU0
I (1247) esp_psram: Reserving pool of 32K of internal memory for DMA/internal allocations
I (1247) main_task: Calling app_main()
I (1247) EYEBLINK_DETECTION_TEST: get into app_main
I (1257) FbsLoader: The storage free size is 23616 KB
I (1257) FbsLoader: The partition size is 2000 KB
I (1257) dl::Model: model:main_graph, version:0

I (1267) dl::Model: /conv1/Conv: Conv
I (1267) dl::Model: /Depthwise_Convolution/Conv: Conv
I (1277) dl::Model: /activation/Elu: Elu
I (1277) dl::Model: /pool1/AveragePool: AveragePool
I (1287) dl::Model: /depthwiseConv2/Conv: Conv
I (1287) dl::Model: /pointwiseConv2/Conv: Conv
I (1287) dl::Model: /activation_1/Elu: Elu
I (1297) dl::Model: /pool2/AveragePool: AveragePool
I (1297) dl::Model: PPQ_Operation_0: Transpose
I (1307) dl::Model: /Reshape: Reshape
I (1307) dl::Model: /fc/Gemm: Gemm
Input Shapes: 1 25 28 1
Input Size: 1
Input Size: 25
Input Size: 700
Input Size: 700

Negative Index: 0
Shape Param Size: 1440

assert failed: virtual std::vector<std::vector > dl::module::Reshape::get_output_shape(std::vector<std::vector >&) dl_module_reshape.hpp:79 (input_size % shape_param_size == 0)

manhdatbn93 (Author) commented:

I suspect there is something missing in my model, but I am not sure what it is. To debug, I reverted to the example provided in the repository.

In the \esp-dl\tools\quantization\quantize_torch_model.py script, I changed the TARGET to the ESP32S3 device while keeping the rest of the code unchanged, then ran the script.

However, I found that the mobilenet_v2.info file generated by the script is different from the example file provided in the repository (\esp-dl\examples\mobilenet_v2\models\esp32s3\mobilenet_v2.info).

I used the new mobilenet_v2.espdl file generated from my script to flash the ESP32-S3 with PSRAM support, but it has not been successful.

Could you please help clarify: Why is the mobilenet_v2.info file generated by the script different from the example file in the repository?

Below is the log from the example code

I (216) octal_psram: vendor id : 0x0d (AP)
I (216) octal_psram: dev id : 0x02 (generation 3)
I (216) octal_psram: density : 0x03 (64 Mbit)
I (218) octal_psram: good-die : 0x01 (Pass)
I (222) octal_psram: Latency : 0x01 (Fixed)
I (227) octal_psram: VCC : 0x00 (1.8V)
I (231) octal_psram: SRF : 0x01 (Fast Refresh)
I (236) octal_psram: BurstType : 0x01 (Hybrid Wrap)
I (241) octal_psram: BurstLen : 0x01 (32 Byte)
I (245) octal_psram: Readlatency : 0x02 (10 cycles@Fixed)
I (251) octal_psram: DriveStrength: 0x00 (1/1)
I (255) MSPI Timing: PSRAM timing tuning index: 5
I (259) esp_psram: Found 8MB PSRAM device
I (263) esp_psram: Speed: 80MHz
I (266) cpu_start: Multicore app
I (280) cpu_start: Pro cpu start user code
I (280) cpu_start: cpu freq: 240000000 Hz
I (281) app_init: Application information:
I (281) app_init: Project name: mobilenet_v2_example
I (286) app_init: App version: v3.0.0-46-g971abed-dirty
I (291) app_init: Compile time: Jan 28 2025 16:07:49
I (296) app_init: ELF file SHA256: 125e1eb0f...
I (300) app_init: ESP-IDF: v5.4-dirty
I (305) efuse_init: Min chip rev: v0.0
I (308) efuse_init: Max chip rev: v0.99
I (312) efuse_init: Chip rev: v0.1
I (316) heap_init: Initializing. RAM available for dynamic allocation:
I (322) heap_init: At 3FC9CC78 len 0004CA98 (306 KiB): RAM
I (328) heap_init: At 3FCE9710 len 00005724 (21 KiB): RAM
I (333) esp_psram: Adding pool of 8192K of PSRAM memory to heap allocator
W (340) spi_flash: Octal flash chip is using but qio mode is selected, will automatically switch to Octal mode
I (349) spi_flash: detected chip: mxic (opi)
I (353) spi_flash: flash io: opi_str
W (356) spi_flash: Detected size(32768k) larger than the size in the binary image header(16384k). Using the size in the binary image header.
I (369) sleep_gpio: Configure to isolate all GPIO pins in sleep state
I (375) sleep_gpio: Enable automatic switching of GPIO sleep configuration
I (382) main_task: Started on CPU0
I (392) esp_psram: Reserving pool of 32K of internal memory for DMA/internal allocations
I (392) main_task: Calling app_main()
I (392) MOBILENET_V2_EXAMPLE: get into app_main
I (402) FbsLoader: The storage free size is 23808 KB
I (402) FbsLoader: The partition size is 3900 KB
I (412) dl::Model: model:main_graph, version:0

I (412) dl::Model: /features/features.0/features.0.0/Conv: Conv
I (422) dl::Model: /features/features.1/conv/conv.0/conv.0.0/Conv: Conv
I (422) dl::Model: /features/features.1/conv/conv.1/Conv: Conv
I (432) dl::Model: /features/features.2/conv/conv.0/conv.0.0/Conv: Conv
I (442) dl::Model: /features/features.2/conv/conv.1/conv.1.0/Conv: Conv
I (442) dl::Model: /features/features.2/conv/conv.2/Conv: Conv
I (452) dl::Model: /features/features.3/conv/conv.0/conv.0.0/Conv: Conv
I (452) dl::Model: /features/features.3/conv/conv.1/conv.1.0/Conv: Conv
I (462) dl::Model: /features/features.3/conv/conv.2/Conv: Conv
I (472) dl::Model: /features/features.3/Add: Add
I (472) dl::Model: /features/features.4/conv/conv.0/conv.0.0/Conv: Conv
I (482) dl::Model: /features/features.4/conv/conv.1/conv.1.0/Conv: Conv
I (482) dl::Model: /features/features.4/conv/conv.2/Conv: Conv
I (492) dl::Model: /features/features.5/conv/conv.0/conv.0.0/Conv: Conv
I (502) dl::Model: PPQ_Operation_0: RequantizeLinear
I (502) dl::Model: /features/features.5/conv/conv.1/conv.1.0/Conv: Conv
I (512) dl::Model: /features/features.5/conv/conv.2/Conv: Conv
I (512) dl::Model: /features/features.5/Add: Add
I (522) dl::Model: /features/features.6/conv/conv.0/conv.0.0/Conv: Conv
I (522) dl::Model: /features/features.6/conv/conv.1/conv.1.0/Conv: Conv
I (532) dl::Model: /features/features.6/conv/conv.2/Conv: Conv
I (532) dl::Model: /features/features.6/Add: Add
I (542) dl::Model: /features/features.7/conv/conv.0/conv.0.0/Conv: Conv
I (552) dl::Model: /features/features.7/conv/conv.1/conv.1.0/Conv: Conv
I (552) dl::Model: /features/features.7/conv/conv.2/Conv: Conv
I (562) dl::Model: /features/features.8/conv/conv.0/conv.0.0/Conv: Conv
I (562) dl::Model: /features/features.8/conv/conv.1/conv.1.0/Conv: Conv
I (572) dl::Model: /features/features.8/conv/conv.2/Conv: Conv
I (582) dl::Model: /features/features.8/Add: Add
I (582) dl::Model: /features/features.9/conv/conv.0/conv.0.0/Conv: Conv
I (592) dl::Model: /features/features.9/conv/conv.1/conv.1.0/Conv: Conv
I (592) dl::Model: /features/features.9/conv/conv.2/Conv: Conv
I (602) dl::Model: /features/features.9/Add: Add
I (602) dl::Model: /features/features.10/conv/conv.0/conv.0.0/Conv: Conv
I (612) dl::Model: /features/features.10/conv/conv.1/conv.1.0/Conv: Conv
I (622) dl::Model: /features/features.10/conv/conv.2/Conv: Conv
I (622) dl::Model: /features/features.10/Add: Add
I (622) dl::Model: /features/features.11/conv/conv.0/conv.0.0/Conv: Conv
I (632) dl::Model: /features/features.11/conv/conv.1/conv.1.0/Conv: Conv
I (642) dl::Model: /features/features.11/conv/conv.2/Conv: Conv
I (642) dl::Model: /features/features.12/conv/conv.0/conv.0.0/Conv: Conv
I (652) dl::Model: PPQ_Operation_1: RequantizeLinear
I (652) dl::Model: /features/features.12/conv/conv.1/conv.1.0/Conv: Conv
I (662) dl::Model: /features/features.12/conv/conv.2/Conv: Conv
I (672) dl::Model: /features/features.12/Add: Add
I (672) dl::Model: /features/features.13/conv/conv.0/conv.0.0/Conv: Conv
I (682) dl::Model: /features/features.13/conv/conv.1/conv.1.0/Conv: Conv
I (682) dl::Model: /features/features.13/conv/conv.2/Conv: Conv
I (692) dl::Model: /features/features.13/Add: Add
I (692) dl::Model: /features/features.14/conv/conv.0/conv.0.0/Conv: Conv
I (702) dl::Model: /features/features.14/conv/conv.1/conv.1.0/Conv: Conv
I (712) dl::Model: /features/features.14/conv/conv.2/Conv: Conv
I (722) dl::Model: /features/features.15/conv/conv.0/conv.0.0/Conv: Conv
I (732) dl::Model: PPQ_Operation_2: RequantizeLinear
I (732) dl::Model: /features/features.15/conv/conv.1/conv.1.0/Conv: Conv
I (732) dl::Model: /features/features.15/conv/conv.2/Conv: Conv
I (752) dl::Model: /features/features.15/Add: Add
I (752) dl::Model: /features/features.16/conv/conv.0/conv.0.0/Conv: Conv
I (762) dl::Model: PPQ_Operation_3: RequantizeLinear
I (762) dl::Model: /features/features.16/conv/conv.1/conv.1.0/Conv: Conv
I (762) dl::Model: /features/features.16/conv/conv.2/Conv: Conv
I (772) dl::Model: /features/features.16/Add: Add
I (772) dl::Model: /features/features.17/conv/conv.0/conv.0.0/Conv: Conv
I (782) dl::Model: /features/features.17/conv/conv.1/conv.1.0/Conv: Conv
I (782) dl::Model: /features/features.17/conv/conv.2/Conv: Conv
I (802) dl::Model: /features/features.18/features.18.0/Conv: Conv
I (832) dl::Model: /GlobalAveragePool: GlobalAveragePool
I (832) dl::Model: PPQ_Operation_4: Transpose
I (832) dl::Model: /Flatten: Flatten
I (832) dl::Model: /classifier/classifier.1/Gemm: Gemm
I (932) MemoryManagerGreedy: Maximum mermory size: 1705984

I (2252) MOBILENET_V2_EXAMPLE: infer_output, name: 466, shape: [1, 1000]
I (2262) MOBILENET_V2_EXAMPLE: output size: 1000
I (2262) MOBILENET_V2_EXAMPLE: exit app_main
I (2262) main_task: Returned from app_main()

Here is the log after switching to the mobilenet_v2.espdl generated by my script (renamed mobilenet_v2_new.espdl):

I (216) octal_psram: vendor id : 0x0d (AP)
I (216) octal_psram: dev id : 0x02 (generation 3)
I (216) octal_psram: density : 0x03 (64 Mbit)
I (218) octal_psram: good-die : 0x01 (Pass)
I (222) octal_psram: Latency : 0x01 (Fixed)
I (227) octal_psram: VCC : 0x00 (1.8V)
I (231) octal_psram: SRF : 0x01 (Fast Refresh)
I (236) octal_psram: BurstType : 0x01 (Hybrid Wrap)
I (241) octal_psram: BurstLen : 0x01 (32 Byte)
I (245) octal_psram: Readlatency : 0x02 (10 cycles@Fixed)
I (251) octal_psram: DriveStrength: 0x00 (1/1)
I (255) MSPI Timing: PSRAM timing tuning index: 5
I (259) esp_psram: Found 8MB PSRAM device
I (263) esp_psram: Speed: 80MHz
I (266) cpu_start: Multicore app
I (280) cpu_start: Pro cpu start user code
I (280) cpu_start: cpu freq: 240000000 Hz
I (281) app_init: Application information:
I (281) app_init: Project name: mobilenet_v2_example
I (286) app_init: App version: v3.0.0-46-g971abed-dirty
I (291) app_init: Compile time: Jan 28 2025 16:16:49
I (296) app_init: ELF file SHA256: 32b1bfb0e...
I (300) app_init: ESP-IDF: v5.4-dirty
I (305) efuse_init: Min chip rev: v0.0
I (308) efuse_init: Max chip rev: v0.99
I (312) efuse_init: Chip rev: v0.1
I (316) heap_init: Initializing. RAM available for dynamic allocation:
I (322) heap_init: At 3FC9CC78 len 0004CA98 (306 KiB): RAM
I (328) heap_init: At 3FCE9710 len 00005724 (21 KiB): RAM
I (333) esp_psram: Adding pool of 8192K of PSRAM memory to heap allocator
W (340) spi_flash: Octal flash chip is using but qio mode is selected, will automatically switch to Octal mode
I (349) spi_flash: detected chip: mxic (opi)
I (353) spi_flash: flash io: opi_str
W (356) spi_flash: Detected size(32768k) larger than the size in the binary image header(16384k). Using the size in the binary image header.
I (369) sleep_gpio: Configure to isolate all GPIO pins in sleep state
I (375) sleep_gpio: Enable automatic switching of GPIO sleep configuration
I (382) main_task: Started on CPU0
I (392) esp_psram: Reserving pool of 32K of internal memory for DMA/internal allocations
I (392) main_task: Calling app_main()
I (392) MOBILENET_V2_EXAMPLE: get into app_main
I (402) FbsLoader: The storage free size is 23808 KB
I (402) FbsLoader: The partition size is 3900 KB
I (412) dl::Model: model:main_graph, version:0

I (412) dl::Model: /features/features.0/features.0.0/Conv: Conv
W (422) calloc_aligned: heap_caps_aligned_calloc failed, retry with MALLOC_CAP_8BIT
E (422) calloc_aligned: Fail to malloc -1021925216 bytes from DRAM(291179 bytyes) and PSRAM(8386148 bytes), PSRAM is on.

Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.

Core 0 register dump:
PC : 0x4037c5aa PS : 0x00060f30 A0 : 0x82032034 A1 : 0x3fc9f120
--- 0x4037c5aa: dl_tie728_memcpy at ??:?

A2 : 0x00000000 A3 : 0x3cb744d0 A4 : 0xc316a88c A5 : 0x3fc9f140
A6 : 0x3fc9f120 A7 : 0x0000000c A8 : 0x82096bad A9 : 0x3fc9f0b0
A10 : 0x0000007a A11 : 0x00000010 A12 : 0x3fc9f120 A13 : 0x00000000
A14 : 0x00000001 A15 : 0x00000010 SAR : 0x00000004 EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000000 LBEG : 0x400556d5 LEND : 0x400556e5 LCOUNT : 0xffffffff
--- 0x400556d5: strlen in ROM
0x400556e5: strlen in ROM

For reference, I have uploaded the files generated by the script in esp-dl\models\torch:
torch.zip

BlueSkyB (Collaborator) commented Feb 5, 2025:

The ONNX file you uploaded previously, after being quantized by ESP-PPQ, can be loaded and run normally on the ESP32-S3.
When you update ESP-PPQ, it is best to uninstall it first and then reinstall it. As I mentioned above, ESP-DL also needs to be updated to avoid anomalies caused by version mismatches between the two.
The difference between the .info file in the example and the one generated by quantize_torch_model.py is normal. This is because the pre-trained models are different, and there are also some differences in the quantization methods.
Additionally, using the latest versions of ESP-PPQ and ESP-DL, I re-ran quantize_torch_model.py and tested it on the ESP32-S3, and no crash occurred. The weight shapes of each layer in your .info file are clearly abnormal and excessively large, which leads to the error and exhausts the memory of the ESP32-S3. You should check your environment again.

manhdatbn93 (Author) commented:

Hi @BlueSkyB, thank you so much for taking the time to debug this issue.

Could you let me know the versions of the libraries you are using, such as esp-ppq, torch, torchvision, and Python? The mobilenet_v2 example still produces abnormal values that differ from the reference mobilenet_v2.info, even after I create a new environment and install the libraries from the quantization folder.
Here is my list of libraries:

Image

BlueSkyB (Collaborator) commented Feb 6, 2025:

I have installed quite a few packages in my environment. I will list the packages directly related to esp-ppq first:
numpy 1.24.3
onnx 1.15.0
protobuf 3.20.2
pytorch 1.13.0
torchaudio 0.13.0
torchvision 0.14.0
onnxsim 0.4.36
tqdm 4.66.2
flatbuffers 23.5.26
cryptography 43.0.0

I'm a bit suspicious that it might be due to the latest flatbuffers being incompatible with the code generated by the older version. Try downgrading the flatbuffers version and see if it can return to normal.
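For example (a hedged suggestion, matching the version listed above):

pip uninstall flatbuffers
pip install flatbuffers==23.5.26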

manhdatbn93 (Author) commented:

Hi @BlueSkyB, it looks strange! I have updated all libraries to match your versions, but I am still facing problems with the quantization tools. The output mobilenet_v2.info still shows abnormal values.

Image

What is the "pytorch 1.13.0" package? I am using a Windows environment, and I think the corresponding package there is torch.

BlueSkyB (Collaborator) commented Feb 7, 2025:

Yes, what you are seeing is quite strange.
What is your Python version? I am currently using Python version 3.10.13, and there might be issues with Python 3.12 and above.

manhdatbn93 (Author) commented:

Hi @BlueSkyB, I am using Python version 3.10.11. Do you have any suggestions for addressing this issue?

BlueSkyB (Collaborator) commented Feb 8, 2025:

Remaining points of suspicion:

  1. There may be residual cache from a previous esp-ppq installation that affected the installation of the latest package, for instance the pip cache, or an install via pip install git+ where the build directory of the temporarily downloaded code was not properly cleaned up. You can download the esp-ppq source code directly with git clone https://github.com/espressif/esp-ppq.git, then install it from the root directory of the source tree with pip uninstall ppq followed by python setup.py install (see the command sketch after this list).

  2. Another point of suspicion is that, based on the data you exported, there might be an issue caused by a data type mismatch or data overflow. Could it be that your computer's CPU is in big-endian mode? Since flatbuffers serializes in little-endian, such a mismatch might produce abnormal shape information, but this is uncertain.
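A sketch of the clean reinstall described in point 1 (repository URL and package name as given above; the install command is assumed to be the standard setuptools invocation):

git clone https://github.com/espressif/esp-ppq.git
cd esp-ppq
pip uninstall ppq
python setup.py install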
