Welcome to the Sony Spresense TFLite Deployment Guide! This repository provides a comprehensive, step-by-step guide to deploying TensorFlow Lite (TFLite) models on the Sony Spresense board. Whether you're setting up your development environment or deploying your first model, this guide will help you navigate the process with ease, referencing official documentation where necessary.
Credits: Special thanks to @YoshinoTaro for developing the Spresense Arduino Package for TensorFlow.
- Introduction
- Hardware Overview
- Setting Up the Development Environment
- Installing and Setting Up Spresense SDK
- Setting Up Arduino IDE for TensorFlow Lite
- Deploying TensorFlow Lite Models on Spresense
- Flashing the Model and Running Inference
- Notes on Optimization and Memory Usage
- References
- Citation and Reaching Out
This repository serves as a Getting Started Guide for deploying TensorFlow Lite models on the Sony Spresense board. The primary focus is on deployment, but it also covers setting up the user's computer and ensuring proper usage of the board. The guide leverages TFLite for Microcontrollers (TFLite Micro) and utilizes C++ to run inference on the device using the TFLite Micro library and process the results.
- Processor: 6-Core ARM Cortex-M4F
- Flash Memory: 8 MB
- SRAM: 1.5 MB
- Expansion: Supports external microSD cards via the optional extension board
SRAM (1.5 MB):

- Used for runtime data, including variables, stack, heap, and dynamically allocated memory.
- During inference, weights and activations are temporarily stored in SRAM.
- Its limited capacity requires careful optimization of model size to prevent memory overflow.

Flash Memory (8 MB):

- Used for storing the bootloader, firmware, applications, and other persistent data.
- Model weights are stored in flash memory before being loaded into SRAM for inference.
- Effective use of this space is crucial for deploying larger models within the board's constraints.

microSD Card (optional extension board):

- Supports storage of auxiliary data, such as images, CSV files, or other non-model content.
- Cannot be used to store model weights or binaries for inference.
Understanding the hardware specifications is crucial for optimizing model deployment, especially regarding memory usage. The Spresense board's flash memory and RAM limitations will dictate the size and complexity of the models you can deploy.
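To make these constraints concrete, here is a minimal sketch of a coarse feasibility check using the totals above. This is an illustrative helper, not part of the SDK; the budgets are the board totals, which are shared with firmware, stack, and heap, so real headroom is smaller:

```python
# Coarse feasibility check for a model on the Spresense.
# Budgets are board totals (shared with firmware/stack/heap), so treat
# a "True" result as necessary, not sufficient.
FLASH_BYTES = 8 * 1024 * 1024        # 8 MB flash
SRAM_BYTES = int(1.5 * 1024 * 1024)  # 1.5 MB SRAM

def fits(model_bytes: int, arena_bytes: int) -> bool:
    """Model weights must fit in flash; the tensor arena must fit in SRAM."""
    return model_bytes < FLASH_BYTES and arena_bytes < SRAM_BYTES

print(fits(2 * 1024 * 1024, 300 * 1024))  # 2 MB model, 300 KB arena
```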
SDK Compatibility: The Spresense Arduino Package for TensorFlow is based on SDK v2.5.0; later SDK versions are not supported with this package.
- Operating System: macOS (steps provided; refer to official documentation for other OS)
- Python: Recommended version 3.10.14
- Package Manager: Conda or similar for virtual environments
- Xcode Tools: Required for macOS
- Install USB-to-Serial Drivers
  - macOS: Install the CP210x USB-to-Serial driver (see the official Spresense documentation for the download).
  - Install the driver before connecting the Spresense board to the PC.
- Install Python and Create a Virtual Environment

  ```bash
  # Install Python (if not already installed)
  # Recommended version: Python 3.10.14

  # Create a virtual environment using Conda
  conda create -n spresense_env python=3.10.14
  conda activate spresense_env
  ```
- Install Xcode Command Line Tools (macOS only)

  Open Terminal and run:

  ```bash
  xcode-select --install
  ```
- Install Required Tools

  - Install Homebrew (if not already installed):

    ```bash
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)"
    ```

  - Install wget:

    ```bash
    brew install wget
    ```

  - Install flock (required for SDK v3.2.0 or later):

    ```bash
    brew tap discoteq/discoteq
    brew install flock
    ```
- Install Development Tools

  ```bash
  curl -L https://raw.githubusercontent.com/sonydevworld/spresense/master/install-tools.sh > install-tools.sh
  bash install-tools.sh
  ```
  Note for macOS users: use `bash` instead of `zsh` as the shell during installation. Switch to bash with:

  ```bash
  exec bash
  ```
- Activate the Installed Tools:

  ```bash
  source ~/.bash_profile   # For bash
  source ~/spresenseenv/setup
  ```
Before cloning the SDK, download the required firmware binaries:

- Firmware v2.4.0: Download Spresense Firmware v2.4.0
- Clone the Repository with Submodules:

  ```bash
  git clone --recursive https://github.com/sonydevworld/spresense.git
  cd spresense
  ```
- Check Out the Specific SDK Version:

  ```bash
  git checkout v2.5.0
  git submodule init
  git submodule update
  ```
- SDK Version: 2.5.0
- Bootloader Version: v2.4.0
Ensure the correct paths are set for `SPRESENSE_SDK` and `SPRESENSE_HOME`. Add the following lines to your `~/.bash_profile` (or the equivalent for your shell):

```bash
source ~/spresenseenv/setup
export SPRESENSE_SDK=/Users/<your-username>/spresense
export SPRESENSE_HOME=/Users/<your-username>/myapps
```

Replace `<your-username>` with your actual username, then activate the changes:

```bash
source ~/.bash_profile
```
- Flash the Bootloader (v2.4.0):

  ```bash
  ./tools/flash.sh -e /path/to/spresense-binaries-v2.4.0.zip
  ./tools/flash.sh -l /path/to/spresense/firmware/spresense -c /dev/cu.SLAB_USBtoUART
  ```

  Note: Replace `/path/to/` with the actual path to your downloaded binaries.
- Navigate to the SDK Directory and Source the Setup Scripts:

  ```bash
  cd spresense/sdk
  source ~/spresenseenv/setup
  source tools/build-env.sh
  ```

  You should see output similar to:

  ```
  =======================================
  SDK_VERSION     = SDK2.5.0
  NUTTX_VERSION   = 10.2.0
  SPRESENSE_SDK   = /Users/<your-username>/spresense
  SPRESENSE_HOME  = /Users/<your-username>/myapps
  GCC_VERSION     = arm-none-eabi-gcc (GNU Arm Embedded Toolchain 10.3-2021.10) 10.3.1 20210824 (release)
  HOST            = Darwin x86_64
  =======================================
  ```
  Ensure that the `SPRESENSE_SDK` and `SPRESENSE_HOME` paths are correctly set.

- Directory Structure:

  Your `spresense` directory should contain the `sdk` folder, along with `spresense_env.sh`, `install-tools.sh`, and other files such as `examples` and `nuttx`. Example:

  ```
  /Users/<your-username>/spresense/
  ├── sdk/
  ├── spresense_env.sh
  ├── install-tools.sh
  ├── examples/
  └── nuttx/
  ```
- Application Root:

  Refer to the Spresense SDK documentation for setting the application root, typically at `/home/user/myapps`.
- Set the Initial Configuration:

  ```bash
  cd spresense/sdk
  tools/config.py examples/hello
  ```
- Handle Bootloader Warnings:

  If you encounter a message like:

  ```
  WARNING: New loader vX.Y.Z is required, please download and install.
  ```

  flash the new bootloader:

  ```bash
  ./tools/flash.sh -e /path/to/spresense-binaries-v2.4.0.zip
  ./tools/flash.sh -l /path/to/spresense/firmware/spresense -c /dev/cu.SLAB_USBtoUART
  ```
- Build the Example Image:

  ```bash
  make
  ```

  After running `make`, a `nuttx.spk` file is created in the `sdk` folder. This file is the final build artifact that can be flashed onto the Spresense board.
- Flash the Example to the Board:

  Use the `tools/flash.sh` script to flash the `nuttx.spk` file onto the board:

  ```bash
  tools/flash.sh -c /dev/cu.SLAB_USBtoUART nuttx.spk
  ```

  Note: Replace `/dev/cu.SLAB_USBtoUART` with your board's serial port if it differs.
- Reboot and Verify:

  After flashing, the Spresense board reboots automatically.
- Identify the Serial Port:

  On macOS, the serial port for Spresense is typically `/dev/cu.SLAB_USBtoUART`. To confirm the correct port, run:

  ```bash
  ls /dev/{tty,cu}.*
  ```

  Look for `/dev/cu.SLAB_USBtoUART` in the output.
- Install `screen` (if not already installed):

  ```bash
  brew install screen
  ```
- Connect to the Serial Port:

  Open a serial monitor at a baud rate of 115200:

  ```bash
  screen /dev/cu.SLAB_USBtoUART 115200
  ```
- Run the Hello World Example:

  At the `nsh>` prompt, type:

  ```
  nsh> hello
  ```

  You should see:

  ```
  Hello, World!!
  ```
- Exit `screen`:

  - Press `Ctrl + A`, then `K` to kill the session.
  - Confirm by typing `Y`.
Once you see the `Hello, World!!` output, your environment is correctly set up and the SDK configuration is complete.
- Download: Arduino IDE
- Install the IDE appropriate for your operating system.
- Ensure the CP210x USB-to-Serial driver is installed as per the Setting Up the Development Environment section.
- Open the Arduino IDE and Navigate to Preferences:
  - Path: `Arduino IDE > Preferences`
- Add the Additional Boards Manager URL:

  ```
  https://raw.githubusercontent.com/YoshinoTaro/spresense-arduino-tensorflow/main/package_spresense_tensorflow_index.json
  ```

- Open the Boards Manager:
  - Path: `Tools > Board > Boards Manager`
- Search for and Install "Spresense Tensorflow Board":
  - Path: `Tools > Board > Spresense Tensorflow`
- Refer to YoshinoTaro's GitHub Repository for example sketches.
- Successful installation is confirmed when the example sketch draws a sine curve on the Serial Plotter.
Convert your trained TensorFlow model to TFLite format. Refer to Post-Training Integer Quantization for detailed instructions.
```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the model to disk
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)
```
Dynamic Range Quantization:

```python
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model_quant = converter.convert()

with open('model_dynamic_quant.tflite', 'wb') as f:
    f.write(tflite_model_quant)
```
Representative Dataset Example:

```python
train_generator_qat = train_datagen_qat.flow_from_dataframe(
    dataframe=train_data_qat,
    x_col='file_path',
    y_col='label',
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True,
    seed=42
)

def representative_data_gen():
    # Number of samples to use for calibration
    num_samples = 100
    count = 0
    for input_value, _ in train_generator_qat:
        yield [input_value]
        count += 1
        if count >= num_samples:
            break
```
Conversion Script:

```python
converter = tf.lite.TFLiteConverter.from_keras_model(qat_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
converter.representative_dataset = representative_data_gen

# Convert and save the model
tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```
```python
import tensorflow as tf

# Load the TensorFlow Lite model
interpreter = tf.lite.Interpreter(model_path="model_int8.tflite")
interpreter.allocate_tensors()

# Get input and output details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Display input details
print("Input Details with Quantization Parameters:")
for input_detail in input_details:
    print(f"  Name: {input_detail['name']}")
    print(f"  Shape: {input_detail['shape']}")
    print(f"  Data Type: {input_detail['dtype']}")
    print(f"  Quantization Parameters: {input_detail['quantization']}")
    print(f"  Quantization Scale: {input_detail['quantization_parameters']['scales']}")
    print(f"  Quantization Zero Points: {input_detail['quantization_parameters']['zero_points']}\n")

# Display output details
print("\nOutput Details with Quantization Parameters:")
for output_detail in output_details:
    print(f"  Name: {output_detail['name']}")
    print(f"  Shape: {output_detail['shape']}")
    print(f"  Data Type: {output_detail['dtype']}")
    print(f"  Quantization Parameters: {output_detail['quantization']}")
    print(f"  Quantization Scale: {output_detail['quantization_parameters']['scales']}")
    print(f"  Quantization Zero Points: {output_detail['quantization_parameters']['zero_points']}\n")
```
Example Output:

```
Input Details with Quantization Parameters:
  Name: serving_default_input_7:0
  Shape: [1, 128, 128, 3]
  Data Type: <class 'numpy.int8'>
  Quantization Parameters: (0.007843137718737125, 0)
  Quantization Scale: [0.00784314]
  Quantization Zero Points: [0]

Output Details with Quantization Parameters:
  Name: StatefulPartitionedCall:0
  Shape: [1, 3]
  Data Type: <class 'numpy.int8'>
  Quantization Parameters: (0.00390625, -128)
  Quantization Scale: [0.00390625]
  Quantization Zero Points: [-128]
```
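The mapping implied by these parameters can be reproduced on the host as a sanity check. The helper below is a minimal sketch (not one of this guide's scripts) of the standard affine quantization formula, `q = round(x / scale) + zero_point`:

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Affine-quantize float values to int8: q = round(x / scale) + zero_point."""
    return np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Recover approximate float values: x = (q - zero_point) * scale."""
    return (q.astype(np.float32) - zero_point) * scale

# Input tensor from the example output: scale ≈ 0.00784314, zero_point = 0
x = np.array([0.0, 0.5, 1.0], dtype=np.float32)
q = quantize(x, 0.007843137718737125, 0)
print(q)
print(dequantize(q, 0.007843137718737125, 0))
```

Note that dequantizing does not recover the inputs exactly; the small residual is the quantization error the INT8 model incurs.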
Model Optimization Notes:

- The Sony Spresense board features an ARM Cortex-M4F, which accelerates INT8 models via the CMSIS-NN library.
- Recommendation: Use INT8 models for optimal speed.
- Constraint: The board does not support mixed precision or mixed data types. Use either full FLOAT32 or full INT8 models.
- Alternative: You can set `converter.inference_output_type = tf.float32` to retain full precision at the output while keeping weights and activations 8-bit.
Conversion Script:

```python
import binascii

def convert_to_c_array(bytes_data) -> str:
    hexstr = binascii.hexlify(bytes_data).decode("UTF-8").upper()
    array = ["0x" + hexstr[i:i + 2] for i in range(0, len(hexstr), 2)]
    array = [array[i:i + 10] for i in range(0, len(array), 10)]
    return ",\n  ".join([", ".join(e) for e in array])

# Read the TFLite model
with open('model_int8.tflite', 'rb') as f:
    tflite_binary = f.read()

ascii_bytes = convert_to_c_array(tflite_binary)
header_file_content = (
    "const unsigned char model_tflite[] = {\n  "
    + ascii_bytes
    + "\n};\nunsigned int model_tflite_len = "
    + str(len(tflite_binary)) + ";"
)

# Write to .h file
with open("model.h", "w") as f:
    f.write(header_file_content)
```
Ensure the `.h` file size does not exceed 2.1 MB.
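Before adding the header to your sketch, you can verify its size on the host. This is a hypothetical convenience check (the 2.1 MB figure is the limit used in this guide):

```python
import os

MAX_HEADER_BYTES = int(2.1 * 1024 * 1024)  # practical .h size limit used in this guide

def header_fits(path: str) -> bool:
    """Return True if the generated header is within the size budget."""
    size = os.path.getsize(path)
    print(f"{path}: {size / (1024 * 1024):.2f} MB")
    return size <= MAX_HEADER_BYTES

# Example: header_fits("model.h")
```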
- Create a New Arduino Sketch:
  - Save the sketch in a dedicated folder.
- Add the `.h` File:
  - Place `model.h` in the same folder as the `.ino` file.
- Include the Model in Your Sketch:

  ```cpp
  #include "model.h"
  ```
This example sketch is based on the TensorFlow Lite for Microcontrollers Get Started guide.
```cpp
// TensorFlow Lite Micro includes
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/system_setup.h"
#include "tensorflow/lite/schema/schema_generated.h"

#include "model.h"  // Your model file after conversion

// Globals for TensorFlow Lite Micro
namespace {
tflite::ErrorReporter* error_reporter = nullptr;
const tflite::Model* model = nullptr;
tflite::MicroInterpreter* interpreter = nullptr;
TfLiteTensor* input = nullptr;
TfLiteTensor* output = nullptr;
int inference_count = 0;

// Size of the tensor arena (adjust as needed for your model)
constexpr int kTensorArenaSize = 300 * 1024;
uint8_t tensor_arena[kTensorArenaSize];
}  // namespace

// Set up the TensorFlow Lite Micro model
void setupModel() {
  // Initialize the TensorFlow Lite Micro target
  tflite::InitializeTarget();

  // Zero out the tensor arena
  memset(tensor_arena, 0, sizeof(tensor_arena));

  // Set up the error reporter
  static tflite::MicroErrorReporter micro_error_reporter;
  error_reporter = &micro_error_reporter;

  // Load the TensorFlow Lite model
  model = tflite::GetModel(model_tflite);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    Serial.println("Model schema version mismatch.");
    while (1);  // Halt on version mismatch
  }

  // Create an operator resolver
  static tflite::AllOpsResolver resolver;

  // Create the interpreter
  static tflite::MicroInterpreter static_interpreter(
      model, resolver, tensor_arena, kTensorArenaSize, error_reporter);
  interpreter = &static_interpreter;

  // Allocate memory for the tensors
  if (interpreter->AllocateTensors() != kTfLiteOk) {
    Serial.println("AllocateTensors() failed.");
    while (1);  // Halt on allocation failure
  }

  // Obtain pointers to the model's input and output tensors
  input = interpreter->input(0);
  output = interpreter->output(0);

  // Print information about the input tensor
  Serial.println("Model input:");
  Serial.println("Number of dimensions: " + String(input->dims->size));
  for (int n = 0; n < input->dims->size; ++n) {
    Serial.println("dims->data[" + String(n) + "]: " + String(input->dims->data[n]));
  }
  Serial.print("Input type: ");
  Serial.println(input->type);

  // Print information about the output tensor
  Serial.println("\nModel output:");
  Serial.println("Number of dimensions: " + String(output->dims->size));
  for (int n = 0; n < output->dims->size; ++n) {
    Serial.println("dims->data[" + String(n) + "]: " + String(output->dims->data[n]));
  }
  Serial.print("Output type: ");
  Serial.println(output->type);

  Serial.println("Completed TensorFlow setup");
  Serial.println();
}

void setup() {
  // Initialize serial communication
  Serial.begin(115200);
  while (!Serial) { /* Wait for Serial to initialize */ }

  // Set up the TensorFlow Lite Micro model
  setupModel();
}

void loop() {
  // Placeholder for your main code: add inference calls or other logic here.
  // A simple delay keeps the loop from spinning too fast.
  delay(1000);
}
```
- Memory Allocation: Adjust `kTensorArenaSize` based on your model's memory requirements. Monitor memory usage with `interpreter->arena_used_bytes()`.
Refer to the Spresense Arduino Developer Guide to select appropriate memory configurations.

- Default Memory Configuration:
  - MainCore: 768 KB (first six tiles)
  - Shared Memory: 768 KB (remaining six tiles) for SubCore, Audio DSP, and other libraries
- Adjusting Memory:
  - Increase MainCore memory up to 1.5 MB (1536 KB) if not using SubCore or Audio DSP.
  - Reduce MainCore memory to allocate more to SubCore and Audio DSP, depending on your use case.

Memory Configuration in Arduino IDE:

- Path: `Arduino IDE > Tools > Memory Configuration`
- Adjust the memory sizes according to your application's needs.
- Open the Arduino IDE.
- Select the Spresense Tensorflow board: `Tools > Board > Spresense Tensorflow`
- Compile the Sketch: click the Verify button.
- Upload the Sketch: click the Upload button.
- Open the Serial Monitor:
  - Path: `Tools > Serial Monitor`
  - Set the baud rate to 115200.
- Verify Inference Results:

  You should see inference output similar to:

  ```
  AllocateTensors() Success
  Arena used bytes: 15000
  Model input:
  Number of dimensions: 4
  dims->data[0]: 1
  dims->data[1]: 28
  dims->data[2]: 28
  dims->data[3]: 1
  Input type: 1
  Model output:
  Number of dimensions: 2
  dims->data[0]: 1
  dims->data[1]: 10
  Output type: 1
  Completed TensorFlow setup
  ...
  ```
The `input->type` and `output->type` fields report the data types of the tensors used by the TensorFlow Lite model. The possible data types and their corresponding integer values are:

- `kTfLiteNoType` (0): Undefined or unspecified type.
- `kTfLiteFloat32` (1): 32-bit floating-point numbers.
- `kTfLiteInt32` (2): 32-bit signed integers.
- `kTfLiteUInt8` (3): 8-bit unsigned integers.
- `kTfLiteInt64` (4): 64-bit signed integers.
- `kTfLiteString` (5): String data type.
- `kTfLiteBool` (6): Boolean values (`true` or `false`).
- `kTfLiteInt16` (7): 16-bit signed integers.
- `kTfLiteComplex64` (8): Complex numbers with 64-bit precision.
- `kTfLiteInt8` (9): 8-bit signed integers.
- `kTfLiteFloat16` (10): 16-bit floating-point numbers.
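On the host side, a small lookup table mirroring this enum (a convenience helper, not a TensorFlow API) can translate the integer codes printed by the sketch:

```python
# Integer type codes printed by the sketch, mapped to TfLiteType names.
TFLITE_TYPE_NAMES = {
    0: "kTfLiteNoType",
    1: "kTfLiteFloat32",
    2: "kTfLiteInt32",
    3: "kTfLiteUInt8",
    4: "kTfLiteInt64",
    5: "kTfLiteString",
    6: "kTfLiteBool",
    7: "kTfLiteInt16",
    8: "kTfLiteComplex64",
    9: "kTfLiteInt8",
    10: "kTfLiteFloat16",
}

print(TFLITE_TYPE_NAMES[9])  # → kTfLiteInt8
```

For a fully INT8 model, the serial output above would show `Input type: 9` and `Output type: 9` rather than the `1` (float32) shown in the earlier example.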
- Performance Metrics:
  - Refer to the repository files for scripts that run multiple inferences (e.g., 1000 inferences) to calculate average inference time and frames per second (FPS).
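As a host-side baseline (a hypothetical helper, not one of the on-device scripts; desktop timings do not transfer to the Cortex-M4F), you can time the same `.tflite` model with the desktop TFLite interpreter:

```python
import time

import numpy as np
import tensorflow as tf

def benchmark_tflite(model_path: str, n: int = 1000) -> float:
    """Return the average inference time in seconds for a .tflite model."""
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]

    # Dummy input matching the model's input shape and dtype
    data = np.zeros(inp['shape'], dtype=inp['dtype'])

    start = time.perf_counter()
    for _ in range(n):
        interpreter.set_tensor(inp['index'], data)
        interpreter.invoke()
    elapsed = time.perf_counter() - start
    print(f"Average: {1000 * elapsed / n:.3f} ms, FPS: {n / elapsed:.1f}")
    return elapsed / n

# Example: benchmark_tflite("model_int8.tflite", n=1000)
```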
Exiting Serial Monitor:
- To stop the serial monitor, simply close the Serial Monitor window in Arduino IDE.
- Use INT8 Models:
  - The CMSIS-NN library accelerates INT8 models on the ARM Cortex-M4F processor.
  - Recommendation: Use INT8 models for optimal speed.
  - Constraint: The board does not support mixed precision or mixed data types. Use either full FLOAT32 or full INT8 models.
  - Alternative: You can set `converter.inference_output_type = tf.float32` to retain full precision at the output while keeping weights and activations 8-bit.
- Adjust Memory Configuration:
  - MainCore default memory: 768 KB
  - Maximum MainCore memory: 1.5 MB (if SubCore and Audio DSP are disabled)
- Model Size:
  - Ensure the `.h` file size does not exceed 2.1 MB.
- Sony Spresense Official Documentation
- Spresense SDK Getting Started Guide (CLI)
- Spresense Arduino Developer Guide
- YoshinoTaro's Spresense Arduino TensorFlow Repository
- TensorFlow Lite for Microcontrollers
- CMSIS-NN Library
- Post-Training Integer Quantization
- Download Spresense Firmware v2.4.0
If you use this repository or its contents in your work, please cite this paper:
If you have any questions, please feel free to reach out to me through email (b00090279@alumni.aus.edu) or by connecting with me on LinkedIn.
Thank you for using this guide! I hope it helps you successfully deploy your TensorFlow Lite models on the Sony Spresense board.