Note: This project is currently under development and this README will be periodically updated.
This repository aims to refactor and simplify the SwiftUI example provided by k2-fsa/sherpa-onnx, specifically focusing on Speech Diarization.
I wrote a companion article breaking down how and why I built this project.
Additionally, I recently created an algorithm for Active Speaker Detection using this project as a base.
Before building this project, ensure the required frameworks are in place:
- onnxruntime.xcframework is too large to be included in the repository. You must download it manually.
- sherpa-onnx.xcframework must be built and added to your project. See Building from Sherpa Onnx.
Without these frameworks, the build will fail.
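A quick preflight check can catch a missing framework before you open Xcode. The function below is a hypothetical helper, not part of this repository. It assumes both frameworks are copied directly into the project directory you pass in, as the steps in this README describe. Adjust the path if you keep frameworks elsewhere.

```shell
# check_frameworks: hypothetical preflight helper (not part of this repository).
# Assumption: both xcframeworks sit directly inside the given project directory.
# Returns 0 if both are present, 1 if either is missing.
check_frameworks() {
  local project_dir="${1:-.}" missing=0 fw
  for fw in onnxruntime.xcframework sherpa-onnx.xcframework; do
    if [ -d "$project_dir/$fw" ]; then
      echo "found:   $fw"
    else
      echo "missing: $fw" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_frameworks ~/Projects/MyDiarizationApp` (a placeholder path) reports each framework and exits non-zero if either one is absent.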
Note: After setup, test the app by loading an audio file with the File Picker. Alternatively, hardcode a file path in ContentView (line 18) for testing.
Download the onnxruntime framework: onnxruntime.xcframework-1.17.1.tar.bz2
Steps:
- Extract the archive.
- Copy onnxruntime.xcframework into your Xcode project directory.
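The two steps above can also be scripted. This is a sketch with a hypothetical function name. It assumes the archive unpacks to a top-level onnxruntime.xcframework folder (run `tar -tjf` on the archive to confirm) and that you pass your Xcode project directory as the destination.

```shell
# install_onnxruntime: hypothetical helper that extracts the downloaded archive
# and copies onnxruntime.xcframework into an Xcode project directory.
# Assumption: the archive's top-level entry is onnxruntime.xcframework.
install_onnxruntime() {
  local archive="$1" project_dir="$2" workdir rc
  workdir=$(mktemp -d) || return 1
  # Extract the .tar.bz2 archive into a temporary directory.
  if ! tar -xjf "$archive" -C "$workdir"; then
    rm -rf "$workdir"
    return 1
  fi
  if [ ! -d "$workdir/onnxruntime.xcframework" ]; then
    echo "onnxruntime.xcframework not found in $archive" >&2
    rm -rf "$workdir"
    return 1
  fi
  # Replace any previous copy, then copy the framework directory into the project.
  rm -rf "$project_dir/onnxruntime.xcframework"
  cp -R "$workdir/onnxruntime.xcframework" "$project_dir/"
  rc=$?
  rm -rf "$workdir"
  return "$rc"
}
```

Usage, with placeholder paths: `install_onnxruntime ~/Downloads/onnxruntime.xcframework-1.17.1.tar.bz2 ~/Projects/MyDiarizationApp`. Extracting into a temporary directory first keeps a half-extracted archive out of your project if something goes wrong.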
To build sherpa-onnx.xcframework, follow these steps. For more detailed build instructions, visit this link.
- Clone the repository:
  git clone https://github.com/k2-fsa/sherpa-onnx
- Enter the repo directory:
  cd sherpa-onnx
- Run the iOS build script:
  ./build-ios.sh
- After the script completes, a build-ios folder will be created.
- Copy sherpa-onnx.xcframework from build-ios into your Xcode project.
- You’ll also find onnxruntime.xcframework in ios-onnxruntime/1.17.1/onnxruntime.xcframework. This is the same xcframework as in the previous section.
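After build-ios.sh has finished, the copy steps can be scripted as well. This hypothetical helper is not part of sherpa-onnx. It assumes the output layout below, relative to your sherpa-onnx checkout. Placing ios-onnxruntime inside build-ios is itself an assumption and may differ between versions of the build script.

```shell
# copy_sherpa_frameworks: hypothetical helper (not part of sherpa-onnx).
# Assumed layout after ./build-ios.sh; adjust if your build output differs:
#   build-ios/sherpa-onnx.xcframework
#   build-ios/ios-onnxruntime/1.17.1/onnxruntime.xcframework
copy_sherpa_frameworks() {
  local sherpa_fw="$1/build-ios/sherpa-onnx.xcframework"
  local ort_fw="$1/build-ios/ios-onnxruntime/1.17.1/onnxruntime.xcframework"
  local project_dir="$2" fw
  # Check both frameworks first so nothing is copied if either is missing.
  for fw in "$sherpa_fw" "$ort_fw"; do
    if [ ! -d "$fw" ]; then
      echo "not found: $fw" >&2
      return 1
    fi
  done
  # Replace any previous copies in the project directory.
  for fw in "$sherpa_fw" "$ort_fw"; do
    rm -rf "$project_dir/$(basename "$fw")"
    cp -R "$fw" "$project_dir/" || return 1
  done
}
```

Run from inside the sherpa-onnx directory, `copy_sherpa_frameworks . ~/Projects/MyDiarizationApp` (a placeholder path) copies both frameworks. Because the onnxruntime.xcframework produced here is the same one described in the previous section, this replaces a manually downloaded copy.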

The app requires you to select an audio or video file with the File Picker. Alternatively, you can change line 18 in ContentView to hardcode a file in your bundle for testing.
The app then converts the file into a format that the speech diarization model accepts.
Afterwards, run the model; the results will eventually replace the placeholder text.
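The conversion happens inside the app, so you don't need to prepare files yourself. When debugging a hardcoded test file, though, it can help to check the format the file starts in. The hypothetical helper below reads the channel count and sample rate from a WAV header. It assumes a canonical PCM WAV layout where the fmt chunk immediately follows the RIFF header, which puts the channels at byte offset 22 and the sample rate at offset 24. Files with extra chunks before fmt will be misread. Any target format you compare against (for example 16 kHz mono) is also an assumption, so check the sample rate your diarization model expects.

```shell
# wav_format: hypothetical helper that prints "<channels> <sample_rate>".
# Assumes the canonical layout where the "fmt " chunk starts at byte 12, so
# channels are at offset 22 (uint16 LE) and sample rate at offset 24 (uint32 LE).
wav_format() {
  local file="$1"
  # Verify the RIFF/WAVE magic bytes at offsets 0 and 8.
  if [ "$(dd if="$file" bs=1 count=4 2>/dev/null)" != RIFF ] ||
     [ "$(dd if="$file" bs=1 skip=8 count=4 2>/dev/null)" != WAVE ]; then
    echo "error: $file is not a RIFF/WAVE file" >&2
    return 1
  fi
  # Unquoted on purpose: split od's output into six positional parameters.
  set -- $(od -A n -t u1 -j 22 -N 6 "$file")
  if [ "$#" -ne 6 ]; then
    echo "error: $file is too short to hold a fmt chunk" >&2
    return 1
  fi
  # Reassemble the little-endian integers byte by byte.
  echo "$(( $1 + ($2 << 8) )) $(( $3 + ($4 << 8) + ($5 << 16) + ($6 << 24) ))"
}
```

For a stereo 44.1 kHz file it prints `2 44100`. For a 16 kHz mono file it prints `1 16000`. Reading single bytes with `od -t u1` avoids relying on the host's byte order.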
Demo recording: Screen.Recording.2025-04-11.at.8.55.42.PM.mov
Contributions and suggestions are welcome as the project is actively evolving.
Updates and additional documentation will be provided as development progresses.