Template Project For iOS Apps using .onnx Speech Models for Speech Diarization

carlosmbe/SpeechDiarizationStarter

SwiftUI Speech Diarization Example

Note: This project is currently under development, and this README will be updated periodically.

Project Overview

This repository aims to refactor and simplify the SwiftUI example provided by k2-fsa/sherpa-onnx, specifically focusing on Speech Diarization.

I wrote a companion article breaking down how and why I built this project.

Additionally, I recently created an algorithm for Active Speaker Detection using this project as a base.

Getting Started

1. Required Frameworks

Before building this project, ensure the required frameworks are in place:

  * onnxruntime.xcframework
  * sherpa-onnx.xcframework

Without these, building the project will fail.
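
A quick way to confirm both frameworks are present before opening Xcode is a small preflight check. This is only a sketch: the function name `check_frameworks` is hypothetical, and it assumes both frameworks sit directly inside one directory that you pass as an argument.

```shell
# Hypothetical preflight check (not part of this repository).
# Assumes both .xcframework bundles live directly inside the given directory.
check_frameworks() {
  local dir="${1:-.}"   # directory to inspect; defaults to the current one
  local missing=0
  local name
  for name in onnxruntime.xcframework sherpa-onnx.xcframework; do
    if [ -d "$dir/$name" ]; then
      echo "found:   $name"
    else
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # Exit status 0 when both are present, 1 otherwise.
  return "$missing"
}
```

Calling `check_frameworks /path/to/your/project` (a placeholder path) returns a non-zero status and names the missing framework if either one is absent.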

Note: After setup, test the app using the File Picker to load an audio file. Alternatively, hardcode a file path in ContentView (line 18) for testing.


Download Required Framework

Download the onnxruntime framework:

onnxruntime.xcframework-1.17.1.tar.bz2

Steps:

  1. Extract the archive.
  2. Copy onnxruntime.xcframework into your Xcode project directory.
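
The two steps above can be sketched as a shell function. The function name `install_onnxruntime` and both arguments are hypothetical, and because the layout inside the archive is not documented here, the sketch searches the extracted files for `onnxruntime.xcframework` rather than assuming where it lands.

```shell
# Hypothetical helper (not part of this repository).
#   $1: path to the downloaded archive, e.g. onnxruntime.xcframework-1.17.1.tar.bz2
#   $2: your Xcode project directory (placeholder; use your own path)
install_onnxruntime() {
  local archive="$1"
  local project_dir="$2"
  local workdir fw

  # Refuse to overwrite or nest inside an existing copy.
  if [ -e "$project_dir/onnxruntime.xcframework" ]; then
    echo "already present: $project_dir/onnxruntime.xcframework" >&2
    return 1
  fi

  workdir="$(mktemp -d)" || return 1
  # tar on macOS and recent GNU tar detect the compression automatically.
  tar -xf "$archive" -C "$workdir" || { rm -rf "$workdir"; return 1; }

  # Assumption: the archive layout is unknown, so search for the framework.
  fw="$(find "$workdir" -type d -name 'onnxruntime.xcframework' | head -n 1)"
  if [ -z "$fw" ]; then
    echo "onnxruntime.xcframework not found in $archive" >&2
    rm -rf "$workdir"
    return 1
  fi

  mkdir -p "$project_dir" && cp -R "$fw" "$project_dir/"
  local status=$?
  rm -rf "$workdir"
  return "$status"
}
```

For example, `install_onnxruntime ~/Downloads/onnxruntime.xcframework-1.17.1.tar.bz2 ./MyProject`, where both paths are placeholders.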

Building from Sherpa Onnx

To build sherpa-onnx.xcframework yourself, follow the steps summarized below.

Visit this link for more detailed build instructions.

Summary of Build Steps

  1. Clone the repository:

    git clone https://github.com/k2-fsa/sherpa-onnx

  2. Enter the repo directory:

    cd sherpa-onnx

  3. Run the iOS build script:

    ./build-ios.sh

  4. After the script completes, a build-ios folder will be created.

  5. Copy sherpa-onnx.xcframework from build-ios into your Xcode project.

  6. You’ll also find onnxruntime.xcframework in:

    ios-onnxruntime/1.17.1/onnxruntime.xcframework

This is the same xcframework as the one downloaded in the previous section, so you only need one copy.
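
Once the build finishes, the copy steps can be sketched as a shell function. The function name `copy_built_frameworks` is hypothetical, and the sketch assumes the `ios-onnxruntime/1.17.1` path is relative to `build-ios`. Adjust the path if your build output differs.

```shell
# The build itself, as listed in the steps above (needs network access and Xcode):
#   git clone https://github.com/k2-fsa/sherpa-onnx
#   cd sherpa-onnx
#   ./build-ios.sh

# Hypothetical helper (not part of sherpa-onnx or this repository).
#   $1: the build-ios folder produced by the script
#   $2: your Xcode project directory (placeholder; use your own path)
copy_built_frameworks() {
  local build_dir="$1"
  local project_dir="$2"
  # Assumption: this path is relative to build-ios.
  local ort_dir="$build_dir/ios-onnxruntime/1.17.1"
  local src

  # Verify both frameworks exist before copying anything.
  for src in "$build_dir/sherpa-onnx.xcframework" "$ort_dir/onnxruntime.xcframework"; do
    if [ ! -d "$src" ]; then
      echo "missing: $src" >&2
      return 1
    fi
  done

  mkdir -p "$project_dir" || return 1
  cp -R "$build_dir/sherpa-onnx.xcframework" "$ort_dir/onnxruntime.xcframework" "$project_dir/"
}
```

From inside the cloned sherpa-onnx directory, this could be run as `copy_built_frameworks build-ios /path/to/your/project`, with the second argument being a placeholder.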

Screenshot of files to copy

The Actual App

The app requires you to select an audio or video file via the File Picker. Alternatively, you can change line 18 in ContentView to hardcode a file in your bundle for testing.

The app then converts the file to a format that the speech diarization model accepts.

Finally, it runs the model, and the results replace the placeholder text once processing finishes.

Screen.Recording.2025-04-11.at.8.55.42.PM.mov

Contributing

Contributions and suggestions are welcome as the project is actively evolving.


Updates and additional documentation will be provided as development progresses.
