Transcriber

A modern, Swift-native wrapper around Apple's Speech framework and SFSpeechRecognizer that provides an actor-based interface for speech recognition with automatic silence detection and custom language model support.

Features

  • ✨ Modern Swift concurrency with async/await
  • πŸ”’ Thread-safe actor-based design
  • 🎯 Automatic silence detection using RMS power analysis
  • πŸ”Š Support for custom language models
  • πŸ“± Works across iOS, macOS, and other Apple platforms
  • πŸ’» SwiftUI-ready with MVVM support
  • πŸ” Comprehensive error handling
  • πŸ“Š Debug logging support

Requirements

  • iOS 17.0+ / macOS 14.0+
  • Swift 5.9+
  • Xcode 15.0+

Installation

Swift Package Manager

Add the following to your Package.swift file:

dependencies: [
    .package(url: "https://github.com/Compiler-Inc/Transcriber.git", from: "0.1.1")
]
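
Then add the product to the dependencies of any target that uses it. The product name is assumed here to match the package name:

.target(
    name: "MyApp",
    dependencies: [
        .product(name: "Transcriber", package: "Transcriber")
    ]
)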

Or in Xcode:

  1. File > Add Packages...
  2. Enter https://github.com/Compiler-Inc/Transcriber.git
  3. Select "Up to Next Major Version" with "0.1.1"

Privacy Keys

The service requires microphone and speech recognition access. Add these keys to your Info.plist:

<key>NSMicrophoneUsageDescription</key>
<string>We need microphone access to transcribe your speech.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>We need speech recognition to convert your voice to text.</string>

Or in Xcode:

  1. Select your project in the sidebar
  2. Select your target
  3. Select the "Info" tab
  4. Add Privacy - Microphone Usage Description and Privacy - Speech Recognition Usage Description

Usage

Basic Implementation

The simplest way to use the service is with the default configuration:

func startRecording() async throws {
    // Initialize with default configuration
    let transcriber = Transcriber()
    
    // Request authorization
    let status = await transcriber.requestAuthorization()
    guard status == .authorized else {
        throw TranscriberError.notAuthorized
    }
    
    // Start recording and receive transcriptions
    let stream = try await transcriber.startStream()
    for try await transcription in stream {
        print("Transcribed text: \(transcription)")
    }
}

Configuration Options

The service is highly configurable by defining your own TranscriberConfiguration:

    let myConfig = TranscriberConfiguration(
        appIdentifier: "com.myapp.speech",
        locale: .current,                       // Recognition language
        silenceThreshold: 0.01,                 // RMS power threshold (0.0 to 1.0)
        silenceDuration: 2,                     // Seconds of silence before stopping
        languageModelInfo: nil,                 // For domain-specific recognition
        requiresOnDeviceRecognition: false,     // Force on-device processing
        shouldReportPartialResults: true,       // Get results as they're processed
        contextualStrings: ["Custom", "Words"], // Improve recognition of specific terms
        taskHint: .unspecified,                 // Optimize for specific speech types
        addsPunctuation: true                   // Automatic punctuation
    )
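
To use a custom configuration, pass it to the Transcriber initializer. The config parameter label below is an assumption and may differ from the library's actual initializer:

let transcriber = Transcriber(config: myConfig)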

Using in SwiftUI

For SwiftUI applications, we provide a protocol-based MVVM pattern:

// 1. Create your view model
@Observable
@MainActor
class MyViewModel: Transcribable {
    public var isRecording = false
    public var transcribedText = ""
    public var rmsLevel: Float = 0
    public var authStatus: SFSpeechRecognizerAuthorizationStatus = .notDetermined
    public var error: Error?
    
    public let transcriber: Transcriber?
    private var recordingTask: Task<Void, Never>?
    
    init() {
        self.transcriber = Transcriber()
    }
    
    // Required protocol methods
    public func requestAuthorization() async throws {
        guard let transcriber else {
            throw TranscriberError.noRecognizer
        }
        authStatus = await transcriber.requestAuthorization()
        guard authStatus == .authorized else {
            throw TranscriberError.notAuthorized
        }
    }
    
    public func toggleRecording() {
        guard let transcriber else {
            error = TranscriberError.noRecognizer
            return
        }
        
        if isRecording {
            recordingTask?.cancel()
            recordingTask = nil
            isRecording = false
        } else {
            recordingTask = Task {
                do {
                    isRecording = true
                    let stream = try await transcriber.startRecordingStream()
                    
                    for try await signal in stream {
                        switch signal {
                            case .rms(let float):
                                rmsLevel = float
                            case .transcription(let string):
                                transcribedText = string
                        }
                    }
                    
                    isRecording = false
                } catch {
                    self.error = error
                    isRecording = false
                }
            }
        }
    }
}

// 2. Use in your SwiftUI view
struct MySpeechView: View {
    @State private var viewModel = MyViewModel()
    
    var body: some View {
        VStack {
            Text(viewModel.transcribedText)
            Button(viewModel.isRecording ? "Stop" : "Start") {
                viewModel.toggleRecording()
            }
            .disabled(viewModel.authStatus != .authorized)
            
            if let error = viewModel.error {
                Text(error.localizedDescription)
                    .foregroundColor(.red)
            }
        }
        .task {
            try? await viewModel.requestAuthorization()
        }
    }
}

Advanced Features

Debug Logging

Enable detailed logging for debugging:

let transcriber = Transcriber(debugLogging: true)

Custom Language Models

Support for custom language models with version tracking:

let model = LanguageModelInfo(url: modelURL, version: "2.0-beta")
let config = TranscriberConfiguration(languageModelInfo: model)

You can easily build SFCustomLanguageModelData models with our SpeechModelBuilder CLI tool.
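
Alternatively, a model file can be built in code with Apple's SFCustomLanguageModelData API (iOS 17+ / macOS 14+); the identifier, phrases, and output location below are placeholder values:

import Speech

// Build a small custom language model that boosts domain-specific phrases.
let modelData = SFCustomLanguageModelData(
    locale: Locale(identifier: "en_US"),
    identifier: "com.myapp.speech",
    version: "2.0-beta"
) {
    SFCustomLanguageModelData.PhraseCount(phrase: "transcribe my meeting notes", count: 10)
    SFCustomLanguageModelData.PhraseCount(phrase: "start a new recording", count: 10)
}

// Export to a file, then reference it from your configuration.
let modelURL = FileManager.default.temporaryDirectory.appendingPathComponent("MyModel.bin")
try await modelData.export(to: modelURL)
let model = LanguageModelInfo(url: modelURL, version: "2.0-beta")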

Silence Detection

Automatic silence detection using RMS power analysis with configurable threshold and duration:

struct SensitiveConfig: TranscriberConfiguration {
    var silenceThreshold: Float = 0.001  // Very sensitive
    var silenceDuration: TimeInterval = 2.0  // Longer confirmation
    // ... other properties
}
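
For reference, the RMS power compared against silenceThreshold is the root mean square of the samples in each audio buffer. A minimal sketch of that calculation (the library's internal implementation may differ):

import AVFoundation

// Root mean square of a mono float buffer: sqrt(mean(sample^2)).
// Values below the configured silenceThreshold count toward silence.
func rmsPower(of buffer: AVAudioPCMBuffer) -> Float {
    guard let channelData = buffer.floatChannelData?[0], buffer.frameLength > 0 else { return 0 }
    let frameLength = Int(buffer.frameLength)
    var sumOfSquares: Float = 0
    for i in 0..<frameLength {
        sumOfSquares += channelData[i] * channelData[i]
    }
    return (sumOfSquares / Float(frameLength)).squareRoot()
}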

License

This project is licensed under the MIT License.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
