A modern, Swift-native wrapper around Apple's Speech framework and `SFSpeechRecognizer` that provides an actor-based interface for speech recognition with automatic silence detection and custom language model support.
- Modern Swift concurrency with async/await
- Thread-safe actor-based design
- Automatic silence detection using RMS power analysis
- Support for custom language models
- Works across iOS, macOS, and other Apple platforms
- SwiftUI-ready with MVVM support
- Comprehensive error handling
- Debug logging support
- iOS 17.0+ / macOS 14.0+
- Swift 5.9+
- Xcode 15.0+
Add the following to your `Package.swift` file:

```swift
dependencies: [
    .package(url: "https://github.com/Compiler-Inc/Transcriber.git", from: "0.1.1")
]
```
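If you also need to wire the product into a target, a complete manifest looks roughly like the sketch below. The product name `Transcriber` and the platform list are assumptions, so match them to your project and the package's actual manifest:

```swift
// swift-tools-version: 5.9
// Minimal sketch of a full Package.swift; adjust names and platforms to your project.
import PackageDescription

let package = Package(
    name: "MyApp",
    platforms: [.iOS(.v17), .macOS(.v14)],
    dependencies: [
        .package(url: "https://github.com/Compiler-Inc/Transcriber.git", from: "0.1.1")
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [
                // Assumed product name; check the package manifest if it differs.
                .product(name: "Transcriber", package: "Transcriber")
            ]
        )
    ]
)
```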
Or in Xcode:
- File > Add Packages...
- Enter `https://github.com/Compiler-Inc/Transcriber.git`
- Select "Up to Next Major Version" with "0.1.1"
The service requires microphone and speech recognition access. Add these keys to your `Info.plist`:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>We need microphone access to transcribe your speech.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>We need speech recognition to convert your voice to text.</string>
```
Or in Xcode:
- Select your project in the sidebar
- Select your target
- Select the "Info" tab
- Add `Privacy - Microphone Usage Description` and `Privacy - Speech Recognition Usage Description`
The simplest way to use the service is with the default configuration:
```swift
func startRecording() async throws {
    // Initialize with default configuration
    let transcriber = Transcriber()

    // Request authorization
    let status = await transcriber.requestAuthorization()
    guard status == .authorized else {
        throw TranscriberError.notAuthorized
    }

    // Start recording and receive transcriptions
    let stream = try await transcriber.startStream()
    for try await transcription in stream {
        print("Transcribed text: \(transcription)")
    }
}
```
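If you want to try this out quickly, the function above can be called from any asynchronous context; one option is a plain `Task`. This call site is illustrative and not part of the library:

```swift
// Illustrative call site for the startRecording() function defined above.
Task {
    do {
        try await startRecording()
    } catch {
        print("Transcription failed: \(error)")
    }
}
```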
The service is highly configurable by defining your own `TranscriberConfiguration`:
```swift
let myConfig = TranscriberConfiguration(
    appIdentifier: "com.myapp.speech",
    locale: .current,                        // Recognition language
    silenceThreshold: 0.01,                  // RMS power threshold (0.0 to 1.0)
    silenceDuration: 2,                      // Seconds of silence before stopping
    languageModelInfo: nil,                  // For domain-specific recognition
    requiresOnDeviceRecognition: false,      // Force on-device processing
    shouldReportPartialResults: true,        // Get results as they're processed
    contextualStrings: ["Custom", "Words"],  // Improve recognition of specific terms
    taskHint: .unspecified,                  // Optimize for specific speech types
    addsPunctuation: true                    // Automatic punctuation
)
```
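To use a custom configuration, pass it to the transcriber when you create it. This is a minimal sketch assuming an initializer that takes the configuration as a `config:` parameter; check the `Transcriber` initializer for the exact label:

```swift
// Assumes an initializer along the lines of Transcriber(config:); verify the exact label in the package.
let transcriber = Transcriber(config: myConfig)
```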
For SwiftUI applications, we provide a protocol-based MVVM pattern:
```swift
// 1. Create your view model
@Observable
@MainActor
class MyViewModel: Transcribable {
    public var isRecording = false
    public var transcribedText = ""
    public var rmsLevel: Float = 0
    public var authStatus: SFSpeechRecognizerAuthorizationStatus = .notDetermined
    public var error: Error?

    public let transcriber: Transcriber?
    private var recordingTask: Task<Void, Never>?

    init() {
        self.transcriber = Transcriber()
    }

    // Required protocol methods
    public func requestAuthorization() async throws {
        guard let transcriber else {
            throw TranscriberError.noRecognizer
        }
        authStatus = await transcriber.requestAuthorization()
        guard authStatus == .authorized else {
            throw TranscriberError.notAuthorized
        }
    }

    public func toggleRecording() {
        guard let transcriber else {
            error = TranscriberError.noRecognizer
            return
        }

        if isRecording {
            recordingTask?.cancel()
            recordingTask = nil
            isRecording = false
        } else {
            recordingTask = Task {
                do {
                    isRecording = true
                    let stream = try await transcriber.startRecordingStream()
                    for try await signal in stream {
                        switch signal {
                        case .rms(let float):
                            rmsLevel = float
                        case .transcription(let string):
                            transcribedText = string
                        }
                    }
                    isRecording = false
                } catch {
                    self.error = error
                    isRecording = false
                }
            }
        }
    }
}

// 2. Use in your SwiftUI view
struct MySpeechView: View {
    @State private var viewModel = MyViewModel()

    var body: some View {
        VStack {
            Text(viewModel.transcribedText)

            Button(viewModel.isRecording ? "Stop" : "Start") {
                viewModel.toggleRecording()
            }
            .disabled(viewModel.authStatus != .authorized)

            if let error = viewModel.error {
                Text(error.localizedDescription)
                    .foregroundColor(.red)
            }
        }
        .task {
            try? await viewModel.requestAuthorization()
        }
    }
}
```
Enable detailed logging for debugging:
```swift
let transcriber = Transcriber(debugLogging: true)
```
Support for custom language models with version tracking:
```swift
let model = LanguageModelInfo(url: modelURL, version: "2.0-beta")
let config = TranscriberConfiguration(languageModelInfo: model)
```
You can easily build `SFCustomLanguageModelData` models with our SpeechModelBuilder CLI Tool.
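If you would rather build a model in code than with the CLI tool, Apple's `SFCustomLanguageModelData` API (iOS 17+ / macOS 14+) can assemble and export one directly. A minimal sketch, with placeholder phrases and an assumed output location, following the shape of Apple's sample code:

```swift
import Speech

// Rough sketch: build and export a custom language model with Apple's API.
// The phrases, counts, and output path below are placeholders for your own domain data.
// Run this from an async context (export(to:) is async throws).
let data = SFCustomLanguageModelData(
    locale: Locale(identifier: "en_US"),
    identifier: "com.myapp.speech",
    version: "2.0-beta"
) {
    SFCustomLanguageModelData.PhraseCount(phrase: "play the B flat scale", count: 10)
    SFCustomLanguageModelData.PhraseCount(phrase: "transpose up a minor third", count: 10)
}

let modelURL = URL.documentsDirectory.appending(path: "MyModel.bin")
try await data.export(to: modelURL)
```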
Automatic silence detection using RMS power analysis with configurable threshold and duration:
```swift
struct SensitiveConfig: TranscriberConfiguration {
    var silenceThreshold: Float = 0.001      // Very sensitive
    var silenceDuration: TimeInterval = 2.0  // Longer confirmation
    // ... other properties
}
```
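For intuition about what that threshold is compared against, here is an illustrative sketch of how an RMS power value can be computed from a mono audio buffer; the library's internal implementation may differ:

```swift
import AVFoundation

// Illustrative only: RMS (root-mean-square) power of a mono float PCM buffer,
// yielding a value that a silenceThreshold in the 0.0-1.0 range can be compared against.
func rmsPower(of buffer: AVAudioPCMBuffer) -> Float {
    guard let samples = buffer.floatChannelData?[0], buffer.frameLength > 0 else { return 0 }
    let count = Int(buffer.frameLength)
    var sumOfSquares: Float = 0
    for i in 0..<count {
        sumOfSquares += samples[i] * samples[i]
    }
    return (sumOfSquares / Float(count)).squareRoot()
}
```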
This project is licensed under the MIT License.
Contributions are welcome! Please feel free to submit a Pull Request.