speechtotextkit

Simplifies speech-to-text integration with a unified API across platforms. Offers a reactive API, Jetpack Compose compatibility, error handling, and minimal setup.

Platforms: Android JVM, JVM, Kotlin/Native, Wasm
GitHub stars: 19
Authors: eslamwael74
Dependents: 0
License: Apache License 2.0
Creation date: about 1 year ago
Last activity: about 1 year ago
Latest release: 1.0.0 (about 1 year ago)

🎙️ SpeechToTextKit

SpeechToTextKit is a Kotlin Multiplatform library that provides a simple, unified API for speech-to-text. It currently supports Android and iOS; Desktop (JVM) and Web (Wasm) support is planned (see Upcoming Features).

📋 Current Features

Cross-Platform Support: Works on Android and iOS
Reactive API: Receive speech recognition results as a Flow
Compose Integration: Easy to use with Jetpack Compose via rememberSpeechToText()
Seamless Integration: Integrates easily with existing KMP applications
State Callbacks: Monitor recognition state changes through Flow
Low Friction Setup: Minimal dependencies and configuration required
Error Handling: Detailed error reporting through the result API

🚀 Installation

The library is available on Maven Central via Sonatype.

Add the following to your module's build.gradle.kts:

dependencies {
    // Core library
    implementation("io.github.eslamwael74.speechtotextkit:speechToText:1.0.0")
    
    // Optional: Compose UI components
    implementation("io.github.eslamwael74.speechtotextcompose:speechToTextCompose:1.0.0")
}

📱 Usage

There are currently two ways to use this library:

1. In Jetpack Compose

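import androidx.compose.foundation.layout.Column
import androidx.compose.foundation.layout.fillMaxSize
import androidx.compose.foundation.layout.padding
import androidx.compose.material.Button
import androidx.compose.material.Text
import androidx.compose.runtime.*
import androidx.compose.ui.Modifier
import androidx.compose.ui.unit.dp
import io.github.eslamwael74.speechtotextcompose.rememberSpeechToText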

@Composable
fun SpeechRecognitionScreen() {
    var recognizedText by remember { mutableStateOf("") }
    val speechRecognizer = rememberSpeechToText()
    var isListening by remember { mutableStateOf(false) }
    
    LaunchedEffect(Unit) {
        speechRecognizer.results.collect { result ->
            recognizedText = result.text
        }
    }
    
    Column(modifier = Modifier.fillMaxSize().padding(16.dp)) {
        Text(
            text = recognizedText.ifEmpty { "Tap the button and speak" },
            modifier = Modifier.weight(1f)
        )
        
        Button(onClick = {
            if (isListening) {
                // Stop listening
                speechRecognizer.stopListening()
                isListening = false
            } else {
                // Start listening
                speechRecognizer.startListening()
                isListening = true
            }
        }) {
            Text(if (isListening) "Stop Listening" else "Start Listening")
        }
    }
}

2. In a ViewModel

import androidx.lifecycle.ViewModel
import androidx.lifecycle.viewModelScope
import com.eslamwael74.speechtotext.SpeechRecognizer
import com.eslamwael74.speechtotext.SpeechRecognizerFactory
import kotlinx.coroutines.flow.launchIn
import kotlinx.coroutines.flow.onEach
import kotlinx.coroutines.launch

// Using dependency injection
class YourViewModel(
    private val speechRecognizer: SpeechRecognizer
) : ViewModel() {
    init {
        // Listen for speech recognition results
        speechRecognizer.results.onEach { result ->
            // Handle result
            println("Recognized text: ${result.text}")
        }.launchIn(viewModelScope)
        
        // Monitor state changes
        speechRecognizer.state.onEach { state ->
            // Handle state changes
            println("Recognition state: $state")
        }.launchIn(viewModelScope)
    }
    
    fun startListening() {
        viewModelScope.launch {
            speechRecognizer.startListening()
        }
    }
    
    fun stopListening() {
        viewModelScope.launch {
            speechRecognizer.stopListening()
        }
    }
    
    fun cleanup() {
        speechRecognizer.destroy()
    }
}

// Example of factory/provider to create the SpeechRecognizer
class SpeechRecognizerProvider(
    private val applicationContext: Context
) {
    fun provideSpeechRecognizer(): SpeechRecognizer {
        return SpeechRecognizerFactory(applicationContext).createSpeechRecognizer()
    }
}

// Usage with Manual DI
class YourActivity : AppCompatActivity() {
    private val speechRecognizerProvider by lazy {
        SpeechRecognizerProvider(applicationContext)
    }
    
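    private val viewModel: YourViewModel by viewModels {
        // viewModelFactory and initializer come from androidx.lifecycle.viewmodel
        viewModelFactory {
            initializer { YourViewModel(speechRecognizerProvider.provideSpeechRecognizer()) }
        }
    }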
}

// Or with Hilt/Dagger
@Module
@InstallIn(SingletonComponent::class)
object SpeechModule {
    @Provides
    @Singleton
    fun provideSpeechRecognizer(@ApplicationContext context: Context): SpeechRecognizer {
        return SpeechRecognizerFactory(context).createSpeechRecognizer()
    }
}

📝 Platform-Specific Setup

Android

Add the following permission to your AndroidManifest.xml:

<uses-permission android:name="android.permission.RECORD_AUDIO" />

You'll also need to request this permission at runtime.
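
The library does not request this permission for you yet (runtime permission support is listed under Upcoming Features), so the host app has to do it before starting recognition. Below is a minimal sketch using the AndroidX Activity Result API, assuming a ComponentActivity host. SpeechActivity, ensureMicrophonePermission, onMicrophoneGranted, and onMicrophoneDenied are hypothetical names chosen for illustration and are not part of SpeechToTextKit.

```kotlin
import android.Manifest
import android.content.pm.PackageManager
import androidx.activity.ComponentActivity
import androidx.activity.result.contract.ActivityResultContracts
import androidx.core.content.ContextCompat

// Hypothetical host Activity; none of these names come from SpeechToTextKit.
class SpeechActivity : ComponentActivity() {

    // Register the launcher as a property initializer so it exists before the Activity starts.
    private val requestMicPermission =
        registerForActivityResult(ActivityResultContracts.RequestPermission()) { granted ->
            if (granted) onMicrophoneGranted() else onMicrophoneDenied()
        }

    // Call this before starting speech recognition.
    fun ensureMicrophonePermission() {
        val alreadyGranted = ContextCompat.checkSelfPermission(
            this, Manifest.permission.RECORD_AUDIO
        ) == PackageManager.PERMISSION_GRANTED

        if (alreadyGranted) {
            onMicrophoneGranted()
        } else {
            // Shows the system dialog; the result arrives in the callback registered above.
            requestMicPermission.launch(Manifest.permission.RECORD_AUDIO)
        }
    }

    private fun onMicrophoneGranted() {
        // Safe to call startListening() on your recognizer from here.
    }

    private fun onMicrophoneDenied() {
        // Explain why the microphone is needed, or disable the speech feature.
    }
}
```

Checking the current grant state first avoids showing the system dialog to users who have already accepted it.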

iOS

Add the following to your Info.plist:

<key>NSMicrophoneUsageDescription</key>
<string>This app needs access to your microphone for speech recognition</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>This app uses speech recognition to convert your speech to text</string>

🚧 Upcoming Features

The following features are planned but not yet implemented:

Compose UI component with built-in microphone button
Customizable recognition parameters (language, timeout, etc.)
Offline recognition support where available
Improved error handling and recovery
Text-to-Speech capabilities
Support requesting permissions for Android at runtime
TextField Composable with integrated microphone button
Support for more languages and dialects
Support WebAssembly (Wasm) for web applications
Support for desktop platforms (JVM)
Support for macOS

🧪 Example App

Check out the included example app in the /example directory for a complete implementation of SpeechToTextKit.

🙌 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the Project
2. Create your Feature Branch (git checkout -b feature/amazing-feature)
3. Commit your Changes (git commit -m 'Add some amazing feature')
4. Push to the Branch (git push origin feature/amazing-feature)
5. Open a Pull Request

📄 License

Distributed under the Apache 2.0 License. See LICENSE for more information.

📞 Contact

Eslam Wael - @eslamwael74

Project Link: https://github.com/eslamwael74/speechtotextkit