YX ASR - Flutter Speech-to-Text Plugin

A Flutter speech recognition plugin based on sherpa_onnx that provides fully offline, real-time speech-to-text.

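Features

  • 🎤 Real-time speech recognition: transcribes speech as you speak
  • 🔄 Toggle recording: simple start/stop recording with visual feedback
  • 🌍 Multi-language support: supports Chinese, English, and other languages
  • 📱 Cross-platform: supports iOS and Android
  • 🎛️ Customizable UI: flexible recording button widget with customizable appearance
  • 🔒 Permission management: handles microphone permission requests automatically
  • Fully offline: built on sherpa_onnx, no network connection required
  • 🎯 High-accuracy recognition: uses advanced neural network models
  • 🚀 Low latency: real-time processing with fast response
  • 🔐 Privacy protection: voice data is never uploaded to the cloud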

Installation

Add the dependency to your pubspec.yaml file:

dependencies:
  yx_asr: ^1.0.0

Then run:

flutter pub get

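Model File Preparation

Because the plugin uses sherpa_onnx, you need to download the corresponding model files:

  1. Chinese model (recommended)

  2. English model

    • Model name: sherpa-onnx-streaming-zipformer-en-2023-02-21
    • Extract to: assets/models/en-us/
  3. Model file structure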

assets/models/
├── zh-cn/
│   ├── encoder.onnx
│   ├── decoder.onnx
│   ├── joiner.onnx
│   └── tokens.txt
└── en-us/
    ├── encoder.onnx
    ├── decoder.onnx
    ├── joiner.onnx
    └── tokens.txt
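
The layout above can be checked before building. The following is a minimal sketch in plain Dart, not part of the plugin: the helper name missingModelFiles is hypothetical, and it assumes the check runs on the development machine from the project root, where the assets/models/ directories exist as ordinary folders.

```dart
import 'dart:io';

/// File names taken from the directory layout shown above.
const requiredModelFiles = [
  'encoder.onnx',
  'decoder.onnx',
  'joiner.onnx',
  'tokens.txt',
];

/// Returns the required files that are missing from [modelDir].
/// An empty list means the directory matches the expected layout.
/// Hypothetical helper for illustration; the plugin does not provide it.
List<String> missingModelFiles(String modelDir) {
  return [
    for (final name in requiredModelFiles)
      if (!File('$modelDir/$name').existsSync()) name,
  ];
}
```

For example, missingModelFiles('assets/models/zh-cn') returns an empty list when all four files are in place. This inspects the source tree only; it does not verify that the files were bundled into the app.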

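Platform Configuration

Android

Add the permission to android/app/src/main/AndroidManifest.xml:

<uses-permission android:name="android.permission.RECORD_AUDIO" />

iOS

Add the permission to ios/Runner/Info.plist:

<key>NSMicrophoneUsageDescription</key>
<string>This app needs microphone access to record your speech for recognition</string>

Note: because recognition runs offline through sherpa_onnx, neither network permission nor speech recognition permission is required.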

Quick Start

Basic Usage

import 'dart:async';

import 'package:flutter/material.dart';
import 'package:yx_asr/yx_asr.dart';

class MyApp extends StatefulWidget {
  @override
  _MyAppState createState() => _MyAppState();
}

class _MyAppState extends State<MyApp> {
  final YxAsr _speechToText = YxAsr();
  // Kept so the stream listeners can be cancelled in dispose().
  final List<StreamSubscription<dynamic>> _subscriptions = [];
  String _recognizedText = '';
  bool _isListening = false;

  @override
  void initState() {
    super.initState();
    _initializeSpeechToText();
  }

  @override
  void dispose() {
    // Cancel listeners so setState is never called on a disposed widget.
    for (final subscription in _subscriptions) {
      subscription.cancel();
    }
    super.dispose();
  }

  Future<void> _initializeSpeechToText() async {
    // Initialize with the Chinese model
    bool initialized = await _speechToText.initializeWithModel('assets/models/zh-cn');

    if (initialized) {
      // Listen for recognition results
      _subscriptions.add(_speechToText.onResult.listen((result) {
        setState(() {
          _recognizedText = result.recognizedWords;
        });
      }));

      // Listen for errors
      _subscriptions.add(_speechToText.onError.listen((error) {
        print('Speech recognition error: ${error.errorMsg}');
      }));

      // Listen for listening-status changes
      _subscriptions.add(_speechToText.onListeningStatusChanged.listen((isListening) {
        setState(() {
          _isListening = isListening;
        });
      }));
    }
  }

  Future<void> _toggleRecording() async {
    if (_isListening) {
      await _speechToText.stopListening();
    } else {
      await _speechToText.startListening(
        partialResults: true, // Enable partial results
      );
    }
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      appBar: AppBar(title: Text('Speech Recognition')),
      body: Column(
        children: [
          Text('Recognized text: $_recognizedText'),
          ElevatedButton(
            onPressed: _toggleRecording,
            child: Text(_isListening ? 'Stop' : 'Start'),
          ),
        ],
      ),
    );
  }
}

Using the Recording Button Widget

The plugin includes a customizable RecordingButton widget:

import 'package:yx_asr/yx_asr.dart';

RecordingButton(
  onResult: (result) {
    print('Result: ${result.recognizedWords}');
  },
  onError: (error) {
    print('Error: ${error.errorMsg}');
  },
  onListeningStatusChanged: (isListening) {
    print('Listening: $isListening');
  },
  localeId: 'en-US',
  partialResults: true,
  size: 80.0,
  tooltip: 'Tap to record',
)

API Reference

YxAsr Class

Methods

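  • Future<bool> initialize() - Initialize the speech recognition service
  • Future<bool> initializeWithModel(String modelPath) - Initialize with the model directory at the given path, as in the basic usage example (the parameter name shown here is illustrative)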
  • Future<bool> isAvailable() - Check if speech recognition is available
  • Future<bool> hasPermission() - Check if microphone permission is granted
  • Future<bool> requestPermission() - Request microphone permission
  • Future<void> startListening({String localeId, bool partialResults, bool onDevice}) - Start listening
  • Future<void> stopListening() - Stop listening and get final result
  • Future<void> cancel() - Cancel current recognition session
  • Future<bool> get isListening - Check if currently listening

Streams

  • Stream<SpeechRecognitionResult> onResult - Stream of recognition results
  • Stream<SpeechRecognitionError> onError - Stream of recognition errors
  • Stream<bool> onListeningStatusChanged - Stream of listening status changes

SpeechRecognitionResult

class SpeechRecognitionResult {
  final String recognizedWords;    // The recognized text
  final bool finalResult;          // Whether this is a final result
  final double confidence;         // Confidence level (0.0 to 1.0)
  final List<String> alternatives; // Alternative recognition results
}
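
When partialResults is enabled, an app usually has to decide how to combine partial and final results. The sketch below is one possible approach, not the plugin's own behavior. ResultLike is a hypothetical stand-in carrying only the recognizedWords and finalResult fields documented above, and the code assumes each partial result holds the full text of the current utterance so far rather than an increment.

```dart
/// Hypothetical stand-in with the two fields this sketch needs; the
/// plugin's SpeechRecognitionResult (documented above) has more.
class ResultLike {
  const ResultLike(this.recognizedWords, {required this.finalResult});
  final String recognizedWords;
  final bool finalResult;
}

/// Keeps finalized text separate from the in-progress partial result.
class Transcript {
  final List<String> _committed = [];
  String _pending = '';

  void add(ResultLike result) {
    if (result.finalResult) {
      // A final result replaces whatever partial text was being shown.
      if (result.recognizedWords.isNotEmpty) {
        _committed.add(result.recognizedWords);
      }
      _pending = '';
    } else {
      // Assumption: partial results overwrite each other until a final
      // result arrives.
      _pending = result.recognizedWords;
    }
  }

  /// Committed segments followed by the current partial text, if any.
  String get text =>
      [..._committed, if (_pending.isNotEmpty) _pending].join(' ');
}
```

Segments are joined with a space here, which suits English; for Chinese output an empty separator is probably the better choice.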

SpeechRecognitionError

class SpeechRecognitionError {
  final SpeechRecognitionErrorType errorType; // Type of error
  final String errorMsg;                      // Human-readable error message
  final String? errorCode;                    // Platform-specific error code
}

RecordingButton Widget

Properties

  • onResult - Callback for recognition results
  • onError - Callback for recognition errors
  • onListeningStatusChanged - Callback for status changes
  • localeId - Language locale (default: 'en-US')
  • partialResults - Enable partial results (default: true)
  • onDevice - Use on-device recognition on iOS (default: false)
  • size - Button size (default: 80.0)
  • idleColor - Button color when not recording
  • recordingColor - Button color when recording
  • disabledColor - Button color when disabled
  • enabled - Whether the button is enabled (default: true)
  • tooltip - Tooltip text

Supported Languages

The plugin supports multiple languages including:

  • English (en-US, en-GB)
  • Chinese (zh-CN, zh-TW)
  • Japanese (ja-JP)
  • Korean (ko-KR)
  • Spanish (es-ES)
  • French (fr-FR)
  • German (de-DE)
  • Italian (it-IT)

Error Handling

The plugin provides comprehensive error handling through the SpeechRecognitionError class:

_speechToText.onError.listen((error) {
  switch (error.errorType) {
    case SpeechRecognitionErrorType.permissionDenied:
      // Handle permission denied
      break;
    case SpeechRecognitionErrorType.network:
      // Handle network errors
      break;
    case SpeechRecognitionErrorType.noSpeech:
      // Handle no speech detected
      break;
    // ... handle other error types
  }
});

Best Practices

  1. Always check permissions before starting recognition
  2. Handle errors gracefully to provide good user experience
  3. Use partial results for real-time feedback
  4. Stop listening when done to conserve battery
  5. Test on real devices as speech recognition doesn't work well on simulators
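
The first practice, checking permission before starting recognition, can be sketched as follows. The function startIfPermitted is hypothetical. It takes the three operations as callbacks so the example compiles without the plugin; in an app they would be wired to the hasPermission, requestPermission, and startListening methods listed in the API Reference.

```dart
/// Starts listening only if microphone permission is already granted or is
/// granted on request. Returns true when listening was started.
/// Hypothetical helper for illustration; the plugin does not provide it.
Future<bool> startIfPermitted({
  required Future<bool> Function() hasPermission,
  required Future<bool> Function() requestPermission,
  required Future<void> Function() startListening,
}) async {
  var granted = await hasPermission();
  if (!granted) {
    // Ask only when needed, so users who already granted access see no prompt.
    granted = await requestPermission();
  }
  if (!granted) return false;
  await startListening();
  return true;
}

// Possible wiring inside an app, assuming `asr` is a YxAsr instance:
//
//   final started = await startIfPermitted(
//     hasPermission: asr.hasPermission,
//     requestPermission: asr.requestPermission,
//     startListening: () => asr.startListening(partialResults: true),
//   );
```

A false return value lets the caller show its own explanation, for example a hint to enable the microphone in system settings.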

Example App

Check out the example/ directory for a comprehensive example app that demonstrates:

  • Real-time speech recognition
  • Multiple language support
  • Error handling
  • Recognition history
  • Customizable settings

Contributing

Contributions are welcome! Please read our contributing guidelines and submit pull requests to our repository.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Troubleshooting

Common Issues

  1. Permission Denied Error

    • Ensure microphone permissions are added to platform manifests
    • Call requestPermission() before starting recognition
  2. Speech Recognition Not Available

    • Check if device supports speech recognition with isAvailable()
    • Ensure Google app is installed and updated on Android
  3. No Speech Detected

    • Check microphone hardware
    • Ensure app has microphone permission
    • Try speaking louder or closer to the microphone
  4. Network Errors

    • Check internet connectivity
    • Some platforms require network for speech recognition

Testing

  • Speech recognition doesn't work well on simulators/emulators
  • Always test on real devices
  • Test in quiet environments for better accuracy

Support

For issues and feature requests, please use the GitHub issue tracker.