A Beginner's Guide to Speech Recognition

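While exploring how to use text-to-speech, I ran into the opposite need: converting speech to text. Plenty of articles cover the former, but introductions to the latter are comparatively scarce. I therefore decided to write a basic article sharing my experience in this area.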

Solution Overview

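First, the application needs a reference to the System.Speech assembly, which is located in the GAC. This assembly contains all the namespaces and classes required for speech recognition.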

Before the SpeechRecognitionEngine can be used, a few properties have to be set and a few methods called. The following C# code shows the setup:

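    SpeechRecognitionEngine speechRecognitionEngine = null;

    // Create the engine for the preferred culture.
    speechRecognitionEngine = createSpeechEngine("de-DE");

    // Subscribe to the audio-level and recognition events.
    speechRecognitionEngine.AudioLevelUpdated += new EventHandler<AudioLevelUpdatedEventArgs>(engine_AudioLevelUpdated);
    speechRecognitionEngine.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(engine_SpeechRecognized);

    // Load the grammar, select the microphone, and start listening.
    loadGrammarAndCommands();
    speechRecognitionEngine.SetInputToDefaultAudioDevice();
    speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);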

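Next comes the createSpeechEngine method in detail. It lets you choose the language used by the speech engine. If the requested language is not installed, the default language (the Windows desktop language) is used instead.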

    private SpeechRecognitionEngine createSpeechEngine(string preferredCulture)
    {
        // Look for an installed recognizer that matches the preferred culture.
        foreach (RecognizerInfo config in SpeechRecognitionEngine.InstalledRecognizers())
        {
            if (config.Culture.ToString() == preferredCulture)
            {
                speechRecognitionEngine = new SpeechRecognitionEngine(config);
                break;
            }
        }

        // Fall back to the default recognizer if the preferred culture is missing.
        if (speechRecognitionEngine == null)
        {
            MessageBox.Show("The desired culture is not installed on this machine, the speech-engine will continue using "
                + SpeechRecognitionEngine.InstalledRecognizers()[0].Culture.ToString() + " as the default culture.",
                "Culture " + preferredCulture + " not found!");
            speechRecognitionEngine = new SpeechRecognitionEngine();
        }
        return speechRecognitionEngine;
    }
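
The selection step above can also be looked at in isolation. The following is a minimal sketch, not part of the original code: CultureSelector and Pick are hypothetical names, and the case-insensitive comparison is an assumption (the method above compares the culture strings case-sensitively). It only decides which culture name to use, so it can be tried without any recognizer installed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original article.
public static class CultureSelector
{
    // Returns the installed culture name that matches the preferred one
    // (ignoring case), or null when no installed culture matches.
    public static string Pick(IEnumerable<string> installedCultures, string preferredCulture)
    {
        foreach (string culture in installedCultures)
        {
            if (string.Equals(culture, preferredCulture, StringComparison.OrdinalIgnoreCase))
            {
                return culture;
            }
        }
        return null;
    }
}
```

In a real application the list of names could be filled from the Culture.Name of each RecognizerInfo returned by SpeechRecognitionEngine.InstalledRecognizers(); a null result corresponds to the fallback branch of createSpeechEngine.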

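Next, the grammar used by the SpeechRecognitionEngine has to be set up. In this example, I created a custom text file containing key-value pairs of text, and each entry is wrapped in the custom class SpeechToText.Word.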

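    namespace SpeechToText
    {
        public class Word
        {
            public string Text { get; set; }           // the phrase to recognize
            public string AttachedText { get; set; }   // the text to return, or the command to run
            public bool IsShellCommand { get; set; }   // true if AttachedText is a shell command
        }
    }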
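
The article does not show the text file itself, but its layout follows from the loading code further down: one entry per line, three fields separated by "|", and lines starting with "--" or empty lines ignored. A hypothetical example.txt (the entries are invented for illustration) might look like this:

```text
-- spoken text|attached text|is shell command (true or false)
hello|Hello, how are you?|false
notepad|notepad.exe|true
```

The first field becomes Word.Text, the second Word.AttachedText, and the third sets Word.IsShellCommand when it is exactly "true".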

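The following method sets up the Choices used by the Grammar. In the foreach loop, a Word instance is created for each line and stored in a lookup List<Word>. The parsed words are then added to the Choices object, the Grammar is built with a GrammarBuilder, and finally the SpeechRecognitionEngine loads it synchronously.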

    private void loadGrammarAndCommands()
    {
        Choices texts = new Choices();
        string[] lines = File.ReadAllLines(Path.Combine(Environment.CurrentDirectory, "example.txt"));
        foreach (string line in lines)
        {
            // Skip comment lines (starting with "--") and empty lines.
            if (line.StartsWith("--") || line == String.Empty) continue;

            // Each entry has the form: text|attached text|true or false
            var parts = line.Split(new char[] { '|' });
            words.Add(new Word() { Text = parts[0], AttachedText = parts[1], IsShellCommand = (parts[2] == "true") });
            texts.Add(parts[0]);
        }

        // Build the grammar from the collected choices and load it synchronously.
        Grammar wordsList = new Grammar(new GrammarBuilder(texts));
        speechRecognitionEngine.LoadGrammar(wordsList);
    }
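
The parsing part of this method can be separated from the speech engine, which makes it easy to check without a microphone. The following is a minimal sketch under stated assumptions: GrammarFileParser is a hypothetical name, the Word class is repeated only to keep the example self-contained, and skipping lines with fewer than three fields, trimming whitespace, and accepting "TRUE" as well as "true" are assumptions that go beyond the method above.

```csharp
using System;
using System.Collections.Generic;

// Same shape as the article's SpeechToText.Word class.
public class Word
{
    public string Text { get; set; }
    public string AttachedText { get; set; }
    public bool IsShellCommand { get; set; }
}

// Hypothetical helper, not part of the original article.
public static class GrammarFileParser
{
    public static List<Word> Parse(IEnumerable<string> lines)
    {
        var words = new List<Word>();
        foreach (string line in lines)
        {
            // Skip comment lines ("--") and blank lines, as the original loop does.
            if (line.StartsWith("--", StringComparison.Ordinal) || line.Trim().Length == 0) continue;

            string[] parts = line.Split('|');

            // Assumption: lines without all three fields are skipped instead of failing.
            if (parts.Length < 3) continue;

            words.Add(new Word
            {
                Text = parts[0].Trim(),
                AttachedText = parts[1].Trim(),
                IsShellCommand = parts[2].Trim().Equals("true", StringComparison.OrdinalIgnoreCase)
            });
        }
        return words;
    }
}
```

The result of GrammarFileParser.Parse(File.ReadAllLines("example.txt")) could then feed both the lookup list and the Choices object, by adding each Word.Text to it.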

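To start the SpeechRecognitionEngine, call RecognizeAsync(RecognizeMode.Multiple) on it. In this mode the recognizer keeps performing asynchronous recognition operations until RecognizeAsyncCancel() or RecognizeAsyncStop() is called. To retrieve the results of the asynchronous recognition, attach an event handler to the recognizer's SpeechRecognized event.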

    speechRecognitionEngine.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(engine_SpeechRecognized);
    speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);

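When the recognizer recognizes one of the predefined words, the application decides whether to return the associated text or to execute a shell command. This is done in the following function: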

    private string getKnownTextOrExecute(string command)
    {
        try
        {
            // Find the entry whose text matches the recognized phrase.
            var cmd = words.Where(c => c.Text == command).First();
            if (cmd.IsShellCommand)
            {
                // Start the attached text as a process.
                Process proc = new Process();
                proc.EnableRaisingEvents = false;
                proc.StartInfo.FileName = cmd.AttachedText;
                proc.Start();
                return "you just started : " + cmd.AttachedText;
            }
            else
            {
                return cmd.AttachedText;
            }
        }
        catch (Exception)
        {
            // Unknown phrase or failed start: return the recognized text unchanged.
            return command;
        }
    }
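
The lookup in this function can also be written without relying on an exception for the "unknown phrase" case. The following is a minimal sketch, not the article's code: CommandResolver, Find, and Describe are hypothetical names, and the Word class is repeated only to keep the example self-contained. Describe only computes the text that would be returned; it deliberately does not start a process.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the article's SpeechToText.Word class.
public class Word
{
    public string Text { get; set; }
    public string AttachedText { get; set; }
    public bool IsShellCommand { get; set; }
}

// Hypothetical helper, not part of the original article.
public static class CommandResolver
{
    // Returns the matching Word, or null when the phrase is unknown.
    public static Word Find(IEnumerable<Word> words, string phrase)
    {
        return words.FirstOrDefault(w => string.Equals(w.Text, phrase, StringComparison.Ordinal));
    }

    // Mirrors the return values of getKnownTextOrExecute without launching anything:
    // unknown phrases are echoed back unchanged.
    public static string Describe(IEnumerable<Word> words, string phrase)
    {
        Word match = Find(words, phrase);
        if (match == null) return phrase;
        return match.IsShellCommand
            ? "you just started : " + match.AttachedText
            : match.AttachedText;
    }
}
```

Inside a SpeechRecognized handler, the recognized phrase is available as e.Result.Text, so that value would be the phrase argument here, just as it would be the command argument of getKnownTextOrExecute.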