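While exploring how to use text-to-speech, I ran into the opposite need: converting speech to text. Plenty of articles cover the former, but very few cover the latter, so I decided to write a basic article sharing my experience with it.
First, the application needs a reference to the System.Speech assembly, which is located in the GAC. This assembly contains the namespaces and classes needed for speech recognition, most importantly System.Speech.Recognition.
Before the SpeechRecognitionEngine can be used, it has to be created, wired up to its events, given a grammar and an audio input, and started. The following C# code shows these steps: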
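SpeechRecognitionEngine speechRecognitionEngine = null;
// Create the engine for the preferred culture (falls back to the default recognizer).
speechRecognitionEngine = createSpeechEngine("de-DE");
// Subscribe to the audio-level and recognition events.
speechRecognitionEngine.AudioLevelUpdated += new EventHandler<AudioLevelUpdatedEventArgs>(engine_AudioLevelUpdated);
speechRecognitionEngine.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(engine_SpeechRecognized);
// Load the grammar, use the default microphone, and start continuous recognition.
loadGrammarAndCommands();
speechRecognitionEngine.SetInputToDefaultAudioDevice();
speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);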
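The handlers engine_AudioLevelUpdated and engine_SpeechRecognized are referenced in the setup code but not shown. The following is a minimal sketch of what they might look like, hosted in a hypothetical console program. The HandlerSketch class, the two-phrase grammar, and the console output are assumptions for illustration and not part of the original code. It also assumes a Windows machine with at least one installed recognizer and a microphone.

```csharp
using System;
using System.Speech.Recognition;

// Hypothetical console host; the class name and the grammar are assumptions.
static class HandlerSketch
{
    // Raised while audio arrives; shows the current input level in the window title.
    static void engine_AudioLevelUpdated(object sender, AudioLevelUpdatedEventArgs e)
    {
        Console.Title = "Audio level: " + e.AudioLevel;
    }

    // Raised when a phrase from the loaded grammar has been recognized.
    static void engine_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        Console.WriteLine("{0} (confidence {1:F2})", e.Result.Text, e.Result.Confidence);
    }

    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            // Build a tiny grammar in the culture of the recognizer that is actually used.
            var builder = new GrammarBuilder(new Choices("hello", "goodbye"));
            builder.Culture = engine.RecognizerInfo.Culture;
            engine.LoadGrammar(new Grammar(builder));

            engine.AudioLevelUpdated += engine_AudioLevelUpdated;
            engine.SpeechRecognized += engine_SpeechRecognized;
            engine.SetInputToDefaultAudioDevice();
            engine.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Say 'hello' or 'goodbye'; press Enter to quit.");
            Console.ReadLine();
        }
    }
}
```

In a Windows Forms application the handlers would more likely update a progress bar and a text box instead of the console.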
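Next, let's look at the createSpeechEngine method in detail. It lets you choose the language the speech engine uses. If no recognizer for the requested language is installed, the system's default recognizer is used instead (typically the one for the Windows desktop language).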
private SpeechRecognitionEngine createSpeechEngine(string preferredCulture)
{
    // Look for an installed recognizer that matches the preferred culture.
    foreach (RecognizerInfo config in SpeechRecognitionEngine.InstalledRecognizers())
    {
        if (config.Culture.ToString() == preferredCulture)
        {
            speechRecognitionEngine = new SpeechRecognitionEngine(config);
            break;
        }
    }
    if (speechRecognitionEngine == null)
    {
        // Fall back to the system default recognizer and report the culture it actually uses.
        speechRecognitionEngine = new SpeechRecognitionEngine();
        MessageBox.Show("The desired culture is not installed on this machine, the speech-engine will continue using " + speechRecognitionEngine.RecognizerInfo.Culture.ToString() + " as the default culture.", "Culture " + preferredCulture + " not found!");
    }
    return speechRecognitionEngine;
}
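To find out which culture strings can be passed to createSpeechEngine on a given machine, it can help to list the installed recognizers first. This is a small sketch, assuming a console program; the ListRecognizers class name is hypothetical.

```csharp
using System;
using System.Speech.Recognition;

// Hypothetical helper program: prints one line per installed recognizer.
static class ListRecognizers
{
    static void Main()
    {
        foreach (RecognizerInfo info in SpeechRecognitionEngine.InstalledRecognizers())
        {
            // Culture.Name is the string to compare against, for example "de-DE".
            Console.WriteLine("{0}  {1}", info.Culture.Name, info.Description);
        }
    }
}
```

If the list is empty, no recognizer is installed and creating a SpeechRecognitionEngine will fail.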
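Next, the grammar used by the SpeechRecognitionEngine has to be set up. In this example, a custom text file holds the phrases as key-value pairs, and each entry is wrapped in the custom class SpeechToText.Word.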
namespace SpeechToText
{
    public class Word
    {
        public string Text { get; set; }
        public string AttachedText { get; set; }
        public bool IsShellCommand { get; set; }
    }
}
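The following method builds the Choices used by the Grammar. Inside the foreach loop, a Word instance is created for each line of the file and stored in a lookup List<Word>. The spoken text of each entry is then added to the Choices object. Finally, a GrammarBuilder is used to build the Grammar, which the SpeechRecognitionEngine loads synchronously.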
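// Lookup list of all known words; filled while the grammar is built.
private List<Word> words = new List<Word>();

private void loadGrammarAndCommands()
{
    Choices texts = new Choices();
    string[] lines = File.ReadAllLines(Path.Combine(Environment.CurrentDirectory, "example.txt"));
    foreach (string line in lines)
    {
        // Skip comment lines (starting with "--") and empty lines.
        if (line.StartsWith("--") || line == String.Empty)
            continue;
        // Expected line format: text|attachedText|true-or-false
        var parts = line.Split(new char[] { '|' });
        words.Add(new Word() { Text = parts[0], AttachedText = parts[1], IsShellCommand = (parts[2] == "true") });
        texts.Add(parts[0]);
    }
    // Match the grammar culture to the recognizer; a mismatch makes LoadGrammar fail.
    GrammarBuilder builder = new GrammarBuilder(texts);
    builder.Culture = speechRecognitionEngine.RecognizerInfo.Culture;
    Grammar wordsList = new Grammar(builder);
    // No try/catch here: the original only rethrew with "throw ex", which resets the stack trace.
    speechRecognitionEngine.LoadGrammar(wordsList);
}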
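The article does not show the contents of example.txt. Judging from the parsing code, each line has three fields separated by '|' and lines starting with "--" are comments. The following sketch parses such lines without touching the speech engine, so it can be tried on its own. The sample lines, the WordFileParser class, and the decision to skip malformed lines are assumptions, not the author's file or code.

```csharp
using System;
using System.Collections.Generic;

public class Word
{
    public string Text { get; set; }
    public string AttachedText { get; set; }
    public bool IsShellCommand { get; set; }
}

// Hypothetical helper; parses lines of the form text|attachedText|true-or-false.
public static class WordFileParser
{
    public static List<Word> Parse(IEnumerable<string> lines)
    {
        var result = new List<Word>();
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            // Skip empty lines and comment lines.
            if (line.Length == 0 || line.StartsWith("--", StringComparison.Ordinal))
                continue;
            string[] parts = line.Split('|');
            // Assumption: silently skip lines that do not have exactly three fields.
            if (parts.Length != 3)
                continue;
            result.Add(new Word
            {
                Text = parts[0].Trim(),
                AttachedText = parts[1].Trim(),
                IsShellCommand = string.Equals(parts[2].Trim(), "true", StringComparison.OrdinalIgnoreCase)
            });
        }
        return result;
    }
}

public static class ParserDemo
{
    public static void Main()
    {
        // Assumed sample content for example.txt.
        string[] sample =
        {
            "-- spoken text|attached text|is shell command",
            "hello|Hello, world!|false",
            "notepad|notepad.exe|true"
        };
        foreach (Word w in WordFileParser.Parse(sample))
            Console.WriteLine("{0} -> {1} (shell: {2})", w.Text, w.AttachedText, w.IsShellCommand);
    }
}
```

Unlike the original loop, this version checks the number of fields before indexing into them, so a malformed line does not abort loading the whole file.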
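To start the SpeechRecognitionEngine, call RecognizeAsync(RecognizeMode.Multiple). In this mode the recognizer keeps performing asynchronous recognition operations until RecognizeAsyncCancel() or RecognizeAsyncStop() is called. To retrieve the results of the asynchronous recognition operations, attach an event handler to the recognizer's SpeechRecognized event.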
speechRecognitionEngine.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(engine_SpeechRecognized);
speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
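Stopping is not shown in the article. The sketch below illustrates one way to end continuous recognition and release the engine, assuming a console program with a dictation grammar. The StopSketch class and the use of DictationGrammar are assumptions chosen to keep the example short.

```csharp
using System;
using System.Speech.Recognition;

// Hypothetical console host that starts and then stops continuous recognition.
static class StopSketch
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            engine.LoadGrammar(new DictationGrammar());
            engine.SpeechRecognized += (s, e) => Console.WriteLine(e.Result.Text);
            // Raised once the asynchronous recognition has ended.
            engine.RecognizeCompleted += (s, e) => Console.WriteLine("Recognition stopped.");

            engine.SetInputToDefaultAudioDevice();
            engine.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Speak; press Enter to stop.");
            Console.ReadLine();

            // RecognizeAsyncStop lets the current recognition operation finish first;
            // RecognizeAsyncCancel would end it immediately instead.
            engine.RecognizeAsyncStop();

            Console.WriteLine("Press Enter to exit.");
            Console.ReadLine();
        } // Dispose releases the recognizer and its audio input.
    }
}
```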
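When the recognizer recognizes one of the predefined words, the program decides whether to return the associated text or to execute a shell command. This is done in the following function: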
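private string getKnownTextOrExecute(string command)
{
    try
    {
        // Find the entry whose spoken text matches; First throws if there is none.
        var cmd = words.First(c => c.Text == command);
        if (cmd.IsShellCommand)
        {
            // Start the attached program or document through the shell.
            Process proc = new Process();
            proc.EnableRaisingEvents = false;
            proc.StartInfo.FileName = cmd.AttachedText;
            proc.Start();
            return "you just started : " + cmd.AttachedText;
        }
        else
        {
            return cmd.AttachedText;
        }
    }
    catch (Exception)
    {
        // Unknown word or failed process start: fall back to the recognized text.
        return command;
    }
}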
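The function above relies on an exception to detect unknown words. As an alternative sketch, not the author's implementation, the lookup can be done with a dictionary and TryGetValue. The CommandTable class and the injected launch delegate are hypothetical; the delegate makes it possible to try the logic without actually starting a process. Note that this version matches words case-insensitively, which differs from the == comparison in the original.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public class Word
{
    public string Text { get; set; }
    public string AttachedText { get; set; }
    public bool IsShellCommand { get; set; }
}

// Hypothetical lookup table keyed by the spoken text.
public class CommandTable
{
    private readonly Dictionary<string, Word> byText =
        new Dictionary<string, Word>(StringComparer.OrdinalIgnoreCase);
    private readonly Action<string> launch;

    public CommandTable(IEnumerable<Word> words, Action<string> launch)
    {
        this.launch = launch;
        foreach (Word w in words)
            byText[w.Text] = w;
    }

    // Returns the attached text, starts the command, or echoes unknown input.
    public string Resolve(string recognized)
    {
        Word word;
        if (!byText.TryGetValue(recognized, out word))
            return recognized;
        if (!word.IsShellCommand)
            return word.AttachedText;
        launch(word.AttachedText);
        return "you just started : " + word.AttachedText;
    }
}

public static class CommandDemo
{
    public static void Main()
    {
        var words = new[]
        {
            new Word { Text = "hello", AttachedText = "Hello, world!", IsShellCommand = false },
            new Word { Text = "notepad", AttachedText = "notepad.exe", IsShellCommand = true }
        };
        // In the real application the delegate starts the process.
        var table = new CommandTable(words, path => Process.Start(path));
        Console.WriteLine(table.Resolve("hello"));
        Console.WriteLine(table.Resolve("unknown phrase"));
    }
}
```

A SpeechRecognized handler could then pass e.Result.Text to Resolve and display the returned string.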