So, I finally put a pin in my voice recognition adventures today. I’d been deciding between several open source options: Julius, cmusphinx, and (briefly) Simon.
For my purposes, I really wanted as close to a silent API as I could get, so Simon didn’t last long. Next I looked at Julius, which did finally work after a lot of twiddling (my USB audio device caused a lot of problems, and it wasn’t very clear how to set up language models for Julius). I ultimately scrapped it, though, because of the documentation and language models available for cmusphinx. There are of course also models available for Julius, but the out-of-the-box performance just wasn’t good enough for me.
In the end, I had to build my own language model to cut out possible mistakes (or at least restrict the mistake space). CMUSphinx even has some resources to help with that, and it’s a breeze to use. I still need to hook up functionality, but I sure am pleased with the dev process: it looks like it’s going to be extremely easy to integrate sphinx. Thoroughly pleased to finally have this part of the project off the ground.
Here’s a SUPER rough writeup on what I did:
(I’m running Linux Mint, but most of these things should apply well enough to Ubuntu.)
1. A microphone (obviously)
2. Download and install cmusphinx
3. Download and install pocketsphinx
4.1 Download CMUSphinx full text dictionary & language model
4.2 Build your own language model (really fast and easy, plus site is a great resource)
5. Try and run pocketsphinx_continuous (w/ audio file or with direct mic input)
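For the last step, here is a rough sketch of what the pocketsphinx_continuous call can look like once you have a language model and dictionary. The file names (my_commands.lm, my_commands.dic, test.wav) and the ps_command helper are made up for illustration, and the flags (-inmic, -infile, -lm, -dict) are the ones I know from the 5prealpha-era builds; older builds may differ, so check the usage output of your own binary. The helper only prints the command line, so you can look it over before running it.

```shell
#!/bin/sh
# Hypothetical file names: substitute the .lm/.dic pair you downloaded or built.
LM="my_commands.lm"
DICT="my_commands.dic"

# Print the pocketsphinx_continuous command line for a given input.
# $1 = "mic" for live microphone input, otherwise a path to an audio file.
ps_command() {
    if [ "$1" = "mic" ]; then
        echo "pocketsphinx_continuous -inmic yes -lm $LM -dict $DICT"
    else
        echo "pocketsphinx_continuous -infile $1 -lm $LM -dict $DICT"
    fi
}

# Show both variants: live mic, and a pre-recorded file.
ps_command mic
ps_command test.wav
```

Once the printed line looks right, paste it into a terminal (or wrap the call in `sh -c "$(ps_command mic)"`) to actually start recognition.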
If your microphone isn’t being recognized, you need to change ALSA’s default microphone to the correct device, using the ~/.asoundrc file.
arecord -l #will list your recording devices (you should see your USB device there)
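As a sketch of how the arecord output maps onto ~/.asoundrc: each device line reads "card N: ..., device M: ...", and those two numbers become "plughw:N,M". The helper below (asoundrc_for is a made-up name) takes one such line and prints a stanza that sets it as the default capture device. It assumes playback should stay on card 0, device 0, which may not match your machine, and the sample line is hypothetical hardware.

```shell
#!/bin/sh
# Turn one "card N: ..., device M: ..." line from `arecord -l` into an
# ~/.asoundrc stanza that makes that device the default capture source.
asoundrc_for() {
    card=$(printf '%s\n' "$1" | sed -n 's/^card \([0-9][0-9]*\):.*/\1/p')
    dev=$(printf '%s\n' "$1" | sed -n 's/.*, device \([0-9][0-9]*\):.*/\1/p')
    # Assumption: playback stays on card 0, device 0 (typical onboard audio).
    cat <<EOF
pcm.!default {
    type asym
    playback.pcm "plughw:0,0"
    capture.pcm "plughw:${card},${dev}"
}
EOF
}

# Sample line (hypothetical hardware; yours will differ).
asoundrc_for 'card 1: Device [USB Audio Device], device 0: USB Audio [USB Audio]'
```

The helper only prints to the terminal, so nothing is overwritten; copy the stanza into ~/.asoundrc by hand once the card and device numbers look right.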