Their system starts with a basic statistical language model to make a best guess at what you said. It then refines that guess using context and per-user positive and negative feedback. Context narrows the set of possible words: if the context is an address, for instance, the possible street names are limited to those in the given city. User feedback, whether correcting the system's output or leaving it alone, teaches the system how you speak (e.g., correcting Austin to Boston).
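The combination described above can be sketched roughly as follows. This is an illustrative toy, not the company's actual code: the function names, scores, and bias mechanism are all assumptions, but they show how a context vocabulary prunes candidates and how per-user corrections can reweight a base language model's guesses.

```python
def recognize(candidates, context_vocab=None, user_bias=None):
    """Pick the best word from (word, base_lm_score) candidate pairs.

    context_vocab: optional set of words allowed in the current context
                   (e.g. street names in one city).
    user_bias:     optional per-user score adjustments learned from
                   corrections (positive = confirmed, negative = corrected).
    """
    user_bias = user_bias or {}
    scored = []
    for word, score in candidates:
        # Context narrows the search space before scoring.
        if context_vocab is not None and word not in context_vocab:
            continue
        # Apply this user's feedback-derived adjustment.
        scored.append((word, score + user_bias.get(word, 0.0)))
    return max(scored, key=lambda p: p[1])[0] if scored else None

# The acoustic model hears something ambiguous between two cities.
candidates = [("Austin", 0.51), ("Boston", 0.49)]
print(recognize(candidates))  # base model alone picks "Austin"

# After the user corrects "Austin" to "Boston", bias future guesses.
bias = {"Boston": 0.1, "Austin": -0.1}
print(recognize(candidates, user_bias=bias))  # now picks "Boston"
```

A real system would do this rescoring over full word lattices rather than single words, but the principle of layering context and personal feedback on top of a generic model is the same.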
This seems like a pretty daunting task, since speech recognition typically demands (a) substantial computing power and (b) large pattern-matching dictionaries for comparison. That said, the two co-founders heading up the project appear to be up to it.
The two co-founders (Mike Phillips and John Nguyen) worked for SpeechWorks, which was acquired by ScanSoft, which then renamed itself Nuance. Nuance most recently paid $293 million for VoiceSignal, a company using speech recognition for mobile search in 21 languages.
Finally, there's a demo video of the technology in action. Check it out (click through to the site if you cannot see the video).