Today my latest client project, Focus, was released on the Apple App Store and Google Play.
Focus is a safe-driving app that uses proprietary, state-of-the-art speech-to-text technology to allow users to send and reply to text messages, make calls, send inter-app messages to other Focus users, and more. In addition, Focus leverages built-in text-to-speech technology to power a fully voice-controlled user interface and read-back of messages, notes, and more.
Focus uses a proprietary blend of speech recognition systems, notably Siri, OpenEars, and Nuance. What makes it proprietary, exactly? I built a simple AI system that determines the best library for the job, given the user's requirements and the ambient sound conditions. For example, matching speech against a list of known commands in a noisy car calls for one combination of audio library and settings, whereas free-form transcription in a quiet environment is better served by another. This degree of intelligent fine-tuning resulted in a speech recognition app that outperformed the biggest names in voice recognition at release: an amazing feat considering the relatively small size of the team and the constrained budget of a modest, bootstrapped startup.
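To make the idea concrete, here is a minimal sketch of the kind of rule-based selection described above. The engine names map to the libraries mentioned in the post, but the task categories, the noise scale, and the threshold are illustrative assumptions, not the app's actual logic.

```c
/* Hypothetical sketch of condition-based engine selection.
   The noise threshold and task split are illustrative assumptions. */

typedef enum { ENGINE_OPENEARS, ENGINE_NUANCE, ENGINE_SIRI } Engine;
typedef enum { TASK_COMMAND_MATCH, TASK_TRANSCRIPTION } Task;

/* ambient_noise_db: rough cabin noise level in dB (illustrative scale) */
Engine select_engine(Task task, double ambient_noise_db) {
    if (task == TASK_COMMAND_MATCH) {
        /* Matching against a small, known command list tolerates noise
           well, so an on-device keyword recognizer is preferred. */
        return ENGINE_OPENEARS;
    }
    /* Free-form transcription: prefer one large-vocabulary engine in
       quiet conditions and fall back to another when the cabin is loud. */
    return (ambient_noise_db < 60.0) ? ENGINE_SIRI : ENGINE_NUANCE;
}
```

The point is less the specific rules than the architecture: audio conditions and task type feed a single decision function, so tuning lives in one place instead of being scattered across the app.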
But before that blending technology could be built, I had to build an iOS wrapper for the low-level C code that makes up the Nuance speech framework. Nuance is an embedded speech recognition platform, not designed out of the box for high-level use in an app like Focus. I essentially built the SDK a company like Nuance would normally provide for use in client apps like Focus. This was no easy task, but luckily it's something I have done before (as on the Capture app with the Audible Magic library). Tasks of this magnitude, which often come up well into the development process, are why it is absolutely critical that your development team is top-notch.
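The wrapper pattern described above can be sketched as follows: hide the vendor's low-level calls behind a small, app-friendly handle that owns setup and teardown. All `nmsp_*` names below are hypothetical stand-ins (stubbed so the sketch is self-contained); the real Nuance C interface is not public.

```c
#include <stdlib.h>

/* --- imagined low-level vendor calls (stubs for illustration only) --- */
static int  nmsp_session_open(void **session) { *session = malloc(1); return *session ? 0 : -1; }
static int  nmsp_session_feed(void *s, const short *pcm, int n) { (void)s; (void)pcm; (void)n; return 0; }
static void nmsp_session_close(void *s) { free(s); }

/* --- the high-level "SDK" surface a client app would consume --- */
typedef struct {
    void *session;   /* opaque vendor state */
    int   started;
} Recognizer;

Recognizer *recognizer_create(void) {
    Recognizer *r = calloc(1, sizeof *r);
    if (r && nmsp_session_open(&r->session) == 0)
        r->started = 1;
    return r;
}

/* Feed raw PCM audio; returns 0 on success, -1 on misuse. */
int recognizer_feed_audio(Recognizer *r, const short *pcm, int samples) {
    if (!r || !r->started) return -1;
    return nmsp_session_feed(r->session, pcm, samples);
}

void recognizer_destroy(Recognizer *r) {
    if (!r) return;
    if (r->started) nmsp_session_close(r->session);
    free(r);
}
```

The value of this layer is that the app never touches raw sessions or vendor error codes; everything funnels through a handful of functions that are easy to bridge into Objective-C or Swift.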
Last modified: February 13, 2019