openFrameworks & Pocketsphinx

I wanted to make an openFrameworks app which used Pocketsphinx to automatically transcribe audio files and display the output. In this tutorial I’ll explain how I managed it.

Introduction

When I need to pull something together to test out a quick idea, I reach for openFrameworks. It’s an open source toolkit designed for “creative coding”. openFrameworks is written in C++ and runs on Windows, Mac OS X, Linux, iOS, and Android.

Recently I also needed a library which could perform speech recognition both on audio files and directly from a microphone. I turned to CMUSphinx, which collects over 20 years of CMU’s research on speech processing. It uses state-of-the-art speech recognition algorithms for efficient processing of audio data.

Within this project is a library called Pocketsphinx. Pocketsphinx is CMU’s recognition library for embedded devices, but it works just as well on desktop machines. It depends on Sphinxbase, which provides common functionality across all CMUSphinx projects. To get started, you need to install Sphinxbase and Pocketsphinx.

I wanted to build an openFrameworks application which used Pocketsphinx to transcribe audio files and display the results in my app. I’m using OS X and Xcode, but the steps should be broadly similar for anyone working on Linux or Windows.

Installing Pocketsphinx

The first thing we have to do is install Sphinxbase and Pocketsphinx by following their tutorial.

Check that Sphinx is installed properly by running pocketsphinx_continuous from the command line. Our only problem here is that Sphinx builds 64-bit libraries by default, and openFrameworks is a 32-bit library.
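As a quick sanity check before tackling that, something like this (assuming the binaries and the bundled en-us model were installed to their default locations) should print hypotheses to the terminal as you speak:

    pocketsphinx_continuous -inmic yes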

From the directory where you built Sphinx, run the commands below and it should build and install the libraries for i386.
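Something along these lines should work, assuming the standard autotools build that both Sphinxbase and Pocketsphinx use (run it in the Sphinxbase directory first, then in Pocketsphinx); the -arch i386 flags are my assumption, so adjust them if your toolchain spells 32-bit builds differently:

    make clean
    ./configure CFLAGS="-arch i386" LDFLAGS="-arch i386"
    make
    sudo make install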

In your Xcode project you’ll need to include the following files; make sure to hit ‘add to target’.
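Assuming the default /usr/local install prefix, the libraries you want are along these lines (the exact names and extensions depend on how the build was configured):

    /usr/local/lib/libpocketsphinx.a
    /usr/local/lib/libsphinxbase.a
    /usr/local/lib/libsphinxad.a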

And then we’ll set up a few project attributes. First, in ‘Header Search Paths’ add:
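    /usr/local/include
    /usr/local/include/sphinxbase
    /usr/local/include/pocketsphinx

(these assume the default /usr/local install prefix; adjust them if you installed Sphinx somewhere else)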

and in ‘Library Search Paths’ add:
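    /usr/local/lib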

Here’s what you should have by now:

(Screenshot: Xcode project setup)

Example App

Now I’ll walk you through an example which reads in a directory of files and processes each one when we press a key.

We start with some setup: set our directory path, read the files in, and add them to a vector so we can access them later. It’s important to note that Pocketsphinx only works with 16-bit, headerless (raw) files sampled at 16 kHz. You can convert your audio with a program like Audacity, or write a bash script which makes use of SoX to do it automatically. You can read how I did that here.
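Here’s a sketch of that setup, assuming the converted raw files live in a “sounds” folder inside bin/data, and that ofApp.h declares a vector<string> files and an int currentFile (names I’m using for this write-up):

    // ofApp::setup() -- find the raw files and keep their paths for later
    ofDirectory dir("sounds");      // relative to bin/data
    dir.allowExt("raw");            // only pick up the converted raw files
    dir.listDir();

    for(int i = 0; i < (int)dir.size(); i++){
        files.push_back(dir.getPath(i));
    }
    currentFile = 0;                // index of the next file to transcribe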

Here is where we initialise the engine. The important step is the config call, which sets up the parameters Pocketsphinx normally takes as command line arguments. MODELDIR is defined in our header file and points to wherever you placed the “en-us” language model. Mine was on my desktop, so I used #define MODELDIR "/Users/benjgorman/desktop".
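A sketch of that initialisation, assuming ofApp.h includes <pocketsphinx.h>, declares cmd_ln_t *config as a member, and that the acoustic model folder, language model, and dictionary from the Pocketsphinx download sit directly under MODELDIR (the file names below are the standard ones, so adjust them to match your copy):

    #include <pocketsphinx.h>
    #define MODELDIR "/Users/benjgorman/desktop"

    // build the configuration Pocketsphinx would normally read
    // from command line arguments
    config = cmd_ln_init(NULL, ps_args(), TRUE,
                         "-hmm",  MODELDIR "/en-us",               // acoustic model folder
                         "-lm",   MODELDIR "/en-us.lm.bin",        // language model
                         "-dict", MODELDIR "/cmudict-en-us.dict",  // pronunciation dictionary
                         NULL);

    if(config == NULL){
        ofLogError() << "Failed to create the Pocketsphinx config";
    }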

Here is how we start the engine and open our file.
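Something like this, with ps (a ps_decoder_t*) and fh (a FILE*) declared as members in ofApp.h so the later steps can reach them; note that older Pocketsphinx versions give ps_start_utt a second uttid argument:

    // start the decoder from the config built above
    ps = ps_init(config);
    if(ps == NULL){
        ofLogError() << "Failed to create the Pocketsphinx decoder";
    }

    // open the current raw file and feed it through the recogniser
    fh = fopen(ofToDataPath(files[currentFile]).c_str(), "rb");
    if(fh == NULL){
        ofLogError() << "Unable to open " << files[currentFile];
        return;
    }

    ps_start_utt(ps);
    int16 buf[512];
    while(!feof(fh)){
        size_t nsamp = fread(buf, 2, 512, fh);
        ps_process_raw(ps, buf, nsamp, FALSE, FALSE);
    }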

Here is where we close the engine and perform our final checks.
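A sketch of that step, assuming currentSentence is a string member that draw() will display later (in some Pocketsphinx versions ps_get_hyp also returns an utterance id through a third argument):

    // finish the utterance and pull out the best hypothesis
    ps_end_utt(ps);

    int32 score;
    const char *hyp = ps_get_hyp(ps, &score);
    if(hyp != NULL){
        currentSentence = hyp;    // keep it around for draw()
    }

    fclose(fh);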

Here we exit the engine and free up the objects we created.
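A minimal version, in ofApp::exit():

    // release the decoder and its configuration
    ps_free(ps);
    cmd_ln_free_r(config);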

Then all we need to do is transcribe the next file when we press the ‘n’ key.
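A sketch of the key handler, assuming the per-file steps above are wrapped up in a helper (transcribeFile is a name I’ve made up for this write-up):

    void ofApp::keyPressed(int key){
        if(key == 'n' && currentFile < (int)files.size()){
            transcribeFile(files[currentFile]);  // hypothetical helper running the steps above
            currentFile++;
        }
    }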

Last thing to do is simply write the current sentence to the screen!
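In draw(), something as simple as this does the job:

    void ofApp::draw(){
        ofDrawBitmapString(currentSentence, 20, 20);   // show the latest transcription
    }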

You can download the full example from GitHub, but remember you need to follow the steps at the start of this tutorial to install Pocketsphinx on your system.

Leave a comment and let me know if this all works out for you. I’ll be adding a similar tutorial for getting microphone input working as well.

Find the full example on GitHub.