Conventional human-computer interaction (HCI) mechanisms rely on physical contact, which makes them cumbersome compared with more recent, state-of-the-art developments in this area. The problem is compounded by the growing number of devices each user operates, which is becoming the norm owing to advances in handheld hardware. The challenge is to identify the bottleneck in conventional approaches and provide an effective solution to it. As mentioned earlier, physical contact is that bottleneck, and speech interfaces are rapidly emerging as a stable alternative. This project aims to illustrate our hypothesis and take a first step towards the solution: in a nutshell, we aim to configure the hardware and write software capable of responding to speech and carrying out a set of tasks.
Recent developments in this field suggest using embedded devices for detecting and responding to speech, thanks to features such as compact size and minimal power consumption.
Hardware Components Used
BeagleBone Black
USB Sound Adapter
Software Components Used
Custom Linux build for the BeagleBone Black (a pre-built distribution could also be used; Debian is used in this demo).
Julius LVCSR: open-source speech recognition software.
Voxforge datasets: acoustic models for Julius.
ALSA packages are expected to be installed.
The following steps describe the whole procedure of the experiment.
Building everything from scratch was the major goal, which made the BeagleBone Black the optimal choice. Its hardware schematic is open as well, so the board can be customized to our application's requirements. Customizing the Linux kernel and building it for the BeagleBone board, namely Cross Linux From Scratch (CLFS), was the next most important step.
Julius is a high-performance, two-pass large-vocabulary continuous speech recognition (LVCSR) decoder for speech-related researchers and developers. Its acoustic models and language models are pluggable, so various types of speech recognition systems can be built by supplying models and modules suited to the task. In this demo we used models from Voxforge, which let Julius combine a vocabulary file, a grammar file, and a language model to detect speech accurately.
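To make the grammar and vocabulary files concrete, a minimal Julius grammar for the kind of commands used in this demo could look like the following. The phrases and phoneme transcriptions here are illustrative, not the exact files used in the demo. Julius grammars come in pairs: a `.grammar` file defining sentence structure over word categories, and a `.voca` file listing the words in each category with their phonemes:

```
# commands.grammar -- sentence structure (NS_B/NS_E mark leading/trailing silence)
S       : NS_B COMMAND NS_E
COMMAND : GREET
COMMAND : VERB OBJECT

# commands.voca -- words per category, with phoneme transcriptions
% NS_B
<s>     sil
% NS_E
</s>    sil
% GREET
HI      hh ay
% VERB
START   s t aa r t
% OBJECT
MOTOR   m ow t er
```

The pair is compiled with Julius's `mkdfa.pl` tool into `.dfa` and `.dict` files, which are then passed to Julius with the `-gram` option.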
The Voxforge project was set up to collect transcribed speech for use in open-source speech recognition engines ("SRE"s) such as ISIP, HTK, Julius, and Sphinx. Unlike most acoustic models, its models are open source.
While cross-compiling Julius for the BeagleBone Black, we configured it with full support for ALSA (Advanced Linux Sound Architecture). This enabled detection of the microphone, which was connected through the USB sound card adapter.
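A sketch of that build step, assuming an `arm-linux-gnueabihf` cross toolchain is installed (the toolchain name is an assumption; the exact configure flags may differ between Julius versions):

```shell
# Cross-compile Julius for the BeagleBone Black with ALSA as the audio input
./configure --host=arm-linux-gnueabihf --with-mictype=alsa
make

# Later, on the board: confirm that ALSA sees the USB sound card adapter
arecord -l
```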
In this demo Julius acts as the server: it uses ALSA to capture speech and sends the output data to a Python client script. The script parses each received command, compares it against the available commands, and takes the corresponding action. The action could be anything from playing a WAV file, to sensing and displaying temperature, to actuating a motor; these are fixed as far as this demo is concerned, but could be extended to provide any feature the application requires. Instead of a simple comparison, a state machine could also be built, depending on the application. For example, the system responds to "HI" by playing a welcome WAV file, starts the motor on the command "START MOTOR", and so on.
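A minimal sketch of such a client, assuming Julius is started in module mode (`-module`), in which it listens on TCP port 10500 and emits XML-like result messages terminated by a line containing a single `.`. The command table and the actions themselves are illustrative placeholders, not the demo's actual handlers:

```python
import re
import socket

# Illustrative actions -- the real demo plays WAV files and drives GPIO.
def play_welcome():
    print("playing welcome.wav")

def start_motor():
    print("starting motor")

# Maps a recognized phrase to its action.
ACTIONS = {
    "HI": play_welcome,
    "START MOTOR": start_motor,
}

# Recognized words appear as WHYPO elements in Julius's <RECOGOUT> message.
WORD_RE = re.compile(r'WHYPO WORD="([^"]+)"')

def parse_result(message):
    """Extract the recognized phrase from one Julius module-mode message."""
    words = [w for w in WORD_RE.findall(message) if w not in ("<s>", "</s>")]
    return " ".join(words)

def dispatch(phrase):
    """Look up the phrase in the command table and run its action."""
    action = ACTIONS.get(phrase)
    if action:
        action()

def main(host="localhost", port=10500):
    # Julius started with "-module" listens on TCP port 10500 by default.
    sock = socket.create_connection((host, port))
    buf = ""
    while True:
        data = sock.recv(4096).decode("utf-8", errors="replace")
        if not data:
            break
        buf += data
        # Each message ends with a line containing a single ".".
        while "\n.\n" in buf:
            message, buf = buf.split("\n.\n", 1)
            if "<RECOGOUT>" in message:
                dispatch(parse_result(message))
```

Calling `main()` on the board (with Julius already running in module mode) loops forever, dispatching each recognized phrase to its action.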
The following video shows a demonstration of the prototype proposed above. It shows an LED, a motor, and a sensor being controlled by the user through the speech recognition software.
Extending this experiment into a full-fledged product is our primary focus, so making it more interactive and providing end-to-end solutions from design and production through deployment is our motive. Do contact us if you find this idea interesting.