The Near-LATTE (Lock And Track Type Engine) variant for ASR-listening within the Yobe SDK detects the voice of a pre-enrolled user and extracts it from surrounding noise and human crosstalk. The software enrolls a user on the basis of 10-20 seconds of unscripted speech from that user. Yobe Near-LATTE provides high-quality speaker voice signals, enabling Automatic Speech Recognition (ASR) platforms to function properly and with a high degree of accuracy in extremely complex environments. A typical use case for the Near-LATTE variant for ASR-listening is an enrolled user talking to a kiosk or point-of-sale tablet in environments with noisy human crosstalk (e.g., restaurants, drive-thrus, shopping malls, and other public spaces).
The Near-LATTE variant for ASR-listening has the following capabilities:
Note: Only one user can be enrolled at a time.
This Yobe Android SDK package contains the following elements:
speech.aar: library that provides the Yobe Android SDK
Place the provided .aar in the proper location for use in your Android project. It is common to include the library in the dependencies of your project's build.gradle.
See the Android Studio Documentation on adding build dependencies for more details.
The IDListenerSpeech requires permission to use the device's microphones in order to capture audio data for processing. In your project's AndroidManifest.xml, declare the android.permission.RECORD_AUDIO permission with a uses-permission entry inside the manifest tag.
LATTE's main functionality is accessed via the com.yobe.speech.IDListener class.
Create a new com.yobe.speech.IDListener instance. The instance must then be initialized using the license provided by Yobe, as well as two configuration arguments: the Microphone Orientation and the Output Buffer Type.
A com.yobe.speech.IDTemplate must be created using the desired user's voice so the IDListener can select that template and identify the voice.
Register the user by inputting their voice audio data using com.yobe.speech.IDListener.RegisterTemplate. This is done using a continuous array of audio samples.
It is recommended to first process the audio data so that only speech is present in the audio; this will yield better identification results. To achieve this, the IDListener can be placed into Enrollment Mode. In this mode, the audio that should be used to create an IDTemplate is processed, buffer by buffer, using com.yobe.speech.IDListener.ProcessBuffer. ProcessBuffer will return a status of com.yobe.speech.Status.ENROLLING as long as the buffers are being processed in Enrollment Mode. Enrollment Mode is started by calling com.yobe.speech.IDListener.StartEnrollment. It is stopped either by manually calling com.yobe.speech.IDListener.StopEnrollment or automatically, once enough buffers have been processed to satisfy an internal counter (currently, enough buffers to equal 20 seconds of audio).
Enrollment Mode can be started at any point after initial calibration. Any samples processed while in Enrollment Mode will not be matched for identification to a selected template, if there is one.
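To get a feel for how many ProcessBuffer calls the automatic stop corresponds to, the arithmetic can be sketched as below. The class and method names are hypothetical and not part of the Yobe SDK, and the 16 kHz sample rate and 1024-sample buffer size used in the usage note are assumptions for illustration only; the real buffer size should come from com.yobe.speech.Util.GetBufferSizeSamples.

```java
// Hypothetical helper for illustration; not part of the Yobe SDK.
final class EnrollmentMath {
    private EnrollmentMath() {}

    /**
     * Number of fixed-size buffers needed to cover the given duration of
     * audio, rounding up so that a final partial buffer is counted.
     * Sample counts are per channel (an assumption).
     */
    static long buffersForDuration(double seconds, int sampleRateHz, int bufferSizeSamples) {
        if (seconds < 0 || sampleRateHz <= 0 || bufferSizeSamples <= 0) {
            throw new IllegalArgumentException("duration must be >= 0; rate and size must be > 0");
        }
        long totalSamples = Math.round(seconds * sampleRateHz);
        // Integer ceiling division.
        return (totalSamples + bufferSizeSamples - 1) / bufferSizeSamples;
    }
}
```

Under those assumed values, EnrollmentMath.buffersForDuration(20.0, 16000, 1024) gives 313 buffers for 20 seconds of audio.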
Select the user using the template returned by the registration.
Any new audio buffers passed to com.yobe.speech.IDListener.ProcessBuffer while not in Enrollment Mode will be processed with respect to the selected user's voice.
Audio data is passed into the com.yobe.speech.IDListener one buffer at a time. See Audio Buffers for more details on their format. The audio is encoded as PCM 16-bit Shorts.
result in the above example has an entry that contains the processed version of the audio that is contained in buffer. An example of what to do with this data is to append its contents to a stream or larger buffer.
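Appending processed audio to a larger buffer could look roughly like the following sketch. PcmAccumulator is a hypothetical helper, not an SDK class, and writing the 16-bit samples as little-endian bytes is an assumption (it matches the usual WAV layout); how the processed short array is obtained from the SDK is left out.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper for illustration; not part of the Yobe SDK.
final class PcmAccumulator {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    /** Appends 16-bit PCM samples to the stream as little-endian bytes. */
    void append(short[] processed) {
        for (short s : processed) {
            out.write(s & 0xFF);         // low byte first
            out.write((s >> 8) & 0xFF);  // then high byte
        }
    }

    /** Returns a copy of everything appended so far. */
    byte[] toByteArray() {
        return out.toByteArray();
    }

    /** Number of 16-bit samples appended so far. */
    int sampleCount() {
        return out.size() / 2;
    }
}
```

Calling append once per processed buffer keeps the accumulated audio in arrival order.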
Note: You can find the library's built-in buffer size using com.yobe.speech.Util.GetBufferSizeSamples.
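If the audio is available as one continuous array, it has to be cut into buffers of that size before being passed in one at a time. A minimal sketch, assuming the buffer size has already been read from the SDK and is supplied as a plain int: BufferSplitter is a hypothetical helper, and dropping a trailing partial buffer (rather than zero-padding it) is a design choice made here, not something the SDK documentation prescribes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the Yobe SDK.
final class BufferSplitter {
    private BufferSplitter() {}

    /**
     * Splits a continuous sample array into consecutive buffers of exactly
     * bufferSizeSamples samples. Leftover samples that do not fill a whole
     * buffer are dropped.
     */
    static List<short[]> split(short[] samples, int bufferSizeSamples) {
        if (bufferSizeSamples <= 0) {
            throw new IllegalArgumentException("bufferSizeSamples must be > 0");
        }
        List<short[]> buffers = new ArrayList<>();
        for (int start = 0; start + bufferSizeSamples <= samples.length; start += bufferSizeSamples) {
            buffers.add(Arrays.copyOfRange(samples, start, start + bufferSizeSamples));
        }
        return buffers;
    }
}
```

Each element of the returned list can then be handed to the processing call in order.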
To ensure proper clean up of the IDListener, simply call com.yobe.speech.IDListener.Deinit.
LATTE's real-time functionality is accessed via the com.yobe.speech.IDListenerSpeech class.
Then, in your project, you must either prompt the user for relevant permission or enable the permission in the app's settings on the device.
An IDListenerSpeech object can be created using a class that implements com.yobe.speech.AudioConsumer.
See Define Real-Time Processing Callbacks for details on MyAudioConsumer.
Creating a com.yobe.speech.IDListenerSpeech object requires an object that implements the callback functions in the com.yobe.speech.AudioConsumer interface. These callback functions will receive processed audio buffers for further processing in real-time. Audio data is captured by the device's microphones, processed, and sent to the com.yobe.speech.AudioConsumer.onDataFeed callback function one buffer at a time. See Audio Buffers for more details on audio buffers. The status after each processing step is sent via a call to the com.yobe.speech.AudioConsumer.onResponse callback function.
The output buffers are arrays of short values.
Note: The originalBuffer will contain two channels of interleaved audio data, while the processedBuffer will only contain one channel of audio data.
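If the two input channels need to be inspected or stored separately, the interleaved buffer can be split as in the sketch below. It assumes sample-by-sample interleaving in the order channel 0, channel 1, channel 0, channel 1, and so on, which is the common convention but is not confirmed by this documentation; Deinterleaver is a hypothetical helper, not an SDK class.

```java
// Hypothetical helper for illustration; not part of the Yobe SDK.
final class Deinterleaver {
    private Deinterleaver() {}

    /**
     * Splits an interleaved two-channel buffer [c0, c1, c0, c1, ...]
     * into two per-channel arrays: result[0] and result[1].
     */
    static short[][] split(short[] interleaved) {
        if (interleaved.length % 2 != 0) {
            throw new IllegalArgumentException("two-channel data must have an even sample count");
        }
        int frames = interleaved.length / 2;
        short[][] channels = new short[2][frames];
        for (int i = 0; i < frames; i++) {
            channels[0][i] = interleaved[2 * i];      // assumed first channel
            channels[1][i] = interleaved[2 * i + 1];  // assumed second channel
        }
        return channels;
    }
}
```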
The processedBuffer callback argument in onDataFeed can be thought of as the output of the com.yobe.speech.IDListener.ProcessBuffer function. Further processing or storage can be done. However, it's important to minimize runtime spent in the callback function to keep the audio processing running at real-time speeds.
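One common way to keep the callback short is to copy the buffer into a bounded queue and do the heavy work on a separate worker thread. The sketch below shows only that hand-off; BufferHandoff is a hypothetical helper, and the exact onDataFeed signature is deliberately not reproduced here. Copying the array assumes the SDK may reuse it after the callback returns, which this documentation does not state either way.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration; not part of the Yobe SDK.
final class BufferHandoff {
    private final BlockingQueue<short[]> queue;
    private final AtomicLong dropped = new AtomicLong();

    BufferHandoff(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /**
     * Intended to be called from the audio callback. Never blocks:
     * if the queue is full, the buffer is dropped and counted.
     */
    boolean offer(short[] processedBuffer) {
        boolean accepted = queue.offer(processedBuffer.clone());
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    /** For a worker thread: blocks until a buffer is available. */
    short[] take() throws InterruptedException {
        return queue.take();
    }

    /** Non-blocking variant: returns null when the queue is empty. */
    short[] pollNow() {
        return queue.poll();
    }

    long droppedCount() {
        return dropped.get();
    }
}
```

A worker thread would loop on take() and perform storage or further processing there, so that a slow consumer shows up as a growing dropped count rather than as a stalled audio callback.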
The functions for performing IDTemplate registration and selection are the same as for the IDListener; see the section Register and Select User. For live, processed enrollment, simply implement Option 2 in the onDataFeed callback.
Processing is started via com.yobe.speech.IDListenerSpeech.Start and stopped via com.yobe.speech.IDListenerSpeech.Stop. Once started, the callback functions will start being called with the processed audio buffers in real-time.
Note: An IDListenerSpeech object has a startup time of 5 seconds upon calling Start. Until this startup time has passed, the object prompts for more data by reporting the com.yobe.speech.SpeechUtil.Status.NEEDS_MORE_DATA code in onResponse. After the startup time has passed, onResponse will report com.yobe.speech.SpeechUtil.Status.OK.
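An application that should ignore output produced during startup could track the reported status with a small latch, roughly as sketched here. StartupGate and its nested Status enum are hypothetical stand-ins that model only the two status names mentioned above; the real codes and the onResponse signature come from the SDK.

```java
// Hypothetical helper for illustration; not part of the Yobe SDK.
final class StartupGate {
    // Stand-in for the SDK's status codes; only two values are modelled.
    enum Status { NEEDS_MORE_DATA, OK }

    // volatile because callbacks may arrive on a different thread (assumption).
    private volatile boolean ready = false;

    /** Feed every status reported by the response callback. */
    void onStatus(Status status) {
        if (status == Status.OK) {
            ready = true;  // latches: stays true once startup has finished
        }
    }

    /** True once the first OK status has been seen. */
    boolean isReady() {
        return ready;
    }
}
```

The data callback can then check isReady() and skip buffers that arrive before the first OK.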
To stop and clean up the IDListenerSpeech object, simply call com.yobe.speech.IDListenerSpeech.Stop.