Introduction

The GrandE (Grand Engine) variant of the Yobe SDK combines all of the SDK's functionality. It includes the following capabilities and settings (note that some settings are mutually exclusive):

BioListener voice extraction
- with a voice target of EITHER:
  - Near-Listening OR
  - Far-Listening
IDListener voice identification
- with a voice target of EITHER:
  - Near-Listening OR
  - Far-Listening
- AND an output mode of EITHER:
  - Automatic Speech Recognition (ASR) Listening OR
  - Automatic Voice Recognition (AVR) Listening

This guide is divided into sections, each describing one function of the Yobe SDK and how to access it within the GrandE variant.

Package Contents

This Yobe Android SDK package contains the following elements:

The speech.aar library that provides the Yobe Android SDK
A sample Android Studio project that implements the Yobe Android SDK
The Yobe license

Prerequisites

Library

Place the provided .aar file where your Android project can reference it. A common approach is to add the library to the dependencies block of the project's build.gradle.

See the Android Studio Documentation on adding build dependencies for more details.

Permissions

The BioListenerSpeech and IDListenerSpeech classes require permission to use the device's microphones in order to capture audio data for processing. In your project's AndroidManifest.xml, place this entry inside the manifest tag:

<uses-permission android:name="android.permission.RECORD_AUDIO" />

BioListener (Voice Extraction)

The BioListener is meant for situations where a person is talking while there are noise sources (including other voices). The job of the BioListener is to extract the voice of a person while pushing the noise sources further into the auditory background.

Initialization

Create a new com.yobe.speech.BioListener instance. The instance must then be initialized using the license provided by Yobe, as well as two configuration arguments: the Microphone Orientation and the Voice Target.

// java
BioListener bioListener = new BioListener();
bioListener.Init(yobe_license, MicOrientation.END_FIRE, VoiceTarget.NEAR);

Process and Use Audio

Audio data is passed into the com.yobe.speech.BioListener one buffer at a time. See Audio Buffers for more details on their format. The audio is encoded as 16-bit PCM, with each sample stored as a Java short.

// java
Object[] result = bioListener.ProcessBuffer(buffer);
short[] processedAudio = SpeechUtil.GetAudioFromResults(result);

In the example above, result contains an entry holding the processed version of the audio in buffer. One way to use this data is to append it to a stream or a larger buffer.
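As a minimal sketch of the "larger buffer" approach, the helper below collects processed buffers into one growing array. The class ProcessedAudioSink and its methods are hypothetical and not part of the Yobe SDK; the sketch only assumes that processed audio arrives as short arrays whose length may differ between calls.

```java
import java.util.Arrays;

// Hypothetical helper (not part of the Yobe SDK): collects processed buffers
// of any length into one contiguous array of 16-bit samples.
class ProcessedAudioSink {
    private short[] samples = new short[1024];
    private int length = 0;

    // Append one processed buffer, growing the backing array when needed.
    void append(short[] buffer) {
        if (buffer == null || buffer.length == 0) {
            return; // nothing to store for this call
        }
        int required = length + buffer.length;
        if (required > samples.length) {
            samples = Arrays.copyOf(samples, Math.max(required, samples.length * 2));
        }
        System.arraycopy(buffer, 0, samples, length, buffer.length);
        length = required;
    }

    // Number of samples stored so far.
    int size() {
        return length;
    }

    // Copy of everything appended so far, trimmed to the stored length.
    short[] toArray() {
        return Arrays.copyOf(samples, length);
    }
}
```

In a processing loop, this could be used by calling sink.append(SpeechUtil.GetAudioFromResults(result)) after each ProcessBuffer call.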

Clean Up

To ensure proper clean up of the BioListener, simply call com.yobe.speech.BioListener.Deinit.

// java

bioListener.Deinit();

BioListenerSpeech (Real-Time Voice Extraction)

Real-time functionality is accessed via the com.yobe.speech.BioListenerSpeech class.

Because this class captures audio from the device's microphones, your app must hold the RECORD_AUDIO permission described under Permissions: either prompt the user for the permission or enable it in the app's settings on the device.

Initialization

A BioListenerSpeech object can be created using a class that implements com.yobe.speech.AudioConsumer.

// java

BioListenerSpeech bioListenerSpeech = new BioListenerSpeech(new MyAudioConsumer()); // creation of object

See Define Real-Time Processing Callbacks for details on MyAudioConsumer.

Define Real-Time Processing Callbacks

Creating a com.yobe.speech.BioListenerSpeech object requires an object that implements the callback functions in the com.yobe.speech.AudioConsumer interface. These callback functions will receive processed audio buffers for further processing in real-time. Audio data is captured by the device's microphones, processed, and sent to the com.yobe.speech.AudioConsumer.onDataFeed callback function one buffer at a time. See Audio Buffers for more details on audio buffers. The status after each processing step is sent via a call to the com.yobe.speech.AudioConsumer.onResponse callback function.

The output buffers are arrays of short values. The output buffer size can also vary from call to call.

Note: The originalBuffer will contain two channels of interleaved audio data, while the processedBuffer will only contain one channel of audio data.

// java
import com.yobe.speech.*;
 
// create a class implementing the AudioConsumer callback functions
class MyAudioConsumer implements AudioConsumer {
    @Override
    public void onDataFeed(short[] originalBuffer, short[] processedBuffer) { /* do something with the original and/or processed buffers */ }
 
    @Override
    public void onResponse(Status code) { /* do something with the status */ }
}
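Since originalBuffer carries two interleaved channels, a consumer that wants to inspect each microphone separately has to split it first. The following is a sketch only: StereoSplitter is a hypothetical helper, not an SDK class, and it assumes the samples alternate as channel 0, channel 1, channel 0, and so on, which is the usual meaning of "interleaved" but is not spelled out in this guide.

```java
// Hypothetical helper (not part of the Yobe SDK): splits interleaved
// two-channel audio into one array per channel.
class StereoSplitter {
    // Returns {channel0, channel1}. Assumes the samples are ordered
    // [ch0, ch1, ch0, ch1, ...]; a trailing unpaired sample is ignored.
    static short[][] split(short[] interleaved) {
        int frames = interleaved.length / 2;
        short[] first = new short[frames];
        short[] second = new short[frames];
        for (int i = 0; i < frames; i++) {
            first[i] = interleaved[2 * i];
            second[i] = interleaved[2 * i + 1];
        }
        return new short[][] {first, second};
    }
}
```

Inside onDataFeed, StereoSplitter.split(originalBuffer) would then yield two arrays, each half the length of originalBuffer.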

Start Processing

Processing is started via com.yobe.speech.BioListenerSpeech.Start and stopped via com.yobe.speech.BioListenerSpeech.Stop. Once started, the callback functions will start being called with the processed audio buffers in real-time.

Note: A BioListenerSpeech object has a startup time of 5 seconds after Start is called. Until this startup time has passed, the object prompts for more data by reporting the com.yobe.speech.SpeechUtil.Status.NEEDS_MORE_DATA code in onResponse. After the startup time has passed, onResponse reports com.yobe.speech.SpeechUtil.Status.OK.

// java

bioListenerSpeech.Start("YOBE_LICENSE"); // audio will start getting captured and processed
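One way to handle the startup period is to ignore processed audio until the first OK status arrives. The sketch below illustrates that idea under stated assumptions: StartupGate is a hypothetical helper, and StartupStatus is a stand-in enum that exists only so the example compiles without the SDK. In an app, the Status type from com.yobe.speech would be used instead.

```java
// Stand-in for the SDK's status codes so this sketch compiles on its own;
// in an app, use the Status type from com.yobe.speech instead.
enum StartupStatus { NEEDS_MORE_DATA, OK }

// Hypothetical helper (not part of the Yobe SDK): remembers whether the
// startup period has finished.
class StartupGate {
    // volatile because onResponse and the reader may run on different threads
    private volatile boolean ready = false;

    // Call from onResponse with each reported status.
    void onStatus(StartupStatus status) {
        if (status == StartupStatus.OK) {
            ready = true; // latch on the first OK and stay ready
        }
    }

    // True once the first OK has been reported.
    boolean isReady() {
        return ready;
    }
}
```

An onDataFeed implementation could then return early while isReady() is false, so that audio delivered during startup is not stored or forwarded.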

Clean Up

To stop and clean up the BioListenerSpeech object, simply call com.yobe.speech.BioListenerSpeech.Stop.

// java

bioListenerSpeech.Stop(); // no audio data is captured nor processed

IDListener (Voice Identification)

The IDListener is used to perform certain actions with the voice of a pre-enrolled user. The IDListener has two output targets that define its functionality:

Automatic Speech Recognition (ASR) Listening
Automatic Voice Recognition (AVR) Listening

Automatic Speech Recognition (ASR) Listening

The IDListener with the ASR output target detects the voice of a pre-enrolled user and extracts it from surrounding noise and human crosstalk. The software enrolls a user on the basis of 10-20 seconds of unscripted speech from that user. The IDListener provides high-quality speaker voice signals, enabling Automatic Speech Recognition (ASR) platforms to function properly and with a high degree of accuracy in extremely complex environments. A typical use case is an enrolled user talking to a kiosk or point-of-sale tablet in environments with noisy human crosstalk (e.g. restaurants, drive-thrus, shopping malls and other public spaces).

Automatic Voice Recognition (AVR) Listening

The IDListener with the AVR output target extracts the voice of a pre-enrolled user from surrounding noise and human crosstalk, determines when that user is silent, and mutes the entire signal at those times. The software can also enroll a user on the basis of 10-20 seconds of unscripted speech. The user's Voice Template is used to determine the intervals in which the pre-enrolled user is not talking; during those intervals, the audio signal is muted. Typical use cases for this mode are:

a registered user engaged in a two way conversation and needing the signal to be entirely muted when they are not speaking
a solution wanting to input or record only the authorized speaker at all times, even though there may be other people speaking as well as other sources of noise

Initialization

Create a new com.yobe.speech.IDListener instance. The instance must then be initialized using the license provided by Yobe, as well as two configuration arguments: the Microphone Orientation and the Output Buffer Type.

// java
IDListener idListener = new IDListener();
idListener.Init("YOBE_LICENSE", MicOrientation.END_FIRE, OutputBufferType.FIXED);

Register and Select User

A com.yobe.speech.IDTemplate must be created using the desired user's voice so the IDListener can select that template and identify the voice.

Register the user by inputting their voice audio data using com.yobe.speech.IDListener.RegisterTemplate. This is done using a continuous array of audio samples.

It is recommended to first process the audio data so that only speech is present in the audio; this yields better identification results. To achieve this, the IDListener can be placed into Enrollment Mode. In this mode, the audio that should be used to create an IDTemplate is processed, buffer by buffer, using com.yobe.speech.IDListener.ProcessBuffer. ProcessBuffer returns a status of com.yobe.speech.Status.ENROLLING as long as buffers are being processed in Enrollment Mode. Enrollment Mode is started by calling com.yobe.speech.IDListener.StartEnrollment and is stopped either by manually calling com.yobe.speech.IDListener.StopEnrollment or by processing enough buffers for it to stop automatically based on an internal counter (currently, enough buffers to equal 20 seconds of audio).

Enrollment Mode can be started at any point after initial calibration. Any samples processed while in Enrollment Mode will not be matched for identification to a selected template, if there is one.

// java

/****(Option 1) register with unprocessed samples****/
IDTemplate idTemplate = idListener.RegisterTemplate(samples);

/****(Option 2) register with processed samples****/
// Options 1 and 2 are alternatives; use one or the other, not both.
// process enough unspecified audio to calibrate
Status status = Status.NEEDS_MORE_DATA;
do {
    status = SpeechUtil.GetStatusFromResults(idListener.ProcessBuffer(inputBuffer));
    inputBuffer = GetNextInputBuffer(); // some example function to get next buffer
} while (status == Status.NEEDS_MORE_DATA);

// now that we've calibrated, process desired user audio in Enrollment Mode
short[] processedVoiceAudio = new short[someLength];
idListener.StartEnrollment();
do {
    Object[] result = idListener.ProcessBuffer(voiceInputBuffer);
    status = SpeechUtil.GetStatusFromResults(result);
    short[] processedAudio = SpeechUtil.GetAudioFromResults(result);
    // store processedAudio in processedVoiceAudio, such as with a for-loop
    //...

    // fetch the next buffer only after the current result has been stored
    voiceInputBuffer = GetNextVoiceBuffer(); // some example function to get next buffer
    if (voiceInputBuffer == null && status == Status.ENROLLING) {
        // the case where we've run out of voice audio before enrollment automatically stops
        idListener.StopEnrollment();
        break;
    }
} while (status == Status.ENROLLING);

// register with the processed audio
IDTemplate idTemplate = idListener.RegisterTemplate(processedVoiceAudio);

Select the user using the template returned by the registration.

// java

idListener.SelectUser(idTemplate);

Any new audio buffers passed to com.yobe.speech.IDListener.ProcessBuffer will be processed with respect to the selected user's voice.
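Because Enrollment Mode currently stops automatically after roughly 20 seconds of audio, it can help to estimate up front how many buffers that corresponds to, for example when sizing the array that holds the enrollment audio. The sketch below shows only the arithmetic. EnrollmentMath is a hypothetical helper, and the sample rate and buffer size used in the usage note are illustrative assumptions, not values taken from the SDK.

```java
// Hypothetical helper (not part of the Yobe SDK): buffer arithmetic for enrollment.
class EnrollmentMath {
    // Buffers needed to cover `seconds` of audio, rounding up to a whole buffer.
    // sampleRateHz and bufferSizeSamples are both counted per channel.
    static long buffersFor(int seconds, int sampleRateHz, int bufferSizeSamples) {
        if (seconds < 0 || sampleRateHz <= 0 || bufferSizeSamples <= 0) {
            throw new IllegalArgumentException("seconds must be >= 0; rate and size must be > 0");
        }
        long totalSamples = (long) seconds * sampleRateHz;
        // integer ceiling division
        return (totalSamples + bufferSizeSamples - 1) / bufferSizeSamples;
    }
}
```

With an assumed 16000 Hz sample rate and an assumed 1024-sample buffer, EnrollmentMath.buffersFor(20, 16000, 1024) gives 313 buffers. The real buffer size should be taken from the library rather than hard-coded.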

Process and Use Audio

Audio data is passed into the com.yobe.speech.IDListener one buffer at a time. See Audio Buffers for more details on their format. The audio is encoded as 16-bit PCM, with each sample stored as a Java short.

// java
Object[] result = idListener.ProcessBuffer(buffer);
short[] processedAudio = SpeechUtil.GetAudioFromResults(result);

In the example above, result contains an entry holding the processed version of the audio in buffer. One way to use this data is to append it to a stream or a larger buffer.

Note: You can find the library's built-in buffer size using com.yobe.speech.Util.GetBufferSizeSamples.
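When the input is a single long recording rather than a live stream, it has to be cut into buffers before it can be passed in one buffer at a time. The following sketch shows one way to do that. BufferChunker is a hypothetical helper, not an SDK class, and padding the final buffer with zeros (silence) is an assumption about what is acceptable input, not documented SDK behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the Yobe SDK): cuts a long recording into
// equally sized buffers, zero-padding the last one.
class BufferChunker {
    static List<short[]> chunk(short[] samples, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        List<short[]> buffers = new ArrayList<>();
        for (int start = 0; start < samples.length; start += bufferSize) {
            int end = Math.min(start + bufferSize, samples.length);
            // copyOfRange copies [start, end); copyOf then pads with zeros (silence)
            short[] buffer = Arrays.copyOf(Arrays.copyOfRange(samples, start, end), bufferSize);
            buffers.add(buffer);
        }
        return buffers;
    }
}
```

Each element of the returned list could then be handed to ProcessBuffer in order, with bufferSize taken from the library's reported buffer size.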

Clean Up

To ensure proper clean up of the IDListener, simply call com.yobe.speech.IDListener.Deinit.

// java

idListener.Deinit();

IDListenerSpeech (Real-Time Voice Identification)

The IDListener's real-time functionality is accessed via the com.yobe.speech.IDListenerSpeech class.

Because this class captures audio from the device's microphones, your app must hold the RECORD_AUDIO permission described under Permissions: either prompt the user for the permission or enable it in the app's settings on the device.

Initialization

An IDListenerSpeech object can be created using a class that implements com.yobe.speech.AudioConsumer.

// java

IDListenerSpeech idListenerSpeech = new IDListenerSpeech(new MyAudioConsumer()); // creation of object

See Define Real-Time Processing Callbacks for details on MyAudioConsumer.

Define Real-Time Processing Callbacks

Creating a com.yobe.speech.IDListenerSpeech object requires an object that implements the callback functions in the com.yobe.speech.AudioConsumer interface. These callback functions will receive processed audio buffers for further processing in real-time. Audio data is captured by the device's microphones, processed, and sent to the com.yobe.speech.AudioConsumer.onDataFeed callback function one buffer at a time. See Audio Buffers for more details on audio buffers. The status after each processing step is sent via a call to the com.yobe.speech.AudioConsumer.onResponse callback function.

The output buffers are arrays of short values.

Note: The originalBuffer will contain two channels of interleaved audio data, while the processedBuffer will only contain one channel of audio data.

// java
import com.yobe.speech.*;
 
// create a class implementing the AudioConsumer callback functions
class MyAudioConsumer implements AudioConsumer {
    @Override
    public void onDataFeed(short[] originalBuffer, short[] processedBuffer) { /* do something with the original and/or processed buffers */ }
 
    @Override
    public void onResponse(Status code) { /* do something with the status */ }
}

The processedBuffer callback argument in onDataFeed can be thought of as the output of the com.yobe.speech.IDListener.ProcessBuffer function. Further processing or storage can be done. However, it's important to minimize runtime spent in the callback function to keep the audio processing running at real-time speeds.
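A common way to keep the callback short is to hand each buffer to a worker thread through a bounded queue and do the heavier work there. The sketch below illustrates the pattern with standard java.util.concurrent classes. BufferHandoff is a hypothetical helper, not an SDK class, and dropping buffers when the queue is full is one possible policy, chosen here so that the callback never blocks.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical helper (not part of the Yobe SDK): hands processed buffers from
// the audio callback to a worker thread through a bounded queue.
class BufferHandoff {
    private final BlockingQueue<short[]> queue;

    BufferHandoff(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    // Call from onDataFeed. Never blocks; returns false if the buffer was
    // dropped because the worker has fallen behind and the queue is full.
    boolean offer(short[] processedBuffer) {
        // Copy defensively in case the caller reuses the array after returning.
        return queue.offer(processedBuffer.clone());
    }

    // Call from the worker thread. Blocks until a buffer is available.
    short[] take() throws InterruptedException {
        return queue.take();
    }

    // Non-blocking variant; returns null when no buffer is waiting.
    short[] poll() {
        return queue.poll();
    }
}
```

With this in place, onDataFeed only calls offer(processedBuffer), while a separate thread loops on take() to store or forward the audio. The defensive copy is a precaution; this guide does not say whether the arrays passed to the callback are reused.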

Register and Select User

The functions for performing IDTemplate registration and selection are the same as for the IDListener; see Register and Select User in the IDListener section. For live, processed enrollment, implement Option 2 in the onDataFeed callback.

Start Processing

Processing is started via com.yobe.speech.IDListenerSpeech.Start and stopped via com.yobe.speech.IDListenerSpeech.Stop. Once started, the callback functions will start being called with the processed audio buffers in real-time.

Note: An IDListenerSpeech object has a startup time of 5 seconds after Start is called. Until this startup time has passed, the object prompts for more data by reporting the com.yobe.speech.SpeechUtil.Status.NEEDS_MORE_DATA code in onResponse. After the startup time has passed, onResponse reports com.yobe.speech.SpeechUtil.Status.OK.

// java

idListenerSpeech.Start("YOBE_LICENSE"); // audio will start getting captured and processed

Clean Up

To stop and clean up the IDListenerSpeech object, simply call com.yobe.speech.IDListenerSpeech.Stop.

// java

idListenerSpeech.Stop(); // no audio data is captured nor processed