Introduction

The Far-CAFE (Conversational Audio Front-End) variant of the Yobe SDK is meant for situations where a person is talking far from a device while noise sources (including other voices) are much closer to it. Far-CAFE extracts the voice of the far-field talker while pushing the near-field noise sources further into the auditory background. A typical use case is a person speaking from a distance to an appliance that is making its own noise.

When to use this variant

This is a Far-Listening scenario, as shown in the diagram below.
The processed audio should contain as much of the far-field speech and as little of the near-field noise as possible.

Package Contents

This Yobe Android SDK package contains the following elements:

The speech.aar library that provides the Yobe Android SDK
A sample Android Studio project that implements the Yobe Android SDK
The Yobe license

Prerequisites

Library

Place the provided speech.aar where your Android project can reference it (for example, in the app module's libs/ directory) and declare it in the dependencies block of the module's build.gradle file.

See the Android Studio Documentation on adding build dependencies for more details.

Permissions

BioListenerSpeech requires permission to use the device's microphones so that it can capture audio for processing. Add the following entry inside the manifest tag of your project's AndroidManifest.xml:

<uses-permission android:name="android.permission.RECORD_AUDIO" />

BioListener (Voice Extraction)

CAFE's main functionality (biometric listening) is accessed via the com.yobe.speech.BioListener class.

Initialization

Create a new com.yobe.speech.BioListener instance, then initialize it with the license provided by Yobe and two configuration arguments: the microphone orientation (MicOrientation) and the voice target (VoiceTarget). The example below uses MicOrientation.END_FIRE and VoiceTarget.FAR.

// java
// yobeLicense: the license string provided by Yobe
BioListener bioListener = new BioListener();
bioListener.Init(yobeLicense, MicOrientation.END_FIRE, VoiceTarget.FAR);

Process and Use Audio

Audio data is passed into the com.yobe.speech.BioListener one buffer at a time, encoded as 16-bit PCM samples (Java short values). See Audio Buffers for more details on their format.
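If your audio arrives as raw bytes (for example, read from a WAV file), it has to be converted to short samples before it can be passed to ProcessBuffer. The following is a minimal sketch, assuming little-endian byte order as used in WAV files; PcmConverter and bytesToShorts are hypothetical names, not part of the Yobe SDK.

```java
// java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper (not part of the Yobe SDK): converts raw 16-bit PCM bytes,
// assumed to be little-endian, into short[] samples.
final class PcmConverter {
    private PcmConverter() {}

    static short[] bytesToShorts(byte[] pcmBytes) {
        if (pcmBytes.length % 2 != 0) {
            throw new IllegalArgumentException("16-bit PCM needs an even number of bytes");
        }
        short[] samples = new short[pcmBytes.length / 2];
        // Interpret each pair of bytes as one little-endian short and bulk-copy them.
        ByteBuffer.wrap(pcmBytes).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(samples);
        return samples;
    }
}
```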

// java
Object[] result = bioListener.ProcessBuffer(buffer);
short[] processedAudio = SpeechUtil.GetAudioFromResults(result);

In the example above, result contains the processed version of the audio in buffer, which SpeechUtil.GetAudioFromResults extracts as a short[]. A typical next step is to append processedAudio to a stream or a larger buffer.
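Appending to a larger buffer can be done with a small growable array. This is a sketch only; ShortAccumulator is a hypothetical helper, not part of the Yobe SDK, and it accepts chunks of any length.

```java
// java
import java.util.Arrays;

// Hypothetical helper (not part of the Yobe SDK): appends short[] chunks of any
// length to one growing array, e.g. to collect processedAudio over a session.
final class ShortAccumulator {
    private short[] data = new short[1024];
    private int size = 0;

    void append(short[] chunk) {
        if (chunk == null || chunk.length == 0) {
            return; // nothing to add
        }
        // Grow geometrically so repeated appends stay cheap on average.
        if (size + chunk.length > data.length) {
            int newCapacity = Math.max(data.length * 2, size + chunk.length);
            data = Arrays.copyOf(data, newCapacity);
        }
        System.arraycopy(chunk, 0, data, size, chunk.length);
        size += chunk.length;
    }

    int size() {
        return size;
    }

    // Returns a trimmed copy containing only the samples appended so far.
    short[] toArray() {
        return Arrays.copyOf(data, size);
    }
}
```

In the processing loop you would call accumulator.append(processedAudio) after each ProcessBuffer call and read the whole session with toArray() at the end.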

Clean Up

To release the BioListener when you are done with it, call com.yobe.speech.BioListener.Deinit.

// java

bioListener.Deinit();

BioListenerSpeech (Real-Time Voice Extraction)

CAFE's real-time functionality is accessed via the com.yobe.speech.BioListenerSpeech class.

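Because BioListenerSpeech captures audio from the microphones itself, the RECORD_AUDIO permission from the Permissions section must also be granted at runtime: either prompt the user for it or enable it in the app's settings on the device. On Android 6.0 (API level 23) and higher, declaring the permission in the manifest alone is not enough.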

Initialization

A BioListenerSpeech object is created by passing its constructor an instance of a class that implements com.yobe.speech.AudioConsumer.

// java

BioListenerSpeech bioListenerSpeech = new BioListenerSpeech(new MyAudioConsumer()); // creation of object

See Define Real-Time Processing Callbacks for details on MyAudioConsumer.

Define Real-Time Processing Callbacks

Creating a com.yobe.speech.BioListenerSpeech object requires an object that implements the callback functions of the com.yobe.speech.AudioConsumer interface. Audio data is captured by the device's microphones, processed, and passed to the com.yobe.speech.AudioConsumer.onDataFeed callback one buffer at a time, so that it can be handled further in real time. See Audio Buffers for more details on audio buffers. After each processing step, the resulting status is passed to the com.yobe.speech.AudioConsumer.onResponse callback.

The output buffers are arrays of short values, and their size can vary from call to call, so code that consumes them should not assume a fixed length.

Note: The originalBuffer will contain two channels of interleaved audio data, while the processedBuffer will only contain one channel of audio data.
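To work with each channel of originalBuffer separately, the interleaved samples (c0, c1, c0, c1, ...) can be split into one array per channel. This is a sketch; Deinterleaver and splitStereo are hypothetical names, and which microphone corresponds to which channel index is an assumption to verify on your device.

```java
// java
// Hypothetical helper (not part of the Yobe SDK): splits interleaved two-channel
// audio, such as originalBuffer, into one array per channel.
final class Deinterleaver {
    private Deinterleaver() {}

    static short[][] splitStereo(short[] interleaved) {
        if (interleaved.length % 2 != 0) {
            throw new IllegalArgumentException("two-channel buffer must have an even length");
        }
        int frames = interleaved.length / 2;
        short[] channel0 = new short[frames];
        short[] channel1 = new short[frames];
        for (int i = 0; i < frames; i++) {
            channel0[i] = interleaved[2 * i];     // even indices: first channel
            channel1[i] = interleaved[2 * i + 1]; // odd indices: second channel
        }
        return new short[][] {channel0, channel1};
    }
}
```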

// java
import com.yobe.speech.*;
import com.yobe.speech.SpeechUtil.Status; // Status is nested in SpeechUtil

// create a class implementing the AudioConsumer callback functions
class MyAudioConsumer implements AudioConsumer {
    @Override
    public void onDataFeed(short[] originalBuffer, short[] processedBuffer) {
        // originalBuffer: two interleaved channels of captured audio
        // processedBuffer: one channel of processed audio
    }

    @Override
    public void onResponse(Status code) {
        // NEEDS_MORE_DATA during startup, then OK
    }
}
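Because onDataFeed is called for every buffer, it is generally wise to keep it short and move slow work (file writes, network streaming) to another thread. The sketch below assumes that pattern; AudioHandoff and its methods are hypothetical names, not part of the Yobe SDK, and the defensive copy guards against the possibility, not confirmed here, that the SDK reuses its buffers.

```java
// java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not part of the Yobe SDK): hands buffers from the audio
// callback to a worker thread through a bounded queue.
final class AudioHandoff {
    private final BlockingQueue<short[]> queue;
    private final AtomicInteger dropped = new AtomicInteger();

    AudioHandoff(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    // Call from onDataFeed: copy the buffer and enqueue it without blocking.
    // If the queue is full, the buffer is dropped and counted.
    void offer(short[] buffer) {
        short[] copy = Arrays.copyOf(buffer, buffer.length);
        if (!queue.offer(copy)) {
            dropped.incrementAndGet();
        }
    }

    // Call from the worker thread: waits up to timeoutMs for the next buffer
    // and returns null if none arrives in time.
    short[] poll(long timeoutMs) throws InterruptedException {
        return queue.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }

    int droppedCount() {
        return dropped.get();
    }
}
```

Inside MyAudioConsumer.onDataFeed you would call handoff.offer(processedBuffer), while a worker thread loops on handoff.poll(...). The bounded queue drops buffers instead of blocking the audio callback when the worker falls behind, and droppedCount() makes such losses visible.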

Start Processing

Processing is started via com.yobe.speech.BioListenerSpeech.Start and stopped via com.yobe.speech.BioListenerSpeech.Stop. Once processing has started, the callback functions are called with processed audio buffers in real time.

Note: A BioListenerSpeech object needs a startup time of 5 seconds after Start is called. Until that time has passed, onResponse reports com.yobe.speech.SpeechUtil.Status.NEEDS_MORE_DATA; afterwards, it reports com.yobe.speech.SpeechUtil.Status.OK.
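One way to act on these codes is to ignore processed audio until OK has been reported. The sketch below uses a local stand-in enum, WarmupStatus, containing only the two codes named above, so that it compiles without the SDK; in an app you would use com.yobe.speech.SpeechUtil.Status directly. WarmupGate is a hypothetical name.

```java
// java
// Stand-in for com.yobe.speech.SpeechUtil.Status with only the two codes named
// in the note above; replace it with the SDK's own type in an app.
enum WarmupStatus { NEEDS_MORE_DATA, OK }

// Hypothetical helper (not part of the Yobe SDK): remembers whether the most
// recent status reported through onResponse was OK.
final class WarmupGate {
    // volatile: onResponse and onDataFeed may run on different threads.
    private volatile boolean ready = false;

    // Call from onResponse with each reported status.
    void onStatus(WarmupStatus status) {
        ready = status == WarmupStatus.OK;
    }

    // Call from onDataFeed to decide whether to keep processedBuffer.
    boolean isReady() {
        return ready;
    }
}
```

Calling gate.onStatus(...) from onResponse and checking gate.isReady() in onDataFeed keeps output produced during the startup period out of downstream processing.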

// java

bioListenerSpeech.Start("YOBE_LICENSE"); // audio will start getting captured and processed

Clean Up

To stop and clean up the BioListenerSpeech object, call com.yobe.speech.BioListenerSpeech.Stop.

// java

bioListenerSpeech.Stop(); // after this call, audio is neither captured nor processed