These sections provide explanations and details on important terms used by the Yobe library.
Audio buffers are fixed-size arrays that contain audio sample data. They are passed into the Yobe processing functions, which output processed versions of the buffers. The audio buffer data must be 16-bit PCM encoded and interleaved: samples alternate between mic 1 and mic 2 (a mic 1 sample, then the corresponding mic 2 sample, and so on). The buffer duration in time can be retrieved using com.yobe.speech.Util.GetBufferSizeTime.
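The interleaving described above can be sketched as follows. This is an illustrative helper only, not part of the Yobe API; the mic arrays and sample values are made up for the example.

```java
// Sketch: building an interleaved, 16-bit PCM stereo buffer from two
// mono microphone streams. Not a Yobe API; for illustration only.
public class InterleaveExample {
    // Interleave two mono channels: mic1[0], mic2[0], mic1[1], mic2[1], ...
    static short[] interleave(short[] mic1, short[] mic2) {
        short[] out = new short[mic1.length * 2];
        for (int i = 0; i < mic1.length; i++) {
            out[2 * i] = mic1[i];       // mic 1 sample first
            out[2 * i + 1] = mic2[i];   // then the paired mic 2 sample
        }
        return out;
    }

    public static void main(String[] args) {
        short[] mic1 = {1, 2, 3};
        short[] mic2 = {-1, -2, -3};
        short[] buffer = interleave(mic1, mic2);
        System.out.println(java.util.Arrays.toString(buffer));
        // [1, -1, 2, -2, 3, -3]
    }
}
```

The resulting buffer is twice the length of each mono input, which is why buffer sizes for a two-mic capture are typically quoted per frame (one sample from each mic) rather than per array element.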
Automatic Speech Recognition (ASR) is the process of transcribing audio into text so that it can drive some downstream function. The Yobe library's output works well with various ASR engines because it provides them with noise-free audio data.
This is the audio capture scenario where the target voice is farther from the device under test than the noise to be suppressed.
This is the audio capture scenario where the target voice is closer to the device under test than the noise to be suppressed.
The voice template is an audio template associated with a specific user that can be used as a reference to identify that user.
The Yobe SDK makes a distinction between two different microphone configurations: end-fire and broadside. These names describe the physical placement of the microphones relative to the direction of the incoming speech (not the noise).
In the broadside configuration, the mics are positioned so that the line segment between them is orthogonal to the direction the speaker is talking from. As a result, the voice reaches both microphones at approximately the same time.
In the end-fire configuration, the mics are positioned so that the line segment between them is aligned with the direction the speaker is talking from. As a result, the sound reaches each microphone with a non-negligible time difference. This mimics a common scenario for hand-held devices, such as phones, where one mic is on the bottom and the other is on the top.
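The size of that end-fire time difference follows from basic acoustics: sound must travel the full mic spacing before reaching the second mic, so the delay is spacing divided by the speed of sound. The spacing and sample rate below are illustrative values, not Yobe parameters.

```java
// Rough estimate of the inter-microphone arrival delay in an end-fire
// configuration. All numbers are example values, not Yobe parameters.
public class EndFireDelay {
    public static void main(String[] args) {
        double micSpacingMeters = 0.15; // e.g. top-to-bottom mics on a phone
        double speedOfSound = 343.0;    // m/s in air at room temperature
        double sampleRate = 16000.0;    // Hz

        double delaySeconds = micSpacingMeters / speedOfSound;
        double delaySamples = delaySeconds * sampleRate;

        System.out.printf("delay = %.1f microseconds (~%.1f samples at 16 kHz)%n",
                delaySeconds * 1e6, delaySamples);
        // delay = 437.3 microseconds (~7.0 samples at 16 kHz)
    }
}
```

In a broadside configuration the same geometry gives a near-zero delay, which is why the two configurations must be handled differently.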
This value determines the size of the output buffer returned by the Yobe ProcessBuffer function in the LATTE variant. The output buffer can be of type FIXED or VARIABLE.
This type is recommended for most applications. The output buffer is always the same size, which is ideal for real-time scenarios where the processed audio is part of a processing chain designed to output in real time. An example application is automatic muting functionality.
This type is better suited to applications where delay is tolerable and expected. The size of the output buffer may change and must be checked at each ProcessBuffer call. An example application is one that sends processed audio to a cloud-based Automatic Speech Recognition (ASR) engine.
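The practical consequence of the VARIABLE type can be sketched with a read loop. The DummyEngine class and its processBuffer method below are placeholders invented for this illustration; they are not the real Yobe signatures, which should be taken from the SDK itself.

```java
// Hypothetical handling of a VARIABLE-size output buffer. DummyEngine and
// processBuffer are stand-ins for illustration, not the Yobe API.
import java.util.ArrayList;
import java.util.List;

public class VariableOutputExample {
    // Stand-in engine: accumulates input and flushes in 4-sample blocks,
    // so the returned length varies from call to call.
    static class DummyEngine {
        private final List<Short> pending = new ArrayList<>();

        short[] processBuffer(short[] input) {
            for (short s : input) pending.add(s);
            int flush = (pending.size() / 4) * 4;
            short[] out = new short[flush];
            for (int i = 0; i < flush; i++) out[i] = pending.get(i);
            pending.subList(0, flush).clear();
            return out;
        }
    }

    public static void main(String[] args) {
        DummyEngine engine = new DummyEngine();
        for (short[] chunk : new short[][]{{1, 2, 3}, {4, 5}, {6, 7, 8}}) {
            // With a VARIABLE buffer, the output length must be re-checked
            // on every call; it can even be zero while audio accumulates.
            short[] out = engine.processBuffer(chunk);
            System.out.println("output samples: " + out.length);
        }
        // Prints 0, then 4, then 4: the size varies per call.
    }
}
```

A FIXED-type consumer could skip the per-call length check, which is what makes FIXED simpler to slot into a real-time processing chain.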