The GrandE (Grand Engine) variant of the Yobe SDK combines all of the functionality available in the Yobe SDK. This includes the following capabilities and settings (note that some settings are mutually exclusive):
This guide is divided into sections, each describing a function of the Yobe SDK and how to access it within the GrandE variant.
Place the provided libraries and header files in a location that can be discovered by your application's build system.
The BioListener is meant for situations where a person is talking while there are noise sources (including other voices). The job of the BioListener is to extract the voice of a person while pushing the noise sources further into the auditory background.
Yobe::Create::NewBioListener is used to obtain a shared pointer to a new Yobe::BioListener instance. The instance must then be initialized via Yobe::BioListener::Init using the license provided by Yobe.
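A minimal creation-and-initialization sketch follows. The header name and the Init parameter list are assumptions (consult the headers shipped with your SDK); only the function names Yobe::Create::NewBioListener and Yobe::BioListener::Init come from this guide.

```cpp
#include <memory>
#include <string>
#include "yobe.h"  // assumed umbrella header name

// Sketch only: obtain a BioListener and initialize it with the
// Yobe-provided license. Init's exact signature is an assumption.
std::shared_ptr<Yobe::BioListener> MakeBioListener(const std::string& license) {
    auto listener = Yobe::Create::NewBioListener();
    // The listening target (Near- or Far-Listening) is also chosen
    // at initialization time (see below).
    listener->Init(license /*, listening target, ... */);
    return listener;
}
```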
The BioListener can be initialized while specifying either a Near-Listening or Far-Listening target. See their respective glossary entries for more information.
You can verify the selected voice target using the following function:
Audio data is passed into the Yobe::BioListener one buffer at a time. See Audio Buffers for more details on their format. As seen in the method signatures for the Yobe::BioListener::ProcessBuffer functions, the audio can be encoded as Double or PCM 16-bit Integer. The output buffer size can also vary from call to call.
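The per-buffer flow can be sketched as follows. The ProcessBuffer overload shown here (PCM 16-bit samples in, an output vector out) is an assumption based on the description above; `listener` is a previously initialized Yobe::BioListener.

```cpp
// Sketch: push one capture buffer at a time through the BioListener.
// The exact ProcessBuffer signature is an assumption; see the Yobe
// headers for the real overloads.
std::vector<int16_t> input_buffer = NextCaptureBuffer();  // hypothetical audio source
std::vector<int16_t> out_buffer;
auto status = listener->ProcessBuffer(input_buffer.data(),
                                      input_buffer.size(),
                                      out_buffer);
// out_buffer may differ in size from input_buffer and from previous
// calls, so always use its actual size after the call returns.
```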
out_buffer in the above example now contains the processed version of the audio that is contained in input_buffer. An example of what to do with this out_buffer is to append its contents to a stream or larger buffer.
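Because the output size can vary from call to call, accumulating each processed buffer into a growing stream buffer is a convenient pattern. A minimal standalone sketch using std::vector (independent of the Yobe SDK):

```cpp
#include <cstdint>
#include <vector>

// Append one processed output buffer to a growing stream buffer.
void AppendToStream(std::vector<int16_t>& stream,
                    const std::vector<int16_t>& out_buffer) {
    stream.insert(stream.end(), out_buffer.begin(), out_buffer.end());
}
```

After each ProcessBuffer call, pass the freshly filled out_buffer to AppendToStream; the stream vector then holds the full processed audio in order.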
Note: You can find the library's built-in buffer size using Yobe::Info::InputBufferSize.
To ensure the BioListener is properly deinitialized, simply call Yobe::BioListener::Deinit.
The IDListener is used to perform certain actions with the voice of a pre-enrolled user. The IDListener has two output targets that define its functionality:
The IDListener with the ASR output target detects the voice of a pre-enrolled user and extracts it from surrounding noise and human crosstalk. The software enrolls a user on the basis of 10-20 seconds of unscripted speech from that user. The IDListener provides high-quality speaker voice signals, enabling Automatic Speech Recognition (ASR) platforms to function properly and with a high degree of accuracy in extremely complex environments. A typical use case is an enrolled user talking to a kiosk or point-of-sale tablet in a noisy, crosstalk-heavy environment (e.g. restaurants, drive-thrus, shopping malls, and other public spaces).
The IDListener with the AVR output target performs AVR on the voice of a pre-enrolled user: it extracts that voice from surrounding noise and human crosstalk, and it determines when the pre-enrolled user is silent so that the entire signal can be muted at those times. As with the ASR target, enrollment is based on 10-20 seconds of unscripted speech from the user. The resulting Voice Template is used to determine the intervals during which the pre-enrolled user is not talking; during those intervals, the audio signal is muted. Typical use cases for this mode are:
Yobe::Create::NewIDListener is used to obtain a shared pointer to a new Yobe::IDListener instance. The instance must then be initialized using the license and a path to the initialization data, both provided by Yobe.
As with the BioListener, the IDListener is initialized for either Near-Listening or Far-Listening. The initialization additionally specifies whether the IDListener is in ASR or AVR output mode.
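A minimal initialization sketch, using the function names given in this guide. The parameter order and types for Init are assumptions; the license and the path to the initialization data are both provided by Yobe.

```cpp
// Sketch: create and initialize an IDListener. Init's exact
// signature is an assumption -- check the Yobe headers.
auto id_listener = Yobe::Create::NewIDListener();
id_listener->Init(license,
                  init_data_path
                  /*, Near- or Far-Listening target */
                  /*, ASR or AVR output mode */);
```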
You can verify the selected output mode using the following function:
A Yobe::BiometricTemplate must be created using the desired user's voice so the IDListener can select that template and identify the voice.
Register the user by inputting their voice audio data using Yobe::IDListener::RegisterTemplate. This is done using a contiguous array of audio samples.
It is recommended to first process the audio data so that only speech is present; this yields better identification results. To achieve this, the IDListener can be placed into Enrollment Mode. In this mode, the audio that will be used to create a BiometricTemplate is processed buffer by buffer using Yobe::IDListener::ProcessBuffer, which returns a status of Yobe::Status::ENROLLING for as long as Enrollment Mode is active. Enrollment Mode is started by calling Yobe::IDListener::StartEnrollment, and is stopped either by manually calling Yobe::IDListener::StopEnrollment or automatically, once an internal counter determines that enough buffers have been processed (currently, the equivalent of 20 seconds of audio).
Enrollment Mode can be started at any point after initial calibration. Samples processed while in Enrollment Mode are not matched for identification against the selected template, if there is one.
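The enrollment flow described above can be sketched as follows. `id_listener` is an initialized Yobe::IDListener; the buffer variables, the audio source, and the ProcessBuffer signature are assumptions.

```cpp
// Sketch of Enrollment Mode, using the calls named in this guide.
id_listener->StartEnrollment();
while (HaveMoreEnrollmentAudio()) {   // hypothetical audio source
    std::vector<int16_t> input = NextCaptureBuffer();  // hypothetical
    std::vector<int16_t> out;
    auto status = id_listener->ProcessBuffer(input.data(), input.size(), out);
    if (status != Yobe::Status::ENROLLING) {
        // ~20 s of audio processed: enrollment ended automatically.
        break;
    }
}
id_listener->StopEnrollment();  // stop manually if still enrolling
```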
Select the user using the template returned by the registration.
Any new audio buffers passed to Yobe::IDListener::ProcessBuffer while not in Enrollment Mode will be processed with respect to the selected user's voice.
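Putting registration and selection together, a sketch of the remaining steps; RegisterTemplate's signature and the selector call shown here (SelectUser) are assumptions, since this guide names the registration function but not the selection one.

```cpp
// Sketch: build a BiometricTemplate from a contiguous array of
// (speech-only) samples, then select it so subsequent ProcessBuffer
// calls match against this user's voice.
auto voice_template = id_listener->RegisterTemplate(enrollment_samples.data(),
                                                    enrollment_samples.size());
id_listener->SelectUser(voice_template);  // hypothetical selector name
```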
Audio data is passed into the Yobe::IDListener one buffer at a time. See Audio Buffers for more details on their format. As seen in the method signatures for the Yobe::IDListener::ProcessBuffer functions, the audio can be encoded as Double or PCM 16-bit Integer. The output buffer size can also vary from call to call, so your application should be prepared to handle buffers of different sizes; for the IDListener, the size changes only when there is a transition from the unauthorized to the authorized state.
out_buffer in the above example now contains the processed version of the audio that is contained in input_buffer. An example of what to do with this out_buffer is to append its contents to a stream or larger buffer.
Note: You can find the library's built-in buffer size using Yobe::Info::InputBufferSize.
To ensure proper cleanup of the IDListener, simply call Yobe::IDListener::Deinit.