Call Center SDK 1.10.0 Python API documentation
Passive voice spoof detection
- class voicesdk_cc.antispoof2.AntispoofEngine
Class for detecting spoofing attacks in audio with human speech.
- __init__(init_data_path: str)
AntispoofEngine constructor.
- Parameters
init_data_path (str) – Path to the directory containing engine init data.
- is_spoof_file(path_to_audio_file: str) -> antispoof2.AntispoofResult
Tests whether the given audio file contains spoofed speech.
- Parameters
path_to_audio_file (str) – Path to audio file.
- Returns
Antispoofing check result.
- Return type
AntispoofResult
- is_spoof_samples(*args, **kwargs)
Overloaded function.
is_spoof_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> antispoof2.AntispoofResult
Tests whether the given audio samples contain spoofed speech.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns
Antispoofing check result.
- Return type
AntispoofResult
is_spoof_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> antispoof2.AntispoofResult
Tests whether the given audio bytes contain spoofed speech.
- Parameters
samples (numpy.array) – PCM16 audio as a raw byte buffer (numpy.uint8).
sample_rate (int) – Audio sample rate in Hz.
- Returns
Antispoofing check result.
- Return type
AntispoofResult
is_spoof_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> antispoof2.AntispoofResult
Tests whether the given audio samples contain spoofed speech.
- Parameters
samples (numpy.array) – float (numpy.float32) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns
Antispoofing check result.
- Return type
AntispoofResult
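The numpy.uint8 overloads in this reference take PCM16 audio as a raw byte buffer rather than as 16-bit sample values. The following sketch illustrates that relationship using only the standard library. It assumes, without confirmation from this reference, that the bytes are little-endian with two bytes per sample (the layout used in PCM WAV files); pcm16_to_bytes and bytes_to_pcm16 are hypothetical helpers, not voicesdk_cc functions.

```python
import array
import sys


def pcm16_to_bytes(samples):
    """Pack PCM16 sample values into a little-endian byte buffer (2 bytes per sample).

    Assumption: the numpy.uint8 overloads expect exactly this layout.
    """
    buf = array.array("h", samples)  # "h": signed 16-bit integers
    if sys.byteorder == "big":
        buf.byteswap()  # enforce little-endian order on big-endian hosts
    return buf.tobytes()


def bytes_to_pcm16(data):
    """Unpack a little-endian PCM16 byte buffer into a list of sample values."""
    if len(data) % 2:
        raise ValueError("PCM16 byte buffer length must be even")
    buf = array.array("h")
    buf.frombytes(data)
    if sys.byteorder == "big":
        buf.byteswap()  # convert from little-endian storage to host order
    return buf.tolist()
```

With NumPy, the same byte layout can be obtained from an int16 array as `samples.astype('<i2').view(numpy.uint8)`.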
- class voicesdk_cc.antispoof2.AntispoofResult
Antispoofing check result returned by the AntispoofEngine.is_spoof_file() and AntispoofEngine.is_spoof_samples() methods.
- property score
Human (genuine speech) score.
- Type
float
- property score_Replay
Replay attack score.
- Type
float
- property score_TTS
Text-to-speech attack score.
- Type
float
- property score_VC
Voice conversion attack score.
- Type
float
- property unsuitable_input_message
Message explaining why the audio input is unsuitable for the spoof check.
- Type
str
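AntispoofResult exposes one score per class. As an illustration only, the sketch below picks the class with the highest score and skips results flagged as unsuitable. It assumes the four scores are directly comparable and that an empty unsuitable_input_message means the input was usable; neither is stated above, and a production system would more likely compare score against a calibrated threshold. dominant_label is a hypothetical helper, and the SimpleNamespace object only mimics the documented property names.

```python
from types import SimpleNamespace


def dominant_label(result):
    """Return the label of the highest AntispoofResult score, or None.

    Assumptions: the four scores are comparable with each other, and a
    non-empty unsuitable_input_message means the check should be ignored.
    """
    if result.unsuitable_input_message:
        return None
    scores = {
        "human": result.score,
        "replay": result.score_Replay,
        "tts": result.score_TTS,
        "vc": result.score_VC,
    }
    return max(scores, key=scores.get)


# Stand-in object with the AntispoofResult property names, for illustration only.
example = SimpleNamespace(score=0.1, score_Replay=0.7, score_TTS=0.15,
                          score_VC=0.05, unsuitable_input_message="")
print(dominant_label(example))  # replay
```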
Specific speaker attributes retrieval
- class voicesdk_cc.attributes.Attributes
Estimated person attributes class.
- property age
Estimated age in years.
- Type
int
- property gender_score
Raw gender score (higher values correspond to male, lower values to female).
- Type
float
- property phone_call_participant
Estimated phone call participant class (deprecated; always equals UNDEFINED).
- Type
PhoneCallParticipant
- class voicesdk_cc.attributes.AttributesEstimator
Class for estimating person attributes (age and gender) from their voice.
- __init__(init_data_path: str)
Attributes estimator constructor.
- Parameters
init_data_path (str) – Path to the directory containing init data.
- estimate_with_file(path_to_wav_file: str) -> attributes.Attributes
Estimates person attributes using the given voice sample.
- Parameters
path_to_wav_file (str) – Path to WAV file.
- Returns
Estimated attributes.
- Return type
Attributes
- estimate_with_samples(*args, **kwargs)
Overloaded function.
estimate_with_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> attributes.Attributes
Estimates person attributes using the given voice sample.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Input audio signal sample rate in Hz.
- Returns
Estimated attributes.
- Return type
Attributes
estimate_with_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> attributes.Attributes
Estimates person attributes using the given voice sample.
- Parameters
samples (numpy.array) – PCM16 audio as a raw byte buffer (numpy.uint8).
sample_rate (int) – Input audio signal sample rate in Hz.
- Returns
Estimated attributes.
- Return type
Attributes
estimate_with_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> attributes.Attributes
Estimates person attributes using the given voice sample.
- Parameters
samples (numpy.array) – float (numpy.float32) audio samples.
sample_rate (int) – Input audio signal sample rate in Hz.
- Returns
Estimated attributes.
- Return type
Attributes
- class voicesdk_cc.attributes.Gender
Enumeration representing human gender.
- MALE
Male.
- FEMALE
Female.
Members:
MALE
FEMALE
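Attributes.gender_score is a raw score rather than a Gender value. The sketch below maps it to a Gender member name given a decision threshold. The threshold is a hypothetical, caller-supplied calibration value because this reference does not document one, and classify_gender is not an SDK function.

```python
def classify_gender(gender_score, threshold, gender_enum=None):
    """Map a raw Attributes.gender_score to a Gender member (or its name).

    Higher scores correspond to male, lower scores to female. `threshold` is a
    hypothetical, caller-chosen decision point; none is documented here.
    """
    name = "MALE" if gender_score >= threshold else "FEMALE"
    # If voicesdk_cc.attributes.Gender is passed in, return the enum member.
    return getattr(gender_enum, name) if gender_enum is not None else name
```

Passing `gender_enum=voicesdk_cc.attributes.Gender` returns `Gender.MALE` or `Gender.FEMALE` instead of the plain name.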
Callcenter SDK specific routines
- class voicesdk_cc.callcenter.BuildInfo
Structure containing the current VoiceSDK CC build info.
- property components
VoiceSDK components present in the build.
- Type
str
- property git_info
Git info dump at the build stage.
- Type
str
- property license_expiration_date
License expiration date in YYYY-MM-DD format.
- Type
str
- property license_info
Information (e.g. expiration date) about the installed license if available, or an empty string if no license is in use. Deprecated; use license_expiration_date instead.
- Type
str
- property version
VoiceSDK build version.
- Type
str
- voicesdk_cc.callcenter.get_build_info() -> callcenter.BuildInfo
Returns the current VoiceSDK CC build info.
- Returns
Current VoiceSDK CC build info.
- Return type
BuildInfo
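BuildInfo.license_expiration_date is documented as a YYYY-MM-DD string, so it can be parsed with the standard library to warn before the license runs out. A minimal sketch; days_until_license_expiry is a hypothetical helper, and treating an empty string as "no license" is an assumption carried over from the deprecated license_info property.

```python
from datetime import date


def days_until_license_expiry(expiration_date, today=None):
    """Days left until the license expires (negative once it has expired).

    expiration_date is BuildInfo.license_expiration_date ("YYYY-MM-DD").
    Assumption: an empty string means no license is in use, so None is returned.
    """
    if not expiration_date:
        return None
    expires = date.fromisoformat(expiration_date)
    return (expires - (today or date.today())).days
```

For example: `days_until_license_expiry(voicesdk_cc.callcenter.get_build_info().license_expiration_date)`.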
Utility functions for working with Opus data.
- voicesdk_cc.core.opusutils.read_as_pcm16_samples_from_memory(opus_file: str) -> tuple
Decodes a complete Opus file held in a memory buffer into a PCM16 samples buffer.
- Parameters
opus_file (bytes) – Memory buffer containing complete Opus file contents.
- Returns
Tuple of a numpy.array (numpy.int16) with the decoded samples and the signal sample rate.
- Return type
tuple
Core functionality needed in more than one module.
- class voicesdk_cc.core.AudioInterval
Class representing an interval of audio data.
- __init__(*args, **kwargs)
Overloaded function.
__init__(start_sample: int, end_sample: int, sample_rate: int)
__init__(time_interval: voicesdk::TimeInterval, sample_rate: int)
- property end_sample
Index of the sample where the interval ends (exclusive).
- property end_time
Timestamp in milliseconds where the AudioInterval ends (exclusive).
- property sample_rate
Sample rate of the corresponding audio, in Hz.
- property start_sample
Index of the sample where the interval starts.
- property start_time
Timestamp in milliseconds where the AudioInterval starts.
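Because end_sample is exclusive, an AudioInterval maps directly onto a Python slice of the sample buffer it was computed from. The sketch below shows that, plus the usual sample-index-to-milliseconds conversion; it assumes start_time and end_time are derived as sample * 1000 / sample_rate, which this reference implies but does not spell out. Both functions are hypothetical helpers and accept any object with the documented AudioInterval properties.

```python
def sample_to_ms(sample_index, sample_rate):
    """Convert a sample index to a timestamp in milliseconds."""
    return sample_index * 1000 / sample_rate


def slice_interval(samples, interval, sample_rate):
    """Return the part of `samples` covered by an AudioInterval-like object.

    end_sample is exclusive, which matches Python slice semantics, so no +1 is needed.
    """
    if interval.sample_rate != sample_rate:
        raise ValueError("interval and audio buffer use different sample rates")
    return samples[interval.start_sample:interval.end_sample]
```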
- class voicesdk_cc.core.ChannelType
Enumeration for audio source labeling during voice template creation.
- MIC
Microphone audio channel.
- TEL
Telephone audio channel.
- MIXED
Mixed audio channel.
Members:
MIC
TEL
MIXED
- class voicesdk_cc.core.VoiceTemplate
Voice template class.
- __init__(bytes: bytes)
Constructs a voice template from its serialized representation.
- Parameters
bytes (bytes) – Bytes object with serialized voice template.
- static deserialize(bytes: bytes) -> core.VoiceTemplate
Factory method; deserializes a voice template from bytes.
- Parameters
bytes (bytes) – Bytes object with serialized voice template.
- Returns
Voice template instance.
- Return type
VoiceTemplate
- get_channel_type() -> core.ChannelType
Returns the channel type specified by the user when the voice template was created.
- Returns
Channel type.
- Return type
ChannelType
- get_init_data_id() -> str
Returns the ID of the init data used to create the template.
- Returns
A string containing the init data ID.
- Return type
str
- is_valid() -> bool
Checks whether the voice template is valid.
- Returns
True if valid, False otherwise.
- Return type
bool
- static load_from_file(path_to_file: str) -> core.VoiceTemplate
Factory method; restores a voice template from the given file.
- Parameters
path_to_file (str) – Path to template file.
- Returns
Voice template instance.
- Return type
VoiceTemplate
- save_to_file(path_to_file: str)
Saves the voice template to a file at the given path.
- Parameters
path_to_file (str) – Path to template file.
- serialize() -> bytes
Serializes voice template to bytes.
- Returns
Serialized voice template.
- Return type
bytes
- class voicesdk_cc.core.VoiceTemplateConverter
Voice template conversion class.
- __init__(init_data_path: str)
VoiceTemplateConverter constructor.
- Parameters
init_data_path (str) – Path to the directory containing init data.
- convert_voice_template(voice_template: core.VoiceTemplate) -> core.VoiceTemplate
Converts a voice template from one configuration to another.
- Parameters
voice_template (VoiceTemplate) – Voice template to be converted.
- Returns
Converted voice template.
- Return type
VoiceTemplate
- get_input_init_data_id() -> str
Returns the init data ID that the voice template to be converted must have.
- Returns
A string containing init data ID of the voice template to be converted.
- Return type
str
- get_output_init_data_id() -> str
Returns the init data ID that the converted voice template will have.
- Returns
A string containing init data ID of the converted voice template.
- Return type
str
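The two init data ID getters make it possible to check whether a stored template needs converting before use. A sketch, assuming templates whose ID matches neither side of the converter should be rejected; to_target_config is a hypothetical helper that only calls the documented VoiceTemplate and VoiceTemplateConverter methods.

```python
def to_target_config(template, converter):
    """Return a template whose init data ID matches the converter's output ID."""
    template_id = template.get_init_data_id()
    if template_id == converter.get_output_init_data_id():
        return template  # already in the target configuration
    if template_id == converter.get_input_init_data_id():
        return converter.convert_voice_template(template)
    raise ValueError(f"cannot convert template with init data ID {template_id!r}")
```

With the SDK, `voicesdk_cc.core.VoiceTemplateConverter(init_data_path)` and `voicesdk_cc.core.VoiceTemplate.load_from_file(path)` would supply the two arguments.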
- voicesdk_cc.core.set_num_threads(arg0: int)
Sets the maximum number of threads available to VoiceSDK. If 0 is passed, the optimal number of threads is detected automatically (the same effect is achieved if set_num_threads is not called).
- Parameters
num_threads (int) – Maximum number of threads available to VoiceSDK.
- voicesdk_cc.core.set_use_voice_template_compression(arg0: bool)
Sets whether to use compression for voice template serialization. Voice template compression is not used by default.
- Parameters
use_voice_template_compression (bool) – Whether to use compression for voice templates serialization.
Utility functions for working with WAV data.
- voicesdk_cc.core.wavutils.read_as_float_samples(wav_file_path: object) -> tuple
Reads a WAV file as a float samples buffer (the WAV file can be of any format).
- Parameters
wav_file_path (str or pathlib.Path) – Path to WAV file.
- Returns
Audio data and sample rate.
- Return type
tuple(numpy.ndarray, int)
- voicesdk_cc.core.wavutils.read_as_float_samples_16bit(wav_file_path: object) -> tuple
Reads a WAV file as a float samples buffer with 16-bit precision (the WAV file can be of any format).
- Parameters
wav_file_path (str or pathlib.Path) – Path to WAV file.
- Returns
Audio data and sample rate.
- Return type
tuple(numpy.ndarray, int)
- voicesdk_cc.core.wavutils.read_as_pcm16_bytes(wav_file_path: object) -> tuple
Reads a WAV file as a PCM16 bytes buffer (the WAV file can be of any format).
- Parameters
wav_file_path (str or pathlib.Path) – Path to WAV file.
- Returns
Audio data and sample rate.
- Return type
tuple(numpy.ndarray, int)
- voicesdk_cc.core.wavutils.read_as_pcm16_samples(wav_file_path: object) -> tuple
Reads a WAV file as a PCM16 samples buffer (the WAV file can be of any format).
- Parameters
wav_file_path (str or pathlib.Path) – Path to WAV file.
- Returns
Audio data and sample rate.
- Return type
tuple(numpy.ndarray, int)
Speaker distinction detection
- class voicesdk_cc.diarization.DiarizationEngine
Diarization engine class, an entry point for speaker diarization.
- __init__(init_data_path: str)
DiarizationEngine constructor.
- Parameters
init_data_path (str) – Path to the directory containing engine init data.
- get_segmentation_from_file(path_to_wav_file: str, num_speakers: int = 0) -> Dict[int, List[core.AudioInterval]]
Performs speaker diarization of the given WAV file.
- Parameters
path_to_wav_file (str) – Path to WAV file.
num_speakers (int, optional) – Optional number of speakers (default: 0).
- Returns
Dict mapping each speaker key to a list of that speaker's AudioInterval objects.
- Return type
dict
- get_segmentation_from_samples(*args, **kwargs)
Overloaded function.
get_segmentation_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int, num_speakers: int = 0) -> Dict[int, List[core.AudioInterval]]
Performs speaker diarization of the given PCM16 samples.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Audio sample rate in Hz.
num_speakers (int, optional) – Optional number of speakers (default: 0).
- Returns
Dict mapping each speaker key to a list of that speaker's AudioInterval objects.
- Return type
dict
get_segmentation_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int, num_speakers: int = 0) -> Dict[int, List[core.AudioInterval]]
Performs speaker diarization of the given PCM16 audio bytes.
- Parameters
samples (numpy.array) – PCM16 audio as a raw byte buffer (numpy.uint8).
sample_rate (int) – Audio sample rate in Hz.
num_speakers (int, optional) – Optional number of speakers (default: 0).
- Returns
Dict mapping each speaker key to a list of that speaker's AudioInterval objects.
- Return type
dict
get_segmentation_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int, num_speakers: int = 0) -> Dict[int, List[core.AudioInterval]]
Performs speaker diarization of the given float samples.
- Parameters
samples (numpy.array) – float (numpy.float32) audio samples.
sample_rate (int) – Audio sample rate in Hz.
num_speakers (int, optional) – Optional number of speakers (default: 0).
- Returns
Dict mapping each speaker key to a list of that speaker's AudioInterval objects.
- Return type
dict
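The diarization result is a dict from speaker key to AudioInterval list, so per-speaker talk time is a sum over intervals. A minimal sketch; speech_time_per_speaker is a hypothetical helper, and it relies only on the documented start_time and end_time properties (end is exclusive, so end_time - start_time is the duration in milliseconds).

```python
def speech_time_per_speaker(segmentation):
    """Sum each speaker's interval durations (ms) in a diarization result.

    `segmentation` is the dict returned by get_segmentation_from_file() or
    get_segmentation_from_samples(): speaker key -> list of AudioInterval.
    """
    return {
        speaker: sum(iv.end_time - iv.start_time for iv in intervals)
        for speaker, intervals in segmentation.items()
    }
```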
Audio/speech utilities and estimators
- class voicesdk_cc.media.QualityCheckEngine
Quality check engine class.
- __init__(init_data_path: str)
QualityCheckEngine constructor.
- Parameters
init_data_path (str) – Path to the directory containing initialization data.
- check_quality_from_file(path_to_audio_file: str, thresholds: media.QualityCheckMetricsThresholds) -> media.QualityCheckEngineResult
Checks whether the given audio file is suitable from the quality perspective.
- Parameters
path_to_audio_file (str) – Path to audio file.
thresholds (QualityCheckMetricsThresholds) – Quality check thresholds applied to the output quality check metrics.
- Returns
Quality check result.
- Return type
QualityCheckEngineResult
- check_quality_from_samples(*args, **kwargs)
Overloaded function.
check_quality_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int, thresholds: media.QualityCheckMetricsThresholds) -> media.QualityCheckEngineResult
Checks whether the given PCM16 samples are suitable from the quality perspective.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Audio sample rate in Hz.
thresholds (QualityCheckMetricsThresholds) – Quality check thresholds applied to the output quality check metrics.
- Returns
Quality check result.
- Return type
QualityCheckEngineResult
check_quality_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int, thresholds: media.QualityCheckMetricsThresholds) -> media.QualityCheckEngineResult
Checks whether the given PCM16 audio bytes are suitable from the quality perspective.
- Parameters
samples (numpy.array) – PCM16 audio as a raw byte buffer (numpy.uint8).
sample_rate (int) – Audio sample rate in Hz.
thresholds (QualityCheckMetricsThresholds) – Quality check thresholds applied to the output quality check metrics.
- Returns
Quality check result.
- Return type
QualityCheckEngineResult
check_quality_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int, thresholds: media.QualityCheckMetricsThresholds) -> media.QualityCheckEngineResult
Checks whether the given float audio samples are suitable from the quality perspective.
- Parameters
samples (numpy.array) – Float (numpy.float32) audio samples.
sample_rate (int) – Audio sample rate in Hz.
thresholds (QualityCheckMetricsThresholds) – Quality check thresholds applied to the output quality check metrics.
- Returns
Quality check result.
- Return type
QualityCheckEngineResult
- get_recommended_thresholds(scenario: media.QualityCheckScenario) -> media.QualityCheckMetricsThresholds
Returns the recommended quality check thresholds for the given scenario.
- Parameters
scenario (QualityCheckScenario) – Scenario for which recommended thresholds are returned.
- Returns
Quality check thresholds that can be passed to the quality check methods.
- Return type
QualityCheckMetricsThresholds
- class voicesdk_cc.media.QualityCheckEngineResult
Quality check result class.
- property multiple_speakers_detector_score
Multiple speakers detector score value obtained on quality check.
- Type
float
- property quality_check_short_description
Short description of the quality check results.
- Type
QualityCheckShortDescription
- property snr_db
SNR metric value obtained on quality check, in dB.
- Type
float
- property speech_length_ms
Speech length metric value obtained on quality check, in milliseconds.
- Type
float
- property speech_relative_length
Speech relative length (speech length relative to the total audio length) metric value obtained on quality check.
- Type
float
- class voicesdk_cc.media.QualityCheckMetricsThresholds
Class for quality check thresholds.
- __init__(*args, **kwargs)
Overloaded function.
__init__()
__init__(minimum_snr_db: float, minimum_speech_length_ms: float, minimum_speech_relative_length: float, maximum_multiple_speakers_detector_score: float)
- Parameters
minimum_snr_db (float) – Minimum signal-to-noise ratio required to pass quality check, in dB.
minimum_speech_length_ms (float) – Minimum speech length required to pass quality check, in milliseconds.
minimum_speech_relative_length (float) – Minimum speech relative length required to pass quality check.
maximum_multiple_speakers_detector_score (float) – Maximum multiple speakers detector score allowed to pass quality check.
- property maximum_multiple_speakers_detector_score
Maximum multiple speakers detector score allowed to pass quality check.
- Type
float
- property minimum_snr_db
Minimum signal-to-noise ratio required to pass quality check, in dB.
- Type
float
- property minimum_speech_length_ms
Minimum speech length required to pass quality check, in milliseconds.
- Type
float
- property minimum_speech_relative_length
Minimum speech relative length (speech length relative to the total audio length) required to pass quality check.
- Type
float
- class voicesdk_cc.media.QualityCheckScenario
Enumeration representing scenarios used to get recommended quality check thresholds.
- VERIFY_TI_ENROLLMENT
Verification, text-independent (TI) enrollment step.
- VERIFY_TI_VERIFICATION
Verification, text-independent (TI) verification step.
- VERIFY_TD_ENROLLMENT
Verification, text-dependent (TD) enrollment step.
- VERIFY_TD_VERIFICATION
Verification, text-dependent (TD) verification step.
- LIVENESS
Liveness check.
Members:
VERIFY_TI_ENROLLMENT
VERIFY_TI_VERIFICATION
VERIFY_TD_ENROLLMENT
VERIFY_TD_VERIFICATION
LIVENESS
- class voicesdk_cc.media.QualityCheckShortDescription
Enumeration representing short descriptions of the audio quality check results.
- TOO_NOISY
The audio is too noisy.
- TOO_SMALL_SPEECH_TOTAL_LENGTH
The total speech length in the audio is too short.
- TOO_SMALL_SPEECH_RELATIVE_LENGTH
The speech relative length (speech length relative to the total audio length) is too small.
- MULTIPLE_SPEAKERS_DETECTED
Multiple speakers were detected.
- OK
The audio passed the quality check.
Members:
TOO_NOISY
TOO_SMALL_SPEECH_TOTAL_LENGTH
TOO_SMALL_SPEECH_RELATIVE_LENGTH
MULTIPLE_SPEAKERS_DETECTED
OK
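The thresholds documented for QualityCheckMetricsThresholds can be read as four pass/fail rules on a QualityCheckEngineResult. The sketch below re-applies them client-side to report every failing metric, for example to log why a recording was rejected. It is an approximation: the labels reuse QualityCheckShortDescription member names, but the engine's own quality_check_short_description may order or combine failures differently, and failed_checks is a hypothetical helper.

```python
def failed_checks(result, thresholds):
    """List the QualityCheckMetricsThresholds rules that a result fails.

    Minimum thresholds must be reached; the multiple-speakers score must not
    exceed its maximum. Labels mirror QualityCheckShortDescription names.
    """
    failures = []
    if result.snr_db < thresholds.minimum_snr_db:
        failures.append("TOO_NOISY")
    if result.speech_length_ms < thresholds.minimum_speech_length_ms:
        failures.append("TOO_SMALL_SPEECH_TOTAL_LENGTH")
    if result.speech_relative_length < thresholds.minimum_speech_relative_length:
        failures.append("TOO_SMALL_SPEECH_RELATIVE_LENGTH")
    if (result.multiple_speakers_detector_score
            > thresholds.maximum_multiple_speakers_detector_score):
        failures.append("MULTIPLE_SPEAKERS_DETECTED")
    return failures
```

Thresholds would typically come from `engine.get_recommended_thresholds(voicesdk_cc.media.QualityCheckScenario.VERIFY_TI_ENROLLMENT)`.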
- class voicesdk_cc.media.SNRComputer
SNRComputer class; computes the signal-to-noise ratio (SNR) of a given audio signal.
- __init__(init_data_path: str)
SNRComputer constructor.
- Parameters
init_data_path (str) – Path to the directory containing init data.
- compute_with_file(path_to_audio_file: str) -> float
Calculates the SNR of the given audio file.
- Parameters
path_to_audio_file (str) – Path to audio file.
- Returns
Computed SNR in dB.
- Return type
float
- compute_with_samples(*args, **kwargs)
Overloaded function.
compute_with_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> float
Calculates the SNR of the given PCM16 audio samples.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Input audio signal sample rate in Hz.
- Returns
Computed SNR in dB.
- Return type
float
compute_with_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> float
Calculates the SNR of the given PCM16 audio bytes.
- Parameters
samples (numpy.array) – PCM16 audio as a raw byte buffer (numpy.uint8).
sample_rate (int) – Input audio signal sample rate in Hz.
- Returns
Computed SNR in dB.
- Return type
float
compute_with_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> float
Calculates the SNR of the given float audio samples.
- Parameters
samples (numpy.array) – float (numpy.float32) audio samples.
sample_rate (int) – Input audio signal sample rate in Hz.
- Returns
Computed SNR in dB.
- Return type
float
- class voicesdk_cc.media.SpeechEndpointDetector
Speech processor class for speech endpoint detection.
- __init__(min_speech_length_ms: int, max_silence_length_ms: int, sample_rate: int)
SpeechEndpointDetector constructor.
- Parameters
min_speech_length_ms (int) – Minimum speech length required to begin speech end detection (ms).
max_silence_length_ms (int) – Silence after speech threshold used to determine if speech has already ended (ms).
sample_rate (int) – Input audio sample rate in Hz.
- add_samples(*args, **kwargs)
Overloaded function.
add_samples(samples: numpy.ndarray[numpy.int16])
Adds new audio samples to the SpeechEndpointDetector.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
Note: The audio sample rate is set in the constructor.
add_samples(samples: numpy.ndarray[numpy.uint8])
Adds new audio samples to the SpeechEndpointDetector.
- Parameters
samples (numpy.array) – PCM16 audio as a raw byte buffer (numpy.uint8).
Note: The audio sample rate is set in the constructor.
add_samples(samples: numpy.ndarray[numpy.float32])
Adds new audio samples to the SpeechEndpointDetector.
- Parameters
samples (numpy.array) – float (numpy.float32) audio samples.
Note: The audio sample rate is set in the constructor.
- is_speech_ended() -> bool
Returns the detection state.
- Returns
True if speech end was detected, False otherwise.
- Return type
bool
- reset()
Resets the detector state.
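SpeechEndpointDetector is fed incrementally and polled with is_speech_ended(). The sketch below simulates that streaming loop over an already-loaded buffer. The chunk size is a hypothetical choice (for example 20 ms, i.e. sample_rate // 50 samples), and feed_until_speech_end is not an SDK function; it only uses the documented add_samples() and is_speech_ended() methods, so it works with any sliceable sample buffer.

```python
def feed_until_speech_end(detector, samples, chunk_size):
    """Feed `samples` to a SpeechEndpointDetector in fixed-size chunks.

    Returns the number of samples consumed when speech end was detected,
    or None if the audio ran out first.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(samples), chunk_size):
        chunk = samples[start:start + chunk_size]
        detector.add_samples(chunk)
        if detector.is_speech_ended():
            return start + len(chunk)
    return None
```

Calling reset() on the detector before reusing it for the next utterance clears the previous detection state.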
- class voicesdk_cc.media.SpeechEndpointDetectorOpus
Speech processor class for speech endpoint detection in an Opus audio stream.
- __init__(min_speech_length_ms: int, max_silence_length_ms: int, sample_rate: int)
SpeechEndpointDetectorOpus constructor.
- Parameters
min_speech_length_ms (int) – Minimum speech length required to begin speech end detection (ms).
max_silence_length_ms (int) – Silence after speech threshold used to determine if speech has already ended (ms).
sample_rate (int) – Input audio sample rate in Hz.
- add_packet(bytes: numpy.ndarray[numpy.uint8])
Adds an Opus packet to the SpeechEndpointDetectorOpus.
- Parameters
bytes (numpy.array) – A buffer containing a single Opus packet. The packet is expected to contain data for a single mono stream.
Note: The audio sample rate is set in the constructor.
- is_speech_ended() -> bool
Returns the detection state.
- Returns
True if speech end was detected, False otherwise.
- Return type
bool
- reset()
Resets the detector state.
- class voicesdk_cc.media.SpeechEvent
Class representing a single speech event.
- property audio_interval
Speech event audio interval.
- Type
AudioInterval
- property is_voice
Whether the frame contains speech or not.
- Type
bool
- class voicesdk_cc.media.SpeechInfo
Class that contains metrics related to Voice Activity Detection.
- property background_length_ms
Total length of non-speech signal, in milliseconds.
- Type
float
- property speech_length_ms
Total accumulated speech duration, in milliseconds.
- Type
float
- property total_length_ms
Total audio record duration, in milliseconds.
- Type
float
- class voicesdk_cc.media.SpeechSummary
Speech summary class.
- property speech_events
Retrieves the list of speech events.
- Returns
Speech events.
- Return type
list
- property speech_info
Retrieves speech info data.
- Returns
Speech info.
- Return type
SpeechInfo
- class voicesdk_cc.media.SpeechSummaryEngine
Speech summary engine class; computes a SpeechSummary for the given audio.
- __init__(init_data_path: str)
SpeechSummaryEngine constructor.
- Parameters
init_data_path (str) – Path to the directory containing engine init data.
- create_stream(stream_sample_rate: int) -> voicesdk::media::python::SpeechSummaryStreamPy
Factory method that creates a SpeechSummaryStream.
- Parameters
stream_sample_rate (int) – Audio stream sample rate in Hz.
- Returns
Created speech summary stream.
- Return type
SpeechSummaryStream
- get_speech_summary_from_file(path_to_audio_file: str) -> voicesdk::SpeechSummary
Calculates the speech summary of the given audio file.
- Parameters
path_to_audio_file (str) – Path to audio file.
- Returns
Speech summary.
- Return type
SpeechSummary
- get_speech_summary_from_samples(*args, **kwargs)
Overloaded function.
get_speech_summary_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> voicesdk::SpeechSummary
Calculates the speech summary of the given PCM16 audio samples.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns
Speech summary.
- Return type
SpeechSummary
get_speech_summary_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> voicesdk::SpeechSummary
Calculates the speech summary of the given PCM16 audio bytes.
- Parameters
samples (numpy.array) – PCM16 audio as a raw byte buffer (numpy.uint8).
sample_rate (int) – Audio sample rate in Hz.
- Returns
Speech summary.
- Return type
SpeechSummary
get_speech_summary_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> voicesdk::SpeechSummary
Calculates the speech summary of the given float audio samples.
- Parameters
samples (numpy.array) – float (numpy.float32) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns
Speech summary.
- Return type
SpeechSummary
- class voicesdk_cc.media.SpeechSummaryStream
Stateful speech summary stream class that computes a SpeechSummary from audio samples supplied incrementally. Instances are obtained from SpeechSummaryEngine.create_stream().
- add_samples(*args, **kwargs)
Overloaded function.
add_samples(samples: numpy.ndarray[numpy.int16])
Adds PCM16 audio samples to process.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
Note: The audio sample rate is set when the stream is created.
add_samples(samples: numpy.ndarray[numpy.uint8])
Adds PCM16 audio bytes to process.
- Parameters
samples (numpy.array) – PCM16 audio as a raw byte buffer (numpy.uint8).
Note: The audio sample rate is set when the stream is created.
add_samples(samples: numpy.ndarray[numpy.float32])
Adds float audio samples to process.
- Parameters
samples (numpy.array) – float (numpy.float32) audio samples.
Note: The audio sample rate is set when the stream is created.
- finalize()
Finalizes the input audio stream, processing the remaining audio samples and producing a result if possible.
- get_current_background_length() -> float
Returns the current background length in milliseconds.
- Returns
Current background length in milliseconds.
- Return type
float
- get_speech_event() -> media.SpeechEvent
Retrieves a speech event from the output queue.
- Returns
One speech event.
- Return type
SpeechEvent
- Raises
RuntimeError – If the output queue is empty.
Note: Use has_speech_events() to check whether speech events are available.
- get_total_speech_info() -> media.SpeechInfo
Retrieves accumulated speech info data.
- Returns
Speech info.
- Return type
SpeechInfo
- get_total_speech_summary() -> voicesdk::SpeechSummary
Retrieves accumulated speech summary data.
- Returns
Speech summary.
- Return type
SpeechSummary
- has_speech_events() -> bool
Checks whether any speech events are present in the stream queue.
- Returns
True if any events are present, False otherwise.
- Return type
bool
- reset()
Resets stream state: clears the buffer and resets the speech summary.
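Because get_speech_event() raises RuntimeError on an empty queue, events are best drained with a has_speech_events() loop after each add_samples() call and once more after finalize(). A sketch of that pattern; process_in_chunks and drain_speech_events are hypothetical helpers, and draining after finalize() assumes finalize() may enqueue final events, which the description above suggests but does not guarantee.

```python
def drain_speech_events(stream):
    """Pop every queued SpeechEvent; has_speech_events() guards against RuntimeError."""
    events = []
    while stream.has_speech_events():
        events.append(stream.get_speech_event())
    return events


def process_in_chunks(stream, samples, chunk_size):
    """Feed `samples` to a SpeechSummaryStream chunk by chunk, collecting events."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    events = []
    for start in range(0, len(samples), chunk_size):
        stream.add_samples(samples[start:start + chunk_size])
        events.extend(drain_speech_events(stream))
    stream.finalize()  # flush remaining audio; may enqueue final events
    events.extend(drain_speech_events(stream))
    return events
```

For SpeechSummaryStreamOpus the same draining loop applies, with add_packet() called once per Opus packet instead of add_samples().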
- class voicesdk_cc.media.SpeechSummaryStreamOpus
Stateful speech summary stream class that computes a SpeechSummary from audio supplied as an Opus packet stream. Instances are created with the SpeechSummaryStreamOpus constructor.
- __init__(init_data_path: str, sample_rate: int)
SpeechSummaryStreamOpus constructor.
- Parameters
init_data_path (str) – Path to the directory containing engine init data.
sample_rate (int) – Audio stream sample rate in Hz.
- add_packet(bytes: numpy.ndarray[numpy.uint8])
Adds an Opus packet to the SpeechSummaryStreamOpus.
- Parameters
bytes (numpy.array) – A buffer containing a single Opus packet. The packet is expected to contain data for a single mono stream.
Note: The audio sample rate is set in the constructor.
- finalize()
Finalizes the input audio stream, processing the remaining audio samples and producing a result if possible.
- get_current_background_length() -> float
Returns the current background length in milliseconds.
- Returns
Current background length in milliseconds.
- Return type
float
- get_speech_event() -> media.SpeechEvent
Retrieves a speech event from the output queue.
- Returns
One speech event.
- Return type
SpeechEvent
- Raises
RuntimeError – If the output queue is empty.
Note: Use has_speech_events() to check whether speech events are available.
- get_total_speech_info() -> media.SpeechInfo
Retrieves accumulated speech info data.
- Returns
Speech info.
- Return type
SpeechInfo
- get_total_speech_summary() -> voicesdk::SpeechSummary
Retrieves accumulated speech summary data.
- Returns
Speech summary.
- Return type
SpeechSummary
- has_speech_events() -> bool
Checks whether any speech events are present in the stream queue.
- Returns
True if any events are present, False otherwise.
- Return type
bool
- reset()
Resets stream state: clears the buffer and resets the speech summary.
Enrollment and verification for speaker audio
- class voicesdk_cc.verify.VerifyResult
Voice verification result class.
- property probability
Voice matching probability in the range [0, 1]; use it to make the biometric authentication decision.
- Type
float
- property score
Raw verification score, intended for evaluation and data-wise calibration.
- Type
float
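VerifyResult.probability is the value intended for the accept/reject decision. The sketch below shows that decision; the threshold is a hypothetical operating point that each deployment would choose from its own false-accept and false-reject requirements, and accept_speaker is not an SDK function.

```python
def accept_speaker(result, threshold):
    """Accept the speaker if VerifyResult.probability reaches `threshold`.

    `threshold` is a hypothetical, deployment-specific operating point in [0, 1].
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return result.probability >= threshold
```

The raw score property is better suited to offline evaluation and calibration than to this runtime decision.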
- class voicesdk_cc.verify.VerifyStreamResult
Streaming voice verification result class.
- property audio_interval
Audio interval that the verification result refers to.
- Type
AudioInterval
- property verify_result
Voice verification result.
- Type
voicesdk_cc.verify.VerifyResult
- class voicesdk_cc.verify.VoiceTemplateFactory
Voice template factory class; creates voice templates for voice verification.
- __init__(init_data_path: str)¶
VoiceTemplateFactory constructor.
- Parameters
init_data_path (str) – Path to the directory containing factory init data.
- check_quality_from_file(path_to_audio_file: str) verify.QualityCheckResult ¶
- Parameters
path_to_audio_file (str) – Path to audio file.
- Returns
Quality check result.Note:
- Return type
QualityCheckResult
Audio file sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.
- check_quality_from_samples(*args, **kwargs)¶
Overloaded function.
check_quality_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> verify.QualityCheckResult
Deprecated, use QualityCheckEngine API from media component instead. Checks whether audio buffer is suitable to use as voice enrollment entry from the quality perspective.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns
Quality check result.Note:
- Return type
QualityCheckResult
Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.
check_quality_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> verify.QualityCheckResult
Deprecated, use QualityCheckEngine API from media component instead. Checks whether audio buffer is suitable to use as voice enrollment entry from the quality perspective.
- Parameters
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns
Quality check result.Note:
- Return type
QualityCheckResult
Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.
check_quality_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> verify.QualityCheckResult
Deprecated, use QualityCheckEngine API from media component instead. Checks whether audio buffer is suitable to use as voice enrollment entry from the quality perspective.
- Parameters
samples (numpy.array) – Float (numpy.float32) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns
Quality check result.Note:
- Return type
QualityCheckResult
Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.
- create_voice_template_batch_from_file(input_batch: List[verify.VerifyFileBatchElement]) List[core.VoiceTemplate] ¶
Creates multiple voice templates from the contents of the given WAV files.
Args: input_batch (list): List of VerifyFileBatchElement Returns: Created voice templates.
Note: This API is experimental and subject to change.
- create_voice_template_batch_from_samples(*args, **kwargs)¶
Overloaded function.
create_voice_template_batch_from_samples(input_batch: List[verify.VerifySamplesBatchElementFloat]) -> List[core.VoiceTemplate]
Creates multiple voice templates from given audio samples.
- Parameters
input_batch (list) – List of VerifySamplesBatchElementFloat objects.
- Returns
Created voice templates.
- Return type
List[VoiceTemplate]
Note: This API is experimental and subject to change.
create_voice_template_batch_from_samples(input_batch: List[verify.VerifySamplesBatchElementInt16]) -> List[core.VoiceTemplate]
Creates multiple voice templates from given audio samples.
- Parameters
input_batch (list) – List of VerifySamplesBatchElementInt16 objects.
- Returns
Created voice templates.
- Return type
List[VoiceTemplate]
Note: This API is experimental and subject to change.
create_voice_template_batch_from_samples(input_batch: List[verify.VerifySamplesBatchElementUint8]) -> List[core.VoiceTemplate]
Creates multiple voice templates from given audio samples.
- Parameters
input_batch (list) – List of VerifySamplesBatchElementUint8 objects.
- Returns
Created voice templates.
- Return type
List[VoiceTemplate]
Note: This API is experimental and subject to change.
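The batch methods above accept a list of batch elements and return a list of templates. The sketch below splits a long input list into fixed-size calls to any one of them. It assumes, without confirmation from this reference, that each call returns exactly one template per input element and in input order; `create_templates_in_batches` and `batch_size` are hypothetical names, and building the batch elements themselves is out of scope.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")  # batch element type, e.g. VerifyFileBatchElement
R = TypeVar("R")  # result type, e.g. VoiceTemplate


def create_templates_in_batches(
    create_batch: Callable[[List[T]], List[R]],
    elements: Sequence[T],
    batch_size: int = 16,
) -> List[R]:
    """Call a batch method on consecutive slices of `elements` and concatenate the results.

    Hypothetical helper. Assumes the SDK returns one template per input
    element, in input order; a length mismatch is reported as an error.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    templates: List[R] = []
    for start in range(0, len(elements), batch_size):
        batch = list(elements[start:start + batch_size])
        result = create_batch(batch)
        if len(result) != len(batch):
            raise RuntimeError(f"expected {len(batch)} templates, got {len(result)}")
        templates.extend(result)
    return templates
```

For example, `create_templates_in_batches(factory.create_voice_template_batch_from_file, file_elements, batch_size=8)`, where `factory` and `file_elements` already exist. Because these batch methods are marked experimental, routing all calls through one helper keeps a later API change local.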
- create_voice_template_from_file(path_to_audio_file: str, channel_type: core.ChannelType = <ChannelType.TEL: 2>) core.VoiceTemplate ¶
Creates voice template from the contents of the given audio file.
- Parameters
path_to_audio_file (str) – Path to audio file.
channel_type (ChannelType, optional) – Input audio channel type, defaults to ChannelType.TEL.
- Returns
Created voice template.
- Return type
VoiceTemplate
Note: The audio file sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.
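Since the file-based method requires a sampling frequency of at least get_minimum_audio_sample_rate(), a caller may prefer to reject unsuitable files up front with a clear message. A minimal sketch, assuming uncompressed WAV input (the header is read with the standard-library `wave` module, which does not handle other formats); `factory` is an already constructed instance of the factory class documented here, and `template_from_wav` is a hypothetical helper.

```python
import wave


def template_from_wav(factory, path: str):
    """Create a voice template from a WAV file after checking its sample rate.

    Only the documented get_minimum_audio_sample_rate() and
    create_voice_template_from_file() methods of `factory` are used.
    """
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
    minimum = factory.get_minimum_audio_sample_rate()
    if sample_rate < minimum:
        raise ValueError(
            f"{path}: sample rate {sample_rate} Hz is below the supported minimum of {minimum} Hz"
        )
    # channel_type is left at its documented default (ChannelType.TEL).
    return factory.create_voice_template_from_file(path)
```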
- create_voice_template_from_samples(*args, **kwargs)¶
Overloaded function.
create_voice_template_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int, channel_type: core.ChannelType = <ChannelType.TEL: 2>) -> core.VoiceTemplate
Creates voice template from audio samples.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Audio sample rate in Hz.
channel_type (ChannelType, optional) – Input audio channel type, defaults to ChannelType.TEL.
- Returns
Created voice template.
- Return type
VoiceTemplate
Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.
create_voice_template_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int, channel_type: core.ChannelType = <ChannelType.TEL: 2>) -> core.VoiceTemplate
Creates voice template from audio samples.
- Parameters
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
sample_rate (int) – Audio sample rate in Hz.
channel_type (ChannelType, optional) – Input audio channel type, defaults to ChannelType.TEL.
- Returns
Created voice template.
- Return type
VoiceTemplate
Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.
create_voice_template_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int, channel_type: core.ChannelType = <ChannelType.TEL: 2>) -> core.VoiceTemplate
Creates voice template from audio samples.
- Parameters
samples (numpy.array) – Float (numpy.float32) audio samples.
sample_rate (int) – Audio sample rate in Hz.
channel_type (ChannelType, optional) – Input audio channel type, defaults to ChannelType.TEL.
- Returns
Created voice template.
- Return type
VoiceTemplate
Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.
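The overload is selected by the dtype of `samples`, so it helps to build the array with an explicit dtype. A minimal sketch that loads a mono 16-bit WAV file into a numpy.int16 array and passes it to the PCM16 overload; it assumes little-endian PCM WAV input, does not check the minimum sample rate, and `template_from_pcm16_wav` is a hypothetical name.

```python
import wave

import numpy as np


def template_from_pcm16_wav(factory, path: str):
    """Load a mono 16-bit WAV file as int16 samples and create a voice template."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected a mono WAV file with 16-bit samples")
        sample_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    # WAV stores PCM16 as little-endian; astype() yields an owned, native-endian
    # numpy.int16 array, which matches the PCM16 overload's dtype.
    samples = np.frombuffer(frames, dtype="<i2").astype(np.int16)
    return factory.create_voice_template_from_samples(samples, sample_rate)
```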
- get_init_data_id() str ¶
Returns the ID of the init data that was used to create the factory.
- Returns
A string containing init data ID.
- Return type
str
- get_minimum_audio_sample_rate() int ¶
Returns the minimum supported input audio sampling frequency in Hz.
- Returns
The minimum sampling rate in Hz.
- Return type
int
- merge_voice_templates(voice_templates: list) core.VoiceTemplate ¶
Merges a list of voice templates belonging to the same speaker into a single union template.
- Parameters
voice_templates (list) – List of voice templates.
- Returns
A union voice template.
- Return type
VoiceTemplate
Note: All the templates should have the same init data ID as the factory instance.
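A typical enrollment flow creates one template per recording and merges them. A minimal sketch, assuming `factory` is an existing factory instance and every path points to a recording of the same speaker; `enroll_speaker` is hypothetical. Because every template comes from the same factory, the init data ID requirement stated for merge_voice_templates is satisfied.

```python
from typing import Iterable


def enroll_speaker(factory, audio_paths: Iterable[str]):
    """Build one enrollment template for a single speaker from several recordings."""
    templates = [factory.create_voice_template_from_file(path) for path in audio_paths]
    if not templates:
        raise ValueError("at least one recording is required")
    if len(templates) == 1:
        # Nothing to merge; return the only template as-is.
        return templates[0]
    return factory.merge_voice_templates(templates)
```

With a single recording the sketch returns that template directly instead of calling merge_voice_templates, since this reference does not say whether a one-element list is accepted.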
- class voicesdk_cc.verify.VoiceTemplateMatcher¶
Voice verification engine class.
- __init__(init_data_path: str)¶
VoiceTemplateMatcher constructor.
- Parameters
init_data_path (str) – Path to the directory containing matcher init data.
- get_init_data_id() str ¶
Returns the ID of the init data that was used to create the matcher.
- Returns
A string containing init data ID.
- Return type
str
- match_voice_templates(template1: core.VoiceTemplate, template2: core.VoiceTemplate) verify.VerifyResult ¶
Matches two voice templates one-to-one.
- Parameters
template1 (VoiceTemplate) – First voice template.
template2 (VoiceTemplate) – Second voice template.
- Returns
Verification result.
- Return type
VerifyResult
Note: Both templates should have the same init data ID as the matcher instance.
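The get_init_data_id() methods on the factory and the matcher make the init data requirement checkable before matching. A minimal sketch, assuming both templates were created by `factory` (this section documents no ID accessor on VoiceTemplate itself, so the templates are not checked directly); `verify_pair` is hypothetical, and the VerifyResult is returned unchanged because its fields are not described here.

```python
def verify_pair(factory, matcher, enrolled, probe):
    """Match two voice templates after checking that factory and matcher init data agree."""
    factory_id = factory.get_init_data_id()
    matcher_id = matcher.get_init_data_id()
    if factory_id != matcher_id:
        raise ValueError(f"init data mismatch: factory {factory_id!r}, matcher {matcher_id!r}")
    return matcher.match_voice_templates(enrolled, probe)
```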
- class voicesdk_cc.verify.VoiceVerifyStream¶
Class for continuous voice verification using an audio stream.
- __init__(voice_template_factory: voicesdk::verify::python::VoiceTemplateFactoryPy, voice_template_matcher: voicesdk::verify::python::VoiceTemplateMatcherPy, voice_templates: List[core.VoiceTemplate], sample_rate: int, audio_context_length_seconds: int = 10, window_length_seconds: float = 3)¶
Note: Sampling frequency should be equal to or greater than the value returned by voice_template_factory.get_minimum_audio_sample_rate(). The voice template matcher, the voice template factory and all voice templates should have the same init data ID.
- add_samples(*args, **kwargs)¶
Overloaded function.
add_samples(samples: numpy.ndarray[numpy.int16])
Adds PCM16 audio samples to process.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
Note: The audio sample rate is fixed when the stream is created. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.
add_samples(samples: numpy.ndarray[numpy.uint8])
Adds PCM16 audio samples to process.
- Parameters
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
Note: The audio sample rate is fixed when the stream is created. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.
add_samples(samples: numpy.ndarray[numpy.float32])
Adds float audio samples to process.
- Parameters
samples (numpy.array) – Float (numpy.float32) audio samples.
Note: The audio sample rate is fixed when the stream is created. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.
- finalize()¶
Finalizes the input audio stream: processes the remaining audio samples and produces a result if possible.
- get_verify_result() List[verify.VerifyStreamResult] ¶
Retrieves a verify result from the output queue.
- Returns
One verify result.
- Return type
List[VerifyStreamResult]
- Raises
RuntimeError – If the output queue is empty.
Note: Use has_verify_results() to check whether verify results are available.
- has_verify_results() bool ¶
Checks whether there are verify results in the output queue.
- Returns
True if results are available, False otherwise.
- Return type
bool
- reset()¶
Resets stream state.
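The stream methods combine into a feed, drain, finalize loop: push audio with add_samples(), poll with has_verify_results() before each get_verify_result() call (which raises RuntimeError on an empty queue), and call finalize() once the input ends. A minimal sketch, assuming `stream` is an already constructed VoiceVerifyStream and each chunk is a numpy array of a supported dtype at the sample rate given to the constructor; the helper names are hypothetical, and exceptions from add_samples() are not handled.

```python
from typing import Iterable, List


def drain_results(stream) -> List:
    """Collect every result currently queued in the stream."""
    results = []
    while stream.has_verify_results():
        # Each call's return value is kept as-is (not unpacked).
        results.append(stream.get_verify_result())
    return results


def verify_stream(stream, chunks: Iterable) -> List:
    """Feed audio chunks to a VoiceVerifyStream and gather all results."""
    results = []
    for chunk in chunks:
        stream.add_samples(chunk)
        results.extend(drain_results(stream))
    stream.finalize()  # process whatever audio is still buffered
    results.extend(drain_results(stream))
    return results
```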
- class voicesdk_cc.verify.VoiceVerifyStreamOpus¶
Class for continuous voice verification using an Opus audio stream.
- __init__(voice_template_factory: voicesdk::verify::python::VoiceTemplateFactoryPy, voice_template_matcher: voicesdk::verify::python::VoiceTemplateMatcherPy, voice_template: core.VoiceTemplate, sample_rate: int, audio_context_length_seconds: int = 10)¶
- Parameters
audio_context_length_seconds (int, optional) – Length of the audio context for voice verification in seconds; must be at least 3 seconds.
Note: Sampling frequency should be equal to or greater than the value returned by voice_template_factory.get_minimum_audio_sample_rate(). Voice template matcher, voice template factory and voice template should have the same init data ID.
- add_packet(bytes: numpy.ndarray[numpy.uint8])¶
Adds an Opus packet to process.
- Parameters
bytes (numpy.array) – Opus packet bytes (numpy.uint8). The packet is expected to contain data for a single mono stream.
Note: The audio sample rate is fixed when the stream is created. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.
- finalize()¶
Finalizes the input audio stream: processes the remaining audio samples and produces a result if possible.
- get_verify_result() verify.VerifyStreamResult ¶
Retrieves a verify result from the output queue.
- Returns
One verify result.
- Return type
VerifyStreamResult
- Raises
RuntimeError – If the output queue is empty.
Note: Use has_verify_results() to check whether verify results are available.
- has_verify_results() bool ¶
Checks whether there are verify results in the output queue.
- Returns
True if results are available, False otherwise.
- Return type
bool
- reset()¶
Resets stream state.
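The Opus stream follows the same feed, drain, finalize pattern, except that add_packet() takes one encoded packet at a time as a numpy.uint8 array. A minimal sketch, assuming `stream` is a constructed VoiceVerifyStreamOpus and each element of `packets` is the raw bytes of one mono Opus packet (extracting packets from a container such as Ogg is out of scope); `verify_opus_packets` is hypothetical.

```python
from typing import Iterable, List

import numpy as np


def verify_opus_packets(stream, packets: Iterable[bytes]) -> List:
    """Feed raw Opus packets to a VoiceVerifyStreamOpus and gather all results."""
    results = []
    for packet in packets:
        # add_packet() expects numpy.uint8; copy() gives an owned, writable array.
        stream.add_packet(np.frombuffer(packet, dtype=np.uint8).copy())
        while stream.has_verify_results():
            results.append(stream.get_verify_result())
    stream.finalize()  # process whatever audio is still buffered
    while stream.has_verify_results():
        results.append(stream.get_verify_result())
    return results
```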