Call Center SDK 1.11.1 Python API documentation¶
Passive voice spoof detection
- class voicesdk_cc.antispoof2.AntispoofEngine¶
Class for detecting spoofing attacks in audio with human speech.
- __init__(init_data_path: str)¶
AntispoofEngine constructor.
- Parameters:
init_data_path (str) – Path to the directory containing engine init data.
- is_spoof_file(path_to_audio_file: str) antispoof2.AntispoofResult ¶
Tests whether given audio file contains spoofed speech.
- Parameters:
path_to_audio_file (str) – Path to audio file.
- Returns:
Antispoofing check result.
- Return type:
AntispoofResult
- is_spoof_samples(*args, **kwargs)¶
Overloaded function.
is_spoof_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> antispoof2.AntispoofResult
Tests whether given audio samples contain spoofed speech.
- Parameters:
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns:
Antispoofing check result.
- Return type:
AntispoofResult
is_spoof_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> antispoof2.AntispoofResult
Tests whether given audio samples contain spoofed speech.
- Parameters:
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns:
Antispoofing check result.
- Return type:
AntispoofResult
is_spoof_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> antispoof2.AntispoofResult
Tests whether given audio samples contain spoofed speech.
- Parameters:
samples (numpy.array) – float (numpy.float32) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns:
Antispoofing check result.
- Return type:
AntispoofResult
- class voicesdk_cc.antispoof2.AntispoofResult¶
Class for the antispoofing check result returned by the AntispoofEngine is_spoof_file and is_spoof_samples methods.
- property score¶
Human score.
- Type:
float
- property score_Replay¶
Replay attack score.
- Type:
float
- property score_TTS¶
Text-to-speech attack score.
- Type:
float
- property score_VC¶
Voice conversion attack score.
- Type:
float
- property unsuitable_input_message¶
Message that explains the reason for audio input to be unsuitable for spoof check.
- Type:
str
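The properties above can be combined into a small summary, sketched below. This is only an illustration: the reference does not say how to interpret the scores, so the code assumes that a non-empty unsuitable_input_message means the input could not be checked, that a higher score means more human-like audio, and that 0.5 is a placeholder threshold. The helper names summarize_antispoof and check_file are hypothetical.

```python
def summarize_antispoof(result, human_threshold=0.5):
    """Turn an AntispoofResult-like object into a small summary dict.

    Assumptions (not stated in the API reference): a non-empty
    unsuitable_input_message means the check could not be trusted, a higher
    `score` means more human-like audio, and 0.5 is a placeholder threshold.
    """
    reason = result.unsuitable_input_message
    if reason:
        return {"status": "unsuitable", "reason": reason}

    # Per-attack scores documented on AntispoofResult.
    attack_scores = {
        "replay": result.score_Replay,
        "tts": result.score_TTS,
        "vc": result.score_VC,
    }
    top_attack = max(attack_scores, key=attack_scores.get)
    status = "human" if result.score >= human_threshold else "spoof"
    return {"status": status, "human_score": result.score, "top_attack": top_attack}


def check_file(init_data_path, audio_path):
    """Run the check on one file; requires the SDK to be installed."""
    # Assumption: the submodule is reachable as an attribute of the package.
    import voicesdk_cc

    engine = voicesdk_cc.antispoof2.AntispoofEngine(init_data_path)
    return summarize_antispoof(engine.is_spoof_file(audio_path))
```

The threshold should be calibrated on your own data before it is used for any decision.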
Specific speaker attributes retrieval
- class voicesdk_cc.attributes.Attributes¶
Estimated person attributes class.
- property age¶
Estimated age in years.
- Type:
int
- property gender_score¶
Raw gender score (a higher score corresponds to male, a lower score to female).
- Type:
float
- property phone_call_participant¶
Estimated phone call participant class (deprecated, always equal to UNDEFINED).
- Type:
PhoneCallParticipant
- class voicesdk_cc.attributes.AttributesEstimator¶
Class for estimating person attributes by their voice (age and gender).
- __init__(init_data_path: str)¶
Attributes estimator constructor.
- Parameters:
init_data_path (str) – Path to the directory containing init data.
- estimate_with_file(path_to_wav_file: str) attributes.Attributes ¶
Estimates person attributes using given voice sample.
- Parameters:
path_to_wav_file (str) – Path to WAV file.
- Returns:
Estimated attributes.
- Return type:
Attributes
- estimate_with_samples(*args, **kwargs)¶
Overloaded function.
estimate_with_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> attributes.Attributes
Estimates person attributes using given voice sample.
- Parameters:
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Input audio signal sample rate in Hz.
- Returns:
Estimated attributes.
- Return type:
Attributes
estimate_with_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> attributes.Attributes
Estimates person attributes using given voice sample.
- Parameters:
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
sample_rate (int) – Input audio signal sample rate in Hz.
- Returns:
Estimated attributes.
- Return type:
Attributes
estimate_with_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> attributes.Attributes
Estimates person attributes using given voice sample.
- Parameters:
samples (numpy.array) – float (numpy.float32) audio samples.
sample_rate (int) – Input audio signal sample rate in Hz.
- Returns:
Estimated attributes.
- Return type:
Attributes
- class voicesdk_cc.attributes.Gender¶
Enumeration representing human gender.
- MALE¶
Male.
- FEMALE¶
Female.
Members:
MALE
FEMALE
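A minimal sketch of estimating attributes from a WAV file via the samples overload. The helper names estimate_from_wav and run are hypothetical; the reader is passed in as a callable so that any of the wavutils readers returning a (samples, sample_rate) tuple can be used. The reference does not state how gender_score maps onto the Gender enumeration, so the sketch returns the raw score only.

```python
def estimate_from_wav(estimator, read_samples, wav_path):
    """Read samples with `read_samples` and estimate speaker attributes.

    `estimator` is expected to behave like AttributesEstimator and
    `read_samples` like wavutils.read_as_pcm16_samples, i.e. it returns a
    (samples, sample_rate) tuple.
    """
    samples, sample_rate = read_samples(wav_path)
    attributes = estimator.estimate_with_samples(samples, sample_rate)
    return {"age": attributes.age, "gender_score": attributes.gender_score}


def run(init_data_path, wav_path):
    """Wire the helper to the SDK; requires the SDK to be installed."""
    # Assumption: submodules are reachable as attributes of the package.
    import voicesdk_cc

    estimator = voicesdk_cc.attributes.AttributesEstimator(init_data_path)
    reader = voicesdk_cc.core.wavutils.read_as_pcm16_samples
    return estimate_from_wav(estimator, reader, wav_path)
```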
Callcenter SDK specific routines
- class voicesdk_cc.callcenter.BuildInfo¶
Structure containing present VoiceSDK CC build info.
- property components¶
VoiceSDK components present in the build.
- Type:
str
- property git_info¶
Git info dump at the build stage.
- Type:
str
- property license_expiration_date¶
License expiration date in YYYY-MM-DD format.
- Type:
str
- property license_info¶
Information (e.g. expiration date) about the installed license if available, or an empty string if no license is in use. Deprecated, use license_expiration_date instead.
- Type:
str
- property version¶
VoiceSDK build version.
- Type:
str
- voicesdk_cc.callcenter.get_build_info() callcenter.BuildInfo ¶
Returns current VoiceSDK CC build info.
- Returns:
Current VoiceSDK CC build info.
- Return type:
BuildInfo
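Since license_expiration_date is documented as a YYYY-MM-DD string, it can be parsed with the standard library. The sketch below computes the remaining days; the helper name days_until_license_expiry is hypothetical, and returning None for an empty string is a defensive assumption, not documented behaviour.

```python
from datetime import date


def days_until_license_expiry(build_info, today=None):
    """Return the number of days until the license expires.

    `build_info` is expected to behave like callcenter.BuildInfo.
    Returns None if the date string is empty (assumed to mean "no license").
    A negative result means the date is already in the past.
    """
    raw = build_info.license_expiration_date
    if not raw:
        return None
    expires = date.fromisoformat(raw)  # parses YYYY-MM-DD
    today = today or date.today()
    return (expires - today).days
```

With the SDK installed, the helper would be called as days_until_license_expiry(voicesdk_cc.callcenter.get_build_info()).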
Utility functions for working with Opus data.
- voicesdk_cc.core.opusutils.read_as_pcm16_samples_from_memory(opus_file: str) tuple ¶
Reads Opus file from a memory buffer and decodes it to PCM16 samples buffer.
- Parameters:
opus_file (bytes) – Memory buffer containing complete Opus file contents.
- Returns:
Tuple with numpy.array (of numpy.int16 type) and signal sample rate.
- Return type:
tuple
Core functionality needed in more than one module.
- class voicesdk_cc.core.AudioInterval¶
Class representing interval of audio data.
- __init__(*args, **kwargs)¶
Overloaded function.
__init__(start_sample: int, end_sample: int, sample_rate: int)
__init__(time_interval: voicesdk::TimeInterval, sample_rate: int)
- property end_sample¶
Sample number where interval ends (not inclusive).
- property end_time¶
Timestamp in milliseconds where AudioInterval ends (not inclusive).
- property sample_rate¶
Sample rate of corresponding audio.
- property start_sample¶
Sample number where interval starts.
- property start_time¶
Timestamp in milliseconds where AudioInterval starts.
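The sample-based and time-based properties describe the same interval in different units. The sketch below shows the conversion one would expect, assuming timestamps are derived as sample index divided by sample rate; the reference does not state the exact rounding the SDK applies. Both helper names are hypothetical.

```python
def sample_to_ms(sample_index, sample_rate):
    """Convert a sample index to milliseconds (assumed relation)."""
    return sample_index * 1000.0 / sample_rate


def interval_duration_ms(interval):
    """Duration of an AudioInterval-like object in milliseconds.

    end_time is documented as not inclusive, so a plain difference is used.
    """
    return interval.end_time - interval.start_time
```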
- class voicesdk_cc.core.ChannelType¶
Enumeration for audio source labeling during voice template creation.
- MIC¶
Microphone audio channel.
- TEL¶
Telephone audio channel.
- MIXED¶
Mixed audio channel.
Members:
MIC
TEL
MIXED
- class voicesdk_cc.core.VoiceTemplate¶
Voice template class.
- __init__(bytes: bytes)¶
Constructs a voice template from its serialized representation.
- Parameters:
bytes (bytes) – Bytes object with serialized voice template.
- static deserialize(bytes: bytes) core.VoiceTemplate ¶
Factory method, deserializes voice template from bytes.
- Parameters:
bytes (bytes) – Bytes object with serialized voice template.
- Returns:
voice template instance.
- Return type:
VoiceTemplate
- get_channel_type() core.ChannelType ¶
Returns voice template channel type which was specified by user on creation.
- Returns:
Channel type
- Return type:
ChannelType
- get_init_data_id() str ¶
Returns ID of the init data, which was used to create the template.
- Returns:
A string containing init data ID
- Return type:
str
- is_valid() bool ¶
Checks if voice template is valid or not.
- Returns:
True if valid, False otherwise.
- Return type:
bool
- static load_from_file(path_to_file: str) core.VoiceTemplate ¶
Factory method, restores voice template from the given file.
- Parameters:
path_to_file (str) – Path to template file.
- Returns:
voice template instance.
- Return type:
VoiceTemplate
- save_to_file(path_to_file: str)¶
Stores voice template in a file of the given path.
- Parameters:
path_to_file (str) – Path to template file.
- serialize() bytes ¶
Serializes voice template to bytes.
- Returns:
Serialized voice template.
- Return type:
bytes
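A minimal sketch of a serialize/deserialize round trip, for example before storing a template in a database. The helper name roundtrip_template is hypothetical; the template class is passed in explicitly so the helper depends only on the serialize, deserialize and is_valid methods documented above.

```python
def roundtrip_template(template, template_cls):
    """Serialize a voice template and restore it again.

    `template` is expected to behave like core.VoiceTemplate and
    `template_cls` like the VoiceTemplate class itself.
    Raises ValueError if the restored template reports itself invalid.
    """
    payload = template.serialize()  # bytes
    restored = template_cls.deserialize(payload)
    if not restored.is_valid():
        raise ValueError("restored voice template is not valid")
    return restored
```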
- class voicesdk_cc.core.VoiceTemplateConverter¶
Voice template conversion class.
- __init__(init_data_path: str)¶
VoiceTemplateConverter constructor.
- Parameters:
init_data_path (str) – Path to the directory containing init data.
- convert_voice_template(voice_template: core.VoiceTemplate) core.VoiceTemplate ¶
Converts voice template from one configuration to another.
- Parameters:
voice_template (VoiceTemplate) – Voice template to be converted.
- Returns:
Converted voice template.
- Return type:
VoiceTemplate
- get_input_init_data_id() str ¶
Returns init data ID that voice template to be converted should have.
- Returns:
A string containing init data ID of the voice template to be converted.
- Return type:
str
- get_output_init_data_id() str ¶
Returns init data ID that converted voice template will have.
- Returns:
A string containing init data ID of the converted voice template.
- Return type:
str
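The two init data ID getters make it possible to check a template before converting it. The sketch below is one possible guard, not a prescribed workflow; the helper name convert_if_needed is hypothetical, and skipping templates that already carry the output ID is an assumption about what callers usually want.

```python
def convert_if_needed(converter, template):
    """Convert a template only when its init data ID requires it.

    `converter` is expected to behave like core.VoiceTemplateConverter and
    `template` like core.VoiceTemplate.
    """
    current_id = template.get_init_data_id()
    if current_id == converter.get_output_init_data_id():
        return template  # already in the target configuration
    if current_id != converter.get_input_init_data_id():
        raise ValueError(
            "template init data ID %r is not accepted by this converter" % current_id
        )
    return converter.convert_voice_template(template)
```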
- voicesdk_cc.core.set_num_threads(arg0: int)¶
Sets the maximum number of threads available for VoiceSDK. If 0 is passed, the optimal number of threads is detected automatically (the same happens if set_num_threads is never called).
- Parameters:
num_threads (int) – Maximum number of threads available for VoiceSDK.
- voicesdk_cc.core.set_use_voice_template_compression(arg0: bool)¶
Sets whether to use compression for voice templates serialization. Voice template compression is not used by default.
- Parameters:
use_voice_template_compression (bool) – Whether to use compression for voice templates serialization.
Utility functions for working with WAV data.
- voicesdk_cc.core.wavutils.read_as_float_samples(wav_file_path: object) tuple ¶
Reads WAV file as a float samples buffer (WAV file can be of any format).
- Parameters:
wav_file_path (str or pathlib.Path) – Path to WAV file.
- Returns:
Audio data and sample rate.
- Return type:
tuple(numpy.ndarray, int)
- voicesdk_cc.core.wavutils.read_as_float_samples_16bit(wav_file_path: object) tuple ¶
Reads WAV file as a float samples buffer with 16-bit precision (WAV file can be of any format).
- Parameters:
wav_file_path (str or pathlib.Path) – Path to WAV file.
- Returns:
Audio data and sample rate.
- Return type:
tuple(numpy.ndarray, int)
- voicesdk_cc.core.wavutils.read_as_pcm16_bytes(wav_file_path: object) tuple ¶
Reads WAV file as a PCM16 bytes buffer (WAV file can be of any format).
- Parameters:
wav_file_path (str or pathlib.Path) – Path to WAV file.
- Returns:
Audio data and sample rate.
- Return type:
tuple(numpy.ndarray, int)
- voicesdk_cc.core.wavutils.read_as_pcm16_samples(wav_file_path: object) tuple ¶
Reads WAV file as a PCM16 samples buffer (WAV file can be of any format).
- Parameters:
wav_file_path (str or pathlib.Path) – Path to WAV file.
- Returns:
Audio data and sample rate.
- Return type:
tuple(numpy.ndarray, int)
Speaker distinction detection
- class voicesdk_cc.diarization.DiarizationEngine¶
Diarization engine class, an entry point for speaker diarization.
- __init__(init_data_path: str)¶
DiarizationEngine constructor.
- Parameters:
init_data_path (str) – Path to the directory containing engine init data.
- get_segmentation_from_file(path_to_wav_file: str, num_speakers: int = 0) dict[int, list[core.AudioInterval]] ¶
Performs speaker diarization of the given WAV file.
- Parameters:
path_to_wav_file (str) – Path to WAV file.
num_speakers (int, optional) – Optional number of speakers.
- Returns:
Dict of AudioInterval lists; each key corresponds to a certain speaker.
- Return type:
dict
- get_segmentation_from_samples(*args, **kwargs)¶
Overloaded function.
get_segmentation_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int, num_speakers: int = 0) -> dict[int, list[core.AudioInterval]]
Performs speaker diarization of the given PCM16 samples.
- Parameters:
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Audio sample rate in Hz.
num_speakers (int, optional) – Optional number of speakers.
- Returns:
Dict of AudioInterval lists; each key corresponds to a certain speaker.
- Return type:
dict
get_segmentation_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int, num_speakers: int = 0) -> dict[int, list[core.AudioInterval]]
Performs speaker diarization of the given PCM16 samples.
- Parameters:
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
sample_rate (int) – Audio sample rate in Hz.
num_speakers (int, optional) – Optional number of speakers.
- Returns:
Dict of AudioInterval lists; each key corresponds to a certain speaker.
- Return type:
dict
get_segmentation_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int, num_speakers: int = 0) -> dict[int, list[core.AudioInterval]]
Performs speaker diarization of the given float samples.
- Parameters:
samples (numpy.array) – float (numpy.float32) audio samples.
sample_rate (int) – Audio sample rate in Hz.
num_speakers (int, optional) – Optional number of speakers.
- Returns:
Dict of AudioInterval lists; each key corresponds to a certain speaker.
- Return type:
dict
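The returned dict can be reduced to per-speaker talk time using the AudioInterval time properties. A minimal sketch, with the hypothetical helper name speaking_time_ms:

```python
def speaking_time_ms(segmentation):
    """Sum interval durations per speaker.

    `segmentation` is expected to look like the result of
    get_segmentation_from_file: {speaker_id: [AudioInterval, ...]}.
    Returns {speaker_id: total milliseconds}.
    """
    return {
        speaker: sum(iv.end_time - iv.start_time for iv in intervals)
        for speaker, intervals in segmentation.items()
    }
```

With the SDK installed, the input would come from DiarizationEngine(init_data_path).get_segmentation_from_file(path_to_wav_file).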
Audio/speech utilities and estimators
- class voicesdk_cc.media.QualityCheckEngine¶
Quality check engine class.
- __init__(init_data_path: str)¶
QualityCheckEngine constructor.
- Parameters:
init_data_path (str) – Path to the directory containing initialization data.
- check_quality_from_file(path_to_audio_file: str, thresholds: media.QualityCheckMetricsThresholds) media.QualityCheckEngineResult ¶
Checks whether the given audio file is suitable from the quality perspective.
- Parameters:
path_to_audio_file (str) – Path to audio file.
thresholds (QualityCheckMetricsThresholds) – Quality checking thresholds that will be applied to the output quality check metrics.
- Returns:
Quality check result.
- Return type:
QualityCheckEngineResult
- check_quality_from_samples(*args, **kwargs)¶
Overloaded function.
check_quality_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int, thresholds: media.QualityCheckMetricsThresholds) -> media.QualityCheckEngineResult
Checks whether audio buffer is suitable from the quality perspective, from the given PCM16 samples.
- Parameters:
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Audio sample rate in Hz.
thresholds (QualityCheckMetricsThresholds) – Quality checking thresholds that will be applied to the output quality check metrics.
- Returns:
Quality check result.
- Return type:
QualityCheckEngineResult
check_quality_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int, thresholds: media.QualityCheckMetricsThresholds) -> media.QualityCheckEngineResult
Checks whether audio buffer is suitable from the quality perspective, from the given PCM16 audio bytes.
- Parameters:
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
sample_rate (int) – Audio sample rate in Hz.
thresholds (QualityCheckMetricsThresholds) – Quality checking thresholds that will be applied to the output quality check metrics.
- Returns:
Quality check result.
- Return type:
QualityCheckEngineResult
check_quality_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int, thresholds: media.QualityCheckMetricsThresholds) -> media.QualityCheckEngineResult
Checks whether audio buffer is suitable from the quality perspective, from the given float audio samples.
- Parameters:
samples (numpy.array) – Float (numpy.float32) audio samples.
sample_rate (int) – Audio sample rate in Hz.
thresholds (QualityCheckMetricsThresholds) – Quality checking thresholds that will be applied to the output quality check metrics.
- Returns:
Quality check result.
- Return type:
QualityCheckEngineResult
- get_recommended_thresholds(scenario: media.QualityCheckScenario) media.QualityCheckMetricsThresholds ¶
Returns the recommended quality check thresholds for the given scenario.
- Parameters:
scenario (QualityCheckScenario) – Scenario for which recommended thresholds will be returned.
- Returns:
Quality check thresholds that can be used for quality checking.
- Return type:
QualityCheckMetricsThresholds
- class voicesdk_cc.media.QualityCheckEngineResult¶
Quality check result class
- property multiple_speakers_detector_score¶
Multiple speakers detector score value obtained on quality check
- Type:
float
- property quality_check_short_description¶
Short description of the quality check results
- property snr_db¶
SNR metric value obtained on quality check, in dB.
- Type:
float
- property speech_length_ms¶
Speech length metric value obtained on quality check in milliseconds
- Type:
float
- property speech_relative_length¶
Speech relative length (speech length relative to the total audio length) metric value obtained on quality check
- Type:
float
- class voicesdk_cc.media.QualityCheckMetricsThresholds¶
Class for quality checking thresholds.
- __init__(*args, **kwargs)¶
Overloaded function.
__init__()
__init__(minimum_snr_db: float, minimum_speech_length_ms: float, minimum_speech_relative_length: float, maximum_multiple_speakers_detector_score: float)
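- Parameters:
minimum_snr_db (float) – Minimum signal-to-noise ratio required to pass quality check in dB.
minimum_speech_length_ms (float) – Minimum speech length required to pass quality check in milliseconds.
minimum_speech_relative_length (float) – Minimum speech relative length (speech length relative to the total audio length) required to pass quality check.
maximum_multiple_speakers_detector_score (float) – Maximum multiple speakers detector score allowed to pass quality check.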
- property maximum_multiple_speakers_detector_score¶
Maximum multiple speakers detector score allowed to pass quality check.
- Type:
float
- property minimum_snr_db¶
Minimum signal-to-noise ratio required to pass quality check in dB.
- Type:
float
- property minimum_speech_length_ms¶
Minimum speech length required to pass quality check in milliseconds.
- Type:
float
- property minimum_speech_relative_length¶
Minimum speech relative length (speech length relative to the total audio length) required to pass quality check.
- Type:
float
- class voicesdk_cc.media.QualityCheckScenario¶
Enumeration representing scenarios used to get recommended quality check thresholds.
VERIFY_TI_ENROLLMENT: Verification, TI enrollment step.
VERIFY_TI_VERIFICATION: Verification, TI verification step.
VERIFY_TD_ENROLLMENT: Verification, TD enrollment step.
VERIFY_TD_VERIFICATION: Verification, TD verification step.
LIVENESS: Liveness check.
Members:
VERIFY_TI_ENROLLMENT
VERIFY_TI_VERIFICATION
VERIFY_TD_ENROLLMENT
VERIFY_TD_VERIFICATION
LIVENESS
- class voicesdk_cc.media.QualityCheckShortDescription¶
Enumeration representing short descriptions of the audio quality check results.
TOO_NOISY: Too noisy audio.
TOO_SMALL_SPEECH_TOTAL_LENGTH: Too small speech length in the audio.
TOO_SMALL_SPEECH_RELATIVE_LENGTH: Too small speech relative length (speech length relative to the total audio length).
MULTIPLE_SPEAKERS_DETECTED: Multiple speakers detected.
OK: Audio successfully passed quality check.
Members:
TOO_NOISY
TOO_SMALL_SPEECH_TOTAL_LENGTH
TOO_SMALL_SPEECH_RELATIVE_LENGTH
MULTIPLE_SPEAKERS_DETECTED
OK
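A typical flow is to fetch the recommended thresholds for a scenario, run the check, and compare the short description against OK. The sketch below follows that flow using only the methods documented above. The helper name passes_quality_check is hypothetical, and the OK value is passed in by the caller so the helper does not depend on how the enumeration is imported.

```python
def passes_quality_check(engine, audio_path, scenario, ok_value):
    """Run a quality check with the recommended thresholds for `scenario`.

    `engine` is expected to behave like media.QualityCheckEngine and
    `ok_value` is expected to be QualityCheckShortDescription.OK.
    Returns (passed, short_description).
    """
    thresholds = engine.get_recommended_thresholds(scenario)
    result = engine.check_quality_from_file(audio_path, thresholds)
    description = result.quality_check_short_description
    return description == ok_value, description
```

When the check fails, the returned description (for example TOO_NOISY) tells the caller which metric to inspect on the result object.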
- class voicesdk_cc.media.SNRComputer¶
SNRComputer class, intended to calculate the signal-to-noise ratio (SNR) of a given audio signal.
- __init__(init_data_path: str)¶
SNRComputer constructor.
- Parameters:
init_data_path (str) – Path to the directory containing init data.
- compute_with_file(path_to_audio_file: str) float ¶
Calculates SNR with given audio file.
- Parameters:
path_to_audio_file (str) – Path to audio file.
- Returns:
Computed SNR in dB.
- Return type:
float
- compute_with_samples(*args, **kwargs)¶
Overloaded function.
compute_with_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> float
Calculates SNR with given PCM16 audio samples.
- Parameters:
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Input audio signal sample rate in Hz.
- Returns:
Computed SNR in dB.
- Return type:
float
compute_with_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> float
Calculates SNR with given PCM16 audio samples.
- Parameters:
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
sample_rate (int) – Input audio signal sample rate in Hz.
- Returns:
Computed SNR in dB.
- Return type:
float
compute_with_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> float
Calculates SNR with given float audio samples.
- Parameters:
samples (numpy.array) – float (numpy.float32) audio samples.
sample_rate (int) – Input audio signal sample rate in Hz.
- Returns:
Computed SNR in dB.
- Return type:
float
- class voicesdk_cc.media.SpeechEndpointDetector¶
Speech processor class for speech endpoint detection.
- __init__(min_speech_length_ms: int, max_silence_length_ms: int, sample_rate: int)¶
SpeechEndpointDetector constructor.
- Parameters:
min_speech_length_ms (int) – Minimum speech length required to begin speech end detection (ms).
max_silence_length_ms (int) – Silence after speech threshold used to determine if speech is already ended (ms).
sample_rate (int) – Input audio sample rate.
- add_samples(*args, **kwargs)¶
Overloaded function.
add_samples(samples: numpy.ndarray[numpy.int16])
Adds new audio samples to the SpeechEndpointDetector.
- Parameters:
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
Note: Audio sample rate is predefined in the constructor.
add_samples(samples: numpy.ndarray[numpy.uint8])
Adds new audio samples to the SpeechEndpointDetector.
- Parameters:
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
Note: Audio sample rate is predefined in the constructor.
add_samples(samples: numpy.ndarray[numpy.float32])
Adds new audio samples to the SpeechEndpointDetector.
- Parameters:
samples (numpy.array) – float (numpy.float32) audio samples.
Note: Audio sample rate is predefined in the constructor.
- is_speech_ended() bool ¶
Returns detection state.
- Returns:
True if speech end was detected, False otherwise.
- Return type:
bool
- reset()¶
Resets detector state.
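The detector is stateful: samples are added chunk by chunk and is_speech_ended() is polled after each chunk. A minimal sketch of that loop, with the hypothetical helper name feed_until_speech_end; the chunking itself is left to the caller.

```python
def feed_until_speech_end(detector, chunks):
    """Feed audio chunks until the detector reports the end of speech.

    `detector` is expected to behave like media.SpeechEndpointDetector and
    `chunks` is any iterable of sample buffers accepted by add_samples.
    Returns the index of the chunk after which speech end was detected,
    or None if the chunks ran out first.
    """
    for index, chunk in enumerate(chunks):
        detector.add_samples(chunk)
        if detector.is_speech_ended():
            return index
    return None
```

Call reset() on the detector before reusing it for another utterance.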
- class voicesdk_cc.media.SpeechEndpointDetectorOpus¶
Speech processor class for speech endpoint detection in the Opus audio stream.
- __init__(min_speech_length_ms: int, max_silence_length_ms: int, sample_rate: int)¶
SpeechEndpointDetectorOpus constructor.
- Parameters:
min_speech_length_ms (int) – Minimum speech length required to begin speech end detection (ms).
max_silence_length_ms (int) – Silence after speech threshold used to determine if speech is already ended (ms).
sample_rate (int) – Input audio sample rate.
- add_packet(bytes: numpy.ndarray[numpy.uint8])¶
Adds Opus packet to the SpeechEndpointDetectorOpus.
- Parameters:
bytes (numpy.array) – A buffer containing single Opus packet. It is expected that packet contains data for single mono stream.
Note: Audio sample rate is predefined in the constructor.
- is_speech_ended() bool ¶
Returns detection state.
- Returns:
True if speech end was detected, False otherwise.
- Return type:
bool
- reset()¶
Resets detector state.
- class voicesdk_cc.media.SpeechEvent¶
Class representing a single speech event.
- property audio_interval¶
Speech event audio interval.
- Type:
AudioInterval
- property is_voice¶
Whether the frame contains speech or not.
- Type:
bool
- class voicesdk_cc.media.SpeechInfo¶
Class that contains metrics related to Voice Activity Detection.
- property background_length_ms¶
Total length of non-speech signal, milliseconds.
- Type:
float
- property speech_length_ms¶
Total accumulated speech duration, milliseconds.
- Type:
float
- property total_length_ms¶
Total audio record duration, milliseconds.
- Type:
float
- class voicesdk_cc.media.SpeechSummary¶
Speech summary class.
- property speech_events¶
Retrieves list of speech events.
- Returns:
speech events.
- Return type:
list
- property speech_info¶
Retrieves speech info data.
- Returns:
speech info.
- Return type:
SpeechInfo
- class voicesdk_cc.media.SpeechSummaryEngine¶
Speech summary engine class, intended to calculate SpeechSummary with given audio samples.
- __init__(init_data_path: str)¶
SpeechSummaryEngine constructor.
- Parameters:
init_data_path (str) – Path to the directory containing engine init data.
- create_stream(stream_sample_rate: int) voicesdk::media::python::SpeechSummaryStreamPy ¶
Factory method for creating SpeechSummaryStream.
- Parameters:
stream_sample_rate (int) – Audio stream sample rate in Hz.
- Returns:
Created speech summary stream.
- Return type:
SpeechSummaryStream
- get_speech_summary_from_file(path_to_audio_file: str) voicesdk::SpeechSummary ¶
Calculates speech summary with given audio file.
- Parameters:
path_to_audio_file (str) – Path to audio file.
- Returns:
Speech summary.
- Return type:
SpeechSummary
- get_speech_summary_from_samples(*args, **kwargs)¶
Overloaded function.
get_speech_summary_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> voicesdk::SpeechSummary
Calculates speech summary with given PCM16 audio samples.
- Parameters:
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns:
Speech summary.
- Return type:
SpeechSummary
get_speech_summary_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> voicesdk::SpeechSummary
Calculates speech summary with given PCM16 audio samples.
- Parameters:
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns:
Speech summary.
- Return type:
SpeechSummary
get_speech_summary_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> voicesdk::SpeechSummary
Calculates speech summary with given float audio samples.
- Parameters:
samples (numpy.array) – float (numpy.float32) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns:
Speech summary.
- Return type:
SpeechSummary
- class voicesdk_cc.media.SpeechSummaryStream¶
Stateful speech summary stream class, intended to calculate SpeechSummary with audio samples given in stream. New instance can be obtained with a SpeechSummaryEngine instance.
- add_samples(*args, **kwargs)¶
Overloaded function.
add_samples(samples: numpy.ndarray[numpy.int16])
Adds PCM16 audio samples to process.
- Parameters:
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
Note: Audio sample rate is predefined at the stream creation time.
add_samples(samples: numpy.ndarray[numpy.uint8])
Adds PCM16 audio samples to process.
- Parameters:
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
Note: Audio sample rate is predefined at the stream creation time.
add_samples(samples: numpy.ndarray[numpy.float32])
Adds float audio samples to process.
- Parameters:
samples (numpy.array) – float (numpy.float32) audio samples.
Note: Audio sample rate is predefined at the stream creation time.
- finalize()¶
Finalizes input audio stream to process remaining audio samples and produce result if it’s possible.
- get_current_background_length() float ¶
Returns current background length in milliseconds.
- Returns:
Current background length in milliseconds.
- Return type:
float
- get_speech_event() media.SpeechEvent ¶
Retrieves speech event from output queue.
- Returns:
One speech event.
- Return type:
SpeechEvent
- Raises:
RuntimeError – If output queue is empty.
Note: Use has_speech_events() to check whether any speech events are available.
- get_total_speech_info() media.SpeechInfo ¶
Retrieves accumulated speech info data.
- Returns:
speech info.
- Return type:
SpeechInfo
- get_total_speech_summary() voicesdk::SpeechSummary ¶
Retrieves accumulated speech summary data.
- Returns:
speech summary.
- Return type:
SpeechSummary
- has_speech_events() bool ¶
Checks if any speech events are present in stream queue.
- Returns:
True if any events are present, False otherwise.
- Return type:
bool
- reset()¶
Resets stream state: clears buffer, resets speech summary.
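Because get_speech_event() raises RuntimeError on an empty queue, the queue should be drained with has_speech_events() as the guard. The sketch below shows one way to process a stream chunk by chunk and collect all events, including those produced by finalize(). The helper names drain_events and collect_speech_events are hypothetical.

```python
def drain_events(stream):
    """Pop every speech event currently waiting in the stream queue."""
    events = []
    while stream.has_speech_events():  # guard against RuntimeError
        events.append(stream.get_speech_event())
    return events


def collect_speech_events(stream, chunks):
    """Feed all chunks to the stream and return every produced event.

    `stream` is expected to behave like media.SpeechSummaryStream, e.g.
    one obtained from SpeechSummaryEngine.create_stream(sample_rate).
    """
    events = []
    for chunk in chunks:
        stream.add_samples(chunk)
        events.extend(drain_events(stream))
    # finalize() processes the remaining buffered samples, which may
    # produce further events.
    stream.finalize()
    events.extend(drain_events(stream))
    return events
```

After the loop, get_total_speech_info() on the same stream returns the accumulated speech and background lengths.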
- class voicesdk_cc.media.SpeechSummaryStreamOpus¶
Stateful speech summary stream class, intended to calculate SpeechSummary with audio samples given in Opus stream. New instance can be obtained with a SpeechSummaryEngine instance.
- __init__(init_data_path: str, sample_rate: int)¶
SpeechSummaryStreamOpus constructor.
- Parameters:
init_data_path (str) – Path to the directory containing engine init data.
sample_rate (int) – Audio stream sample rate in Hz.
- add_packet(bytes: numpy.ndarray[numpy.uint8])¶
Adds Opus packet to the SpeechSummaryStreamOpus.
- Parameters:
bytes (numpy.array) – A buffer containing single Opus packet. It is expected that packet contains data for single mono stream.
Note: Audio sample rate is predefined in the constructor.
- finalize()¶
Finalizes input audio stream to process remaining audio samples and produce result if it’s possible.
- get_current_background_length() float ¶
Returns current background length in milliseconds.
- Returns:
Current background length in milliseconds.
- Return type:
float
- get_speech_event() media.SpeechEvent ¶
Retrieves speech event from output queue.
- Returns:
One speech event.
- Return type:
SpeechEvent
- Raises:
RuntimeError – If output queue is empty.
Note: Use has_speech_events() to check whether any speech events are available.
- get_total_speech_info() media.SpeechInfo ¶
Retrieves accumulated speech info data.
- Returns:
speech info.
- Return type:
SpeechInfo
- get_total_speech_summary() voicesdk::SpeechSummary ¶
Retrieves accumulated speech summary data.
- Returns:
speech summary.
- Return type:
SpeechSummary
- has_speech_events() bool ¶
Checks if any speech events are present in stream queue.
- Returns:
True if any events are present, False otherwise.
- Return type:
bool
- reset()¶
Resets stream state: clears buffer, resets speech summary.
Enrollment and verification for speaker audio
- class voicesdk_cc.verify.VerifyResult¶
Voice verification result class.
- property probability¶
Voice matching probability from 0 to 1, should be used for making a biometrics authentication decision.
- Type:
float
- property score¶
Raw verification score, intended to be used for evaluation and data-wise calibration.
- Type:
float
- class voicesdk_cc.verify.VerifyStreamResult¶
Streaming voice verification result class.
- property audio_interval¶
Audio interval, which verify result refers to.
- Type:
AudioInterval
- property verify_result¶
Voice verification result.
- Type:
voicesdk.verify.VerifyResult
- class voicesdk_cc.verify.VoiceTemplateFactory¶
Voice verification engine class.
- __init__(init_data_path: str)¶
VoiceTemplateFactory constructor.
- Parameters:
init_data_path (str) – Path to the directory containing factory init data.
- create_voice_template_batch_from_file(input_batch: list[verify.VerifyFileBatchElement]) list[core.VoiceTemplate] ¶
Creates multiple voice templates from the contents of the given WAV files.
- Parameters:
input_batch (list) – List of VerifyFileBatchElement.
- Returns:
Created voice templates.
Note: This API is experimental and subject to change.
- create_voice_template_batch_from_samples(*args, **kwargs)¶
Overloaded function.
create_voice_template_batch_from_samples(input_batch: list[verify.VerifySamplesBatchElementFloat]) -> list[core.VoiceTemplate]
Creates multiple voice templates from given audio samples.
- Parameters:
input_batch (list) – List of VerifySamplesBatchElementFloat.
- Returns:
Created voice templates.
Note: This API is experimental and subject to change.
create_voice_template_batch_from_samples(input_batch: list[verify.VerifySamplesBatchElementInt16]) -> list[core.VoiceTemplate]
Creates multiple voice templates from given audio samples.
- Parameters:
input_batch (list) – List of VerifySamplesBatchElementInt16.
- Returns:
Created voice templates.
Note: This API is experimental and subject to change.
create_voice_template_batch_from_samples(input_batch: list[verify.VerifySamplesBatchElementUint8]) -> list[core.VoiceTemplate]
Creates multiple voice templates from given audio samples.
- Parameters:
input_batch (list) – List of VerifySamplesBatchElementUint8.
- Returns:
Created voice templates.
Note: This API is experimental and subject to change.
- create_voice_template_from_file(path_to_audio_file: str, channel_type: core.ChannelType = <ChannelType.TEL: 2>) core.VoiceTemplate ¶
Creates voice template from the contents of the given audio file.
- Parameters:
path_to_audio_file (str) – Path to audio file.
channel_type (ChannelType, optional) – Input audio channel type, default is ChannelType.TEL.
- Returns:
Created voice template.
- Return type:
VoiceTemplate
Note: Audio file sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.
- create_voice_template_from_samples(*args, **kwargs)¶
Overloaded function.
create_voice_template_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int, channel_type: core.ChannelType = <ChannelType.TEL: 2>) -> core.VoiceTemplate
Creates voice template from audio samples.
- Parameters:
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Audio sample rate in Hz.
channel_type (ChannelType, optional) – Input audio channel type, default is ChannelType.TEL.
- Returns:
Created voice template.
- Return type:
VoiceTemplate
Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.
create_voice_template_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int, channel_type: core.ChannelType = <ChannelType.TEL: 2>) -> core.VoiceTemplate
Creates voice template from audio samples.
- Parameters:
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
sample_rate (int) – Audio sample rate in Hz.
channel_type (ChannelType, optional) – Input audio channel type, default is ChannelType.TEL.
- Returns:
Created voice template.
- Return type:
core.VoiceTemplate
Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.
create_voice_template_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int, channel_type: core.ChannelType = <ChannelType.TEL: 2>) -> core.VoiceTemplate
Creates voice template from audio samples.
- Parameters:
samples (numpy.array) – Float (numpy.float32) audio samples.
sample_rate (int) – Audio sample rate in Hz.
channel_type (ChannelType, optional) – Input audio channel type, default is ChannelType.TEL.
- Returns:
Created voice template.
- Return type:
core.VoiceTemplate
Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.
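The sample rate requirement in the notes above can be checked before a template is created. The following is a minimal sketch, assuming `factory` is an instance of the class documented here; the helper name `create_template_checked` is hypothetical and not part of the SDK.

```python
def create_template_checked(factory, samples, sample_rate):
    """Create a voice template, rejecting audio below the factory's minimum rate.

    `factory` is assumed to expose get_minimum_audio_sample_rate() and
    create_voice_template_from_samples() as documented above.
    """
    minimum_rate = factory.get_minimum_audio_sample_rate()
    if sample_rate < minimum_rate:
        raise ValueError(
            f"sample rate {sample_rate} Hz is below the minimum {minimum_rate} Hz"
        )
    # channel_type is left at its documented default (ChannelType.TEL).
    return factory.create_voice_template_from_samples(samples, sample_rate)
```

Failing early with a `ValueError` keeps the error message under the caller's control instead of relying on whatever the SDK raises for unsupported rates.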
- get_init_data_id() str ¶
Returns ID of the init data, which was used to create the factory.
- Returns:
A string containing init data ID.
- Return type:
str
- get_minimum_audio_sample_rate() int ¶
Returns minimum supported input audio sampling frequency in Hz.
- Returns:
A minimum sampling rate in Hz.
- Return type:
int
- merge_voice_templates(voice_templates: list) core.VoiceTemplate ¶
Merges a list of voice templates of the same speaker producing a union template.
- Parameters:
voice_templates (list) – List of voice templates.
- Returns:
A union voice template.
- Return type:
core.VoiceTemplate
Note: All the templates should have the same init data ID as the factory instance.
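A typical use of merging is enrollment from several recordings of one speaker. The sketch below assumes all files belong to the same speaker and that `factory` is an instance of the class documented here; `enroll_speaker` is a hypothetical helper, not an SDK function.

```python
def enroll_speaker(factory, audio_paths):
    """Build one enrollment template from one or more recordings of a speaker."""
    if not audio_paths:
        raise ValueError("at least one audio file is required")
    # Every template comes from the same factory, so they share its init data ID.
    templates = [factory.create_voice_template_from_file(path) for path in audio_paths]
    if len(templates) == 1:
        # Nothing to merge; return the single template as is.
        return templates[0]
    return factory.merge_voice_templates(templates)
```

Returning the single template directly avoids depending on how `merge_voice_templates` treats a one-element list, which this documentation does not specify.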
- class voicesdk_cc.verify.VoiceTemplateMatcher¶
Voice verification engine class.
- __init__(init_data_path: str)¶
VoiceTemplateMatcher constructor.
- Parameters:
init_data_path (str) – Path to the directory containing matcher init data.
- get_init_data_id() str ¶
Returns ID of the init data, which was used to create the matcher.
- Returns:
A string containing init data ID.
- Return type:
str
- match_voice_templates(template1: core.VoiceTemplate, template2: core.VoiceTemplate) verify.VerifyResult ¶
Matches two voice templates one-to-one.
- Parameters:
template1 (VoiceTemplate) – First voice template.
template2 (VoiceTemplate) – Second voice template.
- Returns:
Verification result.
- Return type:
voicesdk.verify.VerifyResult
Note: Both templates should have the same init data ID as the matcher instance.
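One way to guard against mismatched init data is to compare the IDs reported by the factory and the matcher before matching. This is only a sketch: it assumes both templates were created by `factory`, so the factory's ID stands in for the templates' ID, and `match_checked` is a hypothetical helper name.

```python
def match_checked(factory, matcher, template1, template2):
    """Match two templates after checking that factory and matcher init data agree."""
    factory_id = factory.get_init_data_id()
    matcher_id = matcher.get_init_data_id()
    if factory_id != matcher_id:
        raise ValueError(
            f"init data mismatch: factory={factory_id!r}, matcher={matcher_id!r}"
        )
    # Assumption: template1 and template2 were both produced by `factory`.
    return matcher.match_voice_templates(template1, template2)
```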
- class voicesdk_cc.verify.VoiceVerifyStream¶
Class for continuous voice verification using audio stream.
- __init__(voice_template_factory: voicesdk::verify::python::VoiceTemplateFactoryPy, voice_template_matcher: voicesdk::verify::python::VoiceTemplateMatcherPy, voice_templates: list[core.VoiceTemplate], sample_rate: int, audio_context_length_seconds: int = 10, window_length_seconds: float = 3)¶
Note: Sampling frequency should be equal to or greater than the value returned by voice_template_factory.get_minimum_audio_sample_rate(). The voice template matcher, the voice template factory and the voice templates should all have the same init data ID.
- add_samples(*args, **kwargs)¶
Overloaded function.
add_samples(samples: numpy.ndarray[numpy.int16])
Adds PCM16 audio samples to process.
- Parameters:
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
Note: The audio sample rate is fixed at creation time. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.
add_samples(samples: numpy.ndarray[numpy.uint8])
Adds PCM16 audio samples to process.
- Parameters:
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
Note: The audio sample rate is fixed at creation time. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.
add_samples(samples: numpy.ndarray[numpy.float32])
Adds float audio samples to process.
- Parameters:
samples (numpy.array) – float (numpy.float32) audio samples.
Note: The audio sample rate is fixed at creation time. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.
- finalize()¶
Finalizes the input audio stream: processes the remaining audio samples and produces a result if possible.
- get_verify_result() list[verify.VerifyStreamResult] ¶
Retrieves verify result from output queue.
- Returns:
One verify result.
- Return type:
- Raises:
RuntimeError – If output queue is empty.
Note: Use has_verify_results() to check if there are available verify results.
- has_verify_results() bool ¶
Checks if there are verify results in output queue.
- Returns:
True if there are results available, False otherwise.
- Return type:
bool
- reset()¶
Resets stream state.
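The methods above suggest a feed-and-drain loop: add a chunk, collect any results that became available, and finalize once the audio ends. The sketch below assumes `stream` is an already constructed VoiceVerifyStream and `chunks` is an iterable of sample arrays in one of the supported dtypes; `verify_stream` is a hypothetical helper name, and the results are treated as opaque objects.

```python
def verify_stream(stream, chunks):
    """Feed audio chunks to a verify stream and collect every result it produces."""
    results = []

    def drain():
        # has_verify_results() guards get_verify_result(), which raises
        # RuntimeError when the output queue is empty.
        while stream.has_verify_results():
            results.append(stream.get_verify_result())

    for chunk in chunks:
        stream.add_samples(chunk)
        drain()

    # Process whatever audio is still buffered, then collect the last results.
    stream.finalize()
    drain()
    return results
```

Draining after every chunk keeps the output queue short and makes results available as soon as the stream produces them.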
- class voicesdk_cc.verify.VoiceVerifyStreamOpus¶
Class for continuous voice verification using Opus audio stream.
- __init__(voice_template_factory: voicesdk::verify::python::VoiceTemplateFactoryPy, voice_template_matcher: voicesdk::verify::python::VoiceTemplateMatcherPy, voice_template: core.VoiceTemplate, sample_rate: int, audio_context_length_seconds: int = 10)¶
- Parameters:
audio_context_length_seconds (int, optional) – Length of audio context for voice verification in seconds, must be at least 3 seconds.
Note: Sampling frequency should be equal to or greater than the value returned by voice_template_factory.get_minimum_audio_sample_rate(). Voice template matcher, voice template factory and voice template should have the same init data ID.
- add_packet(bytes: numpy.ndarray[numpy.uint8])¶
Adds Opus packet to process.
- Parameters:
bytes (numpy.array) – Opus packet bytes (numpy.uint8). The packet is expected to contain data for a single mono stream.
Note: The audio sample rate is fixed at creation time. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.
- finalize()¶
Finalizes the input audio stream: processes the remaining audio samples and produces a result if possible.
- get_verify_result() verify.VerifyStreamResult ¶
Retrieves verify result from output queue.
- Returns:
One verify result.
- Return type:
- Raises:
RuntimeError – If output queue is empty.
Note: Use has_verify_results() to check if there are available verify results.
- has_verify_results() bool ¶
Checks if there are verify results in output queue.
- Returns:
True if there are results available, False otherwise.
- Return type:
bool
- reset()¶
Resets stream state.
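Because a failed add_packet call resets the stream state but keeps the accumulated results, a caller can skip a bad packet and continue. The following sketch illustrates that pattern; it assumes `stream` is an already constructed VoiceVerifyStreamOpus. The exception type raised by add_packet is not documented here, so catching `Exception` is an assumption, and `feed_opus_packets` is a hypothetical helper name.

```python
def feed_opus_packets(stream, packets):
    """Feed Opus packets to a verify stream, skipping packets that fail.

    Returns (results, failed_count).
    """
    results = []
    failed = 0

    def drain():
        while stream.has_verify_results():
            results.append(stream.get_verify_result())

    for packet in packets:
        try:
            stream.add_packet(packet)
        except Exception:  # assumption: the concrete exception type is unknown
            failed += 1
        # Results accumulated so far survive a failed packet, so drain either way.
        drain()

    stream.finalize()
    drain()
    return results, failed
```

Whether skipping a packet is acceptable depends on the application; counting failures at least makes the data loss visible to the caller.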
Identification of one speaker among many
- class voicesdk_cc.identification.IdentificationEngine¶
Identification engine class used for creating identification list and for identification itself.
- __init__(arg0: str)¶
IdentificationEngine constructor.
- Parameters:
init_data_path (str) – Path to the directory containing engine init data.
- create_identification_list(voice_templates: list) identification.IdentificationList ¶
Creates a list for identification from given voice templates.
- Parameters:
voice_templates (list) – List of voice templates to use for identification (each template should be created with the M_TI_X_2 verify method using 8k init data).
- Returns:
Created identification list.
- Return type:
- enrich_identification_list(identification_list: identification.IdentificationList, voice_templates: list)¶
Enriches identification list with given voice templates.
- Parameters:
identification_list (IdentificationList) – Identification list that will be enriched.
voice_templates (list) – List of voice templates to use for identification list enrichment (each template should be created with the M_TI_X_2 verify method using 8k init data).
- identify(voice_template: core.VoiceTemplate, identification_list: identification.IdentificationList, acceptance_level: float = -2.0) identification.IdentificationResult ¶
Runs a template search in a provided identification list.
- Parameters:
voice_template (VoiceTemplate) – Voice template of the speaker to identify (the template should be created with the M_TI_X_2 verify method using 8k init data).
identification_list (IdentificationList) – Identification list to search for the speaker in.
acceptance_level (float, optional) – Fine adjustment of the identification threshold. The greater this value, the fewer templates from the identification list may be considered similar to the passed template. Default value is -2.0.
- Returns:
Identification result.
- Return type:
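The three engine methods combine into a simple one-to-many lookup. The sketch below assumes `engine` is an IdentificationEngine and that the indexes returned by the result follow the order of the templates passed to create_identification_list; the helper `identify_speaker` and the `(name, template)` pairing are hypothetical conveniences, not part of the SDK.

```python
def identify_speaker(engine, probe_template, enrolled):
    """Return the names of enrolled speakers matched by the probe template.

    `enrolled` is a list of (speaker_name, voice_template) pairs.
    """
    names = [name for name, _ in enrolled]
    templates = [template for _, template in enrolled]

    identification_list = engine.create_identification_list(templates)
    # acceptance_level is left at its documented default of -2.0.
    result = engine.identify(probe_template, identification_list)

    if not result.is_matched():
        return []
    # Assumption: indexes refer to positions in `templates`.
    return [names[i] for i in result.get_indexes_of_matched_templates()]
```

In practice the identification list would be created once and reused across many identify calls rather than rebuilt per probe.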
- class voicesdk_cc.identification.IdentificationList¶
Class containing list for identification.
- __init__(*args, **kwargs)¶
Overloaded function.
__init__(arg0: str)
__init__()
- deserialize(bytes: str)¶
Deserializes identification list from bytes.
- Parameters:
bytes (bytes) – Bytes object with serialized identification list.
- serialize() bytes ¶
Serializes identification list to bytes.
- Returns:
Serialized identification list.
- Return type:
bytes
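Serialization makes it possible to persist an identification list between runs. The sketch below assumes that deserialize is called on an empty instance created with the no-argument constructor and that it accepts the bytes object produced by serialize (the signature above shows `str`, so this is an assumption). The helper names and the `list_factory` parameter are hypothetical.

```python
from pathlib import Path


def save_identification_list(identification_list, path):
    """Write a serialized identification list to a file."""
    Path(path).write_bytes(identification_list.serialize())


def load_identification_list(list_factory, path):
    """Read a serialized identification list back from a file.

    `list_factory` is a zero-argument callable that returns an empty list object,
    for example the IdentificationList class itself.
    """
    identification_list = list_factory()
    identification_list.deserialize(Path(path).read_bytes())
    return identification_list
```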
- class voicesdk_cc.identification.IdentificationResult¶
Class containing identification result.
- get_indexes_of_matched_templates() list[int] ¶
Returns indexes of the templates stored in the identification list (i.e. used to create it) that are considered to belong to the speaker being identified.
- Returns:
List of indexes of matched templates from the identification list.
- Return type:
list
- is_matched() bool ¶
Checks whether the voice template was matched.
- Returns:
True if the template was matched, False otherwise.
- Return type:
bool
- property scores¶
Contains a similarity score for each template, in the same order as the templates the identification list was created with.
- Type:
list
- property threshold¶
Similarity score threshold used to decide whether the corresponding voice template in the identification list belongs to the same speaker as the passed one (it does if the score is greater than the threshold).
- Type:
float
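The scores and threshold properties can be combined to rank candidates. The following minimal sketch applies the rule stated above (a template matches when its score is greater than the threshold); `rank_matches` is a hypothetical helper, and `result` is assumed to be an IdentificationResult.

```python
def rank_matches(result):
    """Return (index, score) pairs above the threshold, best score first."""
    threshold = result.threshold
    matches = [
        (index, score)
        for index, score in enumerate(result.scores)
        if score > threshold
    ]
    # Highest similarity first.
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches
```

The indexes in the returned pairs follow the order of the templates the identification list was created with, so they can be mapped back to speaker records kept by the caller.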