Voice SDK 5.0.0 Python API documentation¶
- class voicesdk.core.AudioInfo¶
Class containing audio info
- property channels_num¶
Number of audio channels
- Type
int
- property sample_rate¶
Audio sample rate in Hz
- Type
int
- property samples_num¶
Number of audio samples
- Type
int
- class voicesdk.core.AudioInterval¶
Class representing interval of audio in both samples and milliseconds.
- property end_sample¶
Sample number where AudioInterval ends (not inclusive).
- Type
float
- property end_time¶
AudioInterval end in milliseconds (not inclusive).
- Type
float
- property sample_rate¶
Sample rate of corresponding audio.
- Type
float
- property start_sample¶
Sample number where AudioInterval starts.
- Type
float
- property start_time¶
AudioInterval start in milliseconds.
- Type
float
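The sample-based and millisecond-based properties of an AudioInterval describe the same span of audio. A minimal sketch of the conversion between them, assuming the usual relation time_ms = sample / sample_rate * 1000. The helper names are illustrative and are not part of VoiceSDK:

```python
def sample_to_ms(sample_index: float, sample_rate: float) -> float:
    """Convert a sample index to a timestamp in milliseconds."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return sample_index / sample_rate * 1000.0


def ms_to_sample(time_ms: float, sample_rate: float) -> float:
    """Convert a timestamp in milliseconds to a sample index."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return time_ms * sample_rate / 1000.0
```

With a 16000 Hz signal, sample 16000 corresponds to 1000 ms.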
- class voicesdk.core.BuildInfo¶
Structure containing present VoiceSDK build info.
- property components¶
VoiceSDK components presented in build.
- Type
str
- property git_info¶
Git info dump at the build stage.
- Type
str
- property license_expiration_date¶
License expiration date in YYYY-MM-DD format. The date corresponds to the SDK feature that expires first. Deprecated, use get_license_info instead.
- Type
str
- property license_info¶
Information (e.g. expiration date) about the installed license if available or an empty string if no license is in use. Deprecated, use get_license_info instead.
- Type
str
- property version¶
VoiceSDK build version.
- Type
str
- class voicesdk.core.ChannelType¶
Enumeration for audio source labeling during voice template creation.
- MIC¶
Microphone audio channel.
- TEL¶
Telephone audio channel.
- MIXED¶
Mixed audio channel.
Members:
MIC
TEL
MIXED
- class voicesdk.core.LicenseFeature¶
VoiceSDK licensed features.
Members:
Core : Core functionality.
Verification : Voice verification.
LivenessPresentationAttackDetection : Voice liveness (presentation/replay attack detection).
LivenessVoiceClonesDetection : Voice liveness (voice clones detection).
QualityChecking : Quality checking functionality (SNR, speech length etc.).
- class voicesdk.core.LicenseFeatureInfo¶
VoiceSDK feature information.
- property expiration_date¶
Feature expiration date in YYYY-MM-DD format.
- Type
str
- property feature¶
License feature.
- Type
voicesdk.core.LicenseFeature
- class voicesdk.core.OpusUtils¶
OpusUtils class, contains static methods for Opus files reading.
- static read_as_pcm16_samples_from_memory(opus_file: str) tuple ¶
Reads Opus file from a memory buffer and decodes it to PCM16 samples buffer.
- Parameters
opus_file (bytes) – Memory buffer containing complete Opus file contents.
- Returns
Tuple with numpy.array (of numpy.int16 type) and signal sample rate.
- Return type
tuple
- class voicesdk.core.TimeInterval¶
Class representing time interval.
- property end_time¶
Timestamp in milliseconds where TimeInterval ends (not inclusive).
- Type
float
- property start_time¶
Timestamp in milliseconds where TimeInterval starts.
- Type
float
- class voicesdk.core.VoiceTemplate¶
Voice template class.
- __init__(bytes: bytes)¶
Constructs a voice template from its serialized representation.
- Parameters
bytes (bytes) – Bytes object with serialized voice template.
- static deserialize(bytes: bytes) voicesdk.core.VoiceTemplate ¶
Factory method, deserializes voice template from bytes.
- Parameters
bytes (bytes) – Bytes object with serialized voice template.
- Returns
Voice template instance.
- Return type
voicesdk.core.VoiceTemplate
- get_channel_type() voicesdk.core.ChannelType ¶
Returns voice template channel type which was specified by user on creation.
- Returns
Channel type
- Return type
voicesdk.core.ChannelType
- get_init_data_id() str ¶
Returns ID of the init data, which was used to create the template.
- Returns
A string containing init data ID
- Return type
str
- is_valid() bool ¶
Checks if voice template is valid or not.
- Returns
True if valid, False otherwise.
- Return type
bool
- static load_from_file(path_to_file: str) voicesdk.core.VoiceTemplate ¶
Factory method, restores voice template from the given file.
- Parameters
path_to_file (str) – Path to template file.
- Returns
Voice template instance.
- Return type
voicesdk.core.VoiceTemplate
- save_to_file(path_to_file: str)¶
Stores voice template in a file of the given path.
- Parameters
path_to_file (str) – Path to template file.
- serialize() bytes ¶
Serializes voice template to bytes.
- Returns
Serialized voice template.
- Return type
bytes
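A minimal sketch of a serialize/deserialize round trip, for example before storing a template in a database. The function name is hypothetical; the template class is passed in as an argument (in real use it would be voicesdk.core.VoiceTemplate) so that the sketch itself does not import the SDK:

```python
def roundtrip_voice_template(template, template_cls):
    """Serialize a template, restore it, and check the restored copy.

    `template` is expected to expose serialize(); `template_cls` is
    expected to expose the static deserialize(bytes) factory and to
    produce objects with is_valid(), as documented above.
    """
    data = template.serialize()
    restored = template_cls.deserialize(data)
    if not restored.is_valid():
        raise ValueError("deserialized voice template is not valid")
    return restored
```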
- class voicesdk.core.VoiceTemplateConverter¶
Voice template conversion class.
- __init__(init_data_path: str)¶
VoiceTemplateConverter constructor.
- Parameters
init_data_path (str) – Path to the directory containing init data.
- convert_voice_template(voice_template: voicesdk.core.VoiceTemplate) voicesdk.core.VoiceTemplate ¶
Converts voice template from one configuration to another.
- Parameters
voice_template (VoiceTemplate) – Voice template to be converted.
- Returns
Converted voice template.
- Return type
voicesdk.core.VoiceTemplate
- get_input_init_data_id() str ¶
Returns init data ID that voice template to be converted should have.
- Returns
A string containing init data ID of the voice template to be converted.
- Return type
str
- get_output_init_data_id() str ¶
Returns init data ID that converted voice template will have.
- Returns
A string containing init data ID of the converted voice template.
- Return type
str
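The input and output init data IDs can be used to decide whether a stored template needs conversion at all. A hedged sketch, assuming that comparing the IDs as plain strings is sufficient; the function name is hypothetical and the converter and template are passed in by the caller:

```python
def convert_if_needed(converter, template):
    """Return a template in the converter's output configuration.

    Uses only get_init_data_id(), get_input_init_data_id(),
    get_output_init_data_id() and convert_voice_template().
    """
    current_id = template.get_init_data_id()
    if current_id == converter.get_output_init_data_id():
        # Already in the target configuration, nothing to do.
        return template
    if current_id != converter.get_input_init_data_id():
        raise ValueError(
            "template init data ID %r is not accepted by this converter" % current_id
        )
    return converter.convert_voice_template(template)
```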
- class voicesdk.core.WavUtils¶
WavUtils class, contains static methods for WAV files reading.
- static get_audio_info(path_to_wav_file: str) voicesdk.core.AudioInfo ¶
Returns WAV file audio info.
- Parameters
path_to_wav_file (str) – Path to WAV file.
- Returns
AudioInfo – object containing audio info.
- static get_audio_info_from_memory(wav_file: str) voicesdk.core.AudioInfo ¶
Returns WAV file audio info.
- Parameters
wav_file (bytes) – Memory buffer containing complete WAV file contents.
- Returns
AudioInfo – object containing audio info.
- static read_as_float_samples(path_to_wav_file: str) tuple ¶
Reads WAV file as a float samples buffer (WAV file can be of any format).
- Parameters
path_to_wav_file (str) – Path to WAV file.
- Returns
Tuple with numpy.array (of numpy.float32 type) and signal sample rate.
- Return type
tuple
- static read_as_float_samples_16bit(path_to_wav_file: str) tuple ¶
Reads WAV file as a float buffer with 16-bit precision (WAV file can be of any format).
- Parameters
path_to_wav_file (str) – Path to WAV file.
- Returns
Tuple with numpy.array (of numpy.float32 type) and signal sample rate.
- Return type
tuple
- static read_as_float_samples_16bit_from_memory(wav_file: str) tuple ¶
Reads WAV file as a float samples buffer with 16-bit precision (WAV file can be of any format).
- Parameters
wav_file (bytes) – Memory buffer containing complete WAV file contents.
- Returns
Tuple with numpy.array (of numpy.float32 type) and signal sample rate.
- Return type
tuple
- static read_as_float_samples_from_memory(wav_file: str) tuple ¶
Reads WAV file as a float samples buffer (WAV file can be of any format).
- Parameters
wav_file (bytes) – Memory buffer containing complete WAV file contents.
- Returns
Tuple with numpy.array (of numpy.float32 type) and signal sample rate.
- Return type
tuple
- static read_as_pcm16_bytes(path_to_wav_file: str) tuple ¶
Reads WAV file as a PCM16 bytes buffer (WAV file can be of any format).
- Parameters
path_to_wav_file (str) – Path to WAV file.
- Returns
Tuple with numpy.array (of numpy.uint8 type) and signal sample rate.
- Return type
tuple
- static read_as_pcm16_bytes_from_memory(wav_file: str) tuple ¶
Reads WAV file as a PCM16 bytes buffer (WAV file can be of any format).
- Parameters
wav_file (bytes) – Memory buffer containing complete WAV file contents.
- Returns
Tuple with numpy.array (of numpy.uint8 type) and signal sample rate.
- Return type
tuple
- static read_as_pcm16_samples(path_to_wav_file: str) tuple ¶
Reads WAV file as a PCM16 samples buffer (WAV file can be of any format).
- Parameters
path_to_wav_file (str) – Path to WAV file.
- Returns
Tuple with numpy.array (of numpy.int16 type) and signal sample rate.
- Return type
tuple
- static read_as_pcm16_samples_from_memory(wav_file: str) tuple ¶
Reads WAV file as a PCM16 samples buffer (WAV file can be of any format).
- Parameters
wav_file (bytes) – Memory buffer containing complete WAV file contents.
- Returns
Tuple with numpy.array (of numpy.int16 type) and signal sample rate.
- Return type
tuple
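The readers above return the same audio in three representations: PCM16 bytes (numpy.uint8), PCM16 samples (numpy.int16) and float samples (numpy.float32). The relation between them can be sketched with the standard library alone. Little-endian byte order and scaling by 1/32768 are assumptions made for illustration; the SDK's own conversion is not specified here, and the helper names are hypothetical:

```python
import struct


def pcm16_bytes_to_samples(raw: bytes) -> list:
    """Interpret a PCM16 byte buffer as signed 16-bit samples (little-endian assumed)."""
    if len(raw) % 2 != 0:
        raise ValueError("PCM16 buffer must contain an even number of bytes")
    count = len(raw) // 2
    return list(struct.unpack("<%dh" % count, raw))


def pcm16_samples_to_float(samples) -> list:
    """Scale signed 16-bit samples to floats in [-1.0, 1.0) (assumed scaling)."""
    return [sample / 32768.0 for sample in samples]
```

Two bytes form one sample, so a uint8 buffer is twice as long as the corresponding int16 buffer.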
- voicesdk.core.get_build_info() voicesdk.core.BuildInfo ¶
Returns present VoiceSDK build info.
- Returns
Present VoiceSDK build info.
- Return type
voicesdk.core.BuildInfo
- voicesdk.core.get_license_info() list[voicesdk.core.LicenseFeatureInfo] ¶
Returns information (enabled features and expiration dates) about the installed license if available.
- Returns
List of VoiceSDK features available with the installed license.
- Return type
list
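Since BuildInfo.license_expiration_date is deprecated, the feature that expires first can be found from the list returned by get_license_info(). A minimal sketch that relies only on the documented feature and expiration_date (YYYY-MM-DD) properties; the function name is hypothetical, and the list is passed in by the caller:

```python
from datetime import date


def earliest_expiration(feature_infos):
    """Return (feature, expiration date) for the entry that expires first.

    Returns None when the list is empty (for example, no license in use).
    """
    dated = [
        (date.fromisoformat(info.expiration_date), info.feature)
        for info in feature_infos
    ]
    if not dated:
        return None
    expires, feature = min(dated, key=lambda item: item[0])
    return feature, expires
```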
- voicesdk.core.set_num_threads(arg0: int)¶
Sets the maximum number of threads available for VoiceSDK. If 0 is passed, the optimal number of threads is detected automatically (the same effect is achieved if set_num_threads is not called).
- Parameters
num_threads (int) – Maximum number of threads available for VoiceSDK.
- voicesdk.core.set_use_voice_template_compression(arg0: bool)¶
Sets whether to use compression for voice templates serialization. Voice template compression is not used by default.
- Parameters
use_voice_template_compression (bool) – Whether to use compression for voice templates serialization.
- class voicesdk.media.QualityCheckEngine¶
Quality check engine class.
- __init__(init_data_path: str)¶
QualityCheckEngine constructor.
- Parameters
init_data_path (str) – Path to the directory containing initialization data.
- check_quality_from_file(path_to_audio_file: str, thresholds: voicesdk.media.QualityCheckMetricsThresholds) voicesdk.media.QualityCheckEngineResult ¶
- Parameters
path_to_audio_file (str) – Path to audio file.
thresholds (QualityCheckMetricsThresholds) – Quality checking thresholds that will be applied to the output quality check metrics.
- Returns
Quality check result.
- Return type
voicesdk.media.QualityCheckEngineResult
- check_quality_from_samples(*args, **kwargs)¶
Overloaded function.
check_quality_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int, thresholds: voicesdk.media.QualityCheckMetricsThresholds) -> voicesdk.media.QualityCheckEngineResult
Checks whether audio buffer is suitable from the quality perspective, from the given PCM16 samples.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Audio sample rate in Hz.
thresholds (QualityCheckMetricsThresholds) – Quality checking thresholds that will be applied to the output quality check metrics.
- Returns
Quality check result.
- Return type
voicesdk.media.QualityCheckEngineResult
check_quality_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int, thresholds: voicesdk.media.QualityCheckMetricsThresholds) -> voicesdk.media.QualityCheckEngineResult
Checks whether audio buffer is suitable from the quality perspective, from the given PCM16 audio bytes.
- Parameters
samples (numpy.array) – PCM16 (numpy.uint8) audio bytes.
sample_rate (int) – Audio sample rate in Hz.
thresholds (QualityCheckMetricsThresholds) – Quality checking thresholds that will be applied to the output quality check metrics.
- Returns
Quality check result.
- Return type
voicesdk.media.QualityCheckEngineResult
check_quality_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int, thresholds: voicesdk.media.QualityCheckMetricsThresholds) -> voicesdk.media.QualityCheckEngineResult
Checks whether audio buffer is suitable from the quality perspective, from the given float audio samples.
- Parameters
samples (numpy.array) – Float (numpy.float32) audio samples.
sample_rate (int) – Audio sample rate in Hz.
thresholds (QualityCheckMetricsThresholds) – Quality checking thresholds that will be applied to the output quality check metrics.
- Returns
Quality check result.
- Return type
voicesdk.media.QualityCheckEngineResult
- get_recommended_thresholds(scenario: voicesdk.media.QualityCheckScenario) voicesdk.media.QualityCheckMetricsThresholds ¶
- Parameters
scenario (QualityCheckScenario) – Scenario for which recommended thresholds will be returned.
- Returns
Quality check thresholds that can be used on quality checking.
- Return type
voicesdk.media.QualityCheckMetricsThresholds
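A hedged sketch of how the methods above might be combined into a single pass/fail gate: take the recommended thresholds for a scenario, run the check on a file, and compare the short description with the OK value. The function name is hypothetical; the engine, the scenario and the OK value (in real use, a QualityCheckEngine, a QualityCheckScenario member and QualityCheckShortDescription.OK) are supplied by the caller:

```python
def passes_quality_check(engine, path_to_audio_file, scenario, ok_value):
    """Return True if the audio file passes the recommended thresholds.

    Relies only on get_recommended_thresholds(), check_quality_from_file()
    and the quality_check_short_description property.
    """
    thresholds = engine.get_recommended_thresholds(scenario)
    result = engine.check_quality_from_file(path_to_audio_file, thresholds)
    return result.quality_check_short_description == ok_value
```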
- class voicesdk.media.QualityCheckEngineResult¶
Quality check result class
- property multiple_speakers_detector_score¶
Multiple speakers detector score value obtained on quality check
- Type
float
- property quality_check_short_description¶
Short description of the quality check results
- property snr_db¶
SNR metric value obtained on quality check, in dB
- Type
float
- property speech_length_ms¶
Speech length metric value obtained on quality check in milliseconds
- Type
float
- property speech_relative_length¶
Speech relative length (speech length relative to the total audio length) metric value obtained on quality check
- Type
float
- class voicesdk.media.QualityCheckMetricsThresholds¶
Class for quality checking thresholds.
- __init__(*args, **kwargs)¶
Overloaded function.
__init__()
__init__(minimum_snr_db: float, minimum_speech_length_ms: float, minimum_speech_relative_length: float, maximum_multiple_speakers_detector_score: float)
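- Parameters
minimum_snr_db (float) – Minimum signal-to-noise ratio required to pass quality check in dB.
minimum_speech_length_ms (float) – Minimum speech length required to pass quality check in milliseconds.
minimum_speech_relative_length (float) – Minimum speech relative length (speech length relative to the total audio length) required to pass quality check.
maximum_multiple_speakers_detector_score (float) – Maximum multiple speakers detector score allowed to pass quality check.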
- property maximum_multiple_speakers_detector_score¶
Maximum multiple speakers detector score allowed to pass quality check.
- Type
float
- property minimum_snr_db¶
Minimum signal-to-noise ratio required to pass quality check in dB.
- Type
float
- property minimum_speech_length_ms¶
Minimum speech length required to pass quality check in milliseconds.
- Type
float
- property minimum_speech_relative_length¶
Minimum speech relative length (speech length relative to the total audio length) required to pass quality check.
- Type
float
- class voicesdk.media.QualityCheckScenario¶
Enumeration representing scenarios used to get recommended quality check thresholds.
Members:
VERIFY_TI_ENROLLMENT : Verification, TI enrollment step.
VERIFY_TI_VERIFICATION : Verification, TI verification step.
VERIFY_TD_ENROLLMENT : Verification, TD enrollment step.
VERIFY_TD_VERIFICATION : Verification, TD verification step.
LIVENESS : Liveness check.
- class voicesdk.media.QualityCheckShortDescription¶
Enumeration representing short descriptions of the audio quality check results.
Members:
TOO_NOISY : Too noisy audio.
TOO_SMALL_SPEECH_TOTAL_LENGTH : Too small speech length in the audio.
TOO_SMALL_SPEECH_RELATIVE_LENGTH : Too small speech relative length (speech length relative to the total audio length).
MULTIPLE_SPEAKERS_DETECTED : Multiple speakers detected.
OK : Audio successfully passed quality check.
- class voicesdk.media.SNRComputer¶
SNRComputer class, intended to calculate the signal-to-noise ratio (SNR) of a given audio signal.
- __init__(init_data_path: str)¶
SNRComputer constructor.
- Parameters
init_data_path (str) – Path to the directory containing init data.
- compute_with_file(path_to_audio_file: str) float ¶
Calculates SNR with given audio file.
- Parameters
path_to_audio_file (str) – Path to audio file.
- Returns
Computed SNR in dB.
- Return type
float
- compute_with_samples(*args, **kwargs)¶
Overloaded function.
compute_with_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> float
Calculates SNR with given PCM16 audio samples.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Input audio signal sample rate in Hz.
- Returns
Computed SNR in dB.
- Return type
float
compute_with_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> float
Calculates SNR with given PCM16 audio bytes.
- Parameters
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
sample_rate (int) – Input audio signal sample rate in Hz.
- Returns
Computed SNR in dB.
- Return type
float
compute_with_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> float
Calculates SNR with given float audio samples.
- Parameters
samples (numpy.array) – float (numpy.float32) audio samples.
sample_rate (int) – Input audio signal sample rate in Hz.
- Returns
Computed SNR in dB.
- Return type
float
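For reference, the textbook definition of SNR in dB is 10 * log10(P_signal / P_noise). The sketch below illustrates only that definition; it is not the algorithm used by SNRComputer, which works from init data and is not described here. The function name is hypothetical:

```python
import math


def snr_db(signal_power: float, noise_power: float) -> float:
    """Textbook SNR in dB from mean signal power and mean noise power."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)
```

A signal 100 times more powerful than the noise gives 20 dB; equal powers give 0 dB.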
- class voicesdk.media.SpeechEndpointDetector¶
Speech processor class for speech endpoint detection.
- __init__(min_speech_length_ms: int, max_silence_length_ms: int, sample_rate: int)¶
SpeechEndpointDetector constructor.
- Parameters
min_speech_length_ms (int) – Minimum speech length required to begin speech end detection (ms).
max_silence_length_ms (int) – Silence after speech threshold used to determine if speech is already ended (ms).
sample_rate (int) – Input audio sample rate.
- add_samples(*args, **kwargs)¶
Overloaded function.
add_samples(samples: numpy.ndarray[numpy.int16])
Adds new audio samples to the SpeechEndpointDetector.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
Note: Audio sample rate is predefined in the constructor.
add_samples(samples: numpy.ndarray[numpy.uint8])
Adds new audio samples to the SpeechEndpointDetector.
- Parameters
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
Note: Audio sample rate is predefined in the constructor.
add_samples(samples: numpy.ndarray[numpy.float32])
Adds new audio samples to the SpeechEndpointDetector.
- Parameters
samples (numpy.array) – float (numpy.float32) audio samples.
Note: Audio sample rate is predefined in the constructor.
- is_speech_ended() bool ¶
Returns detection state.
- Returns
True if speech end was detected, False otherwise.
- Return type
bool
- reset()¶
Resets detector state.
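A minimal sketch of streaming use: audio chunks are added one by one and is_speech_ended() is polled after each. The function name is hypothetical, and the detector is created by the caller, so the sketch uses only add_samples() and is_speech_ended():

```python
def feed_until_speech_end(detector, chunks):
    """Feed chunks in order and stop as soon as speech end is detected.

    Returns the index of the chunk after which the detector reported
    speech end, or None if it never did.
    """
    for index, chunk in enumerate(chunks):
        detector.add_samples(chunk)
        if detector.is_speech_ended():
            return index
    return None
```

Call reset() on the detector before reusing it for the next utterance.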
- class voicesdk.media.SpeechEndpointDetectorOpus¶
Speech processor class for speech endpoint detection in the Opus audio stream.
- __init__(min_speech_length_ms: int, max_silence_length_ms: int, sample_rate: int)¶
SpeechEndpointDetectorOpus constructor.
- Parameters
min_speech_length_ms (int) – Minimum speech length required to begin speech end detection (ms).
max_silence_length_ms (int) – Silence after speech threshold used to determine if speech is already ended (ms).
sample_rate (int) – Input audio sample rate.
- add_packet(bytes: numpy.ndarray[numpy.uint8])¶
Adds Opus packet to the SpeechEndpointDetectorOpus.
- Parameters
bytes (numpy.array) – A buffer containing single Opus packet. It is expected that packet contains data for single mono stream.
Note: Audio sample rate is predefined in the constructor.
- is_speech_ended() bool ¶
Returns detection state.
- Returns
True if speech end was detected, False otherwise.
- Return type
bool
- reset()¶
Resets detector state.
- class voicesdk.media.SpeechEvent¶
Class representing a single speech event.
- property audio_interval¶
Speech event audio interval.
- Type
voicesdk.core.AudioInterval
- property is_voice¶
Whether the frame contains speech or not.
- Type
bool
- class voicesdk.media.SpeechInfo¶
Class that contains metrics related to Voice Activity Detection.
- property background_length_ms¶
Total length of non-speech signal, milliseconds.
- Type
float
- property speech_length_ms¶
Total accumulated speech duration, milliseconds.
- Type
float
- property total_length_ms¶
Total audio record duration, milliseconds.
- Type
float
- class voicesdk.media.SpeechSummary¶
Speech summary class.
- property speech_events¶
Retrieves list of speech events.
- Returns
speech events.
- Return type
list
- property speech_info¶
Retrieves speech info data.
- Returns
speech info.
- Return type
voicesdk.media.SpeechInfo
- class voicesdk.media.SpeechSummaryEngine¶
Speech summary engine class, intended to calculate SpeechSummary with given audio samples.
- __init__(init_data_path: str)¶
SpeechSummaryEngine constructor.
- Parameters
init_data_path (str) – Path to the directory containing engine init data.
- create_stream(stream_sample_rate: int) voicesdk.media.SpeechSummaryStream ¶
Factory method for creating SpeechSummaryStream.
- Parameters
stream_sample_rate (int) – Audio stream sample rate in Hz.
- Returns
Created speech summary stream.
- Return type
voicesdk.media.SpeechSummaryStream
- get_speech_summary_from_file(path_to_audio_file: str) voicesdk.media.SpeechSummary ¶
Calculates speech summary with given audio file.
- Parameters
path_to_audio_file (str) – Path to audio file.
- Returns
Speech summary.
- Return type
voicesdk.media.SpeechSummary
- get_speech_summary_from_samples(*args, **kwargs)¶
Overloaded function.
get_speech_summary_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> voicesdk.media.SpeechSummary
Calculates speech summary with given PCM16 audio samples.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns
Speech summary.
- Return type
voicesdk.media.SpeechSummary
get_speech_summary_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> voicesdk.media.SpeechSummary
Calculates speech summary with given PCM16 audio samples.
- Parameters
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns
Speech summary.
- Return type
voicesdk.media.SpeechSummary
get_speech_summary_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> voicesdk.media.SpeechSummary
Calculates speech summary with given float audio samples.
- Parameters
samples (numpy.array) – float (numpy.float32) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns
Speech summary.
- Return type
voicesdk.media.SpeechSummary
- class voicesdk.media.SpeechSummaryStream¶
Stateful speech summary stream class, intended to calculate SpeechSummary with audio samples given in stream. New instance can be obtained with a SpeechSummaryEngine instance.
- add_samples(*args, **kwargs)¶
Overloaded function.
add_samples(samples: numpy.ndarray[numpy.int16])
Adds PCM16 audio samples to process.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
Note: Audio sample rate is predefined at the stream creation time.
add_samples(samples: numpy.ndarray[numpy.uint8])
Adds PCM16 audio samples to process.
- Parameters
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
Note: Audio sample rate is predefined at the stream creation time.
add_samples(samples: numpy.ndarray[numpy.float32])
Adds float audio samples to process.
- Parameters
samples (numpy.array) – float (numpy.float32) audio samples.
Note: Audio sample rate is predefined at the stream creation time.
- finalize()¶
Finalizes input audio stream to process remaining audio samples and produce result if it’s possible.
- get_current_background_length() float ¶
Returns current background length in milliseconds.
- Returns
Current background length in milliseconds.
- Return type
float
- get_speech_event() voicesdk.media.SpeechEvent ¶
Retrieves speech event from output queue.
- Returns
One speech event.
- Return type
voicesdk.media.SpeechEvent
- Raises
RuntimeError – If output queue is empty.
Note: Use has_speech_events() to check whether any speech events are available.
- get_total_speech_info() voicesdk.media.SpeechInfo ¶
Retrieves accumulated speech info data.
- Returns
speech info.
- Return type
voicesdk.media.SpeechInfo
- get_total_speech_summary() voicesdk.media.SpeechSummary ¶
Retrieves accumulated speech summary data.
- Returns
speech summary.
- Return type
voicesdk.media.SpeechSummary
- has_speech_events() bool ¶
Checks if any speech events are present in stream queue.
- Returns
True if any events are present, False otherwise.
- Return type
bool
- reset()¶
Resets stream state: clears buffer, resets speech summary.
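Because get_speech_event() raises RuntimeError on an empty queue, the queue is normally drained with has_speech_events() after each batch of samples and once more after finalize(). A minimal sketch under that reading; the function name is hypothetical and the stream is created by the caller (for example with SpeechSummaryEngine.create_stream):

```python
def collect_speech_events(stream, chunks):
    """Feed all chunks to the stream and return every speech event produced."""
    events = []

    def drain():
        # Guard each get_speech_event() call so the queue is never read empty.
        while stream.has_speech_events():
            events.append(stream.get_speech_event())

    for chunk in chunks:
        stream.add_samples(chunk)
        drain()
    # finalize() may produce events for the remaining buffered audio.
    stream.finalize()
    drain()
    return events
```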
- class voicesdk.media.SpeechSummaryStreamOpus¶
Stateful speech summary stream class, intended to calculate SpeechSummary with audio samples given in Opus stream. New instance can be obtained with a SpeechSummaryEngine instance.
- __init__(init_data_path: str, sample_rate: int)¶
SpeechSummaryStreamOpus constructor.
- Parameters
init_data_path (str) – Path to the directory containing engine init data.
sample_rate (int) – Audio stream sample rate in Hz.
- add_packet(bytes: numpy.ndarray[numpy.uint8])¶
Adds Opus packet to the SpeechSummaryStreamOpus.
- Parameters
bytes (numpy.array) – A buffer containing single Opus packet. It is expected that packet contains data for single mono stream.
Note: Audio sample rate is predefined in the constructor.
- finalize()¶
Finalizes input audio stream to process remaining audio samples and produce result if it’s possible.
- get_current_background_length() float ¶
Returns current background length in milliseconds.
- Returns
Current background length in milliseconds.
- Return type
float
- get_speech_event() voicesdk.media.SpeechEvent ¶
Retrieves speech event from output queue.
- Returns
One speech event.
- Return type
voicesdk.media.SpeechEvent
- Raises
RuntimeError – If output queue is empty.
Note: Use has_speech_events() to check whether any speech events are available.
- get_total_speech_info() voicesdk.media.SpeechInfo ¶
Retrieves accumulated speech info data.
- Returns
speech info.
- Return type
voicesdk.media.SpeechInfo
- get_total_speech_summary() voicesdk.media.SpeechSummary ¶
Retrieves accumulated speech summary data.
- Returns
speech summary.
- Return type
voicesdk.media.SpeechSummary
- has_speech_events() bool ¶
Checks if any speech events are present in stream queue.
- Returns
True if any events are present, False otherwise.
- Return type
bool
- reset()¶
Resets stream state: clears buffer, resets speech summary.
- class voicesdk.verify.VerifyResult¶
Voice verification result class.
- property probability¶
Voice matching probability from 0 to 1, should be used for making a biometrics authentication decision.
- Type
float
- property score¶
Raw verification score, intended to be used for evaluation and data-wise calibration.
- Type
float
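Since probability is the value intended for authentication decisions, a decision usually reduces to comparing it with a threshold. A minimal sketch; the function name is hypothetical and the default threshold of 0.5 is an illustrative assumption, not a value recommended by this documentation:

```python
def is_same_speaker(verify_result, threshold: float = 0.5) -> bool:
    """Accept the match if the probability reaches the chosen threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within [0, 1]")
    return verify_result.probability >= threshold
```

The raw score is left out on purpose, as it is described above as intended for evaluation and calibration.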
- class voicesdk.verify.VerifyStreamResult¶
Streaming voice verification result class.
- property audio_interval¶
Audio interval, which verify result refers to.
- Type
voicesdk.core.AudioInterval
- property verify_result¶
Voice verification result.
- class voicesdk.verify.VoiceTemplateFactory¶
Voice verification engine class.
- __init__(init_data_path: str)¶
VoiceTemplateFactory constructor.
- Parameters
init_data_path (str) – Path to the directory containing factory init data.
- create_voice_template_batch_from_file(input_batch: list[voicesdk.verify.VerifyFileBatchElement]) list[voicesdk.core.VoiceTemplate] ¶
Creates multiple voice templates from the contents of the given WAV files.
- Parameters
input_batch (list) – List of VerifyFileBatchElement.
- Returns
Created voice templates.
- create_voice_template_batch_from_samples(*args, **kwargs)¶
Overloaded function.
create_voice_template_batch_from_samples(input_batch: list[voicesdk.verify.VerifySamplesBatchElementFloat]) -> list[voicesdk.core.VoiceTemplate]
Creates multiple voice templates from given audio samples.
- Parameters
input_batch (list) – List of VerifySamplesBatchElementFloat.
- Returns
Created voice templates.
create_voice_template_batch_from_samples(input_batch: list[voicesdk.verify.VerifySamplesBatchElementInt16]) -> list[voicesdk.core.VoiceTemplate]
Creates multiple voice templates from given audio samples.
- Parameters
input_batch (list) – List of VerifySamplesBatchElementInt16.
- Returns
Created voice templates.
create_voice_template_batch_from_samples(input_batch: list[voicesdk.verify.VerifySamplesBatchElementUint8]) -> list[voicesdk.core.VoiceTemplate]
Creates multiple voice templates from given audio samples.
- Parameters
input_batch (list) – List of VerifySamplesBatchElementUint8.
- Returns
Created voice templates.
- create_voice_template_from_file(path_to_audio_file: str, channel_type: voicesdk.core.ChannelType = <ChannelType.MIC: 1>) voicesdk.core.VoiceTemplate ¶
Creates voice template from the contents of the given audio file.
- Parameters
path_to_audio_file (str) – Path to audio file.
channel_type (ChannelType, optional) – Input audio channel type, default is ChannelType.MIC.
- Returns
Created voice template.
- Return type
voicesdk.core.VoiceTemplate
Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.
- create_voice_template_from_samples(*args, **kwargs)¶
Overloaded function.
create_voice_template_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int, channel_type: voicesdk.core.ChannelType = <ChannelType.MIC: 1>) -> voicesdk.core.VoiceTemplate
Creates voice template from audio samples.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Audio sample rate in Hz.
channel_type (ChannelType, optional) – Input audio channel type, default is ChannelType.MIC.
- Returns
Created voice template.
- Return type
voicesdk.core.VoiceTemplate
Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.
create_voice_template_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int, channel_type: voicesdk.core.ChannelType = <ChannelType.MIC: 1>) -> voicesdk.core.VoiceTemplate
Creates voice template from audio samples.
- Parameters
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.
sample_rate (int) – Audio sample rate in Hz.
channel_type (ChannelType, optional) – Input audio channel type, default is ChannelType.MIC.
- Returns
Created voice template.
- Return type
voicesdk.core.VoiceTemplate
Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.
create_voice_template_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int, channel_type: voicesdk.core.ChannelType = <ChannelType.MIC: 1>) -> voicesdk.core.VoiceTemplate
Creates voice template from audio samples.
- Parameters
samples (numpy.array) – Float (numpy.float32) audio samples.
sample_rate (int) – Audio sample rate in Hz.
channel_type (ChannelType, optional) – Input audio channel type, default is ChannelType.MIC.
- Returns
Created voice template.
- Return type
voicesdk.core.VoiceTemplate
Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.
- get_init_data_id() str ¶
Returns ID of the init data, which was used to create the factory.
- Returns
A string containing init data ID.
- Return type
str
- get_minimum_audio_sample_rate() int ¶
Returns minimum supported input audio sampling frequency in Hz.
- Returns
A minimum sampling rate in Hz.
- Return type
int
- merge_voice_templates(voice_templates: list) voicesdk.core.VoiceTemplate ¶
Merges a list of voice templates of the same speaker producing a union template.
- Parameters
voice_templates (list) – List of voice templates.
- Returns
A union voice template.
- Return type
voicesdk.core.VoiceTemplate
Note: All the templates should have the same init data ID as the factory instance.
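The init data ID requirement for merge_voice_templates can be checked up front. A hedged sketch that compares each template's init data ID with the factory's before merging; the function name is hypothetical, and the factory and templates are supplied by the caller:

```python
def merge_checked(factory, voice_templates):
    """Merge templates of one speaker after verifying their init data IDs."""
    if not voice_templates:
        raise ValueError("at least one voice template is required")
    factory_id = factory.get_init_data_id()
    for index, template in enumerate(voice_templates):
        if template.get_init_data_id() != factory_id:
            raise ValueError(
                "template %d has a different init data ID than the factory" % index
            )
    return factory.merge_voice_templates(voice_templates)
```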
- class voicesdk.verify.VoiceTemplateMatcher¶
Voice verification engine class.
- __init__(init_data_path: str)¶
VoiceTemplateMatcher constructor.
- Parameters
init_data_path (str) – Path to the directory containing matcher init data.
- get_init_data_id() str ¶
Returns ID of the init data, which was used to create the matcher.
- Returns
A string containing init data ID.
- Return type
str
- match_voice_templates(template1: voicesdk.core.VoiceTemplate, template2: voicesdk.core.VoiceTemplate) voicesdk.verify.VerifyResult ¶
Matches two voice templates one-to-one.
- Parameters
template1 (VoiceTemplate) – First voice template.
template2 (VoiceTemplate) – Second voice template.
- Returns
Verification result.
- Return type
voicesdk.verify.VerifyResult
Note: Both templates should have the same init data ID as the matcher instance.
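A minimal sketch of one-to-one matching that enforces the init data ID requirement before calling the matcher and returns the matching probability. The function name is hypothetical; the matcher and both templates are supplied by the caller:

```python
def match_probability(matcher, template1, template2) -> float:
    """Return the matching probability of two templates, checking IDs first."""
    matcher_id = matcher.get_init_data_id()
    for template in (template1, template2):
        if template.get_init_data_id() != matcher_id:
            raise ValueError("template init data ID differs from the matcher's")
    return matcher.match_voice_templates(template1, template2).probability
```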
- class voicesdk.verify.VoiceVerifyStream¶
Class for continuous voice verification using audio stream.
- __init__(voice_template_factory: voicesdk.verify.VoiceTemplateFactory, voice_template_matcher: voicesdk.verify.VoiceTemplateMatcher, voice_templates: list[voicesdk.core.VoiceTemplate], sample_rate: int, audio_context_length_seconds: int = 10, window_length_seconds: float = 3)¶
Note: Sampling frequency should be equal to or greater than the value returned by voice_template_factory.get_minimum_audio_sample_rate(). Voice template matcher, voice template factory and voice template should have the same init data ID.
- add_samples(*args, **kwargs)¶
Overloaded function.
add_samples(samples: numpy.ndarray[numpy.int16])
Adds PCM16 audio samples to process.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
Note: Audio sample rate is predefined at creation time. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.
add_samples(samples: numpy.ndarray[numpy.uint8])
Adds PCM16 audio samples to process.
- Parameters
samples (numpy.array) – PCM16 (numpy.uint8) audio samples.Note:
Audio sample rate is predefined at the create time. If exception was thrown, then stream state is being reset except the accumulated results buffer.
add_samples(samples: numpy.ndarray[numpy.float32])
Adds PCM16 audio samples to process.
- Parameters
samples (numpy.array) – float (numpy.float32) audio samples.Note:
Audio sample rate is predefined at the create time. If exception was thrown, then stream state is being reset except the accumulated results buffer.
- finalize()¶
Finalizes the input audio stream: processes the remaining audio samples and produces a result if possible.
- get_verify_result() list[voicesdk.verify.VerifyStreamResult] ¶
Retrieves a verification result from the output queue. The result contains one verify stream result for each reference template.
- Returns
A list with one verify stream result per reference template.
- Return type
list[VerifyStreamResult]
- Raises
RuntimeError – If the output queue is empty.
Note: Use has_verify_results() to check whether verify results are available.
- get_verify_result_for_one_template() voicesdk.verify.VerifyStreamResult ¶
Retrieves a verification result from the output queue as a single verify stream result corresponding to the zeroth reference template. Suitable when only one reference template was specified. In that case it behaves the same as get_verify_result did in IDVoice < 3.13.
- Returns
One verify result for the zeroth reference template.
- Return type
VerifyStreamResult
- Raises
RuntimeError – If the output queue is empty.
Note: Use has_verify_results() to check whether verify results are available.
- has_verify_results() bool ¶
Checks if there are verify results in output queue.
- Returns
True if there are results available, False otherwise.
- Return type
bool
- reset()¶
Resets stream state.
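The methods above suggest a feed, poll and finalize loop. The following is a minimal sketch of that pattern, not an official usage example: `drain` and `verify_in_chunks` are hypothetical helper names, `stream` is assumed to be an already constructed VoiceVerifyStream, and `samples` is assumed to be a sliceable sequence (in practice a numpy.int16 or numpy.float32 array).

```python
def drain(stream):
    """Collect every result currently waiting in the stream's output queue."""
    results = []
    # Checking has_verify_results() first avoids the RuntimeError that
    # get_verify_result() raises on an empty queue.
    while stream.has_verify_results():
        results.append(stream.get_verify_result())
    return results


def verify_in_chunks(stream, samples, chunk_size):
    """Feed samples chunk by chunk, draining results after each chunk."""
    results = []
    for start in range(0, len(samples), chunk_size):
        stream.add_samples(samples[start:start + chunk_size])
        results.extend(drain(stream))
    # Flush whatever audio is still buffered, then pick up the last results.
    stream.finalize()
    results.extend(drain(stream))
    return results
```

Each collected item is whatever `get_verify_result()` returns, that is, a list with one VerifyStreamResult per reference template. The chunk size is a free choice here; the documentation above does not prescribe one.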
- class voicesdk.verify.VoiceVerifyStreamOpus¶
Class for continuous voice verification using Opus audio stream.
- __init__(voice_template_factory: pyvoicesdk::VoiceTemplateFactory, voice_template_matcher: pyvoicesdk::VoiceTemplateMatcher, voice_template: voicesdk.core.VoiceTemplate, sample_rate: int, audio_context_length_seconds: int = 10)¶
- Parameters
audio_context_length_seconds (int, optional) – Length of audio context for voice verification in seconds, must be at least 3 seconds.
Note: Sampling frequency should be equal to or greater than the value returned by voice_template_factory.get_minimum_audio_sample_rate(). Voice template matcher, voice template factory and voice template should have the same init data ID.
- add_packet(bytes: numpy.ndarray[numpy.uint8])¶
Adds Opus packet to process.
- Parameters
bytes (numpy.array) – Opus packet bytes (numpy.uint8). The packet is expected to contain data for a single mono stream.
Note: The audio sample rate is fixed at creation time. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.
- finalize()¶
Finalizes the input audio stream: processes the remaining audio samples and produces a result if possible.
- get_verify_result() voicesdk.verify.VerifyStreamResult ¶
Retrieves a verify result from the output queue.
- Returns
One verify result.
- Return type
VerifyStreamResult
- Raises
RuntimeError – If the output queue is empty.
Note: Use has_verify_results() to check whether verify results are available.
- has_verify_results() bool ¶
Checks if there are verify results in output queue.
- Returns
True if there are results available, False otherwise.
- Return type
bool
- reset()¶
Resets stream state.
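The note on add_packet says that a failed call resets the stream state but keeps the accumulated results buffer. One possible way to handle this is sketched below. `feed_packets` is a hypothetical helper, `stream` is assumed to be an already constructed VoiceVerifyStreamOpus, and catching a broad `Exception` is an assumption, since the exception type raised by add_packet is not documented here.

```python
def feed_packets(stream, packets):
    """Feed Opus packets; on failure keep the results gathered so far and go on."""
    results = []
    failed = 0
    for packet in packets:
        try:
            stream.add_packet(packet)
        except Exception:
            # Assumption: any exception means this packet was rejected.
            # The stream state was reset, but results accumulated earlier
            # are still in the output queue and are drained below.
            failed += 1
        while stream.has_verify_results():
            results.append(stream.get_verify_result())
    # Process the remaining buffered audio and collect the final results.
    stream.finalize()
    while stream.has_verify_results():
        results.append(stream.get_verify_result())
    return results, failed
```

Whether to continue after a failed packet or to abort is an application decision; this sketch simply counts failures so the caller can decide.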
- class voicesdk.liveness.LivenessEngine¶
Voice liveness check class.
- __init__(init_data_path: str)¶
LivenessEngine constructor.
- Parameters
init_data_path (str) – Path to the directory containing engine init data.
- check_liveness_file(path_to_audio_file: str) voicesdk.liveness.LivenessResult ¶
Checks voice liveness from the given audio file.
- Parameters
path_to_audio_file (str) – Path to audio file.
- Returns
Liveness check result.
- Return type
LivenessResult
- check_liveness_samples(*args, **kwargs)¶
Overloaded function.
check_liveness_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> voicesdk.liveness.LivenessResult
Checks voice liveness from the given PCM16 audio samples.
- Parameters
samples (numpy.array) – PCM16 (numpy.int16) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns
Liveness check result.
- Return type
LivenessResult
check_liveness_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> voicesdk.liveness.LivenessResult
Checks voice liveness from the given PCM16 audio bytes.
- Parameters
samples (numpy.array) – PCM16 (numpy.uint8) audio bytes.
sample_rate (int) – Audio sample rate in Hz.
- Returns
Liveness check result.
- Return type
LivenessResult
check_liveness_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> voicesdk.liveness.LivenessResult
Checks voice liveness from the given float audio samples.
- Parameters
samples (numpy.array) – float (numpy.float32) audio samples.
sample_rate (int) – Audio sample rate in Hz.
- Returns
Liveness check result.
- Return type
LivenessResult
- class voicesdk.liveness.LivenessResult¶
Voice liveness check result.
- get_status_code() voicesdk.liveness.LivenessResultValidationStatusCode ¶
Gets the validation status code.
- Returns
Validation status code.
- Return type
LivenessResultValidationStatusCode
- get_value() voicesdk.liveness.LivenessResultValue ¶
Gets the liveness check result value. Available only if the validation preceding the liveness check was successful, see ok().
- Returns
Liveness check result value.
- Return type
LivenessResultValue
- ok() bool ¶
Checks whether validation was successful and the liveness result value is available.
- Returns
True if validation was successful and the liveness result value is available, False otherwise.
- Return type
bool
- class voicesdk.liveness.LivenessResultValidationStatusCode¶
Enumeration representing status code of validation preceding liveness check.
Members:
TOO_SMALL_SPEECH_LENGTH : Speech length is too small for the liveness check to be performed.
OK : Successful validation.
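Because get_value() is available only after successful validation, a caller would typically branch on ok() first. The sketch below illustrates that order of calls; `summarize_liveness` is a hypothetical helper and the returned tuple layout is an illustrative choice, not an SDK convention.

```python
def summarize_liveness(result):
    """Return ("ok", value) when validation passed, else ("rejected", status code)."""
    if result.ok():
        # Validation succeeded, so the liveness result value is available.
        return "ok", result.get_value()
    # Validation failed (e.g. TOO_SMALL_SPEECH_LENGTH); report the status code
    # and do not call get_value().
    return "rejected", result.get_status_code()
```

With a real engine, `result` would come from `check_liveness_file(...)` or `check_liveness_samples(...)` on a LivenessEngine instance.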