Voice SDK 5.1.0 Python API documentation

class voicesdk.core.AudioInfo

Class containing audio info

property channels_num

Number of audio channels

Type:

int

property sample_rate

Audio sample rate in Hz

Type:

int

property samples_num

Number of audio samples

Type:

int

class voicesdk.core.AudioInterval

Class representing interval of audio in both samples and milliseconds.

property end_sample

Sample number where AudioInterval ends (not inclusive).

Type:

float

property end_time

AudioInterval end in milliseconds (not inclusive).

Type:

float

property sample_rate

Sample rate of corresponding audio.

Type:

float

property start_sample

Sample number where AudioInterval starts.

Type:

float

property start_time

AudioInterval start in milliseconds.

Type:

float
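The sample and millisecond properties describe the same interval in two units. A minimal sketch of the conversion, assuming the two are related through sample_rate in the usual way (this relation is an assumption, it is not stated above). The helper names are hypothetical and not part of the SDK.

```python
def samples_to_ms(sample_index: float, sample_rate: float) -> float:
    """Convert a sample position to a time offset in milliseconds."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return sample_index * 1000.0 / sample_rate


def ms_to_samples(time_ms: float, sample_rate: float) -> float:
    """Convert a time offset in milliseconds to a sample position."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return time_ms * sample_rate / 1000.0


# An interval covering samples [8000, 24000) of 16 kHz audio.
start_ms = samples_to_ms(8000, 16000)
end_ms = samples_to_ms(24000, 16000)
```

Both bounds follow the same convention as the properties above: the start is inclusive and the end is not.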

class voicesdk.core.BuildInfo

Structure containing present VoiceSDK build info.

property components

VoiceSDK components presented in build.

Type:

str

property git_info

Git info dump at the build stage.

Type:

str

property license_expiration_date

License expiration date in YYYY-MM-DD format. The date corresponds to the SDK feature that expires first. Deprecated, use get_license_info instead.

Type:

str

property license_info

Information (e.g. expiration date) about the installed license if available or an empty string if no license is in use. Deprecated, use get_license_info instead.

Type:

str

property version

VoiceSDK build version.

Type:

str

class voicesdk.core.ChannelType

Enumeration for audio source labeling during voice template creation.

MIC

Microphone audio channel.

TEL

Telephone audio channel.

MIXED

Mixed audio channel.

Members:

MIC

TEL

MIXED

class voicesdk.core.LicenseFeature

VoiceSDK licensed features.

Members:

Core : Core functionality.

Verification : Voice verification.

LivenessPresentationAttackDetection : Voice liveness (presentation/replay attack detection).

LivenessVoiceClonesDetection : Voice liveness (voice clones detection).

QualityChecking : Quality checking functionality (SNR, speech length etc.).

class voicesdk.core.LicenseFeatureInfo

VoiceSDK feature information.

property expiration_date

Feature expiration date in YYYY-MM-DD format.

Type:

str

property feature

License feature.

Type:

LicenseFeature

class voicesdk.core.OpusUtils

OpusUtils class, contains static methods for Opus files reading.

static read_as_pcm16_samples_from_memory(opus_file: str) tuple

Reads Opus file from a memory buffer and decodes it to PCM16 samples buffer.

Parameters:

opus_file (bytes) – Memory buffer containing complete Opus file contents.

Returns:

Tuple with numpy.array (of numpy.int16 type) and signal sample rate.

Return type:

tuple

class voicesdk.core.TimeInterval

Class representing time interval.

property end_time

Timestamp in milliseconds where TimeInterval ends (not inclusive).

Type:

float

property start_time

Timestamp in milliseconds where TimeInterval starts.

Type:

float

class voicesdk.core.VoiceTemplate

Voice template class.

__init__(bytes: bytes)

Constructs a voice template from its serialized representation.

Parameters:

bytes (bytes) – Bytes object with serialized voice template.

static deserialize(bytes: bytes) voicesdk.core.VoiceTemplate

Factory method, deserializes voice template from bytes.

Parameters:

bytes (bytes) – Bytes object with serialized voice template.

Returns:

voice template instance.

Return type:

VoiceTemplate

get_channel_type() voicesdk.core.ChannelType

Returns voice template channel type which was specified by user on creation.

Returns:

Channel type

Return type:

ChannelType

get_init_data_id() str

Returns ID of the init data, which was used to create the template.

Returns:

A string containing init data ID

Return type:

str

is_valid() bool

Checks if voice template is valid or not.

Returns:

True if valid, False otherwise.

Return type:

bool

static load_from_file(path_to_file: str) voicesdk.core.VoiceTemplate

Factory method, restores voice template from the given file.

Parameters:

path_to_file (str) – Path to template file.

Returns:

voice template instance.

Return type:

VoiceTemplate

save_to_file(path_to_file: str)

Stores voice template in a file of the given path.

Parameters:

path_to_file (str) – Path to template file.

serialize() bytes

Serializes voice template to bytes.

Returns:

Serialized voice template.

Return type:

bytes
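Since serialize() returns raw bytes, a text encoding is needed wherever a template has to travel through a text-only channel (for example a JSON field). A minimal sketch using Base64, which is one possible choice and not something the SDK prescribes. The helper names and the FakeTemplate stand-in are hypothetical; in real code the object would be a VoiceTemplate and the callable would be VoiceTemplate.deserialize.

```python
import base64


def template_to_text(template) -> str:
    """Serialize a template and encode the bytes as Base64 ASCII text."""
    return base64.b64encode(template.serialize()).decode("ascii")


def template_from_text(text: str, deserialize):
    """Decode Base64 text and rebuild a template.

    `deserialize` is expected to be VoiceTemplate.deserialize; it is passed
    in so that this sketch does not have to import the SDK.
    """
    return deserialize(base64.b64decode(text))


class FakeTemplate:
    """Stand-in with the same serialize/deserialize shape as VoiceTemplate."""

    def __init__(self, data: bytes):
        self._data = data

    def serialize(self) -> bytes:
        return self._data

    @staticmethod
    def deserialize(data: bytes) -> "FakeTemplate":
        return FakeTemplate(data)


encoded = template_to_text(FakeTemplate(b"\x00\x01\x02"))
restored = template_from_text(encoded, FakeTemplate.deserialize)
```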

class voicesdk.core.VoiceTemplateConverter

Voice template conversion class.

__init__(init_data_path: str)

VoiceTemplateConverter constructor.

Parameters:

init_data_path (str) – Path to the directory containing init data.

convert_voice_template(voice_template: voicesdk.core.VoiceTemplate) voicesdk.core.VoiceTemplate

Converts voice template from one configuration to another.

Parameters:

voice_template (VoiceTemplate) – Voice template to be converted.

Returns:

Converted voice template.

Return type:

VoiceTemplate

get_input_init_data_id() str

Returns init data ID that voice template to be converted should have.

Returns:

A string containing init data ID of the voice template to be converted.

Return type:

str

get_output_init_data_id() str

Returns init data ID that converted voice template will have.

Returns:

A string containing init data ID of the converted voice template.

Return type:

str
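The two ID getters make it possible to check a template before converting it. A minimal sketch of such a guard, written against the method names documented above. The function name and the Fake* stand-ins are hypothetical, and the behavior of convert_voice_template on a mismatched template is not documented here, so the explicit check is a precaution rather than a requirement.

```python
def convert_if_needed(converter, template):
    """Return a template whose init data ID matches the converter output."""
    current_id = template.get_init_data_id()
    if current_id == converter.get_output_init_data_id():
        return template  # already in the target configuration
    if current_id != converter.get_input_init_data_id():
        raise ValueError(
            f"converter does not accept init data ID {current_id!r}"
        )
    return converter.convert_voice_template(template)


class FakeTemplate:
    """Stand-in exposing only get_init_data_id()."""

    def __init__(self, init_data_id: str):
        self._init_data_id = init_data_id

    def get_init_data_id(self) -> str:
        return self._init_data_id


class FakeConverter:
    """Stand-in converting templates from ID 'old' to ID 'new'."""

    def get_input_init_data_id(self) -> str:
        return "old"

    def get_output_init_data_id(self) -> str:
        return "new"

    def convert_voice_template(self, template):
        return FakeTemplate("new")


converted = convert_if_needed(FakeConverter(), FakeTemplate("old"))
```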

class voicesdk.core.WavUtils

WavUtils class, contains static methods for WAV files reading.

static get_audio_info(path_to_wav_file: str) voicesdk.core.AudioInfo

Returns WAV file audio info.

Parameters:

path_to_wav_file (str) – Path to WAV file.

Returns:

Object containing audio info.

Return type:

AudioInfo

static get_audio_info_from_memory(wav_file: str) voicesdk.core.AudioInfo

Returns WAV file audio info.

Parameters:

wav_file (bytes) – Memory buffer containing complete WAV file contents.

Returns:

Object containing audio info.

Return type:

AudioInfo

static read_as_float_samples(path_to_wav_file: str) tuple

Reads WAV file as a float samples buffer (WAV file can be of any format).

Parameters:

path_to_wav_file (str) – Path to WAV file.

Returns:

Tuple with numpy.array (of numpy.float32 type) and signal sample rate.

Return type:

tuple

static read_as_float_samples_16bit(path_to_wav_file: str) tuple

Reads WAV file as a float buffer with 16-bit precision (WAV file can be of any format).

Parameters:

path_to_wav_file (str) – Path to WAV file.

Returns:

Tuple with numpy.array (of numpy.float32 type) and signal sample rate.

Return type:

tuple

static read_as_float_samples_16bit_from_memory(wav_file: str) tuple

Reads WAV file as a float samples buffer with 16-bit precision (WAV file can be of any format).

Parameters:

wav_file (bytes) – Memory buffer containing complete WAV file contents.

Returns:

Tuple with numpy.array (of numpy.float32 type) and signal sample rate.

Return type:

tuple

static read_as_float_samples_from_memory(wav_file: str) tuple

Reads WAV file as a float samples buffer (WAV file can be of any format).

Parameters:

wav_file (bytes) – Memory buffer containing complete WAV file contents.

Returns:

Tuple with numpy.array (of numpy.float32 type) and signal sample rate.

Return type:

tuple

static read_as_pcm16_bytes(path_to_wav_file: str) tuple

Reads WAV file as a PCM16 bytes buffer (WAV file can be of any format).

Parameters:

path_to_wav_file (str) – Path to WAV file.

Returns:

Tuple with numpy.array (of numpy.uint8 type) and signal sample rate.

Return type:

tuple

static read_as_pcm16_bytes_from_memory(wav_file: str) tuple

Reads WAV file as a PCM16 bytes buffer (WAV file can be of any format).

Parameters:

wav_file (bytes) – Memory buffer containing complete WAV file contents.

Returns:

Tuple with numpy.array (of numpy.uint8 type) and signal sample rate.

Return type:

tuple

static read_as_pcm16_samples(path_to_wav_file: str) tuple

Reads WAV file as a PCM16 samples buffer (WAV file can be of any format).

Parameters:

path_to_wav_file (str) – Path to WAV file.

Returns:

Tuple with numpy.array (of numpy.int16 type) and signal sample rate.

Return type:

tuple

static read_as_pcm16_samples_from_memory(wav_file: str) tuple

Reads WAV file as a PCM16 samples buffer (WAV file can be of any format).

Parameters:

wav_file (bytes) – Memory buffer containing complete WAV file contents.

Returns:

Tuple with numpy.array (of numpy.int16 type) and signal sample rate.

Return type:

tuple

voicesdk.core.get_build_info() voicesdk.core.BuildInfo

Returns present VoiceSDK build info.

Returns:

Present VoiceSDK build info.

Return type:

BuildInfo

voicesdk.core.get_license_info() list[voicesdk.core.LicenseFeatureInfo]

Returns information (enabled features and expiration dates) about the installed license if available.

Returns:

List of VoiceSDK features available with the installed license.

Return type:

list
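Each returned LicenseFeatureInfo carries its own expiration date, so the remaining validity can be computed per feature. A minimal sketch, assuming only the documented `feature` and `expiration_date` (YYYY-MM-DD) properties. The function name and the stand-in data are hypothetical; real code would pass the list returned by get_license_info().

```python
from datetime import date
from types import SimpleNamespace


def days_until_expiry(feature_infos, today: date) -> dict:
    """Map each feature to the number of days left before it expires."""
    remaining = {}
    for info in feature_infos:
        expires = date.fromisoformat(info.expiration_date)
        remaining[str(info.feature)] = (expires - today).days
    return remaining


# Stand-in data for demonstration only.
sample_infos = [
    SimpleNamespace(feature="Core", expiration_date="2030-01-31"),
    SimpleNamespace(feature="Verification", expiration_date="2030-01-01"),
]
remaining = days_until_expiry(sample_infos, today=date(2030, 1, 1))
```

A negative value would mean that the feature has already expired on the given day.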

voicesdk.core.set_num_threads(arg0: int)

Sets the maximum number of threads available for VoiceSDK. If 0 is passed, the optimal number of threads is detected automatically (the same effect is achieved if set_num_threads is not called).

Parameters:

num_threads (int) – Maximum number of threads available for VoiceSDK.

voicesdk.core.set_use_voice_template_compression(arg0: bool)

Sets whether to use compression for voice templates serialization. Voice template compression is not used by default.

Parameters:

use_voice_template_compression (bool) – Whether to use compression for voice templates serialization.

class voicesdk.media.QualityCheckEngine

Quality check engine class.

__init__(init_data_path: str)

QualityCheckEngine constructor.

Parameters:

init_data_path (str) – Path to the directory containing initialization data.

check_quality_from_file(path_to_audio_file: str, thresholds: voicesdk.media.QualityCheckMetricsThresholds) voicesdk.media.QualityCheckEngineResult
Parameters:
  • path_to_audio_file (str) – Path to audio file.

  • thresholds – (QualityCheckMetricsThresholds): Quality checking thresholds that will be applied to the output quality check metrics.

Returns:

Quality check result.

Return type:

QualityCheckEngineResult

check_quality_from_samples(*args, **kwargs)

Overloaded function.

  1. check_quality_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int, thresholds: voicesdk.media.QualityCheckMetricsThresholds) -> voicesdk.media.QualityCheckEngineResult

Checks whether audio buffer is suitable from the quality perspective, from the given PCM16 samples.

Parameters:
  • samples (numpy.array) – PCM16 (numpy.int16) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

  • thresholds – (QualityCheckMetricsThresholds): Quality checking thresholds that will be applied to the output quality check metrics.

Returns:

Quality check result.

Return type:

QualityCheckEngineResult

  2. check_quality_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int, thresholds: voicesdk.media.QualityCheckMetricsThresholds) -> voicesdk.media.QualityCheckEngineResult

Checks whether audio buffer is suitable from the quality perspective, from the given PCM16 audio bytes.

Parameters:
  • samples (numpy.array) – PCM16 (numpy.uint8) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

  • thresholds – (QualityCheckMetricsThresholds): Quality checking thresholds that will be applied to the output quality check metrics.

Returns:

Quality check result.

Return type:

QualityCheckEngineResult

  3. check_quality_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int, thresholds: voicesdk.media.QualityCheckMetricsThresholds) -> voicesdk.media.QualityCheckEngineResult

Checks whether audio buffer is suitable from the quality perspective, from the given float audio samples.

Parameters:
  • samples (numpy.array) – Float (numpy.float32) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

  • thresholds – (QualityCheckMetricsThresholds): Quality checking thresholds that will be applied to the output quality check metrics.

Returns:

Quality check result.

Return type:

QualityCheckEngineResult

Parameters:

scenario – (QualityCheckScenario): Scenario for which recommended thresholds will be returned.

Returns:

Quality check thresholds that can be used on quality checking.

Return type:

QualityCheckMetricsThresholds

class voicesdk.media.QualityCheckEngineResult

Quality check result class

property multiple_speakers_detector_score

Multiple speakers detector score value obtained on quality check

Type:

float

property quality_check_short_description

Short description of the quality check results

property snr_db

SNR metric value obtained on quality check in dB

Type:

float

property speech_length_ms

Speech length metric value obtained on quality check in milliseconds

Type:

float

property speech_relative_length

Speech relative length (speech length relative to the total audio length) metric value obtained on quality check

Type:

float

class voicesdk.media.QualityCheckMetricsThresholds

Class for quality checking thresholds.

__init__(*args, **kwargs)

Overloaded function.

  1. __init__()

  2. __init__(minimum_snr_db: float, minimum_speech_length_ms: float, minimum_speech_relative_length: float, maximum_multiple_speakers_detector_score: float)

maximum_multiple_speakers_detector_score (float): Maximum multiple speakers detector score allowed to pass quality check.

property maximum_multiple_speakers_detector_score

Maximum multiple speakers detector score allowed to pass quality check.

Type:

float

property minimum_snr_db

Minimum signal-to-noise ratio required to pass quality check in dB.

Type:

float

property minimum_speech_length_ms

Minimum speech length required to pass quality check in milliseconds.

Type:

float

property minimum_speech_relative_length

Minimum speech relative length (speech length relative to the total audio length) required to pass quality check.

Type:

float
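The four thresholds correspond one-to-one to the metrics exposed by QualityCheckEngineResult. A minimal sketch of an independent comparison between the two, useful for logging which metric failed. It uses only the documented property names. The function name and the stand-in values are hypothetical, and whether the SDK itself uses strict or non-strict comparisons is an assumption.

```python
from types import SimpleNamespace


def failed_metrics(result, thresholds) -> list:
    """List the metrics in `result` that fall outside `thresholds`."""
    failures = []
    if result.snr_db < thresholds.minimum_snr_db:
        failures.append("snr_db")
    if result.speech_length_ms < thresholds.minimum_speech_length_ms:
        failures.append("speech_length_ms")
    if result.speech_relative_length < thresholds.minimum_speech_relative_length:
        failures.append("speech_relative_length")
    if (result.multiple_speakers_detector_score
            > thresholds.maximum_multiple_speakers_detector_score):
        failures.append("multiple_speakers_detector_score")
    return failures


# Stand-in objects with illustrative values (not recommended thresholds).
example_thresholds = SimpleNamespace(
    minimum_snr_db=10.0,
    minimum_speech_length_ms=1000.0,
    minimum_speech_relative_length=0.5,
    maximum_multiple_speakers_detector_score=0.5,
)
example_result = SimpleNamespace(
    snr_db=12.0,
    speech_length_ms=800.0,
    speech_relative_length=0.6,
    multiple_speakers_detector_score=0.1,
)
failures = failed_metrics(example_result, example_thresholds)
```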

class voicesdk.media.QualityCheckScenario

Enumeration representing scenarios used to get recommended quality check thresholds.

VERIFY_TI_ENROLLMENT : Verification, TI enrollment step.

VERIFY_TI_VERIFICATION : Verification, TI verification step.

VERIFY_TD_ENROLLMENT : Verification, TD enrollment step.

VERIFY_TD_VERIFICATION : Verification, TD verification step.

LIVENESS : Liveness check.

Members:

VERIFY_TI_ENROLLMENT

VERIFY_TI_VERIFICATION

VERIFY_TD_ENROLLMENT

VERIFY_TD_VERIFICATION

LIVENESS

class voicesdk.media.QualityCheckShortDescription

Enumeration representing short descriptions of the audio quality check results.

TOO_NOISY : Too noisy audio.

TOO_SMALL_SPEECH_TOTAL_LENGTH : Too small speech length in the audio.

TOO_SMALL_SPEECH_RELATIVE_LENGTH : Too small speech relative length (speech length relative to the total audio length).

MULTIPLE_SPEAKERS_DETECTED : Multiple speakers detected.

OK : Audio successfully passed quality check.

Members:

TOO_NOISY

TOO_SMALL_SPEECH_TOTAL_LENGTH

TOO_SMALL_SPEECH_RELATIVE_LENGTH

MULTIPLE_SPEAKERS_DETECTED

OK

class voicesdk.media.SNRComputer

SNRComputer class, intended to calculate signal-to-noise (SNR) ratio with given audio signal.

__init__(init_data_path: str)

SNRComputer constructor.

Parameters:

init_data_path (str) – Path to the directory containing init data.

compute_with_file(path_to_audio_file: str) float

Calculates SNR with given audio file.

Parameters:

path_to_audio_file (str) – Path to audio file.

Returns:

Computed SNR in dB.

Return type:

float

compute_with_samples(*args, **kwargs)

Overloaded function.

  1. compute_with_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> float

Calculates SNR with given PCM16 audio samples.

Parameters:
  • samples (numpy.array) – PCM16 (numpy.int16) audio samples.

  • sample_rate (int) – Input audio signal sample rate in Hz.

Returns:

Computed SNR in dB.

Return type:

float

  2. compute_with_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> float

Calculates SNR with given PCM16 audio samples.

Parameters:
  • samples (numpy.array) – PCM16 (numpy.uint8) audio samples.

  • sample_rate (int) – Input audio signal sample rate in Hz.

Returns:

Computed SNR in dB.

Return type:

float

  3. compute_with_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> float

Calculates SNR with given float audio samples.

Parameters:
  • samples (numpy.array) – float (numpy.float32) audio samples.

  • sample_rate (int) – Input audio signal sample rate in Hz.

Returns:

Computed SNR in dB.

Return type:

float

class voicesdk.media.SpeechEndpointDetector

Speech processor class for speech endpoint detection.

__init__(min_speech_length_ms: int, max_silence_length_ms: int, sample_rate: int)

SpeechEndpointDetector constructor.

Parameters:
  • min_speech_length_ms (int) – Minimum speech length required to begin speech end detection (ms).

  • max_silence_length_ms (int) – Silence after speech threshold used to determine if speech is already ended (ms).

  • sample_rate (int) – Input audio sample rate.

add_samples(*args, **kwargs)

Overloaded function.

  1. add_samples(samples: numpy.ndarray[numpy.int16])

Adds new audio samples to the SpeechEndpointDetector.

Parameters:

samples (numpy.array) – PCM16 (numpy.int16) audio samples.

Note: Audio sample rate is predefined in the constructor.

  2. add_samples(samples: numpy.ndarray[numpy.uint8])

Adds new audio samples to the SpeechEndpointDetector.

Parameters:

samples (numpy.array) – PCM16 (numpy.uint8) audio samples.

Note: Audio sample rate is predefined in the constructor.

  3. add_samples(samples: numpy.ndarray[numpy.float32])

Adds new audio samples to the SpeechEndpointDetector.

Parameters:

samples (numpy.array) – float (numpy.float32) audio samples.

Note: Audio sample rate is predefined in the constructor.

is_speech_ended() bool

Returns detection state.

Returns:

True if speech end was detected, False otherwise.

Return type:

bool

reset()

Resets detector state.
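The detector is meant to be fed incrementally and polled after each chunk. A minimal sketch of that loop, written against the add_samples() and is_speech_ended() methods documented above. The function name and the FakeDetector stand-in are hypothetical; real code would pass a SpeechEndpointDetector and numpy arrays of samples.

```python
def feed_until_speech_end(detector, chunks):
    """Feed chunks one by one and stop as soon as speech end is reported.

    Returns the index of the chunk after which speech end was detected,
    or None if the input ran out first.
    """
    for index, chunk in enumerate(chunks):
        detector.add_samples(chunk)
        if detector.is_speech_ended():
            return index
    return None


class FakeDetector:
    """Stand-in that reports speech end after a fixed number of chunks."""

    def __init__(self, chunks_until_end: int):
        self._chunks_left = chunks_until_end

    def add_samples(self, samples) -> None:
        self._chunks_left -= 1

    def is_speech_ended(self) -> bool:
        return self._chunks_left <= 0


stopped_at = feed_until_speech_end(FakeDetector(3), ["a", "b", "c", "d"])
```

Calling reset() on the detector afterwards would prepare it for the next utterance.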

class voicesdk.media.SpeechEndpointDetectorOpus

Speech processor class for speech endpoint detection in the Opus audio stream.

__init__(min_speech_length_ms: int, max_silence_length_ms: int, sample_rate: int)

SpeechEndpointDetectorOpus constructor.

Parameters:
  • min_speech_length_ms (int) – Minimum speech length required to begin speech end detection (ms).

  • max_silence_length_ms (int) – Silence after speech threshold used to determine if speech is already ended (ms).

  • sample_rate (int) – Input audio sample rate.

add_packet(bytes: numpy.ndarray[numpy.uint8])

Adds Opus packet to the SpeechEndpointDetectorOpus.

Parameters:

bytes (numpy.array) – A buffer containing single Opus packet. It is expected that packet contains data for single mono stream.

Note: Audio sample rate is predefined in the constructor.

is_speech_ended() bool

Returns detection state.

Returns:

True if speech end was detected, False otherwise.

Return type:

bool

reset()

Resets detector state.

class voicesdk.media.SpeechEvent

Class representing a single speech event.

property audio_interval

Speech event audio interval.

Type:

AudioInterval

property is_voice

Whether the frame contains speech or not.

Type:

bool

class voicesdk.media.SpeechInfo

Class that contains metrics related to Voice Activity Detection.

property background_length_ms

Total length of non-speech signal, milliseconds.

Type:

float

property speech_length_ms

Total accumulated speech duration, milliseconds.

Type:

float

property total_length_ms

Total audio record duration, milliseconds.

Type:

float
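The three durations can be combined into a speech ratio. A minimal sketch, assuming only the documented property names. The function name and stand-in values are hypothetical, and whether this ratio equals the speech_relative_length metric reported by the quality check is an assumption.

```python
from types import SimpleNamespace


def speech_ratio(speech_info) -> float:
    """Fraction of the recording occupied by speech, in the range [0, 1]."""
    total = speech_info.total_length_ms
    if total <= 0:
        return 0.0  # empty recording: avoid division by zero
    return speech_info.speech_length_ms / total


# Stand-in object with illustrative values.
example_info = SimpleNamespace(
    speech_length_ms=1500.0,
    background_length_ms=500.0,
    total_length_ms=2000.0,
)
ratio = speech_ratio(example_info)
```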

class voicesdk.media.SpeechSummary

Speech summary class.

property speech_events

Retrieves list of speech events.

Returns:

speech events.

Return type:

list

property speech_info

Retrieves speech info data.

Returns:

speech info.

Return type:

SpeechInfo

class voicesdk.media.SpeechSummaryEngine

Speech summary engine class, intended to calculate SpeechSummary with given audio samples.

__init__(init_data_path: str)

SpeechSummaryEngine constructor.

Parameters:

init_data_path (str) – Path to the directory containing engine init data.

create_stream(stream_sample_rate: int) voicesdk.media.SpeechSummaryStream

Factory method for creating SpeechSummaryStream.

Parameters:

stream_sample_rate (int) – Audio stream sample rate in Hz.

Returns:

Created speech summary stream.

Return type:

SpeechSummaryStream

get_speech_summary_from_file(path_to_audio_file: str) voicesdk.media.SpeechSummary

Calculates speech summary with given audio file.

Parameters:

path_to_audio_file (str) – Path to audio file.

Returns:

Speech summary.

Return type:

SpeechSummary

get_speech_summary_from_samples(*args, **kwargs)

Overloaded function.

  1. get_speech_summary_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> voicesdk.media.SpeechSummary

Calculates speech summary with given PCM16 audio samples.

Parameters:
  • samples (numpy.array) – PCM16 (numpy.int16) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

Returns:

Speech summary.

Return type:

SpeechSummary

  2. get_speech_summary_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> voicesdk.media.SpeechSummary

Calculates speech summary with given PCM16 audio samples.

Parameters:
  • samples (numpy.array) – PCM16 (numpy.uint8) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

Returns:

Speech summary.

Return type:

SpeechSummary

  3. get_speech_summary_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> voicesdk.media.SpeechSummary

Calculates speech summary with given float audio samples.

Parameters:
  • samples (numpy.array) – float (numpy.float32) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

Returns:

Speech summary.

Return type:

SpeechSummary

class voicesdk.media.SpeechSummaryStream

Stateful speech summary stream class, intended to calculate SpeechSummary with audio samples given in stream. New instance can be obtained with a SpeechSummaryEngine instance.

add_samples(*args, **kwargs)

Overloaded function.

  1. add_samples(samples: numpy.ndarray[numpy.int16])

Adds PCM16 audio samples to process.

Parameters:

samples (numpy.array) – PCM16 (numpy.int16) audio samples.

Note: Audio sample rate is predefined at the stream creation time.

  2. add_samples(samples: numpy.ndarray[numpy.uint8])

Adds PCM16 audio samples to process.

Parameters:

samples (numpy.array) – PCM16 (numpy.uint8) audio samples.

Note: Audio sample rate is predefined at the stream creation time.

  3. add_samples(samples: numpy.ndarray[numpy.float32])

Adds float audio samples to process.

Parameters:

samples (numpy.array) – float (numpy.float32) audio samples.

Note: Audio sample rate is predefined at the stream creation time.

finalize()

Finalizes input audio stream to process remaining audio samples and produce result if it’s possible.

get_current_background_length() float

Returns current background length in milliseconds.

Returns:

Current background length in milliseconds.

Return type:

float

get_speech_event() voicesdk.media.SpeechEvent

Retrieves speech event from output queue.

Returns:

One speech event.

Return type:

SpeechEvent

Raises:

RuntimeError – If output queue is empty.

Note: Use has_speech_events() to check whether any speech events are available.

get_total_speech_info() voicesdk.media.SpeechInfo

Retrieves accumulated speech info data.

Returns:

speech info.

Return type:

SpeechInfo

get_total_speech_summary() voicesdk.media.SpeechSummary

Retrieves accumulated speech summary data.

Returns:

speech summary.

Return type:

SpeechSummary

has_speech_events() bool

Checks if any speech events are present in stream queue.

Returns:

True if any events are present, False otherwise.

Return type:

bool

reset()

Resets stream state: clears buffer, resets speech summary.
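Because get_speech_event() raises RuntimeError on an empty queue, the queue is normally drained behind a has_speech_events() check. A minimal sketch of that pattern, using only the methods documented above. The function name and the FakeStream stand-in are hypothetical; real code would obtain the stream from SpeechSummaryEngine.create_stream() and pass numpy arrays of samples.

```python
from collections import deque


def collect_speech_events(stream, chunks) -> list:
    """Feed all chunks, finalize the stream and return every queued event."""
    events = []

    def drain() -> None:
        # Guard each read so the RuntimeError for an empty queue never fires.
        while stream.has_speech_events():
            events.append(stream.get_speech_event())

    for chunk in chunks:
        stream.add_samples(chunk)
        drain()
    stream.finalize()  # flush whatever is still buffered
    drain()
    return events


class FakeStream:
    """Stand-in that queues one event per chunk and one more on finalize."""

    def __init__(self):
        self._queue = deque()
        self._count = 0

    def add_samples(self, samples) -> None:
        self._queue.append(f"event-{self._count}")
        self._count += 1

    def finalize(self) -> None:
        self._queue.append("final")

    def has_speech_events(self) -> bool:
        return bool(self._queue)

    def get_speech_event(self):
        if not self._queue:
            raise RuntimeError("output queue is empty")
        return self._queue.popleft()


events = collect_speech_events(FakeStream(), ["chunk-a", "chunk-b"])
```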

class voicesdk.media.SpeechSummaryStreamOpus

Stateful speech summary stream class, intended to calculate SpeechSummary with audio samples given in Opus stream. New instance can be obtained with a SpeechSummaryEngine instance.

__init__(init_data_path: str, sample_rate: int)

SpeechSummaryStreamOpus constructor.

Parameters:
  • init_data_path (str) – Path to the directory containing engine init data.

  • sample_rate (int) – Audio stream sample rate in Hz.

add_packet(bytes: numpy.ndarray[numpy.uint8])

Adds Opus packet to the SpeechSummaryStreamOpus.

Parameters:

bytes (numpy.array) – A buffer containing single Opus packet. It is expected that packet contains data for single mono stream.

Note: Audio sample rate is predefined in the constructor.

finalize()

Finalizes input audio stream to process remaining audio samples and produce result if it’s possible.

get_current_background_length() float

Returns current background length in milliseconds.

Returns:

Current background length in milliseconds.

Return type:

float

get_speech_event() voicesdk.media.SpeechEvent

Retrieves speech event from output queue.

Returns:

One speech event.

Return type:

SpeechEvent

Raises:

RuntimeError – If output queue is empty.

Note: Use has_speech_events() to check whether any speech events are available.

get_total_speech_info() voicesdk.media.SpeechInfo

Retrieves accumulated speech info data.

Returns:

speech info.

Return type:

SpeechInfo

get_total_speech_summary() voicesdk.media.SpeechSummary

Retrieves accumulated speech summary data.

Returns:

speech summary.

Return type:

SpeechSummary

has_speech_events() bool

Checks if any speech events are present in stream queue.

Returns:

True if any events are present, False otherwise.

Return type:

bool

reset()

Resets stream state: clears buffer, resets speech summary.

class voicesdk.verify.VerifyResult

Voice verification result class.

property probability

Voice matching probability from 0 to 1, should be used for making a biometrics authentication decision.

Type:

float

property score

Raw verification score, intended to be used for evaluation and data-wise calibration.

Type:

float
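Since probability is the value intended for the authentication decision, the decision itself reduces to a threshold comparison. A minimal sketch; the function name is hypothetical and the default threshold of 0.5 is only a placeholder, not a value recommended by the SDK.

```python
from types import SimpleNamespace


def is_match(verify_result, threshold: float = 0.5) -> bool:
    """Accept the speaker when the matching probability reaches the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be within [0, 1]")
    return verify_result.probability >= threshold


# Stand-in objects with illustrative values; real code would pass a VerifyResult.
accepted = is_match(SimpleNamespace(probability=0.93, score=4.2))
rejected = is_match(SimpleNamespace(probability=0.93, score=4.2), threshold=0.99)
```

The raw score is deliberately ignored here, in line with its description as an evaluation and calibration value.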

class voicesdk.verify.VerifyStreamResult

Streaming voice verification result class.

property audio_interval

Audio interval, which verify result refers to.

Type:

AudioInterval

property verify_result

Voice verification result.

Type:

voicesdk.verify.VerifyResult

class voicesdk.verify.VoiceTemplateFactory

Voice verification engine class.

__init__(init_data_path: str)

VoiceTemplateFactory constructor.

Parameters:

init_data_path (str) – Path to the directory containing factory init data.

create_voice_template_batch_from_file(input_batch: list[voicesdk.verify.VerifyFileBatchElement]) list[voicesdk.core.VoiceTemplate]

Creates multiple voice templates from the contents of the given WAV files.

Parameters:

input_batch (list) – List of VerifyFileBatchElement.

Returns:

Created voice templates.

create_voice_template_batch_from_samples(*args, **kwargs)

Overloaded function.

  1. create_voice_template_batch_from_samples(input_batch: list[voicesdk.verify.VerifySamplesBatchElementFloat]) -> list[voicesdk.core.VoiceTemplate]

Creates multiple voice templates from given audio samples.

Parameters:

input_batch (list) – List of VerifySamplesBatchElementFloat.

Returns:

Created voice templates.

  2. create_voice_template_batch_from_samples(input_batch: list[voicesdk.verify.VerifySamplesBatchElementInt16]) -> list[voicesdk.core.VoiceTemplate]

Creates multiple voice templates from given audio samples.

Parameters:

input_batch (list) – List of VerifySamplesBatchElementInt16.

Returns:

Created voice templates.

  3. create_voice_template_batch_from_samples(input_batch: list[voicesdk.verify.VerifySamplesBatchElementUint8]) -> list[voicesdk.core.VoiceTemplate]

Creates multiple voice templates from given audio samples.

Parameters:

input_batch (list) – List of VerifySamplesBatchElementUint8.

Returns:

Created voice templates.

create_voice_template_from_file(path_to_audio_file: str, channel_type: voicesdk.core.ChannelType = <ChannelType.MIC: 1>) voicesdk.core.VoiceTemplate

Creates voice template from the contents of the given audio file.

Parameters:
  • path_to_audio_file (str) – Path to audio file.

  • channel_type (ChannelType, optional) – Input audio channel type, default is ChannelType.MIC.

Returns:

Created voice template.

Return type:

VoiceTemplate

Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.

create_voice_template_from_samples(*args, **kwargs)

Overloaded function.

  1. create_voice_template_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int, channel_type: voicesdk.core.ChannelType = <ChannelType.MIC: 1>) -> voicesdk.core.VoiceTemplate

Creates voice template from audio samples.

Parameters:
  • samples (numpy.array) – PCM16 (numpy.int16) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

  • channel_type (ChannelType, optional) – Input audio channel type, default is ChannelType.MIC.

Returns:

Created voice template.

Return type:

VoiceTemplate

Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.

  2. create_voice_template_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int, channel_type: voicesdk.core.ChannelType = <ChannelType.MIC: 1>) -> voicesdk.core.VoiceTemplate

Creates voice template from audio samples.

Parameters:
  • samples (numpy.array) – PCM16 (numpy.uint8) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

  • channel_type (ChannelType, optional) – Input audio channel type, default is ChannelType.MIC.

Returns:

Created voice template.

Return type:

VoiceTemplate

Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.

  3. create_voice_template_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int, channel_type: voicesdk.core.ChannelType = <ChannelType.MIC: 1>) -> voicesdk.core.VoiceTemplate

Creates voice template from audio samples.

Parameters:
  • samples (numpy.array) – Float (numpy.float32) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

  • channel_type (ChannelType, optional) – Input audio channel type, default is ChannelType.MIC.

Returns:

Created voice template.

Return type:

VoiceTemplate

Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.

get_init_data_id() str

Returns ID of the init data, which was used to create the factory.

Returns:

A string containing init data ID.

Return type:

str

get_minimum_audio_sample_rate() int

Returns minimum supported input audio sampling frequency in Hz.

Returns:

A minimum sampling rate in Hz.

Return type:

int
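The sample rate requirement stated in the notes above can be checked before a template is created. A minimal sketch of such a guard, using only the documented factory methods. The function name and the FakeFactory stand-in are hypothetical, and the 8000 Hz minimum in the stand-in is an illustrative value, not the actual SDK minimum.

```python
def create_template_checked(factory, samples, sample_rate: int):
    """Create a voice template only if the sample rate is high enough."""
    minimum = factory.get_minimum_audio_sample_rate()
    if sample_rate < minimum:
        raise ValueError(
            f"sample rate {sample_rate} Hz is below the minimum of {minimum} Hz"
        )
    return factory.create_voice_template_from_samples(samples, sample_rate)


class FakeFactory:
    """Stand-in mimicking the two VoiceTemplateFactory methods used above."""

    def get_minimum_audio_sample_rate(self) -> int:
        return 8000  # illustrative value only

    def create_voice_template_from_samples(self, samples, sample_rate):
        return ("template", len(samples), sample_rate)


template = create_template_checked(FakeFactory(), [0] * 16000, 16000)
```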

merge_voice_templates(voice_templates: list) voicesdk.core.VoiceTemplate

Merges a list of voice templates of the same speaker producing a union template.

Parameters:

voice_templates (list) – List of voice templates.

Returns:

A union voice template.

Return type:

VoiceTemplate

Note: All the templates should have the same init data ID as the factory instance.

class voicesdk.verify.VoiceTemplateMatcher

Voice verification engine class.

__init__(init_data_path: str)

VoiceTemplateMatcher constructor.

Parameters:

init_data_path (str) – Path to the directory containing matcher init data.

get_init_data_id() str

Returns ID of the init data, which was used to create the matcher.

Returns:

A string containing init data ID.

Return type:

str

match_voice_templates(template1: voicesdk.core.VoiceTemplate, template2: voicesdk.core.VoiceTemplate) voicesdk.verify.VerifyResult

Matches two voice templates one-to-one.

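Parameters:
  • template1 (VoiceTemplate) – First voice template.

  • template2 (VoiceTemplate) – Second voice template.

Returns:

Verification result.

Return type:

voicesdk.verify.VerifyResult

Note: Both templates should have the same init data ID as the matcher instance.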

class voicesdk.verify.VoiceVerifyStream

Class for continuous voice verification using audio stream.

__init__(voice_template_factory: voicesdk.verify.VoiceTemplateFactory, voice_template_matcher: voicesdk.verify.VoiceTemplateMatcher, voice_templates: list[voicesdk.core.VoiceTemplate], sample_rate: int, audio_context_length_seconds: int = 10, window_length_seconds: float = 3)

Note: Sampling frequency should be equal to or greater than the value returned by voice_template_factory.get_minimum_audio_sample_rate(). Voice template matcher, voice template factory and voice template should have the same init data ID.

add_samples(*args, **kwargs)

Overloaded function.

  1. add_samples(samples: numpy.ndarray[numpy.int16])

Adds PCM16 audio samples to process.

Parameters:

samples (numpy.array) – PCM16 (numpy.int16) audio samples.

Note:

The audio sample rate is fixed at creation time. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.

  2. add_samples(samples: numpy.ndarray[numpy.uint8])

Adds PCM16 audio bytes to process.

Parameters:

samples (numpy.array) – PCM16 (numpy.uint8) audio bytes.

Note:

The audio sample rate is fixed at creation time. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.

  3. add_samples(samples: numpy.ndarray[numpy.float32])

Adds float audio samples to process.

Parameters:

samples (numpy.array) – float (numpy.float32) audio samples.

Note:

The audio sample rate is fixed at creation time. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.

finalize()

Finalizes input audio stream to process remaining audio samples and produce result if it’s possible.

get_verify_result() list[voicesdk.verify.VerifyStreamResult]

Retrieves a verification result from the output queue. The result contains one verify stream result for each reference template.

Returns:

A list with one verify stream result per reference template.

Return type:

list[VerifyStreamResult]

Raises:

RuntimeError – If the output queue is empty.

Note:

Use has_verify_results() to check whether verify results are available.
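Because get_verify_result() yields one entry per reference template, it can be convenient to pair each entry with a speaker label. The sketch below is a hypothetical helper, not an SDK function. It assumes the entries come back in the same order as the voice_templates list passed to the constructor, which this section implies (through the "zeroth reference template" wording) but does not state outright.

```python
def label_results(template_names, results):
    """Pair each per-template result with a caller-supplied label.

    Hypothetical helper, not part of the SDK.
    Assumption: `results` is ordered like the voice_templates list
    given to the VoiceVerifyStream constructor.
    """
    if len(template_names) != len(results):
        raise ValueError(
            f"got {len(results)} results for {len(template_names)} templates"
        )
    return dict(zip(template_names, results))
```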

get_verify_result_for_one_template() voicesdk.verify.VerifyStreamResult

Retrieves a verification result from the output queue, consisting of a single verify stream result for the zeroth reference template. Suitable when only one reference template was specified. Behaves the same as get_verify_result in IDVoice < 3.13 if only one reference template was set.

Returns:

One verify result for the zeroth reference template.

Return type:

VerifyStreamResult

Raises:

RuntimeError – If the output queue is empty.

Note:

Use has_verify_results() to check whether verify results are available.

has_verify_results() bool

Checks if there are verify results in output queue.

Returns:

True if there are results available, False otherwise.

Return type:

bool

reset()

Resets stream state.
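The VoiceVerifyStream methods described above can be combined into a feed-and-drain loop: add a chunk, collect whatever results are ready, then finalize and collect the remainder. This is a minimal sketch under stated assumptions: verify_stream and chunk_size are hypothetical names, and the stream argument is any object that provides the documented add_samples, has_verify_results, get_verify_result and finalize methods.

```python
def verify_stream(stream, samples, chunk_size):
    """Feed samples to the stream in chunks and collect all verify results.

    Hypothetical helper, not part of the SDK. `samples` is any sliceable
    sequence; with the real SDK it would be a numpy array of a supported dtype.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    results = []
    for start in range(0, len(samples), chunk_size):
        stream.add_samples(samples[start:start + chunk_size])
        # Check the queue first, since get_verify_result() raises when it is empty.
        while stream.has_verify_results():
            results.append(stream.get_verify_result())

    # Process the remaining buffered audio and pick up any final results.
    stream.finalize()
    while stream.has_verify_results():
        results.append(stream.get_verify_result())
    return results
```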

class voicesdk.verify.VoiceVerifyStreamOpus

Class for continuous voice verification using Opus audio stream.

__init__(voice_template_factory: pyvoicesdk::VoiceTemplateFactory, voice_template_matcher: pyvoicesdk::VoiceTemplateMatcher, voice_template: voicesdk.core.VoiceTemplate, sample_rate: int, audio_context_length_seconds: int = 10)

Parameters:

audio_context_length_seconds (int, optional) – Length of audio context for voice verification in seconds, must be at least 3 seconds.

Note:

Sampling frequency should be equal to or greater than the value returned by voice_template_factory.get_minimum_audio_sample_rate(). Voice template matcher, voice template factory and voice template should have the same init data ID.

add_packet(bytes: numpy.ndarray[numpy.uint8])

Adds Opus packet to process.

Parameters:

bytes (numpy.array) – Opus packet bytes (numpy.uint8). The packet is expected to contain data for a single mono stream.

Note:

The audio sample rate is fixed at creation time. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.

finalize()

Finalizes input audio stream to process remaining audio samples and produce result if it’s possible.

get_verify_result() voicesdk.verify.VerifyStreamResult

Retrieves verify result from output queue.

Returns:

One verify result.

Return type:

VerifyStreamResult

Raises:

RuntimeError – If the output queue is empty.

Note:

Use has_verify_results() to check whether verify results are available.

has_verify_results() bool

Checks if there are verify results in output queue.

Returns:

True if there are results available, False otherwise.

Return type:

bool

reset()

Resets stream state.
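The note on add_packet says that an exception resets the stream state but keeps the accumulated results. One way to take advantage of that is to count failed packets and keep draining the queue. The sketch below is hypothetical: feed_packets is not an SDK function, and since this section does not say which exception type add_packet raises, it catches Exception broadly.

```python
def feed_packets(stream, packets):
    """Feed Opus packets one by one, tolerating packets that fail.

    Hypothetical helper, not part of the SDK. Returns the collected
    verify results together with the number of failed packets.
    """
    results = []
    failed = 0
    for packet in packets:
        try:
            stream.add_packet(packet)
        except Exception:
            # Assumption: the exception type is unspecified in the source.
            # Per the note above, the stream state is reset at this point,
            # while already accumulated results stay available.
            failed += 1
        while stream.has_verify_results():
            results.append(stream.get_verify_result())
    return results, failed
```

Whether to continue after a failed packet or to stop the session is an application decision; the counter makes either policy straightforward to implement.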

class voicesdk.liveness.LivenessEngine

Voice liveness check class.

__init__(init_data_path: str)

LivenessEngine constructor.

Parameters:

init_data_path (str) – Path to the directory containing engine init data.

check_liveness_file(path_to_audio_file: str) voicesdk.liveness.LivenessResult

Checks voice liveness from the given audio file.

Parameters:

path_to_audio_file (str) – Path to audio file.

Returns:

Liveness check result.

Return type:

LivenessResult

check_liveness_samples(*args, **kwargs)

Overloaded function.

  1. check_liveness_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> voicesdk.liveness.LivenessResult

Checks voice liveness from the given PCM16 audio samples.

Parameters:
  • samples (numpy.array) – PCM16 (numpy.int16) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

Returns:

Liveness check result.

Return type:

LivenessResult

  2. check_liveness_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> voicesdk.liveness.LivenessResult

Checks voice liveness from the given PCM16 audio bytes.

Parameters:
  • samples (numpy.array) – PCM16 (numpy.uint8) audio bytes.

  • sample_rate (int) – Audio sample rate in Hz.

Returns:

Liveness check result.

Return type:

LivenessResult

  3. check_liveness_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> voicesdk.liveness.LivenessResult

Checks voice liveness from the given float audio samples.

Parameters:
  • samples (numpy.array) – float (numpy.float32) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

Returns:

Liveness check result.

Return type:

LivenessResult

class voicesdk.liveness.LivenessResult

Voice liveness check result.

get_status_code() voicesdk.liveness.LivenessResultValidationStatusCode

Gets the validation status code.

Returns:

Validation status code.

Return type:

LivenessResultValidationStatusCode

get_value() voicesdk.liveness.LivenessResultValue

Gets the liveness check result value. Available only if the validation preceding the liveness check was successful, see ok().

Returns:

Liveness check result value.

Return type:

LivenessResultValue

ok() bool

Returns a flag indicating whether validation was successful and the liveness result value is available.

Returns:

Flag indicating whether validation was successful and the liveness result value is available.

Return type:

bool
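Since get_value() is available only after successful validation, a caller would typically check ok() first and fall back to the status code otherwise. A minimal sketch, assuming only the three LivenessResult methods documented above; unpack_liveness is a hypothetical helper name.

```python
def unpack_liveness(result):
    """Return (status_code, value), where value is None if validation failed.

    Hypothetical helper, not part of the SDK. `result` is expected to be a
    LivenessResult or any object with ok(), get_status_code() and get_value().
    """
    status = result.get_status_code()
    # get_value() is only meaningful when ok() reports successful validation.
    value = result.get_value() if result.ok() else None
    return status, value
```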

class voicesdk.liveness.LivenessResultValidationStatusCode

Enumeration representing status code of validation preceding liveness check.

TOO_SMALL_SPEECH_LENGTH

Speech length is too small for liveness check to be performed.

OK

Successful validation.

Members:

TOO_SMALL_SPEECH_LENGTH

OK