Call Center SDK 1.10.0 Python API documentation

Passive voice spoof detection

class voicesdk_cc.antispoof2.AntispoofEngine

Class for detecting spoofing attacks in audio with human speech.

__init__(init_data_path: str)

AntispoofEngine constructor.

Parameters

init_data_path (str) – Path to the directory containing engine init data.

is_spoof_file(path_to_audio_file: str) antispoof2.AntispoofResult

Tests whether given audio file contains spoofed speech.

Parameters

path_to_audio_file (str) – Path to audio file.

Returns

Antispoofing check result.

Return type

AntispoofResult

is_spoof_samples(*args, **kwargs)

Overloaded function.

  1. is_spoof_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> antispoof2.AntispoofResult

Tests whether given audio samples contain spoofed speech.

Parameters
  • samples (numpy.array) – PCM16 (numpy.int16) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

Returns

Antispoofing check result.

Return type

AntispoofResult

  2. is_spoof_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> antispoof2.AntispoofResult

Tests whether given audio samples contain spoofed speech.

Parameters
  • samples (numpy.array) – PCM16 (numpy.uint8) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

Returns

Antispoofing check result.

Return type

AntispoofResult

  3. is_spoof_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> antispoof2.AntispoofResult

Tests whether given audio samples contain spoofed speech.

Parameters
  • samples (numpy.array) – float (numpy.float32) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

Returns

Antispoofing check result.

Return type

AntispoofResult
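The three overloads take the same audio in three representations: 16-bit samples, raw PCM16 bytes, and floats. The sketch below shows how these relate, using only the standard library. It assumes little-endian PCM16 and floats scaled to [-1.0, 1.0); the scaling is an assumption, since this reference does not state the expected float range. The helper names are illustrative and not part of the SDK. With the SDK itself, the data would be passed as numpy arrays of dtype int16, uint8 or float32.

```python
import array
import sys


def pcm16_to_bytes(samples):
    """Pack signed 16-bit samples into little-endian PCM16 bytes."""
    buf = array.array("h", samples)
    if sys.byteorder == "big":
        buf.byteswap()  # force little-endian byte order
    return buf.tobytes()


def pcm16_to_float(samples):
    """Scale signed 16-bit samples to floats (assumed range [-1.0, 1.0))."""
    return [s / 32768.0 for s in samples]


pcm = [0, 16384, -32768, 32767]
raw = pcm16_to_bytes(pcm)      # two bytes per sample
floats = pcm16_to_float(pcm)
```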

class voicesdk_cc.antispoof2.AntispoofResult

Class for the antispoofing check result returned by AntispoofEngine.is_spoof_file and AntispoofEngine.is_spoof_samples.

property score

Human score.

Type

float

property score_Replay

Replay attack score.

Type

float

property score_TTS

Text-to-speech attack score.

Type

float

property score_VC

Voice conversion attack score.

Type

float

property unsuitable_input_message

Message that explains the reason for audio input to be unsuitable for spoof check.

Type

str
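A minimal sketch of how the result properties might be combined into a decision. It assumes that a higher score means the audio is more likely human and that unsuitable_input_message is empty for suitable input; neither is stated in this reference. The function name and the human_threshold value are hypothetical, and the demo object is a stand-in, not a real SDK result.

```python
from types import SimpleNamespace


def summarize_antispoof(result, human_threshold=0.5):
    """Return a (verdict, details) pair for an AntispoofResult-like object.

    human_threshold is a hypothetical cut-off, not an SDK default.
    """
    if result.unsuitable_input_message:
        return "unsuitable", result.unsuitable_input_message
    if result.score >= human_threshold:
        return "human", ""
    attack_scores = {
        "replay": result.score_Replay,
        "tts": result.score_TTS,
        "vc": result.score_VC,
    }
    # Report the attack type with the highest score.
    return "spoof", max(attack_scores, key=attack_scores.get)


# Stand-in object exposing the documented properties.
demo = SimpleNamespace(score=0.1, score_Replay=0.7, score_TTS=0.15,
                       score_VC=0.05, unsuitable_input_message="")
verdict, details = summarize_antispoof(demo)
```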

Specific speaker attributes retrieval

class voicesdk_cc.attributes.Attributes

Estimated person attributes class.

property age

Estimated age in years.

Type

int

property gender

Estimated gender.

Type

Gender

property gender_score

Raw gender score (a larger score corresponds to male, a smaller score to female).

Type

float

property phone_call_participant

Estimated phone call participant class (deprecated, always equals UNDEFINED).

Type

PhoneCallParticipant

class voicesdk_cc.attributes.AttributesEstimator

Class for estimating person attributes by their voice (age and gender).

__init__(init_data_path: str)

Attributes estimator constructor.

Parameters

init_data_path (str) – Path to the directory containing init data.

estimate_with_file(path_to_wav_file: str) attributes.Attributes

Estimates person attributes using given voice sample.

Parameters

path_to_wav_file (str) – Path to WAV file.

Returns

Estimated attributes.

Return type

Attributes

estimate_with_samples(*args, **kwargs)

Overloaded function.

  1. estimate_with_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> attributes.Attributes

Estimates person attributes using given voice sample.

Parameters
  • samples (numpy.array) – PCM16 (numpy.int16) audio samples.

  • sample_rate (int) – Input audio signal sample rate in Hz.

Returns

Estimated attributes.

Return type

Attributes

  2. estimate_with_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> attributes.Attributes

Estimates person attributes using given voice sample.

Parameters
  • samples (numpy.array) – PCM16 (numpy.uint8) audio samples.

  • sample_rate (int) – Input audio signal sample rate in Hz.

Returns

Estimated attributes.

Return type

Attributes

  3. estimate_with_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> attributes.Attributes

Estimates person attributes using given voice sample.

Parameters
  • samples (numpy.array) – float (numpy.float32) audio samples.

  • sample_rate (int) – Input audio signal sample rate in Hz.

Returns

Estimated attributes.

Return type

Attributes

class voicesdk_cc.attributes.Gender

Enumeration representing human gender.

MALE

Male.

FEMALE

Female.

Members:

MALE

FEMALE

Callcenter SDK specific routines

class voicesdk_cc.callcenter.BuildInfo

Structure containing current VoiceSDK CC build info.

property components

VoiceSDK components present in the build.

Type

str

property git_info

Git info dump at the build stage.

Type

str

property license_expiration_date

License expiration date in YYYY-MM-DD format.

Type

str
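Since the expiration date is a YYYY-MM-DD string, it can be parsed with the standard library. The sketch below computes the number of days left; the function name is illustrative and the date literal is a made-up value. In real use the string would come from get_build_info().license_expiration_date. An empty or malformed string is not handled here.

```python
from datetime import date


def days_until_expiration(expiration, today=None):
    """Days left before a YYYY-MM-DD expiration date (negative if past)."""
    today = today or date.today()
    return (date.fromisoformat(expiration) - today).days


# Made-up values for illustration.
remaining = days_until_expiration("2030-01-31", today=date(2030, 1, 1))
```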

property license_info

Information (e.g. expiration date) about the installed license if available, or an empty string if no license is in use. Deprecated, use license_expiration_date instead.

Type

str

property version

VoiceSDK build version.

Type

str

voicesdk_cc.callcenter.get_build_info() callcenter.BuildInfo

Returns current VoiceSDK CC build info.

Returns

Current VoiceSDK CC build info.

Return type

BuildInfo

Utility functions for working with Opus data.

voicesdk_cc.core.opusutils.read_as_pcm16_samples_from_memory(opus_file: str) tuple

Reads Opus file from a memory buffer and decodes it to PCM16 samples buffer.

Parameters

opus_file (bytes) – Memory buffer containing complete Opus file contents.

Returns

Tuple with numpy.array (of numpy.int16 type) and signal sample rate.

Return type

tuple

Core functionality needed in more than one module.

class voicesdk_cc.core.AudioInterval

Class representing interval of audio data.

__init__(*args, **kwargs)

Overloaded function.

  1. __init__(start_sample: int, end_sample: int, sample_rate: int)

  2. __init__(time_interval: voicesdk::TimeInterval, sample_rate: int)

property end_sample

Sample number where interval ends (not inclusive).

property end_time

Timestamp in milliseconds where AudioInterval ends (not inclusive).

property sample_rate

Sample rate of corresponding audio.

property start_sample

Sample number where interval starts.

property start_time

Timestamp in milliseconds where AudioInterval starts.
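The sample-based and time-based properties describe the same interval. As a rough sketch of the relation, the length in milliseconds can be derived from the sample positions and the sample rate. IntervalStub is a stand-in exposing only the documented sample properties, and how the SDK rounds its own start_time and end_time values is not specified here.

```python
from dataclasses import dataclass


@dataclass
class IntervalStub:
    """Stand-in exposing the sample-based AudioInterval properties."""
    start_sample: int
    end_sample: int
    sample_rate: int


def duration_ms(interval):
    """Interval length in milliseconds; end_sample is exclusive."""
    n_samples = interval.end_sample - interval.start_sample
    return 1000.0 * n_samples / interval.sample_rate


length = duration_ms(IntervalStub(start_sample=8000, end_sample=20000,
                                  sample_rate=8000))
```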

class voicesdk_cc.core.ChannelType

Enumeration for audio source labeling during voice template creation.

MIC

Microphone audio channel.

TEL

Telephone audio channel.

MIXED

Mixed audio channel.

Members:

MIC

TEL

MIXED

class voicesdk_cc.core.VoiceTemplate

Voice template class.

__init__(bytes: bytes)

Constructs a voice template from its serialized representation.

Parameters

bytes (bytes) – Bytes object with serialized voice template.

static deserialize(bytes: bytes) core.VoiceTemplate

Factory method, deserializes voice template from bytes.

Parameters

bytes (bytes) – Bytes object with serialized voice template.

Returns

Voice template instance.

Return type

VoiceTemplate

get_channel_type() core.ChannelType

Returns voice template channel type which was specified by user on creation.

Returns

Channel type.

Return type

ChannelType

get_init_data_id() str

Returns ID of the init data, which was used to create the template.

Returns

A string containing init data ID.

Return type

str

is_valid() bool

Checks if voice template is valid or not.

Returns

True if valid, False otherwise.

Return type

bool

static load_from_file(path_to_file: str) core.VoiceTemplate

Factory method, restores voice template from the given file.

Parameters

path_to_file (str) – Path to template file.

Returns

Voice template instance.

Return type

VoiceTemplate

save_to_file(path_to_file: str)

Stores voice template in a file at the given path.

Parameters

path_to_file (str) – Path to template file.

serialize() bytes

Serializes voice template to bytes.

Returns

Serialized voice template.

Return type

bytes

class voicesdk_cc.core.VoiceTemplateConverter

Voice template conversion class.

__init__(init_data_path: str)

VoiceTemplateConverter constructor.

Parameters

init_data_path (str) – Path to the directory containing init data.

convert_voice_template(voice_template: core.VoiceTemplate) core.VoiceTemplate

Converts voice template from one configuration to another.

Parameters

voice_template (VoiceTemplate) – Voice template to be converted.

Returns

Converted voice template.

Return type

VoiceTemplate

get_input_init_data_id() str

Returns init data ID that voice template to be converted should have.

Returns

A string containing init data ID of the voice template to be converted.

Return type

str

get_output_init_data_id() str

Returns init data ID that converted voice template will have.

Returns

A string containing init data ID of the converted voice template.

Return type

str
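The two ID getters make it possible to check whether a template needs conversion before calling convert_voice_template. The following is a sketch of one such guard, using stand-in classes in place of the SDK objects. The function name, the stub classes and the "old"/"new" IDs are hypothetical, and whether the SDK itself rejects a template with a non-matching init data ID is not stated in this reference.

```python
def convert_if_needed(template, converter):
    """Convert a template only when its init data ID matches the converter input."""
    template_id = template.get_init_data_id()
    if template_id == converter.get_output_init_data_id():
        return template  # already in the target configuration
    if template_id != converter.get_input_init_data_id():
        raise ValueError(f"converter does not accept init data ID {template_id!r}")
    return converter.convert_voice_template(template)


class TemplateStub:
    """Stand-in for VoiceTemplate with only get_init_data_id()."""
    def __init__(self, init_data_id):
        self._id = init_data_id

    def get_init_data_id(self):
        return self._id


class ConverterStub:
    """Stand-in for VoiceTemplateConverter mapping "old" to "new"."""
    def get_input_init_data_id(self):
        return "old"

    def get_output_init_data_id(self):
        return "new"

    def convert_voice_template(self, template):
        return TemplateStub("new")


converted = convert_if_needed(TemplateStub("old"), ConverterStub())
```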

voicesdk_cc.core.set_num_threads(arg0: int)

Sets the maximum number of threads available for VoiceSDK. If 0 is passed, the optimal number of threads is detected automatically (the same effect is achieved if set_num_threads is not called).

Parameters

num_threads (int) – Maximum number of threads available for VoiceSDK.

voicesdk_cc.core.set_use_voice_template_compression(arg0: bool)

Sets whether to use compression for voice templates serialization. Voice template compression is not used by default.

Parameters

use_voice_template_compression (bool) – Whether to use compression for voice templates serialization.

Utility functions for working with WAV data.

voicesdk_cc.core.wavutils.read_as_float_samples(wav_file_path: object) tuple

Reads WAV file as a float samples buffer (WAV file can be of any format).

Parameters

wav_file_path (str or pathlib.Path) – Path to WAV file.

Returns

Audio data and sample rate.

Return type

tuple(numpy.ndarray, int)

voicesdk_cc.core.wavutils.read_as_float_samples_16bit(wav_file_path: object) tuple

Reads WAV file as a float samples buffer with 16-bit precision (WAV file can be of any format).

Parameters

wav_file_path (str or pathlib.Path) – Path to WAV file.

Returns

Audio data and sample rate.

Return type

tuple(numpy.ndarray, int)

voicesdk_cc.core.wavutils.read_as_pcm16_bytes(wav_file_path: object) tuple

Reads WAV file as a PCM16 bytes buffer (WAV file can be of any format).

Parameters

wav_file_path (str or pathlib.Path) – Path to WAV file.

Returns

Audio data and sample rate.

Return type

tuple(numpy.ndarray, int)

voicesdk_cc.core.wavutils.read_as_pcm16_samples(wav_file_path: object) tuple

Reads WAV file as a PCM16 samples buffer (WAV file can be of any format).

Parameters

wav_file_path (str or pathlib.Path) – Path to WAV file.

Returns

Audio data and sample rate.

Return type

tuple(numpy.ndarray, int)

Speaker distinction detection

class voicesdk_cc.diarization.DiarizationEngine

Diarization engine class, an entry point for speaker diarization.

__init__(init_data_path: str)

DiarizationEngine constructor.

Parameters

init_data_path (str) – Path to the directory containing engine init data.

get_segmentation_from_file(path_to_wav_file: str, num_speakers: int = 0) Dict[int, List[core.AudioInterval]]

Performs speaker diarization of the given WAV file.

Parameters
  • path_to_wav_file (str) – Path to WAV file.

  • num_speakers (int, optional) – Optional number of speakers.

Returns

Dict of AudioInterval lists; each key corresponds to a certain speaker.

Return type

dict

get_segmentation_from_samples(*args, **kwargs)

Overloaded function.

  1. get_segmentation_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int, num_speakers: int = 0) -> Dict[int, List[core.AudioInterval]]

Performs speaker diarization of the given PCM16 samples.

Parameters
  • samples (numpy.array) – PCM16 (numpy.int16) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

  • num_speakers (int, optional) – Optional number of speakers.

Returns

Dict of AudioInterval lists; each key corresponds to a certain speaker.

Return type

dict

  2. get_segmentation_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int, num_speakers: int = 0) -> Dict[int, List[core.AudioInterval]]

Performs speaker diarization of the given PCM16 samples.

Parameters
  • samples (numpy.array) – PCM16 (numpy.uint8) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

  • num_speakers (int, optional) – Optional number of speakers.

Returns

Dict of AudioInterval lists; each key corresponds to a certain speaker.

Return type

dict

  3. get_segmentation_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int, num_speakers: int = 0) -> Dict[int, List[core.AudioInterval]]

Performs speaker diarization of the given float samples.

Parameters
  • samples (numpy.array) – float (numpy.float32) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

  • num_speakers (int, optional) – Optional number of speakers.

Returns

Dict of AudioInterval lists; each key corresponds to a certain speaker.

Return type

dict
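The returned dict can be post-processed with plain Python. As an illustration, the sketch below sums the talk time per speaker from the start_time and end_time properties of each interval (milliseconds, end not inclusive). The function name is illustrative, and the intervals are stand-in objects with made-up values rather than real AudioInterval instances.

```python
from types import SimpleNamespace


def talk_time_ms(segmentation):
    """Map each speaker key to its total speech time in milliseconds."""
    totals = {}
    for speaker, intervals in segmentation.items():
        totals[speaker] = sum(i.end_time - i.start_time for i in intervals)
    return totals


def iv(start, end):
    """Stand-in for an AudioInterval with time properties only."""
    return SimpleNamespace(start_time=start, end_time=end)


# Made-up segmentation for two speakers.
demo = {0: [iv(0, 1200), iv(3000, 4500)], 1: [iv(1200, 3000)]}
totals = talk_time_ms(demo)
```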

Audio/speech utilities and estimators

class voicesdk_cc.media.QualityCheckEngine

Quality check engine class.

__init__(init_data_path: str)

QualityCheckEngine constructor.

Parameters

init_data_path (str) – Path to the directory containing initialization data.

check_quality_from_file(path_to_audio_file: str, thresholds: media.QualityCheckMetricsThresholds) media.QualityCheckEngineResult

Checks whether the given audio file is suitable from the quality perspective.

Parameters
  • path_to_audio_file (str) – Path to audio file.

  • thresholds (QualityCheckMetricsThresholds) – Quality checking thresholds that will be applied to the output quality check metrics.

Returns

Quality check result.

Return type

QualityCheckEngineResult

check_quality_from_samples(*args, **kwargs)

Overloaded function.

  1. check_quality_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int, thresholds: media.QualityCheckMetricsThresholds) -> media.QualityCheckEngineResult

Checks whether audio buffer is suitable from the quality perspective, from the given PCM16 samples.

Parameters
  • samples (numpy.array) – PCM16 (numpy.int16) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

  • thresholds (QualityCheckMetricsThresholds) – Quality checking thresholds that will be applied to the output quality check metrics.

Returns

Quality check result.

Return type

QualityCheckEngineResult

  2. check_quality_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int, thresholds: media.QualityCheckMetricsThresholds) -> media.QualityCheckEngineResult

Checks whether audio buffer is suitable from the quality perspective, from the given PCM16 audio bytes.

Parameters
  • samples (numpy.array) – PCM16 (numpy.uint8) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

  • thresholds (QualityCheckMetricsThresholds) – Quality checking thresholds that will be applied to the output quality check metrics.

Returns

Quality check result.

Return type

QualityCheckEngineResult

  3. check_quality_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int, thresholds: media.QualityCheckMetricsThresholds) -> media.QualityCheckEngineResult

Checks whether audio buffer is suitable from the quality perspective, from the given float audio samples.

Parameters
  • samples (numpy.array) – Float (numpy.float32) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

  • thresholds (QualityCheckMetricsThresholds) – Quality checking thresholds that will be applied to the output quality check metrics.

Returns

Quality check result.

Return type

QualityCheckEngineResult

Parameters

scenario (QualityCheckScenario) – Scenario for which recommended thresholds will be returned.

Returns

Quality check thresholds that can be used on quality checking.

Return type

QualityCheckMetricsThresholds

class voicesdk_cc.media.QualityCheckEngineResult

Quality check result class.

property multiple_speakers_detector_score

Multiple speakers detector score value obtained on quality check.

Type

float

property quality_check_short_description

Short description of the quality check results.

property snr_db

SNR metric value obtained on quality check, in dB.

Type

float

property speech_length_ms

Speech length metric value obtained on quality check, in milliseconds.

Type

float

property speech_relative_length

Speech relative length (speech length relative to the total audio length) metric value obtained on quality check.

Type

float

class voicesdk_cc.media.QualityCheckMetricsThresholds

Class for quality checking thresholds.

__init__(*args, **kwargs)

Overloaded function.

  1. __init__()

  2. __init__(minimum_snr_db: float, minimum_speech_length_ms: float, minimum_speech_relative_length: float, maximum_multiple_speakers_detector_score: float)

maximum_multiple_speakers_detector_score (float): Maximum multiple speakers detector score allowed to pass quality check.

property maximum_multiple_speakers_detector_score

Maximum multiple speakers detector score allowed to pass quality check.

Type

float

property minimum_snr_db

Minimum signal-to-noise ratio required to pass quality check in dB.

Type

float

property minimum_speech_length_ms

Minimum speech length required to pass quality check in milliseconds.

Type

float

property minimum_speech_relative_length

Minimum speech relative length (speech length relative to the total audio length) required to pass quality check.

Type

float
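To see which metric caused a rejection, the result metrics can be compared against the thresholds field by field. This is a sketch under stated assumptions: the comparisons below (strictly below a minimum, strictly above the maximum) are a guess at how the engine applies the thresholds, the function name is illustrative, and both objects are stand-ins with made-up values rather than SDK instances.

```python
from types import SimpleNamespace


def failed_checks(result, thresholds):
    """List the metric names that do not satisfy the given thresholds."""
    failures = []
    if result.snr_db < thresholds.minimum_snr_db:
        failures.append("snr_db")
    if result.speech_length_ms < thresholds.minimum_speech_length_ms:
        failures.append("speech_length_ms")
    if result.speech_relative_length < thresholds.minimum_speech_relative_length:
        failures.append("speech_relative_length")
    if (result.multiple_speakers_detector_score
            > thresholds.maximum_multiple_speakers_detector_score):
        failures.append("multiple_speakers_detector_score")
    return failures


# Stand-ins with made-up values, mirroring the documented property names.
result = SimpleNamespace(snr_db=4.0, speech_length_ms=5200.0,
                         speech_relative_length=0.6,
                         multiple_speakers_detector_score=0.2)
thresholds = SimpleNamespace(minimum_snr_db=8.0,
                             minimum_speech_length_ms=3000.0,
                             minimum_speech_relative_length=0.3,
                             maximum_multiple_speakers_detector_score=0.5)
failures = failed_checks(result, thresholds)
```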

class voicesdk_cc.media.QualityCheckScenario

Enumeration representing scenarios used to get recommended quality check thresholds.

VERIFY_TI_ENROLLMENT

Verification, TI enrollment step.

VERIFY_TI_VERIFICATION

Verification, TI verification step.

VERIFY_TD_ENROLLMENT

Verification, TD enrollment step.

VERIFY_TD_VERIFICATION

Verification, TD verification step.

LIVENESS

Liveness check.

Members:

VERIFY_TI_ENROLLMENT

VERIFY_TI_VERIFICATION

VERIFY_TD_ENROLLMENT

VERIFY_TD_VERIFICATION

LIVENESS

class voicesdk_cc.media.QualityCheckShortDescription

Enumeration representing short descriptions of the audio quality check results.

TOO_NOISY

Too noisy audio.

TOO_SMALL_SPEECH_TOTAL_LENGTH

Too small speech length in the audio.

TOO_SMALL_SPEECH_RELATIVE_LENGTH

Too small speech relative length (speech length relative to the total audio length).

MULTIPLE_SPEAKERS_DETECTED

Multiple speakers detected.

OK

Audio successfully passed quality check.

Members:

TOO_NOISY

TOO_SMALL_SPEECH_TOTAL_LENGTH

TOO_SMALL_SPEECH_RELATIVE_LENGTH

MULTIPLE_SPEAKERS_DETECTED

OK

class voicesdk_cc.media.SNRComputer

SNRComputer class, intended to calculate the signal-to-noise ratio (SNR) of a given audio signal.

__init__(init_data_path: str)

SNRComputer constructor.

Parameters

init_data_path (str) – Path to the directory containing init data.

compute_with_file(path_to_audio_file: str) float

Calculates SNR with given audio file.

Parameters

path_to_audio_file (str) – Path to audio file.

Returns

Computed SNR in dB.

Return type

float

compute_with_samples(*args, **kwargs)

Overloaded function.

  1. compute_with_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> float

Calculates SNR with given PCM16 audio samples.

Parameters
  • samples (numpy.array) – PCM16 (numpy.int16) audio samples.

  • sample_rate (int) – Input audio signal sample rate in Hz.

Returns

Computed SNR in dB.

Return type

float

  2. compute_with_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> float

Calculates SNR with given PCM16 audio samples.

Parameters
  • samples (numpy.array) – PCM16 (numpy.uint8) audio samples.

  • sample_rate (int) – Input audio signal sample rate in Hz.

Returns

Computed SNR in dB.

Return type

float

  3. compute_with_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> float

Calculates SNR with given float audio samples.

Parameters
  • samples (numpy.array) – float (numpy.float32) audio samples.

  • sample_rate (int) – Input audio signal sample rate in Hz.

Returns

Computed SNR in dB.

Return type

float

class voicesdk_cc.media.SpeechEndpointDetector

Speech processor class for speech endpoint detection.

__init__(min_speech_length_ms: int, max_silence_length_ms: int, sample_rate: int)

SpeechEndpointDetector constructor.

Parameters
  • min_speech_length_ms (int) – Minimum speech length required to begin speech end detection (ms).

  • max_silence_length_ms (int) – Silence after speech threshold used to determine if speech is already ended (ms).

  • sample_rate (int) – Input audio sample rate.

add_samples(*args, **kwargs)

Overloaded function.

  1. add_samples(samples: numpy.ndarray[numpy.int16])

Adds new audio samples to the SpeechEndpointDetector.

Parameters

samples (numpy.array) – PCM16 (numpy.int16) audio samples.

Note: Audio sample rate is predefined in the constructor.

  2. add_samples(samples: numpy.ndarray[numpy.uint8])

Adds new audio samples to the SpeechEndpointDetector.

Parameters

samples (numpy.array) – PCM16 (numpy.uint8) audio samples.

Note: Audio sample rate is predefined in the constructor.

  3. add_samples(samples: numpy.ndarray[numpy.float32])

Adds new audio samples to the SpeechEndpointDetector.

Parameters

samples (numpy.array) – float (numpy.float32) audio samples.

Note: Audio sample rate is predefined in the constructor.

is_speech_ended() bool

Returns detection state.

Returns

True if speech end was detected, False otherwise.

Return type

bool

reset()

Resets detector state.
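A typical use is to feed audio chunk by chunk and poll is_speech_ended() after each one. The loop below is a sketch of that pattern. The function name is illustrative, and FakeDetector is a stand-in that mimics only the two documented methods so that the loop can run without the SDK; a real detector would be built with SpeechEndpointDetector(min_speech_length_ms, max_silence_length_ms, sample_rate) and fed numpy arrays.

```python
def feed_until_speech_end(detector, chunks):
    """Feed chunks in order and return the index of the chunk after which
    the detector reports end of speech, or None if it never does."""
    for index, chunk in enumerate(chunks):
        detector.add_samples(chunk)
        if detector.is_speech_ended():
            return index
    return None


class FakeDetector:
    """Stand-in that reports end of speech after three chunks."""
    def __init__(self):
        self.seen = 0

    def add_samples(self, samples):
        self.seen += 1

    def is_speech_ended(self):
        return self.seen >= 3


# Five chunks of silence as plain lists (made-up data).
stopped_at = feed_until_speech_end(FakeDetector(), [[0] * 160] * 5)
```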

class voicesdk_cc.media.SpeechEndpointDetectorOpus

Speech processor class for speech endpoint detection in the Opus audio stream.

__init__(min_speech_length_ms: int, max_silence_length_ms: int, sample_rate: int)

SpeechEndpointDetectorOpus constructor.

Parameters
  • min_speech_length_ms (int) – Minimum speech length required to begin speech end detection (ms).

  • max_silence_length_ms (int) – Silence after speech threshold used to determine if speech is already ended (ms).

  • sample_rate (int) – Input audio sample rate.

add_packet(bytes: numpy.ndarray[numpy.uint8])

Adds Opus packet to the SpeechEndpointDetectorOpus.

Parameters

bytes (numpy.array) – A buffer containing a single Opus packet. The packet is expected to contain data for a single mono stream.

Note: Audio sample rate is predefined in the constructor.

is_speech_ended() bool

Returns detection state.

Returns

True if speech end was detected, False otherwise.

Return type

bool

reset()

Resets detector state.

class voicesdk_cc.media.SpeechEvent

Class representing a single speech event.

property audio_interval

Speech event audio interval.

Type

AudioInterval

property is_voice

Whether the frame contains speech or not.

Type

bool

class voicesdk_cc.media.SpeechInfo

Class that contains metrics related to Voice Activity Detection.

property background_length_ms

Total length of non-speech signal, milliseconds.

Type

float

property speech_length_ms

Total accumulated speech duration, milliseconds.

Type

float

property total_length_ms

Total audio record duration, milliseconds.

Type

float

class voicesdk_cc.media.SpeechSummary

Speech summary class.

property speech_events

Retrieves list of speech events.

Returns

Speech events.

Return type

list

property speech_info

Retrieves speech info data.

Returns

Speech info.

Return type

SpeechInfo

class voicesdk_cc.media.SpeechSummaryEngine

Speech summary engine class, intended to calculate SpeechSummary with given audio samples.

__init__(init_data_path: str)

SpeechSummaryEngine constructor.

Parameters

init_data_path (str) – Path to the directory containing engine init data.

create_stream(stream_sample_rate: int) voicesdk::media::python::SpeechSummaryStreamPy

Factory method for creating SpeechSummaryStream.

Parameters

stream_sample_rate (int) – Audio stream sample rate in Hz.

Returns

Created speech summary stream.

Return type

SpeechSummaryStream

get_speech_summary_from_file(path_to_audio_file: str) voicesdk::SpeechSummary

Calculates speech summary with given audio file.

Parameters

path_to_audio_file (str) – Path to audio file.

Returns

Speech summary.

Return type

SpeechSummary

get_speech_summary_from_samples(*args, **kwargs)

Overloaded function.

  1. get_speech_summary_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> voicesdk::SpeechSummary

Calculates speech summary with given PCM16 audio samples.

Parameters
  • samples (numpy.array) – PCM16 (numpy.int16) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

Returns

Speech summary.

Return type

SpeechSummary

  2. get_speech_summary_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> voicesdk::SpeechSummary

Calculates speech summary with given PCM16 audio samples.

Parameters
  • samples (numpy.array) – PCM16 (numpy.uint8) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

Returns

Speech summary.

Return type

SpeechSummary

  3. get_speech_summary_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> voicesdk::SpeechSummary

Calculates speech summary with given float audio samples.

Parameters
  • samples (numpy.array) – float (numpy.float32) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

Returns

Speech summary.

Return type

SpeechSummary

class voicesdk_cc.media.SpeechSummaryStream

Stateful speech summary stream class, intended to calculate SpeechSummary with audio samples given in stream. New instance can be obtained with a SpeechSummaryEngine instance.

add_samples(*args, **kwargs)

Overloaded function.

  1. add_samples(samples: numpy.ndarray[numpy.int16])

Adds PCM16 audio samples to process.

Parameters

samples (numpy.array) – PCM16 (numpy.int16) audio samples.

Note: Audio sample rate is predefined at the stream creation time.

  2. add_samples(samples: numpy.ndarray[numpy.uint8])

Adds PCM16 audio samples to process.

Parameters

samples (numpy.array) – PCM16 (numpy.uint8) audio samples.

Note: Audio sample rate is predefined at the stream creation time.

  3. add_samples(samples: numpy.ndarray[numpy.float32])

Adds float audio samples to process.

Parameters

samples (numpy.array) – float (numpy.float32) audio samples.

Note: Audio sample rate is predefined at the stream creation time.

finalize()

Finalizes input audio stream to process remaining audio samples and produce result if it’s possible.

get_current_background_length() float

Returns current background length in milliseconds.

Returns

Current background length in milliseconds.

Return type

float

get_speech_event() media.SpeechEvent

Retrieves speech event from output queue.

Returns

One speech event.

Return type

SpeechEvent

Raises

RuntimeError – If output queue is empty.

Note: Use has_speech_events() to check whether any speech events are available.

get_total_speech_info() media.SpeechInfo

Retrieves accumulated speech info data.

Returns

Speech info.

Return type

SpeechInfo

get_total_speech_summary() voicesdk::SpeechSummary

Retrieves accumulated speech summary data.

Returns

Speech summary.

Return type

SpeechSummary

has_speech_events() bool

Checks if any speech events are present in stream queue.

Returns

True if any events are present, False otherwise.

Return type

bool

reset()

Resets stream state: clears buffer, resets speech summary.
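Because get_speech_event() raises RuntimeError on an empty queue, the usual pattern is to guard it with has_speech_events() and to call finalize() once the input is exhausted. The following sketch shows that loop. The function name is illustrative, and FakeStream is a stand-in implementing only the methods used here, with made-up event values; a real stream would come from SpeechSummaryEngine.create_stream().

```python
from collections import deque


def collect_events(stream, chunks):
    """Feed all chunks, finalize the stream and return every queued event."""
    events = []

    def drain():
        while stream.has_speech_events():
            events.append(stream.get_speech_event())

    for chunk in chunks:
        stream.add_samples(chunk)
        drain()
    stream.finalize()  # flush whatever is still buffered
    drain()
    return events


class FakeStream:
    """Stand-in: queues one event per chunk and one more on finalize."""
    def __init__(self):
        self.queue = deque()

    def add_samples(self, samples):
        self.queue.append(f"event-{len(samples)}")

    def finalize(self):
        self.queue.append("final")

    def has_speech_events(self):
        return bool(self.queue)

    def get_speech_event(self):
        if not self.queue:
            raise RuntimeError("output queue is empty")
        return self.queue.popleft()


events = collect_events(FakeStream(), [[0] * 80, [0] * 160])
```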

class voicesdk_cc.media.SpeechSummaryStreamOpus

Stateful speech summary stream class, intended to calculate SpeechSummary with audio samples given in Opus stream. New instance can be obtained with a SpeechSummaryEngine instance.

__init__(init_data_path: str, sample_rate: int)

SpeechSummaryStreamOpus constructor.

Parameters
  • init_data_path (str) – Path to the directory containing engine init data.

  • sample_rate (int) – Audio stream sample rate in Hz.

add_packet(bytes: numpy.ndarray[numpy.uint8])

Adds Opus packet to the SpeechSummaryStreamOpus.

Parameters

bytes (numpy.array) – A buffer containing a single Opus packet. The packet is expected to contain data for a single mono stream.

Note: Audio sample rate is predefined in the constructor.

finalize()

Finalizes input audio stream to process remaining audio samples and produce result if it’s possible.

get_current_background_length() float

Returns current background length in milliseconds.

Returns

Current background length in milliseconds.

Return type

float

get_speech_event() media.SpeechEvent

Retrieves speech event from output queue.

Returns

One speech event.

Return type

SpeechEvent

Raises

RuntimeError – If output queue is empty.

Note: Use has_speech_events() to check whether any speech events are available.

get_total_speech_info() media.SpeechInfo

Retrieves accumulated speech info data.

Returns

Speech info.

Return type

SpeechInfo

get_total_speech_summary() voicesdk::SpeechSummary

Retrieves accumulated speech summary data.

Returns

Speech summary.

Return type

SpeechSummary

has_speech_events() bool

Checks if any speech events are present in stream queue.

Returns

True if any events are present, False otherwise.

Return type

bool

reset()

Resets stream state: clears buffer, resets speech summary.

Enrollment and verification for speaker audio

class voicesdk_cc.verify.VerifyResult

Voice verification result class.

property probability

Voice matching probability from 0 to 1, should be used for making a biometrics authentication decision.

Type

float

property score

Raw verification score, intended to be used for evaluation and data-wise calibration.

Type

float
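Since probability is the value meant for authentication decisions, a decision rule can be as simple as a cut-off on it. The sketch below assumes a threshold of 0.5, which is a hypothetical value and not an SDK recommendation; the function name is illustrative and the demo object is a stand-in for a VerifyResult with made-up values.

```python
from types import SimpleNamespace


def is_same_speaker(result, min_probability=0.5):
    """Accept the match when probability reaches the chosen cut-off.

    min_probability is a hypothetical value; tune it on your own data.
    """
    return result.probability >= min_probability


# Stand-in exposing the documented properties.
demo = SimpleNamespace(probability=0.93, score=4.2)
accepted = is_same_speaker(demo)
```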

class voicesdk_cc.verify.VerifyStreamResult

Streaming voice verification result class.

property audio_interval

Audio interval, which verify result refers to.

Type

AudioInterval

property verify_result

Voice verification result.

Type

voicesdk.verify.VerifyResult

class voicesdk_cc.verify.VoiceTemplateFactory

Voice verification engine class.

__init__(init_data_path: str)

VoiceTemplateFactory constructor.

Parameters

init_data_path (str) – Path to the directory containing factory init data.

check_quality_from_file(path_to_audio_file: str) verify.QualityCheckResult

Parameters

path_to_audio_file (str) – Path to audio file.

Returns

Quality check result.

Return type

QualityCheckResult

Note: Audio file sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.

check_quality_from_samples(*args, **kwargs)

Overloaded function.

  1. check_quality_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int) -> verify.QualityCheckResult

Deprecated, use QualityCheckEngine API from media component instead. Checks whether audio buffer is suitable to use as voice enrollment entry from the quality perspective.

Parameters
  • samples (numpy.array) – PCM16 (numpy.int16) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

Returns

Quality check result.

Return type

QualityCheckResult

Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.

  2. check_quality_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int) -> verify.QualityCheckResult

Deprecated, use QualityCheckEngine API from media component instead. Checks whether audio buffer is suitable to use as voice enrollment entry from the quality perspective.

Parameters
  • samples (numpy.array) – PCM16 (numpy.uint8) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

Returns

Quality check result.

Return type

QualityCheckResult

Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.

  3. check_quality_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int) -> verify.QualityCheckResult

Deprecated, use QualityCheckEngine API from media component instead. Checks whether audio buffer is suitable to use as voice enrollment entry from the quality perspective.

Parameters
  • samples (numpy.array) – Float (numpy.float32) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

Returns

Quality check result.

Return type

QualityCheckResult

Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.

create_voice_template_batch_from_file(input_batch: List[verify.VerifyFileBatchElement]) List[core.VoiceTemplate]

Creates multiple voice templates from the contents of the given WAV files.

Parameters

input_batch (list) – List of VerifyFileBatchElement.

Returns

Created voice templates.

Note: This API is experimental and subject to change.

create_voice_template_batch_from_samples(*args, **kwargs)

Overloaded function.

  1. create_voice_template_batch_from_samples(input_batch: List[verify.VerifySamplesBatchElementFloat]) -> List[core.VoiceTemplate]

Creates multiple voice templates from given audio samples.

Parameters

input_batch (list) – List of VerifySamplesBatchElementFloat.

Returns

Created voice templates.

Note: This API is experimental and subject to change.

  2. create_voice_template_batch_from_samples(input_batch: List[verify.VerifySamplesBatchElementInt16]) -> List[core.VoiceTemplate]

Creates multiple voice templates from given audio samples.

Parameters

input_batch (list) – List of VerifySamplesBatchElementInt16.

Returns

Created voice templates.

Note: This API is experimental and subject to change.

  3. create_voice_template_batch_from_samples(input_batch: List[verify.VerifySamplesBatchElementUint8]) -> List[core.VoiceTemplate]

Creates multiple voice templates from given audio samples.

Parameters

input_batch (list) – List of VerifySamplesBatchElementUint8.

Returns

Created voice templates.

Note: This API is experimental and subject to change.
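When many inputs need templates, the batch call can be fed in bounded chunks to keep the size of each request predictable. This is a rough sketch under stated assumptions: create_templates_in_chunks and its chunk_size parameter are hypothetical, the batch elements are assumed to be built elsewhere (their constructors are not described here), and this reference does not state whether results are returned in input order.

```python
def create_templates_in_chunks(factory, batch_elements, chunk_size=16):
    """Create voice templates for many inputs, one bounded batch at a time.

    `factory` is assumed to expose create_voice_template_batch_from_samples()
    as documented above; `chunk_size` is a hypothetical tuning knob.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    templates = []
    for start in range(0, len(batch_elements), chunk_size):
        # Slice out the next chunk; the last one may be shorter.
        chunk = batch_elements[start:start + chunk_size]
        templates.extend(factory.create_voice_template_batch_from_samples(chunk))
    return templates
```

The same pattern would apply to create_voice_template_batch_from_file by swapping the method that is called on each chunk.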

create_voice_template_from_file(path_to_audio_file: str, channel_type: core.ChannelType = <ChannelType.TEL: 2>) core.VoiceTemplate

Creates voice template from the contents of the given audio file.

Parameters
  • path_to_audio_file (str) – Path to audio file.

  • channel_type (ChannelType, optional) – Input audio channel type, default is ChannelType.TEL.

Returns

Created voice template.

Return type

VoiceTemplate

Note: Audio file sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.

create_voice_template_from_samples(*args, **kwargs)

Overloaded function.

  1. create_voice_template_from_samples(samples: numpy.ndarray[numpy.int16], sample_rate: int, channel_type: core.ChannelType = <ChannelType.TEL: 2>) -> core.VoiceTemplate

Creates voice template from audio samples.

Parameters
  • samples (numpy.array) – PCM16 (numpy.int16) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

  • channel_type (ChannelType, optional) – Input audio channel type, default is ChannelType.TEL.

Returns

Created voice template.

Return type

VoiceTemplate

Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.

  2. create_voice_template_from_samples(samples: numpy.ndarray[numpy.uint8], sample_rate: int, channel_type: core.ChannelType = <ChannelType.TEL: 2>) -> core.VoiceTemplate

Creates voice template from audio samples.

Parameters
  • samples (numpy.array) – PCM16 (numpy.uint8) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

  • channel_type (ChannelType, optional) – Input audio channel type, default is ChannelType.TEL.

Returns

Created voice template.

Return type

VoiceTemplate

Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.

  3. create_voice_template_from_samples(samples: numpy.ndarray[numpy.float32], sample_rate: int, channel_type: core.ChannelType = <ChannelType.TEL: 2>) -> core.VoiceTemplate

Creates voice template from audio samples.

Parameters
  • samples (numpy.array) – Float (numpy.float32) audio samples.

  • sample_rate (int) – Audio sample rate in Hz.

  • channel_type (ChannelType, optional) – Input audio channel type, default is ChannelType.TEL.

Returns

Created voice template.

Return type

VoiceTemplate

Note: Sampling frequency should be equal to or greater than the value returned by get_minimum_audio_sample_rate.
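One way to honor the sampling-frequency requirement for in-memory audio is to validate the rate before the template is created. The sketch below is illustrative only: create_template_checked is a hypothetical wrapper, factory is assumed to be an existing factory instance, and channel_type is left at its documented default.

```python
def create_template_checked(factory, samples, sample_rate):
    """Create a voice template only if the sample rate is high enough.

    Hypothetical wrapper around the documented
    create_voice_template_from_samples() and get_minimum_audio_sample_rate().
    """
    minimum = factory.get_minimum_audio_sample_rate()
    if sample_rate < minimum:
        # Fail early with a clear message instead of relying on the engine.
        raise ValueError(
            f"sample rate {sample_rate} Hz is below the minimum of {minimum} Hz"
        )
    return factory.create_voice_template_from_samples(samples, sample_rate)
```

Raising ValueError here is a design choice of the sketch; how the SDK itself reports a too-low sample rate is not described in this reference.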

get_init_data_id() str

Returns ID of the init data, which was used to create the factory.

Returns

A string containing init data ID.

Return type

str

get_minimum_audio_sample_rate() int

Returns minimum supported input audio sampling frequency in Hz.

Returns

A minimum sampling rate in Hz.

Return type

int

merge_voice_templates(voice_templates: list) core.VoiceTemplate

Merges a list of voice templates of the same speaker producing a union template.

Parameters

voice_templates (list) – List of voice templates.

Returns

A union voice template.

Return type

VoiceTemplate

Note: All the templates should have the same init data ID as the factory instance.
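A typical use of merging is enrollment from several recordings of one speaker. The following sketch assumes that every file belongs to the same speaker and that all templates are created by the same factory, which satisfies the init data ID requirement; enroll_from_files is a hypothetical helper name.

```python
def enroll_from_files(factory, paths):
    """Create one template per file and merge them into a union template.

    Uses the documented create_voice_template_from_file() and
    merge_voice_templates(); the helper itself is hypothetical.
    """
    if not paths:
        raise ValueError("at least one audio file is required")
    # Every template comes from the same factory, so they share its init data ID.
    templates = [factory.create_voice_template_from_file(path) for path in paths]
    return factory.merge_voice_templates(templates)
```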

class voicesdk_cc.verify.VoiceTemplateMatcher

Voice verification engine class.

__init__(init_data_path: str)

VoiceTemplateMatcher constructor.

Parameters

init_data_path (str) – Path to the directory containing matcher init data.

get_init_data_id() str

Returns ID of the init data, which was used to create the matcher.

Returns

A string containing init data ID.

Return type

str

match_voice_templates(template1: core.VoiceTemplate, template2: core.VoiceTemplate) verify.VerifyResult

Matches two voice templates one-to-one.

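Parameters
  • template1 (VoiceTemplate) – First voice template.

  • template2 (VoiceTemplate) – Second voice template.

Returns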

Verification result.

Return type

voicesdk.verify.VerifyResult

Note: Both templates should have the same init data ID as the matcher instance.
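The init data ID requirement can be checked up front by comparing the IDs reported by the factory and the matcher. This is a minimal sketch under the assumption that both templates were produced by the given factory; match_checked is a hypothetical helper, and the fields of the returned result are not described in this section, so the result is passed through unchanged.

```python
def match_checked(factory, matcher, template1, template2):
    """Match two templates after confirming factory and matcher are compatible.

    Relies on the documented get_init_data_id() of both objects and on
    match_voice_templates(); assumes both templates came from `factory`.
    """
    factory_id = factory.get_init_data_id()
    matcher_id = matcher.get_init_data_id()
    if factory_id != matcher_id:
        raise ValueError(
            f"init data mismatch: factory {factory_id!r}, matcher {matcher_id!r}"
        )
    return matcher.match_voice_templates(template1, template2)
```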

class voicesdk_cc.verify.VoiceVerifyStream

Class for continuous voice verification using audio stream.

__init__(voice_template_factory: voicesdk::verify::python::VoiceTemplateFactoryPy, voice_template_matcher: voicesdk::verify::python::VoiceTemplateMatcherPy, voice_templates: List[core.VoiceTemplate], sample_rate: int, audio_context_length_seconds: int = 10, window_length_seconds: float = 3)

Note: Sampling frequency should be equal to or greater than the value returned by voice_template_factory.get_minimum_audio_sample_rate(). Voice template matcher, voice template factory and voice template should have the same init data ID.

add_samples(*args, **kwargs)

Overloaded function.

  1. add_samples(samples: numpy.ndarray[numpy.int16])

Adds PCM16 audio samples to process.

Parameters

samples (numpy.array) – PCM16 (numpy.int16) audio samples.

Note: The audio sample rate is fixed at creation time. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.

  2. add_samples(samples: numpy.ndarray[numpy.uint8])

Adds PCM16 audio samples to process.

Parameters

samples (numpy.array) – PCM16 (numpy.uint8) audio samples.

Note: The audio sample rate is fixed at creation time. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.

  3. add_samples(samples: numpy.ndarray[numpy.float32])

Adds float audio samples to process.

Parameters

samples (numpy.array) – float (numpy.float32) audio samples.

Note: The audio sample rate is fixed at creation time. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.

finalize()

Finalizes the input audio stream to process the remaining audio samples and produce a result if possible.

get_verify_result() List[verify.VerifyStreamResult]

Retrieves verify result from output queue.

Returns

One verify result.

Return type

VerifyStreamResult

Raises

RuntimeError – If output queue is empty.

Note: Use has_verify_results() to check if there are available verify results.

has_verify_results() bool

Checks if there are verify results in output queue.

Returns

True if there are results available, False otherwise.

Return type

bool

reset()

Resets stream state.
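The methods of this class suggest a simple feed-and-drain loop: add audio, collect whatever results are queued, then finalize and collect once more. The sketch below assumes an already constructed stream object and an iterable of sample buffers; verify_stream is a hypothetical helper, and the structure of each returned result is not described here, so results are collected as-is.

```python
def verify_stream(stream, chunks):
    """Feed audio chunks to a verify stream and collect all produced results.

    Uses only the documented add_samples(), has_verify_results(),
    get_verify_result() and finalize(); the helper itself is hypothetical.
    """
    results = []

    def drain():
        # Guard every read with has_verify_results() so that
        # get_verify_result() is never called on an empty queue.
        while stream.has_verify_results():
            results.append(stream.get_verify_result())

    for chunk in chunks:
        stream.add_samples(chunk)
        drain()

    # Flush the remaining audio and pick up any final result.
    stream.finalize()
    drain()
    return results
```

Calling reset() afterwards would let the same stream object be reused for another audio stream.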

class voicesdk_cc.verify.VoiceVerifyStreamOpus

Class for continuous voice verification using Opus audio stream.

__init__(voice_template_factory: voicesdk::verify::python::VoiceTemplateFactoryPy, voice_template_matcher: voicesdk::verify::python::VoiceTemplateMatcherPy, voice_template: core.VoiceTemplate, sample_rate: int, audio_context_length_seconds: int = 10)

Parameters

audio_context_length_seconds (int, optional) – Length of audio context for voice verification in seconds, must be at least 3 seconds.

Note: Sampling frequency should be equal to or greater than the value returned by voice_template_factory.get_minimum_audio_sample_rate(). Voice template matcher, voice template factory and voice template should have the same init data ID.

add_packet(bytes: numpy.ndarray[numpy.uint8])

Adds Opus packet to process.

Parameters

bytes (numpy.array) – Opus packet bytes (numpy.uint8). The packet is expected to contain data for a single mono stream.

Note: The audio sample rate is fixed at creation time. If an exception is thrown, the stream state is reset, except for the accumulated results buffer.

finalize()

Finalizes the input audio stream to process the remaining audio samples and produce a result if possible.

get_verify_result() verify.VerifyStreamResult

Retrieves verify result from output queue.

Returns

One verify result.

Return type

VerifyStreamResult

Raises

RuntimeError – If output queue is empty.

Note: Use has_verify_results() to check if there are available verify results.

has_verify_results() bool

Checks if there are verify results in output queue.

Returns

True if there are results available, False otherwise.

Return type

bool

reset()

Resets stream state.
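Because a failed add_packet call resets the stream state but keeps the accumulated results buffer, a caller may choose to count failures and keep going rather than abort. The following is a sketch of that idea only: feed_opus_packets is a hypothetical helper, the exception type raised by add_packet is not stated in this reference (catching Exception is an assumption), and whether continuing after a failure is appropriate depends on the application.

```python
def feed_opus_packets(stream, packets):
    """Feed Opus packets to a stream, tolerating packets that raise.

    Returns a (results, failed_count) tuple. Uses the documented
    add_packet(), has_verify_results(), get_verify_result() and finalize().
    """
    results = []
    failed = 0

    def drain():
        while stream.has_verify_results():
            results.append(stream.get_verify_result())

    for packet in packets:
        try:
            stream.add_packet(packet)
        except Exception:
            # Assumption: the exact exception type is not documented here.
            # Already accumulated results stay available after a failure.
            failed += 1
        drain()

    stream.finalize()
    drain()
    return results, failed
```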