Diarization engine class (interface), intended to perform speaker diarization.
#include <voicesdk/diarization/diarization.h>
virtual TimeStamps GetSegmentation(const float *float_samples, const size_t samples_num, const size_t sample_rate, const size_t num_speakers = 0) = 0
    Performs speaker diarization on the given float audio samples (in the [-1, 1] range).

virtual TimeStamps GetSegmentation(const int16_t *pcm16_samples, const size_t samples_num, const size_t sample_rate, const size_t num_speakers = 0) = 0
    Performs speaker diarization on the given PCM16 audio samples.

virtual TimeStamps GetSegmentation(const uint8_t *pcm16_bytes, const size_t bytes_num, const size_t sample_rate, const size_t num_speakers = 0) = 0
    Performs speaker diarization on the given PCM16 samples supplied as raw bytes.

virtual TimeStamps GetSegmentation(const std::string &audio_path, const size_t num_speakers = 0) = 0
    Performs speaker diarization on the given audio file.
◆ Create()
static DiarizationEngine::Ptr voicesdk::diar::DiarizationEngine::Create(const std::string &init_dir)
[static]
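The page describes an abstract-interface-plus-factory design: DiarizationEngine declares only pure virtual GetSegmentation() overloads, and instances are obtained through the static Create(init_dir), which returns a DiarizationEngine::Ptr. The sketch below mirrors that shape so the calling convention can be compiled without the SDK. Everything beyond the documented signatures is an assumption: Ptr is taken to be std::shared_ptr, TimeStamps is modeled as a vector of a hypothetical Segment record, only two of the four overloads are mirrored, and StubEngine is a hypothetical placeholder that labels the whole input as a single speaker.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the SDK's TimeStamps type, whose real
// definition is not shown on this page.
struct Segment {
    std::size_t speaker;  // speaker index assigned by the engine
    double start_sec;     // utterance start, in seconds
    double end_sec;       // utterance end, in seconds
};
using TimeStamps = std::vector<Segment>;

// Mirrors the documented shape: an abstract interface whose instances are
// obtained only through a static factory. Only two overloads are mirrored.
class DiarizationEngine {
public:
    using Ptr = std::shared_ptr<DiarizationEngine>;  // assumed smart-pointer type
    virtual ~DiarizationEngine() = default;

    static Ptr Create(const std::string &init_dir);

    virtual TimeStamps GetSegmentation(const float *float_samples,
                                       const std::size_t samples_num,
                                       const std::size_t sample_rate,
                                       const std::size_t num_speakers = 0) = 0;
    virtual TimeStamps GetSegmentation(const std::int16_t *pcm16_samples,
                                       const std::size_t samples_num,
                                       const std::size_t sample_rate,
                                       const std::size_t num_speakers = 0) = 0;
};

// Hypothetical stub implementation that makes the sketch runnable: it labels
// the whole recording as speaker 0 and ignores the sample values.
class StubEngine final : public DiarizationEngine {
public:
    TimeStamps GetSegmentation(const float *, const std::size_t samples_num,
                               const std::size_t sample_rate, const std::size_t) override {
        return WholeRecording(samples_num, sample_rate);
    }
    TimeStamps GetSegmentation(const std::int16_t *, const std::size_t samples_num,
                               const std::size_t sample_rate, const std::size_t) override {
        return WholeRecording(samples_num, sample_rate);
    }

private:
    static TimeStamps WholeRecording(std::size_t samples_num, std::size_t sample_rate) {
        if (samples_num == 0 || sample_rate == 0) return {};
        const double duration =
            static_cast<double>(samples_num) / static_cast<double>(sample_rate);
        return {Segment{0, 0.0, duration}};
    }
};

// The factory hides the concrete type from callers. What init_dir must
// contain is not documented here, so this stub simply ignores it.
DiarizationEngine::Ptr DiarizationEngine::Create(const std::string & /*init_dir*/) {
    return std::make_shared<StubEngine>();
}
```

Client code only ever holds a DiarizationEngine::Ptr, so swapping in the real engine would presumably change only the include and namespace, not the calls; default arguments such as num_speakers = 0 are taken from the interface declaration because calls go through the base-class pointer.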
◆ GetSegmentation() [1/4]
virtual TimeStamps voicesdk::diar::DiarizationEngine::GetSegmentation(const float *float_samples, const size_t samples_num, const size_t sample_rate, const size_t num_speakers = 0)
[pure virtual]
Performs speaker diarization on the given float audio samples, which are expected to lie in the [-1, 1] range.
- Parameters
  - float_samples: pointer to the array of samples
  - samples_num: number of samples in the array
  - sample_rate: sample rate of the audio, in Hz
  - num_speakers: optional number of speakers in the recording, if known (e.g. 2 for a phone conversation); passing it increases diarization accuracy. If 0 (the default), the engine determines the number of speakers itself, and accuracy may be lower.
- Exceptions
  -
- Returns
  - timestamps of the speakers' utterances
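This overload expects float samples scaled to [-1, 1]. If the audio is held as signed 16-bit PCM, one common conversion is to divide each sample by 32768, as sketched below. Pcm16ToFloat is a hypothetical helper, not part of the SDK; for such data the int16_t overload may be the simpler choice anyway.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: scales signed 16-bit PCM samples into floats in the
// [-1, 1] range expected by the float overload of GetSegmentation().
// Dividing by 32768 maps -32768 to exactly -1.0 and 32767 to just under 1.0.
std::vector<float> Pcm16ToFloat(const std::vector<std::int16_t> &pcm) {
    std::vector<float> out;
    out.reserve(pcm.size());
    for (std::int16_t s : pcm) {
        out.push_back(static_cast<float>(s) / 32768.0f);
    }
    return out;
}
```

The converted buffer would then be passed with samples_num equal to its element count (not its size in bytes), e.g. engine->GetSegmentation(samples.data(), samples.size(), 16000, 2) for a two-party call recorded at 16 kHz; the sample rate here is only an example.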
◆ GetSegmentation() [2/4]
virtual TimeStamps voicesdk::diar::DiarizationEngine::GetSegmentation(const int16_t *pcm16_samples, const size_t samples_num, const size_t sample_rate, const size_t num_speakers = 0)
[pure virtual]
Performs speaker diarization on the given PCM16 audio samples.
- Parameters
  - pcm16_samples: pointer to the array of samples
  - samples_num: number of samples in the array
  - sample_rate: sample rate of the audio, in Hz
  - num_speakers: optional number of speakers in the recording, if known (e.g. 2 for a phone conversation); passing it increases diarization accuracy. If 0 (the default), the engine determines the number of speakers itself, and accuracy may be lower.
- Exceptions
  -
- Returns
  - timestamps of the speakers' utterances
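Conversely, float audio can be fed to this PCM16 overload after quantization. The sketch below maps NaN to silence, clamps to [-1, 1], and scales by 32767 with rounding. FloatToPcm16 is a hypothetical helper, and the scale factor (32767 here, 32768 in some other code) is a convention this page does not specify.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts float samples to signed 16-bit PCM for the
// int16_t overload of GetSegmentation(). Out-of-range values are clamped
// first, so loud input saturates instead of wrapping around.
std::vector<std::int16_t> FloatToPcm16(const std::vector<float> &samples) {
    std::vector<std::int16_t> out;
    out.reserve(samples.size());
    for (float x : samples) {
        if (std::isnan(x)) x = 0.0f;  // treat invalid samples as silence
        const float clamped = std::clamp(x, -1.0f, 1.0f);
        // Scale by 32767 so that +1.0 maps to the largest int16 value.
        out.push_back(static_cast<std::int16_t>(std::lround(clamped * 32767.0f)));
    }
    return out;
}
```

Clamping before the cast matters: converting an out-of-range value such as 1.5 * 32767 straight to int16_t would not saturate at 32767.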
◆ GetSegmentation() [3/4]
virtual TimeStamps voicesdk::diar::DiarizationEngine::GetSegmentation(const std::string &audio_path, const size_t num_speakers = 0)
[pure virtual]
Performs speaker diarization on the given audio file.
- Parameters
  - audio_path: path to the audio file
  - num_speakers: optional number of speakers in the recording, if known (e.g. 2 for a phone conversation); passing it increases diarization accuracy. If 0 (the default), the engine determines the number of speakers itself, and accuracy may be lower.
- Exceptions
  -
- Returns
  - timestamps of the speakers' utterances
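This overload takes only a path; the supported audio formats and the exceptions it throws are not listed on this page. A caller may want to fail early on a bad path before invoking the engine, as in this sketch. RequireAudioFile is a hypothetical helper: it only checks that the path names a regular file and does not validate the audio format.

```cpp
#include <filesystem>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical pre-check before calling GetSegmentation(audio_path, ...).
// Throws std::invalid_argument with a descriptive message if the path does
// not name an existing regular file; the audio format itself is not checked.
void RequireAudioFile(const std::string &audio_path) {
    std::error_code ec;  // non-throwing overload: any I/O error counts as "not a file"
    if (!std::filesystem::is_regular_file(audio_path, ec)) {
        throw std::invalid_argument("audio file not found or not a regular file: " +
                                    audio_path);
    }
}
```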
◆ GetSegmentation() [4/4]
virtual TimeStamps voicesdk::diar::DiarizationEngine::GetSegmentation(const uint8_t *pcm16_bytes, const size_t bytes_num, const size_t sample_rate, const size_t num_speakers = 0)
[pure virtual]
Performs speaker diarization on the given PCM16 samples supplied as raw bytes.
- Parameters
  - pcm16_bytes: pointer to the array of bytes
  - bytes_num: number of bytes in the array
  - sample_rate: sample rate of the audio, in Hz
  - num_speakers: optional number of speakers in the recording, if known (e.g. 2 for a phone conversation); passing it increases diarization accuracy. If 0 (the default), the engine determines the number of speakers itself, and accuracy may be lower.
- Exceptions
  -
- Returns
  - timestamps of the speakers' utterances
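For this overload, bytes_num counts bytes, so for PCM16 data it is twice the number of samples. The sketch below serializes int16_t samples into such a byte buffer. The little-endian byte order (low byte first, as in PCM WAV data) is an assumption; this page does not state which order the engine expects. Pcm16ToBytesLE is a hypothetical helper.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: serializes signed 16-bit samples as little-endian
// bytes (low byte first). The byte order is an assumption; this page does
// not say which order the uint8_t overload of GetSegmentation() expects.
std::vector<std::uint8_t> Pcm16ToBytesLE(const std::vector<std::int16_t> &pcm) {
    std::vector<std::uint8_t> bytes;
    bytes.reserve(pcm.size() * 2);
    for (std::int16_t s : pcm) {
        const auto u = static_cast<std::uint16_t>(s);  // two's-complement bit pattern
        bytes.push_back(static_cast<std::uint8_t>(u & 0xFF));  // low byte
        bytes.push_back(static_cast<std::uint8_t>(u >> 8));    // high byte
    }
    return bytes;
}
```

The buffer would then be passed as pcm16_bytes = bytes.data() with bytes_num = bytes.size(); passing the sample count instead would hand the engine only half of the audio.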