FAQ

Can I create a voice profile by sequentially merging intermediate voice templates?

It may seem natural to merge voice templates one by one as you collect more voice samples for a speaker. However, this approach leads to a significant accuracy degradation compared to merging all templates together. We therefore highly recommend creating the union voice template (i.e. the speaker voice profile) only once you have all available voice samples.
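One way to build intuition for why the order of merging can matter is a toy model. The sketch below assumes, purely for illustration, that merging behaves like an unweighted average; the SDK's actual merge algorithm is not described here and may differ. Under that assumption, pairwise sequential merging gives later samples more weight than earlier ones, while a batch merge weights all samples equally.

```python
from functools import reduce


def average(values):
    """Unweighted mean; a toy stand-in for 'merge', NOT the SDK's algorithm."""
    values = list(values)
    return sum(values) / len(values)


# Toy stand-ins for four voice templates (plain numbers, for illustration only).
samples = [1.0, 2.0, 3.0, 4.0]

# Batch merge: every sample contributes with weight 1/4.
batch = average(samples)

# Sequential merge: the running result is averaged with each new sample,
# so the effective weights become 1/8, 1/8, 1/4 and 1/2.
sequential = reduce(lambda acc, sample: average([acc, sample]), samples)

print(batch)       # 2.5
print(sequential)  # 3.125
```

In this toy model the last sample alone accounts for half of the sequential result, which is one plausible reason a sequentially merged profile could represent the speaker less evenly.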

The code samples below demonstrate the wrong and right ways of merging voice templates into a union voice profile.

INCORRECT: sequential merging

...

auto template1 = voiceTemplateFactory.createVoiceTemplate("wav_file1.wav");
template1 = voiceTemplateFactory.mergeVoiceTemplates({ template1, voiceTemplateFactory.createVoiceTemplate("wav_file2.wav") });
template1 = voiceTemplateFactory.mergeVoiceTemplates({ template1, voiceTemplateFactory.createVoiceTemplate("wav_file3.wav") });
template1 = voiceTemplateFactory.mergeVoiceTemplates({ template1, voiceTemplateFactory.createVoiceTemplate("wav_file4.wav") });

// Now template1 is considered a complete voice profile

...
...

template1 = voice_template_factory.create_voice_template_from_file("wav_file1.wav")
template1 = voice_template_factory.merge_voice_templates([template1, voice_template_factory.create_voice_template_from_file("wav_file2.wav")])
template1 = voice_template_factory.merge_voice_templates([template1, voice_template_factory.create_voice_template_from_file("wav_file3.wav")])
template1 = voice_template_factory.merge_voice_templates([template1, voice_template_factory.create_voice_template_from_file("wav_file4.wav")])

# Now template1 is considered a complete voice profile

...
...

VoiceTemplate template1 = voiceTemplateFactory.createVoiceTemplate("wav_file1.wav");
template1 = voiceTemplateFactory.mergeVoiceTemplates({ template1, voiceTemplateFactory.createVoiceTemplate("wav_file2.wav") });
template1 = voiceTemplateFactory.mergeVoiceTemplates({ template1, voiceTemplateFactory.createVoiceTemplate("wav_file3.wav") });
template1 = voiceTemplateFactory.mergeVoiceTemplates({ template1, voiceTemplateFactory.createVoiceTemplate("wav_file4.wav") });

// Now template1 is considered a complete voice profile

...

CORRECT: batch merging

...

auto template1 = voiceTemplateFactory.createVoiceTemplate("wav_file1.wav");
auto template2 = voiceTemplateFactory.createVoiceTemplate("wav_file2.wav");
auto template3 = voiceTemplateFactory.createVoiceTemplate("wav_file3.wav");
auto template4 = voiceTemplateFactory.createVoiceTemplate("wav_file4.wav");

auto resTemplate = voiceTemplateFactory.mergeVoiceTemplates({ template1, template2, template3, template4 });

// Now resTemplate is considered a complete voice profile

...
...

template1 = voice_template_factory.create_voice_template_from_file("wav_file1.wav")
template2 = voice_template_factory.create_voice_template_from_file("wav_file2.wav")
template3 = voice_template_factory.create_voice_template_from_file("wav_file3.wav")
template4 = voice_template_factory.create_voice_template_from_file("wav_file4.wav")

res_template = voice_template_factory.merge_voice_templates([template1, template2, template3, template4])

# Now res_template is considered a complete voice profile

...
...

VoiceTemplate template1 = voiceTemplateFactory.createVoiceTemplate("wav_file1.wav");
VoiceTemplate template2 = voiceTemplateFactory.createVoiceTemplate("wav_file2.wav");
VoiceTemplate template3 = voiceTemplateFactory.createVoiceTemplate("wav_file3.wav");
VoiceTemplate template4 = voiceTemplateFactory.createVoiceTemplate("wav_file4.wav");

VoiceTemplate resTemplate = voiceTemplateFactory.mergeVoiceTemplates({ template1, template2, template3, template4 });

// Now resTemplate is considered a complete voice profile

...
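When voice samples arrive over time, one way to follow the batch-merging recommendation is to keep the individual per-sample templates and merge all of them in a single call whenever a profile is needed. The sketch below is a minimal illustration in Python: VoiceProfileBuilder and its methods are hypothetical names, not part of the SDK, and merge_fn is assumed to behave like voice_template_factory.merge_voice_templates from the samples above.

```python
class VoiceProfileBuilder:
    """Hypothetical helper: stores per-sample templates and batch-merges them."""

    def __init__(self, merge_fn):
        # merge_fn is assumed to accept a list of templates, like
        # voice_template_factory.merge_voice_templates in the samples above.
        self._merge_fn = merge_fn
        self._templates = []

    def add(self, template):
        """Store a per-sample template; no merging happens at this point."""
        self._templates.append(template)

    def build(self):
        """Merge every stored template in one batch call."""
        if not self._templates:
            raise ValueError("no voice templates to merge")
        if len(self._templates) == 1:
            # Nothing to merge yet; return the single template unchanged.
            return self._templates[0]
        return self._merge_fn(list(self._templates))
```

With the SDK, the builder would presumably be created as VoiceProfileBuilder(voice_template_factory.merge_voice_templates). The intermediate result of build() is never fed back into a later merge, so each profile is always produced from the original per-sample templates.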

Can I control CPU usage?

Many IDVoice CC SDK components are parallelized at different levels. Out of the box, the SDK automatically selects the optimal number of threads to use (usually so as to utilize almost all CPU resources). However, it is now quite common to build multi-threaded applications in which parallelism is implemented at the application level rather than by internal libraries. In that case the two layers of multi-threading can conflict in terms of performance, which could slow down your application.

Some real-life examples of this kind of situation:

  • mobile applications, where all the computationally expensive work is usually performed asynchronously
  • server software, where each client is served by a separate thread

In these cases it is useful to tune SDK-level parallelism in order to achieve better latency or throughput. For this purpose IDVoice CC SDK provides a setNumThreads function, which allows the developer to set the desired number of threads available to IDVoice CC SDK. The other option is the VOICESDK_NUM_THREADS environment variable, which provides the same functionality.

Setting the number of threads at runtime:

#include <voicesdk/core/settings.h>

voicesdk::setNumThreads(6);

import net.idrnd.voicesdk.core.Settings;

Settings.setNumThreads(6);

from voicesdk.core import set_num_threads

set_num_threads(6)
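The environment variable route can be sketched in a similar way. The example below is only an illustration in Python: sdk_thread_budget is a hypothetical helper, and dividing the cores evenly among application workers is an assumed heuristic rather than an SDK recommendation. It is also assumed that the variable has to be set before the SDK is loaded and before any engine is initialized.

```python
import os


def sdk_thread_budget(app_workers, total_cores=None):
    """Hypothetical heuristic: split CPU cores evenly among application workers."""
    if app_workers < 1:
        raise ValueError("app_workers must be >= 1")
    if total_cores is None:
        # os.cpu_count() may return None, so fall back to a single core.
        total_cores = os.cpu_count() or 1
    # Always leave at least one thread for the SDK.
    return max(1, total_cores // app_workers)


# Example: a server running 4 worker threads on an 8-core machine.
num_threads = sdk_thread_budget(app_workers=4, total_cores=8)

# Assumption: set the variable before importing the SDK and before
# initializing any engine, so that the setting is picked up.
os.environ["VOICESDK_NUM_THREADS"] = str(num_threads)
```

The right value depends on the workload, so it is worth measuring latency and throughput with a few different settings before settling on one.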

Note

A setNumThreads call takes precedence over the VOICESDK_NUM_THREADS environment variable.

Although setNumThreads can be called successfully at any point at runtime, it should be called before engine initialization. Otherwise, the change won't be applied to that engine instance.

Note

Regardless of the number of threads set, every IDVoice CC SDK class and method is thread-safe.

Can I make a voice template file or serialization smaller?

It is possible to enable voice template compression to get a smaller binary representation of a voice template:

#include <voicesdk/core/settings.h>

voicesdk::setUseVoiceTemplateCompression(true);

voice_template->saveToFile("voice_template.bin");

import net.idrnd.voicesdk.core.Settings;

Settings.setUseVoiceTemplateCompression(true);

voiceTemplate.saveToFile("voice_template.bin");

from voicesdk_cc.core import set_use_voice_template_compression

set_use_voice_template_compression(True)

voice_template.save_to_file("voice_template.bin")

Compression is expected to reduce the size of the voice template binary representation by a factor of approximately 4.
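To check the size reduction on your own data, the two saved files can simply be compared on disk. The helper below is a minimal sketch using only the Python standard library; size_ratio and the file names in the usage note are hypothetical.

```python
import os


def size_ratio(uncompressed_path, compressed_path):
    """Return the uncompressed file size divided by the compressed file size."""
    compressed_size = os.path.getsize(compressed_path)
    if compressed_size == 0:
        raise ValueError("compressed file is empty")
    return os.path.getsize(uncompressed_path) / compressed_size
```

For example, the same voice template could be saved once without compression to voice_template_raw.bin and once with compression enabled to voice_template.bin (assuming the setter also accepts False to turn compression off), after which size_ratio("voice_template_raw.bin", "voice_template.bin") would be expected to return a value close to 4.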