FAQ

Where to get init data folder for IDVoice SDK for Android?

IDVoice SDK for Android is distributed as a single AAR package that contains the precompiled native IDVoice SDK libraries, compiled Java classes, and init data assets. The init data is located in the voicesdk-aar-full-release.aar/assets/ folder. However, you should not extract it yourself: IDVoice SDK provides an auxiliary class called AssetsExtractor, which extracts the required init data assets at application runtime and copies them to the device filesystem so they can be used as regular files. A corresponding Java example is given below. Assets are extracted only on the very first run of the application and are then cached in the filesystem.

Here is a complete list of the subfolders available for extracted assets, stored as AssetsExtractor static variables. Note that the actual folder set is defined by the IDVoice SDK distribution; for example, LIVENESS_INIT_DATA_SUBPATH is not present in a verify-only release:

Variable name                        Class using the data
SPEECH_SUMMARY_INIT_DATA_SUBPATH     SpeechSummaryEngine
SNR_COMPUTER_INIT_DATA_SUBPATH       SNRComputer
LIVENESS_INIT_DATA_SUBPATH           LivenessEngine
VERIFY_INIT_DATA_MIC_V1_SUBPATH      VoiceTemplateFactory, VoiceTemplateMatcher

An AAR package is actually a regular ZIP archive, so it can easily be opened and modified with any archive manager. This makes it possible to reduce the size of an application incorporating IDVoice SDK by removing unnecessary init data from the AAR. For instance, if you are building an application for voice verification only, you can remove all folders from voicesdk-aar-full-release.aar/assets/ except verify_init_data/.
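If you prefer to automate this trimming in a build step rather than use an archive manager, the same result can be achieved with plain java.util.zip. The sketch below is not part of IDVoice SDK; the class name AarTrimmer and the file names are illustrative. It copies a ZIP/AAR, keeping every non-asset entry and only the assets/ subfolders you list:

```java
import java.io.*;
import java.util.zip.*;

public class AarTrimmer {
    /**
     * Copies a ZIP/AAR from src to dst, dropping every entry under assets/
     * whose top-level subfolder is not in keepAssetDirs.
     * All entries outside assets/ are kept as-is.
     */
    public static void trim(File src, File dst, java.util.Set<String> keepAssetDirs) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new FileInputStream(src));
             ZipOutputStream out = new ZipOutputStream(new FileOutputStream(dst))) {
            ZipEntry entry;
            byte[] buf = new byte[8192];
            while ((entry = in.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.startsWith("assets/")) {
                    // Extract the first path component after "assets/"
                    String sub = name.substring("assets/".length());
                    int slash = sub.indexOf('/');
                    String dir = slash >= 0 ? sub.substring(0, slash) : sub;
                    if (!keepAssetDirs.contains(dir)) {
                        continue; // drop unwanted init data
                    }
                }
                // Re-emit the kept entry unchanged
                out.putNextEntry(new ZipEntry(name));
                int n;
                while ((n = in.read(buf)) > 0) {
                    out.write(buf, 0, n);
                }
                out.closeEntry();
            }
        }
    }
}
```

For a verify-only build you would call it with a set containing only "verify_init_data". Remember that the SDK expects the remaining asset layout to be intact, so remove whole subfolders, not individual files.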

The example below illustrates the engine initialization flow for the voice liveness check feature:

package net.app;

import android.content.Context;

import net.idrnd.android.media.AssetsExtractor;
import net.idrnd.voicesdk.liveness.LivenessEngine;

import java.io.File;

public class EngineManager {
    public LivenessEngine livenessEngine;

    private static EngineManager instance;

    private EngineManager() {}

    public static EngineManager getInstance() {
        if (instance == null) {
            instance = new EngineManager();
        }

        return instance;
    }

    public void init(Context context) {
        // 1) If init data was not extracted to external dir, extract it
        // this skips extraction if it was already done for this version
        AssetsExtractor assetsExtractor = new AssetsExtractor(context);
        File assetsDir = assetsExtractor.extractAssets();

        // 1.1) Retrieve init data path in external dir
        String initDataPath = assetsDir.getPath();

        // 2.1) Init LivenessEngine
        String livenessInitDataPath = new File(initDataPath, AssetsExtractor.LIVENESS_INIT_DATA_SUBPATH).getPath();
        livenessEngine = new LivenessEngine(livenessInitDataPath);
    }
}
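
For a verification application the flow is analogous, using VERIFY_INIT_DATA_MIC_V1_SUBPATH from the table above. The sketch below assumes that VoiceTemplateFactory and VoiceTemplateMatcher take an init data path in their constructors, mirroring the LivenessEngine example; check the exact package and constructor signatures against your SDK version:

```java
package net.app;

import android.content.Context;

import net.idrnd.android.media.AssetsExtractor;
import net.idrnd.voicesdk.verify.VoiceTemplateFactory;
import net.idrnd.voicesdk.verify.VoiceTemplateMatcher;

import java.io.File;

public class VerifyEngineManager {
    public VoiceTemplateFactory voiceTemplateFactory;
    public VoiceTemplateMatcher voiceTemplateMatcher;

    public void init(Context context) {
        // Extract init data (a no-op if already extracted for this version)
        AssetsExtractor assetsExtractor = new AssetsExtractor(context);
        File assetsDir = assetsExtractor.extractAssets();

        // Both classes are initialized with the same verify init data path
        String verifyInitDataPath =
            new File(assetsDir, AssetsExtractor.VERIFY_INIT_DATA_MIC_V1_SUBPATH).getPath();
        voiceTemplateFactory = new VoiceTemplateFactory(verifyInitDataPath);
        voiceTemplateMatcher = new VoiceTemplateMatcher(verifyInitDataPath);
    }
}
```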

Can I create voice profile by sequentially merging intermediate voice templates?

It may seem natural to merge voice templates one by one as you obtain more voice samples for a speaker. However, this approach leads to significant accuracy degradation compared to merging all templates at once. We therefore highly recommend creating a union voice template (i.e. a speaker voice profile) only once you have collected all available voice samples.

The code samples below demonstrate the wrong and right ways of merging voice templates into a union voice profile.

INCORRECT: sequential merging

C++:

...

auto template1 = voiceTemplateFactory.createVoiceTemplate("wav_file1.wav");
template1 = voiceTemplateFactory.mergeVoiceTemplates({ template1, voiceTemplateFactory.createVoiceTemplate("wav_file2.wav") });
template1 = voiceTemplateFactory.mergeVoiceTemplates({ template1, voiceTemplateFactory.createVoiceTemplate("wav_file3.wav") });
template1 = voiceTemplateFactory.mergeVoiceTemplates({ template1, voiceTemplateFactory.createVoiceTemplate("wav_file4.wav") });

// Now template1 is considered a complete voice profile

...

Python:

...

template1 = voice_template_factory.create_voice_template_from_file("wav_file1.wav")
template1 = voice_template_factory.merge_voice_templates([template1, voice_template_factory.create_voice_template_from_file("wav_file2.wav")])
template1 = voice_template_factory.merge_voice_templates([template1, voice_template_factory.create_voice_template_from_file("wav_file3.wav")])
template1 = voice_template_factory.merge_voice_templates([template1, voice_template_factory.create_voice_template_from_file("wav_file4.wav")])

# Now template1 is considered a complete voice profile

...

Java:

...

VoiceTemplate template1 = voiceTemplateFactory.createVoiceTemplate("wav_file1.wav");
template1 = voiceTemplateFactory.mergeVoiceTemplates(new VoiceTemplate[] { template1, voiceTemplateFactory.createVoiceTemplate("wav_file2.wav") });
template1 = voiceTemplateFactory.mergeVoiceTemplates(new VoiceTemplate[] { template1, voiceTemplateFactory.createVoiceTemplate("wav_file3.wav") });
template1 = voiceTemplateFactory.mergeVoiceTemplates(new VoiceTemplate[] { template1, voiceTemplateFactory.createVoiceTemplate("wav_file4.wav") });

// Now template1 is considered a complete voice profile

...

CORRECT: batch merging

C++:

...

auto template1 = voiceTemplateFactory.createVoiceTemplate("wav_file1.wav");
auto template2 = voiceTemplateFactory.createVoiceTemplate("wav_file2.wav");
auto template3 = voiceTemplateFactory.createVoiceTemplate("wav_file3.wav");
auto template4 = voiceTemplateFactory.createVoiceTemplate("wav_file4.wav");

auto resTemplate = voiceTemplateFactory.mergeVoiceTemplates({ template1, template2, template3, template4 });

// Now resTemplate is considered a complete voice profile

...

Python:

...

template1 = voice_template_factory.create_voice_template_from_file("wav_file1.wav")
template2 = voice_template_factory.create_voice_template_from_file("wav_file2.wav")
template3 = voice_template_factory.create_voice_template_from_file("wav_file3.wav")
template4 = voice_template_factory.create_voice_template_from_file("wav_file4.wav")

res_template = voice_template_factory.merge_voice_templates([template1, template2, template3, template4])

# Now res_template is considered a complete voice profile

...

Java:

...

VoiceTemplate template1 = voiceTemplateFactory.createVoiceTemplate("wav_file1.wav");
VoiceTemplate template2 = voiceTemplateFactory.createVoiceTemplate("wav_file2.wav");
VoiceTemplate template3 = voiceTemplateFactory.createVoiceTemplate("wav_file3.wav");
VoiceTemplate template4 = voiceTemplateFactory.createVoiceTemplate("wav_file4.wav");

VoiceTemplate resTemplate = voiceTemplateFactory.mergeVoiceTemplates(new VoiceTemplate[] { template1, template2, template3, template4 });

// Now resTemplate is considered a complete voice profile

...

Can I control CPU usage?

Many IDVoice SDK components are parallelized at different levels. The out-of-the-box distribution automatically selects the optimal number of threads to use (usually so as to utilize almost all CPU resources). However, it is now quite common to build multi-threaded applications whose parallelism is implemented not by internal libraries but at the application level. In that case the two multi-threaded environments may conflict in terms of performance and slow down your application.

Some real-life examples of this kind of situation:

  • mobile applications, where all computationally expensive work is usually performed asynchronously
  • server software, where each client is served by a separate thread

In these cases it is useful to tune SDK-level parallelism in order to achieve better latency or throughput. For this purpose IDVoice SDK provides the setNumThreads function, which allows the developer to set the number of threads available to IDVoice SDK. Another option is the VOICESDK_NUM_THREADS environment variable, which provides the same functionality.

Setting the number of threads at runtime:

C++:

#include <voicesdk/core/settings.h>

voicesdk::setNumThreads(6);

Java:

import net.idrnd.voicesdk.core.Settings;

Settings.setNumThreads(6);

Python:

from voicesdk.core import set_num_threads

set_num_threads(6)

Note

A setNumThreads function call takes precedence over the VOICESDK_NUM_THREADS environment variable.

Although setNumThreads can be called at any point during runtime, it should be called before engine initialization. Otherwise, the changes won't be applied to the engine instance.
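
In Java the correct ordering looks like the sketch below, reusing the LivenessEngine initialization shown earlier. The helper class and the thread count of 2 are illustrative, not part of the SDK:

```java
import net.idrnd.voicesdk.core.Settings;
import net.idrnd.voicesdk.liveness.LivenessEngine;

public class ThreadedInit {
    public static LivenessEngine createEngine(String livenessInitDataPath) {
        // 1) Set the thread budget first...
        Settings.setNumThreads(2);
        // 2) ...then create the engine, so the setting takes effect
        return new LivenessEngine(livenessInitDataPath);
    }
}
```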

Note

Regardless of the number-of-threads setting, all IDVoice SDK classes and methods are thread-safe.

Can I make the voice template file or serialized representation smaller?

You can enable voice template compression to get smaller binary representations of voice templates:

C++:

#include <voicesdk/core/settings.h>

voicesdk::setUseVoiceTemplateCompression(true);

voice_template->saveToFile("voice_template.bin");

Java:

import net.idrnd.voicesdk.core.Settings;

Settings.setUseVoiceTemplateCompression(true);

voiceTemplate.saveToFile("voice_template.bin");

Python:

from voicesdk.core import set_use_voice_template_compression

set_use_voice_template_compression(True)

voice_template.save_to_file("voice_template.bin")

The expected reduction in the size of the voice template binary representation is approximately 4x.
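
To check the saving on your own templates, you can compare serialized sizes with compression off and on. This sketch assumes a serialize() method on VoiceTemplate that returns its binary representation and that the compression setting applies to in-memory serialization as well as saveToFile; both are assumptions to verify against your SDK version's API reference:

```java
import net.idrnd.voicesdk.core.Settings;
import net.idrnd.voicesdk.core.common.VoiceTemplate;

public class CompressionCheck {
    public static void report(VoiceTemplate voiceTemplate) {
        // Serialize the same template without and with compression
        Settings.setUseVoiceTemplateCompression(false);
        int plainSize = voiceTemplate.serialize().length;

        Settings.setUseVoiceTemplateCompression(true);
        int compressedSize = voiceTemplate.serialize().length;

        System.out.println("plain: " + plainSize
                + " bytes, compressed: " + compressedSize + " bytes");
    }
}
```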