REST API usage
Available endpoints¶
The REST API provides separate endpoints for the IDVoice CC SDK components, but they all follow the same pattern.
First, the API is designed to be stateless: the API server does not store or cache any voiceprints or evaluation results locally. This avoids the need for additional infrastructure to support a deployment and makes the service suitable for load balancing and horizontal scaling of any size.
Second, all API endpoints are split into *_from_file and *_from_samples variants.
- All *_from_file requests expect a multipart POST request containing a wav_file part with the uploaded WAV file.
- All *_from_samples requests expect an octet-stream binary request with the sample rate passed in the query and the sample data in PCM16 in the body.
- All requests that don't require binary data upload are plain JSON data.
- All API answers are JSON data as well.
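As a rough illustration of the *_from_samples convention, the sketch below sends headerless PCM16 data as an octet-stream body. The helper names samples_url and post_samples, the API_HOST variable, the file name speech.pcm, and in particular the query parameter name sample_rate are assumptions made for this example and are not confirmed by this page; the Swagger UI lists the exact parameter names.

```shell
# Hypothetical helper: build the URL of a *_from_samples endpoint.
# ASSUMPTION: the sample rate query parameter is named "sample_rate".
samples_url() {
  printf 'http://%s/%s?sample_rate=%s\n' "${API_HOST:-localhost:8080}" "$1" "$2"
}

# Hypothetical helper: POST raw PCM16 samples (no WAV header) read from a file.
# $1 = endpoint path, $2 = sample rate in Hz, $3 = file with raw PCM16 data.
post_samples() {
  curl -s -X POST \
    -H 'Content-Type: application/octet-stream' \
    --data-binary "@$3" \
    "$(samples_url "$1" "$2")"
}

# Usage (requires a running server; speech.pcm is a placeholder):
#   post_samples voice_template_factory/create_voice_template_from_samples 16000 speech.pcm
```

The --data-binary option is used instead of --data so that curl sends the sample bytes unmodified.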
Examples of endpoints include:
- Get basic build information: GET /core/get_build_info. The response contains a version string and a list of included components.
- Get license expiration date: GET /core/get_expiration_date.
- Creating a voice template: /voice_template_factory/create_voice_template_from_file, with an optional channel_type query parameter that is either MIC or TEL. Usually you won't need this parameter; it is used for models that are specifically calibrated for cross-channel comparison.
- You can also create a voice template and perform additional checks, such as a spoof check and an audio quality check, within a single request via /voice_template_factory/create_voice_template_checked_from_file.
- Comparing two voice templates: /voice_template_matcher/match_voice_templates. The input is JSON; use the outputs of the previous requests as the template1 and template2 parameters. Note that both voice templates must be created using the same model.
- Getting an anti-spoofing score for an audio file: use /antispoof_engine/is_spoof_file. Some anti-spoofing models have additional per-attack scores, like score_TTS or score_Replay, but the catch-all score parameter is what you need in most cases.
- Getting speech metrics for an audio file: use /speech_summary_engine/get_speech_summary_from_file. The total_length_ms and speech_length_ms parameters are the most valuable in the resulting JSON. There is also a verbose speech_events parameter, which is helpful in case you need to trim the file to only contain speech later.
Other endpoints, like /snr_computer/*, follow a similar convention.
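The file-based endpoints listed above can be called with one small wrapper, sketched below. The helper name post_wav, the API_HOST variable and the file names are hypothetical, and the sketch assumes that /antispoof_engine/is_spoof_file accepts the same wav_file part as the *_from_file endpoints.

```shell
# Hypothetical helper: upload a WAV file as the wav_file multipart part.
# $1 = endpoint path (optionally with a query string), $2 = path to the WAV file.
post_wav() {
  curl -s -X POST --form "wav_file=@$2" "http://${API_HOST:-localhost:8080}/$1"
}

# Usage (requires a running server; file names are placeholders):
#   post_wav 'voice_template_factory/create_voice_template_from_file?channel_type=TEL' call.wav
#   post_wav voice_template_factory/create_voice_template_checked_from_file sample.wav
#   post_wav antispoof_engine/is_spoof_file sample.wav | jq '.score'
#   post_wav speech_summary_engine/get_speech_summary_from_file sample.wav
```

Quoting the endpoint argument keeps the shell from interpreting the ? in the query string.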
REST API Usage¶
The Docker image exposes port 8080, which is used for communicating with the underlying REST service. The service has a /swagger-ui/index.html endpoint where you can find documentation for the endpoints and examples of all operations.
Note that the set of available methods is defined by the set of components present in the delivery. To determine which components are supported, use the /core/get_build_info/ API endpoint.
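To see which components a given delivery includes, the informational GET endpoints can be queried as in the minimal sketch below. The helper name api_get and the API_HOST variable are hypothetical. The exact field names of the responses are not documented on this page, so the output is simply pretty-printed with jq.

```shell
# Hypothetical helper: issue a GET request and print the raw JSON response.
# $1 = endpoint path relative to the server root.
api_get() {
  curl -s "http://${API_HOST:-localhost:8080}/$1"
}

# Usage (requires a running server):
#   api_get core/get_build_info | jq .
#   api_get core/get_expiration_date | jq .
```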
Several basic command-line examples of using various endpoints are given below. Make sure you have the curl, jq and jo utilities installed.
Speaker verification REST API call example:
$ voice_template1=$(curl -s --form wav_file=@file1.wav -X POST localhost:8080/voice_template_factory/create_voice_template_from_file)
$ voice_template2=$(curl -s --form wav_file=@file2.wav -X POST localhost:8080/voice_template_factory/create_voice_template_from_file)
$ curl -H 'Content-Type: application/json' --data "$(jo template1="$voice_template1" template2="$voice_template2")" -X POST localhost:8080/voice_template_matcher/match_voice_templates
{"probability":1.0,"score":0.9999996423721313}
Speech summary REST API call example:
$ # real voice
$ curl --form wav_file=@speech.wav -X POST localhost:8080/speech_summary_engine/get_speech_summary_from_file | jq 'del(.vad_result)'
{
  "background_length": 1.350000023841858,
  "speech_signal_length": 1.7100000381469727,
  "total_length": 3.066499948501587
}
$ # silence
$ curl --form wav_file=@silence.wav -X POST localhost:8080/speech_summary_engine/get_speech_summary_from_file | jq 'del(.vad_result)'
{
  "background_length": 13.109999656677246,
  "speech_signal_length": 0,
  "total_length": 13.134499549865723
}
Error handling¶
The REST and WebSocket APIs reflect exceptions that may be raised by the SDK and wrap them in 500 Internal Server Error responses.
Request contents are validated, and a 400 Bad Request error response is returned if the given request is invalid.
Invalid REST API usage example:
$ curl --form invalid_name=@file.wav -X POST localhost:8080/diarization_engine/get_segmentation_from_file | jq 'del(.timestamp)'
{
  "status": 400,
  "error": "Bad Request",
  "message": "Required request part 'wav_file' is not present",
  "path": "/diarization_engine/get_segmentation_from_file"
}
Invalid WebSocket usage example:
$ timeout 5 wsdump.py ws://localhost:8080/speech_summary_stream --raw --text 300
500
$ docker logs voicesdk-cc-server | grep 300
SpeechSummaryEngine: ResamplePipe: input sample rate 300 is invalid, should be 1000 <= sample rate <= 96000
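In a script, the two REST error classes described in this section can be told apart by the HTTP status code. The sketch below is one possible way to do that, not part of the product. The helper names classify_status and post_wav_checked, the labels they print, and the API_HOST variable are all hypothetical.

```shell
# Map an HTTP status code to the error classes described in this section.
classify_status() {
  case "$1" in
    2??) echo ok ;;
    400) echo invalid_request ;;  # the request failed validation
    500) echo sdk_error ;;        # an exception was raised by the SDK
    *)   echo unexpected ;;
  esac
}

# Hypothetical helper: upload a WAV file, save the response body to a file
# and print the status class.
# $1 = endpoint path, $2 = path to the WAV file, $3 = file for the response body.
post_wav_checked() {
  status=$(curl -s -o "$3" -w '%{http_code}' -X POST --form "wav_file=@$2" \
    "http://${API_HOST:-localhost:8080}/$1")
  classify_status "$status"
}

# Usage (requires a running server; file names are placeholders):
#   post_wav_checked diarization_engine/get_segmentation_from_file file.wav response.json
```

Saving the body separately keeps the JSON error message available for logging while the status class drives the control flow.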