REST API usage

Available endpoints

The REST API provides separate endpoints for each IDVoice CC SDK component, but they all follow the same pattern.

First, the API is designed to be stateless: the API server does not store or cache any voiceprints or evaluation results locally. This avoids the need for additional infrastructure to support the deployment and makes the service suitable for load balancing and horizontal scaling.

Second, the API endpoints that accept audio are split into *_from_file and *_from_samples variants.

  • All *_from_file requests expect a multipart POST request containing a wav_file part with the uploaded WAV file.
  • All *_from_samples requests expect a binary octet-stream request with the sample rate passed in the query string and the sample data in PCM16 format in the body.
  • All requests that do not require a binary data upload carry plain JSON data.
  • All API responses are JSON as well.
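
As a rough sketch of the *_from_samples convention, the helper below posts headerless PCM16 samples as an octet-stream body with the sample rate in the query string. The query parameter name sample_rate, the endpoint name create_voice_template_from_samples and the BASE_URL variable are assumptions made for illustration; check the Swagger UI of your deployment for the actual names.

```shell
# Base address of the REST service (assumed default; override as needed).
BASE_URL="${BASE_URL:-http://localhost:8080}"

# samples_url ENDPOINT SAMPLE_RATE
# Prints the request URL with the sample rate in the query string.
# NOTE: the parameter name "sample_rate" is an assumption.
samples_url() {
  printf '%s%s?sample_rate=%s\n' "$BASE_URL" "$1" "$2"
}

# post_samples ENDPOINT SAMPLE_RATE PCM_FILE
# Sends headerless PCM16 samples as an application/octet-stream body.
post_samples() {
  curl -s -X POST \
    -H 'Content-Type: application/octet-stream' \
    --data-binary "@$3" \
    "$(samples_url "$1" "$2")"
}

# Example (endpoint name inferred from the *_from_samples pattern):
# post_samples /voice_template_factory/create_voice_template_from_samples 16000 speech.pcm
```

Keeping URL construction in its own function makes it easy to inspect the request target before any audio is sent.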

Examples of endpoints include:

  • Get basic build information: GET /core/get_build_info. The response contains a version string and the list of included components.

  • Get license expiration date: GET /core/get_expiration_date.

  • Creating a voice template: /voice_template_factory/create_voice_template_from_file, with an optional channel_type query parameter set to MIC or TEL. Usually you won't need this parameter; it is used for models that are specifically calibrated for cross-channel comparison.

  • You can also create a voice template and perform additional checks, such as a spoof check and an audio quality check, within a single request via /voice_template_factory/create_voice_template_checked_from_file.

  • Comparing two voice templates: /voice_template_matcher/match_voice_templates. The input is JSON: use the outputs of two template creation requests as the template1 and template2 parameters. Note that both voice templates must be created using the same model.

  • Getting an anti-spoofing score for an audio file: use /antispoof_engine/is_spoof_file. Some anti-spoofing models return additional per-attack scores, such as score_TTS or score_Replay, but the catch-all score parameter is what you need in most cases.

  • Getting speech metrics for an audio file: use /speech_summary_engine/get_speech_summary_from_file. The total_length_ms and speech_length_ms parameters are the most valuable ones in the resulting JSON. There is also a verbose speech_events parameter, which is helpful if you later need to trim the file so that it contains only speech.

Other endpoints, such as /snr_computer/*, follow a similar convention.

REST API Usage

The Docker image exposes port 8080, which is used for communicating with the underlying REST service. The service has a /swagger-ui/index.html endpoint where you can find documentation for the endpoints and examples of all operations.

Note that the set of available methods is determined by the set of components present in the delivery. To find out which components are supported, use the /core/get_build_info API endpoint.

Several basic command-line examples of using various endpoints are given below. Make sure you have the curl, jq and jo utilities installed.

Speaker verification REST API call example:

$ voice_template1=$(curl -s --form wav_file=@file1.wav -X POST localhost:8080/voice_template_factory/create_voice_template_from_file)
$ voice_template2=$(curl -s --form wav_file=@file2.wav -X POST localhost:8080/voice_template_factory/create_voice_template_from_file)
$ curl -H 'Content-Type: application/json' --data "$(jo template1="$voice_template1" template2="$voice_template2")" -X POST localhost:8080/voice_template_matcher/match_voice_templates
{"probability":1.0,"score":0.9999996423721313}
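
The returned probability can be turned into an accept/reject decision on the client side. The following is a minimal sketch; the accept_match helper and its 0.5 default threshold are illustrative assumptions, not values recommended by the SDK.

```shell
# accept_match PROBABILITY [THRESHOLD]
# Succeeds (exit status 0) when PROBABILITY >= THRESHOLD.
# The 0.5 default is an arbitrary placeholder, not an SDK recommendation.
accept_match() {
  awk -v p="$1" -v t="${2:-0.5}" 'BEGIN { if (p + 0 >= t + 0) exit 0; exit 1 }'
}

# Example with a saved matcher response (result.json is a hypothetical file):
# probability=$(jq -r '.probability' result.json)
# if accept_match "$probability" 0.5; then echo "same speaker"; else echo "different speaker"; fi
```

The threshold should be chosen to suit the false accept and false reject rates your application can tolerate.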

Speech summary REST API call example:

$ # real voice
$ curl --form wav_file=@speech.wav -X POST localhost:8080/speech_summary_engine/get_speech_summary_from_file | jq 'del(.vad_result)'
{
  "background_length": 1.350000023841858,
  "speech_signal_length": 1.7100000381469727,
  "total_length": 3.066499948501587
}
$ # silence
$ curl --form wav_file=@silence.wav -X POST localhost:8080/speech_summary_engine/get_speech_summary_from_file | jq 'del(.vad_result)'
{
  "background_length": 13.109999656677246,
  "speech_signal_length": 0,
  "total_length": 13.134499549865723
}
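
To decide whether a recording contains enough speech, the two lengths can be combined into a ratio. This is a minimal sketch that assumes the field names seen in the output above; the speech_ratio helper and the summary.json file name are hypothetical and not part of the API.

```shell
# speech_ratio SPEECH_LENGTH TOTAL_LENGTH
# Prints the fraction of the recording that is speech, with two decimals.
# Prints 0.00 when the total length is zero to avoid a division by zero.
speech_ratio() {
  awk -v s="$1" -v t="$2" 'BEGIN {
    if (t + 0 > 0) printf "%.2f\n", s / t
    else print "0.00"
  }'
}

# Example: feed the fields of a saved response into the helper.
# speech_ratio "$(jq -r '.speech_signal_length' summary.json)" \
#              "$(jq -r '.total_length' summary.json)"
```

For the two responses above, this gives 0.56 for the real voice recording and 0.00 for the silent one.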

Error handling

The REST and WebSocket APIs pass on exceptions raised by the SDK, wrapping them in 500 Internal Server Error responses.

Request contents are validated, and a 400 Bad Request error response is returned if the request is invalid.

Invalid REST API usage example:

$ curl --form invalid_name=@file.wav -X POST localhost:8080/diarization_engine/get_segmentation_from_file | jq 'del(.timestamp)'
{
  "status": 400,
  "error": "Bad Request",
  "message": "Required request part 'wav_file' is not present",
  "path": "/diarization_engine/get_segmentation_from_file"
}
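
In a script, the two error classes can be told apart by capturing the HTTP status code alongside the body. The sketch below is one possible way to do this with curl; the classify_status and post_wav helpers and their label strings are hypothetical names introduced for illustration.

```shell
# classify_status HTTP_CODE
# Maps a status code to the error classes described in this section.
classify_status() {
  case "$1" in
    2??) echo "ok" ;;
    400) echo "invalid_request" ;;  # request failed validation
    500) echo "sdk_error" ;;        # exception raised by the SDK
    *)   echo "unexpected" ;;       # e.g. 000 when the server is unreachable
  esac
}

# post_wav ENDPOINT WAV_FILE
# Prints the response body on success; otherwise reports the error on stderr.
post_wav() {
  body_file=$(mktemp)
  status=$(curl -s -o "$body_file" -w '%{http_code}' \
    --form "wav_file=@$2" -X POST "localhost:8080$1")
  kind=$(classify_status "$status")
  if [ "$kind" = "ok" ]; then
    cat "$body_file"; rm -f "$body_file"
  else
    echo "request failed ($status, $kind):" >&2
    cat "$body_file" >&2; rm -f "$body_file"
    return 1
  fi
}

# Example:
# post_wav /speech_summary_engine/get_speech_summary_from_file speech.wav
```

A 400 usually points to a problem in the calling code, while a 500 carries an SDK message that is worth logging in full.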

Invalid WebSocket usage example:

$ timeout 5 wsdump.py ws://localhost:8080/speech_summary_stream --raw --text 300
500
$ docker logs voicesdk-cc-server | grep 300
SpeechSummaryEngine: ResamplePipe: input sample rate 300 is invalid, should be 1000 <= sample rate <= 96000
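
Since the WebSocket client only sees a bare 500 in this case, it can help to validate the sample rate before opening the stream. The check below is a small sketch that uses the range quoted in the log line above; the valid_sample_rate helper is hypothetical, and other engines may accept a different range.

```shell
# valid_sample_rate RATE
# Succeeds when RATE is an integer within 1000..96000, the range quoted
# in the SpeechSummaryEngine log message above.
valid_sample_rate() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # reject empty or non-numeric input
  esac
  [ "$1" -ge 1000 ] && [ "$1" -le 96000 ]
}

# Example:
# if valid_sample_rate 16000; then echo "ok to connect"; fi
```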