Class SpeechEndpointDetector

java.lang.Object
net.idrnd.voicesdk.common.VoiceSdkNativePeer
net.idrnd.voicesdk.media.SpeechEndpointDetector
All Implemented Interfaces:
AutoCloseable

public class SpeechEndpointDetector extends VoiceSdkNativePeer
Provides speech end detection in an audio stream.

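Supports a streaming scenario in which audio data is processed in consecutive buffers. In the intended usage, buffers are passed to addSamples one at a time, isSpeechEnded() is checked after each call, and reset() is called to start detection over.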

End detection is parameterized with two thresholds: minimum speech length and maximum silence length. End detection is triggered once the minimum speech length has been accumulated and an uninterrupted silence longer than the maximum silence length then occurs. This class is stateful and is not thread-safe.
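The two-threshold rule described above can be illustrated with a small state machine. This is only a sketch and not the SDK's implementation. The class name TwoThresholdRuleSketch and its addFrame method are hypothetical, and each frame is assumed to arrive already labeled as speech or silence by some external classifier. It also assumes that speech accumulates across pauses and that the "ended" flag stays set until reset, neither of which is confirmed by this page.

```java
final class TwoThresholdRuleSketch {
    private final int minSpeechMs;
    private final int maxSilenceMs;
    private int speechMs;   // total speech accumulated since the last reset (assumption: cumulative)
    private int silenceMs;  // length of the current uninterrupted silence run
    private boolean ended;  // assumption: latched until reset() is called

    TwoThresholdRuleSketch(int minSpeechMs, int maxSilenceMs) {
        this.minSpeechMs = minSpeechMs;
        this.maxSilenceMs = maxSilenceMs;
    }

    /** Consumes one frame that an external classifier has labeled as speech or silence. */
    void addFrame(boolean isSpeech, int frameMs) {
        if (isSpeech) {
            speechMs += frameMs;
            silenceMs = 0; // speech interrupts the silence run
        } else {
            silenceMs += frameMs;
        }
        // Both conditions must hold: enough speech, then a long enough silence run.
        if (speechMs >= minSpeechMs && silenceMs > maxSilenceMs) {
            ended = true;
        }
    }

    boolean isEnded() {
        return ended;
    }

    /** Clears all accumulated state so detection starts over. */
    void reset() {
        speechMs = 0;
        silenceMs = 0;
        ended = false;
    }

    public static void main(String[] args) {
        TwoThresholdRuleSketch rule = new TwoThresholdRuleSketch(100, 50);
        for (int i = 0; i < 10; i++) rule.addFrame(true, 10);   // 100 ms of speech
        for (int i = 0; i < 6; i++) rule.addFrame(false, 10);   // 60 ms of silence
        System.out.println("ended = " + rule.isEnded());
    }
}
```

Note that silence occurring before enough speech has accumulated never triggers the end on its own, because any later speech frame resets the silence run.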

This class serves as a gateway to the native Voice SDK implementation and allocates resources on the native heap. To release the allocated memory, invoke AutoCloseable.close() when the instance is no longer needed.

Any method that delegates to a native call may throw VoiceSdkEngineException.
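A streaming loop over this kind of API might look like the sketch below. To keep the example self-contained, it runs against a hypothetical stand-in interface, Detector, that mirrors the method names documented on this page, and against a FakeDetector that exists only for demonstration. With the real SDK, a SpeechEndpointDetector instance would take the place of the fake, presumably through a thin adapter. The chunk size and the 16 kHz rate are illustrative assumptions.

```java
import java.util.Arrays;

final class StreamingLoopSketch {

    /** Hypothetical stand-in mirroring the method names documented for SpeechEndpointDetector. */
    interface Detector extends AutoCloseable {
        void addSamples(short[] pcm16Samples);
        boolean isSpeechEnded();
        @Override void close();
    }

    /**
     * Feeds audio in fixed-size chunks and stops at the first chunk after which
     * the detector reports speech end. Returns the number of samples consumed,
     * or -1 if the end was never reported.
     */
    static int feedUntilSpeechEnd(Detector detector, short[] audio, int chunkSize) {
        for (int start = 0; start < audio.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, audio.length);
            detector.addSamples(Arrays.copyOfRange(audio, start, end));
            if (detector.isSpeechEnded()) {
                return end;
            }
        }
        return -1;
    }

    /** Demonstration-only fake: reports end once a fixed number of samples has been seen. */
    static final class FakeDetector implements Detector {
        private final int endAfterSamples;
        private int seen;
        boolean closed;

        FakeDetector(int endAfterSamples) {
            this.endAfterSamples = endAfterSamples;
        }

        @Override public void addSamples(short[] pcm16Samples) { seen += pcm16Samples.length; }
        @Override public boolean isSpeechEnded() { return seen >= endAfterSamples; }
        @Override public void close() { closed = true; }
    }

    public static void main(String[] args) {
        short[] audio = new short[16000]; // one second of silence at an assumed 16 kHz rate
        // try-with-resources guarantees close() runs even if a call throws
        try (FakeDetector detector = new FakeDetector(4000)) {
            int consumed = feedUntilSpeechEnd(detector, audio, 1600);
            System.out.println("Speech end reported after " + consumed + " samples");
        }
    }
}
```

Because the class is stateful and not thread-safe, a loop like this should own its detector instance rather than share it across threads.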

  • Constructor Summary

    Constructors
    Constructor
    Description
    SpeechEndpointDetector(int minSpeechLengthMs, int maxSilenceLengthMs, int sampleRate)
    Initializes speech endpoint detector.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    addSamples(byte[] bytes)
    Adds audio samples for processing in PCM16 format
    void
    addSamples(float[] floatSamples)
    Adds audio samples for processing in normalized float format
    void
    addSamples(short[] pcm16Samples)
    Adds audio samples for processing in PCM16 format
    boolean
    isSpeechEnded()
    Checks if speech end is detected after the previous addSamples(byte[]) call
    void
    reset()
    Resets detector, clearing all the accumulated statistics

    Methods inherited from class net.idrnd.voicesdk.common.VoiceSdkNativePeer

    close, equals, hashCode

    Methods inherited from class java.lang.Object

    getClass, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • SpeechEndpointDetector

      public SpeechEndpointDetector(int minSpeechLengthMs, int maxSilenceLengthMs, int sampleRate)
      Initializes speech endpoint detector.
      Parameters:
      minSpeechLengthMs - the threshold for required accumulated speech duration
      maxSilenceLengthMs - the threshold for the duration of continuous silence that triggers the end detection if the required amount of speech is accumulated
      sampleRate - sample rate of incoming audio data stream
      Throws:
      VoiceSdkEngineException - wraps native exceptions
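Since the thresholds are given in milliseconds while audio arrives as samples, it can help to convert between the two when choosing buffer sizes. The helper below is a plain arithmetic sketch. The class name DurationMath and the values used in main (300 ms, 16000 Hz) are illustrative assumptions and are not SDK defaults.

```java
final class DurationMath {

    /** Number of samples that cover the given duration at the given sample rate. */
    static long samplesForMs(int durationMs, int sampleRate) {
        return (long) durationMs * sampleRate / 1000;
    }

    /** Duration in milliseconds covered by the given number of samples. */
    static long msForSamples(long sampleCount, int sampleRate) {
        return sampleCount * 1000 / sampleRate;
    }

    public static void main(String[] args) {
        int sampleRate = 16000;       // assumed rate of the incoming stream
        int maxSilenceLengthMs = 300; // illustrative threshold value
        System.out.println(maxSilenceLengthMs + " ms = "
                + samplesForMs(maxSilenceLengthMs, sampleRate) + " samples");
    }
}
```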
  • Method Details

    • reset

      public void reset()
      Resets detector, clearing all the accumulated statistics
      Throws:
      VoiceSdkEngineException - wraps native exceptions
    • addSamples

      public void addSamples(byte[] bytes)
      Adds audio samples for processing in PCM16 format
      Parameters:
      bytes - Array of little-endian PCM16 audio bytes
      Throws:
      VoiceSdkEngineException - wraps native exceptions
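If audio is held as 16-bit samples but the byte[] overload is preferred, the samples have to be serialized in little-endian order. One way to do this with the standard library is sketched below. The class and method names are hypothetical helpers and are not part of the SDK.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class Pcm16Bytes {

    /** Serializes 16-bit samples as little-endian bytes, low byte first. */
    static byte[] toLittleEndianBytes(short[] samples) {
        ByteBuffer buffer = ByteBuffer.allocate(samples.length * 2)
                .order(ByteOrder.LITTLE_ENDIAN);
        // The short view inherits the byte order set on the buffer above.
        buffer.asShortBuffer().put(samples);
        return buffer.array();
    }

    public static void main(String[] args) {
        byte[] bytes = toLittleEndianBytes(new short[] {1, -2, 0x1234});
        System.out.println(java.util.Arrays.toString(bytes));
    }
}
```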
    • addSamples

      public void addSamples(short[] pcm16Samples)
      Adds audio samples for processing in PCM16 format
      Parameters:
      pcm16Samples - Array of PCM16 audio samples
      Throws:
      VoiceSdkEngineException - wraps native exceptions
    • addSamples

      public void addSamples(float[] floatSamples)
      Adds audio samples for processing in normalized float format
      Parameters:
      floatSamples - Array of float audio samples (in [-1, 1] range)
      Throws:
      VoiceSdkEngineException - wraps native exceptions
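For audio that starts out as PCM16, a common way to obtain floats in the [-1, 1] range is to divide each sample by 32768. The sketch below uses that convention. Whether the SDK expects exactly this scaling is an assumption, and the class and method names are hypothetical.

```java
final class Pcm16ToFloat {

    /** Scales 16-bit samples into [-1, 1) by dividing by 32768. */
    static float[] toNormalizedFloat(short[] pcm16Samples) {
        float[] out = new float[pcm16Samples.length];
        for (int i = 0; i < pcm16Samples.length; i++) {
            out[i] = pcm16Samples[i] / 32768f;
        }
        return out;
    }

    public static void main(String[] args) {
        float[] floats = toNormalizedFloat(new short[] {0, 16384, Short.MIN_VALUE, Short.MAX_VALUE});
        System.out.println(java.util.Arrays.toString(floats));
    }
}
```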
    • isSpeechEnded

      public boolean isSpeechEnded()
      Checks if speech end is detected after the previous addSamples(byte[]) call
      Returns:
      true if speech end is detected