API proto specification

VoiceDock takes a modular approach. For each module type, the gRPC API is specified in the Protocol Buffers format.

The VoiceDock concept aims to provide a universal, simple, industrial-grade API for building digital assistants, for human interaction with artificial intelligence, and for related areas.

The mission of VoiceDock is to bring new research into production quickly, so that previously written code can quickly switch to a new implementation. It is enough to implement the gRPC API module for the new algorithm.
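
The idea of swapping implementations behind a fixed API can be sketched in Python as follows. This is only an illustration of the principle: the names SpeechToText, OldEngine, NewEngine and transcribe are hypothetical and do not come from the VoiceDock protos, and the real modules communicate over gRPC rather than through in-process classes.

```python
from typing import Iterable, Iterator, Protocol


class SpeechToText(Protocol):
    """Hypothetical interface standing in for a fixed module API."""

    def recognize(self, audio_chunks: Iterable[bytes]) -> Iterator[str]:
        ...


class OldEngine:
    """Placeholder for an existing implementation (not a real recognizer)."""

    def recognize(self, audio_chunks: Iterable[bytes]) -> Iterator[str]:
        for chunk in audio_chunks:
            yield f"old:{len(chunk)}"


class NewEngine:
    """Placeholder for a newer algorithm exposing the same interface."""

    def recognize(self, audio_chunks: Iterable[bytes]) -> Iterator[str]:
        for chunk in audio_chunks:
            yield f"new:{len(chunk)}"


def transcribe(engine: SpeechToText, audio_chunks: Iterable[bytes]) -> str:
    """Caller code depends only on the interface, not on the engine."""
    return " ".join(engine.recognize(audio_chunks))
```

Because transcribe relies only on the shared interface, replacing OldEngine with NewEngine requires no change to the calling code.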

API Overview

STT API - speech to text (ASR, Automatic Speech Recognition).
TTS API - text to speech.
AI Chat API - text interaction with artificial intelligence.

STT API

The API receives an audio stream in PCM format (16-bit signed integers, little-endian) and returns a stream of recognized text tokens.

proto	stt_api.proto
proto version	v1
implementation example	sttwhisper
area of application	voice assistant; recognition of voice messages in instant messengers; next-gen interactive voice response (IVR); next-gen voice answering machine; smart voice recorder; and others
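
As a rough illustration of the input format described above, the following Python sketch packs floating-point samples into 16-bit little-endian PCM bytes and splits them into chunks for streaming. The function names and the 3200-byte default chunk size are assumptions made for this example; the actual request messages are defined in stt_api.proto and are not reproduced here.

```python
import struct
from typing import Iterator, List


def floats_to_pcm16le(samples: List[float]) -> bytes:
    """Convert samples in [-1.0, 1.0] to 16-bit signed little-endian PCM."""
    ints = [max(-32768, min(32767, int(round(s * 32767)))) for s in samples]
    # "<" selects little-endian, "h" is a signed 16-bit integer.
    return struct.pack("<%dh" % len(ints), *ints)


def chunk_pcm(pcm: bytes, chunk_bytes: int = 3200) -> Iterator[bytes]:
    """Yield consecutive chunks; the size must be even so that no
    2-byte sample is split across two chunks."""
    if chunk_bytes <= 0 or chunk_bytes % 2 != 0:
        raise ValueError("chunk_bytes must be a positive even number")
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]
```

Each chunk produced by chunk_pcm could then be placed into whatever audio field the STT request message defines.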

TTS API

The API takes text, a language, and a speaker name, and synthesizes a voice audio stream in PCM format (16-bit little-endian samples).

proto	tts_api.proto
proto version	v1
implementation example	ttspiper
area of application	voice assistant; next-gen interactive voice response (IVR); next-gen voice answering machine; and others
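
To show how the synthesized output might be consumed, the sketch below collects 16-bit little-endian PCM chunks into an in-memory WAV file using the Python standard library. The function name is hypothetical, and the sample rate and channel count are assumptions: this document does not state them, so in practice they must come from the TTS implementation being used.

```python
import io
import wave
from typing import Iterable


def pcm16le_to_wav_bytes(
    chunks: Iterable[bytes],
    sample_rate: int = 22050,  # assumed value, not specified in this document
    channels: int = 1,         # assumed mono output
) -> bytes:
    """Wrap raw 16-bit little-endian PCM chunks in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav.setframerate(sample_rate)
        for chunk in chunks:
            # WAV stores 16-bit PCM as little-endian; on little-endian
            # hosts (the common case) the bytes are written unchanged.
            wav.writeframes(chunk)
    return buf.getvalue()
```

The returned bytes can be written to a .wav file or passed to an audio player for a quick check of the synthesized speech.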

AI Chat API

The API accepts text and returns a stream of generated text tokens.

proto	aichat_api.proto
proto version	v1
implementation example	aichatllama
area of application	voice assistant; next-gen interactive voice response (IVR); next-gen voice answering machine; text summary service; and others
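
A client of a token-streaming API typically processes tokens as they arrive and also keeps the full reply. The Python sketch below shows that pattern over a plain iterable of strings, which stands in for the server stream. The function collect_tokens and its on_token callback are hypothetical names for this example; the real stream type and message fields are defined in aichat_api.proto.

```python
from typing import Callable, Iterable, List, Optional


def collect_tokens(
    tokens: Iterable[str],
    on_token: Optional[Callable[[str], None]] = None,
) -> str:
    """Consume a stream of text tokens, optionally reporting each one
    as it arrives, and return the concatenated reply."""
    parts: List[str] = []
    for token in tokens:
        if on_token is not None:
            on_token(token)  # e.g. print partial output to the user
        parts.append(token)
    return "".join(parts)
```

For example, passing a callback that prints each token without a newline would display the reply incrementally while collect_tokens still returns the complete text at the end.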