API proto specification
VoiceDock takes a modular approach: each module type has its own gRPC API, and the interaction specification for that API is written in Protocol Buffers format.
VoiceDock aims to provide a simple, universal, industrial-grade API for building digital assistants, human interaction with artificial intelligence, and related areas.
The mission of VoiceDock is to bring new research into production quickly. Previously written code can switch to a new algorithm without being rewritten: it is enough to implement the gRPC API of the corresponding module for that algorithm.
API Overview
- STT API - speech to text (ASR, Automatic Speech Recognition).
- TTS API - text to speech.
- AI Chat API - text interaction with artificial intelligence.
STT API
The API receives an audio stream in PCM format (16-bit signed integers, little-endian) and returns a stream of recognized text tokens.
proto | stt_api.proto |
---|---|
proto version | v1 |
implementation example | sttwhisper |
area of application | |
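
The PCM input format described for the STT API can be sketched as follows. This is a minimal illustration, not VoiceDock client code: the helper names `float_to_pcm16le` and `chunk_pcm`, the 16 kHz sample rate, and the 100 ms frame size are all assumptions made for the example. The actual request messages are defined in `stt_api.proto`.

```python
import math
import struct
from typing import Iterator, List


def float_to_pcm16le(samples: List[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to 16-bit signed little-endian PCM."""
    ints = [max(-32768, min(32767, int(round(s * 32767)))) for s in samples]
    # "<" forces little-endian byte order, "h" is a signed 16-bit integer.
    return struct.pack("<%dh" % len(ints), *ints)


def chunk_pcm(pcm: bytes, frame_samples: int) -> Iterator[bytes]:
    """Split a PCM buffer into frames of frame_samples samples (2 bytes each)."""
    step = frame_samples * 2
    for offset in range(0, len(pcm), step):
        yield pcm[offset:offset + step]


# Hypothetical input: one second of a 440 Hz tone at an assumed 16 kHz rate.
SAMPLE_RATE = 16000
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]
pcm = float_to_pcm16le(tone)
frames = list(chunk_pcm(pcm, frame_samples=1600))  # 100 ms per frame
```

Each element of `frames` is a byte string that a client could place into a streaming request. How the bytes are wrapped in a message depends on the proto definition.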
TTS API
The API takes text, a language, and a speaker name, and synthesizes a voice audio stream in PCM format (16-bit, little-endian).
proto | tts_api.proto |
---|---|
proto version | v1 |
implementation example | ttspiper |
area of application | |
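
On the receiving side, the streamed 16-bit PCM chunks can be gathered into a playable WAV file. The sketch below uses only the Python standard library. The function name `pcm_chunks_to_wav`, the mono channel layout, and the default sample rate are assumptions for illustration, and the list of chunks stands in for whatever stream a real client of `tts_api.proto` would iterate over.

```python
import io
import wave
from typing import Iterable


def pcm_chunks_to_wav(chunks: Iterable[bytes],
                      sample_rate: int = 22050,
                      channels: int = 1) -> bytes:
    """Wrap a stream of 16-bit PCM chunks in a WAV container and return its bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples are 2 bytes wide
        wav.setframerate(sample_rate)
        for chunk in chunks:
            wav.writeframes(chunk)
    return buf.getvalue()


# Hypothetical chunks standing in for the audio stream returned by the server.
wav_bytes = pcm_chunks_to_wav([b"\x00\x00\xff\x7f", b"\x01\x80"], sample_rate=16000)
```

The sample rate must match the rate the TTS implementation actually produces, otherwise playback will be too fast or too slow.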
AI Chat API
The API accepts text and returns a stream of generated text tokens.
proto | aichat_api.proto |
---|---|
proto version | v1 |
implementation example | aichatllama |
area of application | |
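
Because the AI Chat API returns tokens as a stream, a client can display text as it arrives and still keep the full reply. The following is a rough sketch of that consumption pattern. `fake_token_stream` is a hypothetical stand-in for the server's response stream, and `collect_tokens` is an illustrative helper, not part of `aichat_api.proto`.

```python
import re
from typing import Callable, Iterable, Iterator, List, Optional


def fake_token_stream(text: str) -> Iterator[str]:
    """Hypothetical stand-in for the stream of generated text tokens."""
    # Each token is a run of non-space characters plus any trailing whitespace.
    for token in re.findall(r"\S+\s*", text):
        yield token


def collect_tokens(tokens: Iterable[str],
                   on_token: Optional[Callable[[str], None]] = None) -> str:
    """Consume a token stream, optionally reporting each token as it arrives."""
    parts: List[str] = []
    for token in tokens:
        if on_token is not None:
            on_token(token)  # e.g. print to a terminal or update a UI
        parts.append(token)
    return "".join(parts)


seen: List[str] = []
reply = collect_tokens(fake_token_stream("Hello from VoiceDock"), on_token=seen.append)
```

With a real gRPC client, the generator would be replaced by the response iterator of the streaming call, and the rest of the loop would stay the same.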