Overview of Speech Dispatcher
=============================
Key features:
-------------
* Common interface to different Text To Speech (TTS) engines
* Handling concurrent synthesis requests — requests may come asynchronously
from multiple sources within an application and/or from different applications
* Subsequent serialization, resolution of conflicts and priorities of incoming
requests
* Context switching — state is maintained for each client connection
independently, even for connections from within one application
* High-level client interfaces for popular programming languages
* Common sound output handling — audio playback is handled by Speech
Dispatcher rather than the TTS engine, since most engines have limited sound
output capabilities
What a high-level GUI library is to graphics, Speech Dispatcher is to speech
synthesis. The application needs neither to talk to the devices directly nor to
handle concurrent access, sound output and the other tricky aspects of the
speech subsystem.
Supported TTS engines:
----------------------
* Baratinoo (Voxygen)
* Cicero
* DECtalk Software (through a generic driver)
* Epos (through a generic driver)
* eSpeak
* eSpeak+MBROLA (through a generic driver)
* eSpeak NG
* eSpeak NG+MBROLA
* eSpeak NG+MBROLA (through a generic driver)
* Festival
* Flite
* IBM TTS
* Ivona
* Kali TTS
* llia_phon (through a generic driver)
* MaryTTS (through a generic driver)
* Mimic3 (through a generic driver)
* Multispeech (driver is distributed together with the TTS engine)
* Open JTalk
* Pico (SVOX)
* RHVoice (driver is distributed together with the TTS engine)
* Swift (Cepstral) (through a generic driver)
* Voxin
Supported sound output subsystems:
----------------------------------
* ALSA
* Libao
* NAS
* OSS
* PipeWire
* PulseAudio
The architecture is based on a client/server model. The clients are all the
applications in the system that want to produce speech (typically assistive
technologies). The basic means of client communication with Speech Dispatcher
is a Unix socket or TCP connection using the Speech Synthesis Interface
Protocol (SSIP; see the SSIP documentation for more information). High-level
client libraries for many popular programming languages implement this protocol
to make its use as simple as possible.
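To give a feel for the protocol, the sketch below shows how the text of a
message is framed for SSIP's SPEAK command: lines use CRLF endings, a line
beginning with a dot is escaped by doubling the dot (as in SMTP), and the
message is terminated by a line containing a single dot. The helper name is
ours, not part of any library; consult the SSIP documentation for the
authoritative framing rules.

```python
def frame_ssip_message(text: str) -> bytes:
    """Frame message text for SSIP's SPEAK command.

    CRLF line endings, leading dots doubled (so a lone '.' line inside
    the text cannot terminate the message early), and a final line
    containing a single dot to mark the end of the message.
    """
    framed = []
    for line in text.split("\n"):
        if line.startswith("."):
            line = "." + line  # dot-stuffing, as in SMTP
        framed.append(line)
    return ("\r\n".join(framed) + "\r\n.\r\n").encode("utf-8")

# A client would send "SPEAK\r\n", wait for the server's
# "230 OK RECEIVING DATA" reply, then send the framed text:
print(frame_ssip_message("Hello world"))
```

In practice the client libraries listed below handle this framing for you; the
sketch only illustrates why a message consisting of a single "." still speaks
correctly.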
Supported client interfaces:
----------------------------
* C/C++ API
* Python 3 API
* Java API
* Emacs Lisp API
* Common Lisp API
* Guile API
* Simple command line client
A Go API is also available at https://github.com/ilyapashuk/go-speechd
A Rust crate is also available at https://crates.io/crates/ssip-client
(source at https://gitlab.com/lp-accessibility/ssip-client)
Existing assistive technologies known to work with Speech Dispatcher:
* BRLTTY (see https://brltty.app/)
* ChromeVox Classic (https://chromewebstore.google.com/detail/screen-reader/kgejglhpjiefppelpmljglcjbhoiplfn)
* Emacspeak+e2spd (see https://github.com/mglambda/e2spd)
* Emacspeak+emacspeak-speechd (see https://github.com/taniodev/emacspeak-speechd)
* Emacspeak+espd (see https://github.com/bartbunting/espd)
* Fenrir (see https://github.com/chrys87/fenrir)
* KMouth (see https://apps.kde.org/kmouth/)
* Orca (see https://orca.gnome.org/)
* speechd-el (see https://devel.freebsoft.org/speechd-el)
* TDSR (see https://github.com/tspivey/tdsr)
* YASR (see https://yasr.sourceforge.net/)
Voice settings
--------------
The available voices depend on the TTS engines and voices installed.
The voice to use can be set in Speech Dispatcher itself, at the system and user
levels, and from the client application, such as Orca, speechd-el or ChromeVox.
The settings in each application and in Speech Dispatcher are independent of
each other.
The settings in speech-dispatcher at the user level override those
made at the system level.
In Speech Dispatcher, the system settings are recorded in the file
/etc/speech-dispatcher/speechd.conf; they include a default synthesizer, a
default voice type or symbolic name (e.g. MALE1) and a default language.
Each installed voice is in turn associated with a voice type and a language, so
with these default settings a voice matching these characteristics (voice type,
language, synthesizer) will be chosen if one is available.
The default values of these voice parameters can also be set at the system
level and customized at the user level: rate, pitch, pitch range and volume.
It is also possible to make the synthesizer depend on the language used.
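As an illustration, a system-wide /etc/speech-dispatcher/speechd.conf might
contain lines like the following. The option names appear in the default
speechd.conf shipped with Speech Dispatcher; the values here are only
examples, not recommendations.

```
# Default synthesizer and audio output
DefaultModule espeak-ng
AudioOutputMethod "pulse"

# Default voice characteristics
DefaultVoiceType "MALE1"
DefaultLanguage "en"
DefaultRate 0
DefaultVolume 100

# Per-language synthesizer override
LanguageDefaultModule "en" "espeak-ng"
LanguageDefaultModule "fr" "pico"
```

User-level settings, described below, override these system-wide values.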
The user settings are written to the file ~/.config/speech-dispatcher/spd.conf
using the spd-conf application, which can also be used to modify the system
settings. spd-conf lets you set the synthesizer, the language and other voice
parameters, but not directly select a specific voice.
Instead, a specific voice can be chosen from the client application, by
selecting it by name from a proposed list that depends on the chosen
synthesizer.
The voice name can be a first name like 'bob' or 'virginie', a locale code in
the form language_COUNTRY, or a language code followed by a number, for
instance.
The language code associated with each name is listed alongside it in
parentheses, like (fr-FR) for French from France.
Where to look in case of a sound or speech issue
------------------------------------------------
Speech Dispatcher links together all the components that contribute to speaking
a text, so if you get no speech at all, or something is not spoken, or not
spoken the way you expect, the problem can come from Speech Dispatcher itself
or from any of these components (or their absence) and their settings:
- the audio subsystem in use, e.g. alsa or pulseaudio,
- the synthesizer in use, e.g. espeak-ng or pico,
- the client application, like Orca or speechd-el or an underlying software like
at-spi,
- the application that provides the text to be spoken, like Firefox.
How to investigate a specific issue goes far beyond this document, but bear in
mind that any of the listed components can be involved, as can the audio
equipment in use and the way it is connected to the computer.
Copyright (C) 2001-2009 Brailcom, o.p.s.
Copyright (C) 2018-2020, 2022 Samuel Thibault <samuel.thibault@ens-lyon.org>
Copyright (C) 2018 Didier Spaier <didier@slint.fr>
Copyright (C) 2018 Alex ARNAUD <alexarnaud@hypra.fr>
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details (file
COPYING in the root directory).
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.