Overview of Speech Dispatcher
=============================
Key features:
-------------
* Common interface to different Text To Speech (TTS) engines
* Handling concurrent synthesis requests — requests may come asynchronously
from multiple sources within an application and/or from different applications
* Subsequent serialization, resolution of conflicts and priorities of incoming
requests
* Context switching — state is maintained for each client connection
independently, even for connections from within one application
* High-level client interfaces for popular programming languages
* Common sound output handling — audio playback is handled by Speech
Dispatcher rather than the TTS engine, since most engines have limited sound
output capabilities
What a high-level GUI library is to graphics, Speech Dispatcher is to speech
synthesis. The application needs neither to talk to the devices directly nor to
handle concurrent access, sound output and the other tricky aspects of the
speech subsystem.
Supported TTS engines:
----------------------
* Baratinoo (Voxygen)
* Cicero
* DECtalk Software (through a generic driver)
* Epos (through a generic driver)
* eSpeak
* eSpeak+MBROLA (through a generic driver)
* eSpeak NG
* eSpeak NG+MBROLA
* eSpeak NG+MBROLA (through a generic driver)
* Festival
* Flite
* IBM TTS
* Ivona
* Kali TTS
* llia_phon (through a generic driver)
* MaryTTS (through a generic driver)
* Mimic3 (through a generic driver)
* Multispeech (driver is distributed together with the TTS engine)
* Open JTalk
* Pico (SVOX)
* RHVoice (driver is distributed together with the TTS engine)
* Swift (Cepstral) (through a generic driver)
* Voxin
Supported sound output subsystems:
----------------------------------
* ALSA
* Libao
* NAS
* OSS
* PipeWire
* PulseAudio
The architecture is based on a client/server model. The clients are all the
applications in the system that want to produce speech (typically assistive
technologies). The basic means of client communication with Speech Dispatcher
is a Unix socket or TCP connection using the Speech Synthesis Interface
Protocol (SSIP; see the SSIP documentation for more information). High-level
client libraries for many popular programming languages implement this protocol
to make its use as simple as possible.
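To give a feel for the protocol, the sketch below shows how the text of a
message is framed for SSIP's SPEAK command: lines use CRLF endings, a line
beginning with a dot is escaped by doubling the dot (as in SMTP), and the
message is terminated by a line containing a single dot. The helper name is
ours, not part of any library; consult the SSIP documentation for the
authoritative framing rules.

```python
def frame_ssip_message(text: str) -> bytes:
    """Frame message text for SSIP's SPEAK command.

    CRLF line endings, leading dots doubled (so a lone '.' line inside
    the text cannot terminate the message early), and a final line
    containing a single dot to mark the end of the message.
    """
    framed = []
    for line in text.split("\n"):
        if line.startswith("."):
            line = "." + line  # dot-stuffing, as in SMTP
        framed.append(line)
    return ("\r\n".join(framed) + "\r\n.\r\n").encode("utf-8")

# A client would send "SPEAK\r\n", wait for the server's
# "230 OK RECEIVING DATA" reply, then send the framed text:
print(frame_ssip_message("Hello world"))
```

In practice the client libraries listed below handle this framing for you; the
sketch only illustrates why a message consisting of a single "." still speaks
correctly.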
Supported client interfaces:
----------------------------
* C/C++ API
* Python 3 API
* Java API
* Emacs Lisp API
* Common Lisp API
* Guile API
* Simple command line client
A Go API is also available at https://github.com/ilyapashuk/go-speechd
A Rust crate is also available at https://crates.io/crates/ssip-client
(source at https://gitlab.com/lp-accessibility/ssip-client)
Existing assistive technologies known to work with Speech Dispatcher:
* BRLTTY (see https://brltty.app/)
* ChromeVox Classic (https://chromewebstore.google.com/detail/screen-reader/kgejglhpjiefppelpmljglcjbhoiplfn)
* Emacspeak+e2spd (see https://github.com/mglambda/e2spd)
* Emacspeak+emacspeak-speechd (see https://github.com/taniodev/emacspeak-speechd)
* Emacspeak+espd (see https://github.com/bartbunting/espd)
* Fenrir (see https://github.com/chrys87/fenrir)
* KMouth (see https://apps.kde.org/kmouth/)
* Orca (see https://orca.gnome.org/)
* speechd-el (see https://devel.freebsoft.org/speechd-el)
* TDSR (see https://github.com/tspivey/tdsr)
* YASR (see https://yasr.sourceforge.net/)
Voice settings
--------------
The available voices depend on the TTS engines and voices installed.
The voice to use can be set in Speech Dispatcher itself, at the system and user
levels, and from the client application, such as Orca, speechd-el or ChromeVox.
The settings in each application and in Speech Dispatcher are independent of
each other.
The settings in speech-dispatcher at the user level override those
made at the system level.
In Speech Dispatcher, the system settings are recorded in the file
/etc/speech-dispatcher/speechd.conf; they include a default synthesizer, a
default voice type or symbolic name (e.g. MALE1) and a default language.
Each installed voice is in turn associated with a voice type and a language, so
with these default settings a voice matching these characteristics (voice type,
language, synthesizer) will be chosen if one is available.
The default values of these voice parameters can also be set at the system
level and customized at the user level: rate, pitch, pitch range and volume.
It is also possible to make the synthesizer depend on the language used.
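As an illustration, a system-wide /etc/speech-dispatcher/speechd.conf might
contain lines like the following. The option names appear in the default
speechd.conf shipped with Speech Dispatcher; the values here are only
examples, not recommendations.

```
# Default synthesizer and audio output
DefaultModule espeak-ng
AudioOutputMethod "pulse"

# Default voice characteristics
DefaultVoiceType "MALE1"
DefaultLanguage "en"
DefaultRate 0
DefaultVolume 100

# Per-language synthesizer override
LanguageDefaultModule "en" "espeak-ng"
LanguageDefaultModule "fr" "pico"
```

User-level settings, described below, override these system-wide values.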
The user settings are written to the file ~/.config/speech-dispatcher/spd.conf
using the spd-conf application, which can also be used to modify the system
settings. spd-conf lets you set the synthesizer, the language and other voice
parameters, but not directly select a specific voice.
Instead, a specific voice can be chosen from the client application, by
selecting it by name from a proposed list that depends on the chosen
synthesizer.
The voice name can be a first name like 'bob' or 'virginie', a locale code in
the form language_COUNTRY, or a language code followed by a number, for
instance.
The language code associated with each name is listed alongside it in
parentheses, like (fr-FR) for French from France.
Where to look in case of a sound or speech issue
------------------------------------------------
Speech Dispatcher links together all the components that contribute to speaking
a text, so if you get no speech at all, or something is not spoken, or not
spoken the way you expect, the problem can come from Speech Dispatcher itself
or from any of these components (or their absence) and their settings:
- the audio subsystem in use, e.g. alsa or pulseaudio,
- the synthesizer in use, e.g. espeak-ng or pico,
- the client application, like Orca or speechd-el or an underlying software like
at-spi,
- the application that provides the text to be spoken, like Firefox.
How to investigate a specific issue goes far beyond this document, but bear in
mind that any of the listed components can be involved, as can the audio
equipment in use and the way it is connected to the computer.
Copyright (C) 2001-2009 Brailcom, o.p.s.
Copyright (C) 2018-2020, 2022 Samuel Thibault <samuel.thibault@ens-lyon.org>
Copyright (C) 2018 Didier Spaier <didier@slint.fr>
Copyright (C) 2018 Alex ARNAUD <alexarnaud@hypra.fr>
This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation; either version 2 of the License, or (at your option) any later
version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY
WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE. See the GNU General Public License for more details (file
COPYING in the root directory).
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.