File: rule_format.txt (cicero 0.7.2-4)

Phonetic rule definitions
=========================

A phonetic rule is of the form:

	(left context) [[ target ]] (right context) -> (phoneme list)

The engine substitutes the target letter or string with the specified 
phoneme list, according to the first rule that matches.
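As a rough illustration of the rule layout above, the following Python 
sketch splits one rule line into its four parts.  It is not cicero's own 
parser: the regular expression and the name parse_rule are hypothetical, 
and it assumes the phoneme list is whitespace-separated.

```python
import re

# Hypothetical layout: optional left context, the target between [[ and ]],
# optional right context, then "->" and a whitespace-separated phoneme list.
RULE_RE = re.compile(r"^\s*(.*?)\s*\[\[\s*(.+?)\s*\]\]\s*(.*?)\s*->\s*(.*?)\s*$")

def parse_rule(line):
    """Split one rule line into (left, target, right, phonemes)."""
    m = RULE_RE.match(line)
    if m is None:
        raise ValueError(f"not a phonetic rule: {line!r}")
    left, target, right, phonemes = m.groups()
    return left, target, right, phonemes.split()

print(parse_rule("pati [[ en ]] -> a~"))    # ('pati', 'en', '', ['a~'])
print(parse_rule("(ba|com) [[ p ]] t ->"))  # ('(ba|com)', 'p', 't', [])
```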

For example (taken from the French rules):

	[[ eau ]] -> o

In French, the sequence "eau" is always pronounced "o".

	pati [[ en ]] -> a~
	[[ en ]] n -> E

These are further examples with contextual restrictions.  The "en" in 
"patience" should be pronounced "a~", while the "en" in "penne" or 
"ennemi" is pronounced "E".  Context matching is what distinguishes 
these two "en" cases.
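The first-match behaviour described above can be sketched like this.  The 
rule tuples, the function first_match and the way contexts are checked 
(the left context must end where the target starts, the right context 
must start where the target ends) are assumptions made for illustration, 
not cicero's actual code.

```python
import re

# Hypothetical rule list: (left context, target, right context, phonemes),
# tried in order; the first rule whose target and both contexts match wins.
RULES = [
    ("pati", "en", "", ["a~"]),
    ("", "en", "n", ["E"]),
]

def first_match(word, pos, rules):
    """Return the first rule that applies at word[pos:], or None."""
    for left, target, right, phonemes in rules:
        if not word.startswith(target, pos):
            continue
        end = pos + len(target)
        # Left context must end exactly where the target starts.
        if left and re.search(f"(?:{left})$", word[:pos]) is None:
            continue
        # Right context must start exactly where the target ends.
        if right and re.match(right, word[end:]) is None:
            continue
        return left, target, right, phonemes
    return None

print(first_match("patience", 4, RULES))  # ('pati', 'en', '', ['a~'])
print(first_match("penne", 1, RULES))     # ('', 'en', 'n', ['E'])
```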

The left and right contexts are actual regular expressions.  For example, 
you can have:

	[[ eu ]] [bfilnprv] -> @

which means that "eu" immediately followed by one of b, f, i, l, n, p, r 
or v is pronounced "@".
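In Python terms, a right context behaves like a lookahead: it has to 
match right after the target but is not consumed.  A small hypothetical 
check for the "eu" rule (eu_positions is an illustrative name):

```python
import re

# The right context [bfilnprv] becomes a lookahead: it must match right
# after "eu" but is not itself consumed or replaced.
EU_RIGHT_CONTEXT = re.compile(r"eu(?=[bfilnprv])")

def eu_positions(word):
    """Start positions of each "eu" that satisfies the right context."""
    return [m.start() for m in EU_RIGHT_CONTEXT.finditer(word)]

for word in ["neuf", "jeune", "feu"]:
    print(word, eu_positions(word))  # neuf [1], jeune [1], feu []
```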

Or:

	(ba|com) [[ p ]] t ->

means that the "p" in words like "baptiser" or "compter" is silent 
(empty phoneme list).
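Python's re module only accepts fixed-width lookbehind, so a left context 
such as (ba|com) cannot simply be turned into (?<=ba|com).  The sketch 
below instead searches the text before the target with a $-anchored 
pattern, and shows that a rule with an empty phoneme list just emits 
nothing.  The rule list, the pass-through of unmatched letters and the 
name transcribe are all hypothetical.

```python
import re

# Hypothetical rule set; order matters, the first applicable rule wins.
RULES = [
    ("(ba|com)", "p", "t", []),   # silent "p": empty phoneme list
    ("", "p", "", ["p"]),
]

def transcribe(word, rules=RULES):
    """Greedy left-to-right transcription (illustrative only).

    Letters with no matching rule are copied through unchanged, an
    assumption made to keep the example short.
    """
    out, pos = [], 0
    while pos < len(word):
        for left, target, right, phonemes in rules:
            end = pos + len(target)
            if (word.startswith(target, pos)
                    and (not left or re.search(f"(?:{left})$", word[:pos]))
                    and (not right or re.match(right, word[end:]))):
                out.extend(phonemes)   # may add nothing at all
                pos = end
                break
        else:
            out.append(word[pos])
            pos += 1
    return out

print(transcribe("compter"))  # ['c', 'o', 'm', 't', 'e', 'r']
print(transcribe("rapt"))     # ['r', 'a', 'p', 't']
```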

The list of available phonemes depends on the voice database used with 
mbrola.  Please see the documentation for the given database you wish to 
use.

If you're not already familiar with regular expressions, it is strongly 
recommended that you learn about them before reading any further.  
Documentation on regular expressions is available from many sources; 
here are just a few of them, listed in increasing order of relevance:

	1) man 7 regex

	2) info regex

	3) man perlretut

	4) http://www.python.org/doc/2.3/lib/re-syntax.html

	5) http://www.amk.ca/python/howto/regex/

The Python Regular Expression HOWTO (number 5 above) is a must, and the 
Python Regular Expression Syntax (number 4 above) is the definitive 
reference, since the phonetic rule matching is entirely based on Python's 
regular expression support.


Class substitutions
===================

Since everything is converted to lowercase before rules are applied, we 
use uppercase letters to define handy "classes", which are in fact macros 
standing in for long and/or less obvious regular expressions.  For 
example:

	CLASS V [aeiouyàâéèêëîïôöùûü]

so whenever V is used in a match description, it is replaced by 
[aeiouyàâéèêëîïôöùûü], which is a more convenient way to specify any 
French vowel.

	CLASS C [bcçdfghjklmnñpqrstvwxz]

is for consonants, and then:

	CLASS L (?:V|C)

for any letter.
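Class substitution can be sketched as a recursive textual replacement, 
since L is itself defined in terms of V and C.  The function 
expand_classes, and its rule that a class name is any single uppercase 
letter not preceded by a backslash, are assumptions rather than a 
description of how cicero does it.

```python
import re

# Class table taken from the examples in this section.
CLASSES = {
    "V": "[aeiouyàâéèêëîïôöùûü]",
    "C": "[bcçdfghjklmnñpqrstvwxz]",
    "L": "(?:V|C)",
}

def expand_classes(pattern, classes=CLASSES):
    """Replace every uppercase class name with its (expanded) definition.

    Assumption: a class name is a single uppercase ASCII letter that is not
    preceded by a backslash, so escapes such as \\S or \\W are left alone.
    """
    def substitute(m):
        name = m.group(0)
        if name not in classes:
            return name
        # Definitions may refer to other classes (L uses V and C).
        return expand_classes(classes[name], classes)
    return re.sub(r"(?<!\\)[A-Z]", substitute, pattern)

print(expand_classes("L"))
# (?:[aeiouyàâéèêëîïôöùûü]|[bcçdfghjklmnñpqrstvwxz])
```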

Then you can use those in context rules:

	(V|CCan) [[ s ]] V -> z

to match, for example, "baiser" or "transition".
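To see this rule at work, here is a hypothetical check with V and C 
already expanded by hand.  The name s_is_voiced is illustrative, and the 
context checks follow the same assumed semantics: the left context ends 
right before the "s" and the right context starts right after it.

```python
import re

# V and C written out by hand (their class definitions), so the rule
# (V|CCan) [[ s ]] V -> z becomes two ordinary regular expressions.
V = "[aeiouyàâéèêëîïôöùûü]"
C = "[bcçdfghjklmnñpqrstvwxz]"
LEFT = f"(?:{V}|{C}{C}an)$"   # must end right before the "s"
RIGHT = V                     # must start right after the "s"

def s_is_voiced(word, pos):
    """True if the "s" at word[pos] satisfies the rule's contexts."""
    return (word[pos] == "s"
            and re.search(LEFT, word[:pos]) is not None
            and re.match(RIGHT, word[pos + 1:]) is not None)

for word in ["baiser", "transition", "penser"]:
    print(word, s_is_voiced(word, word.index("s")))  # True, True, False
```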

And finally, some classes mark punctuation (P), the beginning of a word 
(S) or the end of a word (T):

	CLASS P [\,\.\;\:\!\?]
	CLASS S (?:^|_|P|\')
	CLASS T (?:$|_|P)

Note that _ denotes a space.

So we can now write:

	sp [[ ect ]] s?T -> E

to mean that "ect" preceded by "sp" at the end of a word, including the 
plural form ending in "ects", should be pronounced "E".
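The boundary classes can be exercised the same way.  This sketch uses P 
and T exactly as defined above (S would work similarly in a left 
context) and finds the "ect" occurrences that the sp [[ ect ]] s?T rule 
would accept; ect_positions is a made-up name.

```python
import re

# P and T copied from the class definitions above; "_" stands for a space.
P = r"[\,\.\;\:\!\?]"
T = f"(?:$|_|{P})"
LEFT = "(?:sp)$"   # "sp" must end right where "ect" starts
RIGHT = f"s?{T}"   # optional plural "s", then the end of the word

def ect_positions(text):
    """Start positions of every "ect" that satisfies sp [[ ect ]] s?T."""
    return [m.start() for m in re.finditer("ect", text)
            if re.search(LEFT, text[:m.start()])
            and re.match(RIGHT, text[m.end():])]

print(ect_positions("le_respect_des_aspects."))  # [7, 18]
print(ect_positions("spectacle"))                # []
```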

Note that those classes may not be used in the target match between [[ ]].
You can have a look at the rules.en file for more examples of class usage.


Prefilter definitions
=====================

These are regular expressions too, used to process text before the 
phonetic rules are applied.  They convert everything to spelled-out text: 
numbers, special symbols, abbreviations, etc.  The syntax is quite 
straightforward:

	<regular_expression> -> "replacement string"

See the rules.fr file for examples and/or inspiration.
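A minimal sketch of how such prefilters might be loaded and applied in 
Python.  The two prefilter lines are invented for illustration (they are 
not taken from rules.fr), and applying each one once, in file order, over 
the whole text is an assumption.

```python
import re

# Made-up prefilter lines in the documented syntax; not copied from rules.fr.
LINES = [
    r'\s*&\s* -> " et "',
    r'\b2\b -> "deux"',
]

LINE_RE = re.compile(r'^\s*(.+?)\s*->\s*"(.*)"\s*$')

def load_prefilters(lines):
    """Turn each '<regex> -> "replacement"' line into a compiled pair."""
    filters = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            filters.append((re.compile(m.group(1)), m.group(2)))
    return filters

def prefilter(text, filters):
    """Apply every prefilter once, in file order (an assumption)."""
    for regex, replacement in filters:
        text = regex.sub(replacement, text)
    return text

print(prefilter("2 chats & 2 chiens", load_prefilters(LINES)))
# deux chats et deux chiens
```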


Regression testing
==================

Whenever you add to or modify the rule file, please consider adding 
entries to the regression test file.  This ensures that your changes 
introduce no regressions and that newly handled cases won't be 
accidentally lost by future changes.  The format is simple, with one 
entry per line:

	<text> -> <phoneme list>

where <text> is any text input, possibly including multiple words or 
numbers, and <phoneme list> is the resulting phoneme translation.  See 
the checklist.fr file, the current French regression checklist, for an 
example.
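For tooling around the checklist, an entry can be split back into its 
two halves.  This hypothetical helper assumes the phoneme list is 
whitespace-separated and splits at the last " -> " in case the text 
itself contains an arrow.

```python
def parse_check_line(line):
    """Split '<text> -> <phoneme list>' into (text, [phonemes]).

    Assumption: phonemes are whitespace-separated, and the separator is
    the last " -> " on the line.
    """
    text, sep, phonemes = line.rpartition(" -> ")
    if not sep:
        raise ValueError(f"malformed checklist entry: {line!r}")
    return text.strip(), phonemes.split()

print(parse_check_line("eau -> o"))  # ('eau', ['o'])
```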

To add entries to the regression file, you can use the wphons.py tool, 
which prints the translated phonemes for a given string on its standard 
output.  For example, a quick way to add an entry to the French 
regression file is:

	./wphons.py "Les poules couvent au couvent." >> checklist.fr

Finally, to test everything, simply run the regress.py tool, which checks 
all entries and reports any mismatch.
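Conceptually, a regression check boils down to comparing each expected 
phoneme list with a fresh translation.  The sketch below is not 
regress.py: translate stands in for whatever produces phonemes, and the 
toy translator exists only to show how a mismatch would be reported.

```python
# Minimal sketch of a regression check; this is not regress.py itself.
# `translate` is a hypothetical stand-in for text-to-phoneme conversion.
def find_mismatches(entries, translate):
    """Return (text, expected, got) for every entry that no longer matches."""
    mismatches = []
    for text, expected in entries:
        got = translate(text)
        if got != expected:
            mismatches.append((text, expected, got))
    return mismatches

# Toy translator and entries, for illustration only.
toy = {"eau": ["o"], "eaux": ["o", "z"]}
entries = [("eau", ["o"]), ("eaux", ["o"])]
for text, expected, got in find_mismatches(entries, lambda t: toy.get(t, [])):
    print(f"MISMATCH {text!r}: expected {expected}, got {got}")
```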


Prosodic rule definitions
=========================

Our prosodic processing is extremely simple (there is no grammatical 
analysis of the original text).  It relies on the surrounding punctuation 
and the number of syllables in a word to apply a speed and pitch curve 
pattern to each word.  For example:

	PROSO_SPEED . -30, 10

This means that the word followed by a period (usually the end of a 
sentence) will have its phonemes' durations gradually stretched (i.e. 
slowed down) by up to 30% towards the end of the word.  The first 
phonemes of the next word after the period will be pronounced 10% 
faster, with a gradual return to the default duration.
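One way to picture the speed curve is as per-phoneme duration factors.  
The linear ramp and the mapping of a speed value s to a duration factor 
of 1 - s/100 (so -30 gives 1.30 and 10 gives 0.90) are assumptions for 
illustration; cicero's actual curve may differ, and speed_factors is a 
hypothetical name.

```python
# Assumptions (not taken from cicero's code): the ramp is linear over the
# word's phonemes, and a speed change of s percent scales durations by
# (1 - s/100), so -30 gives up to 1.30 (slower) and 10 gives 0.90 (faster).
def speed_factors(n_phonemes, speed, at_end):
    """Per-phoneme duration factors for one word.

    at_end=True  -> full effect on the last phoneme (word before the mark)
    at_end=False -> full effect on the first phoneme (word after the mark)
    """
    full = 1 - speed / 100
    if n_phonemes == 1:
        return [full]
    factors = []
    for i in range(n_phonemes):
        t = i / (n_phonemes - 1)          # 0.0 at first phoneme, 1.0 at last
        weight = t if at_end else 1 - t   # share of the effect applied here
        factors.append(1 + (full - 1) * weight)
    return factors

print([round(f, 2) for f in speed_factors(4, -30, at_end=True)])
# [1.0, 1.1, 1.2, 1.3]
print([round(f, 2) for f in speed_factors(4, 10, at_end=False)])
# [0.9, 0.93, 0.97, 1.0]
```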

	PROSO_PITCH . [1] {"100 70"}
	PROSO_PITCH . [2] {"100 110", "100 70"}
	PROSO_PITCH . [3] {"0 120", "100 100", "100 70"}
	PROSO_PITCH . [4] {"0 110", "100 120", "100 100", "100 70"}

These are the pitch curves applied to the word preceding a period, 
according to the number of vowels it contains.  Each tuple gives a 
location, expressed as a percentage of the vowel's duration, and a pitch 
factor.  These are passed straight to mbrola (see the mbrola 
documentation for more details).
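The PROSO_PITCH lines above can be read into (location, pitch) pairs with 
a short parser.  The line layout is inferred from the examples only, and 
parse_pitch is a hypothetical name.

```python
import re

# Assumed line layout, inferred from the examples above:
#   PROSO_PITCH <mark> [<vowel count>] {"<location> <pitch>", ...}
PITCH_RE = re.compile(r'^PROSO_PITCH\s+(\S+)\s+\[(\d+)\]\s+\{(.*)\}\s*$')

def parse_pitch(line):
    """Return (mark, vowel_count, [(location_percent, pitch), ...])."""
    m = PITCH_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a PROSO_PITCH line: {line!r}")
    mark, count, body = m.groups()
    points = [tuple(int(x) for x in pair.split())
              for pair in re.findall(r'"([^"]*)"', body)]
    return mark, int(count), points

print(parse_pitch('PROSO_PITCH . [2] {"100 110", "100 70"}'))
# ('.', 2, [(100, 110), (100, 70)])
```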

For example, "Hello." has 2 vowels, so the pitch reaches 110 at the end 
(100%) of the "e", then drops to 70 at the end of the "o", while the word 
also slows down.
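The "Hello." example can be reproduced by handing the i-th tuple of the 
pattern to the i-th vowel.  The phoneme symbols and the vowel set are 
hypothetical (in practice the vowel classes come from the PHO 
statements), and leaving words with more than four vowels without pitch 
points is only an assumption of this sketch.

```python
# Pitch patterns for a word before ".", keyed by vowel count (from above).
PITCH_BEFORE_PERIOD = {
    1: [(100, 70)],
    2: [(100, 110), (100, 70)],
    3: [(0, 120), (100, 100), (100, 70)],
    4: [(0, 110), (100, 120), (100, 100), (100, 70)],
}

def pitch_points(phonemes, vowels, patterns=PITCH_BEFORE_PERIOD):
    """Attach the i-th (location, pitch) tuple to the i-th vowel."""
    n_vowels = sum(p in vowels for p in phonemes)
    pattern = iter(patterns.get(n_vowels, []))
    result = []
    for p in phonemes:
        point = next(pattern, None) if p in vowels else None
        result.append((p, point))
    return result

# Hypothetical phonemes for "Hello."; which symbols are vowels really comes
# from the PHO statements of the rule file.
for phoneme, point in pitch_points(["h", "e", "l", "o"], vowels={"e", "o"}):
    print(phoneme, point)  # h None / e (100, 110) / l None / o (100, 70)
```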

Note again that _ is a special punctuation symbol denoting a space.

The PHO statement assigns each phoneme a class, which the prosodic 
processing uses to determine whether it is a vowel, and a default 
duration in milliseconds that can be stretched or shortened by the 
prosodic rules.  All phonemes used must be listed.
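The requirement that every phoneme be listed lends itself to a simple 
consistency check.  The PHO statement syntax is not reproduced here; 
instead this sketch assumes the table has already been read into a dict 
mapping each phoneme to a (class, default duration) pair, and its class 
names and durations are invented.

```python
# Hypothetical PHO table: phoneme -> (class, default duration in ms).
# Class names and durations are invented for illustration only.
PHO = {
    "o": ("vowel", 100),
    "E": ("vowel", 100),
    "a~": ("vowel", 120),
    "p": ("consonant", 80),
}

def unlisted_phonemes(rule_phoneme_lists, pho=PHO):
    """Phonemes produced by some rule but missing from the PHO table."""
    used = {p for phonemes in rule_phoneme_lists for p in phonemes}
    return sorted(used - set(pho))

def is_vowel(phoneme, pho=PHO):
    """Vowel test used by prosodic processing, per the PHO class."""
    return pho[phoneme][0] == "vowel"

print(unlisted_phonemes([["o"], ["a~"], ["E"], ["@"], []]))  # ['@']
```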