1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185
|
HACKING on Verbiste
Sections of this file:
1. How to add verbs and conjugation templates
2. How the conjugation code works
1. How to add verbs and conjugation templates
In the file data/verbs-fr.xml, some lines are of this form:
<v><i>abouter</i> <t>aim:er</t></v>
<v><i>aboutir</i> <t>fin:ir</t></v>
<v><i>aboyer</i> <t>netto:yer</t></v>
The <i> tag gives the infinitive form and the <t> tag gives the
"conjugation template" that is followed by the verb.
The templates are defined in the file data/conjugation-fr.xml.
In this file, the <template> tag has a name attribute that is
of the form <radical>:<termination>. For example, the template
name "aim:er" means that the non-changing prefix is always "aim".
Some template names start with a colon (e.g., ":être") because the
whole word can change in some tenses (e.g., "je suis").
The <template> tag contains the inflections of the several modes and
tenses of the French language. A tense is a list of all applicable
persons for that tense. Each person is represented by a <p> tag,
which can contain zero or more <i> tags, which give the actual text
of the inflections for that person.
For example, the indicative present tense of the "aim:er" template
is written like this:
<indicative>
<present>
<p><i>e</i></p>
<p><i>es</i></p>
<p><i>e</i></p>
<p><i>ons</i></p>
<p><i>ez</i></p>
<p><i>ent</i></p>
</present>
The six persons listed correspond to the usual pronouns: je, tu,
il, nous, vous, ils. A <p> tag contains no <i> tag when it is
impossible to conjugate the verb for that person and that tense.
A <p> tag can contain more than one <i> tag when multiple variants
are widely accepted. The template "ass:eoir" for example has some
<p> tags that contain three <i> tags.
The content of an <i> tag is appended to the radical part of the
template name to form the complete conjugated form of the verb.
For example, the radical "aim" followed by the inflection "ons"
gives "aimons".
After modifying those two XML files, the command
make check-data
can be given from the project's main directory (the parent of the
'data' directory) to check the validity of the files. This will
call xmllint, an XML validation command that comes with libxml2.
2. How the conjugation code works
Construct an instance of the FrenchVerbDictionary class.
Get the infinitive form of a verb, not a conjugated one (e.g.,
"aimer" but not "aimons").
Convert the verb to lower-case. It it is encoded in
Latin-1 (ISO-8859-1), use the tolowerLatin1() method on the
FrenchVerbDictionary object.
It the verb is in Latin-1, convert it to UTF-8 with the
latin1ToUTF8() method of the FrenchVerbDictionary object.
Get the name of the verb's conjugation template, by calling
the getVerbTemplate() method. It this method returns NULL,
then the given word is not known, or it is not an infinitive
form known to Verbiste.
For regular verbs of the first group like "aimer" or "coder",
the template is named "aim:er". The colon's position
represents the fact that in the complete conjugation,
only the last two letters of the infinitive form will be
replaced by the appropriate ending (je cod[e], nous cod[ons],
qu'il cod[ât], etc). The part that comes before the colon
is invariant.
Note that some template names start with a colon because
the entire word can change in some tenses and persons.
For example, the past participle of the verb "avoir"
(to have) is "eu", as in "j'ai eu du pain" (I have had
some bread).
Get the conjugation template's complete specification from the
template name obtained in the last step. This is done with
the getTemplate() method. If this method returns NULL, then
the given template name is not known to Verbiste. This should
not happen with template names obtained from getVerbTemplate().
Obtain the "radical" part of the given verb with the getRadical()
method.
The radical part of a verb is the prefix that stays
invariant. This method receives the infinitive form
of the given word and the corresponding template name.
If for example the infinitive is "coder" and the template
name is "aim:er", then the radical part is "cod". It will
be concatenated with a series of endings to produce the
whole conjugation.
To produce the whole conjugation of a verb, iterate through
all valid (non composed) modes and tenses.
The following combinations of modes and tenses are valid
in French. The identifiers given here are defined by the
library in the C++ namespace "verbiste".
INFINITIVE_MODE PRESENT_TENSE
INDICATIVE_MODE PRESENT_TENSE
INDICATIVE_MODE IMPERFECT_TENSE
INDICATIVE_MODE FUTURE_TENSE
INDICATIVE_MODE PAST_TENSE
CONDITIONAL_MODE PRESENT_TENSE
SUBJUNCTIVE_MODE PRESENT_TENSE
SUBJUNCTIVE_MODE IMPERFECT_TENSE
IMPERATIVE_MODE PRESENT_TENSE
PARTICIPLE_MODE PRESENT_TENSE
PARTICIPLE_MODE PAST_TENSE
Note that Verbiste does not produce the conjugation for
the composed tenses (composed past [j'ai codé], anterior
future [j'aurai codé], etc). These tenses can be produced
by using the past participle (e.g., "codé") with a simple
tense (here, indicative present and indicative future).
To produce the conjugation for a specific mode-tense combination,
use the generateTense() method.
This method requires the radical part of the original
infinitive, the conjugation template specification,
the *_MODE value, the *_TENSE value, and a reference to
a C++ vector of vectors of strings which will receive
the results.
Use the resulting structure -- for example to display the
conjugation for a certain tense. The strings are in UTF-8.
The utf8ToLatin1() method can be used to convert to ISO-8859-1.
The result is of type vector< vector<string> >. For each
person (in most tenses: je, tu, il, nous, vous, il), there
may be zero, one or more ways to conjugate a verb.
For example, there is only one way to conjugate "coder"
at the first person singular of the indicative present:
je code. But for "payer" (to pay), one can write both
"je paie" and "je paye". For some verbs, there is nothing.
The verb "férir" for example can only be used in the
infinitive and in the past participle.
This is why the results are structures the way they are.
The received vector contains up to six vectors-of-strings.
For the verb "payer" in the indicative present, the results
can be represented this way:
{
{ paie, paye },
{ paies, payes },
{ paie, paye },
{ payons },
{ payez },
{ paient },
}
The sources of the "french-conjugation" command should be studied
as an example of the procedure described here.
$Id: HACKING,v 1.4 2006/08/29 03:00:05 sarrazip Exp $
|