HACKING on Verbiste


Sections of this file:

    1. How to add verbs and conjugation templates
    2. How the conjugation code works


1. How to add verbs and conjugation templates

    In the file data/verbs-fr.xml, some lines are of this form:

        <v><i>abouter</i>               <t>aim:er</t></v>
        <v><i>aboutir</i>               <t>fin:ir</t></v>
        <v><i>aboyer</i>                <t>netto:yer</t></v>

    The <i> tag gives the infinitive form and the <t> tag gives the
    "conjugation template" that is followed by the verb.

    The templates are defined in the file data/conjugation-fr.xml.
    In this file, the <template> tag has a name attribute that is
    of the form <radical>:<termination>.  For example, the template
    name "aim:er" means that the non-changing prefix is always "aim".
    Some template names start with a colon (e.g., ":être") because the
    whole word can change in some tenses (e.g., "je suis").

    The <template> tag contains the inflections of the several modes and
    tenses of the French language.  A tense is a list of all applicable
    persons for that tense.  Each person is represented by a <p> tag,
    which can contain zero or more <i> tags, which give the actual text
    of the inflections for that person.

    For example, the indicative present tense of the "aim:er" template
    is written like this:

        <indicative>
                <present>
                        <p><i>e</i></p>
                        <p><i>es</i></p>
                        <p><i>e</i></p>
                        <p><i>ons</i></p>
                        <p><i>ez</i></p>
                        <p><i>ent</i></p>
                </present>

    The six persons listed correspond to the usual pronouns: je, tu,
    il, nous, vous, ils.  A <p> tag contains no <i> tag when it is
    impossible to conjugate the verb for that person and that tense.
    A <p> tag can contain more than one <i> tag when multiple variants
    are widely accepted.  The template "ass:eoir" for example has some
    <p> tags that contain three <i> tags.

    The content of an <i> tag is appended to the radical part of the
    template name to form the complete conjugated form of the verb.
    For example, the radical "aim" followed by the inflection "ons"
    gives "aimons".
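
    As a minimal sketch of this concatenation (not Verbiste's own
    code), the C++ fragment below models a tense as one list of
    endings per person and appends each ending to a radical.  The
    names Tense, applyEndings() and aimerIndicativePresent() are
    hypothetical and exist only for this illustration.

```cpp
#include <string>
#include <vector>

// One tense: for each person (<p> tag), zero or more endings (<i> tags).
typedef std::vector< std::vector<std::string> > Tense;

// Appends every ending to the radical, person by person.
// A person without endings yields an empty list of forms.
Tense applyEndings(const std::string &radical, const Tense &endings)
{
    Tense result;
    for (const std::vector<std::string> &person : endings) {
        std::vector<std::string> forms;
        for (const std::string &ending : person)
            forms.push_back(radical + ending);
        result.push_back(forms);
    }
    return result;
}

// Endings of the indicative present of the "aim:er" template,
// copied by hand from the XML excerpt above.
Tense aimerIndicativePresent()
{
    return { {"e"}, {"es"}, {"e"}, {"ons"}, {"ez"}, {"ent"} };
}
```

    With these definitions, applyEndings("aim", aimerIndicativePresent())
    gives six one-element lists, from "aime" to "aiment".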

    After modifying those two XML files, the command

        make check-data

    can be given from the project's main directory (the parent of the
    'data' directory) to check the validity of the files.  This will
    call xmllint, an XML validation command that comes with libxml2.


2. How the conjugation code works

    Construct an instance of the FrenchVerbDictionary class.

    Get the infinitive form of a verb, not a conjugated one (e.g.,
    "aimer" but not "aimons").

    Convert the verb to lower-case.  If it is encoded in
    Latin-1 (ISO-8859-1), use the tolowerLatin1() method on the
    FrenchVerbDictionary object.

    If the verb is in Latin-1, convert it to UTF-8 with the
    latin1ToUTF8() method of the FrenchVerbDictionary object.
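
    To illustrate what these two steps amount to, here is a
    self-contained sketch of Latin-1 lower-casing and Latin-1 to
    UTF-8 conversion.  The functions toLowerLatin1Sketch() and
    latin1ToUtf8Sketch() are hypothetical stand-ins written for this
    file; they are not the library's methods and Verbiste's own
    implementation may differ.

```cpp
#include <string>

// Lower-cases a Latin-1 string.  In ISO-8859-1, the upper-case letters
// are A-Z and 0xC0-0xDE (except 0xD7, the multiplication sign); each
// lower-case counterpart is 0x20 higher.
std::string toLowerLatin1Sketch(const std::string &latin1)
{
    std::string lower;
    for (unsigned char c : latin1) {
        const bool asciiUpper = c >= 'A' && c <= 'Z';
        const bool accentedUpper = c >= 0xC0 && c <= 0xDE && c != 0xD7;
        lower += static_cast<char>((asciiUpper || accentedUpper) ? c + 0x20 : c);
    }
    return lower;
}

// Converts a Latin-1 string to UTF-8.  Bytes below 0x80 are copied
// as is; every other byte becomes a two-byte UTF-8 sequence.
std::string latin1ToUtf8Sketch(const std::string &latin1)
{
    std::string utf8;
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            utf8 += static_cast<char>(c);
        } else {
            utf8 += static_cast<char>(0xC0 | (c >> 6));
            utf8 += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return utf8;
}
```

    The order matters in this sketch: the lower-casing works on
    single Latin-1 bytes, so it has to run before the conversion
    to UTF-8.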

    Get the name of the verb's conjugation template by calling
    the getVerbTemplate() method.  If this method returns NULL,
    then the given word is not known, or it is not an infinitive
    form known to Verbiste.

        For regular verbs of the first group like "aimer" or "coder",
        the template is named "aim:er".  The colon's position
        represents the fact that in the complete conjugation,
        only the last two letters of the infinitive form will be
        replaced by the appropriate ending (je cod[e], nous cod[ons],
        qu'il cod[ât], etc).  The part that comes before the colon
        is invariant.

        Note that some template names start with a colon because
        the entire word can change in some tenses and persons.
        For example, the past participle of the verb "avoir"
        (to have) is "eu", as in "j'ai eu du pain" (I have had
        some bread).
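
    The NULL convention of this lookup step can be sketched with
    a toy table.  The function lookupTemplateName() below is a
    hypothetical stand-in, not the getVerbTemplate() method itself,
    and its three entries are the ones quoted from
    data/verbs-fr.xml in section 1.

```cpp
#include <map>
#include <string>

// Toy verb table: maps an infinitive to its template name.
// Returns a null pointer when the word is not a known infinitive.
const char *lookupTemplateName(const std::string &infinitive)
{
    static const std::map<std::string, const char *> verbs = {
        { "abouter", "aim:er" },
        { "aboutir", "fin:ir" },
        { "aboyer",  "netto:yer" },
    };
    const std::map<std::string, const char *>::const_iterator it =
        verbs.find(infinitive);
    return it == verbs.end() ? nullptr : it->second;
}
```

    A caller is expected to test the returned pointer before using
    it, since a conjugated form such as "aboutissons" is not in the
    table.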

    Get the conjugation template's complete specification from the
    template name obtained in the last step.  This is done with
    the getTemplate() method.  If this method returns NULL, then
    the given template name is not known to Verbiste.  This should
    not happen with template names obtained from getVerbTemplate().

    Obtain the "radical" part of the given verb with the getRadical()
    method.

        The radical part of a verb is the prefix that stays
        invariant.  This method receives the infinitive form
        of the given word and the corresponding template name.
        If for example the infinitive is "coder" and the template
        name is "aim:er", then the radical part is "cod".  It will
        be concatenated with a series of endings to produce the
        whole conjugation.
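
        The idea behind this step can be sketched as follows.
        The function radicalOf() is hypothetical and is not the
        getRadical() method; it assumes that the infinitive ends
        with the termination named in the template, and it counts
        in bytes.

```cpp
#include <stdexcept>
#include <string>

// Removes from the infinitive as many trailing bytes as there are
// after the colon in the template name.
std::string radicalOf(const std::string &infinitive,
                      const std::string &templateName)
{
    const std::string::size_type colon = templateName.find(':');
    if (colon == std::string::npos)
        throw std::invalid_argument("no colon in template name");

    // The termination is everything after the colon, e.g. "er".
    const std::string termination = templateName.substr(colon + 1);

    // Assumption of this sketch: the infinitive ends with the termination.
    if (infinitive.size() < termination.size()
        || infinitive.compare(infinitive.size() - termination.size(),
                              termination.size(), termination) != 0)
        throw std::invalid_argument("infinitive does not match template");

    return infinitive.substr(0, infinitive.size() - termination.size());
}
```

        For instance, radicalOf("coder", "aim:er") gives "cod" and
        radicalOf("aboyer", "netto:yer") gives "abo".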

    To produce the whole conjugation of a verb, iterate through
    all valid simple (non-composed) modes and tenses.

        The following combinations of modes and tenses are valid
        in French.  The identifiers given here are defined by the
        library in the C++ namespace "verbiste".

            INFINITIVE_MODE     PRESENT_TENSE
            INDICATIVE_MODE     PRESENT_TENSE
            INDICATIVE_MODE     IMPERFECT_TENSE
            INDICATIVE_MODE     FUTURE_TENSE
            INDICATIVE_MODE     PAST_TENSE
            CONDITIONAL_MODE    PRESENT_TENSE
            SUBJUNCTIVE_MODE    PRESENT_TENSE
            SUBJUNCTIVE_MODE    IMPERFECT_TENSE
            IMPERATIVE_MODE     PRESENT_TENSE
            PARTICIPLE_MODE     PRESENT_TENSE
            PARTICIPLE_MODE     PAST_TENSE
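
        One way to drive such an iteration is a table of pairs, as
        in the sketch below.  The enumerations are local stand-ins
        placed in a hypothetical "sketch" namespace; they only
        borrow the names listed above, and their numeric values are
        arbitrary and need not match those of the library.

```cpp
#include <utility>
#include <vector>

namespace sketch {

// Local stand-ins for the identifiers of the "verbiste" namespace.
enum Mode {
    INFINITIVE_MODE, INDICATIVE_MODE, CONDITIONAL_MODE,
    SUBJUNCTIVE_MODE, IMPERATIVE_MODE, PARTICIPLE_MODE
};
enum Tense { PRESENT_TENSE, PAST_TENSE, IMPERFECT_TENSE, FUTURE_TENSE };

// The eleven valid mode-tense combinations, in the order of the table.
std::vector< std::pair<Mode, Tense> > validCombinations()
{
    return {
        { INFINITIVE_MODE,  PRESENT_TENSE },
        { INDICATIVE_MODE,  PRESENT_TENSE },
        { INDICATIVE_MODE,  IMPERFECT_TENSE },
        { INDICATIVE_MODE,  FUTURE_TENSE },
        { INDICATIVE_MODE,  PAST_TENSE },
        { CONDITIONAL_MODE, PRESENT_TENSE },
        { SUBJUNCTIVE_MODE, PRESENT_TENSE },
        { SUBJUNCTIVE_MODE, IMPERFECT_TENSE },
        { IMPERATIVE_MODE,  PRESENT_TENSE },
        { PARTICIPLE_MODE,  PRESENT_TENSE },
        { PARTICIPLE_MODE,  PAST_TENSE },
    };
}

}  // namespace sketch
```

        A loop over sketch::validCombinations() then visits each
        pair once, and each pair would be handed to the
        tense-generation step.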

        Note that Verbiste does not produce the conjugation for
        the composed tenses (composed past [j'ai codé], anterior
        future [j'aurai codé], etc.).  These tenses can be produced
        by combining the past participle (e.g., "codé") with a
        simple tense of the auxiliary verb (here, the indicative
        present and the indicative future of "avoir").
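
        A minimal sketch of that combination is given below.  The
        function composeTense() is hypothetical and not part of
        Verbiste.  It takes the forms of the auxiliary as a plain
        list; in a real program they would presumably come from
        conjugating "avoir".  It ignores verbs that take "être" as
        their auxiliary and any agreement of the participle.

```cpp
#include <string>
#include <vector>

// Builds a composed tense by placing each form of the auxiliary
// in front of the past participle, separated by a space.
std::vector<std::string> composeTense(
    const std::vector<std::string> &auxiliaryForms,
    const std::string &pastParticiple)
{
    std::vector<std::string> composed;
    for (const std::string &aux : auxiliaryForms)
        composed.push_back(aux + " " + pastParticiple);
    return composed;
}
```

        Given the indicative present of "avoir" (ai, as, a, avons,
        avez, ont) and the participle "codé", the first element of
        the result is "ai codé".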

    To produce the conjugation for a specific mode-tense combination,
    use the generateTense() method.

        This method requires the radical part of the original
        infinitive, the conjugation template specification,
        the *_MODE value, the *_TENSE value, and a reference to
        a C++ vector of vectors of strings which will receive
        the results.

    Use the resulting structure -- for example to display the
    conjugation for a certain tense.  The strings are in UTF-8.
    The utf8ToLatin1() method can be used to convert to ISO-8859-1.

        The result is of type vector< vector<string> >.  For each
        person (in most tenses: je, tu, il, nous, vous, ils), there
        may be zero, one or more ways to conjugate a verb.

        For example, there is only one way to conjugate "coder"
        in the first person singular of the indicative present:
        "je code".  But for "payer" (to pay), one can write both
        "je paie" and "je paye".  For some verbs, a person or an
        entire tense has no form at all.  The verb "férir", for
        example, can only be used in the infinitive and in the
        past participle.

        This is why the results are structured the way they are.
        The received vector contains up to six vectors of strings,
        one per person.  For the verb "payer" in the indicative
        present, the results can be represented this way:

            {
                { paie, paye },
                { paies, payes },
                { paie, paye },
                { payons },
                { payez },
                { paient },
            }
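
        Walking through such a structure can be sketched as
        follows.  The function describeTense() and the TenseResult
        typedef are hypothetical.  The sketch assumes a six-person
        tense (so it would mislabel the imperative) and it does
        not elide "je" to "j'" before a vowel.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector< std::vector<std::string> > TenseResult;

// Produces one line of text per person: the pronoun, a colon, then
// the accepted forms separated by commas, or a note when none exists.
std::vector<std::string> describeTense(const TenseResult &persons)
{
    static const char *const pronouns[] =
        { "je", "tu", "il", "nous", "vous", "ils" };

    std::vector<std::string> lines;
    for (std::size_t i = 0; i < persons.size() && i < 6; ++i) {
        std::string line = std::string(pronouns[i]) + ":";
        if (persons[i].empty())
            line += " (no form)";
        for (std::size_t j = 0; j < persons[i].size(); ++j)
            line += (j == 0 ? " " : ", ") + persons[i][j];
        lines.push_back(line);
    }
    return lines;
}
```

        For the "payer" structure shown above, the first line
        produced is "je: paie, paye" and the last is "ils: paient".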

    The sources of the "french-conjugation" command should be studied
    as an example of the procedure described here.


$Id: HACKING,v 1.4 2006/08/29 03:00:05 sarrazip Exp $