File: TODO

presage 0.9.1-2.6
Copyright (C) 2008  Matteo Vescovi <matteo.vescovi@yahoo.co.uk>
___________________
The Presage project
~~~~~~~~~~~~~~~~~~~

TODO list
---------


GUI apps:
* qprompter
** integrate into build system
* gprompter
** gray out and re-enable the redo and undo menu items as appropriate
** toolbar icon size
** autocomp max height
** would be nice to have a status bar showing the keystroke savings rate (KSR)

Architectural restructure: 
- n-gram language model database format and database connector

  The current database format stores the string in all n-grams,
  i.e. for "the quick brown" fragment we'll have

  1-gram: <word, count>
  <the, 20>

  2-gram: <word_1, word, count>
  <the, quick, 10>

  3-gram: <word_2, word_1, word, count>
  <the, quick, brown, 1>

  A possibly more time-efficient and space-efficient approach to
  structuring the language model involves having n-gram records refer
  to (n-1)-gram records instead of repeating the word strings, i.e.:

  1-gram: <uid, word, count>
  <1023, the, 20>

  2-gram: <uid, 1-gram, word, count>
  <2204, 1023, quick, 10>

  3-gram: <uid, 2-gram, word, count>
  <3452, 2204, brown, 1>

  To build up the full 3-gram string "the quick brown", the references
  to the 2-gram and 1-gram need to be walked. However, the predictive
  algorithm needs to look at the counts for all k-grams where k is in
  [1, n], so this would not be an additional time cost. The database
  size would reduce as it would not need to store repetitions of the
  words in each n-gram table.
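
  The reference-walking idea can be sketched in memory as follows.
  This is only an illustration under assumptions: NgramRecord,
  NgramStore and NO_PARENT are hypothetical names, a vector index
  stands in for the uid column, and no real database is involved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one row of the proposed tables. `parent` is
// the uid of the (n-1)-gram this record extends; 1-grams use NO_PARENT.
const std::size_t NO_PARENT = static_cast<std::size_t>(-1);

struct NgramRecord {
    std::size_t parent;
    std::string word;
    int count;
};

class NgramStore {
public:
    // Adds a record and returns its uid (here simply its index).
    std::size_t add(std::size_t parent, const std::string& word, int count) {
        assert(parent == NO_PARENT || parent < records.size());
        records.push_back(NgramRecord{parent, word, count});
        return records.size() - 1;
    }

    // Walks the parent references and returns the counts of every k-gram
    // on the chain, ordered from the 1-gram up to the n-gram.
    std::vector<int> counts(std::size_t uid) const {
        std::vector<int> result;
        for (std::size_t i = uid; i != NO_PARENT; i = records.at(i).parent) {
            result.insert(result.begin(), records.at(i).count);
        }
        return result;
    }

    // Rebuilds the full n-gram string by walking the same chain.
    std::string text(std::size_t uid) const {
        std::string result;
        for (std::size_t i = uid; i != NO_PARENT; i = records.at(i).parent) {
            const std::string& word = records.at(i).word;
            result = result.empty() ? word : word + " " + result;
        }
        return result;
    }

private:
    std::vector<NgramRecord> records;
};
```

  Adding "the", then "quick" with "the" as parent, then "brown" with
  that 2-gram as parent stores each word once; a single walk from the
  3-gram yields both the string and the three counts.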

- selector
  should be a class similar to the current PredictorActivator, i.e. a
  class that invokes other classes' methods to perform work.
  The current Selector's functionality should be broken up into Filter
  objects, i.e. an abstract Filter class and implementations of various
  filters (repetition filter, greedy filter, etc.)
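
  One possible shape for this split is sketched below. It is not the
  current Selector code: the Suggestions typedef, the apply() method
  and LimitFilter are hypothetical, and the behaviour given to
  RepetitionFilter (dropping suggestions that were already offered) is
  an assumption about what such a filter would do.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::string> Suggestions;

// Abstract filter: receives candidates, returns the ones that survive.
class Filter {
public:
    virtual ~Filter() {}
    virtual Suggestions apply(const Suggestions& input) const = 0;
};

// Assumed behaviour: drop suggestions that were already offered.
class RepetitionFilter : public Filter {
public:
    explicit RepetitionFilter(const std::set<std::string>& alreadyOffered)
        : offered(alreadyOffered) {}
    Suggestions apply(const Suggestions& input) const override {
        Suggestions kept;
        for (const std::string& s : input) {
            if (offered.count(s) == 0) kept.push_back(s);
        }
        return kept;
    }
private:
    std::set<std::string> offered;
};

// Hypothetical filter: keep at most `maxCount` suggestions.
class LimitFilter : public Filter {
public:
    explicit LimitFilter(std::size_t maxCount) : limit(maxCount) {}
    Suggestions apply(const Suggestions& input) const override {
        Suggestions kept(input);
        if (kept.size() > limit) kept.resize(limit);
        return kept;
    }
private:
    std::size_t limit;
};

// The selector does no filtering itself; it runs its filters in order.
class Selector {
public:
    void add(std::unique_ptr<Filter> filter) {
        assert(filter);
        filters.push_back(std::move(filter));
    }
    Suggestions select(Suggestions candidates) const {
        for (const std::unique_ptr<Filter>& f : filters) {
            candidates = f->apply(candidates);
        }
        return candidates;
    }
private:
    std::vector<std::unique_ptr<Filter>> filters;
};
```

  Filter order matters in this sketch: limiting before removing
  repetitions could return fewer suggestions than requested.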

- combiner
  clean up the mess that is our current Predictor implementation,
  particularly with regard to the Combiner handling and
  implementation. Consider making Combiner a concrete class that
  uses different CombinationStrategy objects to combine
  predictions. The Combiner object would know how to retrieve its
  config values and which Strategy to create and use.
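
  A minimal sketch of that arrangement follows, assuming a prediction
  can be simplified to a word-to-probability map. The Prediction
  typedef, both strategy classes and the "max"/"average" policy
  strings are hypothetical; a plain string stands in for whatever
  config lookup the real Combiner would perform.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for a prediction: word -> probability.
typedef std::map<std::string, double> Prediction;

class CombinationStrategy {
public:
    virtual ~CombinationStrategy() {}
    virtual Prediction combine(const std::vector<Prediction>& all) const = 0;
};

// Averages each word's probability over all predictions; a word missing
// from a prediction contributes 0 for that prediction.
class AverageStrategy : public CombinationStrategy {
public:
    Prediction combine(const std::vector<Prediction>& all) const override {
        Prediction result;
        for (const Prediction& p : all) {
            for (const auto& entry : p) {
                result[entry.first] += entry.second / all.size();
            }
        }
        return result;
    }
};

// Keeps the highest probability seen for each word.
class MaxStrategy : public CombinationStrategy {
public:
    Prediction combine(const std::vector<Prediction>& all) const override {
        Prediction result;
        for (const Prediction& p : all) {
            for (const auto& entry : p) {
                double& best = result[entry.first];
                if (entry.second > best) best = entry.second;
            }
        }
        return result;
    }
};

// Concrete Combiner: chooses its strategy from a configuration value.
// Any value other than "max" falls back to averaging.
class Combiner {
public:
    explicit Combiner(const std::string& policy) {
        if (policy == "max") {
            strategy.reset(new MaxStrategy());
        } else {
            strategy.reset(new AverageStrategy());
        }
    }
    Prediction combine(const std::vector<Prediction>& all) const {
        assert(strategy);
        return strategy->combine(all);
    }
private:
    std::unique_ptr<CombinationStrategy> strategy;
};
```

  With this split, adding a combination policy means adding one
  strategy class and one branch in the Combiner constructor; callers
  only ever see Combiner::combine().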

- registry [DONE]:

  Predictor class functionality should be split up. There should be
  one PluginRegistry class which holds the active plugins and whose
  interface consists of a call that returns an iterator to the
  plugins.

  Predictor would obtain an iterator from PluginRegistry and invoke
  the predict() method on each Plugin pointed to by the iterator.

  A new Learner class could invoke the learn() method on them when
  needed.

  This way, the reverse dependency that implementing learning causes
  between ContextTracker and Predictor would disappear, being
  replaced by a single dependency on Registry and the introduction
  of a new Learner class (name still to be decided).

  The registry should eventually just be a simple wrapper around
  plump.
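
  The dependency structure described in this item can be illustrated
  as below. This is a sketch, not the code that was actually written
  for this item: the Plugin interface is reduced to two string-based
  methods, and RecallPlugin is a toy plugin invented for the example.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Reduced, hypothetical plugin interface.
class Plugin {
public:
    virtual ~Plugin() {}
    virtual std::string predict(const std::string& context) = 0;
    virtual void learn(const std::string& text) = 0;
};

// Holds the active plugins; its interface is just an iterator range.
class PluginRegistry {
public:
    typedef std::vector<std::unique_ptr<Plugin>>::const_iterator Iterator;
    void add(std::unique_ptr<Plugin> plugin) {
        assert(plugin);
        plugins.push_back(std::move(plugin));
    }
    Iterator begin() const { return plugins.begin(); }
    Iterator end() const { return plugins.end(); }
private:
    std::vector<std::unique_ptr<Plugin>> plugins;
};

// Depends only on the registry: calls predict() on every plugin.
class Predictor {
public:
    explicit Predictor(const PluginRegistry& r) : registry(r) {}
    std::vector<std::string> predict(const std::string& context) const {
        std::vector<std::string> results;
        for (PluginRegistry::Iterator it = registry.begin();
             it != registry.end(); ++it) {
            results.push_back((*it)->predict(context));
        }
        return results;
    }
private:
    const PluginRegistry& registry;
};

// Also depends only on the registry: calls learn() on every plugin.
class Learner {
public:
    explicit Learner(const PluginRegistry& r) : registry(r) {}
    void learn(const std::string& text) const {
        for (PluginRegistry::Iterator it = registry.begin();
             it != registry.end(); ++it) {
            (*it)->learn(text);
        }
    }
private:
    const PluginRegistry& registry;
};

// Toy plugin: predicts the last learnt text if it starts with the context.
class RecallPlugin : public Plugin {
public:
    std::string predict(const std::string& context) override {
        bool matches = learnt.compare(0, context.size(), context) == 0;
        return matches ? learnt : std::string();
    }
    void learn(const std::string& text) override { learnt = text; }
private:
    std::string learnt;
};
```

  Neither Predictor nor Learner refers to the other here; both see
  only PluginRegistry, which is the decoupling this item asks for.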


Short term:
* Logger
 - implement logger level inheritance from parent module
 - SqliteDatabaseConnector callback: logging had to be disabled there
   because the callback is a static method; investigate how it can be
   re-enabled
* test performance with different n values in n-gram
* Consider removing the following public methods from Variable
   interface:
   . Variable(const std::vector<std::string>& variable);
   . size_t size() const;
* consider removing src/tools/ngram.* code
* smoothed n-gram predictor
  - is it possible to reduce calls to count() to improve performance?
* rationalise user-specific and system data files location and config files location
  - option to comply with XDG basedir spec for config files and data files
* add proper unicode support
* determine whether to enable dictionary plugin by default
   (dictionary file?)
* rewrite strtoupper and strtolower utility functions to use a pointer
   to function to do the individual char conversion
* add ContextTracker tests for control chars
* put everything inside the presage namespace
* write more integration tests
* write Combiner implementations (various combination strategies)
* add more tests, increase test coverage
* bug: validate string passed to sql_exec query function, unsanitized
   string can cause security problems
* implement activation map predictive plugin

- try to improve reverseTokenizer::progress() accuracy
   currently it uses a delta of 0.7, should try to get it down to 0.3
- Class ContextTracker could initialize Tokenizer's members separator
   and blankspace in a member initializer list. Also, Tokenizer could
   take references to string instead of pointers.
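
The strtoupper and strtolower item above could look roughly like the
sketch below. The std::string signatures are an assumption and may not
match the existing utility functions; CharConverter, convertCase,
toUpperChar and toLowerChar are hypothetical names. The conversion is
byte-wise, so it does not address the unicode item.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Pointer to a function that converts a single character.
typedef char (*CharConverter)(char);

// std::toupper/std::tolower take an int that must be representable as
// unsigned char, hence the casts.
char toUpperChar(char c) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

char toLowerChar(char c) {
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}

// Shared loop: applies `convert` to every character of a copy of `str`.
std::string convertCase(const std::string& str, CharConverter convert) {
    assert(convert != nullptr);
    std::string result(str);
    for (std::string::size_type i = 0; i < result.size(); ++i) {
        result[i] = convert(result[i]);
    }
    return result;
}

std::string strtoupper(const std::string& str) {
    return convertCase(str, toUpperChar);
}

std::string strtolower(const std::string& str) {
    return convertCase(str, toLowerChar);
}
```

Both public functions become one-liners, and the loop exists in one
place only.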

Medium term:
* fix character codes
* integration of the plump framework

Long term:
* use timer alarm to implement threaded predictor activator
* improve exceptions handling
* add more predictive plugins

Longer term:
* add gettext support




VARIOUS NOTES
=============

Plugins and Profiles and Managers
---------------------------------

A problem arises when a profile requires that more than one instance
of a Plugin object is created.

profile: pluginA, pluginB, pluginA

plugins: pluginA, pluginB, pluginA

libraries: libpluginA, libpluginB

We need to be able to distinguish (therefore separately manage) plugin
objects and library objects and profile objects.

libpluginA --->	pluginA
            |
libpluginB -+->	pluginB
            |
	    `->	pluginA

ProfileManager should invoke the construction of Plugin objects and
initiate their option values using a PluginFactory class.

PluginManager should manage the association between a Plugin object
and the module (library) object that contains the Plugin.

Plump, the Pluggable Lightweight Ubiquitous Multithreaded Platform,
was created to solve this and other problems and is going to become
presage's plugin framework implementation.
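
The distinction between plugin objects and library objects can be
illustrated with the sketch below. It is neither plump nor the current
PluginManager: Library and Plugin are reduced to plain structs, no
dynamic loading takes place, and the "lib" + type naming rule is
assumed from the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Stand-in for a loaded module (no real dynamic loading happens here).
struct Library {
    std::string name;
};

// A plugin instance remembers which library it came from.
struct Plugin {
    std::string type;
    const Library* origin;
};

class PluginManager {
public:
    // Creates one Plugin per profile entry, "loading" each library once.
    void load(const std::vector<std::string>& profile) {
        for (const std::string& type : profile) {
            std::unique_ptr<Library>& lib = libraries["lib" + type];
            if (!lib) lib.reset(new Library{"lib" + type});
            plugins.push_back(
                std::unique_ptr<Plugin>(new Plugin{type, lib.get()}));
        }
    }

    std::size_t pluginCount() const { return plugins.size(); }
    std::size_t libraryCount() const { return libraries.size(); }

    // Returns the library that the index-th plugin was created from.
    const Library& libraryOf(std::size_t index) const {
        assert(index < plugins.size());
        return *plugins[index]->origin;
    }

private:
    std::map<std::string, std::unique_ptr<Library>> libraries;
    std::vector<std::unique_ptr<Plugin>> plugins;
};
```

Loading the profile "pluginA, pluginB, pluginA" gives three plugin
objects but only two library objects, with the first and third plugin
sharing libpluginA.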


Plump framework integration
~~~~~~~~~~~~~~~~~~~~~~~~~~~

   The dynamic loading and plugin management system currently implemented
   is going to be scrapped in favour of the more general and portable
   plump framework.

   Plump is a Pluggable Lightweight Ubiquitous Multithreaded Platform
   which makes integration, usage and deployment of a plugin framework
   dead easy.

   Plump integration into presage will require a number of changes to
   presage architecture, affecting Predictor and PluginManager
   classes in particular.

   Predictor and PluginManager classes will delegate much of their
   current functionality to plump. Plump will render the functionality
   provided by PluginManager redundant, as everything that
   PluginManager does will be done by plump. Similarly, part of the
   Predictor class functionality will be replaced by plump too.

   Predictor was intended to be used to execute the plugins in a
   serial or parallel mode. Plump will do that. Predictor will still
   be in charge of collecting the result of each plugin's run and
   combining them into a global prediction.

   PluginManager was in effect a lesser plump and can be considered
   its precursor. Plump has been designed to solve the same problems
   that PluginManager was intended to solve, plus a bit more.


Plugins creation and initialisation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A few things should happen:
   plugin objects should be instantiated based on configuration files,
   that is if the configuration file uses the plugin, then an instance
   of the corresponding class implementing the plugin should be
   instantiated

   plugin objects should be initialised with the options contained in
   the configuration file

The most sensible way to achieve these requirements seems to revolve
around having a plugin factory class which:

   determines which and how many instances of plugin classes need to
   be instantiated from the xml configuration file

   passes a pointer to the root of the xml representation of the options
   specific to that plugin so that the plugin constructor can
   initialise its internal state accordingly

This results in:

   plugins know how to initialise themselves
   the information required for initialisation is passed to the
   plugin's constructor
   the information is passed in xml parse tree format


Points to ponder:
(o) the plugin factory needs to be able to determine which plugin
   class to instantiate a plugin from based on the content of the
   configuration file (xml file). A solution could be that the module
   implementing the plugin class exports a string corresponding to the
   plugin type/name.
(o) it is necessary to be able to associate a plugin object with
   initialisation data. In other words, each plugin class needs to
   have an associated string that describes its kind. Or we can use
   run-time type information.
(o) in light of all this, it is probably worth designing a versioning
   system for plugin classes to be implemented as exported symbols in
   the plugin module.
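
A factory along these lines might look like the sketch below, which
takes the first option from the points above (each plugin type is
identified by a string). Everything in it is an assumption made for
illustration: a string map stands in for the xml parse tree, and
DictionaryPlugin, describe() and the "dictionary" option are invented
for the example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Stand-in for the xml subtree holding one plugin's options.
typedef std::map<std::string, std::string> Options;

class Plugin {
public:
    virtual ~Plugin() {}
    virtual std::string describe() const = 0;
};

// The plugin initialises itself from the options given to its constructor.
class DictionaryPlugin : public Plugin {
public:
    explicit DictionaryPlugin(const Options& options) {
        Options::const_iterator it = options.find("dictionary");
        if (it != options.end()) path = it->second;
    }
    std::string describe() const override { return "dictionary:" + path; }
private:
    std::string path;
};

// Creation function that a plugin module would export with its type name.
std::unique_ptr<Plugin> createDictionaryPlugin(const Options& options) {
    return std::unique_ptr<Plugin>(new DictionaryPlugin(options));
}

class PluginFactory {
public:
    typedef std::unique_ptr<Plugin> (*Creator)(const Options&);

    // Associates a type name, as written in the config file, with a creator.
    void registerType(const std::string& name, Creator creator) {
        assert(creator != nullptr);
        creators[name] = creator;
    }

    // Returns a new instance, or a null pointer if the type is unknown.
    std::unique_ptr<Plugin> create(const std::string& name,
                                   const Options& options) const {
        std::map<std::string, Creator>::const_iterator it = creators.find(name);
        if (it == creators.end()) return std::unique_ptr<Plugin>();
        return it->second(options);
    }

private:
    std::map<std::string, Creator> creators;
};
```

Calling create() once per plugin entry in the configuration yields as
many instances as the profile asks for, each initialised from its own
options.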




STEP to autoconfiscate
~~~~~~~~~~~~~~~~~~~~~~

aclocal
libtoolize --force --ltdl
autoheader
autoconf
automake -a --copy

or source the bootstrap script provided (in svn repo):
. bootstrap


########/

Copyright (C) 2008  Matteo Vescovi <matteo.vescovi@yahoo.co.uk>

Presage is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License along
with this program; if not, write to the Free Software Foundation, Inc.,
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

########\