File: database.html

package info (click to toggle)
bitpim 1.0.7%2Bdfsg1-3
  • links: PTS, VCS
  • area: main
  • in suites: wheezy
  • size: 31,384 kB
  • sloc: python: 267,746; cpp: 2,076; perl: 600; ansic: 409; sh: 226; makefile: 152; sed: 1
file content (295 lines) | stat: -rw-r--r-- 10,644 bytes parent folder | download | duplicates (5)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <title>BitPim and databases</title>
  </head>

  <body>
    <h1>BitPim and databases</h1>

    <p>BitPim currently stores information as a Python dictionary.
    This information is saved in multiple files (one per information
    type) as sourceable Python code.  The data can be easily inspected
    with a plain text editor.

    <p>Users never need to explicitly load and save data (ie there is
    no need for them to manage the transitioning of data between
    temporary storage - RAM - and persistent storage - disk.)

    <h2>Problems and goals</h2>

    <p>BitPim currently has no undo functionality.  Any edits take
    effect immediately and there is no ability to reverse mistakes.

    <p>It is currently not possible to do a sync.  Syncing requires
    being able to examine two snapshots of data and generate a list of
    changes that were made (eg the name "John Smith" was changed to
    "John Smythe")

    <p>BitPim doesn't work correctly when run concurrently as the same
    user.  The user is not prevented from starting a second instance,
    and multiple instances just continue oblivious to each other.  The
    old solution of preventing multiple instances at startup is no
    longer appropriate since users can and do access their machines
    via different means (eg logging in on the console and logging in
    remotely).  Some programs such as Mozilla/Firefox force you to
    have multiple independent profiles, which is <b>very</b> annoying.

    <p>BitPim currently doesn't support multiple information stores.
    This happens if there are multiple users who login as the same
    user at the operating system level.  There is <a
    href="http://bitpim.org/testhelp/howto-multipleusers.htm">some
    advice</a> in the online doc, which amounts to switching the
    preferences behind BitPim's back before starting to switch the
    main data directory.

    <p>Care also needs to be taken over dealing with version issues.
    This means BitPim starting up with an older version of the saved
    data, or the saved data being a newer format than the current
    version.

    <p>BitPim currently holds all data in memory.  This makes memory
    consumption equal the amount of data, and can get very large.

    <h2>Solution - SQLite</h2>

    <p>BitPim will be migrating to use the SQLite database.  SQLite is
    accessed using SQL syntax and is an embedded database - you have
    it as part of your program and do not contact it over the network.
    The Python wrapper is pysqlite and everything is available under
    appropriate licenses and is available on all platforms.

    <p>It has many many other nice properties such as using a single
    file, being safe for usage in multi-threaded and multi-process
    environments, is ACID compliant (Atomic, Consistent, Isolated, and
    Durable), survives power and unexpected program termination, etc.
    There is no access control or other security issues to deal with.
    The only requirement is access to the single file (via normal
    filesystem and process permissions).

    <p>Version 3 of SQLite uses unicode natively for strings, supports
    BLOBS (binary large objects), and allows unlimited field size.

    <p>SQLite is also different than other databases in that the type
    of a field is attached to each value in each record, rather than
    to the column as a whole.  This is very similar to how Python
    works where the type of a value is attached to the value itself,
    not to the name it is given.  (Contrast with C/Java where the type
    is associated with the variable name, not the value).

    <p>Some more reading on SQLite:

      <ul>
      <li><a href="http://www.sqlite.org">Main web site</a>

      <li><a
      href="http://www.sqlite.org/php2004/slides-all.html">Internal
      architecture</a>.  Also includes tips on how to corrupt the
      database :-)

    </ul>

    <h2>Startup/meta information</h2>

    <p>One table will contain meta information.  Primarily this will
    be the version of BitPim to which the the database corresponds.

    <p>On startup, BitPim will inspect the version information.  If it
    is older than the current version of BitPim then a copy of the
    file will be made.

      <blockquote>For example, if the current version of BitPim is 1.2
      and the database says it is for 1.1 then a copy will be made as
      <code>foo-1.1-`date`</code>
    </blockquote>

    <h2>Main method</h2>

    <p>The main data type used in BitPim will be the dict, as is
    currently the case.  dicts will be saved to tables with each dict
    key being a column in the table.  <code>None</code> values in
    Python are mapped to <code>null</code> in SQL.  When reading from
    the table, a dict is produced based on the columns.  Note that
    columns with a null value will not have any key in the returned
    dict.

    <p>When saving to a table, the support code will automatically
    create columns as needed, that default to <code>null</code>.

    <p>Each table will be a journal.  Existing rows will never be
    modified, only new rows added at the end.  A distinct entry is
    identified by a unique identifier and is stored in a column named
    <code>__uid__</code>.  Consequently a table will typically look
    like this:

    <p><blockquote><table border=1 cellpadding=5>
      <tr><th>primary key<br>(integer)
          <th>Name
          <th>Phone number
          <th>__uid__
        </tr>

        <tr>
          <td>0
          <td>John Smith
          <td>123456
          <td>0x4523
        </tr>

        <tr>
          <td>1
          <td>Fred Bloggs
          <td>7676987897
          <td>0x8769
        </tr>

        <tr>
          <td>2
          <td>John Smythe
          <td>123456
          <td>0x4523
        </tr>

        <tr>
          <td>3
          <td>Spiderman
          <td>435435345
          <td>0x7888
        </tr>

        <tr>
          <td>4
          <td>John Smythe
          <td>123888
          <td>0x4523
        </tr>
        
      </table>

    </blockquote>

    <p>You can see how the "John Smith" record was editted at row 2,
    and again at row 4.  To produce the list of "current" records, the
    last entry the table for a particular <code>uid</code> is used.

    <p>There will be an additional <code>__timestamp__</code> column.
    That will allow for retrieving old values for any record, as well
    as archiving off very old values (eg if the user doesn't care
    about anything old than 6 months).

    <p>There will also be a <code>__deleted__</code> column which is
    set to true when a particular uid is deleted.

    <p>The uid will actually be a long unique string.  For phonebook
    records, it will be the bitpim serial.

    <p>This scheme will allow easy undo's since you can always find
    out what any particular record used to look like.  You can also
    track a mass action (eg 10 records being selected and then all
    deleted at the same time) since they will have the same timestamp.

    <p>Undo will also be possible between runs of the program, and
    even amongst multiple running concurrent instances!

    <h2>Implementation plan</h2>

    <p>The initial implementation will create a new
    <code>database.py</code> file.  The existing phonebook code will
    point to this new module.  Code in <code>database.py</code> will
    continue to use the existing routines that read and write
    <code>index.idx</code> as well as to the sqlite database.  The data
    will be compared between the two to ensure it is working
    correctly.  Once we are certain the code talking to sqlite is
    correct, then the code using <code>index.idx</code> will be
    switched off.

    <p>The process will then be repeated for the other data types
    (calendar, wallpaper and ringtones).

    <p>New data sources (SMS, call history, voice and text memos) will
    just use the sqlite database exclusively.

    <h2>Undecided issue - lists of dicts</h2>
    
    <p>Almost every field in a phonebook entry is a list of dicts.
    This can be stored as a single value (the string representation),
    or a new table can be created, and redirect to that.  Both
    approaches are shown below for the phone numbers column with most
    other columns omitted for the purpose of clarification.

    <h3>Stringized</h3>

    <p><blockquote><table border=1 cellpadding=5>
          <tr><th>primary key<br>(integer)
          <th>Name
          <th>Phone numbers
          </tr>
          
          <tr><td>0
            <td>John Smith
            <td>[{'number': '1234567890', 'type': 'home'}, {'number': '233423423', 'type': 'work'}]
          </tr>
            
      </table>
      </blockquote>

   <h3>Indirect table</h3>
    <p><blockquote><table border=1 cellpadding=5>
              <tr><th>primary key<br>(integer)
                <th>Name
                <th>Phone numbers
              </tr>
          
              <tr><td>0
                <td>John Smith
                <td>__numbertable:0,2
          </tr>
      </table>
      
      <p><b>_numbertable</b> is:
      <p><table border=1 cellpadding=5>
              <tr><th>primary key<br>(integer)
                <th>number
                <th>type
              </tr>
          
              <tr><td>0
                <td>1234567890
                <td>home
              </tr>

              <tr><td>1
                <td>76547657
                <td>cell
              </tr>

              <tr><td>2
                <td>233423423       
                <td>work
              </tr>

      </table>
</blockquote>

    <p>I am leaning towards implementig both schemes in
    <code>database.py</code> since they are not mutually exclusive and
    then see how it goes.  My instinct is inclined towards the
    indirect table since it will save space, and allows faster
    searches down the road.

    <h2>Undecided issue - bitmaps and wallpapers</h2>

    <p>sqlite does allow storing blobs (binary large objects) in the
    database itself, so we could store the actual files directly in
    the database.  The other alternative is store the files on disk
    with non-descript names (eg <code>0000001.jpg</code>) and then
    point to the file from the relevant records.

    <p>My instinct is for the latter approach for wallpaper and
    ringtones since it will keep the database smaller.  For other file
    like items such as text memos and SMS messages, I would be
    inclined to keep them directly in the database.


  </body>
</html>