1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>BitPim and databases</title>
</head>
<body>
<h1>BitPim and databases</h1>
<p>BitPim currently stores information as a Python dictionary.
This information is saved in multiple files (one per information
type) as sourceable Python code. The data can be easily inspected
with a plain text editor.
<p>Users never need to explicitly load and save data (ie there is
no need for them to manage the transitioning of data between
temporary storage - RAM - and persistent storage - disk.)
<h2>Problems and goals</h2>
<p>BitPim currently has no undo functionality. Any edits take
effect immediately and there is no ability to reverse mistakes.
<p>It is currently not possible to do a sync. Syncing requires
being able to examine two snapshots of data and generate a list of
changes that were made (eg the name "John Smith" was changed to
"John Smythe")
<p>BitPim doesn't work correctly when run concurrently as the same
user. The user is not prevented from starting a second instance,
and multiple instances just continue oblivious to each other. The
old solution of preventing multiple instances at startup is no
longer appropriate since users can and do access their machines
via different means (eg logging in on the console and logging in
remotely). Some programs such as Mozilla/Firefox force you to
have multiple independent profiles, which is <b>very</b> annoying.
<p>BitPim currently doesn't support multiple information stores.
This happens if there are multiple users who login as the same
user at the operating system level. There is <a
href="http://bitpim.org/testhelp/howto-multipleusers.htm">some
advice</a> in the online doc, which amounts to switching the
preferences behind BitPim's back before starting to switch the
main data directory.
<p>Care also needs to be taken over dealing with version issues.
This means BitPim starting up with an older version of the saved
data, or the saved data being a newer format than the current
version.
<p>BitPim currently holds all data in memory. This makes memory
consumption equal the amount of data, and can get very large.
<h2>Solution - SQLite</h2>
<p>BitPim will be migrating to use the SQLite database. SQLite is
accessed using SQL syntax and is an embedded database - you have
it as part of your program and do not contact it over the network.
The Python wrapper is pysqlite and everything is available under
appropriate licenses and is available on all platforms.
<p>It has many many other nice properties such as using a single
file, being safe for usage in multi-threaded and multi-process
environments, is ACID compliant (Atomic, Consistent, Isolated, and
Durable), survives power and unexpected program termination, etc.
There is no access control or other security issues to deal with.
The only requirement is access to the single file (via normal
filesystem and process permissions).
<p>Version 3 of SQLite uses unicode natively for strings, supports
BLOBS (binary large objects), and allows unlimited field size.
<p>SQLite is also different than other databases in that the type
of a field is attached to each value in each record, rather than
to the column as a whole. This is very similar to how Python
works where the type of a value is attached to the value itself,
not to the name it is given. (Contrast with C/Java where the type
is associated with the variable name, not the value).
<p>Some more reading on SQLite:
<ul>
<li><a href="http://www.sqlite.org">Main web site</a>
<li><a
href="http://www.sqlite.org/php2004/slides-all.html">Internal
architecture</a>. Also includes tips on how to corrupt the
database :-)
</ul>
<h2>Startup/meta information</h2>
<p>One table will contain meta information. Primarily this will
be the version of BitPim to which the the database corresponds.
<p>On startup, BitPim will inspect the version information. If it
is older than the current version of BitPim then a copy of the
file will be made.
<blockquote>For example, if the current version of BitPim is 1.2
and the database says it is for 1.1 then a copy will be made as
<code>foo-1.1-`date`</code>
</blockquote>
<h2>Main method</h2>
<p>The main data type used in BitPim will be the dict, as is
currently the case. dicts will be saved to tables with each dict
key being a column in the table. <code>None</code> values in
Python are mapped to <code>null</code> in SQL. When reading from
the table, a dict is produced based on the columns. Note that
columns with a null value will not have any key in the returned
dict.
<p>When saving to a table, the support code will automatically
create columns as needed, that default to <code>null</code>.
<p>Each table will be a journal. Existing rows will never be
modified, only new rows added at the end. A distinct entry is
identified by a unique identifier and is stored in a column named
<code>__uid__</code>. Consequently a table will typically look
like this:
<p><blockquote><table border=1 cellpadding=5>
<tr><th>primary key<br>(integer)
<th>Name
<th>Phone number
<th>__uid__
</tr>
<tr>
<td>0
<td>John Smith
<td>123456
<td>0x4523
</tr>
<tr>
<td>1
<td>Fred Bloggs
<td>7676987897
<td>0x8769
</tr>
<tr>
<td>2
<td>John Smythe
<td>123456
<td>0x4523
</tr>
<tr>
<td>3
<td>Spiderman
<td>435435345
<td>0x7888
</tr>
<tr>
<td>4
<td>John Smythe
<td>123888
<td>0x4523
</tr>
</table>
</blockquote>
<p>You can see how the "John Smith" record was editted at row 2,
and again at row 4. To produce the list of "current" records, the
last entry the table for a particular <code>uid</code> is used.
<p>There will be an additional <code>__timestamp__</code> column.
That will allow for retrieving old values for any record, as well
as archiving off very old values (eg if the user doesn't care
about anything old than 6 months).
<p>There will also be a <code>__deleted__</code> column which is
set to true when a particular uid is deleted.
<p>The uid will actually be a long unique string. For phonebook
records, it will be the bitpim serial.
<p>This scheme will allow easy undo's since you can always find
out what any particular record used to look like. You can also
track a mass action (eg 10 records being selected and then all
deleted at the same time) since they will have the same timestamp.
<p>Undo will also be possible between runs of the program, and
even amongst multiple running concurrent instances!
<h2>Implementation plan</h2>
<p>The initial implementation will create a new
<code>database.py</code> file. The existing phonebook code will
point to this new module. Code in <code>database.py</code> will
continue to use the existing routines that read and write
<code>index.idx</code> as well as to the sqlite database. The data
will be compared between the two to ensure it is working
correctly. Once we are certain the code talking to sqlite is
correct, then the code using <code>index.idx</code> will be
switched off.
<p>The process will then be repeated for the other data types
(calendar, wallpaper and ringtones).
<p>New data sources (SMS, call history, voice and text memos) will
just use the sqlite database exclusively.
<h2>Undecided issue - lists of dicts</h2>
<p>Almost every field in a phonebook entry is a list of dicts.
This can be stored as a single value (the string representation),
or a new table can be created, and redirect to that. Both
approaches are shown below for the phone numbers column with most
other columns omitted for the purpose of clarification.
<h3>Stringized</h3>
<p><blockquote><table border=1 cellpadding=5>
<tr><th>primary key<br>(integer)
<th>Name
<th>Phone numbers
</tr>
<tr><td>0
<td>John Smith
<td>[{'number': '1234567890', 'type': 'home'}, {'number': '233423423', 'type': 'work'}]
</tr>
</table>
</blockquote>
<h3>Indirect table</h3>
<p><blockquote><table border=1 cellpadding=5>
<tr><th>primary key<br>(integer)
<th>Name
<th>Phone numbers
</tr>
<tr><td>0
<td>John Smith
<td>__numbertable:0,2
</tr>
</table>
<p><b>_numbertable</b> is:
<p><table border=1 cellpadding=5>
<tr><th>primary key<br>(integer)
<th>number
<th>type
</tr>
<tr><td>0
<td>1234567890
<td>home
</tr>
<tr><td>1
<td>76547657
<td>cell
</tr>
<tr><td>2
<td>233423423
<td>work
</tr>
</table>
</blockquote>
<p>I am leaning towards implementig both schemes in
<code>database.py</code> since they are not mutually exclusive and
then see how it goes. My instinct is inclined towards the
indirect table since it will save space, and allows faster
searches down the road.
<h2>Undecided issue - bitmaps and wallpapers</h2>
<p>sqlite does allow storing blobs (binary large objects) in the
database itself, so we could store the actual files directly in
the database. The other alternative is store the files on disk
with non-descript names (eg <code>0000001.jpg</code>) and then
point to the file from the relevant records.
<p>My instinct is for the latter approach for wallpaper and
ringtones since it will keep the database smaller. For other file
like items such as text memos and SMS messages, I would be
inclined to keep them directly in the database.
</body>
</html>
|