1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198
|
---------------------------------------------------------
SQLITE DRIVER IDEAS, ISSUES, PROPOSALS
Copyright (C) 2003 Jarosław Staniek staniek at kde.org
Started: 2003-07-09
---------------------------------------------------
1. In most situations (especially on massive data operations) we do not want get types of the columns,
so:
PRAGMA show_datatypes = OFF;
2. SQLite automatically adds primary key to the table if there is no such key.
Such pkey column is not visible for statemets like 'select * from table',
'select oid,* from table' need to be executed to also get this special column.
See section '3.1 The ROWID of the most recent insert' of c_interface.html file.
3. For smaller tables (how small? -- add configuration for this) sqlite_get_table() 'in memory'
function could be used to speed up rows retrieving.
4. Queries entered by user in the Query Designer should be checked for syntactically or logically validity and transformed to SQLite-compatible form befor execution. It is nonsense to ask SQLite engine if the given sql statement is valid, because then we wouldn't show too detailed error message to the user.
5. SQLite not only doesn't handles column types but also doesn't checks value sizes, eg. it is possible to insert string of length 100 to the column of size 20.
These checks should be made in KDb SQLite driver. In fact for each driver these checks could be made because user wants get a descriptive, localized, friendly message what's wrong. No single engine provides this of course. We need to store such a parameters like field size in project meta-data as sqlite doesn't stores that in any convenient way. It stores only 'CREATE TABLE' statement, as is.
6. Possible storage methods for SQLite database embedded in KDb project:
A. Single SQLite-compatible database file (let's name it: .sqlite file)
- Advantages: Best deal for bigger databases - no need for rewriting data form SQLite file to another,
fastest open and save times. DB data consumes disk space only once. Other applications that uses SQLite library could also make use of standard format of .sqlite file's contents. KDb project and data would be easily, defacto, separated, what is considered as good method in DB programming.
- Disadvantages: User (who may want to transfer a database) need to know that .kexi file doesn't stores his data but .sqlite is for that.
B. Single SQLite-compatible database file embedded inside a KDb project .kexi file.
SQLite requires an access to a file in its own (raw) format to be available somewhere in the path. If SQLite storing layer could be patched to adding an option for seek to given file position, sqlite data can be stored after KDb project data. When sqlite raw data file could be saved after a KDb project's data, rewriting the project contents should be performed (and this is done quite frequently). So, finally storing both files concatenated during normal operations is risky, costly and difficult to implement cleanly.
- Advantages: User do not need to know that there is sqlite used in KDb as embedded DB engine (and even if there is any sql engine). Transferring just one file between machines means successfully transferring data and project.
- Disadvantages: lack of everything described as advantages of A. method: difficult and costly open and save operations (unless SQLite storing layer could be patched).
Extensions and compilations of the both above methods:
- .sqlite files are really good compressable, so compress option can be added (if not for regular saving, then at least for "Email project & data" or 'Save As' actions. For these actions concatenating the sqlite data with KDb project's data would be another option convenient from user's point of view.
CURRENT IMPLEMENTATION: B way is selected with above extensions added to the TODO list.
7. SQLite-builtin views are read-only. So the proposal is not to use them. Here is why:
We want have rw queries in KDb if main table in a query is rw.
<DEFINITION>: Main table T in a query Q is a table that is not at 'many' side of query relations.
</DEFINITION>
<Example>:
table persons (name varchar, city integer);
table cities (id integer primary key, name varchar);
DATA: [Jarek, 1]-------[1, Warsaw]
/
[Jakub, 1]-----/
query: select * from persons, cities
Now: 'cities' table is the main table (in other words it is MASTER table in this query).
'cities' table is rw table in this query, while 'persons' table is read-only because it is at 'many' side
in persons-cities relation. Modifying cities.id field, appropriate persons.city values in related
records will be updated if there is cascade update enabled.
</Example>
IDEAS:
A) Query result view (table view, forms, etc.) should allow editing fields from
main (master) table of this query, so every field object KDbField should have a method:
bool KDbField::isWritable() to allow GUI check if editing is allowed. Look that given field object
should be allocated for given query independently from the same field allocated for table schema.
The first field object can disallow editing while the latter can allow editing (because it is
component of regular table).
B) Also add method for QString KDbField that returns i18n'd message about the reasons
of disallowing for editing given field in a context of given query.
----------------------------------------------------------------
8. ERRORS Found
8.1 select * from (select name from persons limit 1) limit 2
-should return 1 row; returns 2
----------------------------------------------------------------
HINTS:
PRAGMA table_info(table-name);
For each column in the named table, invoke the callback function
once with information about that column, including the
column name, data type, whether or not the column can be NULL,
and the default value for the column.
---------------------------------------------------------------
OPTIMIZATION:
https://www.mail-archive.com/sqlite-users@sqlite.org/msg04117.html
From: D. Richard Hipp [mailto:[EMAIL PROTECTED]
Sent: Friday, October 08, 2004 5:59 PM
To: [EMAIL PROTECTED]
Subject: Re: [sqlite] Questions about sqlite's join translation
Keith Herold wrote:
> The swiki says that making JOINs into a where clause is more
> efficient, since sqlite translates the join condition into a where
> clause.
When SQLite sees this:
SELECT * FROM a JOIN b ON a.x=b.y;
It translate it into the following before compiling it:
SELECT * FROM a, b WHERE a.x=b.y;
Neither form is more efficient that the other. Both will generate
identical code. (There are subtle differences on an LEFT OUTER JOIN,
but those details can be ignored when you are looking at things at a
high level, as we are.)
> It also
> says that you make queries more effiecient by minimizing the number of
> rows returned in the FROM clause as far to the left as possible in the
> join. Does the latter matter if you are translating everything into a
> where clause anyway?
>
SQLite implements joins using nested loops with the outer
loop formed by the first table in the join and the inner loop formed by
the last table in the join. So for the example above you would have:
For each row in a:
For each row in b such that b.y=a.x:
Return the row
If you reverse the order of the tables in the FROM clause like
this:
SELECT * FROM b, a WHERE a.x=b.y;
You should get an equivalent result on output, but SQLite will implement
the query differently. Specifically it does this:
For each row in b:
For each row in a such that a.x=b.y:
Return the row
The trick is that you want to arrange the order of tables so that the
"such that" clause on the inner loop is able to use an index to jump
right to the appropriate row instead of having to do a full table scan.
Suppose, for example, that you have an index on a(x) but not on b(y).
Then if you do this:
SELECT * FROM a, b WHERE a.x=b.y;
For each row in a:
For each row in b such that b.y=a.x:
Return the row
For each row in a, you have to do a full scan of table b. So the time
complexity will be O(N^2). But if you reverse the order of the tables
in the FROM clause, like this:
SELECT * FROM b, a WHERE b.y=a.x;
For each row in b:
For each row in a such that a.x=b.y
Return the row
No the inner loop is able to use an index to jump directly to the rows
in a that it needs and does not need to do a full scan of the table.
The time complexity drops to O(NlogN).
So the rule should be: For every table other than the first, make sure
there is a term in the WHERE clause (or the ON or USING clause if that
is your preference) that lets the search jump directly to the relavant
rows in that table based on the results from tables to the left.
Other database engines with more complex query optimizers will typically
attempt to reorder the tables in the FROM clause in order to give you
the best result. SQLite is more simple-minded - it codes whatever you
tell it to code.
Before you ask, I'll point out that it makes no different whether you
say "a.x=b.y" or "b.y=a.x". They are equivalent. All of the following
generate the same code:
ON a.x=b.y
ON b.y=a.x
WHERE a.x=b.y
WHERE b.y=a.x
---------------------------------------------------------------
|