File: sqlite_issues.txt

package info (click to toggle)
kdb 3.2.0-9
links: PTS, VCS
area: main
in suites: forky, sid
size: 6,276 kB
sloc: cpp: 38,360; yacc: 940; python: 779; sh: 711; ansic: 440; lex: 367; xml: 182; sql: 51; makefile: 10
file content (198 lines) | stat: -rw-r--r-- 9,375 bytes
parent folder | download | duplicates (4)
---------------------------------------------------------
 SQLITE DRIVER IDEAS, ISSUES, PROPOSALS
 Copyright (C) 2003 Jarosław Staniek staniek at kde.org
 Started: 2003-07-09
---------------------------------------------------


1. In most situations (especially on massive data operations) we do not want get types of the columns,
so:

PRAGMA show_datatypes = OFF;


2. SQLite automatically adds primary key to the table if there is no such key.
Such pkey column is not visible for statemets like 'select * from table',
'select oid,* from table' need to be executed to also get this special column.

See section '3.1 The ROWID of the most recent insert' of c_interface.html file.


3. For smaller tables (how small? -- add configuration for this) sqlite_get_table() 'in memory'
function could be used to speed up rows retrieving.


4. Queries entered by user in the Query Designer should be checked for syntactically or logically validity and transformed to SQLite-compatible form befor execution. It is nonsense to ask SQLite engine if the given sql statement is valid, because then we wouldn't show too detailed error message to the user.


5. SQLite not only doesn't handles column types but also doesn't checks value sizes, eg. it is possible to insert string of length 100 to the column of size 20.
These checks should be made in KDb SQLite driver. In fact for each driver these checks could be made because user wants get a descriptive, localized, friendly message what's wrong. No single engine provides this of course. We need to store such a parameters like field size in project meta-data as sqlite doesn't stores that in any convenient way. It stores only 'CREATE TABLE' statement, as is.


6. Possible storage methods for SQLite database embedded in KDb project:
	A. Single SQLite-compatible database file (let's name it: .sqlite file)
		- Advantages: Best deal for bigger databases - no need for rewriting data form SQLite file to another,
			fastest open and save times. DB data consumes disk space only once. Other applications that uses SQLite library could also make use of standard format of .sqlite file's contents. KDb project and data would be easily, defacto, separated, what is considered as good method in DB programming.
		- Disadvantages: User (who may want to transfer a database) need to know that .kexi file doesn't stores his data but .sqlite is for that.

	B. Single SQLite-compatible database file embedded inside a KDb project .kexi file.
		SQLite requires an access to a file in its own (raw) format to be available somewhere in the path. If SQLite storing layer could be patched to adding an option for seek to given file position, sqlite data can be stored after KDb project data. When sqlite raw data file could be saved after a KDb project's data, rewriting the project contents should be performed (and this is done quite frequently). So, finally storing both files concatenated during normal operations is risky, costly and difficult to implement cleanly.
		- Advantages: User do not need to know that there is sqlite used in KDb as embedded DB engine (and even if there is any sql engine). Transferring just one file between machines means successfully transferring data and project.
		- Disadvantages: lack of everything described as advantages of A. method: difficult and costly open and save operations (unless SQLite storing layer could be patched).

	Extensions and compilations of the both above methods:
		- .sqlite files are really good compressable, so compress option can be added (if not for regular saving, then at least for "Email project & data" or 'Save As' actions. For these actions concatenating the sqlite data with KDb project's data would be another option convenient from user's point of view.

	CURRENT IMPLEMENTATION: B way is selected with above extensions added to the TODO list.


7. SQLite-builtin views are read-only. So the proposal is not to use them. Here is why:
	We want have rw queries in KDb if main table in a query is rw.
	<DEFINITION>: Main table T in a query Q is a table that is not at 'many' side of query relations.
	</DEFINITION>
	<Example>:
	table persons (name varchar, city integer);
	table cities (id integer primary key, name varchar);

	DATA: [Jarek, 1]-------[1, Warsaw]
	                      /
	      [Jakub, 1]-----/

	query: select * from persons, cities
	Now: 'cities' table is the main table (in other words it is MASTER table in this query).
	'cities' table is rw table in this query, while 'persons' table is read-only because it is at 'many' side
	in persons-cities relation. Modifying cities.id field, appropriate persons.city values in related
	records will be updated if there is cascade update enabled.
	</Example>
	IDEAS:
	A) Query result view (table view, forms, etc.) should allow editing fields from
		main (master) table of this query, so every field object KDbField should have a method:
		bool KDbField::isWritable() to allow GUI check if editing is allowed. Look that given field object
		should be allocated for given query independently from the same field allocated for table schema.
		The first field object can disallow editing while the latter can allow editing (because it is
		component of regular table).
	B) Also add method for QString KDbField that returns i18n'd message about the reasons
		of disallowing for editing given field in a context of given query.


----------------------------------------------------------------
8. ERRORS Found
8.1 select * from (select name from persons limit 1) limit 2
 -should return 1 row; returns 2

----------------------------------------------------------------

HINTS:

PRAGMA table_info(table-name);
For each column in the named table, invoke the callback function
once with information about that column, including the
column name, data type, whether or not the column can be NULL,
and the default value for the column.


---------------------------------------------------------------
OPTIMIZATION:

https://www.mail-archive.com/sqlite-users@sqlite.org/msg04117.html

From: D. Richard Hipp [mailto:[EMAIL PROTECTED]
Sent: Friday, October 08, 2004 5:59 PM
To: [EMAIL PROTECTED]
Subject: Re: [sqlite] Questions about sqlite's join translation


Keith Herold wrote:
> The swiki says that making JOINs into a where clause is more
> efficient, since sqlite translates the join condition into a where
> clause.

When SQLite sees this:

    SELECT * FROM a JOIN b ON a.x=b.y;

It translate it into the following before compiling it:

    SELECT * FROM a, b WHERE a.x=b.y;

Neither form is more efficient that the other.  Both will generate
identical code.  (There are subtle differences on an LEFT OUTER JOIN,
but those details can be ignored when you are looking at things at a
high level, as we are.)

 > It also
> says that you make queries more effiecient by minimizing the number of

> rows returned in the FROM clause as far to the left as possible in the

> join.  Does the latter matter if you are translating everything into a

> where  clause anyway?
>

SQLite implements joins using nested loops with the outer
loop formed by the first table in the join and the inner loop formed by
the last table in the join.  So for the example above you would have:

    For each row in a:
      For each row in b such that b.y=a.x:
        Return the row

If you reverse the order of the tables in the FROM clause like
this:

    SELECT * FROM b, a WHERE a.x=b.y;

You should get an equivalent result on output, but SQLite will implement
the query differently.  Specifically it does this:

    For each row in b:
      For each row in a such that a.x=b.y:
        Return the row

The trick is that you want to arrange the order of tables so that the
"such that" clause on the inner loop is able to use an index to jump
right to the appropriate row instead of having to do a full table scan.
Suppose, for example, that you have an index on a(x) but not on b(y).
Then if you do this:

    SELECT * FROM a, b WHERE a.x=b.y;

    For each row in a:
      For each row in b such that b.y=a.x:
        Return the row

For each row in a, you have to do a full scan of table b.  So the time
complexity will be O(N^2).  But if you reverse the order of the tables
in the FROM clause, like this:

    SELECT * FROM b, a WHERE b.y=a.x;

    For each row in b:
      For each row in a such that a.x=b.y
        Return the row

No the inner loop is able to use an index to jump directly to the rows
in a that it needs and does not need to do a full scan of the table.
The time complexity drops to O(NlogN).

So the rule should be:  For every table other than the first, make sure
there is a term in the WHERE clause (or the ON or USING clause if that
is your preference) that lets the search jump directly to the relavant
rows in that table based on the results from tables to the left.

Other database engines with more complex query optimizers will typically
attempt to reorder the tables in the FROM clause in order to give you
the best result.  SQLite is more simple-minded - it codes whatever you
tell it to code.

Before you ask, I'll point out that it makes no different whether you
say "a.x=b.y" or "b.y=a.x".  They are equivalent.  All of the following
generate the same code:

      ON a.x=b.y
      ON b.y=a.x
      WHERE a.x=b.y
      WHERE b.y=a.x

---------------------------------------------------------------