File: DESIGN

package info (click to toggle)
rhythmbox 3.4.9-4
links: PTS, VCS
area: main
in suites: sid
size: 27,376 kB
sloc: ansic: 114,861; python: 4,941; xml: 730; javascript: 350; perl: 307; sh: 84; makefile: 43
file content (270 lines) | stat: -rw-r--r-- 10,725 bytes
parent folder | download | duplicates (4)

* Goals for a new Rhythmbox tree database backend

** Not complicated

We want to present a simple API to the rest of Rhythmbox.  A
full-blown SQL is obviously unnecessary.  The only thing that
Rhythmbox cares about are songs.  So conceptually, our database
will only support getting lists of songs.

** Permit optimization

In keeping with a simple API, we should also attempt as much as
possible to provide opportunities for optimization.  Specifically
the old RBNode database was highly optimized for genre/artist/album
filtering, and we want to keep that capability.

** Multiple implementations

The interface should also permit multiple implementations.  For
example, there are the crazy people out there who have all their
music metadata in a SQL database already, so having our own
database doesn't help them much.

It also may turn out that using something like SQLite could
speed up Rhythmbox for large numbers of files (say > 6k or so).
This is an area where the current Rhythmbox kind of falls over.
So permitting multiple implementations will allow us more
flexibility for experimentation.

** What is a song?

Conceptually, it's just a collection of attributes.  The most
important is the type, which says whether it's a local file,
an internet radio station, etc.  Other attributes include
location, track number, rating, last play time, play count,
quality, etc.

*** What are attributes?

We want the ability to have an arbitrary set of attributes of
various data types, including UTF-8 strings, integers, and random
binary data (e.g. for sort keys).

**** Saved/unsaved attributes

The database should support "temporary" keys, which are not actually
saved to the database on exit.  This is useful for sorting keys,
for example.

** Multithread-capable

We need to ensure that the UI doesn't block.  Generally this
means using threads.  This introduces a lot of locking issues
that need to be addressed.

The database should be also optimized for common operations,
and this definitely means reading.  Writing should be lower
priority, although we don't want to completely starve writers.

Because we need to support multiple backends, we can only really
support coarse-grained locking (i.e. one big global RW lock).

*** Multithreaded entry attribute changes

This issue is not as simple as one might think.  We definitely
want to be able to use threads for this - for example, after
the user has changed a bunch of songs on disk, we need
to update the UI to reflect those changes, incrementally.
That involves emitting a gtk_tree_model_row_changed signal, for
which we need to hold the GDK lock.

*** Deadlock 

We have to consider deadlock situations here, where one thread is
holding the GDK lock, and wants the database lock, and then another
thread is holding the db lock and wants the GDK lock.  In the old
design, the any other thread could grab the GDK lock to update the UI,
which is wrong.

*** RBEntryModel

In general, we don't want the various processing threads in
Rhythmbox to be aware that there is this huge slow UI to
keep updated.  Setting an attribute or delete an entry
shouldn't involve doing anything else besides calling the
function.

However, we do need to update the UI somehow.  The UI
is represented by the GtkTreeModel implementations
returned by a query.  Thus, one plan is to have an update
queue.  Whenever a thread other than the main one modifies
RhythmDB, we push an update notifier onto the queue.  Then,
we have another thread (besides the main one) dedicated
to processing this queue, and emitting the appropriate
row_changed signals.

**** GtkTreeModel mapping efficiency

In order to emit the row_changed signal, we will need
to efficiently look up a GtkTreeIter, given an entry.
Perhaps we have to write our own GtkTreeModel implementation
that combines a GHashTable and a GList?  Or just subclass
GtkListStore, and add a GHashTable index?

*** Deletions

Handling deletions well is important.  Currently
doing a "Select All" followed by "Delete" can take
some time.  Now, this works two ways.  First, the user
can request a deletion.  To handle this, we simply
queue deletion requests from the UI, and then have
a thread which processes those.

However, we also need to handle deletions from other
threads, such as the database processing thread.  This
brings up an issue - we cannot treat deletions like
other operations.  If an entry is deleted from the
database, but the UI isn't updated - the user may
try to change the properties of an entry that's
already been deleted.  How to solve this?

*** Additions

Additions fairly straightforward; we don't have
as many database consistency issues as deletions.  Basically
we just have a thread add something to the database, and then
queue a signal emission for the main UI, much like for property
changes.

** RhythmDBEntry  (basically equivalent of RBNode)

This is an opaque handle.  You can't access it directly at all.

** Queries

This is crucial.  Rhythmbox's genre/artist/album
browser is just one example of querying the database.  It's
arguably Rhythmbox's most useful feature at the moment.

First of all, we want to support at least the genre/artist/album
queries.  Equally important though, we need to be flexible enough
to query on any attribute.  This will be an important basis for
implementing iTunes-like smart playlists.

The main idea of the new API is that it simply returns a set of
nodes which match the query.  Then a new object which implements
the GtkTreeModel interface is wrapped around this set.  This
has the potential to be a lot more efficient than the previous approach.
In the previous approach, the whole database was always presented in a
GtkTreeModel interface, and then EggTreeModelFilter was used to filter it.
The downside of this approach is that the filtering is happening a lot
more; for example, finding the next/previous node caused each node to
be filtered again.  For large databases, this is pretty inefficient; if
we have 10000 songs, and we're only viewing say one album (10 songs),
then we have to skip over approximately 5000 songs on average to find
out whether we're at the end or not.  This gets very expensive, very fast.

So here's how it will work in the new implementation.  When the user clicks
on an album, this only requires one query.  So, we will just call
rhythmdb_do_entry_query like this:

album = compute_album_id ();
...
GPtrArray *set = rhythmdb_do_entry_query (db, RHYTHMDB_QUERY_PROP_EQUALS, "album", album, RHYTHMDB_QUERY_END);
GtkTreeModel *model = rb_db_model_new (set);

Then, we will make the main song view use this model:

gtk_tree_view_set_model (mainview, model);

However, suppose that the user clicks on a genre.  This will require
several queries, because we need to filter the artist and album browsers
by artists/albums which are in those genres.  As before, we filter the main
view by the query:

genre = get_genre_id ();
...
GPtrArray *set = rhythmdb_do_entry_query (db, RHYTHMDB_QUERY_PROP_EQUALS, "genre", genre, RHYTHMDB_QUERY_END);
GtkTreeModel *model = rb_db_model_new (set);
gtk_tree_view_set_model (mainview, model);

However, we also need to gather the lists of artists/albums.  We use the
specialized query rhythmdb_do_property_query, which returns just a flat
GtkTreeModel.  This call looks like this:

set = rhythmdb_do_property_query (db, "artist", RHYTHMDB_QUERY_PROP_EQUALS,
                                  "genre", "Rock", RHYTHMDB_QUERY_END);
model = rb_db_model_new (set);
gtk_tree_view_set_model (artistview, model);
set = rhythmdb_do_property_query (db, "album", RHYTHMDB_QUERY_PROP_EQUALS,
                                 "genre", "Rock", RHYTHMDB_QUERY_END);
model = rb_db_model_new (set);
gtk_tree_view_set_model (artistview, model);

Conceptually, the property_query call just gathers all the song entries
which match the query, and returns the unique list of their values for the
specified property.  However, it is not actually implemented this way,
at least not for commonly queried properties such as artist and album.

** The first implementation - transitioning RBNode to this API

The main idea of RBNode is the tree structure.  If you think about it,
the tree structure is really just a fancy optimization for querying
the artist/album/genre properties.  We will keep this idea, but hide it
behind the flat structure API.  We will transparently optimize certain
queries, such as the artist/album ones.  This will require handling those
properties specially; for example, when we set the artist of a song,
this will require restructuring the tree database to match.

** SQLite ideas

As we mentioned before, the nice thing about this API is that it could
also be fairly easily implemented by a real database such as SQLite.
For example, this call:

/* Here, 19 is a unique ID referring to a genre such as "Rock */
rhythmdb_do_entry_query (db, RHYTHMDB_QUERY_PROP_EQUALS, "genre", 19, RHYTHMDB_QUERY_END);

could be rewritten in SQL:

select * from songs where genreid=19;

And this call:

/* Here, 37 is a unique ID referring to a genre such as "Rock */
rhythmdb_do_property_query (db, "album", RHYTHMDB_QUERY_PROP_EQUALS, "genre", 37, RHYTHMDB_QUERY_END);

could be rewritten as:

select distinct album from songs where genreid=37;

** Static playlists

We need to support user-orderable playlists.  Now, they should present
the same GtkTreeModel-type interface as the queries.  This will
allow a lot of code reuse for playback, etc.  However - a playlist also
has to be mutable.  The user needs to be able to drag and drop entries to
reorder them, delete files, add files, etc.  To implement this, we can
make playlists simply be GtkListStores.  This should be relatively
efficient since playlists tend not to be large.

This playlist will need to hook into the entry_destroyed signal to know
to remove entries from playlists that have been removed from the library.

** Smart playlists (do we want a better name)?

This will be harder.  A "smart" playlist will need to watch for changes
in the database and refresh if appropriate.  Well, it wouldn't be hard
to watch for ANY change in the DB and refresh, but this would obviously
be very expensive.  We'll want to intelligently refresh only if the
smart playlist query requries it.

* A plan for refactoring Rhythmbox to integrate RhythmDB

** Shell will create/destroy the database object

** Library takes database object, creates new thread to refresh metadata

** Morph RBIRadioBackend into the trival class RBIRadioLoader

** Turn RBNodeView into RBEntryView
*** RBTreeModelNode goees away
Instead, just add a bunch of columns into an RBTreeView, but set
the column_func to get the correct value


Local Variables:
mode: outline
End: