Below are some things that still need to be done, in order of priority.
More/better statistics in the client and tracker
The statistics viewable on the tracker and client's web interfaces are
not very complete or useful. The client's downloaded data numbers should
not reset on every restart, and should show the amount downloaded from
peers vs. the amount downloaded from the server. Also, more detailed
info could be available by clicking on the torrent, perhaps showing a
list of the connected peers, and how much they have downloaded. The
tracker also needs to keep its numbers over restarts, and should show
more info on how much all peers have downloaded/uploaded for each
torrent.
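As a rough sketch of the persistence part, the client could write its
counters to a small file on shutdown and read it back on startup (the
file name and field names here are only illustrative):

    import json
    import os

    STATS_FILE = 'dtstats.json'  # hypothetical location for the saved counters

    def load_stats():
        """Load saved counters, falling back to zeros on the first run."""
        if os.path.exists(STATS_FILE):
            with open(STATS_FILE) as f:
                return json.load(f)
        return {'downloaded_from_peers': 0,
                'downloaded_from_server': 0,
                'uploaded': 0}

    def save_stats(stats):
        """Write the counters out so a restart does not reset them."""
        with open(STATS_FILE, 'w') as f:
            json.dump(stats, f)
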
Use python-debian for parsing RFC 822 files.
There are already routines available in python-debian for parsing
Packages files, so use them instead of writing new ones. The torrent
creation routines and the hippy and uniquely programs probably need
changes for this.
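For example, the deb822 module can already iterate over the paragraphs
of a Packages file; the sketch below shows the kind of call the torrent
creation code might make instead of a hand-rolled parser (note that
older versions of python-debian name the package debian_bundle rather
than debian, and the field extraction here is only illustrative):

    from debian import deb822

    def packages_entries(packages_path):
        """Yield (Filename, Size, SHA1) for each package using python-debian."""
        with open(packages_path) as f:
            for para in deb822.Packages.iter_paragraphs(f):
                yield para['Filename'], int(para['Size']), para.get('SHA1')
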
Investigate using other archives
Some investigation needs to be done of how other archives will work with
the current setup (i.e. without sub-piece data or other addons). Though
they should work fine (just less efficiently), others have reported
problems with them. Some archives to try are security.d.o, volatile.d.o,
debian-multimedia.org, and archive.ubuntu.com.
Add timeout checks for the created threads
The threads created to perform some tasks need to have timeout checks
added for them. This is especially true for the cache download threads,
which (if hung) will prevent any future cache downloads from the site.
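One possible approach, sketched here with purely illustrative names and
timeout values, is a watchdog that records when each worker thread
started and flags any thread that is still alive past its deadline:

    import threading
    import time

    CACHE_DOWNLOAD_TIMEOUT = 300  # seconds; illustrative value only

    class WatchedThread(threading.Thread):
        """A thread that remembers when it started so a watchdog can spot hangs."""
        def __init__(self, *args, **kwargs):
            threading.Thread.__init__(self, *args, **kwargs)
            self.started_at = None

        def start(self):
            self.started_at = time.time()
            threading.Thread.start(self)

    def find_hung(threads, timeout=CACHE_DOWNLOAD_TIMEOUT):
        """Return the threads that have been running longer than the timeout."""
        now = time.time()
        return [t for t in threads
                if t.is_alive() and t.started_at and now - t.started_at > timeout]
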
Check if the torrent is already running before getting from the cache
When a Packages download results in a 304 not modified message because
the file is in the cache and up to date with the server, the Packages
file still needs to be uncompressed and turned into a torrent to
determine if it is currently running or not. This is time consuming and
unnecessary. A better solution is to store some information about which
Packages files belong to which running torrents, and then check that.
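A minimal sketch of that bookkeeping, assuming the Packages file can be
identified by its path or hash (all names here are hypothetical):

    # Hypothetical registry mapping a Packages file identifier (e.g. its
    # path or SHA1) to the torrent that was created from it.
    running_torrents = {}

    def register_torrent(packages_id, torrent):
        running_torrents[packages_id] = torrent

    def torrent_for(packages_id):
        """Return the running torrent for this Packages file, or None.
        With this lookup a 304 response never forces an uncompress/re-parse."""
        return running_torrents.get(packages_id)
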
Check for problems with the experimental saveas_style 2
Using the saveas_style = 2 option saves the files downloaded from the
torrents in the same directory structure as the files downloaded by
proxy. This is more efficient, since the files can be retrieved by the
cacher without having to check the running torrents for them. However,
it could also cause problems if the file has not yet completed
downloading (will fail hash check). Some investigation needs to be done
of the possibility of this and how best to check for it.
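One possible check, sketched under the assumption that the expected
size and SHA1 of the file are known from the Packages data, is to
verify the on-disk copy before the cacher serves it:

    import hashlib
    import os

    def file_is_complete(path, expected_size, expected_sha1):
        """Return True only if the file on disk is fully downloaded and intact."""
        if not os.path.exists(path) or os.path.getsize(path) != expected_size:
            return False
        sha1 = hashlib.sha1()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(65536), b''):
                sha1.update(chunk)
        return sha1.hexdigest() == expected_sha1
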
AptListener queue waiting for downloads should be using callbacks
AptListener currently queues requests for packages that have not
completed downloading, and then checks this queue every 1 second. This
could be made more efficient by adding callbacks to PiecePicker or
StorageWrapper, so that when a piece comes in and passes the hash check,
then the AptListener will process any queued requests for that piece.
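A sketch of the callback idea, assuming StorageWrapper can be given a
hook to call when a piece passes its hash check (the hook name and
queue layout are hypothetical):

    # Hypothetical: requests waiting on a piece, keyed by piece index.
    waiting_requests = {}

    def queue_request(piece_index, request):
        """Park a request until its piece has arrived and passed the hash check."""
        waiting_requests.setdefault(piece_index, []).append(request)

    def piece_hash_passed(piece_index):
        """Callback for StorageWrapper to invoke, replacing the once-a-second
        polling of the queue in AptListener."""
        for request in waiting_requests.pop(piece_index, []):
            request.process()
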
Different forms of HTTP Downloading may open too many connections
There are a lot of different ways for the program to download from a
mirror, including the HTTPDownloader for pieces (one per torrent), the
HTTPCache downloader (one per program), etc... Most are threaded, and
could end up opening a lot of connections to a single mirror. A better
implementation would be to manage these connections all in one place,
so that they can be controlled together and a max number of connections
could be set.
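A hedged sketch of such a manager, capping simultaneous connections per
mirror with a semaphore (threading-based since the current downloaders
are threaded; the class name and limit are illustrative):

    import threading

    MAX_CONNECTIONS_PER_MIRROR = 4  # illustrative cap

    class MirrorConnectionManager:
        """Hands out connection slots so every downloader shares one limit."""
        def __init__(self, limit=MAX_CONNECTIONS_PER_MIRROR):
            self.limit = limit
            self.semaphores = {}
            self.lock = threading.Lock()

        def _semaphore_for(self, mirror):
            with self.lock:
                if mirror not in self.semaphores:
                    self.semaphores[mirror] = threading.BoundedSemaphore(self.limit)
                return self.semaphores[mirror]

        def acquire(self, mirror):
            """Block until a connection slot for this mirror is free."""
            self._semaphore_for(mirror).acquire()

        def release(self, mirror):
            self._semaphore_for(mirror).release()
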
Don't ignore Packages.diff
Currently requests for Packages.diff return a 404 not found error, as
the AptListener doesn't yet know how to process these. This should be
improved to store a cached copy of the Packages file, and apply the
diffs to it as they download, then use the final file as the torrent
file. Security checking of the hashes of the diffs or the final file may
also be needed.
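The diffs themselves are ed-style scripts, so a minimal (unhardened)
applier could look like the sketch below; it assumes the diff and the
cached Packages file have already been decompressed into lists of
lines:

    def apply_ed_diff(lines, diff_lines):
        """Apply an ed-style diff (as used by Packages.diff) to a list of lines
        and return the patched list.  Minimal sketch with no error handling."""
        lines = list(lines)
        i = 0
        while i < len(diff_lines):
            cmd = diff_lines[i].rstrip('\n')
            i += 1
            # Commands look like "12a", "3,5d" or "7,9c".
            op = cmd[-1]
            addr = cmd[:-1]
            if ',' in addr:
                start, end = [int(x) for x in addr.split(',')]
            else:
                start = end = int(addr)
            new = []
            if op in ('a', 'c'):
                # Collect the inserted/replacement text up to the lone ".".
                while diff_lines[i].rstrip('\n') != '.':
                    new.append(diff_lines[i])
                    i += 1
                i += 1
            if op == 'a':
                lines[start:start] = new      # insert after line `start`
            elif op == 'd':
                del lines[start - 1:end]
            elif op == 'c':
                lines[start - 1:end] = new
        return lines
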
Make the URL download better
The URL download is currently a little hacky, and could be improved.
The urlopen class could be made into an iterator, and could decompress
files with the ".gz" extension itself.
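A sketch of that idea, written against the standard urllib and zlib
modules for clarity (the function name and chunk size are
illustrative):

    import urllib.request
    import zlib

    def iter_url(url, chunk_size=65536):
        """Yield chunks of a URL, gunzipping ".gz" files on the fly."""
        decomp = zlib.decompressobj(16 + zlib.MAX_WBITS) if url.endswith('.gz') else None
        with urllib.request.urlopen(url) as resp:
            while True:
                chunk = resp.read(chunk_size)
                if not chunk:
                    break
                yield decomp.decompress(chunk) if decomp else chunk
            if decomp:
                tail = decomp.flush()
                if tail:
                    yield tail
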
Pre-allocating files is necessary
In order for the download to work properly, the allocation method must
be set to pre-allocate (normal doesn't work, the others probably don't).
This is due to the pieces no longer being the same size, and so data
cannot be moved around between them like it was previously. This may not
be an issue after a maximum piece size is introduced, though
pre-allocation may still be necessary to serve downloaded files while
other downloads continue. Pre-allocation with priorities enabled does
not pre-allocate the entire archive. Later, when multiple pieces per
file are used, an allocation method could be used within a file.
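For reference, pre-allocating a file of a known size only needs a seek
and a single byte write (producing a sparse file on most filesystems),
as in this small sketch:

    def preallocate(path, size):
        """Create (or extend) a file so it already has its final length."""
        with open(path, 'wb') as f:
            if size > 0:
                f.seek(size - 1)
                f.write(b'\x00')
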
Statistics for the swarm are incorrect
The statistics for file completion in the swarm are incorrect. This is
due to using the number of pieces each peer has as a measure of their
completion. Since some pieces are very small, and some very large, this
is no longer accurate. This may not need fixing though, as piece sizes
will become more uniform in later versions, and these statistics may not
be needed anyway.
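If a fix is wanted, completion should be weighted by piece length
rather than piece count; a sketch of the corrected calculation,
assuming the piece lengths and each peer's have-bitfield are available:

    def completion(piece_lengths, have):
        """Fraction of the torrent a peer has, weighted by piece size.

        piece_lengths -- length of each piece
        have          -- one boolean per piece
        """
        total = sum(piece_lengths)
        if total == 0:
            return 0.0
        got = sum(length for length, h in zip(piece_lengths, have) if h)
        return float(got) / total
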
Switch to using the poll() system call instead of the select module
The poll() system call, supported on most Unix systems, provides better
scalability for network servers that service many, many clients at the
same time. poll() scales better because the system call only requires
listing the file descriptors of interest, while select() builds a
bitmap, turns on bits for the fds of interest, and then afterward the
whole bitmap has to be linearly scanned again. select() is O(highest
file descriptor), while poll() is O(number of file descriptors).
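A minimal sketch of what the poll()-based loop looks like with
Python's select module (the socket handling and port are schematic):

    import select
    import socket

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(('', 8080))   # illustrative port
    server.listen(5)
    server.setblocking(False)

    poller = select.poll()
    poller.register(server.fileno(), select.POLLIN)
    sockets = {server.fileno(): server}

    while True:
        # Only the registered descriptors are returned; no bitmap to scan.
        for fd, event in poller.poll(1000):
            sock = sockets[fd]
            if sock is server:
                conn, addr = server.accept()
                conn.setblocking(False)
                sockets[conn.fileno()] = conn
                poller.register(conn.fileno(), select.POLLIN)
            elif event & (select.POLLIN | select.POLLHUP):
                data = sock.recv(4096)
                if not data:
                    poller.unregister(fd)
                    del sockets[fd]
                    sock.close()
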
Support downloading source packages
Sources files contain 2 or 3 files for every source package, but no
SHA1 sums for any of them. These sums, or preferably sub-file sums,
would have to be generated before anything can be done.
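Generating the sums is straightforward once the files are on disk; a
sketch that walks the 'Files' list of each Sources paragraph with
python-debian and computes SHA1 sums (the pool layout and field
handling here are assumptions):

    import hashlib
    import os
    from debian import deb822

    def sha1_sums_for_sources(sources_path, pool_dir):
        """Compute SHA1 sums for every file listed in a Sources file.

        Assumes the files have already been downloaded under pool_dir and
        that deb822.Sources exposes 'Files' as a list of dicts."""
        with open(sources_path) as f:
            for para in deb822.Sources.iter_paragraphs(f):
                for entry in para['Files']:
                    path = os.path.join(pool_dir, para['Directory'], entry['name'])
                    sha1 = hashlib.sha1()
                    with open(path, 'rb') as pkg:
                        for chunk in iter(lambda: pkg.read(65536), b''):
                            sha1.update(chunk)
                    yield entry['name'], sha1.hexdigest()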