Below are some things that still need to be done, in order of priority.

More/better statistics in the client and tracker

The statistics viewable on the tracker's and client's web interfaces are not
very complete or useful. The client's downloaded-data numbers should not reset
on every restart, and should show the amount downloaded from peers vs. the
amount downloaded from the server. Also, more detailed info could be made
available by clicking on the torrent, perhaps showing a list of the connected
peers and how much each has downloaded. The tracker also needs to keep its
numbers over restarts, and should show more info on how much all peers have
downloaded/uploaded for each torrent.

Use python-debian for parsing RFC 822 files

There are already routines available in python-debian for parsing Packages
files, so use them instead of writing new ones (see the first sketch after
these items). The torrent creation routines and the hippy and uniquely
programs probably need changes for this.

Investigate using other archives

Some investigation needs to be done into how other archives work with the
current setup (i.e. without sub-piece data or other addons). Though they
should work fine (just less efficiently), others have reported problems with
them. Some archives to try are security.d.o, volatile.d.o,
debian-multimedia.org, and archive.ubuntu.com.

Add timeout checks for the created threads

The threads created to perform some tasks need timeout checks added to them.
This is especially true for the cache download threads, which (if hung) will
prevent any future cache downloads from the site (see the watchdog sketch
after these items).

Check if the torrent is already running before getting from the cache

When a Packages download results in a 304 Not Modified response because the
file in the cache is up to date with the server, the Packages file still has
to be uncompressed and turned into a torrent just to determine whether it is
currently running. This is time-consuming and unnecessary. A better solution
is to store some information about which Packages files belong to which
running torrents, and then check that.

Check for problems with the experimental saveas_style 2

Using the saveas_style = 2 option saves the files downloaded from the torrents
in the same directory structure as the files downloaded by proxy. This is more
efficient, since the files can be retrieved by the cacher without having to
check the running torrents for them. However, it could also cause problems if
a file has not yet completed downloading (it will fail the hash check). Some
investigation needs to be done into the possibility of this and how best to
check for it.

AptListener queue waiting for downloads should be using callbacks

AptListener currently queues requests for packages that have not completed
downloading, and then checks this queue every second. This could be made more
efficient by adding callbacks to PiecePicker or StorageWrapper, so that when a
piece comes in and passes the hash check, AptListener processes any queued
requests for that piece (see the callback sketch after these items).

Different forms of HTTP Downloading may open too many connections

There are a lot of different ways for the program to download from a mirror,
including the HTTPDownloader for pieces (one per torrent), the HTTPCache
downloader (one per program), etc. Most are threaded, and could end up opening
a lot of connections to a single mirror. A better implementation would manage
these connections in one place, so that they can be controlled together and a
maximum number of connections can be set (see the pool sketch after these
items).
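For the python-debian item above, this is a minimal sketch of what the parsing
could look like using the deb822 module (recent versions of python-debian
expose it as debian.deb822; very old versions used debian_bundle.deb822). The
field names are standard Packages fields; error handling is omitted:

    # Parse a Packages file with python-debian instead of hand-rolled
    # RFC 822 parsing.
    from debian import deb822

    with open('Packages') as f:
        for pkg in deb822.Packages.iter_paragraphs(f):
            name = pkg['Package']
            filename = pkg['Filename']
            size = int(pkg['Size'])
            sha1 = pkg.get('SHA1')  # not every archive includes SHA1
            print(name, filename, size, sha1)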
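For the thread timeout item above, one possible shape for a watchdog; the
function name and the 300-second limit are assumptions, not existing code:

    import threading

    def run_with_timeout(target, timeout=300):
        # Run target in a worker thread and treat it as hung if it
        # has not finished within timeout seconds.
        worker = threading.Thread(target=target)
        worker.daemon = True  # a hung worker won't block interpreter exit
        worker.start()
        worker.join(timeout)
        if worker.is_alive():
            # Report failure so future cache downloads from the site
            # are not blocked behind the hung thread.
            return False
        return True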
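For the AptListener callback item above, a rough sketch of the idea; the class
and method names here are hypothetical and would need to be wired into
PiecePicker or StorageWrapper:

    class PieceNotifier(object):

        def __init__(self):
            self._waiters = {}  # piece index -> list of callbacks

        def wait_for_piece(self, index, callback):
            # Called by AptListener for a piece that has not yet
            # finished downloading.
            self._waiters.setdefault(index, []).append(callback)

        def piece_complete(self, index):
            # Called by the storage layer once the piece has passed
            # its hash check; fires any queued requests for it.
            for callback in self._waiters.pop(index, []):
                callback(index)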
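For the HTTP connections item above, a hedged sketch of capping per-mirror
connections with a semaphore; the class and the default limit of 4 are
invented for illustration:

    import threading

    class MirrorConnectionPool(object):

        def __init__(self, max_connections=4):
            self._sems = {}
            self._lock = threading.Lock()
            self.max_connections = max_connections

        def _sem_for(self, mirror):
            with self._lock:
                if mirror not in self._sems:
                    self._sems[mirror] = threading.BoundedSemaphore(
                        self.max_connections)
                return self._sems[mirror]

        def acquire(self, mirror):
            # Blocks while the mirror already has max_connections
            # downloads in flight.
            self._sem_for(mirror).acquire()

        def release(self, mirror):
            self._sem_for(mirror).release()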
Don't ignore Packages.diff

Currently requests for Packages.diff return a 404 Not Found error, as the
AptListener doesn't yet know how to process them. This should be improved to
store a cached copy of the Packages file, apply the diffs to it as they
download, and then use the final file as the torrent file. Security checking
of the hashes of the diffs or the final file may be necessary.

Make the URL download better

The URL download is currently a little hacky, and could be improved. The
urlopen class could be made into an iterator, and could decompress files with
the ".gz" extension itself (see the sketch after these items).

Pre-allocating files is necessary

In order for the download to work properly, the allocation method must be set
to pre-allocate (normal doesn't work, and the others probably don't either).
This is because the pieces are no longer all the same size, so data cannot be
moved around between them as it was previously. This may not be an issue once
a maximum piece size is introduced, though pre-allocation may still be
necessary to serve downloaded files while other downloads continue.
Pre-allocation with priorities enabled does not pre-allocate the entire
archive. Later, when multiple pieces per file are used, an allocation method
could be used within a file.

Statistics for the swarm are incorrect

The statistics for file completion in the swarm are incorrect. This is due to
using the number of pieces each peer has as a measure of their completion;
since some pieces are very small and some very large, this is no longer
accurate. This may not need fixing, though, as piece sizes will become more
uniform in later versions, and these statistics may not be needed.

Switch to using the poll() system call instead of the select module

The poll() system call, supported on most Unix systems, provides better
scalability for network servers that service many clients at the same time.
poll() scales better because the system call only requires listing the file
descriptors of interest, while select() builds a bitmap, turns on bits for the
fds of interest, and then afterward the whole bitmap has to be linearly
scanned again: select() is O(highest file descriptor), while poll() is
O(number of file descriptors). (A small illustration of the poll() pattern
appears after these items.)

Consider Sources

Sources files contain 2 or 3 files for every source package, but no SHA1 sums
for any of them. These sums (or, preferably, sub-file sums) would have to be
generated before anything can be done (see the hashing sketch below).
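For the Sources item just above, generating the missing SHA1 sums is
straightforward with hashlib; sub-file sums would additionally need a chunking
scheme on top of this sketch:

    import hashlib

    def sha1_of_file(path, chunk_size=64 * 1024):
        # Hash the file in chunks so large source tarballs don't
        # have to be read into memory at once.
        h = hashlib.sha1()
        with open(path, 'rb') as f:
            for chunk in iter(lambda: f.read(chunk_size), b''):
                h.update(chunk)
        return h.hexdigest()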
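For the "Make the URL download better" item above, a sketch of the iterator
idea; the function name is invented, and a production version should
decompress the stream incrementally rather than buffering it:

    import gzip
    import io
    import urllib.request

    def open_url(url):
        # Return a line iterator over the resource, transparently
        # decompressing ".gz" files.
        response = urllib.request.urlopen(url)
        if url.endswith('.gz'):
            # Buffered for brevity; fine for Packages-sized files.
            data = io.BytesIO(response.read())
            return iter(gzip.GzipFile(fileobj=data))
        return iter(response)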
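For the poll() item above, a minimal illustration of the pattern using
Python's select module; note that select.poll() is not available on all
platforms (e.g. Windows), so a select() fallback may still be needed:

    import select
    import socket

    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(('127.0.0.1', 0))
    server.listen(5)

    poller = select.poll()
    # Only the fds of interest are registered; no bitmap to scan.
    poller.register(server.fileno(), select.POLLIN)
    fd_to_sock = {server.fileno(): server}

    # One iteration of the event loop; a real server loops forever.
    for fd, event in poller.poll(1000):  # timeout in milliseconds
        sock = fd_to_sock[fd]
        if sock is server:
            conn, addr = server.accept()
            fd_to_sock[conn.fileno()] = conn
            poller.register(conn.fileno(), select.POLLIN)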