File: TODO

Below are some things that still need to be done, in order of priority.


More/better statistics in the client and tracker

The statistics viewable on the tracker and client's web interfaces are 
not very complete or useful. The client's downloaded data numbers should 
not reset on every restart, and should show the amount downloaded from 
peers vs. the amount downloaded from the server. Also, more detailed 
info could be available by clicking on the torrent, perhaps showing a 
list of the connected peers, and how much they have downloaded. The 
tracker also needs to keep its numbers over restarts, and should show 
more info on how much all peers have downloaded/uploaded for each 
torrent.


Use python-debian for parsing RFC 822 files

There are already routines available in python-debian for parsing
Packages files, so use them instead of writing new ones. The torrent
creation routines and the hippy and uniquely programs probably need
changes for this.
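
As a rough sketch of what python-debian already provides (its
deb822.Packages.iter_paragraphs does this and more), the parsing being
replaced amounts to splitting a Packages file into blank-line-separated
RFC 822 paragraphs; the helper name and sample data below are
illustrative only:

```python
import io

def iter_paragraphs(fileobj):
    """Yield each blank-line-separated RFC 822 paragraph as a dict.

    Continuation lines (starting with space or tab) are appended to the
    previous field, as in Debian control files. Hypothetical stand-in
    for deb822.Packages.iter_paragraphs from python-debian.
    """
    fields = {}
    last = None
    for line in fileobj:
        line = line.rstrip("\n")
        if not line.strip():
            # Blank line ends the current paragraph.
            if fields:
                yield fields
                fields, last = {}, None
            continue
        if line[0] in " \t" and last is not None:
            fields[last] += "\n" + line.strip()
        else:
            name, _, value = line.partition(":")
            last = name
            fields[name] = value.strip()
    if fields:
        yield fields

sample = io.StringIO(
    "Package: debtorrent\nVersion: 0.1.9\n\nPackage: apt\nVersion: 0.7\n"
)
pkgs = list(iter_paragraphs(sample))
```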


Investigate using other archives

Some investigation needs to be done of how other archives will work with 
the current setup (i.e. without sub-piece data or other add-ons). Though 
they should work fine (just less efficiently), others have reported 
problems with them. Some archives to try are security.d.o, volatile.d.o, 
debian-multimedia.org, and archive.ubuntu.com.


Add timeout checks for the created threads

The threads created to perform some tasks need to have timeout checks 
added for them. This is especially true for the cache download threads, 
which (if hung) will prevent any future cache downloads from the site.
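
One possible shape for such a check, assuming the worker can simply be
abandoned when it overruns (Python threads cannot be killed outright, so
a daemon thread is left behind rather than joined forever); the helper
name is hypothetical:

```python
import threading
import time

def run_with_timeout(target, timeout, *args):
    """Run target in a daemon thread; report whether it finished in time.

    A hung thread cannot be killed in Python, but marking it daemon and
    abandoning it lets new cache downloads proceed instead of blocking
    forever behind it.
    """
    t = threading.Thread(target=target, args=args, daemon=True)
    t.start()
    t.join(timeout)
    return not t.is_alive()

ok = run_with_timeout(lambda: time.sleep(0.01), timeout=1.0)
hung = run_with_timeout(lambda: time.sleep(5), timeout=0.05)
```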


Check if the torrent is already running before getting from the cache

When a Packages download results in a 304 not modified message because 
the file is in the cache and up to date with the server, the Packages 
file still needs to be uncompressed and turned into a torrent to 
determine if it is currently running or not. This is time consuming and 
unnecessary. A better solution is to store some information about which 
Packages files belong to which running torrents, and then check that.


Check for problems with the experimental saveas_style 2 

Using the saveas_style = 2 option saves the files downloaded from the 
torrents in the same directory structure as the files downloaded by 
proxy. This is more efficient, since the files can be retrieved by the 
cacher without having to check the running torrents for them. However, 
it could also cause problems if the file has not yet completed 
downloading (it will fail the hash check). Some investigation needs to 
be done of the possibility of this and how best to check for it.


AptListener queue waiting for downloads should be using callbacks

AptListener currently queues requests for packages that have not 
completed downloading, and then polls this queue once per second. This 
could be made more efficient by adding callbacks to PiecePicker or 
StorageWrapper, so that when a piece comes in and passes the hash check, 
then the AptListener will process any queued requests for that piece.
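
The callback pattern could look something like this (class and method
names are hypothetical; in the real code StorageWrapper would call
piece_complete after the hash check passes):

```python
from collections import defaultdict

class PieceCallbacks:
    """Sketch of piece-completion callbacks replacing the 1 s poll.

    Queued AptListener requests register a callback keyed by piece
    index; the storage layer fires it once the piece passes its hash
    check, so no periodic queue scan is needed.
    """

    def __init__(self):
        self._waiting = defaultdict(list)

    def wait_for(self, piece, callback):
        self._waiting[piece].append(callback)

    def piece_complete(self, piece):
        # Fire and discard all callbacks waiting on this piece.
        for cb in self._waiting.pop(piece, []):
            cb(piece)

served = []
cbs = PieceCallbacks()
cbs.wait_for(7, served.append)   # a queued request for piece 7
cbs.piece_complete(7)            # hash check passed
```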


Different forms of HTTP Downloading may open too many connections

There are a lot of different ways for the program to download from a 
mirror, including the HTTPDownloader for pieces (one per torrent), the 
HTTPCache downloader (one per program), etc. Most are threaded, and 
could end up opening a lot of connections to a single mirror. A better 
implementation would be to manage these connections all in one place, 
so that they can be controlled together and a max number of connections 
could be set.
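
A minimal central limiter, assuming one bounded semaphore per mirror
(the class name and the per-mirror cap of 4 are illustrative choices):

```python
import threading
from collections import defaultdict
from contextlib import contextmanager

class MirrorConnectionLimiter:
    """Cap concurrent connections per mirror with one semaphore each.

    All downloaders (the piece HTTPDownloader, the HTTPCache downloader,
    and so on) would acquire a slot here before opening a socket.
    Hypothetical sketch, not the existing API.
    """

    def __init__(self, max_per_mirror=4):
        self._sems = defaultdict(
            lambda: threading.BoundedSemaphore(max_per_mirror))

    @contextmanager
    def connection(self, mirror):
        sem = self._sems[mirror]
        sem.acquire()
        try:
            yield
        finally:
            sem.release()

limiter = MirrorConnectionLimiter(max_per_mirror=2)
with limiter.connection("ftp.debian.org"):
    pass  # open socket, download, close
```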


Don't ignore Packages.diff

Currently requests for Packages.diff return a 404 not found error, as 
the AptListener doesn't yet know how to process these. This should be 
improved to store a cached copy of the Packages file, and apply the 
diffs to it as they download, then use the final file as the torrent 
file. Security checking of the hashes of the diffs or the final file may 
be necessary.


Make the URL download better

The URL download is currently a little hacky, and could be improved. 
The urlopen class could be made into an iterator, and could decompress 
files with the ".gz" extension itself.
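
The intended shape might be something like the following sketch, where
the raw bytes stand in for what an actual HTTP fetch would return (the
function name is hypothetical):

```python
import gzip
import io

def open_maybe_gzipped(url, data):
    """Yield decoded lines, transparently decompressing ".gz" URLs.

    `data` stands in for the raw bytes an HTTP fetch would return; a
    real implementation would wrap urllib's response object the same
    way, making the download an iterator over lines.
    """
    raw = io.BytesIO(data)
    if url.endswith(".gz"):
        raw = gzip.GzipFile(fileobj=raw)
    for line in raw:
        yield line.decode("utf-8")

payload = gzip.compress(b"Package: debtorrent\n")
lines = list(open_maybe_gzipped("http://example.org/Packages.gz", payload))
```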


Pre-allocating files is necessary

In order for the download to work properly, the allocation method must 
be set to pre-allocate (normal doesn't work, the others probably don't). 
This is due to the pieces no longer being the same size, and so data 
cannot be moved around between them like it was previously. This may not 
be an issue after a maximum piece size is introduced, though 
pre-allocation may still be necessary to serve downloaded files while 
other downloads continue. Pre-allocation with priorities enabled does 
not pre-allocate the entire archive. Later, when multiple pieces per 
file are used, an allocation method could be used within a file.


Statistics for the swarm are incorrect

The statistics for file completion in the swarm are incorrect. This is 
due to using the number of pieces each peer has as a measure of their 
completion. Since some pieces are very small, and some very large, this 
is no longer accurate. This may not need fixing though, as piece sizes 
will become more uniform in later versions, and these statistics may not 
be needed.


Switch to using the poll() system call instead of the select module

The poll() system call, supported on most Unix systems, provides better 
scalability for network servers that service many, many clients at the 
same time. poll() scales better because the system call only requires 
listing the file descriptors of interest, while select() builds a 
bitmap, turns on bits for the fds of interest, and then afterward the 
whole bitmap has to be linearly scanned again. select() is O(highest 
file descriptor), while poll() is O(number of file descriptors).
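
A minimal example of the poll() interface from Python's select module
(Unix only), showing the per-fd registration the paragraph describes:

```python
import select
import socket

# Register a listening socket with poll() and wait for readiness.
# Registration cost is per-fd, unlike select(), which rescans a bitmap
# up to the highest fd on every call.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)

poller = select.poll()
poller.register(server.fileno(), select.POLLIN)

# A local connection makes the listening socket readable (accept pending).
client = socket.create_connection(server.getsockname())
events = poller.poll(1000)  # timeout in ms; returns [(fd, eventmask), ...]
ready_fds = [fd for fd, ev in events]

client.close()
server.close()
```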


Consider Sources

Sources files contain 2 or 3 files for every source package, but no 
SHA1 sums for any of them. These sums would have to be generated, or 
preferably sub-file sums, before anything can be done.
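
Generating the missing sums themselves is straightforward with hashlib;
the open question above is where to do it and whether to hash sub-file
pieces instead of whole files:

```python
import hashlib

def sha1_hex(data):
    """Compute the SHA1 hex digest that Sources files currently lack.

    In practice this would read each source package's files (.dsc,
    .orig.tar.gz, .diff.gz) from a mirror; here the bytes are passed in
    directly for illustration.
    """
    return hashlib.sha1(data).hexdigest()

digest = sha1_hex(b"hello")
```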