puf  --  TODO
-------------

Parts denoted with ! discuss (the benefit of) a feature.
This list is supposed to be sorted roughly by implementation order 
(not necessarily by importance).

+  lower memory usage
   - use a tree structure
   - optimize alignments
   - don't use sizeof for allocations, as it is rounded up to the struct
     alignment (see the sketch after this list)
+  redirs other than appending '/' should create a symlink
+  Wget-like -k switch. this should also rewrite file extensions - .php is
   pretty pointless in a local copy.
+  Wget-like -I & -X switches
+  make select() queues for -i files -> async url list input
+  spread requests over both -ib and -iy, not only -ib via -iy. opt -xm.
+  simple machine-parsable logging format
+  support multioffset
+  better Proxy support:
   - make CGI-proxies actually work; escape some chars ("@" is common)
   - proxy readiness wait queue
+  handle content-/transfer-encodings
+  header templates for client faking
+  cookie handling
+  robots.txt handling
+  support SSL
 ! does anybody do recursive or parallel fetching from secure sites?
+  support FTP
+  write better documentation
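
The sizeof note above, sketched: sizeof() is rounded up to the struct's
alignment, so it also counts the trailing padding, which adds up when many
small, string-tailed nodes are allocated. The struct below is hypothetical
and only illustrates the effect; puf's real node layout differs.

    #include <stddef.h>
    #include <stdio.h>

    struct node {
        void *parent;          /* 8 bytes on a typical 64-bit host      */
        unsigned short flags;  /* 2 bytes                               */
        char name[1];          /* variable-length, NUL-terminated tail  */
    };

    int main(void)
    {
        /* rounded up to the alignment of 'parent', padding included */
        printf("sizeof:   %zu\n", sizeof(struct node));
        /* bytes actually needed before the name tail                */
        printf("offsetof: %zu\n", offsetof(struct node, name));
        return 0;
    }

Allocating offsetof(struct node, name) + strlen(name) + 1 bytes per node
avoids paying the per-node padding.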

NOT TODO
--------

-  support gopher

IDEAS
-----

consider huffman-encoding url fragments.

nuke request queue. instead, put dirty marks on not-yet-fetched leaves and
dirty sub-node counts on non-leaf nodes. the tree would be traversed until
the dirty counts are all zero (a sketch follows the considerations below).
there are not that many different url param blocks, so put them in a hash
indexed by "key nodes" and look them up while traversing the nodes.
considerations:
- summed up, are lingering inlined urld_t's really more efficient than a
  separate queue that shrinks?
- if not, could file_t's be shrunk and relocated?
  - take care of references, like referer backlinks
  - freeing in pools does not work, so make per-directory pools which are
    shrunk at once
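
A minimal sketch of the dirty-count traversal, with hypothetical node and
callback names (puf's actual tree types are not shown here):

    struct tnode {
        struct tnode *child;    /* first child                           */
        struct tnode *sibling;  /* next sibling on the same level        */
        unsigned dirty;         /* leaf: 1 = not fetched yet;            */
                                /* inner node: dirty leaves below it     */
    };

    /* visit only subtrees that still contain unfetched leaves */
    void walk_dirty(struct tnode *n, void (*fetch)(struct tnode *))
    {
        for (; n; n = n->sibling) {
            if (!n->dirty)
                continue;            /* whole subtree already done       */
            if (!n->child)
                fetch(n);            /* dirty leaf: issue the request    */
            else
                walk_dirty(n->child, fetch);
        }
    }

fetch() would clear the leaf's dirty mark and decrement the counts up to
the root once the download finishes, so the walk ends when all counts
reach zero.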

param block switches work in a separate linear queue as well; flag presence
of switch with a bit.
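
For the presence bit, a minimal sketch with a hypothetical switch numbering
(the real option indices live elsewhere in puf):

    enum { SW_TIMEOUT, SW_RETRIES, SW_DEPTH /* ... one index per switch */ };

    struct param_block {
        unsigned long sw_present;   /* one presence bit per switch       */
        /* the switch values themselves live in the linear queue         */
    };

    void set_switch(struct param_block *pb, int sw)
    {
        pb->sw_present |= 1UL << sw;
    }

    int has_switch(const struct param_block *pb, int sw)
    {
        return (pb->sw_present >> sw) & 1UL;
    }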

use fewer pointers in tree:
- no next ptrs: serialize lists into (fragmented) per-hierarchy-level pools.
  -> search/add performance will suffer, extremely so with name compression.
  unless memory is reserved (=wasted), the pools would have to be constantly
  resized for additions -> relocation problem, again.
- per-hierarchy-level parent ptrs. determine current stream with binary search
  (cache last hit(s)).
  -> quite slow

(by default,) don't save referers for on-host refs - use parent dir instead.

possibly save redirection referers for cloaking redirs in -xO dump.

rethink where -xh headers are saved. consider partial downloads.

merge http_req.c & http_rsp.c to http.c.
extract stuff from http_conn.c & http_rsp.c to file.c.

decouple disposition from aurl. pool open multi-src dispositions.

ref-count all kinds of option sub-structs; dispose of them once the last
reference is gone.
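
A minimal refcounting sketch, using a hypothetical auth sub-struct as the
example (names are illustrative, not puf's real ones):

    #include <stdlib.h>

    struct opt_auth {
        int refs;
        char *user, *pass;
    };

    struct opt_auth *opt_auth_get(struct opt_auth *a)
    {
        a->refs++;                 /* a url starts referencing it        */
        return a;
    }

    void opt_auth_put(struct opt_auth *a)
    {
        if (--a->refs == 0) {      /* last reference gone: dispose now   */
            free(a->user);
            free(a->pass);
            free(a);
        }
    }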

optimize adden() by "finalizing" options only if no urls ref them yet.

rethink path shortening magic - it clashes when multiple urls are given on
the command line.

coalesce identical auths - there can be plenty of them from a huge -i file.

use threads instead of processes for dns helpers.
use an async dns resolver lib.
store dns ttls.
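
A minimal sketch of a threaded resolver helper, assuming plain pthreads and
the standard getaddrinfo(); handing the result back to puf's connection code
is left out:

    #include <pthread.h>
    #include <sys/socket.h>
    #include <netdb.h>
    #include <string.h>
    #include <stdio.h>

    struct dns_job {
        const char *host;
        struct addrinfo *result;   /* filled in by the worker thread     */
        int error;                 /* getaddrinfo() return code          */
    };

    static void *dns_worker(void *arg)
    {
        struct dns_job *job = arg;
        struct addrinfo hints;

        memset(&hints, 0, sizeof(hints));
        hints.ai_socktype = SOCK_STREAM;
        job->error = getaddrinfo(job->host, "http", &hints, &job->result);
        return NULL;
    }

    int main(void)
    {
        struct dns_job job = { "www.example.org", NULL, 0 };
        pthread_t tid;

        /* one thread per lookup only for illustration; a helper would   */
        /* keep a small pool of threads and a queue of pending lookups   */
        pthread_create(&tid, NULL, dns_worker, &job);
        pthread_join(&tid, NULL);
        if (job.error)
            fprintf(stderr, "lookup failed: %s\n", gai_strerror(job.error));
        else
            freeaddrinfo(job.result);
        return 0;
    }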

maybe ignore -l for requisites. otoh, frames are considered requisites as
well, and for them we certainly want neither zero nor unlimited recursion.
the proper solution is to parse the tags properly to know what needs to be
recursed; this is important for -A as well.
frames should be considered both links and requisites, actually.

-l will lead to different results depending on which route was taken to a page.
fix: re-do the recursion decision on every addition attempt ...
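
A minimal sketch of re-doing the decision, assuming a hypothetical per-node
field holding the remaining recursion budget:

    struct file_node {
        int depth_left;            /* best (largest) budget seen so far  */
        /* ... */
    };

    /* called for every link found; returns non-zero if the children of  */
    /* 'f' must be (re)considered because a shorter route was found      */
    int update_depth(struct file_node *f, int parent_depth_left)
    {
        int d = parent_depth_left - 1;

        if (d <= f->depth_left)
            return 0;              /* already reached via an equal or    */
                                   /* shorter route: nothing changes     */
        f->depth_left = d;         /* budget grew, so descendants may    */
        return 1;                  /* now fall within -l again           */
    }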

anonymous index.html discovery; symlink to proper file. use ETag for this.

consider nuking au->reloc; use au->http_result_code directly.

add -xD switch: "Regard \"Disposition:\" HTTP headers"

rethink -Q: downloaded or written bytes?

Basic auth should be automatically sent for subdirs.