Priority: Best Effort
---------------------
o Remind users to set ReportPrefixFormat wisely. E.g. if you have a
class B network, and 256 subnets, configuring ReportPrefixFormat to
"%H:%M" in "SubNetIO.cf" (to preserve only one day's worth of
reports) will create 73,728 HTML files!
o Some of CampusIO's options can't be used properly with multiple
exporters. Perhaps the configuration should be changed so that
you must specify the export IP address with each value in
OutputIfIndexes and WebProxyIfIndex. E.g.
border1.our.domain:1, border1.our.domain:2, border2.our.domain:3
o For LFAP/slate/lfapd flow records:
o Add an option to lfapd to be able to specify how many bytes to
subtract per packet. This is necessary because LFAP appears
to include the layer-two header/frame in the packet size,
whereas NetFlow counts just the IP header and payload.
o Have lfapd pay attention to the LFAP timestamps and discard
any flows that are not within a certain tolerance (in seconds) of
the current time. (This is similar to Tobi's "X-Files-Factor"
with RRDtool.) Currently, if you shut down sfas for a while,
it can write very old flows to "flows.current" because the
router might send old flows that couldn't be sent in a timely
fashion while sfas was down. (During testing, I actually
got flows in "flows.current" that were about 24 hours old!)
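A minimal sketch of that tolerance check; the (timestamp, flow) record layout and the 900-second default are assumptions, not lfapd's actual representation:

```python
import time

def fresh_flows(flows, tolerance_secs=900, now=None):
    """Discard flows whose LFAP timestamp falls outside +/- tolerance
    of the current time.

    `flows` is an iterable of (timestamp, flow) pairs -- a hypothetical
    stand-in for whatever record lfapd actually carries.
    """
    if now is None:
        now = time.time()
    return [flow for ts, flow in flows if abs(now - ts) <= tolerance_secs]
```

Flows resent by the router after an sfas outage would then be dropped instead of landing in "flows.current" a day late.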
o Figure out what's causing the huge spikes in FlowScan graphs
based on bytes when config changes are made while LFAP is
running. For instance, whenever I change the rate-limit on an
interface running LFAP, I get a huge spike of traffic. Since
there isn't a dramatic spike in the number of flows, it seems
as though LFAP might be sending some huge pkt/byte update
values?
o Use SNMP_Session to collect the ifNames so that users can use
ifNames rather than ifIndexes to specify the OutputIfIndexes and
WebProxyIfIndex values in CampusIO.cf.
o If a large flood (such as a DoS) of TCP ACK packets with
dynamically forged src/dst addresses is destined for port 21 (ftp),
it causes %CampusIO::FTPSession to grow without bound. In one such
DoS, I saw the flowscan process grow to >300MB in size, and it
seemed to stop functioning, blocked in an "uninterruptible
sleep" under Linux, e.g.:
2000/11/11 11:20:26 %CampusIO::FTPSession -> 683/65536
2000/11/11 11:25:02 %CampusIO::FTPSession -> 59362/131072
2000/11/11 11:25:03 %CampusIO::FTPSession -> 59227/131072
2000/11/11 11:32:13 %CampusIO::FTPSession -> 424790/1048576
2000/11/11 11:32:20 %CampusIO::FTPSession -> 424633/1048576
2000/11/11 11:46:50 %CampusIO::FTPSession -> 591817/1048576
2000/11/11 13:02:48 %CampusIO::FTPSession -> 591723/1048576
This needs to be addressed, perhaps by suppressing maintenance of
these hash/cache data objects once they reach a certain size, or
perhaps just invoking the purge algorithm from within CampusIO's
wanted function whenever the hash gets too large. (I don't think
Net::Patricia will really help here as a Patricia Trie, while
smaller than a hash, will become very large too.)
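One way to cap the session table, sketched below. The policy (evict the oldest entries once a high-water mark is hit) is one of the options mentioned above, not CampusIO's actual purge algorithm, and the class name is hypothetical:

```python
from collections import OrderedDict

class BoundedSessionCache:
    """Session table that evicts its oldest entries once it exceeds
    max_size, so a forged-address flood can't grow it without bound."""

    def __init__(self, max_size=65536):
        self.max_size = max_size
        self.sessions = OrderedDict()

    def add(self, key, value):
        # (re)insert and mark as most recently seen
        self.sessions[key] = value
        self.sessions.move_to_end(key)
        # purge oldest entries past the high-water mark
        while len(self.sessions) > self.max_size:
            self.sessions.popitem(last=False)
```

Under a spoofed-ACK DoS the cache would then plateau at max_size entries rather than driving the process past 300MB.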
o Jeff B. suggested that maybe we can detect suspected TCP
retransmissions (due to packet drops from rate-limits) based on an
imbalance in the number of inbound and outbound packets in a TCP
flow.
Perhaps we can match up pairs of TCP flows (that occur in the same
5 minute flow file) that have the same address/port pairs.
Limiting this to just flows that have SYN|ACK|FIN is probably
sufficient, then report discrepancies between the # of packets in
one direction vs. the other. (This means retransmissions probably
happened and may be very interesting to correlate with dropped
packets based on CAR stats.)
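Jeff B.'s pairing idea might look like this; the flat (src, sport, dst, dport, pkts) tuples are a hypothetical stand-in for FlowScan's real flow records, and the SYN|ACK|FIN pre-filter is assumed to have been applied already:

```python
def packet_imbalances(flows):
    """Pair each TCP flow with its reverse-direction twin (same
    address/port pairs, endpoints swapped) within one 5-minute flow
    file, and report the packet-count discrepancy for each pair.

    `flows` is a list of (src, sport, dst, dport, pkts) tuples.
    """
    by_tuple = {(s, sp, d, dp): pkts for s, sp, d, dp, pkts in flows}
    seen, report = set(), {}
    for (s, sp, d, dp), pkts in by_tuple.items():
        rev = (d, dp, s, sp)
        if rev in by_tuple and (s, sp, d, dp) not in seen:
            seen.update({(s, sp, d, dp), rev})
            # a large imbalance suggests retransmissions (packet drops)
            report[((s, sp), (d, dp))] = abs(pkts - by_tuple[rev])
    return report
```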
o Change graphs Makefile ("graphs.mf.in") to do calculations in
bits-per-second rather than megabits-per-second since RRDtool does
a nice job of displaying things with the appropriate metric
abbreviation on its own.
Priority: LBE
-------------
o Fix missing 554*.rrd problem that some folks saw with
FlowScan-1.005. (For the time being the workaround is to create it
manually with "rrdtool create" as posted to the mailing list.)
o Add ICMPTypes option in CampusIO?
This won't work with LFAP because it does not include ICMP
type/code info in its flows.
o Write a new AutoAS report. This will assume that peer-as is
configured (so that we won't get too many AS src/dst pairs) and
will automatically create RRD files for them. The list of RRD
files that are updated after processing each raw flow file should
be the entire set of all AS RRD files that exist, not just those
AS pairs for which traffic was seen during this sample. Then
we can use a utility like "maxfetch" to determine the most active
AS pairs and automagically graph them (without using the graphs.mf
Makefile technique). Perhaps the graph colors should be based on
the 8(?) gnuplot default colors.
o Add flowscan.rrd, flowscan_cpu.rrd functionality into "flowscan"
script. (This should be configurable via an option since it
makes FlowScan require RRDtool even when used w/o CampusIO.)
These RRD files contain performance info about FlowScan itself.
"flowscan.rrd" should contain:
bytes, pkts, flows,
and perhaps some stuff about caches such as:
realservers, napservers, ftppasv, etc.
"flowscan_cpu.rrd" should contain:
find_real, find_user, find_sys,
report_real, report_user, report_sys,
report_latesecs
o Attempt to identify other collaborative file-sharing apps such
as scour or gnutella, which have no central rendezvous server(s).
SX (Scour eXchange) - http://sx.scour.com/
SX spec: http://sx.scour.com/stp-1.0pre6.html
psx (Perl Scour eXchange) http://sixpak.cs.ucla.edu/psx/, http://psx.sourceforge.net
gnapster - http://download.sourceforge.net/gnapster/
Gnutella Homepage - http://gnutella.wego.com
gnutella protocol spec: http://gnutella.wego.com/go/wego.pages.page?groupId=116705&view=page&pageId=119598&folderId=116767&panelId=-1&action=view
Knowbuddy FAQ - http://www.rixsoft.com/Knowbuddy/gnutellafaq.html
o Make the "--step" time configurable (according to the flowscan wait
time). Currently, even though "flowscan.cf" seems to indicate
that it's configurable, it probably makes no sense to change
"WaitSeconds" (or "-s" on the cflowd command line) because
"--step 300" is hard-coded in "CampusIO.pm".
o Fix CampusIO.pm regarding ':' in ".rrd" file names
Perhaps this should be written as a patch to RRDTOOL so that it
handles ":" in file names?
Currently, RRD files for the configured ASPairs contain a ':' in
the file name. This is apparently a no-no with RRDtool since,
although it allows you to create files with these names, it doesn't
let you create graphs using them because of how the API uses ':' to
separate arguments.
For the time being, if you want to graph AS information, you must
manually create symbolic links in your graphs sub-dir, i.e.:
$ cd graphs
$ ln -s 0:42.rrd Us2Them.rrd
$ ln -s 42:0.rrd Them2Us.rrd
Perhaps the simple fix is to do what packages such as Cricket do,
i.e. change the ':' to '_'.
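The Cricket-style fix amounts to one substitution when composing the file name (function name hypothetical):

```python
def safe_rrd_name(name):
    """Map ':' to '_' (as Cricket does) so that RRDtool's
    colon-delimited argument parser doesn't choke on the file name."""
    return name.replace(":", "_")
```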
o Fix "flowscan" and its rc script so that "/etc/init.d/flowscan
stop" doesn't kill flowscan in a "critical section". Although I
haven't seen it happen, I think if the timing is off it could
kill(1) flowscan during RRD operations, possibly resulting in a
corrupt ".rrd" file. This should probably be implemented by having
the script "ask" flowscan to shutdown ASAP - possibly by
creat(2)ing a file or writing into a fifo. Then flowscan should
check for this signal before it starts RRD updates. It should also
be, of course, able to be interrupted for shutdown while it's
sleeping.
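The flag-file variant of that "ask to shut down" handshake might look like this; the stop-file path is a hypothetical choice, and a fifo would work similarly:

```python
import os

# hypothetical path; the rc script would creat(2) this on "stop"
STOP_FILE = "/var/run/flowscan.stop"

def shutdown_requested(stop_file=STOP_FILE):
    """True if the rc script has asked flowscan to exit.  flowscan
    would call this before starting RRD updates (and while sleeping),
    so it is never killed mid-update with a half-written .rrd file."""
    if os.path.exists(stop_file):
        os.unlink(stop_file)  # acknowledge the request
        return True
    return False
```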
o Allow flowscan logfile to be specified in "flowscan.cf". e.g.:
LogFile /var/log/flowscan.log
Then have flowscan open this and dup/select it for both STDOUT and
STDERR to catch warnings from reporting packages. Have flowscan
periodically rename the log file, and open a new one (every day or
whatever) so that we don't have to shut-down flowscan to trim the
log file.
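A sketch of the redirect-and-rotate idea; the dated-path template is an assumption (the real name would come from LogFile in "flowscan.cf"), and re-invoking this daily gives rotation without restarting flowscan:

```python
import sys
import time

def open_log(template="/var/log/flowscan.%Y%m%d.log"):
    """Open a dated log file and point both STDOUT and STDERR at it,
    so warnings from the reporting packages are captured too.
    Returns the path actually opened."""
    path = time.strftime(template)
    log = open(path, "a", buffering=1)  # line-buffered append
    sys.stdout = sys.stderr = log
    return path
```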
o ? Unify configuration files so that we don't need to redundantly
specify things like "OutputDir" in the configuration file for each
report class. Perhaps introducing a "FlowScan.cf" would suffice,
and it would be accessed in the report packages as
$self->{FlowScan}{OutputDir}.
o Add "by Application" graphs (Mbps, pkts, flows) to "graphs.mf.in"
which show I/O by applications such as web client (http_src in +
http_dst out + https_src in + https_dst out), web server (http_src
out + http_dst in + https_src out + https_dst in), news (nntp),
file transfer (ftp (+nfs?)), email (smtp + pop + imap), Napster
(NapUser + NapUserMaybe), RealMedia (Real), MCAST, and unknown
(based on subtracting from total). It would be nice if this graph
split it out by in and out.
Once this graph is done, "RealServer I/O" should be taken out of
the "Well Known Services" graphs.
o Write a new "FlowDivert" report which controls how flows are saved
by diverting them to the files specified in this report's
configuration.
Note that Jay Ford <jay-ford@uiowa.edu> has essentially this. See
the discussion in the flowscan mailing list archive. (Nov 2, 2000)
If source and destination address were the only selection criteria
allowed, a sample "FlowDivert_subnets.boulder" file might look like
this (note that a specific host can be specified as a "/32" subnet):
SUBNET=10.42.42.42/32
DESCRIPTION=our interesting host
SAVEDIR=saved/host/our_host
=
SUBNET=10.0.1.0/24
DESCRIPTION=our first subnet
SAVEDIR=saved/subnet/first
=
SUBNET=10.0.2.0/24
DESCRIPTION=our second subnet
SAVEDIR=saved/subnet/second
Alternatively, the entries in the configuration file could have
arbitrary bits of perl code to be evaluated (like the expression to
"flowdumper -e <expr>"), but I'm scared that that could be slow.
E.g. "FlowDivert.boulder":
SAVEDIR=saved/host/our_host
DESCRIPTION=our interesting host
EXPR=unpack("N", inet_aton("10.42.42.42")) == $srcaddr || unpack("N", inet_aton("10.42.42.42")) == $dstaddr
=
SAVEDIR=saved/subnet/our_subnet
DESCRIPTION=our subnet
EXPR=unpack("N", inet_aton("10.0.1.0")) == (0xffffff00 & $srcaddr) || unpack("N", inet_aton("10.0.1.0")) == (0xffffff00 & $dstaddr)
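The cheaper subnet-only matching (no eval'd per-flow expressions) could be sketched as below, mirroring the (SUBNET, SAVEDIR) pairs from the sample boulder file above; function names are hypothetical:

```python
import socket
import struct

def ip_to_int(dotted):
    """Dotted-quad address to a 32-bit integer (network byte order)."""
    return struct.unpack("!I", socket.inet_aton(dotted))[0]

def subnet_matcher(cidr):
    """Return a predicate testing whether an address falls inside
    `cidr`, e.g. "10.0.1.0/24"; a single host is just a /32 subnet."""
    net, bits = cidr.split("/")
    mask = 0xffffffff ^ ((1 << (32 - int(bits))) - 1)
    net_i = ip_to_int(net) & mask
    return lambda addr: (ip_to_int(addr) & mask) == net_i

def divert_dirs(entries, srcaddr, dstaddr):
    """Yield the SAVEDIRs whose SUBNET matches either endpoint of a
    flow.  `entries` is a list of (SUBNET, SAVEDIR) pairs taken from
    the FlowDivert configuration."""
    for cidr, savedir in entries:
        match = subnet_matcher(cidr)
        if match(srcaddr) or match(dstaddr):
            yield savedir
```

Precomputing the (mask, net) pairs once per flow file, rather than per flow, keeps this fast enough to avoid the eval'd-EXPR performance worry.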