
<!--startcut ======================================================= -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<META NAME="generator" CONTENT="lgazmail v1.2M.b">
<TITLE>The Linux Gazette 43: The Answer Guy</TITLE>
</HEAD><BODY BGCOLOR="#FFFFFF" TEXT="#000000"
LINK="#3366FF" VLINK="#A000A0">
<!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -->
<H4>"The Linux Gazette...<I>making Linux just a little more fun!</I>"</H4>
<P> <hr> <P>
<!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -->
<center>
<H1><A NAME="answer">
<img src="./../gx/dennis/qbubble.gif" alt="(?)"
border="0" align="middle">
<font color="#B03060">The Answer Guy</font>
<img src="./../gx/dennis/bbubble.gif" alt="(!)"
border="0" align="middle">
</A></H1>
<BR>
<H4>By James T. Dennis,
<a href="mailto:answerguy@ssc.com">answerguy@ssc.com</a><BR>
LinuxCare,
<A HREF="http://www.linuxcare.com/">http://www.linuxcare.com/</A>
</H4>
</center>
<p><hr><p>
<!-- endcut ======================================================= -->
<H3>Contents:</H3>
<p><a href="#tag/greeting"
><img src="./../gx/dennis/bbub.gif" alt="(!)" border="0"
align="middle"><strong>Greetings From Jim Dennis</strong></A></p>
<DL>
<!-- index_text begins -->
<dt><A HREF="tag/1.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
><strong>Hey answer guy!!!</strong></a>
<dt><A HREF="tag/2.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>One more thing. --or--
<dd><A HREF="tag/2.html"
><strong>Null Modems: Connecting MS-DOS to Linux as a
Serial Terminal</strong></a>
<dt><A HREF="tag/3.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>RedHat 5.2 Kernel 2.0.36 --or--
<dd><A HREF="tag/3.html"
><strong>Upgrade Breaks Several Programs,
<TT>/proc</TT> Problems, BogoMIPS Discrepancies</strong></a>
<br>A visit to "Library Hell"
<dt><A HREF="tag/4.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>Floppy/mount Problems: Disk Spins,
Lights are on, No one's Home? --or--
<dd><A HREF="tag/4.html"
><strong>Floppy Failure: mdir Works; mount Fails</strong></a>
<br>Found the culprit!
<dt><A HREF="tag/5.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>need your help --or--
<dd><A HREF="tag/5.html"
><strong>Incompetence in Parenting</strong></a>
<dt><A HREF="tag/6.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>bad clusters --or--
<dd><A HREF="tag/6.html"
><strong>Try Linux ... and Grammar</strong></a>
<dt><A HREF="tag/7.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>Duplicating / --or--
<dd><A HREF="tag/7.html"
><strong>Out of Space....or Inodes? All Sparsity Lost?</strong></a>
<dt><A HREF="tag/8.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>RAID 1 solutions --or--
<dd><A HREF="tag/8.html"
><strong>Arco Duplidisk: Disk Mirroring</strong></a>
<dt><A HREF="tag/9.html"
><img src="./../gx/dennis/qbub.gif" height="28" width="50"
alt="(?)" border="0"
></a>Modem Help --or--
<dd><A HREF="tag/9.html"
><strong>Searching for Days for a Linux Modem:
The Daze Continues</strong></a>
<!-- index_text ends -->
</DL>
<!-- .~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~. -->
<A NAME="tag/greeting"><HR WIDTH="75%" ALIGN="center"></A>
<H3 align="left"><img src="./../gx/dennis/bbubble.gif"
height="50" width="60" alt="(!) " border="0"
>Greetings from Jim Dennis</H3>
<!-- begin greeting -->
<p>So, my LG activity for this month is pretty sparse. Does
that mean that I haven't been involved in any Linux activity?
Does it mean that I'm not getting enough LG TAG e-mail?
</p><blockquote>
HARDLY!
</blockquote><p>
However, my work at <a href="http://www.linuxcare.com/">Linuxcare</a>
has been taking a pretty big bite
out of my time. In addition, the long drive up to the city
(from my house in Campbell to Linuxcare's offices in San Francisco
is about 50 miles) keeps me away from the keyboard for far too
long. (Yes, I'm looking for <em>cheap</em> digs up in the city to keep
me up there during the week).
</p><p>
Mostly I've been working with our training department, presenting
classes on Linux Systems Administration to our customers and our
new employees, and helping develop and refine the courseware around
which the classes are built.
</p><p>
I've also been watching the Linux news on the 'net with my usual
zeal.
</p><p>
The leading story this month seems to be
"<a href="http://www.mindcraft.com/whitepapers/openbench1.html"
>Mindcraft III</a> --- The Return of the Benchmarkers."
The results of the benchmarking tests aren't surprising. NT
with IIS still fared better on this particular platform under
<a href="http://www.kegel.com/mindcraft_redux.html">these test conditions</a>
than the Linux+Apache+Samba combination. The Linux 2.2.9 kernel and
the Apache 1.3.6 release seem to have closed almost half of the gap.
</p><p>
As I suggested last month, the most interesting lessons from this
story have little to do with the programming and the numeric
results. There were technical issues in the 2.2.5 kernel that
were addressed by 2.2.9. I guess Apache was updated to use the
<tt>sendfile()</tt> system call. These are relatively minor tweaks.
</p><p>
Microsoft and Mindcraft collaborated for a significant amount
of time to find a set of conditions under which the Linux/Apache/
Samba combination would perform at a disadvantage to NT.
</p><p>
When MS and Mindcraft originally published their results the
suite of tests and the processes employed were thoroughly
and quickly discredited. I've never seen such in-depth
analysis about the value (or lack thereof) of benchmarking in
the computing industry press.
</p><p>
Nonetheless, the developers of the involved open source packages
shrugged, analyzed the results, did some profiling of their own,
looked over their respective bits of code, devoted hours to
coding tweaks, a few days worth to tests, and spent some time
exchanging and debating different approaches to improving the code.
</p><p>
The important lessons from this are:
</p><ol>
<li> Just because a criticism is discredited, biased, and
possibly dishonest doesn't mean that we can't find
some clues to lead to real improvements. These
developers could have stuck their heads in the sand and
dismissed the whole topic as unimportant. They could
have felt that the PR and advocacy responses would
suffice.
<br> <br>
That "ostrich" approach is more commonly found in
corporate and government circles than among freeware
programmers. This is largely due to management. A
development manager at a large corporation will tend
to put as much energy into internal PR and "spin
control" as to any real improvement in the product.
Programmers often find themselves at odds with
their own management.
<br> <br>
<li> When we choose to attend to criticisms, it's vital
not to adopt their demonstration model as our objective.
We must stay true to our own requirements.
<br> <br>
It would be easy to focus on "beating the Mindcraft
benchmark" --- to insert special case code that exists
solely to produce superior results under the specific
conditions present in that suite of tests.
<br> <br>
This is referred to as "fraud."
<br> <br>
It would be technically easy for the kernel developers
to write the code for this. However, it would be
difficult to actually perpetrate this or any other fraud
in any open source project (since the code is there for
all to see --- and there are a number of people who
actually read that code).
<br> <br>
So, the Linux, Apache, and Samba developers showed
admirable focus on real improvements and seemed to
have eschewed any temptation to commit fraud.
<br> <br>
(We can't know whether the competition has rigged their
platform, since it is closed source and hasn't been
thoroughly audited by reputable independents).
</ol><p>
This leads us to a broader lesson. We can't properly evaluate any
statistics (benchmark results are statistics, after all) without
considering the source. What were the objectives (the
requirements) of the people involved? Are the objectives of the
people who took the measurements compatible with those of their
audience? In large part any statistic "means" what the presenter
intends it to "mean" (i.e. the number can only be applied to the
situation that was measured).
</p><p>
Benchmarks are employed primarily by two groups of people:
Software and hardware company marketeers, and computer periodical
writers, editors and publishers. Occasionally sysadmins and IT
people use statistics that are similar to benchmarks ---
simulation results --- for their performance tuning and
capacity planning work. Unfortunately these simulations are
often confused with benchmarks.
</p><ul>
<li> Interestingly the term benchmark probably stems from
physical "marks" (scratches or grooves), in work benches
used by woodworkers and other craftsmen to provide handy
measurements for their productions.
</ul><p>
Jim's first rule of requirements analysis is:
</p><blockquote>
Identify the involved parties.
</blockquote><p>
In this case we see two different producers of benchmarks and
a common audience (the potential customers, and the readership
are mostly the same). We also see that the real customers of
most periodicals are the advertisers --- which work for the same
corporations as the marketeers. This leads to a preference for
benchmarks that is bred of familiarity.
</p><p>
Most real people on the street don't "use" benchmarks. They may
be affected by them (as the opinions they form and get from others
are partially swayed by overall reputations of the organizations
that produce the benchmarks and those of the publications they read).
</p><p>
One of the best <a href="http://cs.alfred.edu/~lansdoct/mstest.html"
>responses</a> to the Mindcraft III results that I've
read is by Christopher Lansdown. Basically it turns the question around.
</p><p>
Instead of interpreting the top of the graphs as "how fast does
this go?" (a performance question) he looks at the bottom and the
"baseline" system configurations (intended for comparison) and
asks: "What is the most cost effective hardware and software
combination which will provide the optimal capacity?"
</p><p>
This is an objective which matches that of most IT directors,
sysadmins, webmasters and other people in the real world.
</p><p>
Let's consider the hypothetical question: Which is faster, an
ostrich or a penguin? Which is faster UNDERWATER?
</p><p>
What Christopher points out is that a single processor PC
with a couple hundred MB of RAM and a single fast ethernet
card is adequate for serving simple, static HTML pages to
the web for any organization that has less than about 5 or
6 T1 (high speed) Internet lines. That is regardless of
the demand/load (millions of hits per day) since the webserver
will be idly waiting for the communications channels to clear
whenever the demand exceeds the channel capacity.
</p><p>
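To put rough numbers on that argument, here is a small Python sketch of
the arithmetic. The 1.544 Mbps figure is the nominal T1 line rate. The
10 KB average page size and the function names are assumptions made purely
for illustration (they do not come from the Mindcraft tests or from
Christopher's article), and protocol overhead is ignored.
</p>
<pre>
# Back-of-the-envelope check: how much traffic can the Internet uplink
# deliver compared with a single fast ethernet card?

T1_BITS_PER_SEC = 1_544_000               # nominal T1 line rate
FAST_ETHERNET_BITS_PER_SEC = 100_000_000  # one 100Mbps card


def uplink_bits_per_sec(t1_lines):
    """Combined nominal capacity of a number of T1 lines."""
    return t1_lines * T1_BITS_PER_SEC


def max_pages_per_sec(link_bits_per_sec, page_bytes):
    """Upper bound on pages per second over a link (no protocol overhead)."""
    return link_bits_per_sec / (page_bytes * 8)


def share_of_one_card(t1_lines):
    """Fraction of one fast ethernet card that the uplink could ever fill."""
    return uplink_bits_per_sec(t1_lines) / FAST_ETHERNET_BITS_PER_SEC


if __name__ == "__main__":
    page_bytes = 10 * 1024  # ASSUMPTION: 10 KB average static page
    for lines in (1, 6):
        link = uplink_bits_per_sec(lines)
        print(lines, "T1:",
              round(max_pages_per_sec(link, page_bytes), 1), "pages/sec,",
              round(100 * share_of_one_card(lines), 2), "% of one card")
</pre>
<p>
Under those assumptions, six T1 lines can fill less than a tenth of one
fast ethernet card, so the uplink saturates long before the server's
network interface does.
</p><p>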
The Mindcraft benchmarks clearly demonstrate this fact.
You don't need NT with IIS and a 4 CPU SMP system with a
Gigabyte of RAM and four 100Mbps ethernet cards to provide
web services to the Internet. These results also suggest
rather strongly that you don't need that platform for
serving static HTML to your high speed Intranet.
</p><p>
Of course, the immediate retort is to question the applicability
of these results to <em>dynamic content</em>. The Mindcraft benchmark
design doesn't measure any form of dynamic content (but the
<a href="http://www.heise.de/ct/english//99/13/186-1/">c't magazine</a>
did - their article also has performance tuning hints for high-end hardware).
Given the obvious objectives of the designers of this benchmark
suite we can speculate that NT wouldn't fare as well in that scenario.
Other empirical and anecdotal evidence supports that hypothesis; most
users who have experience with Linux and NT webservers claim that
the Linux systems "seem" more responsive and more robust;
Microsoft uses about a half dozen separate NT webservers at their
site (which still "feels" slow to many users).
</p><p>
This brings us back to our key lesson. Selection of hardware and
software platforms should be based on requirements analysis.
Benchmarks serve the requirements of the people who produce and
disseminate them. Those requirements are unlikely to match those
of the people who will be ultimately selecting software and
hardware for real world deployment.
</p><p>
It is interesting to ask: "How does NT gain an advantage in this
situation?" and "What could Linux do to perform better under those
circumstances?"
</p><p>
From what I've read there are a few tricks that might help.
Apparently one of the issues in this scenario is the fact that
the system tested has four high speed ethernet cards.
</p><p>
Normally Linux and other operating systems are
"interrupt-driven" --- activity on an interface generates an
"interrupt" (a hardware event) which triggers some software
activity (to schedule a handler). This is normally an
efficient model. Most devices (network interfaces, hard disk
controllers, serial ports, keyboards, etc) only need to be
"serviced" occasionally (at rates that are glacial by
comparison to modern processors).
</p><p>
Apparently NT has some sort of option to disable interrupts on (at
least some) interfaces.
</p><p>
The other common model for handling I/O is called "polling." In
this case the CPU checks for new data as frequently as its
processing load allows. Polling is incredibly inefficient under
most circumstances.
</p><p>
However, under the conditions present in the Mindcraft survey
it can be more efficient and offer less latency than interrupt
driven techniques.
</p><p>
It would be sheer idiocy for Linux to adopt a straight polling
strategy for its networking interfaces. However, it might be
possible to have a hybrid. If the interrupt frequency on a
given device exceeds one threshold the kernel might then switch
to polling on that device. When the polling shows that the
activity on that device has dropped back below another threshold it
might be able to switch back to interrupt-driven mode.
</p><p>
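The two-threshold idea can be modeled in a few lines of Python. This is
only a toy model of the switching decision. It is not kernel code, and it
does not describe anything that Linux or NT actually does. The class
name, the threshold values and the notion of sampling an
events-per-second rate are all assumptions made for illustration.
</p>
<pre>
class HybridIoMode:
    """Toy model: switch a device between interrupt and polling modes."""

    def __init__(self, poll_above, interrupt_below):
        # Two separate thresholds give hysteresis, so a rate hovering
        # near a single cut-off does not flip the mode back and forth.
        if interrupt_below > poll_above:
            raise ValueError("interrupt_below must not exceed poll_above")
        self.poll_above = poll_above
        self.interrupt_below = interrupt_below
        self.mode = "interrupt"

    def observe(self, events_per_sec):
        """Feed in the latest measured rate and return the resulting mode."""
        if self.mode == "interrupt" and events_per_sec > self.poll_above:
            self.mode = "polling"
        elif self.mode == "polling" and self.interrupt_below > events_per_sec:
            self.mode = "interrupt"
        return self.mode


if __name__ == "__main__":
    # HYPOTHETICAL thresholds, chosen only to show the behaviour.
    nic = HybridIoMode(poll_above=10000, interrupt_below=2000)
    for rate in (500, 12000, 5000, 1500):
        print(rate, nic.observe(rate))
</pre>
<p>
Note that a rate between the two thresholds leaves the device in whatever
mode it was already in. That gap is the whole reason for having two
thresholds rather than one.
</p><p>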
I don't know if this is feasible. I don't even know if it's
being considered by any Linux kernel developers. It might
involve some significant retooling of each of the ethernet
drivers. But, it is an interesting question. Other interesting
questions: Will this be of benefit to any significant number of
real world applications? Do those benefits outweigh the costs
of implementation (larger more complex kernels, more opportunities
for bugs, etc)?
</p><p>
Another obvious criticism of the whole Mindcraft scenario is the
use of Apache. The Apache team's priorities relate to correctness
(conformance to published standards), portability (the Apache
web server and related tools run on almost all forms of UNIX, not
just Linux; they even run on NT and its ilk), and features
(support for the many modules and forms of dynamic content, etc).
Note that performance isn't in the top three on this list.
</p><p>
Apache isn't the only web server available for Linux. It also
isn't the "vendor preferred" web server (whatever that would
mean!) So the primary justification for using it in these
benchmarks is that it is the dominant web server in the Linux
market. In fact Apache is the dominant web server on the Internet
as a whole. Over half of all publicly accessible web servers
run Apache or some derivative. (We might be tempted to draw a
conclusion from this. It might be that some features are more
important to more web masters than sheer performance speeds and
latencies. Of course that might be an erroneous conclusion ---
the dominance of Apache could be due to other factors. The
dominance of MS Windows is primarily an artifact of the PC
purchasing process --- MS Windows comes pre-installed, as did
MS-DOS before it).
</p><p>
So, what if we switch out Apache for some other web server?
</p><p>
Zeus (<a href="http://www.zeustech.net/products/zeus3/"
>http://www.zeustech.net/products/zeus3/</a>), a commercial
offering for Linux and other forms of UNIX, is probably the
fastest in existence.
</p><p>
thttpd (<a href="http://www.acme.com/software/thttpd/"
>http://www.acme.com/software/thttpd/</a>) is probably the
fastest in the "free" world. It's about as fast as the
experimental <a href="http://www.fenrus.demon.nl/">kHTTPd</a>
(an implementation of a web server that
runs directly in the kernel -- like the kNFSd that's available
for Linux 2.2.x).
</p><p>
Under many conditions thttpd (and probably kHTTPd) are a few
times faster than Apache. So they might beat NT + IIS by
about 100 to 200 per cent. Of course, performance analysis is
not that simple. If the kernel really is tied up in interrupt
processing for a major portion of its time in the Mindcraft
scenario --- then the fast lightweight web server might offer
only marginal improvement FOR THAT TEST.
</p><p>
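That caveat is the familiar Amdahl's law style of argument, and a short
Python sketch makes it concrete. The fractions and speedup factors below
are invented for illustration. Nothing here is measured from the
Mindcraft tests, and the function name is hypothetical.
</p>
<pre>
def overall_speedup(fixed_fraction, server_speedup):
    """Whole-system speedup when only the web server part gets faster.

    fixed_fraction: share of total time spent on work the faster server
                    cannot touch (e.g. kernel interrupt processing), 0..1
    server_speedup: how many times faster the new web server is on the rest
    """
    if not (1.0 >= fixed_fraction >= 0.0):
        raise ValueError("fixed_fraction must be between 0 and 1")
    if not server_speedup > 0:
        raise ValueError("server_speedup must be positive")
    return 1.0 / (fixed_fraction + (1.0 - fixed_fraction) / server_speedup)


if __name__ == "__main__":
    # ASSUMED numbers: a server 3 times faster, with varying fixed overhead.
    for fixed in (0.0, 0.5, 0.8):
        print(fixed, round(overall_speedup(fixed, 3.0), 2))
</pre>
<p>
With a web server three times faster but 80 per cent of the time going to
work it cannot touch, the overall gain in this model is only about
15 per cent.
</p><p>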
For us back in the real world the implication is clear, however.
If all you want to do is serve static pages with as little load
and delay as possible --- consider using a lightweight httpd.
</p><p>
Also back in the real world we get back to other questions.
How much does the hardware for a Mindcraft configuration cost?
How much would it cost for a normal corporation to
purchase/license the NT+IIS configuration that would be required
for that configuration? (If I recall correctly, Microsoft still
charges user licensing fees based on the desired capacity of
concurrent IIS processes/threads/connections. I don't know the
details, but I get the impression that you'd have to add a few
grand to the $900 copy of NT server to legally support a
"Mindcraft" configuration).
</p><p>
It's likely that a different test, one whose objectives were
stated to more closely simulate a "real world" market, might
give much different results.
</p><p>
Consider this:
</p><blockquote>
<strong>Objective:</strong> Build/configure a web service out of standard
commercially/freely available hardware and software components
such that the total installation/deployment would
cost a typical customer less than $3000 outlay and no more than
$1000 per year of recurring expenses (not counting bandwidth and
ISP charges).
</blockquote><blockquote>
Participants will be free to bring any software and hardware that
conforms to these requirements and to perform any tuning or
optimizations they wish before and between scheduled executions
of the test suite.
</blockquote><blockquote>
<strong>Results:</strong> The competing configurations will be tested with a
mixture of various sorts of common requests. The required responses
will include static and dynamic pages which will be checked for
correctness against a published baseline. Configurations
generating more than X errors will be disqualified. Response
times will be measured and graphed over a range of simulated
loads. Any service failures will be noted on the graph where they
occur. The graphs for each configuration will be computed based
on the averages over Y runs through the test suite.
</blockquote><blockquote>
The graphs will be published as the final results.
</blockquote><p>
The whole test could be redone for $5000 and $10000 price points
to give an overview of the scalability of each configuration.
</p><p>
Note that this proposed benchmark idea (it's not a complete
specification) doesn't generate a simple number. The graphs of
the entire performance are the result. This allows the potential
customer to gauge the configurations against their anticipated
requirements.
</p><p>
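As a sketch of how such results might be reduced to a graph, the following
Python averages the response times at each load level over several runs
and disqualifies a configuration with too many errors. The data layout,
the function name and the choice to count errors in total across all runs
are assumptions. The proposal above leaves X, Y and these details open.
</p>
<pre>
from statistics import mean


def summarize(runs, max_errors):
    """Reduce several test runs to one curve of (load, mean response time).

    runs:       list of dicts like {"errors": 2, "times": {load: seconds}}
                (hypothetical layout; every run covers the same loads)
    max_errors: the "X" of the proposal, counted across all runs
    Returns None if the configuration is disqualified.
    """
    if sum(run["errors"] for run in runs) > max_errors:
        return None
    loads = sorted(runs[0]["times"])
    return [(load, mean(run["times"][load] for run in runs)) for load in loads]


if __name__ == "__main__":
    # MADE-UP sample data for two runs at two simulated load levels.
    runs = [
        {"errors": 0, "times": {10: 0.1, 100: 0.4}},
        {"errors": 1, "times": {10: 0.2, 100: 0.6}},
    ]
    for load, seconds in summarize(runs, max_errors=5):
        print(load, "clients:", round(seconds, 3), "sec")
</pre>
<p>
The output is a list of points rather than a single score, which is what
lets a reader compare the curve against an anticipated load.
</p><p>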
How would a team of Linux/Apache and Samba enthusiasts approach
this sort of contest? I'll save that question for next month.
</p><p>
Meanwhile, if you're enough of a glutton for my writing (an
odd form of PUNishment I'll admit) and my paltry selection of
answers, rants and ramblings for this month isn't enough then
take a look at a couple of my "Open Letters"
(<a href="http://www.starshine.org/jimd/openletters"
>http://www.starshine.org/jimd/openletters</a>).
By next month I hope
that my book (<a href="http://www.newriders.com/934-3.htm">Linux Systems
Administration</a>) will be off to the printers and my work at
Linuxcare will have reached a level where I can do MORE ANSWER GUY QUESTIONS!
</p>
<em><p>[ But not quite as many as January, ok? -- Heather ]</p></em>
<!-- end greeting -->
<!--startcut ======================================================= -->
<P> <hr> <P>
<H5 align="center"><a href="http://www.linuxgazette.com/ssc.copying.html"
>Copyright ©</a> 1999, James T. Dennis
<BR>Published in <I>The Linux Gazette</I> Issue 43 July 1999</H5>
<H6 ALIGN="center">HTML transformation by
<A HREF="mailto:star@starshine.org">Heather Stern</a> of
Starshine Technical Services,
<A HREF="http://www.starshine.org/">http://www.starshine.org/</A>
</H6>
<P> <hr> <P>
<!-- begin lgnav ::::::::::::::::::::::::::::::::::::::::::::::::::: -->
<A HREF="./../lg_toc43.html"
><IMG SRC="./../gx/indexnew.gif" ALT="[ Table Of Contents ]"></A>
<A HREF="/index.html"
><IMG SRC="./../gx/homenew.gif" ALT="[ Front Page ]"></A>
<A HREF="./lg_bytes43.html"
><IMG SRC="./../gx/back2.gif" ALT="[ Previous Section ]"></A>
<A HREF="./lg_tips43.html"
><IMG SRC="./../gx/fwd.gif" ALT="[ Next Section ]"></A>
<!-- end lgnav ::::::::::::::::::::::::::::::::::::::::::::::::::::: -->
<!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -->
</BODY></HTML>
<!--endcut ========================================================= -->