<!--startcut ======================================================= -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<META NAME="generator" CONTENT="lgazmail v1.2M.b">
<TITLE>The Linux Gazette 43: The Answer Guy</TITLE>
</HEAD><BODY BGCOLOR="#FFFFFF" TEXT="#000000"
	LINK="#3366FF" VLINK="#A000A0">
<!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -->
<H4>"The Linux Gazette...<I>making Linux just a little more fun!</I>"</H4>
<P> <hr> <P>
<!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -->
<center>
<H1><A NAME="answer">
	<img src="./../gx/dennis/qbubble.gif" alt="(?)" 
		border="0" align="middle">
	<font color="#B03060">The Answer Guy</font>
	<img src="./../gx/dennis/bbubble.gif" alt="(!)" 
		border="0" align="middle">
</A></H1> 
<BR>
<H4>By James T. Dennis,
	<a href="mailto:answerguy@ssc.com">answerguy@ssc.com</a><BR>
	Linuxcare,
	<A HREF="http://www.linuxcare.com/">http://www.linuxcare.com/</A> 
</H4>
</center>

<p><hr><p>
<!--  endcut ======================================================= -->
<H3>Contents:</H3>
<p><a href="#tag/greeting"
	><img src="./../gx/dennis/bbub.gif" alt="(!)" border="0" 
	align="middle"><strong>Greetings From Jim Dennis</strong></A></p>

<DL>
<!-- index_text begins -->
<dt><A HREF="tag/1.html"
	><img src="./../gx/dennis/qbub.gif" height="28" width="50"
	  alt="(?)" border="0"
	><strong>Hey answer guy!!!</strong></a>

<dt><A HREF="tag/2.html"
	><img src="./../gx/dennis/qbub.gif" height="28" width="50"
	  alt="(?)" border="0"
	></a>One more thing. --or--
<dd><A HREF="tag/2.html"
	><strong>Null Modems: Connecting MS-DOS to Linux as a 
	Serial Terminal</strong></a>

<dt><A HREF="tag/3.html"
	><img src="./../gx/dennis/qbub.gif" height="28" width="50"
	  alt="(?)" border="0"
	></a>RedHat 5.2 Kernel 2.0.36 --or--
<dd><A HREF="tag/3.html"
	><strong>Upgrade Breaks Several Programs, 
	<TT>/proc</TT> Problems, BogoMIPS Discrepancies</strong></a>
<br>A visit to "Library Hell"

<dt><A HREF="tag/4.html"
	><img src="./../gx/dennis/qbub.gif" height="28" width="50"
	  alt="(?)" border="0"
	></a>Floppy/mount Problems: Disk Spins, 
	Lights are on, No one's Home? --or--
<dd><A HREF="tag/4.html"
	><strong>Floppy Failure: mdir Works; mount Fails</strong></a>
	<br>Found the culprit!

<dt><A HREF="tag/5.html"
	><img src="./../gx/dennis/qbub.gif" height="28" width="50"
	  alt="(?)" border="0"
	></a>need your help --or--
<dd><A HREF="tag/5.html"
	><strong>Incompetence in Parenting</strong></a>

<dt><A HREF="tag/6.html"
	><img src="./../gx/dennis/qbub.gif" height="28" width="50"
	  alt="(?)" border="0"
	></a>bad clusters --or--
<dd><A HREF="tag/6.html"
	><strong>Try Linux ... and Grammar</strong></a>

<dt><A HREF="tag/7.html"
	><img src="./../gx/dennis/qbub.gif" height="28" width="50"
	  alt="(?)" border="0"
	></a>Duplicating / --or--
<dd><A HREF="tag/7.html"
	><strong>Out of Space....or Inodes?  All Sparsity Lost?</strong></a>

<dt><A HREF="tag/8.html"
	><img src="./../gx/dennis/qbub.gif" height="28" width="50"
	  alt="(?)" border="0"
	></a>RAID 1 solutions --or--
<dd><A HREF="tag/8.html"
	><strong>Arco Duplidisk: Disk Mirroring</strong></a>

<dt><A HREF="tag/9.html"
	><img src="./../gx/dennis/qbub.gif" height="28" width="50"
	  alt="(?)" border="0"
	></a>Modem Help --or--
<dd><A HREF="tag/9.html"
	><strong>Searching for Days for a Linux Modem: 
	The Daze Continues</strong></a>

<!-- index_text ends -->
</DL>
<!--     .~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.~~.     -->
<A NAME="tag/greeting"><HR WIDTH="75%" ALIGN="center"></A>
<H3 align="left"><img src="./../gx/dennis/bbubble.gif" 
	height="50" width="60" alt="(!) " border="0"
	>Greetings from Jim Dennis</H3>
<!-- begin greeting -->
<p>So,  my LG activity for this month is pretty sparse.  Does
 that mean that I haven't been involved in any Linux activity?
 Does it mean that I'm not getting enough LG TAG e-mail?
</p><blockquote>
	HARDLY!
</blockquote><p>
 However, my work at <a href="http://www.linuxcare.com/">Linuxcare</a> 
 has been taking a pretty big bite 
 out of my time.  In addition, the long drive up to the city 
 (about 50 miles from my house in Campbell to Linuxcare's offices in
 San Francisco) keeps me away from the keyboard for far too
 long.  (Yes, I'm looking for <em>cheap</em> digs up in the city to keep
 me up there during the week.)
</p><p>
 Mostly I've been working with our training department, presenting
 classes on Linux Systems Administration to our customers and our 
 new employees, and helping develop and refine the courseware around
 which the classes are built.
</p><p>
 I've also been watching the Linux news on the 'net with my usual
 zeal.  
</p><p>
 The leading story this month seems to be 
"<a href="http://www.mindcraft.com/whitepapers/openbench1.html"
	>Mindcraft III</a> --- The Return of the Benchmarkers."   
 The results of the benchmarking tests aren't surprising.  NT 
 with IIS still fared better on this particular platform under 
<a href="http://www.kegel.com/mindcraft_redux.html">these test conditions</a>
 than the Linux+Apache+Samba combination.  The Linux 2.2.9 kernel and 
 the Apache 1.3.6 release seem to have closed almost half of the gap.
</p><p>
 As I suggested last month, the most interesting lessons from this 
 story have little to do with the programming and the numeric 
 results.  There were technical issues in the 2.2.5 kernel that
 were addressed by 2.2.9.  I guess Apache was updated to use the 
 <tt>sendfile()</tt> system call.  These are relatively minor tweaks.  
</p><p>
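 For anyone who hasn't run into it, <tt>sendfile()</tt> lets a program ask
 the kernel to copy bytes from an open file straight to a socket, so a
 server sending a static page doesn't have to read the page into its own
 memory and write it back out.  Here's a minimal sketch of the call,
 assuming Python 3.3 or later on Linux (where <tt>os.sendfile()</tt> wraps
 <tt>sendfile(2)</tt>).  It is not Apache's code, and the helper name
 <tt>serve_static()</tt> is made up for illustration.
</p>

```python
import os
import socket


def serve_static(path: str, sock: socket.socket) -> int:
    """Hypothetical helper: send the file at `path` over `sock` with sendfile(2).

    The kernel moves the file's bytes to the socket itself, so they never
    pass through a buffer in this process.  Returns the number of bytes sent.
    """
    sent_total = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        while sent_total < size:
            # sendfile may send fewer bytes than requested, so loop until done.
            sent = os.sendfile(sock.fileno(), f.fileno(), sent_total, size - sent_total)
            if sent == 0:
                break  # the file got shorter while we were sending it
            sent_total += sent
    return sent_total
```

<p>
 A server would call it with an accepted client connection, for example
 <tt>serve_static("index.html", conn)</tt> after <tt>conn, addr = listener.accept()</tt>
 (both names hypothetical).
</p><p>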
 Microsoft and Mindcraft collaborated for a significant amount
 of time to find a set of conditions under which the Linux/Apache/
 Samba combination would perform at a disadvantage to NT.  
</p><p>
 When MS and Mindcraft originally published their results, the 
 suite of tests and the processes employed were thoroughly 
 and quickly discredited.  I've never seen such in-depth 
 analysis about the value (or lack thereof) of benchmarking in 
 the computing industry press.
</p><p>
 Nonetheless, the developers of the involved open source packages 
 shrugged, analyzed the results, did some profiling of their own,
 looked over their respective bits of code, devoted hours to 
 coding tweaks and a few days' worth to tests, and spent some time 
 exchanging and debating different approaches to improving the code.
</p><p>
 The important lessons from this are:
</p><ol>
	<li> Just because a criticism is discredited, biased, and
	   possibly dishonest doesn't mean that we can't find
	   some clues to lead to real improvements.  These
	   developers could have stuck their heads in the sand and
	   dismissed the whole topic as unimportant.  They could 
	   have felt that the PR and advocacy responses would
	   suffice.
<br>&nbsp;<br>
	   That "ostrich" approach is more commonly found in
	   corporate and government circles than among freeware
	   programmers.  This is largely due to management.  A
	   development manager at a large corporation will tend
	   to put as much energy into internal PR and "spin
	   control" as to any real improvement in the product.
	   Programmers often find themselves at odds with 
	   their own management.
<br>&nbsp;<br>

	<li> When we choose to attend to criticisms, it's vital
	   not to adopt their demonstration model as our objective.
	   We must stay true to our own requirements.
<br>&nbsp;<br>
	   It would be easy to focus on "beating the Mindcraft
	   benchmark" --- to insert special case code that exists
	   solely to produce superior results under the specific
	   conditions present in that suite of tests.  
<br>&nbsp;<br>
	   This is referred to as "fraud."  
<br>&nbsp;<br>
	   It would be technically easy for the kernel developers 
	   to write the code for this.  However, it would be
	   difficult to actually perpetrate this or any other fraud 
	   in any open source project (since the code is there for
	   all to see --- and there are a number of people who
	   actually read that code).
<br>&nbsp;<br>
	   So, the Linux, Apache, and Samba developers showed 
	   admirable focus on real improvements and seemed to
	   have eschewed any temptation to commit fraud.
<br>&nbsp;<br>
	   (We can't know whether the competition has rigged their
	   platform, since it is closed source and hasn't been 
	   thoroughly audited by reputable independents).
</ol><p>
  This leads us to a broader lesson. We can't properly evaluate any
  statistics (benchmark results are statistics, after all) without
  considering the source.  What were the objectives (the
  requirements) of the people involved?  Are the objectives of the
  people who took the measurements compatible with those of their
  audience?  In large part any statistic "means" what the presenter
  intends it to "mean" (i.e., the number can only be applied to the
  situation that was measured).
</p><p>
  Benchmarks are employed primarily by two groups of people:
  software and hardware company marketeers, and computer periodical
  writers, editors and publishers.  Occasionally sysadmins and IT
  people use statistics that are similar to benchmarks ---
  simulation results --- for their performance tuning and 
  capacity planning work.  Unfortunately these simulations are
  often confused with benchmarks.
</p><ul>
	<li> Interestingly, the term benchmark probably stems from
	   physical "marks" (scratches or grooves) in workbenches
	   used by woodworkers and other craftsmen to provide handy
	   measurements for their productions.
</ul><p>
  Jim's first rule of requirements analysis is:
</p><blockquote>
		Identify the involved parties.
</blockquote><p>
  In this case we see two different producers of benchmarks and
  a common audience (the potential customers and the readership 
  are mostly the same).  We also see that the real customers of
  most periodicals are the advertisers --- who work for the same
  corporations as the marketeers.  This leads to a preference for
  benchmarks that is bred of familiarity.
</p><p>
  Most real people on the street don't "use" benchmarks.  They may
  be affected by them (the opinions they form, and get from others, 
  are partially swayed by the overall reputations of the organizations
  that produce the benchmarks and of the publications they read).   
</p><p>
  One of the best <a href="http://cs.alfred.edu/~lansdoct/mstest.html"
  >responses</a> to the Mindcraft III results that I've
  read is by Christopher Lansdown.  Basically it turns the question around.
</p><p>
  Instead of interpreting the top of the graphs as "how fast does
  this go?" (a performance question) he looks at the bottom and the
  "baseline" system configurations (intended for comparison) and 
  asks:  "What is the most cost effective hardware and software 
  combination which will provide the optimal capacity?"
</p><p>
  This is an objective which matches that of most IT directors, 
  sysadmins, webmasters and other people in the real world. 
</p><p>
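  Christopher's question can be phrased as a tiny selection problem:
  from the configurations whose measured capacity covers what you
  actually need, take the cheapest.  This sketch assumes you already
  have a price and a measured request rate for each candidate; the
  class, the field names, and the figures in the example are invented
  for illustration, not taken from the Mindcraft or Lansdown data.
</p>

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Config:
    name: str
    price_usd: float
    capacity_req_per_s: float  # measured throughput on *your* workload


def cheapest_adequate(configs: List[Config], needed_req_per_s: float) -> Optional[Config]:
    """Return the cheapest configuration that meets the required rate, or None."""
    adequate = [c for c in configs if c.capacity_req_per_s >= needed_req_per_s]
    return min(adequate, key=lambda c: c.price_usd, default=None)


# Made-up numbers, for illustration only.
candidates = [
    Config("4-CPU SMP server", 25000, 4000),
    Config("single-CPU PC", 2000, 1000),
]
print(cheapest_adequate(candidates, 150).name)   # single-CPU PC
print(cheapest_adequate(candidates, 2500).name)  # 4-CPU SMP server
```

<p>
  The top of the graph only matters if your required rate actually
  reaches it; when something else (such as the Internet link) caps the
  demand, the cheap box is the right answer.
</p><p>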
  Let's consider the hypothetical question:  Which is faster, an
  ostrich or a penguin?  Which is faster UNDERWATER?  
</p><p>
  What Christopher points out is that a single-processor PC
  with a couple hundred MB of RAM and a single Fast Ethernet
  card is adequate for serving simple, static HTML pages to 
  the web for any organization that has fewer than about 5 or 
  6 T1 (high speed) Internet lines.  That holds regardless of 
  the demand/load (millions of hits per day), since the webserver
  will be idly waiting for the communications channels to clear
  whenever the demand exceeds the channel capacity.
</p><p>
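  The arithmetic behind that claim is easy to check.  A T1 carries
  1.544 Mbit/s, so six of them add up to about 9.3 Mbit/s, less than a
  tenth of one 100 Mbit/s Fast Ethernet card.  The sketch works it
  through; the 10 KB average page size is an assumption chosen for
  illustration, and protocol overhead (TCP/IP and HTTP headers) is
  ignored, so the real page rate would be somewhat lower.
</p>

```python
T1_MBPS = 1.544             # T1 line rate, megabits per second
FAST_ETHERNET_MBPS = 100.0  # one Fast Ethernet card
SECONDS_PER_DAY = 24 * 60 * 60


def uplink_mbps(t1_lines: int) -> float:
    """Aggregate bandwidth of several T1 lines."""
    return t1_lines * T1_MBPS


def pages_per_second(link_mbps: float, page_kbytes: float) -> float:
    """Upper bound on pages/s a link can carry, ignoring protocol overhead."""
    bits_per_page = page_kbytes * 1024 * 8
    return link_mbps * 1_000_000 / bits_per_page


link = uplink_mbps(6)
rate = pages_per_second(link, page_kbytes=10)  # assumed 10 KB average page
print(f"{link:.3f} Mbit/s = {link / FAST_ETHERNET_MBPS:.1%} of one Fast Ethernet card")
print(f"about {rate:.0f} pages/s, or {rate * SECONDS_PER_DAY / 1e6:.1f} million pages/day")
```

<p>
  Six T1s come to 9.264 Mbit/s (about 9.3% of one card), which at that
  assumed page size is roughly 113 pages per second, or close to ten
  million pages a day.  Past that point the server is waiting on the
  wire, not the other way around.
</p><p>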
  The Mindcraft benchmarks clearly demonstrate this fact.  
  You don't need NT with IIS and a 4-CPU SMP system with a 
  gigabyte of RAM and four 100Mbps Ethernet cards to provide
  web services to the Internet.  These results also suggest 
  rather strongly that you don't need that platform for 
  serving static HTML to your high speed Intranet.
</p><p>
  Of course, the immediate retort is to question the applicability
  of these results to <em>dynamic content</em>.  The Mindcraft benchmark
  design doesn't measure any form of dynamic content (but 
  <a href="http://www.heise.de/ct/english//99/13/186-1/">c't magazine</a>
  did; their article also has performance tuning hints for high-end hardware).
  Given the obvious objectives of the designers of this benchmark 
  suite, we can speculate that NT wouldn't fare as well in that scenario.  
  Other empirical and anecdotal evidence supports that hypothesis; most
  users who have experience with Linux and NT webservers claim that
  the Linux systems "seem" more responsive and more robust;
  Microsoft uses about a half dozen separate NT webservers at their
  site (which still "feels" slow to many users).
</p><p>
  This brings us back to our key lesson.  Selection of hardware and
  software platforms should be based on requirements analysis.
  Benchmarks serve the requirements of the people who produce and
  disseminate them.  Those requirements are unlikely to match those
  of the people who will be ultimately selecting software and
  hardware for real world deployment.
</p><p>
  It is interesting to ask:  "How does NT gain an advantage in this 
  situation?" and "What could Linux do to perform better under those
  circumstances?"
</p><p>
  From what I've read there are a few tricks that might help.  
  Apparently one of the issues in this scenario is the fact that
  the system tested has four high-speed Ethernet cards.  
</p><p>
  Normally Linux (and other operating systems) are
  "interrupt-driven" --- activity on an interface generates an
  "interrupt" (a hardware event) which triggers some software
  activity (to schedule a handler).  This is normally an 
  efficient model.  Most devices (network interfaces, hard disk
  controllers, serial ports, keyboards, etc) only need to be 
  "serviced" occasionally (at rates that are glacial by 
  comparison to modern processors).
</p><p>
  Apparently NT has some sort of option to disable interrupts on (at 
  least some) interfaces.  
</p><p>
  The other common model for handling I/O is called "polling."  In 
  this case the CPU checks for new data as frequently as its  
  processing load allows.  Polling is incredibly inefficient under
  most circumstances.  
</p><p>
  However, under the conditions present in the Mindcraft survey
  it can be more efficient and offer less latency than interrupt 
  driven techniques.
</p><p>
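  A back-of-the-envelope model shows why.  Suppose each interrupt costs
  the CPU a fixed overhead on top of the real work of handling a packet,
  while each poll costs a smaller fixed amount whether or not anything
  has arrived.  The per-packet work is the same either way, so only the
  overheads matter: interrupts cost the event rate times the
  per-interrupt overhead, polling costs the poll frequency times the
  per-check cost, and polling wins once the event rate passes the
  break-even point.  This is a deliberately simplified model, and the
  microsecond figures are invented for illustration, not measurements
  of Linux or NT.
</p>

```python
def interrupt_overhead_us(events_per_s: float, cost_per_irq_us: float) -> float:
    """CPU time per second spent entering and leaving interrupt handlers."""
    return events_per_s * cost_per_irq_us


def polling_overhead_us(polls_per_s: float, cost_per_poll_us: float) -> float:
    """CPU time per second spent checking the device, busy or idle."""
    return polls_per_s * cost_per_poll_us


def break_even_rate(polls_per_s: float, cost_per_poll_us: float, cost_per_irq_us: float) -> float:
    """Event rate above which polling uses less overhead than interrupts."""
    return polls_per_s * cost_per_poll_us / cost_per_irq_us


# Invented figures: 5 us per interrupt, 10,000 polls/s at 1 us each.
print(break_even_rate(10_000, 1.0, 5.0))  # 2000.0 events/s
```

<p>
  At a few hundred events per second interrupts are far cheaper; at tens
  of thousands the fixed cost of polling is the bargain.  A real poll
  loop also adds latency between checks, which this model ignores.
</p><p>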
  It would be sheer idiocy for Linux to adopt a straight polling
  strategy for its networking interfaces.  However, it might be
  possible to have a hybrid.  If the interrupt frequency on a 
  given device exceeds one threshold, the kernel might then switch
  to polling on that device.  When the polling shows that the 
  activity on that device has dropped back below another threshold, it 
  might be able to switch back to interrupt-driven mode.
</p><p>
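  The switching decision itself is simple; the hard part would be in the
  drivers.  Here is a minimal sketch of the two-threshold idea, written
  in Python only to show the control flow (a real implementation would
  be C inside the kernel's network drivers).  The class name and the
  threshold values are hypothetical.  Using a lower threshold to switch
  back than to switch over (hysteresis) keeps a device whose load hovers
  near one threshold from flapping between modes.
</p>

```python
from enum import Enum


class Mode(Enum):
    INTERRUPT = "interrupt"
    POLLING = "polling"


class HybridDevice:
    """Hypothetical per-device mode switch with hysteresis.

    high_water: events per tick above which we stop taking interrupts and poll.
    low_water:  events per tick below which we go back to interrupts.
    """

    def __init__(self, high_water: int, low_water: int) -> None:
        if low_water >= high_water:
            raise ValueError("low_water must be below high_water")
        self.high_water = high_water
        self.low_water = low_water
        self.mode = Mode.INTERRUPT

    def observe(self, events_this_tick: int) -> Mode:
        """Update the mode from the latest event count and return it."""
        if self.mode is Mode.INTERRUPT and events_this_tick > self.high_water:
            self.mode = Mode.POLLING    # interrupt storm: mask the IRQ and poll
        elif self.mode is Mode.POLLING and events_this_tick < self.low_water:
            self.mode = Mode.INTERRUPT  # load has subsided: re-enable the IRQ
        return self.mode


dev = HybridDevice(high_water=10_000, low_water=2_000)
for load in [500, 12_000, 8_000, 3_000, 1_500, 9_000]:
    print(load, dev.observe(load).value)
```

<p>
  With those thresholds the device switches to polling at 12,000 events
  per tick, stays there while the load drops to 8,000 and 3,000, and
  only goes back to interrupts when the load falls below 2,000.
</p><p>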
  I don't know if this is feasible.  I don't even know if it's 
  being considered by any Linux kernel developers.  It might 
  involve some significant retooling of each of the ethernet 
  drivers.  But, it is an interesting question.  Other interesting 
  questions: Will this be of benefit to any significant number of
  real world applications?  Do those benefits outweigh the costs
  of implementation (larger, more complex kernels, more opportunities
  for bugs, etc.)?
</p><p>
  Another obvious criticism of the whole Mindcraft scenario is the
  use of Apache.  The Apache team's priorities relate to correctness
  (conformance to published standards), portability (the Apache 
  web server and related tools run on almost all forms of UNIX, not
  just Linux; they even run on NT and its ilk), and features
  (support for the many modules and forms of dynamic content, etc).
  Note that performance isn't in the top three on this list.
</p><p>
  Apache isn't the only web server available for Linux.  It also 
  isn't the "vendor preferred" web server (whatever that would
  mean!)  So the primary justification for using it in these 
  benchmarks is that it is the dominant web server in the Linux
  market.  In fact Apache is the dominant web server on the Internet 
  as a whole.  Over half of all publicly accessible web servers
  run Apache or some derivative. (We might be tempted to draw a 
  conclusion from this.  It might be that some features are more
  important to more web masters than sheer performance speeds and
  latencies.  Of course that might be an erroneous conclusion ---
  the dominance of Apache could be due to other factors.  The 
  dominance of MS Windows is primarily an artifact of the PC
  purchasing process --- MS Windows comes pre-installed, as did 
  MS-DOS before it).
</p><p>
  So, what if we switch out Apache for some other web server?  
</p><p>
  Zeus (<a href="http://www.zeustech.net/products/zeus3/"
  >http://www.zeustech.net/products/zeus3/</a>), a commercial
  offering for Linux and other forms of UNIX, is probably the
  fastest in existence.  
</p><p>
  thttpd (<a href="http://www.acme.com/software/thttpd/"
  >http://www.acme.com/software/thttpd/</a>) is probably the
  fastest in the "free" world.  It's about as fast as the 
  experimental <a href="http://www.fenrus.demon.nl/">kHTTPd</a> 
  (an implementation of a web server that 
  runs directly in the kernel -- like the kNFSd that's available
  for Linux 2.2.x).
</p><p>
  Under many conditions thttpd (and probably kHTTPd) are a few
  times faster than Apache.  So they might beat NT + IIS by
  about 100 to 200 per cent.  Of course, performance analysis is
  not that simple.  If the kernel really is tied up in interrupt
  processing for a major portion of its time in the Mindcraft
  scenario --- then the fast lightweight web server might offer 
  only marginal improvement FOR THAT TEST.
</p><p>
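  That caveat is Amdahl's law at work: speeding up one part of a job
  helps only in proportion to the share of time that part takes.  If the
  web server's own work is a fraction p of the total and a faster server
  makes that part k times quicker, the overall speedup is
  1 / ((1 - p) + p / k).  The fractions in this sketch are hypothetical,
  not measurements from the Mindcraft tests.
</p>

```python
def overall_speedup(p: float, k: float) -> float:
    """Amdahl's law: total speedup when a fraction p of the run gets k times faster."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return 1.0 / ((1.0 - p) + p / k)


# A web server three times faster than Apache, under two hypothetical splits:
print(f"{overall_speedup(0.9, 3):.2f}")  # server work dominates the run
print(f"{overall_speedup(0.3, 3):.2f}")  # kernel interrupt work dominates
```

<p>
  This prints 2.50 and 1.25: the same three-fold faster server buys a
  big win in the first case and only a modest one when most of the time
  goes to interrupt processing.
</p><p>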
  For us back in the real world the implication is clear, however.
  If all you want to do is serve static pages with as little load
  and delay as possible --- consider using a lightweight httpd.
</p><p>
  Also back in the real world, other questions arise.
  How much does the hardware for a Mindcraft configuration cost?
  How much would it cost for a normal corporation to
  purchase/license the NT+IIS software required by that
  configuration?  (If I recall correctly, Microsoft still
  charges user licensing fees based on the desired capacity of 
  concurrent IIS processes/threads/connections.  I don't know the 
  details, but I get the impression that you'd have to add a few
  grand to the $900 copy of NT server to legally support a 
  "Mindcraft" configuration).
</p><p>
  It's likely that a different test --- one whose stated objectives
  more closely simulate a "real world" market --- might
  give much different results.
</p><p>
  Consider this:
</p><blockquote>
  <strong>Objective:</strong> Build/configure a web service out of standard
  commercially/freely available hardware and software components
  such that the total installation/deployment would cost
  a typical customer less than $3000 in initial outlay and no more than
  $1000 per year in recurring expenses (not counting bandwidth and
  ISP charges).
</blockquote><blockquote>
  Participants will be free to bring any software and hardware that
  conforms to these requirements and to perform any tuning or
  optimizations they wish before and between scheduled executions
  of the test suite.
</blockquote><blockquote>
  <strong>Results:</strong>  The competing configurations will be tested with a 
  mixture of various sorts of common requests.  The required responses
  will include static and dynamic pages which will be checked for
  correctness against a published baseline.  Configurations
  generating more than X errors will be disqualified.  Response
  times will be measured and graphed over a range of simulated
  loads. Any service failures will be noted on the graph where they
  occur.  The graphs for each configuration will be computed based
  on the averages over Y runs through the test suite.
</blockquote><blockquote>
  The graphs will be published as the final results.
 </blockquote><p> 
 The whole test could be redone for $5000 and $10000 price points
 to give an overview of the scalability of each configuration.
 </p><p> 
 Note that this proposed benchmark idea (it's not a complete 
  specification) doesn't generate a simple number.  The graphs of
  performance across the whole range of loads are the result.  This allows the potential
 customer to gauge the configurations against their anticipated
 requirements.
</p><p>
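 To make the proposal a little more concrete, here is a sketch of how
 the raw results might be reduced to those graphs.  It assumes each run
 reports a response time per simulated load level, a count of responses
 that failed the correctness check, and the load levels at which service
 failed; it reads "more than X errors" as the total over all Y runs and
 averages each load level over the runs that reached it.  All of the
 names are hypothetical, since the proposal is not a full specification.
</p>

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Run:
    """One pass through the test suite."""
    response_ms: Dict[int, float]  # simulated load level -> mean response time (ms)
    errors: int                    # responses that didn't match the published baseline
    failed_loads: Set[int] = field(default_factory=set)  # loads where service failed


@dataclass
class Entry:
    name: str
    outlay_usd: float
    recurring_usd_per_year: float
    runs: List[Run]


def within_budget(e: Entry, max_outlay: float, max_recurring: float) -> bool:
    # "less than $3000 outlay and no more than $1000 per year"
    return e.outlay_usd < max_outlay and e.recurring_usd_per_year <= max_recurring


def result_curve(e: Entry, max_errors: int, max_outlay: float = 3000,
                 max_recurring: float = 1000) -> Optional[Tuple[Dict[int, float], Set[int]]]:
    """Averaged response-time curve plus failure points, or None if disqualified."""
    if not within_budget(e, max_outlay, max_recurring):
        return None
    if sum(r.errors for r in e.runs) > max_errors:
        return None
    loads = sorted(set().union(*(r.response_ms for r in e.runs)))
    curve = {
        load: mean(r.response_ms[load] for r in e.runs if load in r.response_ms)
        for load in loads
    }
    failures = set().union(*(r.failed_loads for r in e.runs))
    return curve, failures


runs = [Run({10: 20.0, 100: 40.0}, errors=1),
        Run({10: 30.0, 100: 60.0}, errors=0, failed_loads={100})]
box = Entry("hypothetical entry", 2500, 900, runs)
print(result_curve(box, max_errors=5))  # ({10: 25.0, 100: 50.0}, {100})
```

<p>
 Calling <tt>result_curve()</tt> again with <tt>max_outlay</tt> raised to
 5000 or 10000 would give the price-point comparison suggested above.
</p><p>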
 How would a team of Linux/Apache and Samba enthusiasts approach
 this sort of contest?  I'll save that question for next month.
</p><p>
  Meanwhile, if you're enough of a glutton for my writing (an
  odd form of PUNishment I'll admit) and my paltry selection of
  answers, rants and ramblings for this month isn't enough then
  take a look at a couple of my "Open Letters"
  (<a href="http://www.starshine.org/jimd/openletters"
  	>http://www.starshine.org/jimd/openletters</a>).  
  By next month I hope 
  that my book (<a href="http://www.newriders.com/934-3.htm">Linux Systems 
  Administration</a>) will be off to the printers and my work at 
  Linuxcare will have reached a level where I can do MORE ANSWER GUY QUESTIONS!
</p>
<em><p>[ But not quite as many as January, ok? -- Heather ]</p></em>

<!-- end greeting -->
<!--startcut ======================================================= -->
<P> <hr> <P>
<H5 align="center"><a href="http://www.linuxgazette.com/ssc.copying.html"
	>Copyright &copy;</a> 1999, James T. Dennis 
<BR>Published in <I>The Linux Gazette</I> Issue 43 July 1999</H5>
<H6 ALIGN="center">HTML transformation  by
	<A HREF="mailto:star@starshine.org">Heather Stern</a> of
	Starshine Technical Services,
	<A HREF="http://www.starshine.org/">http://www.starshine.org/</A> 
</H6>
<P> <hr> <P>
<!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -->
</BODY></HTML>
<!--endcut ========================================================= -->