<!--startcut ======================================================= -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<META NAME="generator" CONTENT="lgazmail v1.4F.u">
<TITLE>The Answer Gang 80: How to Investigate a System Lockup</TITLE>
</HEAD><BODY BGCOLOR="#FFFFFF" TEXT="#000000"
LINK="#3366FF" VLINK="#A000A0">
<!--endcut ========================================================= -->
<P> <hr>
<!--startcut ======================================================= -->
<CENTER>
<!-- *** BEGIN navbar *** -->
<!-- *** END navbar *** -->
</CENTER>
</p>
<!--endcut ========================================================= -->
<!--startcut ======================================================= -->
<P> <hr>
<!-- begin tagnav ::::::::::::::::::::::::::::::::::::::::::::::::::-->
<p align="center">
<table width="100%" border="0"><tr>
<td align="right" valign="center"
><IMG ALT="" SRC="../../gx/navbar/left.jpg"
WIDTH="14" HEIGHT="45" BORDER="0" ALIGN="middle" border="0"
><A HREF="../index.html"
><IMG SRC="../../gx/navbar/toc.jpg" align="middle"
ALT="[ Table Of Contents ]" border="0"></A
><A HREF="../lg_answer.html"
><IMG SRC="../../gx/dennis/answertoc.jpg" align="middle"
ALT="[ Answer Guy Current Index ]" border="0"></A></td>
<td align="center" valign="center"><A HREF="../lg_answer.html#greeting"><img align="middle"
src="../../gx/dennis/smily.gif" alt="greetings" border="0"></A>
<A HREF="../tag/bios.html">Meet the Gang</A>
<A HREF="1.html">1</A>
<A HREF="2.html">2</A>
<A HREF="3.html">3</A>
<A HREF="4.html">4</A>
<A HREF="5.html">5</A>
<A HREF="6.html">6</A>
<A HREF="7.html">7</A>
</td>
<td align="left" valign="center"><A HREF="../../tag/kb.html"
><IMG SRC="../../gx/dennis/answerpast.jpg" align="middle"
ALT="[ Index of Past Answers ]" border="0"></A
><IMG ALT="" SRC="../../gx/navbar/right.jpg" align="middle"
WIDTH="14" HEIGHT="45" BORDER="0"></td></tr></table>
</p>
<!-- end tagnav ::::::::::::::::::::::::::::::::::::::::::::::::::::-->
<!--endcut ========================================================= -->
<P> <hr> <P>
<!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -->
<center>
<H1><A NAME="answer">
<img src="../../gx/dennis/qbubble.gif" alt="(?)"
border="0" align="middle">
<font color="#B03060">The Answer Gang</font>
<img src="../../gx/dennis/bbubble.gif" alt="(!)"
border="0" align="middle">
</A></H1>
<BR>
<H4>By Jim Dennis, Ben Okopnik, Dan Wilder, Breen, Chris, and...
(<a href="tag/bios.html">meet the Gang</a>) ...
the Editors of Linux Gazette...
and You!
<br>Send questions (or interesting answers) to
The Answer Gang
for possible publication
(but read the <a href="../tag/ask-the-gang.html">guidelines</a> first)
</H4>
</center>
<!-- ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: -->
<p><hr><p>
<!-- begin 1 -->
<H3 align="left"><img src="../../gx/dennis/qbubble.gif"
height="50" width="60" alt="(?) " border="0"
>How to Investigate a System Lockup</H3>
<p><strong>From Chris Gianakopoulos
</strong></p>
<p align="right"><strong>Answered By Didier Heyden, Breen Mullins, Ben Okopnik, Jim Dennis, John Karns
<br>with tidbits by Robos, Heather Stern
</strong></p>
<P><STRONG>
Hi Gang,
</STRONG></P>
<blockQuote>
<IMG SRC="../../gx/dennis/bbub.gif" ALT="(!)"
HEIGHT="28" WIDTH="50" BORDER="0"
> [Didier]
Hello, Chris!
</blockQuote>
<blockQuote>
<IMG SRC="../../gx/dennis/bbub.gif" ALT="(!)"
HEIGHT="28" WIDTH="50" BORDER="0"
> [Robos]
Hi
</blockQuote>
<P><STRONG>
<IMG SRC="../../gx/dennis/qbub.gif" ALT="(?)"
HEIGHT="28" WIDTH="50" BORDER="0"
>
I was running X tonight (with the ICEWM window manager), I had a couple of
xterms running (one with kermit running), and I was using Acrobat Reader
Version 4.0.
</STRONG></P>
<P><STRONG>
As I was making a mouse movement, my console locked up.
</STRONG></P>
<blockQuote>
<IMG SRC="../../gx/dennis/bbub.gif" ALT="(!)"
HEIGHT="28" WIDTH="50" BORDER="0"
> [Robos]
Don't you have to reboot when you make a mouse-movement? Oh, wait,
that's that other thing that claims to be an os...
<IMG SRC="../../gx/dennis/smily.gif" ALT=";-)"
height="24" width="20" align="middle">
</blockQuote>
<blockquote><em><font color="#000066">No less than 4 other gang members chimed in with some version of a
sigblock fortune cookie about this.
-- Heather</font></em></blockquote>
<P><STRONG>
<IMG SRC="../../gx/dennis/qbub.gif" ALT="(?)"
HEIGHT="28" WIDTH="50" BORDER="0"
>
I could not even
get a response, via the Ethernet, when trying to ping my crippled Linux
system.
</STRONG></P>
<P><STRONG>
Which log files could I look at to try to determine what the impending
disaster could have been? I have included the tail portion of
<TT>/var/log/messages</TT>. I have included extra stuff, I suspect. I'm curious
what those entries that say "MARK" mean. Could that be related to my
lockup?
</STRONG></P>
<blockQuote>
<IMG SRC="../../gx/dennis/bbub.gif" ALT="(!)"
HEIGHT="28" WIDTH="50" BORDER="0"
> [Didier]
Nope. From the `syslogd' man page:
</blockQuote>
<blockquote><pre> -m interval
The syslogd logs a mark timestamp regularly. The
default interval between two -- MARK -- lines is 20
minutes. This can be changed with this option.
</pre></blockquote>
<blockQuote>
(However it seems that this feature is disabled in some versions of the
syslog daemon -- maybe through a compile-time option?).
</blockQuote>
<P><STRONG>
<IMG SRC="../../gx/dennis/qbub.gif" ALT="(?)"
HEIGHT="28" WIDTH="50" BORDER="0"
>
Okay. I'll investigate other stuff.
</STRONG></P>
<blockQuote>
<IMG SRC="../../gx/dennis/bbub.gif" ALT="(!)"
HEIGHT="28" WIDTH="50" BORDER="0"
> [Breen]
At least on <A HREF="http://www.redhat.com/">Red Hat</A> it's through a run-time argument.
</blockQuote>
<blockQuote>
The init script for syslogd reads <TT>/etc/sysconfig/syslog</TT> for its
arguments:
</blockQuote>
<blockquote><pre># Options to syslogd
# -m 0 disables 'MARK' messages.
# -r enables logging from remote machines
# -x disables DNS lookups on messages recieved with -r
# See syslogd(8) for more details
SYSLOGD_OPTIONS="-m 0 -r -x"
</pre></blockquote>
<blockQuote>
"-m 0"
is the default; I added
"-r -x"
on this machine.
</blockQuote>
<blockQuote>
<IMG SRC="../../gx/dennis/bbub.gif" ALT="(!)"
HEIGHT="28" WIDTH="50" BORDER="0"
> [Didier]
Fairly redhat-ish, indeed. My own system is based on an antediluvian
RH 5.2 distro. I'm usually not too impatient to upgrade with a full new
distro install (preferring recompiling packages from source -- RPM'ed
or not -- iff I can no longer avoid it). Believe it or not, I haven't
drowned yet in the resulting mess
<IMG SRC="../../gx/dennis/smily.gif" ALT=":)"
height="24" width="20" align="middle">
</blockQuote>
<blockQuote>
Back then there was no such configuration file, and the syslog
daemon was run without <EM>any</EM> argument by default. But somehow the `--
MARK --' feature was... erm, is still in my case... <EM>totally</EM> disabled:
whatever -m xx option I try no timestamp appears in the logs.
</blockQuote>
<blockQuote>
<IMG SRC="../../gx/dennis/bbub.gif" ALT="(!)"
HEIGHT="28" WIDTH="50" BORDER="0"
> [JimD]
Actually I think this was a bug. I reported it to the upstream
maintainer a few years ago (when I was running RH5.2) and he pointed
me to the updated version that worked.
</blockQuote>
<blockQuote>
Naturally I'd advise that you simply fetch the latest version
(in source form if you don't want to get trapped in RPM dependency
upgrade hell) and build/install that.
</blockQuote>
<blockQuote>
<IMG SRC="../../gx/dennis/bbub.gif" ALT="(!)"
HEIGHT="28" WIDTH="50" BORDER="0"
> [Didier]
Thank you very much for your suggestions, Jim. Now I know what package
to download next.
</blockQuote>
<blockQuote>
Regarding the RPM dependency hell, IIRC I once experienced core dumps
from the `rpm' program itself after having fiddled with the `--nodeps'
option (I was supposed to know what I was doing
<IMG SRC="../../gx/dennis/smily.gif" ALT=":)"
height="24" width="20" align="middle">). The problem was
(hopefully) fixed with this simple command:
</blockQuote>
<blockQuote><CODE>
rpm --rebuilddb
</CODE></blockQuote>
<blockQuote>
I'm not sure it would have worked in all situations, though. And
unfortunately I don't remember the exact version that was then installed
on my system. In fact this has most probably been fixed <EM>ages</EM> ago...
</blockQuote>
<blockQuote>
<IMG SRC="../../gx/dennis/bbub.gif" ALT="(!)"
HEIGHT="28" WIDTH="50" BORDER="0"
> [Didier]
Note that I also have a couple of problems with the associated `klogd'
daemon, as indicated by the last two lines of the following excerpt:
</blockQuote>
<blockquote><pre>Jun 4 14:13:56 wallace kernel: klogd 1.3-3, log source = /proc/kmsg started.
Jun 4 14:13:57 wallace kernel: Loaded 15309 symbols from /boot/System.map.
Jun 4 14:13:57 wallace kernel: Symbols match kernel version 2.4.17.
Jun 4 14:13:57 wallace kernel: Error seeking in /dev/kmem
Jun 4 14:13:57 wallace kernel: Error adding kernel module table entry.
</pre></blockquote>
<blockQuote>
The other weird thing is that that ancient kernel log daemon cannot be
stopped by anything but a plain SIGKILL. Doesn't prevent me from having
nice dreams, however.
</blockQuote>
<blockQuote>
<IMG SRC="../../gx/dennis/bbub.gif" ALT="(!)"
HEIGHT="28" WIDTH="50" BORDER="0"
> [Didier]
Unfortunately, when one experiences such brutal lockups, the logs are
often not of much use: the whole system freezes before the daemon is
given a chance to write anything in them -- even if some kernel oops
actually occurred. The only way to see this happening would be to have
the kernel write directly to the console (assuming you're currently
viewing the console output; that won't work in an X session unless,
maybe, console output has been redirected to a serial port at boot
time?).
</blockQuote>
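<blockQuote>
On the serial-port idea: the kernel takes a <TT>console=</TT> boot parameter
(for instance <TT>console=ttyS0,9600n8</TT>, passed through the boot loader),
and the running kernel's command line can be read back from
<TT>/proc/cmdline</TT>. A small Python sketch that lists which consoles were
requested at boot; the function names are invented for this example:
</blockQuote>
<blockquote><pre>
```python
def console_args(cmdline):
    """Return the values of every console= parameter on a kernel command line."""
    return [word.split("=", 1)[1]
            for word in cmdline.split()
            if word.startswith("console=")]


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as f:
        return f.read().strip()
```
</pre></blockquote>
<blockQuote>
An empty result from <TT>console_args(read_cmdline())</TT> suggests no console
was explicitly requested, so kernel messages go to the default console only.
</blockQuote>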
<blockQuote>
Upgrading your kernel might help, provided the lockup was not caused
by some hardware (RAM?) failure.
</blockQuote>
<blockQuote>
<IMG SRC="../../gx/dennis/bbub.gif" ALT="(!)"
HEIGHT="28" WIDTH="50" BORDER="0"
> [Ben]
That's pretty much what I would suspect - hardware. The only times I've
seen Linux hang have been hardware-related stuff. In one very annoying
case, my laptop would hang for a number of seconds, several times per
day - and I had to live with it, because the PCMCIA card causing it was
my wireless modem which was on 24x7. AFAICT, it took a huge chunk of CPU
when it switched channels (sometimes the CPU load meter would actually
catch the spike before everything froze); fortunately, it didn't do that
very often.
</blockQuote>
<blockQuote>
<IMG SRC="../../gx/dennis/bbub.gif" ALT="(!)"
HEIGHT="28" WIDTH="50" BORDER="0"
> [Didier]
Another example is an IDE cd-rom or cd-writer device buggy enough to
suck up every possible CPU clock cycle whenever it fails to read or
burn the medium, the system thus becoming almost unusable -- especially
in the case where the application which makes use of it is run with a
static real-time priority (cf. `cdrecord'). Actually I've never figured
out whether some ill-written code in the IDE / IDE-SCSI driver could be
held responsible for such a misbehavior or if it was simply inevitable
on this kind of architecture.
</blockQuote>
<blockQuote>
Real-time constraints in a multi-tasking operating system are often
very difficult to deal with anyway.
</blockQuote>
<blockQuote>
<IMG SRC="../../gx/dennis/bbub.gif" ALT="(!)"
HEIGHT="28" WIDTH="50" BORDER="0"
> [Ben]
&lt;sigh&gt; Hardware stuff like PCMCIA has root-level access -
has to, to access privileged ports, etc. - and unfortunately I know of
no way to mitigate that. I wish there was a "nice" utility for
hardware...
</blockQuote>
<blockquote><IMG SRC="../../gx/dennis/bbub.gif" ALT="(!)"
HEIGHT="28" WIDTH="50" BORDER="0"
> [Heather] ACPI might like to be that, someday.
</blockquote>
<blockQuote>
<IMG SRC="../../gx/dennis/bbub.gif" ALT="(!)"
HEIGHT="28" WIDTH="50" BORDER="0"
> [Didier]
I once read that running a shell with a posix real-time scheduling
policy could help in some situations. Unfortunately I've never heard
of a `nice'-like utility which could be used to launch `bash',
`csh', etc. this way. I assume that in fact you must have a special
version of your favorite shell, containing direct calls to
<TT>sched_setscheduler()</TT>, in order to do that -- but I'm not sure.
</blockQuote>
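<blockQuote>
A special shell is probably not needed: the scheduling policy is kept across
<TT>exec()</TT> and inherited by children, so a small wrapper can set it and
then start any program. On current systems the <TT>chrt</TT> utility from
util-linux plays this role. Below is a hedged Python sketch of the same idea
using <TT>os.sched_setscheduler()</TT>; the function names and the default
priority of 10 are assumptions made for illustration, and it needs root (or
an equivalent privilege) to succeed:
</blockQuote>
<blockquote><pre>
```python
import os
import sys


def clamp_priority(policy, priority):
    """Clamp priority into the range the kernel accepts for this policy."""
    low = os.sched_get_priority_min(policy)
    high = os.sched_get_priority_max(policy)
    return max(low, min(high, priority))


def exec_realtime(argv, priority=10, policy=os.SCHED_FIFO):
    """Switch this process to a real-time policy, then replace it with argv."""
    param = os.sched_param(clamp_priority(policy, priority))
    os.sched_setscheduler(0, policy, param)  # pid 0 means the calling process
    os.execvp(argv[0], argv)  # the new program keeps the policy


# Only act when a command was given, e.g.:  python rtrun.py bash
if __name__ == "__main__" and len(sys.argv) > 1:
    exec_realtime(sys.argv[1:])
```
</pre></blockquote>
<blockQuote>
A shell started this way outranks every normally scheduled process, so a
runaway command inside it can itself freeze the machine; a low priority and
a spare root login are sensible precautions.
</blockQuote>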
<P><STRONG>
<IMG SRC="../../gx/dennis/qbub.gif" ALT="(?)"
HEIGHT="28" WIDTH="50" BORDER="0"
>
RAM is always a possibility. The system seems awful reliable, though.
Maybe it IS time to upgrade to a new distro just for the fun of it.
I say distro rather than kernel so that I can use XFree86 version 4.x. My
friends at work keep offering <A HREF="http://www.suse.com/">SuSE</A> 8.0. I believe that the S3 Trio64v+ is
supported, so nothing is really stopping me from going to the new distro.
</STRONG></P>
<P><STRONG>
I am guessing that it is related to whatever applications might have been
running under X in combination with Acrobat (if not a hardware problem).
Dynamic systems are always the most difficult to troubleshoot.
</STRONG></P>
<p align="center">See attached <tt><a href="../misc/tag/chrisg.logfile.txt">chrisg.logfile.txt</a></tt></p>
<blockQuote>
<IMG SRC="../../gx/dennis/bbub.gif" ALT="(!)"
HEIGHT="28" WIDTH="50" BORDER="0"
> [John]
Continuing on the kernel side of the issue, a thread on a related subject
just came up on a LUG list I'm on:
</blockQuote>
<TABLE WIDTH="95%" BORDER="1" BGCOLOR="#FFFFCC"><TR><TD>
<p align="center">...............</p>
<blockQuote><BLOCKQuote>
Various applications find System.map themselves, based on a standardized
search path and name scheme. The non-specific name version "System.map"
is the last one taken; the lookup first tries:
System.map-$(uname -r)
</BLOCKQuote></blockQuote>
<blockQuote>
Now if you have "System.map", and multiple kernels, without specifically
named System.map files, then only one boot kernel will find the right
System.map. Not everything needs kernel symbols to work right, but some
things do, and those are the ones that will have problems. Perhaps even with
different kernels, the symbol search scheme will still find the right
place for the symbol it needs (I'm not sure what scheme it uses, e.g.,
it might be a simple offset). Lilo itself does not have any knowledge of
System.map, as far as I know (I'm not 100% certain, but probably about
90% certain). Now one place that is searched is the standard kernel
build source location, <TT>/usr/src/linux/</TT> (or maybe <TT>/usr/src/linux-2.4/</TT> in
some cases), and so if you install from that, and do not alter
System.map in that directory, then your symbols should be resolved until
you build a new kernel and overwrite the old one.
</blockQuote><p align="center">...............</p>
</TD></TR></TABLE>
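<blockQuote>
The lookup described in that post can be sketched in a few lines of Python.
This is an illustration only: the directory list (<TT>/boot</TT>, <TT>/</TT>,
<TT>/usr/src/linux</TT>) and the per-directory ordering are assumptions, and
real tools may differ in both:
</blockQuote>
<blockquote><pre>
```python
import os

DEFAULT_DIRS = ("/boot", "/", "/usr/src/linux")  # assumed search path


def system_map_candidates(release, dirs=DEFAULT_DIRS):
    """List candidate paths, versioned name before the generic one."""
    paths = []
    for d in dirs:
        paths.append(os.path.join(d, "System.map-" + release))
        paths.append(os.path.join(d, "System.map"))
    return paths


def find_system_map(release=None, dirs=DEFAULT_DIRS):
    """Return the first candidate that exists, or None."""
    if release is None:
        release = os.uname().release  # same string as `uname -r`
    for path in system_map_candidates(release, dirs):
        if os.path.isfile(path):
            return path
    return None
```
</pre></blockquote>
<blockQuote>
With several kernels installed, naming each file
<TT>System.map-$(uname -r)</TT> means the versioned candidate is found first
for whichever kernel is booted.
</blockQuote>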
<P><STRONG>
<IMG SRC="../../gx/dennis/qbub.gif" ALT="(?)"
HEIGHT="28" WIDTH="50" BORDER="0"
>
Thank you all (there are so many names to list!) for your quick responses
to my question. I'm gonna do some detective work. My perception was that
the system locked up. The only thing that I really know is that the console
and the network did not respond. I have two serial ports on my system. I
dedicate one to the modem, and I use the other for kermitting around. I
think that I am going to use my nonmodem serial port for a login session.
Would it not be funny if the system was still running and only my network
stuff failed as a result of an X lockup?
</STRONG></P>
<P><STRONG>
That would seem odd, though. Since I was running X via my local console
(you know -- with the keyboard and display), I would expect Unix domain
sockets to be used, thus, bypassing TCP (the network stream stuff).
</STRONG></P>
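<blockQuote>
That expectation matches the usual X convention: a display such as
<TT>:0</TT> is reached through a Unix domain socket under
<TT>/tmp/.X11-unix</TT>, while <TT>host:0</TT> means TCP port 6000 plus the
display number. A rough Python sketch of that mapping; <TT>x_transport</TT> is
a made-up helper, and actual client libraries handle more cases than this:
</blockQuote>
<blockquote><pre>
```python
def x_transport(display):
    """Describe how an X client would reach the server named by DISPLAY."""
    host, _, rest = display.rpartition(":")
    number = int(rest.split(".")[0])  # drop the optional ".screen" suffix
    if host in ("", "unix"):
        # Local display: Unix domain socket, no TCP involved.
        return ("unix", "/tmp/.X11-unix/X%d" % number)
    # Remote display: TCP, port 6000 + display number.
    return ("tcp", host, 6000 + number)
```
</pre></blockquote>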
<P><STRONG>
You all gave me lots of good ideas, and thanks much again. This email
response is like a broadcast thanks to all of you!
</STRONG></P>
<!-- end 1 -->
<P> <hr> </p>
<!-- *** BEGIN copyright *** -->
<H5 align="center">This page edited and maintained by the Editors
of <I>Linux Gazette</I>
<a href=""
>Copyright ©</a> 2002
<BR>Published in issue 80 of <I>Linux Gazette</I> July 2002</H5>
<H6 ALIGN="center">HTML script maintained by
<A HREF="mailto:star@starshine.org">Heather Stern</a> of
Starshine Technical Services,
<A HREF="http://www.starshine.org/">http://www.starshine.org/</A>
</H6>
<!-- *** END copyright *** -->
<P> <hr>
<!--startcut ======================================================= -->
<CENTER>
<!-- *** BEGIN navbar *** -->
<!-- *** END navbar *** -->
</CENTER>
</p>
<!--endcut ========================================================= -->
<!--startcut ======================================================= -->
</BODY></HTML>
<!--endcut ========================================================= -->