1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598
|
<!doctype html public "-//W3C//DTD HTML 4.01 Transitional//EN"
"https://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title> Postfix Debugging Howto </title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link rel='stylesheet' type='text/css' href='postfix-doc.css'>
</head>
<body>
<h1><img src="postfix-logo.jpg" width="203" height="98" ALT="">Postfix Debugging Howto</h1>
<hr>
<h2>Purpose of this document</h2>
<p> This document describes how to debug parts of the Postfix mail
system when things do not work according to expectation. The methods
vary from making Postfix log a lot of detail, to running some daemon
processes under control of a call tracer or debugger. </p>
<p> The text assumes that the Postfix main.cf and master.cf
configuration files are stored in directory /etc/postfix. You can
use the command "<b>postconf config_directory</b>" to find out the
actual location of this directory on your machine. </p>
<p> Listed in order of increasing invasiveness, the debugging
techniques are as follows: </p>
<ul>
<li><a href="#logging">Look for obvious signs of trouble</a>
<li><a href="#trace_mail">Debugging Postfix from inside</a>
<li><a href="#no_chroot">Try turning off chroot operation in
master.cf</a>
<li><a href="#debug_peer">Verbose logging for specific SMTP
connections</a>
<li><a href="#sniffer">Record the SMTP session with a network
sniffer</a>
<li><a href="#verbose">Making Postfix daemon programs more verbose</a>
<li><a href="#man_trace">Manually tracing a Postfix daemon process</a>
<li><a href="#auto_trace">Automatically tracing a Postfix daemon
process</a>
<li><a href="#ddd">Running daemon programs with the interactive
ddd debugger</a>
<li><a href="#screen">Running daemon programs with the interactive
gdb debugger</a>
<li><a href="#gdb">Running daemon programs under a non-interactive
debugger</a>
<li><a href="#unreasonable">Unreasonable behavior</a>
<li><a href="#mail">Reporting problems to postfix-users@postfix.org</a>
</ul>
<h2><a name="logging">Look for obvious signs of trouble</a></h2>
<p> Postfix logs all failed and successful deliveries to a logfile. </p>
<ul>
<li> <p> When Postfix uses syslog logging (the default), the file
is usually called /var/log/maillog, /var/log/mail, or something
similar; the exact pathname is configured in a file called
/etc/syslog.conf, /etc/rsyslog.conf, or something similar. </p>
<li> <p> When Postfix uses its own logging system (see MAILLOG_README),
the location of the logfile is configured with the Postfix maillog_file
parameter. </p>
</ul>
<p> When Postfix does not receive or deliver mail, the first order
of business is to look for errors that prevent Postfix from working
properly: </p>
<blockquote>
<pre>
% <b>grep -E '(warning|error|fatal|panic):' /some/log/file | more</b>
</pre>
</blockquote>
<p> Note: the most important message is near the BEGINNING of the
output. Error messages that come later are less useful. </p>
<p> The nature of each problem is indicated as follows: </p>
<ul>
<li> <p> "<b>panic</b>" indicates a problem in the software itself
that only a programmer can fix. Postfix cannot proceed until this
is fixed. </p>
<li> <p> "<b>fatal</b>" is the result of missing files, incorrect
permissions, incorrect configuration file settings that you can
fix. Postfix cannot proceed until this is fixed. </p>
<li> <p> "<b>error</b>" reports an error condition. For safety
reasons, a Postfix process will terminate when more than 13 of these
happen. </p>
<li> <p> "<b>warning</b>" indicates a non-fatal error. These are
problems that you may not be able to fix (such as a broken DNS
server elsewhere on the network) but may also indicate local
configuration errors that could become a problem later. </p>
</ul>
<h2><a name="trace_mail">Debugging Postfix from inside</a> </h2>
<p> Postfix version 2.1 and later can
produce mail delivery reports for debugging purposes. These reports
not only show sender/recipient addresses after address rewriting
and alias expansion or forwarding, they also show information about
delivery to mailbox, delivery to non-Postfix command, responses
from remote SMTP servers, and so on.
</p>
<p> Postfix can produce two types of mail delivery reports for
debugging: </p>
<ul>
<li> <p> What-if: report what would happen, but do not actually
deliver mail. This mode of operation is requested with: </p>
<pre>
% <b>/usr/sbin/sendmail -bv address...</b>
Mail Delivery Status Report will be mailed to <your login name>.
</pre>
<li> <p> What happened: deliver mail and report successes and/or
failures, including replies from remote SMTP servers. This mode
of operation is requested with: </p>
<pre>
% <b>/usr/sbin/sendmail -v address...</b>
Mail Delivery Status Report will be mailed to <your login name>.
</pre>
</ul>
<p> These reports contain information that is generated by Postfix
delivery agents. Since these run as daemon processes that cannot
interact with users directly, the result is sent as mail to the
sender of the test message. The format of these reports is practically
identical to that of ordinary non-delivery notifications. </p>
<p> For a detailed example of a mail delivery status report, see
the <a href="ADDRESS_REWRITING_README.html#debugging"> debugging</a>
section at the end of the ADDRESS_REWRITING_README document. </p>
<h2><a name="no_chroot">Try turning off chroot operation in master.cf</a></h2>
<p> A common mistake is to turn on chroot operation in the master.cf
file without going through all the necessary steps to set up a
chroot environment. This causes Postfix daemon processes to fail
due to all kinds of missing files. </p>
<p> The example below shows an SMTP server that is configured with
chroot turned off: </p>
<blockquote>
<pre>
/etc/postfix/master.cf:
# =============================================================
# service type private unpriv <b>chroot</b> wakeup maxproc command
# (yes) (yes) <b>(yes)</b> (never) (100)
# =============================================================
smtp inet n - <b>n</b> - - smtpd
</pre>
</blockquote>
<p> Inspect master.cf for any processes that have chroot operation
not turned off. If you find any, save a copy of the master.cf file,
and edit the entries in question. After executing the command
"<b>postfix reload</b>", see if the problem has gone away. </p>
<p> If turning off chrooted operation made the problem go away,
then congratulations. Leaving Postfix running in this way is
adequate for most sites. If you prefer chrooted operation, see
the Postfix <a href="BASIC_CONFIGURATION_README.html#chroot_setup">
BASIC_CONFIGURATION_README</a> file for information about how to
prepare Postfix for chrooted operation. </p>
<h2><a name="debug_peer">Verbose logging for specific SMTP
connections</a></h2>
<p> In /etc/postfix/main.cf, list the remote site name or address
in the debug_peer_list parameter. For example, in order to make
the software log a lot of information to the syslog daemon for
connections from or to the loopback interface: </p>
<blockquote>
<pre>
/etc/postfix/main.cf:
debug_peer_list = 127.0.0.1
</pre>
</blockquote>
<p> You can specify one or more hosts, domains, addresses or
net/masks. To make the change effective immediately, execute the
command "<b>postfix reload</b>". </p>
<h2><a name="sniffer">Record the SMTP session with a network sniffer</a></h2>
<p> This example uses <b>tcpdump</b>. In order to record a conversation
you need to specify a large enough buffer with the "<b>-s</b>"
option or else you will miss some or all of the packet payload.
</p>
<blockquote>
<pre>
# <b>tcpdump -w /file/name -s 0 host example.com and port 25</b>
</pre>
</blockquote>
<p> Older tcpdump versions don't support "<b>-s 0</b>"; in that case,
use "<b>-s 2000</b>" instead. </p>
<p> Run this for a while, stop with Ctrl-C when done. To view the
data use a binary viewer, <b>ethereal</b>, or good old <b>less</b>.
</p>
<h2><a name="verbose">Making Postfix daemon programs more verbose</a></h2>
<p> Append one or more "<b>-v</b>" options to selected daemon
definitions in /etc/postfix/master.cf and type "<b>postfix reload</b>".
This will cause a lot of activity to be logged to the syslog daemon.
For example, to make the Postfix SMTP server process more verbose: </p>
<blockquote>
<pre>
/etc/postfix/master.cf:
smtp inet n - n - - smtpd -v
</pre>
</blockquote>
<p> To diagnose problems with address rewriting specify a "<b>-v</b>"
option for the cleanup(8) and/or trivial-rewrite(8) daemon, and to
diagnose problems with mail delivery specify a "<b>-v</b>"
option for the qmgr(8) or oqmgr(8) queue manager, or for the lmtp(8),
local(8), pipe(8), smtp(8), or virtual(8) delivery agent. </p>
<h2><a name="man_trace">Manually tracing a Postfix daemon process</a></h2>
<p> Many systems allow you to inspect a running process with a
system call tracer. For example: </p>
<blockquote>
<pre>
# <b>trace -p process-id</b> (SunOS 4)
# <b>strace -p process-id</b> (Linux and many others)
# <b>truss -p process-id</b> (Solaris, FreeBSD)
# <b>ktrace -p process-id</b> (generic 4.4BSD)
</pre>
</blockquote>
<p> Even more informative are traces of system library calls.
Examples: </p>
<blockquote>
<pre>
# <b>ltrace -p process-id</b> (Linux, also ported to FreeBSD and BSD/OS)
# <b>sotruss -p process-id</b> (Solaris)
</pre>
</blockquote>
<p> See your system documentation for details. </p>
<p> Tracing a running process can give valuable information about
what a process is attempting to do. This is as much information as
you can get without running an interactive debugger program, as
described in a later section. </p>
<h2><a name="auto_trace">Automatically tracing a Postfix daemon
process</a></h2>
<p> Postfix can attach a call tracer whenever a daemon process
starts. Call tracers come in several kinds. </p>
<ol>
<li> <p> System call tracers such as <b>trace</b>, <b>truss</b>,
<b>strace</b>, or <b>ktrace</b>. These show the communication
between the process and the kernel. </p>
<li> <p> Library call tracers such as <b>sotruss</b> and <b>ltrace</b>.
These show calls of library routines, and give a better idea of
what is going on within the process. </p>
</ol>
<p> Append a <b>-D</b> option to the suspect command in
/etc/postfix/master.cf, for example: </p>
<blockquote>
<pre>
/etc/postfix/master.cf:
smtp inet n - n - - smtpd -D
</pre>
</blockquote>
<p> Edit the debugger_command definition in /etc/postfix/main.cf
so that it invokes the call tracer of your choice, for example:
</p>
<blockquote>
<pre>
/etc/postfix/main.cf:
debugger_command =
PATH=/bin:/usr/bin:/usr/local/bin;
(truss -p $process_id 2>&1 | logger -p mail.info) & sleep 5
</pre>
</blockquote>
<p> Type "<b>postfix reload</b>" and watch the logfile. </p>
<h2><a name="ddd">Running daemon programs with the interactive
ddd debugger</a></h2>
<p> If you have X Windows installed on the Postfix machine, then
an interactive debugger such as <b>ddd</b> can be convenient.
</p>
<p> Edit the debugger_command definition in /etc/postfix/main.cf
so that it invokes <b>ddd</b>: </p>
<blockquote>
<pre>
/etc/postfix/main.cf:
debugger_command =
PATH=/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin
ddd $daemon_directory/$process_name $process_id & sleep 5
</pre>
</blockquote>
<p> Be sure that <b>gdb</b> is in the command search path, and
export <b>XAUTHORITY</b> so that X access control works, for example:
</p>
<blockquote>
<pre>
% <b>setenv XAUTHORITY ~/.Xauthority</b> (csh syntax)
$ <b>export XAUTHORITY=$HOME/.Xauthority</b> (sh syntax)
</pre>
</blockquote>
<p> Append a <b>-D</b> option to the suspect daemon definition in
/etc/postfix/master.cf, for example: </p>
<blockquote>
<pre>
/etc/postfix/master.cf:
smtp inet n - n - - smtpd -D
</pre>
</blockquote>
<p> Stop and start the Postfix system. This is necessary so that
Postfix runs with the proper <b>XAUTHORITY</b> and <b>DISPLAY</b>
settings. </p>
<p> Whenever the suspect daemon process is started, a debugger
window pops up and you can watch in detail what happens. </p>
<h2><a name="screen">Running daemon programs with the interactive
gdb debugger</a></h2>
<p> If you have the screen command installed on the Postfix machine, then
you can run an interactive debugger such as <b>gdb</b> as follows. </p>
<p> Edit the debugger_command definition in /etc/postfix/main.cf
so that it runs <b>gdb</b> inside a detached <b>screen</b> session:
</p>
<blockquote>
<pre>
/etc/postfix/main.cf:
debugger_command =
PATH=/bin:/usr/bin:/sbin:/usr/sbin; export PATH; HOME=/root;
export HOME; screen -e^tt -dmS $process_name gdb
$daemon_directory/$process_name $process_id & sleep 2
</pre>
</blockquote>
<p> Be sure that <b>gdb</b> is in the command search path. </p>
<p> Append a <b>-D</b> option to the suspect daemon definition in
/etc/postfix/master.cf, for example: </p>
<blockquote>
<pre>
/etc/postfix/master.cf:
smtp inet n - n - - smtpd -D
</pre>
</blockquote>
<p> Execute the command "<b>postfix reload</b>" and wait until a
daemon process is started (you can see this in the maillog file).
</p>
<p> Then attach to the screen, and debug away: </p>
<blockquote>
<pre>
# HOME=/root screen -r
gdb) continue
gdb) where
</pre>
</blockquote>
<h2><a name="gdb">Running daemon programs under a non-interactive
debugger</a></h2>
<p> If you do not have X Windows installed on the Postfix machine,
or if you are not familiar with interactive debuggers, then you
can try to run <b>gdb</b> in non-interactive mode, and have it
print a stack trace when the process crashes. </p>
<p> Edit the debugger_command definition in /etc/postfix/main.cf
so that it invokes the <b>gdb</b> debugger: </p>
<blockquote>
<pre>
/etc/postfix/main.cf:
debugger_command =
PATH=/bin:/usr/bin:/usr/local/bin; export PATH; (echo cont; echo
where; sleep 8640000) | gdb $daemon_directory/$process_name
$process_id 2>&1
>$config_directory/$process_name.$process_id.log & sleep 5
</pre>
</blockquote>
<p> Append a <b>-D</b> option to the suspect daemon in
/etc/postfix/master.cf, for example: </p>
<blockquote>
<pre>
/etc/postfix/master.cf:
smtp inet n - n - - smtpd -D
</pre>
</blockquote>
<p> Type "<b>postfix reload</b>" to make the configuration changes
effective. </p>
<p> Whenever a suspect daemon process is started, an output file
is created, named after the daemon and process ID (for example,
smtpd.12345.log). When the process crashes, a stack trace (with
output from the "<b>where</b>" command) is written to its logfile.
</p>
<h2><a name="unreasonable">Unreasonable behavior</a></h2>
<p> Sometimes the behavior exhibited by Postfix just does not match the
source code. Why can a program deviate from the instructions given
by its author? There are two possibilities. </p>
<ul>
<li> <p> The compiler has erred. This rarely happens. </p>
<li> <p> The hardware has erred. Does the machine have ECC memory? </p>
</ul>
<p> In both cases, the program being executed is not the program
that was supposed to be executed, so anything could happen. </p>
<p> There is a third possibility: </p>
<ul>
<li> <p> Bugs in system software (kernel or libraries). </p>
</ul>
<p> Hardware-related failures usually do not reproduce in exactly
the same way after power cycling and rebooting the system. There's
little Postfix can do about bad hardware. Be sure to use hardware
that at the very least can detect memory errors. Otherwise, Postfix
will just be waiting to be hit by a bit error. Critical systems
deserve real hardware. </p>
<p> When a compiler makes an error, the problem can be reproduced
whenever the resulting program is run. Compiler errors are most
likely to happen in the code optimizer. If a problem is reproducible
across power cycles and system reboots, it can be worthwhile to
rebuild Postfix with optimization disabled, and to see if optimization
makes a difference. </p>
<p> In order to compile Postfix with optimizations turned off: </p>
<blockquote>
<pre>
% <b>make tidy</b>
% <b>make makefiles OPT=</b>
</pre>
</blockquote>
<p> This produces a set of Makefiles that do not request compiler
optimization. </p>
<p> Once the makefiles are set up, build the software: </p>
<blockquote>
<pre>
% <b>make</b>
% <b>su</b>
Password:
# <b>make install</b>
</pre>
</blockquote>
<p> If the problem goes away, then it is time to ask your vendor
for help. </p>
<h2><a name="mail">Reporting problems to postfix-users@postfix.org</a></h2>
<p> The people who participate on postfix-users@postfix.org
are very helpful, especially if YOU provide them with sufficient
information. Remember, these volunteers are willing to help, but
their time is limited. </p>
<p> When reporting a problem, be sure to include the following
information. </p>
<ul>
<li> <p> A summary of the problem. Please do not just send some
logging without explanation of what YOU believe is wrong. </p>
<li> <p> Complete error messages. Please use cut-and-paste, or use
attachments, instead of reciting information from memory.
</p>
<li> <p> Postfix logging. See the text at the top of the DEBUG_README
document to find out where logging is stored. Please do not frustrate
the helpers by word wrapping the logging. If the logging is more
than a few kbytes of text, consider posting an URL on a web or ftp
site. </p>
<li> <p> Consider using a test email address so that you don't have
to reveal email addresses or passwords of innocent people. </p>
<li> <p> If you can't use a test email address, please anonymize
email addresses and host names consistently. Replace each letter
by "A", each digit
by "D" so that the helpers can still recognize syntactical errors.
</p>
<li> <p> Command output from:</p>
<ul>
<li> <p> "<b>postconf -n</b>". Please do not send your main.cf file,
or 1000+ lines of <b>postconf</b> command output. </p>
<li> <p> "<b>postconf -Mf</b>" (Postfix 2.9 or later). </p>
</ul>
<li> <p> Better, provide output from the <b>postfinger</b> tool.
This can be found at https://github.com/ford--prefect/postfinger. </p>
<li> <p> If the problem is SASL related, consider including the
output from the <b>saslfinger</b> tool. This can be found at
https://packages.debian.org/search?keywords=sasl2-bin. </p>
<li> <p> If the problem is about too much mail in the queue, consider
including output from the <b>qshape</b> tool, as described in the
QSHAPE_README file. </p>
<li> <p> If the problem is protocol related (connections time out,
or an SMTP server complains about syntax errors etc.) consider
recording a session with <b>tcpdump</b>, as described in the <a
href="#sniffer">DEBUG_README</a> document. </ul>
</body>
</html>
|