<html>
<body BGCOLOR="FFFFFF">
<h1>Docs: Troubleshooting</h1>
<p align="left">Doing a search below will usually
lead straight
to the problem.</p>
<ul>
</ul>
<p>We continually update this guide, so please click here
to get
the <a href="http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html">most
recent version of the troubleshooting guide</a>.</p>
</td>
</tr>
</tbody>
</table>
<hr>
<ol start="1" type="1">
<li><font color="#ff0000"><strong>Symptom: </strong></font><font face="Terminal">Little or no speedup: </font>
<p><font color="#ff0000"><strong>Problem: </strong><span style="color: rgb(0, 0, 0);">This can be a result of not
using a</span> <a href="faq.html#computers">parallel
system that is suitable for sparse linear solvers.</a> </font></p>
</li>
<li><font color="#ff0000"><strong><a name="PetscSplitOwnership"></a>Symptom: </strong></font><font face="Terminal">Error detected in
PetscSplitOwnership() about "sum of local lengths ...": </font>
<p><font color="#ff0000"><strong>Problem: </strong><span style="color: rgb(0, 0, 0);">In
a previous call to VecSetSizes(), MatSetSizes(), VecCreateXXX() or
MatCreateXXX() you passed in local and global sizes that are not
consistent with the number of processors you are running on. For example, if you pass in
a local size of 2 and a global size of 100 and run on two processors,
this cannot work since the sum of the local sizes is 4, not 100.</span>
</font></p>
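<p>The consistency requirement can be sketched in plain C. This is only an illustration, not PETSc source: the helper names <code>split_local_size</code> and <code>sizes_are_consistent</code> are hypothetical, and the even split (the first ranks receive one extra entry) is an assumption about how a library-chosen local size would typically be computed.</p>

```c
#include <assert.h>

/* Hypothetical helper: local length for `rank` when `global` entries are
   spread as evenly as possible over `nprocs` processes. The first
   (global % nprocs) ranks receive one extra entry. */
int split_local_size(int global, int nprocs, int rank)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    return global / nprocs + ((global % nprocs) > rank ? 1 : 0);
}

/* Hypothetical helper: returns 1 when the local lengths of all processes
   add up to the global length, 0 otherwise. */
int sizes_are_consistent(const int *local, int nprocs, int global)
{
    long sum = 0;
    for (int r = 0; r < nprocs; r++) {
        sum += local[r];
    }
    return sum == (long)global;
}
```

<p>With the example from the text (local size 2 on each of two processes, global size 100), <code>sizes_are_consistent</code> returns 0, because the local sizes sum to 4.</p>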
</li>
<li><a name="Corrupt"></a><font color="#ff0000"><strong>Symptom: </strong></font><font face="Terminal">Corrupt argument: </font>
<p><font color="#ff0000"><strong>Problem: </strong><span style="color: rgb(0, 0, 0);">
An argument to
a function is invalid. In Fortran this may be caused by forgetting to
list an argument in the call, especially the final ierr.<br>
<br>
Otherwise it
is usually caused by memory corruption; that is, somewhere the code is
writing out of array bounds. To track this down, rerun the debug version
of the code with the option -malloc_debug. Occasionally the
code may crash only with the optimized version; in that case
run the optimized version with -malloc_debug. If you determine that the
problem
is memory corruption, you can put the macro CHKMEMQ in the code
near the crash to determine exactly which line is causing the
problem.<br>
<br>
If -malloc_debug does not help, on GNU/Linux you can try using <a href="http://valgrind.org">http://valgrind.org</a> to look for memory corruption;
on Apple systems, run "man libgmalloc" to see how to detect memory corruption.</span></font></p>
</li>
<li><font color="#ff0000"><span style="color: rgb(0, 0, 0);"><span style="font-weight: bold;"></span></span></font><font color="#ff0000"><strong><a name="signal"></a>Symptom</strong></font><font color="#ff0000"><span style="color: rgb(0, 0, 0);">: Caught signal</span></font><font face="Terminal">: </font>
<p><font color="#ff0000"><strong>Problem: </strong></font><font color="#ff0000"><span style="color: rgb(0, 0, 0);">This is most likely due to memory corruption; see <a href="#Corrupt">Corrupt argument</a>.</span></font><font color="#ff0000"><strong><span style="color: rgb(0, 0, 0);"></span><br>
</strong><span style="color: rgb(0, 0, 0);"></span></font></p>
</li>
<li><a name="ZeroPivot"></a><font color="#ff0000"><strong>Symptom: </strong></font><font face="Terminal">Detected zero pivot in LU factorization: </font>
<p><font color="#ff0000"><strong>Problem: </strong><span style="color: rgb(0, 0, 0);">
A zero pivot
in LU, ILU, Cholesky, or ICC sparse factorization does not always mean
that the matrix is singular. You can use -pc_factor_shift_nonzero or
-pc_factor_shift_positive_definite,
-[level]_pc_factor_shift_nonzero, </span></font><font style="color: rgb(0, 0, 0);" color="#ff0000">-[level]_pc_factor_shift_positive_definite
</font><font style="color: rgb(0, 0, 0);" color="#ff0000">to prevent the zero pivot. These options apply to the lu, ilu,
cholesky, and icc preconditioners. Here [level] is sub for a block in the bjacobi or
ASM preconditioner, and mg_levels or mg_coarse for the
multigrid
smoothers or the coarse grid solver. See PCFactorSetShiftNonzero() and
PCFactorSetShiftPd().</font></p>
<font style="color: rgb(0, 0, 0);" color="#ff0000">
</font>
<p style="color: rgb(0, 0, 0);">This error can also
happen if your matrix
is <strong>singular</strong>; see KSPSetNullSpace() for
how to
handle this.</p>
<font style="color: rgb(0, 0, 0);" color="#ff0000">
</font>
<p><font color="#ff0000"><span style="color: rgb(0, 0, 0);">If this error occurs in
the </span><strong style="color: rgb(0, 0, 0);">zeroth
row </strong><span style="color: rgb(0, 0, 0);"> of
the matrix it is likely you have an error in
the code
that generates the matrix.</span> </font></p>
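<p>A small dense example shows why a zero pivot does not imply a singular matrix. The sketch below is plain C, not PETSc code. The function name <code>lu_without_pivoting</code> and its <code>tol</code> and <code>shift</code> parameters are hypothetical, and replacing a tiny pivot by a fixed value is a simplified stand-in for the shift options named above, not the algorithm PETSc actually uses.</p>

```c
#include <assert.h>

/* Hypothetical sketch: in-place LU factorization without pivoting of a
   dense, row-major n x n matrix. Returns -1 on success, or the index of
   the first row whose pivot has magnitude <= tol. If shift is nonzero,
   such a pivot is replaced by shift and the factorization continues. */
int lu_without_pivoting(double *a, int n, double tol, double shift)
{
    assert(n > 0);
    for (int k = 0; k < n; k++) {
        double magnitude = a[k * n + k] < 0.0 ? -a[k * n + k] : a[k * n + k];
        if (magnitude <= tol) {
            if (shift == 0.0) {
                return k;               /* zero pivot detected in row k */
            }
            a[k * n + k] = shift;       /* perturb the pivot and continue */
        }
        for (int i = k + 1; i < n; i++) {
            a[i * n + k] /= a[k * n + k];
            for (int j = k + 1; j < n; j++) {
                a[i * n + j] -= a[i * n + k] * a[k * n + j];
            }
        }
    }
    return -1;
}
```

<p>For the matrix with rows (1 1 0), (1 1 1), (0 1 1), whose determinant is -1, the function reports a zero pivot in row 1 when <code>shift</code> is 0 and completes when a small nonzero <code>shift</code> is supplied.</p>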
</li>
<li><font color="#ff0000"><strong>Symptom: </strong></font><font face="Terminal">alder>make<br>
make: Warning: Can't find `../../bmake/': <br>
</font>
<p><font face="Terminal"><font color="#ff0000"><strong>Problem:
</strong></font>You
have not set the variable PETSC_ARCH to the architecture of your
machine (e.g., sun4, rs6000). </font></p>
<font face="Terminal"> </font>
<p><font face="Terminal"><font color="#ff0000"><strong>Cure:
</strong></font>Set
PETSC_ARCH automatically in your .cshrc file, or
include PETSC_ARCH on the
command line every time you use make, for instance
make PETSC_ARCH=sun4 example4 </font></p>
<font face="Terminal"> </font></li>
<font face="Terminal"> <li><font face="Terminal"><strong>Symptom:
alder> make</strong> </font>
<p><font face="Terminal"> makefile:12:
/bmake/common/base: No such file or directory<br>
makefile:13: /bmake/common/test: No such file or directory<br>
make: *** No rule to make target `/bmake/common/test'. Stop.</font>
</p>
or makefile:12:
home/joe/bmake/common/base: No such file or directory<br>
makefile:13: home/joe/bmake/common/test: No such file or directory<br>
make: *** No rule to make target
`home/joe/bmake/common/test'. Stop.<br>
<p><font color="#ff0000"><strong>Problem: </strong></font>The
variable PETSC_DIR is not set (first listing) or does not point to the PETSc
directory (second listing, where it points to the directory /home/joe). </p>
<p><font color="#ff0000"><strong>Cure: </strong></font>Make
sure the variable PETSC_DIR in the makefile points to the PETSc
directory. Be aware that at many sites, your home directory may
have different names on different machines so it is usually better to
make the path relative, rather than absolute. That is, use
PETSC_DIR = ../../petsc rather than PETSC_DIR =
/c/cafa/username/petsc. </p>
</li>
<li><font color="#ff0000"><strong>Symptom: </strong></font><font face="Terminal">alder> make <br>
ex1 f77 -g -o ex1 ex1.o affine2d.o \ ../../libs/libsg/sun4/domain.a
../../libs/libsg/sun4/Xtools.a ../../libs/libsg/sun4/tools.a
../../libs/libsg/sun4/liblapack.a ../../libs/libsg/sun4/blas.a
../../libs/libsg/sun4/system.a -lX11 -lm<br>
ld: ex1.o: bad magic number Compilation failed *** Error code 4 </font>
<p><font color="#ff0000"><strong>Problem:</strong></font>
The
file ex1.o was compiled on a different architecture or with a
different compiler. </p>
<p><font color="#ff0000"><strong>Cure:</strong></font>
Remove
all .o files and recompile from scratch.</p>
</li>
<li><font color="#ff0000"><strong>Symptom:</strong></font>
using
MPICH [0] Truncated message (in CHK_MSGLEN) <br>
[0] Aborting program! <br>
p0_8959: p4_error: (null): 1
<p><font color="#ff0000"><strong>Problem:</strong></font>
This
is due to a bug in a call to an MPI routine. </p>
<p><font color="#ff0000"><strong>Cure: </strong></font>Run
the program with the option -start_in_debugger. In the
debugger, type "break p4_error" (or "stop in p4_error" for
dbx); then type "cont". When the program aborts, use debugger commands
such as "where" to track down the problem with the call.</p>
</li>
<li><font color="#ff0000"><strong>Symptom:</strong></font>
on
HP-UX <br>
Make: Unknown flag argument -. Stop. <br>
Make: Unknown flag argument -. Stop. <br>
Make: Unknown flag argument -. Stop.
<p><font color="#ff0000"><strong>Problem: </strong></font>We have seen this on HP-UX when using the native (vendor
provided) make. </p>
<p><font color="#ff0000"><strong>Cure: </strong></font>Install
and use GNU make. To force PETSc to use an alternative make,
edit the file petsc/bmake/$PETSC_ARCH/base and change OMAKE to
your alternative. </p>
</li>
<li><font color="#ff0000"><strong>Symptom:</strong>
</font>on
IBM SP<br>
Could not load program
/afs/rpi.edu/big/00/0000/hongwl/petsc/petsc/src/ksp/examples/ex1<br>
Symbol XSetWMProperties in pmd2 is undefined Symbol XSetWMName in pmd2
is undefined <br>
Error was: Exec format error
<p><font color="#ff0000"><strong>Problem:</strong></font>
The
libraries on the IBM SP front-end for X may be different than on the
nodes. </p>
<p><font color="#ff0000"><strong>Cure: </strong></font>Get
your system administrator to make sure the dynamic libraries on
the nodes are IDENTICAL to those on the compiler server. </p>
</li>
<li><font color="#ff0000"><strong>Symptom:</strong>
</font>in
Fortran, while using VecGetArray(), MatGetArray(), or ISGetArray() <br>
/usr/local/mpi/bin/mpirun.ch_p4: 17545 Breakpoint then program stops
<p><font color="#ff0000"><strong>Problem:</strong></font>
You
have compiled some of your code with the option that checks for
out-of-bounds array accesses (on the IBM rs6000 this is the -C option). </p>
<p><font color="#ff0000"><strong>Cure: </strong></font>Recompile
all code, making sure it does not check for out-of-bounds array accesses.
The use of VecGetArray(), etc. requires accessing arrays out of
bounds; this is done safely.</p>
</li>
<li><font color="#ff0000"><strong>Symptom:</strong>
</font>You
create Draw windows or ViewerDraw windows or use options
-ksp_xmonitor or -snes_xmonitor and the program seems to run OK
but windows never open.
<p><font color="#ff0000"><strong>Problem:</strong></font>
The
libraries were compiled without support for X windows. </p>
<p><font color="#ff0000"><strong>Cure: </strong></font>Make
sure that config/configure.py was run with the option --with-x=1 </p>
</li>
<li><font color="#ff0000"><strong>Symptom:</strong>
</font>
<p><font color="#ff0000"><strong>Problem:</strong>
</font>PETSc
cannot work on a machine where the length of C integers does not equal
the length of Fortran integers. </p>
<p><font color="#ff0000"><strong>Cure: </strong></font>Change
your compilers so that you use ones that have the same length
for integers. Or check compiler flags to see if you can change
the default integer lengths to match.</p>
</li>
<li><font color="#ff0000"><strong>Symptom:</strong></font>
On
DEC Alpha: Unaligned access pid=15199 va=140021674 pc=12001e8d8
ra=12001e8c0 type=ldt
<p><font color="#ff0000"><strong>Problem: </strong></font>The
system has detected an unaligned variable. This is usually an unaligned
double. </p>
<p><font color="#ff0000"><strong>Cure: </strong></font>Make
sure in Fortran that you always write double precision numbers
as 10.d0 etc., not just as 10., because otherwise the value will be stored as a
single precision number and may not be properly aligned.</p>
</li>
<li><font color="#ff0000"><strong>Symptom: </strong></font>PetscScalarAddressToFortran: C
and Fortran arrays are not commonly aligned <br>
or are too far apart to be indexed by an integer. <br>
Locations: C 1920156 Fortran 2438656 <br>
[0] MPI Abort by user Aborting program ! <br>
[0] Aborting program!
<p><font color="#ff0000"><strong>Problem:</strong></font>
This
occurs when trying to access a PETSc array from Fortran. The array may
have been obtained with VecGetArray(), MatGetArray(), etc. On
the IRIX64 this is because the Fortran addresses are so far
away from the C addresses that you cannot move between them with an
integer offset (integers are just not big enough). On other machines
this is because the distance between the Fortran array starting
point and the C array starting point is not divisible by the
length of a double (or complex). Thus one cannot access the other with
an integer offset. </p>
<p><font color="#ff0000"><strong>Cure: </strong></font>1)
Rewrite the Fortran code to not use the particular XXXGetXXX()
routine. For example, use VecSetValues() instead of directly stuffing
the values into the array. 2) Determine how to force the
Fortran and/or C compiler to commonly align doubles or complex
numbers. That is, if all doubles are double aligned then this
won't be a problem; if all complex are quad aligned then it is not a
problem. If you determine how to do this for a particular machine,
please let us know so we can add it to PETSc. </p>
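<p>The two failure conditions described above (distance not divisible by the element length, or distance too large for an integer index) can be checked with a few lines of C. This is a sketch under assumptions: the function name <code>offset_is_indexable</code> is hypothetical, it works on raw address values rather than on PETSc objects, and it assumes the Fortran index is a default C <code>int</code>.</p>

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: can the address c_addr be reached from fortran_base
   by an int number of elements of size elem_size? Returns 1 and stores
   the (possibly negative) element offset in *index if so, 0 otherwise. */
int offset_is_indexable(uintptr_t fortran_base, uintptr_t c_addr,
                        size_t elem_size, int *index)
{
    assert(elem_size > 0 && index != NULL);
    int negative = c_addr < fortran_base;
    uintptr_t dist = negative ? fortran_base - c_addr : c_addr - fortran_base;

    if (dist % elem_size != 0) {
        return 0;                       /* not commonly aligned */
    }
    uintptr_t count = dist / elem_size;
    if (count > (uintptr_t)INT_MAX) {
        return 0;                       /* too far apart for an int index */
    }
    *index = negative ? -(int)count : (int)count;
    return 1;
}
```

<p>For the locations in the message above (C 1920156, Fortran 2438656) and 8-byte doubles, the distance is 518500 bytes, which leaves a remainder of 4 when divided by 8, so the check fails on the alignment condition.</p>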
</li>
<li><font color="#ff0000"><strong>Symptom: </strong></font>On
rs6000 machines, the program encounters a segmentation fault
when initializing MPI. <br>
[light] mpirun ex1 <br>
/light_home2/lmcinnes/mpich/lib/rs6000/ch_p4/mpirun: 23817 Memory fault
<br>
See, e.g., the following debugger session:<br>
[light] 525%gdb ex1 <br>
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the
conditions. There is absolutely no warranty for GDB; type "show
warranty" for details. GDB 4.13 (rs6000-ibm-aix3.2), Copyright
1994 Free Software Foundation, Inc... <br>
(gdb) run -p4pg joe <br>
Starting program:
/light_home2/lmcinnes/petsc-2.0.15/src/sles/examples/tutorials/ex1
-p4pg joe <br>
Program received signal SIGSEGV, Segmentation fault. 0x10003750 in
getenv () <br>
(gdb) where <br>
#0 0x10003750 in getenv () <br>
#1 0x10001438 in MPIR_Init (=0x2ff7f630, =0x2ff7f634) <br>
#2 0x10001384 in MPI_Init (=0x2ff7f630, =0x2ff7f634) <br>
#3 0x100004d8 in main (argc=3, args=0x2ff7f65c) at ex1.c:37 <br>
#4 0x10000430 in __start ()
<p><font color="#ff0000"><strong>Problem: </strong></font>The
library libxlf.a contains the Fortran routine getenv(), which is
being used instead of the UNIX routine that is actually needed.
This seems to occur when using gcc/g++ instead of xlc. <br>
</p>
<p><font color="#ff0000"><strong>Cure: </strong></font>Edit
the file petsc/bmake/rs6000/bpackages and define FC_LIB as
follows, making sure to list "-lbsd -lc" BEFORE libxlf.a and
any other Fortran libraries: FC_LIB = -lbsd -lc /usr/lib/libxlf.a</p>
</li>
<li><font color="#ff0000"><strong>Symptom:</strong>
</font>The
program seems to use more and more memory as it runs, even
though you don't think you are allocating more memory.
<p><font color="#ff0000"><strong>Problem:</strong></font>
Possibly some of the following:</p>
<ol>
<li>You are creating new PETSc objects but never freeing
them.</li>
<li>There is a memory leak in PETSc or your code. </li>
<li>Something much more subtle: (if you are using Fortran).
When you declare a large array in Fortran, the operating
system does not allocate all the memory pages for that array until you
start using the different locations in the array. Thus, in a
code, if at each step you start using later values in the
array your virtual memory usage will "continue" to increase
as measured by ps or top. </li>
<li>You are running with the -log, -log_mpe, or -log_all
option. In this case a great deal of logging information is stored in
memory until the conclusion of the run.</li>
<li>You are linking with the MPI profiling libraries; these
cause logging of all MPI activities. Another <font color="#ff0000"><strong>symptom</strong>
</font>is that at the conclusion of the run the program may print a
message about writing log
files. </li>
</ol>
<p><font color="#ff0000"><strong>Cures:</strong></font></p>
<ol>
<li>Run with the -malloc_debug and -malloc_dump options, or
call
the
routines PetscMallocDump() and PetscMallocLogDump() at selected points in
your code to track memory that is allocated and not later freed. Use
the routines PetscMallocSpace() and PetscGetResidentSetSize()
to monitor memory allocated and total memory used as the
code progresses. </li>
<li>This is just the way Unix works and is harmless.</li>
<li>Do not use the -log, -log_mpe, or -log_all option, or
use PLogEventDeactivate() or PLogEventDeactivateClass(),
PLogEventMPEDeactivate() to turn off logging of specific events. </li>
<li>Make sure you do not link with the MPI profiling
libraries. </li>
</ol>
</li>
<li><font color="#ff0000"><strong>Symptom:</strong>
</font>Under
Windows when Installing using g++ libfast in:
/users/petsc/petsc_prj/petsc/src/sys/src str.c: In function `void
PetscStrncpy(char *, char *, int)': str.c:36: warning: implicit
declaration of function `int strncpy(...)' ... ...
<p><font color="#ff0000"><strong>Problem:</strong></font>
This is
due to the case insensitivity of Windows file systems. Instead of using
string.h, the compiler is picking up String.h, a C++
include file, causing these errors. </p>
<p><font color="#ff0000"><strong>Cure: </strong></font>In
the gcc include directory do "cp string.h string_bak.h". Then edit
petsc/src/sys/src/str.c and replace string.h with string_bak.h,
edit petsc/src/sys/src/memc.c and replace memory.h with string_bak.h, and
recompile. </p>
</li>
<li><font color="#ff0000"><strong>Symptom:</strong>
</font>on
SGI (or Origin) using SGI MPI version 2.0 MPI Error, rank:0,
function:MPI_ERRHANDLER_SET, Invalid communicator MPI_Abort()
called, aborting program! Other, random crashes in MPI.
<p><font color="#ff0000"><strong>Problem: </strong></font>A bug
in SGI's implementation of MPI called version 2.0 (confirmed by
SGI). </p>
<p><font color="#ff0000"><strong>Cure: </strong></font>Upgrade
to SGI's version 3.0 of MPI. </p>
</li>
<li><font color="#ff0000"><strong>Symptom:</strong>
</font>on
SGI Powerchallenge running 6.2 ld64: WARNING 134: weak
definition of __dcis in /usr/lib64/mips4/libftn.so preempts that weak
definition in /usr/lib64/mips4/libm.so. ld64: WARNING 134: weak
definition of __rcis in /usr/lib64/mips4/libftn.so preempts
that weak definition in /usr/lib64/mips4/libm.so.
<p><font color="#ff0000"><strong>Problem:</strong></font>
Message seems harmless </p>
<p><font color="#ff0000"><strong>Cure: </strong></font>Change
the CLINKER and FLINKER in bmake/IRIX64/base to <br>
CLINKER = cc -64 ${COPTFLAGS} -Wl,-woff,84,-woff,85,-woff,134 -rpath
${LDIR}:${DYLIBPATH} <br>
FLINKER = f77 -64 ${FOPTFLAGS} -Wl,-woff,84,-woff,85,-woff,134 -rpath
${LDIR}:${DYLIBPATH}</p>
</li>
<li><font color="#ff0000"><strong>Symptom:</strong> </font>When
calling MatPartitioningApply() you get the message Error! Key 16615 not
found
<p><font color="#ff0000"><strong>Problem: </strong></font>The
graph of the matrix you are using is not symmetric. </p>
<p><font color="#ff0000"><strong>Cure:</strong></font>
You must use symmetric matrices for partitioning. </p>
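<p>Before partitioning, it can help to verify that the matrix graph really is symmetric. The sketch below is plain C on a compressed sparse row (CSR) pattern and does not use any PETSc calls. The function names and the CSR arrays <code>rowptr</code> and <code>colidx</code> are assumptions made for illustration, and only the nonzero pattern is checked, not the values.</p>

```c
#include <assert.h>

/* Hypothetical helper: returns 1 if entry (row, col) is present in the
   CSR pattern, 0 otherwise. */
static int csr_has_entry(const int *rowptr, const int *colidx, int row, int col)
{
    for (int k = rowptr[row]; k < rowptr[row + 1]; k++) {
        if (colidx[k] == col) {
            return 1;
        }
    }
    return 0;
}

/* Hypothetical check: returns 1 if for every stored entry (i, j) the
   entry (j, i) is also stored, i.e. the graph of the matrix is symmetric. */
int csr_pattern_is_symmetric(int n, const int *rowptr, const int *colidx)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++) {
            if (!csr_has_entry(rowptr, colidx, colidx[k], i)) {
                return 0;
            }
        }
    }
    return 1;
}
```

<p>The linear scan per entry keeps the sketch short; for large matrices a sorted-column search would be a more practical choice.</p>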
</li>
<li><font color="#ff0000"><strong>Symptom:</strong>
</font>With
GMRES, at restart the second residual norm printed does not
match the first <br>
26 KSP Residual norm 3.421544615851e-04 <br>
27 KSP Residual norm 2.973675659493e-04 <br>
28 KSP Residual norm 2.588642948270e-04 <br>
29 KSP Residual norm 2.268190747349e-04 <br>
30 KSP Residual norm 1.977245964368e-04<br>
30 KSP Residual norm 1.994426291979e-04 &lt;----- At restart the
residual norm is printed a second time
<p><font color="#ff0000"><strong>Problem:</strong></font>
Actually this is not surprising. GMRES computes the norm of the
residual at each iteration via a recurrence relation between
the norms of the residuals at the previous iterations and quantities
computed at the current iteration; it does not compute
|| b - A x^{n} || directly. Sometimes, especially with an
ill-conditioned matrix, or computation of the matrix-vector product via
differencing, the residual norms computed by GMRES start to
"drift" from the correct values. At the restart, we compute the
residual norm directly, hence the difference between the two
values printed. The drifting, if it remains small, is harmless
(it doesn't affect the accuracy of the solution that GMRES
computes). </p>
<p><font color="#ff0000"><strong>Cure:</strong></font>
There
really isn't a cure, but if you use a more powerful
preconditioner the drift will often be smaller and less noticeable. Or
if you are running matrix-free you may need to tune the
matrix-free parameters.</p>
</li>
<li><font color="#ff0000"><strong>Symptom:</strong>
</font>on
Cray T3E/T3D: my mixed Fortran/(C or C++) code works fine on
other machines but does not link (or links but crashes) on
the Cray T3D or T3E.
<p><font color="#ff0000"><strong>Probable
problems</strong></font>:</p>
<ol>
<li>NOT LINKING: the Cray Fortran compiler changes all
Fortran routine names to all caps, so when you call them
from C/C++ with all small letters, the linker cannot find them. </li>
<li>STRANGE CRASHING: the Cray Fortran compiler uses double
precision to denote quad precision and single precision to
denote "regular" double precision. </li>
</ol>
<p><font color="#ff0000"><strong>Cures:</strong></font>
</p>
<ol>
<li>You must make sure that when you call Fortran routines
from C/C++ the name of the routine called (in C/C++) is in
all caps. The PETSc macro HAVE_FORTRAN_CAPS is defined on machines like
the Cray, so you can use it in your C/C++ like this:
<pre>
#if defined(HAVE_FORTRAN_CAPS)
#define myfortranroutine_ MYFORTRANROUTINE
#elif !defined(HAVE_FORTRAN_UNDERSCORE)
#define myfortranroutine_ myfortranroutine
#endif

/* some C code that calls Fortran */
myfortranroutine_(.....)
</pre>
See src/fortran/custom/zoptions.c for examples. </li>
<li>To get the Fortran compiler to behave like a normal
Unix Fortran compiler you must make sure that all of your
Fortran routines are compiled with the -dp flag. If you use the PETSc
makefiles and the macro FC to compile your Fortran code, this is
handled automatically. </li>
</ol>
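<p>The name-mapping technique from the first cure can be tried without a Fortran compiler. In the sketch below the "Fortran" routine is a C stand-in defined under whatever linker name the macros select. The routine body and the wrapper <code>call_doubling_routine</code> are hypothetical and exist only to show that caller and callee agree on the name. The macros HAVE_FORTRAN_CAPS and HAVE_FORTRAN_UNDERSCORE are the ones named in the text.</p>

```c
#include <assert.h>

/* Map the name used in C source onto the name the Fortran compiler emits. */
#if defined(HAVE_FORTRAN_CAPS)
#define myfortranroutine_ MYFORTRANROUTINE
#elif !defined(HAVE_FORTRAN_UNDERSCORE)
#define myfortranroutine_ myfortranroutine
#endif

/* Stand-in for the Fortran routine, written in C so the sketch links on
   its own. Fortran passes arguments by reference, hence the pointer. */
void myfortranroutine_(int *n)
{
    assert(n != 0);
    *n *= 2;
}

/* C code that calls the routine through the mapped name. */
int call_doubling_routine(int value)
{
    myfortranroutine_(&value);
    return value;
}
```

<p>Compiling the same file with -DHAVE_FORTRAN_CAPS changes the linker symbol to MYFORTRANROUTINE without touching the calling code.</p>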
</li>
<li><font color="#ff0000"><strong>Symptom:</strong>
</font>[0]
PETSC ERROR: MatAssemblyBegin() line 1858 in
src/mat/interface/matrix.c Not for factored matrix
<p><font color="#ff0000"><strong>Problem: </strong></font>You
are trying to assemble a matrix that has been factored.
Normally this does not make sense, unless you are using an in-place
factorization and want to reuse the space. </p>
<p><font color="#ff0000"><strong>Cure: </strong></font>Call
MatSetUnfactored(Mat); before calling the MatSetValues()
routines.</p>
</li>
<li><strong><font color="#ff0000">Symptom:</font></strong>
Error
while running program<br>
<br>
> mpirun ex1<br>
p0_27137: p4_error: semget failed for setnum=%d: 0<br>
<br>
<strong><font color="#ff0000">Problem:</font></strong>
Improperly installed or configured MPICH. Often this results
from compiling the socket-based version of MPICH (device ch_p4) but
using the mpirun associated with the shared memory version, or
the other way around.<br>
<br>
<font color="#ff0000"><strong>Cure:</strong>
</font>First,
make sure that you can run plain old MPI programs (those
without PETSc). Make sure you are using the correct version of
mpirun for the installed version of MPICH, or reinstall MPICH.<br>
<br>
</li>
<li><strong><font color="#ff0000">Symptom</font></strong>:
Error when compiling PETSc examples<br>
<br>
> ld : -lg2c no such file<br>
<br>
<font color="#ff0000"><strong>Problem:</strong></font>
Your
Fortran compiler is probably using libf2c.a instead of libg2c.a<br>
<br>
<font color="#ff0000"><strong>Cure:</strong></font>
Edit
bmake/${PETSC_ARCH}/petscconf and replace -lg2c with -lf2c<br>
<br>
</li>
<li>
<p> <font color="#ff0000"><strong>Symptom</strong>:</font>
Get the following errors when using PETSc graphics on
windows/cygwin-X11 <br>
<font face="Terminal"><br>
X Error of failed request: BadMatch (invalid parameter attributes)<br>
Major opcode of failed request: 78 (X_CreateColormap)<br>
Serial number of failed request: 8<br>
Current serial number in output stream: 9<br>
<br>
</font></p>
<p><font face="Terminal"><font color="#ff0000"><strong>Problem:</strong></font>
This
problem might occur when using 25 color mode or 32-bit color
mode on Windows. </font></p>
<font face="Terminal"> </font>
<p><font face="Terminal"><font color="#ff0000"><strong>Cure:</strong></font>
This
can be fixed by changing the display settings on Windows to 16-bit
colors or 24-bit colors.<br>
</font></p>
<font face="Terminal"> </font>
<p></p>
</li>
<li><font face="Terminal"><font color="#ff0000"><strong>Symptom</strong>:</font>
Some
Krylov methods seem to print two residual norms per iteration,
for example <br>
<font face="Terminal"><br>
> 1198 KSP Residual norm 1.366052062216e-04<br>
> 1198 KSP Residual norm 1.931875025549e-04<br>
> 1199 KSP Residual norm 1.366026406067e-04<br>
> 1199 KSP Residual norm 1.931819426344e-04 </font></font>
<p><font color="#ff0000"><strong>Problem:</strong></font>
Some Krylov methods, for example TFQMR, actually have a
"sub-iteration"
of size 2 inside the loop; each of the two substeps has its own
matrix-vector
product and application of the preconditioner, and updates the residual
approximations. This is why you get this "funny" output where it looks
like
there are two residual norms per iteration. You can also think of it as
twice
as many iterations. </p>
<p></p>
</li>
<li><font color="#ff0000"><strong>Symptom</strong>:</font>
The
example compiles fine - but at runtime gives the following
error:
<p>[0]PETSC ERROR: PetscInitialize_DynamicLibraries() line 63
in src/sys/src/dll/reg.c<br>
[0]PETSC ERROR: Unable to locate PETSc dynamic library
/home/balay/spetsc/lib/libg/linux/libpetsc <br>
You cannot move the dynamic libraries!<br>
<br>
</p>
<p><font color="#ff0000"><strong>Problem:</strong></font>
When
using DYNAMIC libraries, the libraries cannot be moved after
they are installed. This can also happen on clusters where
the paths on the (run) nodes differ from those on the
(compile) front-end. </p>
<p><font color="#ff0000"><strong>Cure:</strong></font>
Do not use dynamic or shared libraries. Run
config/configure.py with --with-shared=0 --with-dynamic=0</p>
</li>
<li><font color="#ff0000"><strong>Symptom</strong>:</font>
When
running with -start_in_debugger one gets the error message
<p>PETSC: Attaching gdb to
/opt/procast_mpich/procast051003/./procast of pid 31603 on display
linux.:0.0 on machine linux.
: Can't get address for linux. Xt error: Can't open display: linux.:0.0
</p>
<p><font color="#ff0000"><strong>Problem:</strong></font>
The
remote nodes
do not know where to display the debugger window. </p>
<p><font color="#ff0000"><strong>Cure: </strong></font>Run
with
the additional option
-display displayname where displayname is something like mymachine:0.0 </p>
</li>
<li><font color="#ff0000"><strong></strong></font><font color="#ff0000"><strong>Symptom: </strong></font><br>
# make PETSC_ARCH=asterix-mpd test<br>
Running test examples to verify correct installation<br>
Possible error running C/C++ src/snes/examples/tutorials/ex19 with 1
MPI process<br>
See
http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html<br>
mpdrun_asterix: cannot connect to local mpd (/tmp/mpd2.console_balay);
possible causes:<br>
1. no mpd is running on this host<br>
2. an mpd is running but was started without a "console" (-n
option)<br>
<p><font color="#ff0000"><strong>Problem:</strong></font>
As
the error message indicates, the mpd daemon required by the
version of
MPICH you have installed has not been started. </p>
<p><font color="#ff0000"><strong>Cure:</strong></font><span style="font-weight: bold;"> </span> Start the
mpd daemon [should
be at MPI_DIR/bin/mpdboot].<br>
</p>
</li>
</font>
</ol>
</body>
</html>