1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<!--Converted with LaTeX2HTML 2019.2 (Released June 5, 2019) -->
<HTML lang="EN">
<HEAD>
<TITLE>5 Troubleshooting</TITLE>
<META NAME="description" CONTENT="5 Troubleshooting">
<META NAME="keywords" CONTENT="user_guide">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
<META NAME="viewport" CONTENT="width=device-width, initial-scale=1.0">
<META NAME="Generator" CONTENT="LaTeX2HTML v2019.2">
<LINK REL="STYLESHEET" HREF="user_guide.css">
<LINK REL="next" HREF="node23.html">
<LINK REL="previous" HREF="node14.html">
<LINK REL="next" HREF="node22.html">
</HEAD>
<BODY >
<!--Navigation Panel-->
<A
HREF="node22.html">
<IMG WIDTH="37" HEIGHT="24" ALT="next" SRC="next.png"></A>
<A
HREF="user_guide.html">
<IMG WIDTH="26" HEIGHT="24" ALT="up" SRC="up.png"></A>
<A
HREF="node20.html">
<IMG WIDTH="63" HEIGHT="24" ALT="previous" SRC="prev.png"></A>
<A ID="tex2html210"
HREF="node1.html">
<IMG WIDTH="65" HEIGHT="24" ALT="contents" SRC="contents.png"></A>
<BR>
<B> Next:</B> <A
HREF="node22.html">5.1 Compilation problems with</A>
<B> Up:</B> <A
HREF="user_guide.html">User's Guide for the</A>
<B> Previous:</B> <A
HREF="node20.html">4.6 Restarting</A>
<B> <A ID="tex2html211"
HREF="node1.html">Contents</A></B>
<BR>
<BR>
<!--End of Navigation Panel-->
<H1><A ID="SECTION00060000000000000000">
5 Troubleshooting</A>
</H1>
<P>
<H4><A ID="SECTION00060010000000000000">
5.0.0.1 pw.x says 'error while loading shared libraries' or
'cannot open shared object file' and does not start</A>
</H4>
Possible reasons:
<UL>
<LI>If you are running on the same machines on which the code was
compiled, this is a library configuration problem. The solution is
machine-dependent. On Linux, find the path to the missing libraries;
then either add it to file <TT>/etc/ld.so.conf</TT> and run <TT>ldconfig</TT>
(must be
done as root), or add it to variable LD_LIBRARY_PATH and export
it. Another possibility is to load non-shared version of libraries
(ending with .a) instead of shared ones (ending with .so).
</LI>
<LI>If you are <EM>not</EM> running on the same machines on which the
code was compiled: you need either to have the same shared libraries
installed on both machines, or to load statically all libraries
(using appropriate <TT>configure</TT> or loader options). The same applies to
Beowulf-style parallel machines: the needed shared libraries must be
present on all PCs.
</LI>
</UL>
<P>
<H4><A ID="SECTION00060020000000000000">
5.0.0.2 errors in examples with parallel execution</A>
</H4>
<P>
If you get error messages in the example scripts – i.e. not errors in
the codes – on a parallel machine, such as e.g.:
<EM>run example: -n: command not found</EM>
you may have forgotten to properly define PARA_PREFIX and PARA_POSTFIX.
<P>
<H4><A ID="SECTION00060030000000000000">
5.0.0.3 pw.x prints the first few lines and then nothing happens
(parallel execution)</A>
</H4>
If the code looks like it is not reading from input, maybe
it isn't: the MPI libraries need to be properly configured to accept input
redirection. Use <TT>pw.x -i</TT> and the input file name (see
Sec.<A HREF="node18.html#SubSec:badpara">4.4</A>), or inquire with
your local computer wizard (if any). Since v.4.2, this is for sure the
reason if the code stops at <EM>Waiting for input...</EM>.
<P>
<H4><A ID="SECTION00060040000000000000">
5.0.0.4 pw.x stops with error while reading data</A>
</H4>
There is an error in the input data, typically a misspelled namelist
variable, or an empty input file.
Unfortunately with most compilers the code just reports <EM>Error while
reading XXX namelist</EM> and no further useful information.
Here are some more subtle sources of trouble:
<UL>
<LI>Out-of-bound indices in dimensioned variables read in the namelists;
</LI>
<LI>Input data files containing ^M (Control-M) characters at the end
of lines, or non-ASCII characters (e.g. non-ASCII quotation marks,
that at a first glance may look the same as the ASCII
character). Typically, this happens with files coming from Windows
or produced with "smart" editors.
</LI>
</UL>
Both may cause the code to crash with rather mysterious error messages.
If none of the above applies and the code stops at the first namelist
(&CONTROL) and you are running in parallel, see the previous item.
<P>
<H4><A ID="SECTION00060050000000000000">
5.0.0.5 pw.x mumbles something like <EM>cannot recover</EM> or
<EM>error reading recover file</EM></A>
</H4>
You are trying to restart from a previous job that either
produced corrupted files, or did not do what you think it did. No luck: you
have to restart from scratch.
<P>
<H4><A ID="SECTION00060060000000000000">
5.0.0.6 pw.x stops with <EM>inconsistent DFT</EM> error</A>
</H4>
As a rule, the flavor of DFT used in the calculation should be the
same as the one used in the generation of pseudopotentials, which
should all be generated using the same flavor of DFT. This is actually enforced: the
type of DFT is read from pseudopotential files and it is checked that the same DFT
is read from all PPs. If this does not hold, the code stops with the
above error message. Use – at your own risk – input variable
<TT>input_dft</TT> to force the usage of the DFT you like.
<P>
<H4><A ID="SECTION00060070000000000000">
5.0.0.7 pw.x stops with error in cdiaghg or rdiaghg</A>
</H4>
Possible reasons for such behavior are not always clear, but they
typically fall into one of the following cases:
<UL>
<LI>serious error in data, such as bad atomic positions or bad
crystal structure/supercell;
</LI>
<LI>a bad pseudopotential, typically with a ghost, or a USPP giving
non-positive charge density, leading to a violation of positiveness
of the S matrix appearing in the USPP formalism;
</LI>
<LI>a failure of the algorithm performing subspace
diagonalization. The LAPACK algorithms used by <TT>cdiaghg</TT>
(for generic k-points) or <TT>rdiaghg</TT> (for <I>Γ</I> -only case)
are
very robust and extensively tested. Still, it may seldom happen that
such algorithms fail. Try to use conjugate-gradient diagonalization
(<TT>diagonalization='cg'</TT>), a slower but very robust algorithm, and see
what happens.
</LI>
<LI>buggy libraries. Machine-optimized mathematical libraries are
very fast but sometimes not so robust from a numerical point of
view. Suspicious behavior: you get an error that is not
reproducible on other architectures or that disappears if the
calculation is repeated with even minimal changes in
parameters. Known cases: HP-Compaq alphas with cxml libraries, Mac
OS-X with system BLAS/LAPACK. Try to use compiled BLAS and LAPACK
(or better, ATLAS) instead of machine-optimized libraries.
</LI>
</UL>
<P>
<H4><A ID="SECTION00060080000000000000">
5.0.0.8 pw.x crashes with no error message at all</A>
</H4>
This happens quite often in parallel execution, or under a batch
queue, or if you are writing the output to a file. When the program
crashes, part of the output, including the error message, may be lost,
or hidden into error files where nobody looks into. It is the fault of
the operating system, not of the code. Try to run interactively
and to write to the screen. If this doesn't help, move to next point.
<P>
<H4><A ID="SECTION00060090000000000000">
5.0.0.9 pw.x crashes with <EM>segmentation fault</EM> or similarly
obscure messages</A>
</H4>
Possible reasons:
<UL>
<LI>too much RAM memory or stack requested (see next item).
</LI>
<LI>if you are using highly optimized mathematical libraries, verify
that they are designed for your hardware.
</LI>
<LI>If you are using aggressive optimization in compilation, verify
that you are using the appropriate options for your machine
</LI>
<LI>The executable was not properly compiled, or was compiled on
a different and incompatible environment.
</LI>
<LI>buggy compiler or libraries: this is the default explanation if you
have problems with the provided tests and examples.
</LI>
</UL>
<P>
<H4><A ID="SECTION000600100000000000000">
5.0.0.10 pw.x works for simple systems, but not for large systems
or whenever more RAM is needed</A>
</H4>
Possible solutions:
<UL>
<LI>Increase the amount of RAM you are authorized to use (which may
be much smaller than the available RAM). Ask your system
administrator if you don't know what to do. In some cases the
stack size can be a source of problems: if so, increase it with command
<TT>limits</TT> or <TT>ulimit</TT>).
</LI>
<LI>Reduce <TT>nbnd</TT> to the strict minimum (for insulators, the
default is already the minimum, though).
</LI>
<LI>Reduce the work space for Davidson diagonalization to the minimum
by setting <TT>diago_david_ndim=2</TT>; also consider using conjugate
gradient diagonalization (<TT>diagonalization='cg'</TT>), slow but very
robust, which requires almost no work space.
</LI>
<LI>If the charge density takes a significant amount of RAM, reduce
<TT>mixing_ndim</TT> from its default value (8) to 4 or so.
</LI>
<LI>In parallel execution, use more processors, or use the same
number of processors with less pools. Remember that parallelization
with respect to k-points (pools) does not distribute memory:
only parallelization with respect to R- (and G-) space does.
</LI>
<LI>If none of the above is sufficient or feasible, you have to either
reduce the cutoffs and/or the cell size, or to use a machine with
more RAM.
</LI>
</UL>
<P>
<H4><A ID="SECTION000600110000000000000">
5.0.0.11 pw.x crashes with <EM>error in davcio</EM></A>
</H4>
<TT>davcio</TT> is the routine that performs most of the I/O operations (read
from disk and write to disk) in <TT>pw.x</TT>; <EM>error in davcio</EM> means a
failure of an I/O operation.
<UL>
<LI>If the error is reproducible and happens at the beginning of a
calculation: check if you have read/write permission to the scratch
directory specified in variable <TT>outdir</TT>. Also: check if there is
enough free space available on the disk you are writing to, and
check your disk quota (if any).
</LI>
<LI>If the error is irreproducible: your might have flaky disks; if
you are writing via the network using NFS (which you shouldn't do
anyway), your network connection might be not so stable, or your
NFS implementation is unable to work under heavy load
</LI>
<LI>If it happens while restarting from a previous calculation: you
might be restarting from the wrong place, or from wrong data, or
the files might be corrupted. Note that, since QE 5.1, restarting from
arbitrary places is no more supported: the code must terminate cleanly.
</LI>
<LI>If you are running two or more instances of <TT>pw.x</TT> at
the same time, check if you are using the same file names in the
same temporary directory. For instance, if you submit a series of
jobs to a batch queue, do not use the same <TT>outdir</TT> and
the same <TT>prefix</TT>, unless you are sure that one job doesn't
start before a preceding one has finished.
</LI>
</UL>
<P>
<H4><A ID="SECTION000600120000000000000">
5.0.0.12 pw.x crashes in parallel execution with an obscure message
related to MPI errors</A>
</H4>
Random crashes due to MPI errors have often been reported, typically
in Linux PC clusters. We cannot rule out the possibility that bugs in
Q<SMALL>UANTUM </SMALL>ESPRESSO cause such behavior, but we are quite confident that
the most likely explanation is a hardware problem (defective RAM
for instance) or a software bug (in MPI libraries, compiler, operating
system).
<P>
Debugging a parallel code may be difficult, but you should at least
verify if your problem is reproducible on different
architectures/software configurations/input data sets, and if
there is some particular condition that activates the bug. If this
doesn't seem to happen, the odds are that the problem is not in
Q<SMALL>UANTUM </SMALL>ESPRESSO. You may still report your problem,
but consider that reports like <EM>it crashes with...(obscure MPI error)</EM>
contain 0 bits of information and are likely to get 0 bits of answers.
<P>
<H4><A ID="SECTION000600130000000000000">
5.0.0.13 pw.x stops with error message <EM>the system is metallic,
specify occupations</EM></A>
</H4>
You did not specify state occupations, but you need to, since your
system appears to have an odd number of electrons. The variable
controlling how metallicity is treated is <TT>occupations</TT> in namelist
&SYSTEM. The default, <TT>occupations='fixed'</TT>, occupies the lowest
(N electrons)/2 states and works only for insulators with a gap. In all other
cases, use <TT>'smearing'</TT> (<TT>'tetrahedra'</TT> for DOS calculations).
See input reference documentation for more details.
<P>
<H4><A ID="SECTION000600140000000000000">
5.0.0.14 pw.x stops with <EM>internal error: cannot bracket Ef</EM></A>
</H4>
Possible reasons:
<UL>
<LI>serious error in data, such as bad number of electrons,
insufficient number of bands, absurd value of broadening;
</LI>
<LI>the Fermi energy is found by bisection assuming that the
integrated DOS N(E ) is an increasing function of the energy. This
is not guaranteed for Methfessel-Paxton smearing of order 1 and can
give problems when very few k-points are used. Use some other
smearing function: simple Gaussian broadening or, better,
Marzari-Vanderbilt-DeVita-Payne 'cold smearing'.
</LI>
</UL>
<P>
<H4><A ID="SECTION000600150000000000000">
5.0.0.15 pw.x yields <EM>internal error: cannot bracket Ef</EM> message
but does not stop</A>
</H4>
This may happen under special circumstances when you are calculating
the band structure for selected high-symmetry lines. The message
signals that occupations and Fermi energy are not correct (but
eigenvalues and eigenvectors are). Remove <TT>occupations='tetrahedra'</TT>
in the input data to get rid of the message.
<P>
<H4><A ID="SECTION000600160000000000000">
5.0.0.16 pw.x runs but nothing happens</A>
</H4>
Possible reasons:
<UL>
<LI>in parallel execution, the code died on just one
processor. Unpredictable behavior may follow.
</LI>
<LI>in serial execution, the code encountered a floating-point error
and goes on producing NaNs (Not a Number) forever unless exception
handling is on (and usually it isn't). In both cases, look for one
of the reasons given above.
</LI>
<LI>maybe your calculation will take more time than you expect.
</LI>
</UL>
<P>
<H4><A ID="SECTION000600170000000000000">
5.0.0.17 pw.x yields weird results</A>
</H4>
If results are really weird (as opposed to misinterpreted):
<UL>
<LI>if this happens after a change in the code or in compilation or
preprocessing options, try <TT>make clean</TT>, recompile. The <TT>make</TT>
command should take care of all dependencies, but do not rely too
heavily on it. If the problem persists, recompile with
reduced optimization level.
</LI>
<LI>maybe your input data are weird.
</LI>
</UL>
<P>
<H4><A ID="SECTION000600180000000000000">
5.0.0.18 FFT grid is machine-dependent</A>
</H4>
Yes, they are! The code automatically chooses the smallest grid that
is compatible with the
specified cutoff in the specified cell, and is an allowed value for the FFT
library used. Most FFT libraries are implemented, or perform well, only
with dimensions that factors into products of small numbers (2, 3, 5 typically,
sometimes 7 and 11). Different FFT libraries follow different rules and thus
different dimensions can result for the same system on different machines (or
even on the same machine, with a different FFT). See function allowed in
<TT>Modules/fft_scalar.f90</TT>.
<P>
As a consequence, the energy may be slightly different on different machines.
The only piece that explicitly depends on the grid parameters is
the XC part of the energy that is computed numerically on the grid. The
differences should be small, though, especially for LDA calculations.
<P>
Manually setting the FFT grids to a desired value is possible, but slightly
tricky, using input variables <TT>nr1</TT>, <TT>nr2</TT>, <TT>nr3</TT> and
<TT>nr1s</TT>, <TT>nr2s</TT>, <TT>nr3s</TT>. The
code will still increase them if not acceptable. Automatic FFT grid
dimensions are slightly overestimated, so one may try <EM>very carefully</EM>
to reduce
them a little bit. The code will stop if too small values are required, it will
waste CPU time and memory for too large values.
<P>
Note that in parallel execution, it is very convenient to have FFT grid
dimensions along <I>z</I> that are a multiple of the number of processors.
<P>
<H4><A ID="SECTION000600190000000000000">
5.0.0.19 pw.x does not find all the symmetries you expected</A>
</H4>
<TT>pw.x</TT> determines first the symmetry operations (rotations) of the
Bravais lattice; then checks which of these are symmetry operations of
the system (including if needed fractional translations). This is done
by rotating (and translating if needed) the atoms in the unit cell and
verifying if the rotated unit cell coincides with the original one.
<P>
Assuming that your coordinates are correct (please carefully check!),
you may not find all the symmetries you expect because:
<UL>
<LI>the number of significant figures in the atomic positions is not
large enough. In file <TT>PW/src/eqvect.f90</TT>, the variable <TT>accep</TT> is used to
decide whether a rotation is a symmetry operation. Its current value
(10<SUP>-5</SUP>), set in module <TT>PW/src/symm_base.f90</TT>,
is quite strict: a rotated atom must coincide with
another atom to 5 significant digits. You may change the value of
<TT>accep</TT> and recompile.
</LI>
<LI>they are not acceptable symmetry operations of the Bravais
lattice. This is the case for C<SUB>60</SUB>, for instance: the <I>I</I><SUB>h</SUB>
icosahedral group of C<SUB>60</SUB> contains 5-fold rotations that are
incompatible with translation symmetry.
</LI>
<LI>the system is rotated with respect to symmetry axis. For
instance: a C<SUB>60</SUB> molecule in the fcc lattice will have 24
symmetry operations (<I>T</I><SUB>h</SUB> group) only if the double bond is
aligned along one of the crystal axis; if C<SUB>60</SUB> is rotated
in some arbitrary way, <TT>pw.x</TT> may not find any symmetry, apart from
inversion.
</LI>
<LI>they contain a fractional translation that is incompatible with
the FFT grid (see next paragraph). Note that if you change cutoff or
unit cell volume, the automatically computed FFT grid changes, and
this may explain changes in symmetry (and in the number of k-points
as a consequence) for no apparent good reason (only if you have
fractional translations in the system, though).
</LI>
<LI>a fractional translation, without rotation, is a symmetry
operation of the system. This means that the cell is actually a
supercell. In this case, all symmetry operations containing
fractional translations are disabled. The reason is that in this
rather exotic case there is no simple way to select those symmetry
operations forming a true group, in the mathematical sense of the
term.
</LI>
</UL>
<P>
<H4><A ID="SECTION000600200000000000000">
5.0.0.20 <EM>Warning: symmetry operation # N not allowed</EM></A>
</H4>
This is not an error. If a symmetry operation contains a fractional
translation that is incompatible with the FFT grid, it is discarded in
order to prevent problems with symmetrization. Typical fractional
translations are 1/2 or 1/3 of a lattice vector. If the FFT grid
dimension along that direction is not divisible respectively by 2 or
by 3, the symmetry operation will not transform the FFT grid into
itself. Solution: you can either force your FFT grid to be
commensurate with fractional translation (set variables
<TT>nr1</TT>, <TT>nr2</TT>, <TT>nr3</TT> to suitable values), or
set variable <TT>use_all_frac</TT> to <TT>.true.</TT>, in namelist
&SYSTEM. Note however that the latter is incompatible with
hybrid functionals and with phonon calculations.
<P>
<H4><A ID="SECTION000600210000000000000">
5.0.0.21 Self-consistency is slow or does not converge at all</A>
</H4>
<P>
Bad input data will often result in bad scf convergence. Please
carefully check your structure first, e.g. using XCrySDen.
<P>
Assuming that your input data is sensible :
<OL>
<LI>Verify if your system is metallic or is close to a metallic
state, especially if you have few k-points. If the highest occupied
and lowest unoccupied state(s) keep exchanging place during
self-consistency, forget about reaching convergence. A typical sign
of such behavior is that the self-consistency error goes down, down,
down, than all of a sudden up again, and so on. Usually one can
solve the problem by adding a few empty bands and a small
broadening.
</LI>
<LI>Reduce <TT>mixing_beta</TT> to <!-- MATH
$\sim 0.3\div
0.1$
-->
∼0.3÷0.1 or smaller. Try the <TT>mixing_mode</TT> value that is more
appropriate for your problem. For slab geometries used in surface
problems or for elongated cells, <TT>mixing_mode='local-TF'</TT>
should be the better choice, dampening "charge sloshing". You may
also try to increase <TT>mixing_ndim</TT> to more than 8 (default
value). Beware: this will increase the amount of memory you need.
</LI>
<LI>Specific to USPP: the presence of negative charge density
regions due to either the pseudization procedure of the augmentation
part or to truncation at finite cutoff may give convergence
problems. Raising the <TT>ecutrho</TT> cutoff for charge density will
usually help.
</LI>
</OL>
<P>
<H4><A ID="SECTION000600220000000000000">
5.0.0.22 I do not get the same results in different machines!</A>
</H4>
If the difference is small, do not panic. It is quite normal for
iterative methods to reach convergence through different paths as soon
as anything changes. In particular, between serial and parallel
execution there are operations that are not performed in the same
order. As the numerical accuracy of computer numbers is finite, this
can yield slightly different results.
<P>
It is also normal that the total energy converges to a better accuracy
than its terms, since only the sum is variational, i.e. has a minimum
in correspondence to ground-state charge density. Thus if the
convergence threshold is for instance 10<SUP>-8</SUP>, you get 8-digit
accuracy on the total energy, but one or two less on other terms
(e.g. XC and Hartree energy). It this is a problem for you, reduce the
convergence threshold for instance to 10<SUP>-10</SUP> or 10<SUP>-12</SUP>. The
differences should go away (but it will probably take a few more
iterations to converge).
<P>
<H4><A ID="SECTION000600230000000000000">
5.0.0.23 Execution time is time-dependent!</A>
</H4>
Yes it is! On most machines and on
most operating systems, depending on machine load, on communication load
(for parallel machines), on various other factors (including maybe the phase
of the moon), reported execution times may vary quite a lot for the same job.
<P>
<H4><A ID="SECTION000600240000000000000">
5.0.0.24 <EM>Warning : N eigenvectors not converged</EM></A>
</H4>
This is a warning message that can be safely ignored if it is not
present in the last steps of self-consistency. If it is still present
in the last steps of self-consistency, and if the number of
unconverged eigenvector is a significant part of the total, it may
signal serious trouble in self-consistency (see next point) or
something badly wrong in input data.
<P>
<H4><A ID="SECTION000600250000000000000">
5.0.0.25 <EM>Warning : negative or imaginary charge...</EM>, or
<EM>...core charge ...</EM>, or <EM>npt with rhoup< 0...</EM> or <EM>rho dw< 0...</EM></A>
</H4>
These are warning messages that can be safely ignored unless the
negative or imaginary charge is sizable, let us say of the order of
0.1. If it is, something seriously wrong is going on. Otherwise, the
origin of the negative charge is the following. When one transforms a
positive function in real space to Fourier space and truncates at some
finite cutoff, the positive function is no longer guaranteed to be
positive when transformed back to real space. This happens only with
core corrections and with USPPs. In some cases it
may be a source of trouble (see next point) but it is usually solved
by increasing the cutoff for the charge density.
<P>
<H4><A ID="SECTION000600260000000000000">
5.0.0.26 Structural optimization is slow or does not converge or ends
with a mysterious bfgs error</A>
</H4>
Typical structural optimizations, based on the BFGS algorithm,
converge to the default thresholds ( etot_conv_thr and
forc_conv_thr ) in 15-25 BFGS steps (depending on the
starting configuration). This may not happen when your
system is characterized by "floppy" low-energy modes, that make very
difficult (and of little use anyway) to reach a well converged structure, no
matter what. Other possible reasons for a problematic convergence are listed
below.
<P>
Close to convergence the self-consistency error in forces may become large
with respect to the value of forces. The resulting mismatch between forces
and energies may confuse the line minimization algorithm, which assumes
consistency between the two. The code reduces the starting self-consistency
threshold conv thr when approaching the minimum energy configuration, up
to a factor defined by <TT>upscale</TT>. Reducing <TT>conv_thr</TT>
(or increasing <TT>upscale</TT>)
yields a smoother structural optimization, but if <TT>conv_thr</TT> becomes too small,
electronic self-consistency may not converge. You may also increase variables
<TT>etot_conv_thr</TT> and <TT>forc_conv_thr</TT> that determine the threshold for
convergence (the default values are quite strict).
<P>
A limitation to the accuracy of forces comes from the absence of perfect
translational invariance. If we had only the Hartree potential, our PW
calculation would be translationally invariant to machine
precision. The presence of an XC potential
introduces Fourier components in the potential that are not in our
basis set. This loss of precision (more serious for gradient-corrected
functionals) translates into a slight but detectable loss
of translational invariance (the energy changes if all atoms are displaced by
the same quantity, not commensurate with the FFT grid). This sets a limit
to the accuracy of forces. The situation improves somewhat by increasing
the <TT>ecutrho</TT> cutoff.
<P>
<H4><A ID="SECTION000600270000000000000">
5.0.0.27 pw.x stops during variable-cell optimization in
checkallsym with <EM>non orthogonal operation</EM> error</A>
</H4>
Variable-cell optimization may occasionally break the starting
symmetry of the cell. When this happens, the run is stopped because
the number of k-points calculated for the starting configuration may
no longer be suitable. Possible solutions:
<UL>
<LI>start with a nonsymmetric cell;
</LI>
<LI>use a symmetry-conserving algorithm: the Wentzcovitch algorithm
(<TT>cell dynamics='damp-w'</TT>) should not break the symmetry.
</LI>
</UL>
<P>
<BR><HR>
<!--Table of Child-Links-->
<A ID="CHILD_LINKS"><STRONG>Subsections</STRONG></A>
<UL>
<LI><UL>
<LI><UL>
<LI><A ID="tex2html212"
HREF="node21.html#SECTION00060010000000000000">5.0.0.1 pw.x says 'error while loading shared libraries' or
'cannot open shared object file' and does not start</A>
<LI><A ID="tex2html213"
HREF="node21.html#SECTION00060020000000000000">5.0.0.2 errors in examples with parallel execution</A>
<LI><A ID="tex2html214"
HREF="node21.html#SECTION00060030000000000000">5.0.0.3 pw.x prints the first few lines and then nothing happens
(parallel execution)</A>
<LI><A ID="tex2html215"
HREF="node21.html#SECTION00060040000000000000">5.0.0.4 pw.x stops with error while reading data</A>
<LI><A ID="tex2html216"
HREF="node21.html#SECTION00060050000000000000">5.0.0.5 pw.x mumbles something like <EM>cannot recover</EM> or
<EM>error reading recover file</EM></A>
<LI><A ID="tex2html217"
HREF="node21.html#SECTION00060060000000000000">5.0.0.6 pw.x stops with <EM>inconsistent DFT</EM> error</A>
<LI><A ID="tex2html218"
HREF="node21.html#SECTION00060070000000000000">5.0.0.7 pw.x stops with error in cdiaghg or rdiaghg</A>
<LI><A ID="tex2html219"
HREF="node21.html#SECTION00060080000000000000">5.0.0.8 pw.x crashes with no error message at all</A>
<LI><A ID="tex2html220"
HREF="node21.html#SECTION00060090000000000000">5.0.0.9 pw.x crashes with <EM>segmentation fault</EM> or similarly
obscure messages</A>
<LI><A ID="tex2html221"
HREF="node21.html#SECTION000600100000000000000">5.0.0.10 pw.x works for simple systems, but not for large systems
or whenever more RAM is needed</A>
<LI><A ID="tex2html222"
HREF="node21.html#SECTION000600110000000000000">5.0.0.11 pw.x crashes with <EM>error in davcio</EM></A>
<LI><A ID="tex2html223"
HREF="node21.html#SECTION000600120000000000000">5.0.0.12 pw.x crashes in parallel execution with an obscure message
related to MPI errors</A>
<LI><A ID="tex2html224"
HREF="node21.html#SECTION000600130000000000000">5.0.0.13 pw.x stops with error message <EM>the system is metallic,
specify occupations</EM></A>
<LI><A ID="tex2html225"
HREF="node21.html#SECTION000600140000000000000">5.0.0.14 pw.x stops with <EM>internal error: cannot bracket Ef</EM></A>
<LI><A ID="tex2html226"
HREF="node21.html#SECTION000600150000000000000">5.0.0.15 pw.x yields <EM>internal error: cannot bracket Ef</EM> message
but does not stop</A>
<LI><A ID="tex2html227"
HREF="node21.html#SECTION000600160000000000000">5.0.0.16 pw.x runs but nothing happens</A>
<LI><A ID="tex2html228"
HREF="node21.html#SECTION000600170000000000000">5.0.0.17 pw.x yields weird results</A>
<LI><A ID="tex2html229"
HREF="node21.html#SECTION000600180000000000000">5.0.0.18 FFT grid is machine-dependent</A>
<LI><A ID="tex2html230"
HREF="node21.html#SECTION000600190000000000000">5.0.0.19 pw.x does not find all the symmetries you expected</A>
<LI><A ID="tex2html231"
HREF="node21.html#SECTION000600200000000000000">5.0.0.20 <EM>Warning: symmetry operation # N not allowed</EM></A>
<LI><A ID="tex2html232"
HREF="node21.html#SECTION000600210000000000000">5.0.0.21 Self-consistency is slow or does not converge at all</A>
<LI><A ID="tex2html233"
HREF="node21.html#SECTION000600220000000000000">5.0.0.22 I do not get the same results in different machines!</A>
<LI><A ID="tex2html234"
HREF="node21.html#SECTION000600230000000000000">5.0.0.23 Execution time is time-dependent!</A>
<LI><A ID="tex2html235"
HREF="node21.html#SECTION000600240000000000000">5.0.0.24 <EM>Warning : N eigenvectors not converged</EM></A>
<LI><A ID="tex2html236"
HREF="node21.html#SECTION000600250000000000000">5.0.0.25 <EM>Warning : negative or imaginary charge...</EM>, or
<EM>...core charge ...</EM>, or <EM>npt with rhoup< 0...</EM> or <EM>rho dw< 0...</EM></A>
<LI><A ID="tex2html237"
HREF="node21.html#SECTION000600260000000000000">5.0.0.26 Structural optimization is slow or does not converge or ends
with a mysterious bfgs error</A>
<LI><A ID="tex2html238"
HREF="node21.html#SECTION000600270000000000000">5.0.0.27 pw.x stops during variable-cell optimization in
checkallsym with <EM>non orthogonal operation</EM> error</A>
</UL>
</UL>
<BR>
<LI><A ID="tex2html239"
HREF="node22.html">5.1 Compilation problems with <TT>PLUMED</TT></A>
<UL>
<LI><A ID="tex2html240"
HREF="node22.html#SECTION00061010000000000000">5.1.0.1 xlc compiler</A>
<LI><A ID="tex2html241"
HREF="node22.html#SECTION00061020000000000000">5.1.0.2 Calling C from fortran</A>
</UL></UL>
<!--End of Table of Child-Links-->
<HR>
<!--Navigation Panel-->
<A
HREF="node22.html">
<IMG WIDTH="37" HEIGHT="24" ALT="next" SRC="next.png"></A>
<A
HREF="user_guide.html">
<IMG WIDTH="26" HEIGHT="24" ALT="up" SRC="up.png"></A>
<A
HREF="node20.html">
<IMG WIDTH="63" HEIGHT="24" ALT="previous" SRC="prev.png"></A>
<A ID="tex2html210"
HREF="node1.html">
<IMG WIDTH="65" HEIGHT="24" ALT="contents" SRC="contents.png"></A>
<BR>
<B> Next:</B> <A
HREF="node22.html">5.1 Compilation problems with</A>
<B> Up:</B> <A
HREF="user_guide.html">User's Guide for the</A>
<B> Previous:</B> <A
HREF="node20.html">4.6 Restarting</A>
<B> <A ID="tex2html211"
HREF="node1.html">Contents</A></B>
<!--End of Navigation Panel-->
</BODY>
</HTML>
|