1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993
|
<html>
<head>
<title>BSE tutorial</title>
</head>
<body bgcolor="#ffffff">
<hr>
<h1>ABINIT, tutorial on Bethe-Salpeter calculations :</h1>
<h2>absorption spectra including excitonic effects. </h2>
<hr>
<p>
This lesson discusses how to calculate the macroscopic dielectric function including excitonic effects
within the Bethe-Salpeter (BS) approach.
Crystalline silicon is used as test case.
A brief description of the formalism can be found in the <a href="theory_bse.html">BS_notes</a>.
<p>
The user should be familiarized with the four basic lessons of ABINIT
and the first lesson of the GW tutorial, see the <a href="welcome.html">tutorial home page</a>.
<p>This lesson should take about one hour to be completed.
<h5>Copyright (C) 2006-2014 ABINIT group (MG,MS)
<br> This file is distributed under the terms of the GNU General Public License, see
~abinit/COPYING or <a href="http://www.gnu.org/copyleft/gpl.txt">
http://www.gnu.org/copyleft/gpl.txt </a>.
<br> For the initials of contributors, see ~abinit/doc/developers/contributors.txt .
</h5>
<script type="text/javascript" src="list_internal_links.js"> </script>
<h3><b>Content of the lesson</b></h3>
<p>
<ul>
<li><a href="#kss_scr" >1.</a> Preparatory steps (generating the KSS and the SCR file)
<li><a href="#first_bsrun" >2.</a> Computing the absorption spectrum with different approximations
<li><a href="#conv_on_bands" >3.</a> Convergence with respect to the number of bands in the transition space
<li><a href="#conv_on_ecuteps" >4.</a> Convergence with respect to the number of planewaves in the screening
<li><a href="#conv_on_k" >5.</a> Convergence with respect to the number of k-points
<li><a href="#additional_exercises" >6.</a> Additional exercises (optional)
<li><a href="#BS_MPI" >7.</a> Notes on the MPI implementation
</ul>
<p>
<p><a name="kss_scr"></a><br>
<h3><b> 1. Preparatory steps (generating the KSS and the SCR file).</b></h3>
<p><i>Before beginning, you might consider to work in a different subdirectory
as for the other lessons. Why not "Work_bs" ?</i>
<p>
In the directory ~abinit/tests/tutorial/Input/Work_bs, copy the files file ~abinit/tests/tutorial/Input/tbs_1.files.
Now run immediately the calculation with the command:
<pre>
abinit < tbs_1.files >& tbs_1.log &
</pre>
so that we can analyze the input file while the code is running.
<p>
The input file is located in ~abinit/tests/tutorial/Input/tbs_1.in.
The header reports a brief description of the calculation so read it carefully.
Don't worry if some parts are not clear to you as we are going to discuss the calculation in step by step fashion.
<p>
This input file generates the two KSS files and the screening file needed for the subsequent Bethe-Salpeter computations.
The first dataset performs a rather standard ground-state calculation
on an unshifted 4x4x4 grid (64 k points in the full Brillouin Zone, folding to 8 k points in the irreducible wedge).
Then the ground-state density is used in dataset 2 and 3
to generate two KSS files with a standard NSCF cycle, solved
with the conjugate-gradient method (<a href="../input_variables/varfil.html#kssform" target="kwimg">kssform</a>=3).
<p>
<!--
with <a href="../input_variables/varbas.html#tolvrs" target="kwimg">tolvrs</a>=1.0d-8 as stopping
We use <a href="../input_variables/varbas.html#tolvrs" target="kwimg">tolvrs</a> as stopping
criterion since we want to obtain an accurate potential for our band structure calculation.
The density file is then used in dataset 2 and 3 to produce two different KSS files that
only differ for the k-point mesh and the number of bands stored on file.
-->
Note that the KSS file computed in dataset 2 contains 100 bands on the 4x4x4 gamma-centered k-mesh whereas
the KSS file produced in dataset 3 has only 10 bands on a 4x4x4 k-mesh that has been shifted
along the direction
<pre>
shiftk3 0.11 0.21 0.31 # This shift breaks the symmetry of the k-mesh.
</pre>
The gamma-centered k-mesh contains 8 points in the irreducible zone
while the shifted k-mesh breaks the symmetry of the crystal leading to 64 points in the IBZ
(actually the IBZ now coincides with the full Brillouin zone).
The second mesh is clearly inefficient, so you might wonder why we are using such a bizarre sampling
and, besides, why we need to generate two different KSS files!
<p>
Indeed this approach strongly differs from the one we followed in the GW tutorials, but there is
a good reason for doing so.
It is anticipated that optical spectra converge slowly with the Brillouin zone sampling,
and that symmetry-breaking k-meshes lead to faster convergence in
<a href="../input_variables/varbas.html#nkpt" target="kwimg">nkpt</a>
than the standard symmetric k-meshes commonly used for ground-state or GW calculations.
<p>
This explains the bizarre shift but still why two KSS files?
Why don't we simply use the KSS file on the shifted k-mesh to compute the screening?
<p>
The reason is that a screening calculation done with many empty bands on the shifted k-mesh
would be very memory demanding as the code should allocate a huge portion of memory whose size scales with
(<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a> *
<a href="../input_variables/varbas.html#nkpt" target="kwimg">nkpt</a>),
and no symmetry can be used to reduce the number of k-points.
<p>
To summarize:
the KSS with the symmetric k-point sampling and 100 bands will be used to compute the screening, while the
KSS file with the shifted k-mesh and 10 bands will be used to construct the transition space employed for
solving the Bethe-Salpeter equation.
The two k-meshes differ just for the shift thus they produce the same set of q-points (the list of
q-points in the screening is defined as all the possible differences between the k-points of the KSS file).
This means that, in the BS run, we can use the SCR file generated with the symmetric mesh even though
the transition space is constructed with the shifted k-mesh.
<p>
After this lengthy discussion needed to clarify this rather technical point,
we can finally proceed to analyze the screening computation performed in the last dataset of tbs_1.in.
<p>
The SCR file is calculated in dataset 4 using
<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>=100
and
<a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a> 6.0 Ha.
In the <a href="lesson_gw1.html">first lesson</a> of the GW tutorial,
these values were found to give QP energies converged within 0.01 eV,
so we are confident that our SCR file is well converged and it can be safely used for performing
convergence tests in the Bethe-Salpeter part.
<p>
Note that, for efficiency reasons, only the static limit of W is computed:
<pre>
nfreqre4 1 # Only the static limit of W is needed for standard BSE calculations.
nfreqim4 0
</pre>
Indeed, in the standard formulation of the Bethe-Salpeter equation, only the static limit
of the screened interaction is needed to construct the Coulomb term of the BS Hamiltonian.
Using a single frequency allows us to save some CPU time in the screening part, but keep in mind
that this SCR file can only be used either for Bethe-Salpeter computations or for
GW calculations employing the plasmon-pole models corresponding to
<a href="../input_variables/vargw.html#ppmodel" target="kwimg">ppmodel</a>=3,4.
<!--
<p>
Note also that the k point grid is quite rough,
<a href="../input_variables/varbas.html#ngkpt" target="kwimg">ngkpt</a>=4 4 4).
Converged results would need a much denser sampling of the Brillouin zone.
We postpone the discussion on the converge study in the number of k-point to the last section of this lesson.
-->
<p>
At this point the calculation should have completed, but there's still one thing that we have
to do before moving to the next paragraph.
<p>
As we said, we will need the KSS file on the shifted k-mesh and the SCR file for our BS
calculations so do not delete them!
It is also a good idea to rename these precious files using more meaningful names e.g.:
<pre>
mv tbs_1o_DS2_KSS 444_gamma_KSS
mv tbs_1o_DS3_KSS 444_shifted_KSS
mv tbs_1o_DS4_SCR 444_SCR
</pre>
Keep in mind that henceforth the k-point sampling cannot be changed anymore:
the list of k-points specified in the BS input files MUST equal the one used to generate the KSS file.
Two new KSS files and a new SCR file must be generated from scrath if we want to change the k-point
sampling used to construct the transition space.
<hr>
<h3><p><b><a name="first_bsrun">2.</a>
Computing the absorption spectrum within the Tamm-Dancoff approximation.
</b></h3>
<p>
This section is intended to show how to perform a standard excitonic calculation within the Tamm-Dancoff approximation (TDA)
using the Haydock iterative technique.
The input file is ~abinit/tests/tutorial/Input/tbs_2.in.
<p>
Before running the job, we have to connect this calculation with the output results produced in tbs_1.in.
<p>
Use the Unix commands:
<pre>
ln -s 444_shifted_KSS tbs_2i_KSS
ln -s 444_SCR tbs_2i_SCR
</pre>
to create two symbolic links for the shifted KSS and the SCR file.
The reason for doing so will be clear afterwards once we discuss the input file.
<p>
This job lasts 1-2 minutes on a modern machine so it is worth running it before inspecting the input file.
<p>
Copy the files file ~abinit/tests/tutorial/Input/tbs_2.files
in the working directory and issue
<pre>
abinit < tbs_2.files >& tbs_2.log &
</pre>
<p>
to put the job in background so that we can examine tbs_2.in.
<p>
Now open ~abinit/tests/tutorial/Input/tbs_2.in
in your preferred editor and go to the next section where we discuss
the most important variables governing a typical BS computation.
<h4><a name="2a"></a></h4>
<h4><b> 2.a The structure of the input file.</b></h4>
<p>
First we need to set <a href="../input_variables/vargs.html#optdriver" target="kwimg">optdriver</a>=99
to call the BSE routines
<pre>
optdriver 99 # BS calculation
</pre>
The variables
<a href="../input_variables/varfil.html#irdkss" target="kwimg">irdkss</a> and
<a href="../input_variables/varfil.html#irdscr" target="kwimg">irdscr</a>
are similar to other "ird" variables of ABINIT and are used to read the files produced in the
previous paragraph
<pre>
irdkss 1 # Read the KSS file produced in tbs_1
irdscr 1 # Read the SCR file produced in tbs_1
</pre>
The code expects to find an input KSS file and an input SCR file whose name is constructed
according to prefix specified in the files file tbs_2.files
(see <a href="../users/abinit_help.html#intro1" target="helpsimg">section 1.1</a> of the abinit_help file).
This is the reason why we had to create the two symbolic links before running the code.
<p>
Then we have a list of five variables specifying how to construct the excitonic Hamiltonian.
<pre>
bs_calctype 1 # L0 is constructed with KS orbitals and energies.
soenergy 0.8 eV # Scissors operator used to correct the KS band structure.
bs_exchange_term 1 # Exchange term included.
bs_coulomb_term 11 # Coulomb term included using the full matrix W_GG'
bs_coupling 0 # Tamm-Dancoff approximation.
</pre>
The value <a href="../input_variables/vargw.html#bs_calctype" target="kwimg">bs_calctype</a>=1 specifies that
the independent-particle polarizability should be costructed with the Kohn-Sham orbitals and energies read from the KSS file.
To simulate the self-energy correction, the KS energies are corrected with a scissors operator
of energy <a href="../input_variables/vargw.html#soenergy" target="kwimg">soenergy</a>= 0.8 eV.
This permits us to avoid a cumbersome GW calculation for each state
included in our transition space. The use of the scissors operator is a reasonable approximation for silicon but
it might fail in more complicated systems in which the GW corrections cannot be simulated in terms of a simple
rigid shift of the initial KS bands structure.
<p>
The remaining three variables specify how to construct the excitonic Hamiltonian.
<a href="../input_variables/vargw.html#bs_exchange_term" target="kwimg">bs_exchange_term</a>=1 tells
the program to calculate the exchage part of the kernel, hence this calculation includes local-field effects.
The variable <a href="../input_variables/vargw.html#bs_coulomb_term" target="kwimg">bs_coulomb_term</a>
is used to select among different options that are available for the Coulomb term
(please take some time to read the description of the variable and the relevant equations in
the <a href="theory_bse.html">BS_notes</a>).
Finally <a href="../input_variables/vargw.html#bs_coupling" target="kwimg">bs_coupling</a>=0
specifies that the off-diagonal coupling blocks should be neglected (Tamm-Dancoff approximation).
This particular combination of parameters thus corresponds to a Bethe-Salpeter calculation within
the Tamm-Dancoff approximation with local field effects included.
<p>
Then we have the specification of the bands used to construct the transition space:
<pre>
bs_loband 2
nband 8
</pre>
In this case all the bands around the gap whose index is between 2 and 8 are included in the basis set.
<p>
The frequency mesh for the the macroscopic dielectric function is specified through
<a href="../input_variables/vargw.html#bs_freq_mesh" target="kwimg">bs_freq_mesh</a>
<pre>
bs_freq_mesh 0 6 0.02 eV # Frequency mesh.
</pre>
This triplet of real values defines a linear mesh that covers the range [0,6] eV with a step of 0.02 eV.
The number of frequency points in the mesh does not have any significant effect on the CPU time,
but it is important to stress that the number of bands included in the transition space defines, in
conjunction with the number of k-points, the frequency range that can be described.
As a consequence
<a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a> and
<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>
should be subject to an accurate converge study.
<p>
Then we have the parameters that define and control the algorithm employed to calculate the macroscopic dielectric function
<pre>
bs_algorithm 2 # Haydock method.
bs_haydock_niter 100 # Max number of iterations for the Haydock method.
bs_haydock_tol 0.05 # Tolerance for the iterative method.
zcut 0.15 eV # Complex shift to avoid divergences in the continued fraction.
</pre>
<a href="../input_variables/vargw.html#bs_algorithm" target="kwimg">bs_algorithm</a> specifies
the algorithm used to calculate the macroscopic dielectric function.
In this case we use the iterative Haydock technique whose maximum number of
iterations is given by <a href="../input_variables/vargw.html#bs_haydock_niter" target="kwimg">bs_haydock_niter</a>.
The iterative algorithm stops when the difference between two consecutive evaluations of the optical
spectra is less than <a href="../input_variables/vargw.html#bs_haydock_tol">bs_haydock_tol</a>.
The input variable <a href="../input_variables/vargw.html#zcut" target="kwimg">zcut</a> gives the complex shift to avoid divergences
in the continued fraction.
From a physical point of view, this parameters mimics the experimental broadening of the absorption peaks.
In this test, due to the coarseness of the k-mesh, we have to use a value slightly larger than the default one (0.1 eV)
in order to facilitate the convergence of the Haydock algorithm.
Ideally, one should make a convergence study decreasing
the value of <a href="../input_variables/vargw.html#zcut" target="kwimg">zcut</a> for increasing number of k-points.
<p>
The k-point sampling is specified by the set of variables.
<pre>
# Definition of the k-point grid
kptopt 1 # Option for the automatic generation of k points,
ngkpt 4 4 4 # This mesh is too coarse for optical properties.
nshiftk 1
shiftk 0.11 0.21 0.31 # This shift breaks the symmetry of the k-mesh.
chksymbreak 0 # Mandatory for using symmetry-breaking k-meshes in the BS code.
</pre>
The values of
<a href="../input_variables/varbas.html#kptopt" target="kwimg">kptopt</a>,
<a href="../input_variables/varbas.html#ngkpt" target="kwimg">ngkpt</a>,
<a href="../input_variables/varbas.html#nshiftk" target="kwimg">nshiftk</a>,
and
<a href="../input_variables/varbas.html#shiftk" target="kwimg">shiftk</a>
MUST equal the ones used to specify the grid for the KSS file.
<a href="../input_variables/vargs.html#chksymbreak" target="kwimg">chksymbreak</a>=0
is used to bypass the check on symmetry breaking that, otherwise, would make the code stop.
<p>
<!--
TODO: add a check in the code
Do not use <a href="../input_variables/vargs.html#chksymbreak" target="kwimg">chksymbreak</a>=0
in the Bethe-Salpeter run if the mesh preserves the symmetry
-->
<p>
The last section of the input file
<pre>
ecutwfn 8.0 # Cutoff for the wavefunction.
ecuteps 2.0 # Cutoff for W and /bare v used to calculate the BS matrix elements.
inclvkb 2 # The Commutator for the optical limit is correctly evaluated.
</pre>
specifies the parameters used to calculate the kernel matrix elements and the matrix elements of the dipole operator.
We have already encoutered these variables in the <a href="lesson_gw1.html">first lesson</a> of the
GW tutorial so their meaning is (hopefully) familiar to you.
A more detailed discussion of the role played by these variables in the BS code can be found in the <a href="theory_bse.html">BS_notes</a>.
<h4><a name="2c"></a></h4>
<h4><b> 2.c Output files.</b></h4>
<p>
The output file, tbs_2.out, reports the basic parameters of the calculation
and eventual WARNINGs that are issued if the iterative method does not converge.
Please take some time to understand its structure.
<p>
Could you answer the the following questions?
<ol>
<li>How many transitions are included in the basis set?
<li>How many directions are used to evaluate the optical limit?
<li>What is the value of the Lorentzian broadening used in the continued fraction?
</ol>
<p>
After this digression on the main output file, we can finally proceed to analyse the output data of the computation.
<p>
The most important results are stored in five different files:
<ul>
<li>tbs_2o_BSR
<li>tbs_2o_HAYDR_SAVE
<li>tbs_2o_RPA_NLF_MDF
<li>tbs_2o_GW_NLF_MDF
<li>tbs_2o_EXC_MDF
</ul>
In what follows, we provide a brief description of the format and of the content of each output file.
<p>
<ul><li>
tbs_2o_BSR:
</li></ul>
<p>
This binary file stores the upper triangle of the resonant block (the matrix is Hermitian
hence only the non-redundant part is computed and saved on file).
The BSR file can be used to restart the run from a previous computation using the variables
<a href="../input_variables/varfil.html#getbsreso" target="kwimg">getbsreso</a>
or <a href="../input_variables/varfil.html#irdbsreso" target="kwimg">irdbsreso</a>.
This restart capability is useful for restarting the Haydock method if convergence was not achieved
or to execute Haydock computations with different values of
<a href="../input_variables/vargw.html#zcut" target="kwimg">zcut</a>.
<a href="../input_variables/varfil.html#getbsreso" target="kwimg">getbsreso</a>
and <a href="../input_variables/varfil.html#irdbsreso" target="kwimg">irdbsreso</a>
are also handy if one wants to include the coupling on top of a pre-existing TDA calculation
since the code uses two different files to store the resonant and the coupling block (BSC is the
prefix used for the files storing the coupling term).
<p>
<ul><li>
tbs_2o_HAYDR_SAVE:
</li></ul>
<p>
It is a binary file containing the results of the Haydock method: the coefficient of the tridiagonal
matrix and the three vectors employed in the iterative algorithm.
It is usually used to restart the algorithm if convergence has not been achieved
(see the related input variables
<a href="../input_variables/varfil.html#gethaydock">gethaydock</a> and <a href="../input_variables/varfil.html#irdhaydock">irdhaydock</a>).
<p>
<ul><li>
tbs_2o_RPA_NLF_MDF and tbs_2o_GW_NLF_MDF
</li></ul>
<p>
The RPA spectrum without local field effects obtained with KS energies and the GW energies, respectively
(mnemonics: NLF stands for No Local Field, while MDF stands for Macroscopic Dielectric Function).
<p>
<ul><li>
tbs_2o_EXC_MDF:
</li></ul>
<p>
Formatted file reporting the macroscopic dielectric function with excitonic effects.
Since this file contains the most important results of our calculation it is worth spending some time to
discuss its format.
<p>
First we have a header reporting the basic parameters of the calculation
<pre>
# Macroscopic dielectric function obtained with the BS equation.
# RPA L0 with KS energies and KS wavefunctions LOCAL FIELD EFFECTS INCLUDED
# RESONANT-ONLY calculation
# Coulomb term constructed with full W(G1,G2)
# Scissor operator energy = 0.8000 [eV]
# Tolerance = 0.0500
# npweps = 27
# npwwfn = 283
# nbands = 8
# loband = 2
# nkibz = 64
# nkbz = 64
# Lorentzian broadening = 0.1500 [eV]
</pre>
then the list of q-points giving the direction of the incident photon:
<pre>
# List of q-points for the optical limit:
# q = 0.938821, 0.000000, 0.000000, [Reduced coords]
# q = 0.000000, 0.938821, 0.000000, [Reduced coords]
# q = 0.000000, 0.000000, 0.938821, [Reduced coords]
# q = 0.000000, 0.813043, 0.813043, [Reduced coords]
# q = 0.813043, 0.000000, 0.813043, [Reduced coords]
# q = 0.813043, 0.813043, 0.000000, [Reduced coords]
</pre>
By default the code calculates the macroscopic dielectric function considering six different
directions in q-space (the three basis vectors of the reciprocal lattice and the three Cartesian
axis). It is possible to specify custom directions using the input variables
<a href="../input_variables/vargw.html#gw_nqlwl" target="kwimg">gw_nqlwl</a> and
<a href="../input_variables/vargw.html#gw_qlwl" target="kwimg">gw_qlwl</a>.
<p>
Then comes the section with the real and the imaginary part of the macroscopic dielectric as a
function of frequency for the different directions:
<pre>
# omega [eV] RE(eps(q=1)) IM(eps(q=1) RE(eps(q=2) ) ...
0.000 1.8026E+01 0.0000E+00 1.7992E+01 0.0000E+00 1.4292E+01 0.0000E+00 1.3993E+01 0.0000E+00 1.7117E+01 0.0000E+00 1.7080E+01 0.0000E+00
.... .... ...
</pre>
You can visualize the data using your preferred software. For instance,
<pre>
$ gnuplot
gnuplot> p "tbs_2o_EXC_MDF" u 1:3 w l
</pre>
will plot the imaginary part of the macroscopic dielectric function (the absorption spectrum) for the first q-point.
You should obtain a graphic similar to the one reported below
<p align="center"><img src=./lesson_bse/tbs2_1.png></p>
<p>
Please note that these results are not converged, we postpone the discussion about convergence tests
to the next paragraphs of this tutorial.
<!--
For the moment, we are mainly interested in discussing the physi meaning of our results.
-->
<p>
The most important feature of the spectrum is the presence of two peaks located at around 3.4 and 4.3 eV.
To understand the nature of these peaks and the role played by the BS kernel, it is useful to compare
the excitonic spectra with the RPA results obtained without local field effects.
<p>
Use the sequence of gnuplot command
<pre>
$ gnuplot
gnuplot> p "tbs_2o_EXC_MDF" u 1:3 w l
gnuplot> rep "tbs_2o_RPA_NLF_MDF" u 1:3 w l
gnuplot> rep "tbs_2o_GW_NLF_MDF" u 1:3 w l
</pre>
to plot the absorption spectrum obtained with the three different approaches.
The final result is reported in the figure below.
<p align="center"><img src=./lesson_bse/tbs2_2.png></p>
<p>
The RPA-KS spectrum underestimates the experimental optical threshold due to the well know band gap problem of DFT.
Most importantly, the amplitude of the first peak is underestimated, a problem than
is not solved when local-field effects are correctly included in the calculation.
<p>
The RPA-GW results with QP corrections simulated with <a href="../input_variables/vargw.html#soenergy" target="kwimg">soenergy</a>
does not show any significant improvement over RPA-KS:
the RPA-GW spectrum is just shifted towards higher frequencies due to opening of the gap,
but the shape of the two spectra is very similar, in particular the amplitude of the first peak is
still underestimated.
<p>
On the contrary, the inclusion of the BS kernel leads to important changes both in the optical threshold
as well as in the amplitude of the first peak.
This simple analysis tells us that the first peak in the absorption spectrum of silicon has a strong
excitonic character that is not correctly described within the RPA.
Our first BS spectrum is not converged at all and it barely resembles the experimental result,
nevertheless this unconverged calculation is already able to capture the most important physics.
<h4><a name="2c"></a></h4>
<h4><b> 2.c Optional Exercises.</b></h4>
<ul>
<li>
<p>
Change the value of the Lorentzian broadening <a href="../input_variables/vargw.html#zcut" target="kwimg">zcut</a>
used to avoid divergences in the continued fraction.
Then restart the Haydock algorithm from the _BSR and _HAYDR_SAVE files using the appropriate variables.
What is the main effect of the broadening on the final spectrum. Does the number of iterations
needed to converge depend on the broadening?
<li>
<p>
Use the appropriate values for <a href="../input_variables/vargw.html#bs_exchange_term" target="kwimg">bs_exchange_term</a> and
<a href="../input_variables/vargw.html#bs_coulomb_term" target="kwimg">bs_coulomb_term</a>
to calculate the BS spectrum without local field effects.
Compare the results obtained with and without local field effects.
<li>
<p>
Modify the input file tbs_2.in so that the code reads in the resonant block produced in the previous run
and calculates the spectrum employing the method based on the direct diagonalization
(use <a href="../input_variables/varfil.html#irdbsreso" target="kwimg">irdbsreso</a> to
restart the run but remember to rename the file with the resonant block). Compare the CPU time
needed by the two algorithms as a function of the number of transitions in the transition space.
Which one has the best scaling?
</ul>
<h4><p><b><a name="discussion_on_conv">2.d</a>
Preliminary discussion about convergence studies
</b></h4>
<p>
Converging the excitonic spectrum requires a careful analysis of many different parameters:
<ul>
<li> <a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a>
<li> <a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>
<li> <a href="../input_variables/vargw.html#ecutwfn" target="kwimg">ecutwfn</a>
<li> <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a>
<li> <a href="../input_variables/varbas.html#ngkpt" target="kwimg">ngkpt</a>
<li> <a href="../input_variables/varbas.html#nshiftk" target="kwimg">nshiftk</a>
<li> <a href="../input_variables/varbas.html#shiftk" target="kwimg">shiftk</a>
</ul>
Since the memory requirements scale quadratically with the number of k-points in the <i><b>full</b></i> Brillouin zone
<i><b>times</b></i> the number of valence bands <i><b>times</b></i> the number of conduction bands included in the transition space,
it is very important to find a good compromise between accuracy and computational efficiency.
<p>
First of all one should select the frequency range of interest since this choice has an important
effect on the number of valence and conduction states that have to be included in the transition space.
The optical spectrum is expected to converge faster in the number of bands than the GW corrections since
only those transitions whose energy is "close" to the frequency range under investigation are
expected to contribute.
<p>
<a href="../input_variables/vargw.html#ecutwfn" target="kwimg">ecutwfn</a> usually plays a secondary
role since it only affects the accuracy of the oscillator matrix elements.
We suggest avoiding any truncation of the initial basis set by setting
<a href="../input_variables/vargw.html#ecutwfn" target="kwimg">ecutwfn</a> to a value slightly
larger than the value of <a href="../input_variables/varbas.html#ecut" target="kwimg">ecut</a> used
to generate the KSS file.
One should truncate the initial planewave basis set only when experiencing memory problems although this kind of
problems can be usually solved by just increasing the number of processors or, alternatively,
with an appropriate choice of <a href="../input_variables/vargw.html#gwmem" target="kwimg">gwmem</a>.
<p>
The value of <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a> affects the
accuracy of the matrix elements of the Coulomb term, the fundamental term that drives the creation
of the excitons.
As a consequence <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a> should be subject
to an accurate convergence test.
As a rule of thumb, <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a> can be
chosen equal or, sometimes, even smaller than the value needed to converge the GW corrections.
<p>
As already stated: optical spectra converge slowly with the Brillouin zone sampling.
The convergence in the number of k-points thus represents the most important and tedious part of our convergence study.
For this reason, this study should be done once converged values for the other parameters have been already found.
<hr>
<h3><p><b><a name="conv_on_bands">3.</a>
Convergence with respect to the number of bands in the transition space
</b></h3>
In this section we take advantage of the multi-dataset capabilities of ABINIT to perform calculations
with different values for
<a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a>
and
<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>
<p>
Before running the test take some time to read the input file ~abinit/tests/tutorial/Input/tbs_3.in.
<p>
The convergence in the number of transitions is performed by defining five datasets
with different values for <a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>
and <a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a>
<pre>
ndtset 5
bs_loband1 4 nband1 5
bs_loband2 3 nband2 6
bs_loband3 2 nband3 7
bs_loband4 2 nband4 8
bs_loband5 1 nband5 8
</pre>
<!--
<p>
In our case we are only interested in the absorption spectrum for frequencies up to 8 eV hence
we use reasonable values for the k-points sampling and the other parameters and we monitor
the convergence of the spectrum with respect to the number of conduction and valence states used to
construct our basis set.
Also in this case we use the Haydock technique.
-->
<p>
The parameters defining how to build the excitonic Hamiltonian are similar to the ones used in tbs_2.in.
The only difference is in the value used for <a href="../input_variables/vargw.html#bs_coulomb_term" target="kwimg">bs_coulomb_term</a>, i.e.
<pre>
bs_coulomb_term 10 # Coulomb term evaluated within the diagonal approximation.
</pre>
that allows us to save some CPU time during the computation of the Coulomb term.
<p>
Also in this case, before running the test, we have to connect tbs_3.in to the KSS and the SCR file
produced in the first step.
Note that tbs_3.in uses <a href="../input_variables/varfil.html#irdkss" target="kwimg">irdkss</a> and
<a href="../input_variables/varfil.html#irdscr" target="kwimg">irdscr</a> to read the external files,
hence we have to create symbolic links for each dataset:
<pre>
ln -s 444_SCR tbs_3i_DS1_SCR
ln -s 444_SCR tbs_3i_DS2_SCR
ln -s 444_SCR tbs_3i_DS3_SCR
ln -s 444_SCR tbs_3i_DS4_SCR
ln -s 444_SCR tbs_3i_DS5_SCR
ln -s 444_shifted_KSS tbs_3i_DS1_KSS
ln -s 444_shifted_KSS tbs_3i_DS2_KSS
ln -s 444_shifted_KSS tbs_3i_DS3_KSS
ln -s 444_shifted_KSS tbs_3i_DS4_KSS
ln -s 444_shifted_KSS tbs_3i_DS5_KSS
</pre>
<p>
Now we can finally run the the test with
<pre>
abinit < tbs_3.files >& tbs3.log &
</pre>
This job should last 3-4 minutes so be patient!
<p>
Let us hope that your calculation has been completed, and that we can examine the output results.
<p>
Use the following sequence of gnuplot commands:
<pre>
$ gnuplot
gnuplot> p "tbs_3o_DS1_EXC_MDF" u 1:3 w l
gnuplot> rep "tbs_3o_DS2_EXC_MDF" u 1:3 w l
gnuplot> rep "tbs_3o_DS3_EXC_MDF" u 1:3 w l
gnuplot> rep "tbs_3o_DS4_EXC_MDF" u 1:3 w l
gnuplot> rep "tbs_3o_DS5_EXC_MDF" u 1:3 w l
</pre>
to plot on the same graphic the absorption spectrum obtained with different transition spaces.
You should obtain a graphic similar to this one:
<p align="center"><img src=./lesson_bse/tbs3.png></p>
<p>
The results obtained with (<a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a>=4,
<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>=5) are clearly unconverged
as the basis set contains too few transitions that are not able to describe the frequency-dependence
of the polarizability in the energy range under investigation.
For a well converged spectrum, we have to include the three higher occupied states and the first four conduction bands
(the blue curve corresponding to
<a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a>=2, and
<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>=8).
<p>
Note that adding the first occupied band, curve (1-8), gives results that are almost on top of (2,8).
This is due to the fact that, in silicon, the bottom of the first band
is located at around 12 eV from the top of the conduction band therefore its inclusion does not
lead to any significant improvement of the transition space in the frequency range [0,8] eV.
For completeness, we also report the results obtained in a separate calculation done with
<a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a>=2
<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>=9 to show that four empty
states are enough to converge the spectrum.
<p>
We therefore fix the number of bands for the transition space using the converged values
<a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a>=2,
<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>=8,
and we proceed to analyse the convergence of the spectrum with respect to the number of planewaves in the screening.
<h4>For expert users:</h4>
<p>
The use of <a href="../input_variables/varfil.html#irdkss" target="kwimg">irdkss</a> and
<a href="../input_variables/varfil.html#irdscr" target="kwimg">irdscr</a>
is not handy when we have several datasets that are reading the SAME external file
as we are forced to use different names for the input of each dataset.
To work around this annoyance, one can introduce a fictious dataset (say dataset 99),
and let the code use the output of this unexisting dataset as the input of the real datasets.
An example will help clarify:
Instead of using the lengthy list of links as done before, we might use the much simpler
sequence of commands
<pre>
ln -s 444_shifted_KSS tbs_3o_DS99_KSS
ln -s 444_SCR tbs_3o_DS99_SCR
</pre>
provided that, in the input file, we replace
<a href="../input_variables/varfil.html#irdkss" target="kwimg">irdkss</a>
and
<a href="../input_variables/varfil.html#irdscr" target="kwimg">irdscr</a> with
<pre>
getkss 99 # Trick to read the same file tbs_o3_DS99_KSS in each dataset
getscr 99 # Same trick for the SCR file
</pre>
</div>
<hr>
<h3><p><b><a name="conv_on_ecuteps">5.</a>
Convergence with respect to the number of planewaves in the screening
</b></h3>
<p>
First of all, before running the calculation, take some time to understand what is done in
~abinit/tests/tutorial/Input/tbs_4.in.
<p>
The structure of the input file is very similar to the one of tbs_3.in,
the main difference is in the first section:
<pre>
ndtset 4
ecuteps: 1 ecuteps+ 2
bs_coulomb_term 11
</pre>
that istructs the code to execute four calculations where the direct term is constructed
using different value of <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a>.
We also relax the diagonal-only approximation for the screening by setting
<a href="../input_variables/vargw.html#bs_coulomb_term" target="kwimg">bs_coulomb_term</a>=11
so that the non-locality of W(r,r') is correctly taken into account.
<p>
It is important to stress that it is not necessary to recalculate the SCR file from scratch just to modify
the value of <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a> used in the BS run.
The SCR file calculated in the <a href="lesson_bse.html#kss_scr">preparatory step</a>
contains G-vectors whose energy extends up to ecuteps=6.0 Ha.
This is the MAXIMUM cutoff energy that can be used in our convergence tests.
If the value of <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a> specified in the input file
is smaller than the one stored on disk, the code will read a sublock of the initial matrix. A WARNING message is issued
if the value specified in the input file is larger than the one available in the SCR file.
<p>
Now we can finally run the calculation.
As usual,
we have to copy ~abinit/tests/tutorial/Input/tbs_4.files in the working directory Work_bs,
then we have to create a bunch of symbolic links for the input KSS and SCR files:
<pre>
ln -s 444_SCR tbs_4i_DS1_SCR
ln -s 444_SCR tbs_4i_DS2_SCR
ln -s 444_SCR tbs_4i_DS3_SCR
ln -s 444_SCR tbs_4i_DS4_SCR
ln -s 444_shifted_KSS tbs_4i_DS1_KSS
ln -s 444_shifted_KSS tbs_4i_DS2_KSS
ln -s 444_shifted_KSS tbs_4i_DS3_KSS
ln -s 444_shifted_KSS tbs_4i_DS4_KSS
</pre>
Now issue
<pre>
abinit < tbs_4.files >& tbs4.log &
</pre>
to execute the test (it should take around 2 minutes).
<p>
Once the calculation is completed, plot the spectra obtained with different
<a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a> using
<pre>
$ gnuplot
gnuplot> p "tbs_4o_DS1_EXC_MDF" u 1:3 w l
gnuplot> p "tbs_4o_DS2_EXC_MDF" u 1:3 w l
gnuplot> p "tbs_4o_DS3_EXC_MDF" u 1:3 w l
gnuplot> p "tbs_4o_DS4_EXC_MDF" u 1:3 w l
</pre>
<p align="center"><img src=./lesson_bse/tbs4.png></p>
The spectrum is found to converge quickly in <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a>.
The curves obtained with <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a>=3 and 4 Ha
are almost indistinguishable from each other.
Our final estimate for <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a> is therefore 3 Ha.
<p>
Note that this value is smaller than the one required to converge the QP corrections within 0.01 eV
(in the <a href="lesson_gw1.html">first lesson</a> of the GW tutorial we obtained 6.0 Ha).
This is a general behavior, in the sense that Bethe-Salpeter spectra, unlike GW corrections,
are not usually very sensitive to truncations in the planewave expansion of W.
Reasonable BS spectra are obtained even when W is treated within the diagonal approximation or, alternatively,
with model dielectric functions.
<p>
Note also how the two peaks are affected in a different way by the change of <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a>,
with the first peak affected the most.
This behavior is consistent with our affirmation that the first peak of silicon has a strong excitonic character.
<hr>
<h3><p><b><a name="conv_on_k">6.</a>
Convergence with respect to the number of k-points
</b></h3>
The last parameter that should be checked for convergence is the number of k-points.
This convergence study represents the most tedious and difficult part since it requires the generation of new KSS files
and of the new SCR file for each k-mesh (the list of k-points for the wavefunctions and the set of q-points
in the screening must be consistent with each other).
<p>
The file ~abinit/tests/tutorial/Input/tbs_5.in gathers the different steps of a standard BS calculation
(generation of two KSS file, screening calculation, BS run) into a single input.
The calculation is done with the converged parameters found in the previous studies,
only <a href="../input_variables/varbas.html#ngkpt" target="kwimg">ngkpt</a>
has been intentionally left undefined.
<!--
<a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a>,
<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>,
<a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a>,
-->
<p>
Use tbs_5.in as a template for performing BS calculations with different k-meshes.
For example, you might try to compare the three meshes 4x4x4, 5x5x5, and 6x6x6.
To facilitate the analysis of the results, we suggest to run the calculations
in different directories so that we can keep the output results separated.
<p>
Be aware that both the CPU time as well as the memory requirements increase quickly with
the number of divisions in the mesh.
These are, for example, the CPU times required by different k-meshes on Intel Xeon X5570:
<p>
<pre>
4x4x4: +Overall time at end (sec) : cpu= 112.4 wall= 112.4
5x5x5: +Overall time at end (sec) : cpu= 362.8 wall= 362.8
6x6x6: +Overall time at end (sec) : cpu= 914.8 wall= 914.8
8x8x8: +Overall time at end (sec) : cpu= 5813.3 wall= 5813.3
10x10x10: +Overall time at end (sec) : cpu= 20907.1 wall= 20907.1
12x12x12: +Overall time at end (sec) : cpu= 62738.2 wall= 62738.2
</pre>
6x6x6 is likely the most dense sampling you can afford on a single-CPU machine.
For you convenience, we have collected the results of the convergence test in the figure below.
<p align="center"><img src=./lesson_bse/tbs5.png></p>
<p>
As anticipated, the spectrum converges slowly with the number of k-points
and our first calculation done with the 4x4x4 grid is severely unconverged.
The most accurate results are obtained with the 12x12x12 k-mesh, but even this
sampling leads to converged results only for frequencies below 4.5 eV.
This is a problem common to all BS computations, in the sense that it is extremely
difficult to achieve global converge in the spectra.
This analysis shows that we can trust the 12x12x12 results in the [0:4,5] eV range while
the correct description of the spectrum at higher energies would require the
inclusion of more k-point and, possibly, more bands so that the band dispersion is correctly taken
into account (even the RPA spectrum does not coverge at high frequencies when 12x12x12 is used).
<p>
It should be stressed that <a href="../input_variables/vargw.html#zcut" target="kwimg">zcut</a>
plays a very important role in these converge tests.
For example, the results obtained with the 8x8x8 or the 10x10x10 k-mesh
can be brought closer to the 12x12x12 by just increasing the Lorentzian broadening.
When comparing theory with experiments, it is common to treat
<a href="../input_variables/vargw.html#zcut" target="kwimg">zcut</a>
as an <i>a posteriori</i> parameter chosen to produce the best agreement with the experiment.
<!--
<p>
The results are in agreement with similar ab-initio calculations reported in the literature.
The agreement with experiments is good but not perfect.
When comparing theoretical results with experiment, however, one should always take into
account that there are many approximations involved in the standard Bethe-Salpeter approach
frequency dependence in the screening, temperature effects, electron-phonon interaction ...
-->
<hr>
<h3><p><b><a name="additional_exercises">6.</a>
Additional exercises (optional)
</b></h3>
<ul>
<li>
<!--
<p>
In crystalline systems, the Tamm-Dancoff approximation usually gives imaginary parts that are almost indistinguishable from
the results obtained by solving the problem for the full Hamiltonian (see also ???)
On the other hand the correct description of excitonic effects in isolated systems might require the
correct inclusion of the coupling terms due to the localization of the electronic wavefunctions.
Another case in which one should test the reliability of the TDA is represented by the calculation
of the real part of the macroscopic dielectric function (EELS).
-->
<p>
Use <a href="../input_variables/vargw.html#bs_coupling" target="kwimg">bs_coupling</a>=1 to
perform an excitonic calculation for silicon including the coupling term.
Compare the imaginary part of the macroscopic dielectric function obtained with and
without coupling. Do you find significant differences?
(Caveat: calculations with coupling cannot use the Haydock method and are much more CPU demanding.
You might have to decrease some input parameters to have results in reasonable time.)
<li>
<p>
Calculate the one-shot GW corrections for silicon following the <a href="lesson_gw1.html">first lesson</a>
of the GW tutorial. Then use the _GW file produced by the code to calculate the absorption spectrum.
</ul>
<hr>
<p><a name="BS_MPI"></a><br>
<h3><b> 7. Notes on the MPI implementation.</b></h3>
<p>
In this section, we discuss the approach used to parallelize the two steps of the BS run,
<i>i.e.</i> the construction of the H matrix and the evaluation of the macroscopic dielectric function.
<p>
First of all, it is important to stress that, unlike the GW code, the BS routines do not employ any kind
of memory distribution for the wavefunctions. The entire set of orbitals used to construct the transition space
is stored on each node
This choiche has been dictated by the fact that the size of H is usually much larger
than the array used to store the wavefunction, hence it is much more important to distribute the
matrix than the wavefunctions.
Besides, having all the states on each node simplifies the calculation of several intermediate
quantities needed at run-time.
<p>
The memory allocated for the wavefunctions and the screening thus will not scale with the number of processors.
However, for very memory demanding calculations, the real space orbitals
can be calculated on the fly with an increase in computational time instead.
This option is controlled by the second digit of the input variable
<a href="../input_variables/vargw.html#gwmem" target="kwimg">gwmem</a>.
<p>
When discussing the MPI parallelization of the Bethe-Salpeter routines, we have to consider the two steps separately.
<p>
In the first step, the upper triangle of the resonant (coupling) block is distributed among the nodes.
Each CPU computes its own portion and stores the results in a temporary array.
At the end of the computation, the portions of the upper triangle are communicated to
the master node which writes the binary file BSR (BSC).
<p>
In the second step, each node reads the data stored in the external files in order to build the excitonic Hamiltonian.
<!--
<p>
The distribution of the matrix now depends on the algorithm specified by
<a href="../input_variables/vargw.html#bs_algorithm" target="kwimg">bs_algorithm</a>.
The direct diagonalization supports the ScalaPack interface, but we prefer not to discuss this rather technical point
in this tutorial.
-->
The matrix is distributed using a column-block partitioning, so that
the matrix-vector multiplications required in the Haydock iterative scheme can be easily performed in parallel
(see the schematic representation reported below).
A similar distribution scheme is also employed for the conjugate-gradient minimization.
For a balanced distribution of computational work, the number of processors should divide the total number of resonant transitions.
<p align="center"><img src=./lesson_paral_mbt/MPI_mv.png></p>
<!--
<hr>
<h3><p><b><a name="references">References</a>
</b></h3>
<ol>
<li><a name="onida_rmp">G. Onida, L. Reining, and A. Rubio, Rev. Mod. Phys. <b>48</b>, 601 (2002)
and references therein</a> </li>
<li><a name="albrecth">S. Albrecht, Ph.D. thesis, Ecole Polytechnique, Palaiseau (Paris), 1999</a> </li>
<li><a name="haydock2">R. Haydock, Comput. Phys. Comm. <b>20</b>, 11 (1980)</a> </li>
<li><a name="gruning">M. Gruning, A. Marini and X. Gonze, Nano Letters <b>9</b>, 2820 (2009) </li>
<li><a name="baroni1986">S. Baroni and R. Resta, Phys. Rev. B <b>33</b>, 7017 (1986) </li>
<li><a name="benedict">L. X. Benedict and E. L. Shirley, Phys. Rev. B <b>59</b>, 5441 (1999)</a> </li>
<li><a name="adler">S. L. Adler, Phys. Rev. <b>126</b>, 413 (1962)</a> </li>
<li><a name="wiser">N. Wiser, Phys. Rev. <b>129</b>, 72 (1963)</a> </li>
</ol>
-->
<script type="text/javascript" src="list_internal_links.js"> </script>
</body>
</html>
|