File: lesson_bse.html

package info (click to toggle)
abinit 7.8.2-2
  • links: PTS, VCS
  • area: main
  • in suites: jessie, jessie-kfreebsd
  • size: 278,292 kB
  • ctags: 19,095
  • sloc: f90: 463,759; python: 50,419; xml: 32,095; perl: 6,968; sh: 6,209; ansic: 4,705; fortran: 951; objc: 323; makefile: 43; csh: 42; pascal: 31
file content (993 lines) | stat: -rw-r--r-- 48,868 bytes parent folder | download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
<html>
<head>
<title>BSE tutorial</title>
</head>
<body bgcolor="#ffffff">
<hr>
<h1>ABINIT, tutorial on Bethe-Salpeter calculations :</h1>
<h2>absorption spectra including excitonic effects. </h2>
<hr>
<p>
This lesson discusses how to calculate the macroscopic dielectric function including excitonic effects 
within the Bethe-Salpeter (BS) approach.
Crystalline silicon is used as test case.
A brief description of the formalism can be found in the <a href="theory_bse.html">BS_notes</a>.
<p>
The user should be familiarized with the four basic lessons of ABINIT
and the first lesson of the GW tutorial, see the <a href="welcome.html">tutorial home page</a>.

<p>This lesson should take about one hour to be completed.

<h5>Copyright (C) 2006-2014 ABINIT group (MG,MS)
<br> This file is distributed under the terms of the GNU General Public License, see
~abinit/COPYING or <a href="http://www.gnu.org/copyleft/gpl.txt">
http://www.gnu.org/copyleft/gpl.txt </a>.
<br> For the initials of contributors, see ~abinit/doc/developers/contributors.txt .
</h5>

<script type="text/javascript" src="list_internal_links.js"> </script>

<h3><b>Content of the lesson</b></h3>
<p>
<ul>
  <li><a href="#kss_scr"              >1.</a> Preparatory steps (generating the KSS and the SCR file)
  <li><a href="#first_bsrun"          >2.</a> Computing the absorption spectrum with different approximations
  <li><a href="#conv_on_bands"        >3.</a> Convergence with respect to the number of bands in the transition space
  <li><a href="#conv_on_ecuteps"      >4.</a> Convergence with respect to the number of planewaves in the screening
  <li><a href="#conv_on_k"            >5.</a> Convergence with respect to the number of k-points
  <li><a href="#additional_exercises" >6.</a> Additional exercises (optional)
  <li><a href="#BS_MPI"               >7.</a> Notes on the MPI implementation
</ul>

<p>
<p><a name="kss_scr"></a><br>
<h3><b> 1. Preparatory steps (generating the KSS and the SCR file).</b></h3>

<p><i>Before beginning, you might consider to work in a different subdirectory
as for the other lessons. Why not "Work_bs" ?</i>

<p>
In the directory ~abinit/tests/tutorial/Input/Work_bs, copy the files file ~abinit/tests/tutorial/Input/tbs_1.files.
Now run immediately the calculation with the command:

<pre>
abinit < tbs_1.files >& tbs_1.log &
</pre>

so that we can analyze the input file while the code is running.
<p>
The input file is located in ~abinit/tests/tutorial/Input/tbs_1.in.
The header reports a brief description of the calculation so read it carefully.
Don't worry if some parts are not clear to you as we are going to discuss the calculation in step by step fashion. 
<p>
This input file generates the two KSS files and the screening file needed for the subsequent Bethe-Salpeter computations.
The first dataset performs a rather standard ground-state calculation 
on an unshifted 4x4x4 grid (64 k points in the full Brillouin Zone, folding to 8 k points in the irreducible wedge).
Then the ground-state density is used in dataset 2 and 3 
to generate two KSS files with a standard NSCF cycle, solved 
with the conjugate-gradient method (<a href="../input_variables/varfil.html#kssform" target="kwimg">kssform</a>=3).
<p>
<!--
with <a href="../input_variables/varbas.html#tolvrs" target="kwimg">tolvrs</a>=1.0d-8 as stopping 
We use <a href="../input_variables/varbas.html#tolvrs" target="kwimg">tolvrs</a> as stopping 
criterion since we want to obtain an accurate potential for our band structure calculation.
The density file is then used in dataset 2 and 3 to produce two different KSS files that
only differ for the k-point mesh and the number of bands stored on file.
-->
Note that the KSS file computed in dataset 2 contains 100 bands on the 4x4x4 gamma-centered k-mesh whereas 
the KSS file produced in dataset 3 has only 10 bands on a 4x4x4 k-mesh that has been shifted
along the direction
<pre>
shiftk3    0.11 0.21 0.31  # This shift breaks the symmetry of the k-mesh.
</pre>

The gamma-centered k-mesh contains 8 points in the irreducible zone 
while the shifted k-mesh breaks the symmetry of the crystal leading to 64 points in the IBZ 
(actually the IBZ now coincides with the full Brillouin zone).
The second mesh is clearly inefficient, so you might wonder why we are using such a bizarre sampling 
and, besides, why we need to generate two different KSS files!
<p>
Indeed this approach strongly differs from the one we followed in the GW tutorials, but there is 
a good reason for doing so.
It is anticipated that optical spectra converge slowly with the Brillouin zone sampling,
and that symmetry-breaking k-meshes lead to faster convergence in 
<a href="../input_variables/varbas.html#nkpt" target="kwimg">nkpt</a>
than the standard symmetric k-meshes commonly used for ground-state or GW calculations.
<p>
This explains the bizarre shift but still why two KSS files?
Why don't we simply use the KSS file on the shifted k-mesh to compute the screening?
<p>
The reason is that a screening calculation done with many empty bands on the shifted k-mesh 
would be very memory demanding as the code should allocate a huge portion of memory whose size scales with
(<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a> *
<a href="../input_variables/varbas.html#nkpt" target="kwimg">nkpt</a>),
and no symmetry can be used to reduce the number of k-points.
<p>

To summarize:
the KSS with the symmetric k-point sampling and 100 bands will be used to compute the screening, while the 
KSS file with the shifted k-mesh and 10 bands will be used to construct the transition space employed for 
solving the Bethe-Salpeter equation.
The two k-meshes differ just for the shift thus they produce the same set of q-points (the list of
q-points in the screening is defined as all the possible differences between the k-points of the KSS file).
This means that, in the BS run, we can use the SCR file generated with the symmetric mesh even though 
the transition space is constructed with the shifted k-mesh.
<p>
After this lengthy discussion needed to clarify this rather technical point, 
we can finally proceed to analyze the screening computation performed in the last dataset of tbs_1.in.
<p>
The SCR file is calculated in dataset 4 using 
<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>=100
and
<a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a> 6.0 Ha.
In the <a href="lesson_gw1.html">first lesson</a> of the GW tutorial,
these values were found to give QP energies converged within 0.01 eV, 
so we are confident that our SCR file is well converged and it can be safely used for performing 
convergence tests in the Bethe-Salpeter part. 

<p>
Note that, for efficiency reasons, only the static limit of W is computed:
<pre>
nfreqre4  1     # Only the static limit of W is needed for standard BSE calculations.
nfreqim4  0
</pre>
Indeed, in the standard formulation of the Bethe-Salpeter equation, only the static limit
of the screened interaction is needed to construct the Coulomb term of the BS Hamiltonian.
Using a single frequency allows us to save some CPU time in the screening part, but keep in mind
that this SCR file can only be used either for Bethe-Salpeter computations or for 
GW calculations employing the plasmon-pole models corresponding to 
<a href="../input_variables/vargw.html#ppmodel" target="kwimg">ppmodel</a>=3,4.
<!--
<p>
Note also that the k point grid is quite rough, 
<a href="../input_variables/varbas.html#ngkpt" target="kwimg">ngkpt</a>=4 4 4).
Converged results would need a much denser sampling of the Brillouin zone. 
We postpone the discussion on the converge study in the number of k-point to the last section of this lesson.
-->
<p>
At this point the calculation should have completed, but there's still one thing that we have 
to do before moving to the next paragraph.
<p>
As we said, we will need the KSS file on the shifted k-mesh and the SCR file for our BS
calculations so do not delete them!
It is also a good idea to rename these precious files using more meaningful names e.g.:

<pre>
mv tbs_1o_DS2_KSS 444_gamma_KSS
mv tbs_1o_DS3_KSS 444_shifted_KSS
mv tbs_1o_DS4_SCR 444_SCR
</pre>

Keep in mind that henceforth the k-point sampling cannot be changed anymore:
the list of k-points specified in the BS input files MUST equal the one used to generate the KSS file.
Two new KSS files and a new SCR file must be generated from scrath if we want to change the k-point
sampling used to construct the transition space.

<hr>
<h3><p><b><a name="first_bsrun">2.</a>
Computing the absorption spectrum within the Tamm-Dancoff approximation.
</b></h3>

<p>
This section is intended to show how to perform a standard excitonic calculation within the Tamm-Dancoff approximation (TDA) 
using the Haydock iterative technique. 
The input file is ~abinit/tests/tutorial/Input/tbs_2.in.
<p>
Before running the job, we have to connect this calculation with the output results produced in tbs_1.in.
<p>
Use the Unix commands:

<pre>
ln -s 444_shifted_KSS tbs_2i_KSS
ln -s 444_SCR tbs_2i_SCR
</pre>

to create two symbolic links for the shifted KSS and the SCR file.
The reason for doing so will be clear afterwards once we discuss the input file.
<p>
This job lasts 1-2 minutes on a modern machine so it is worth running it before inspecting the input file. 
<p>
Copy the files file ~abinit/tests/tutorial/Input/tbs_2.files
in the working directory and issue
<pre>
abinit < tbs_2.files >& tbs_2.log &
</pre>
<p>
to put the job in background so that we can examine tbs_2.in.
<p>
Now open ~abinit/tests/tutorial/Input/tbs_2.in
in your preferred editor and go to the next section where we discuss 
the most important variables governing a typical BS computation.

<h4><a name="2a"></a></h4>
<h4><b> 2.a The structure of the input file.</b></h4>

<p>
First we need to set <a href="../input_variables/vargs.html#optdriver" target="kwimg">optdriver</a>=99
to call the BSE routines
<pre>
optdriver  99        # BS calculation
</pre>

The variables 
<a href="../input_variables/varfil.html#irdkss" target="kwimg">irdkss</a> and
<a href="../input_variables/varfil.html#irdscr" target="kwimg">irdscr</a>
are similar to other "ird" variables of ABINIT and are used to read the files produced in the
previous paragraph

<pre>
irdkss  1     # Read the KSS file produced in tbs_1 
irdscr  1     # Read the SCR file produced in tbs_1 
</pre>

The code expects to find an input KSS file and an input SCR file whose name is constructed 
according to prefix specified in the files file tbs_2.files
(see <a href="../users/abinit_help.html#intro1" target="helpsimg">section 1.1</a> of the abinit_help file).
This is the reason why we had to create the two symbolic links before running the code.

<p>
Then we have a list of five variables specifying how to construct the excitonic Hamiltonian.
<pre>
bs_calctype       1       # L0 is constructed with KS orbitals and energies.
soenergy          0.8 eV  # Scissors operator used to correct the KS band structure.
bs_exchange_term  1       # Exchange term included.
bs_coulomb_term  11       # Coulomb term included using the full matrix W_GG'
bs_coupling       0       # Tamm-Dancoff approximation.
</pre>

The value <a href="../input_variables/vargw.html#bs_calctype" target="kwimg">bs_calctype</a>=1 specifies that
the independent-particle polarizability should be costructed with the Kohn-Sham orbitals and energies read from the KSS file.
To simulate the self-energy correction, the KS energies are corrected with a scissors operator 
of energy <a href="../input_variables/vargw.html#soenergy" target="kwimg">soenergy</a>= 0.8 eV.
This permits us to avoid a cumbersome GW calculation for each state 
included in our transition space. The use of the scissors operator is a reasonable approximation for silicon but 
it might fail in more complicated systems in which the GW corrections cannot be simulated in terms of a simple 
rigid shift of the initial KS bands structure.
<p>
The remaining three variables specify how to construct the excitonic Hamiltonian.
<a href="../input_variables/vargw.html#bs_exchange_term" target="kwimg">bs_exchange_term</a>=1 tells
the program to calculate the exchage part of the kernel, hence this calculation includes local-field effects.
The variable <a href="../input_variables/vargw.html#bs_coulomb_term" target="kwimg">bs_coulomb_term</a> 
is used to select among different options that are available for the Coulomb term
(please take some time to read the description of the variable and the relevant equations in 
the <a href="theory_bse.html">BS_notes</a>).
Finally <a href="../input_variables/vargw.html#bs_coupling" target="kwimg">bs_coupling</a>=0
specifies that the off-diagonal coupling blocks should be neglected (Tamm-Dancoff approximation).
This particular combination of parameters thus corresponds to a Bethe-Salpeter calculation within 
the Tamm-Dancoff approximation with local field effects included.
<p>
Then we have the specification of the bands used to construct the transition space:
<pre>
bs_loband         2 
nband             8
</pre>

In this case all the bands around the gap whose index is between 2 and 8 are included in the basis set.

<p>
The frequency mesh for the the macroscopic dielectric function is specified through
<a href="../input_variables/vargw.html#bs_freq_mesh" target="kwimg">bs_freq_mesh</a>
<pre>
bs_freq_mesh 0 6 0.02 eV  # Frequency mesh.
</pre>

This triplet of real values defines a linear mesh that covers the range [0,6] eV with a step of 0.02 eV.
The number of frequency points in the mesh does not have any significant effect on the CPU time,
but it is important to stress that the number of bands included in the transition space defines, in
conjunction with the number of k-points, the frequency range that can be described.
As a consequence
<a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a> and
<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>
should be subject to an accurate converge study.
<p>
Then we have the parameters that define and control the algorithm employed to calculate the macroscopic dielectric function

<pre>
bs_algorithm        2      # Haydock method.
bs_haydock_niter   100     # Max number of iterations for the Haydock method.
bs_haydock_tol     0.05    # Tolerance for the iterative method.
zcut               0.15 eV # Complex shift to avoid divergences in the continued fraction.
</pre>

<a href="../input_variables/vargw.html#bs_algorithm" target="kwimg">bs_algorithm</a> specifies 
the algorithm used to calculate the macroscopic dielectric function. 
In this case we use the iterative Haydock technique whose maximum number of
iterations is given by <a href="../input_variables/vargw.html#bs_haydock_niter" target="kwimg">bs_haydock_niter</a>.
The iterative algorithm stops when the difference between two consecutive evaluations of the optical 
spectra is less than <a href="../input_variables/vargw.html#bs_haydock_tol">bs_haydock_tol</a>.
The input variable <a href="../input_variables/vargw.html#zcut" target="kwimg">zcut</a> gives the complex shift to avoid divergences 
in the continued fraction. 
From a physical point of view, this parameters mimics the experimental broadening of the absorption peaks.
In this test, due to the coarseness of the k-mesh, we have to use a value slightly larger than the default one (0.1 eV) 
in order to facilitate the convergence of the Haydock algorithm.
Ideally, one should make a convergence study decreasing 
the value of <a href="../input_variables/vargw.html#zcut" target="kwimg">zcut</a> for increasing number of k-points.
<p>

The k-point sampling is specified by the set of variables.

<pre>
# Definition of the k-point grid
kptopt 1                  # Option for the automatic generation of k points,
ngkpt  4 4 4              # This mesh is too coarse for optical properties.
nshiftk 1
shiftk    0.11 0.21 0.31  # This shift breaks the symmetry of the k-mesh.
chksymbreak 0             # Mandatory for using symmetry-breaking k-meshes in the BS code.
</pre>

The values of  
<a href="../input_variables/varbas.html#kptopt" target="kwimg">kptopt</a>,
<a href="../input_variables/varbas.html#ngkpt" target="kwimg">ngkpt</a>,
<a href="../input_variables/varbas.html#nshiftk" target="kwimg">nshiftk</a>, 
and 
<a href="../input_variables/varbas.html#shiftk" target="kwimg">shiftk</a>
MUST equal the ones used to specify the grid for the KSS file.
<a href="../input_variables/vargs.html#chksymbreak" target="kwimg">chksymbreak</a>=0
is used to bypass the check on symmetry breaking that, otherwise, would make the code stop.
<p>
<!--
TODO: add a check in the code
Do not use <a href="../input_variables/vargs.html#chksymbreak" target="kwimg">chksymbreak</a>=0
in the Bethe-Salpeter run if the mesh preserves the symmetry
-->

<p>
The last section of the input file

<pre>
ecutwfn 8.0               # Cutoff for the wavefunction.
ecuteps 2.0               # Cutoff for W and /bare v used to calculate the BS matrix elements.
inclvkb 2                 # The Commutator for the optical limit is correctly evaluated.
</pre>

specifies the parameters used to calculate the kernel matrix elements and the matrix elements of the dipole operator.
We have already encoutered these variables in the <a href="lesson_gw1.html">first lesson</a> of the
GW tutorial so their meaning is (hopefully) familiar to you.
A more detailed discussion of the role played by these variables in the BS code can be found in the <a href="theory_bse.html">BS_notes</a>.

<h4><a name="2c"></a></h4>
<h4><b> 2.c Output files.</b></h4>

<p> 
The output file, tbs_2.out, reports the basic parameters of the calculation 
and eventual WARNINGs that are issued if the iterative method does not converge.
Please take some time to understand its structure. 
<p>
Could you answer the the following questions?
<ol>
  <li>How many transitions are included in the basis set?
  <li>How many directions are used to evaluate the optical limit?
  <li>What is the value of the Lorentzian broadening used in the continued fraction?
</ol>

<p>
After this digression on the main output file, we can finally proceed to analyse the output data of the computation.
<p>
The most important results are stored in five different files: 
<ul>
  <li>tbs_2o_BSR 
  <li>tbs_2o_HAYDR_SAVE
  <li>tbs_2o_RPA_NLF_MDF
  <li>tbs_2o_GW_NLF_MDF 
  <li>tbs_2o_EXC_MDF
</ul>

In what follows, we provide a brief description of the format and of the content of each output file.

<p>
<ul><li>
tbs_2o_BSR:
</li></ul>
<p>
This binary file stores the upper triangle of the resonant block (the matrix is Hermitian
hence only the non-redundant part is computed and saved on file).
The BSR file can be used to restart the run from a previous computation using the variables
<a href="../input_variables/varfil.html#getbsreso" target="kwimg">getbsreso</a> 
or <a href="../input_variables/varfil.html#irdbsreso" target="kwimg">irdbsreso</a>.
This restart capability is useful for restarting the Haydock method if convergence was not achieved 
or to execute Haydock computations with different values of 
<a href="../input_variables/vargw.html#zcut" target="kwimg">zcut</a>.
<a href="../input_variables/varfil.html#getbsreso" target="kwimg">getbsreso</a>
and <a href="../input_variables/varfil.html#irdbsreso" target="kwimg">irdbsreso</a>
are also handy if one wants to include the coupling on top of a pre-existing TDA calculation
since the code uses two different files to store the resonant and the coupling block (BSC is the
prefix used for the files storing the coupling term). 
<p>
<ul><li>
tbs_2o_HAYDR_SAVE:
</li></ul>
<p>
It is a binary file containing the results of the Haydock method: the coefficient of the tridiagonal
matrix and the three vectors employed in the iterative algorithm.
It is usually used to restart the algorithm if convergence has not been achieved
(see the related input variables
<a href="../input_variables/varfil.html#gethaydock">gethaydock</a> and <a href="../input_variables/varfil.html#irdhaydock">irdhaydock</a>).
<p> 
<ul><li>
tbs_2o_RPA_NLF_MDF and tbs_2o_GW_NLF_MDF 
</li></ul>
<p>
The RPA spectrum without local field effects obtained with KS energies and the GW energies, respectively 
(mnemonics: NLF stands for No Local Field, while MDF stands for Macroscopic Dielectric Function). 
<p>
<ul><li>
tbs_2o_EXC_MDF:
</li></ul>
<p>
Formatted file reporting the macroscopic dielectric function with excitonic effects.
Since this file contains the most important results of our calculation it is worth spending some time to
discuss its format.
<p>
First we have a header reporting the basic parameters of the calculation
<pre>
# Macroscopic dielectric function obtained with the BS equation.
#  RPA L0 with KS energies and KS wavefunctions     LOCAL FIELD EFFECTS INCLUDED
# RESONANT-ONLY calculation
# Coulomb term constructed with full W(G1,G2)
# Scissor operator energy =  0.8000 [eV]
# Tolerance =  0.0500
# npweps  = 27
# npwwfn  = 283
# nbands  = 8
# loband  = 2
# nkibz   = 64
# nkbz    = 64
# Lorentzian broadening =  0.1500 [eV]
</pre>

then the list of q-points giving the direction of the incident photon:
<pre>
#  List of q-points for the optical limit:
# q =  0.938821, 0.000000, 0.000000, [Reduced coords] 
# q =  0.000000, 0.938821, 0.000000, [Reduced coords] 
# q =  0.000000, 0.000000, 0.938821, [Reduced coords] 
# q =  0.000000, 0.813043, 0.813043, [Reduced coords] 
# q =  0.813043, 0.000000, 0.813043, [Reduced coords] 
# q =  0.813043, 0.813043, 0.000000, [Reduced coords] 
</pre>

By default the code calculates the macroscopic dielectric function considering six different 
directions in q-space (the three basis vectors of the reciprocal lattice and the three Cartesian
axis). It is possible to specify custom directions using the input variables
<a href="../input_variables/vargw.html#gw_nqlwl" target="kwimg">gw_nqlwl</a> and 
<a href="../input_variables/vargw.html#gw_qlwl" target="kwimg">gw_qlwl</a>. 
<p>
Then comes the section with the real and the imaginary part of the macroscopic dielectric as a
function of frequency for the different directions:
<pre>
# omega [eV]    RE(eps(q=1)) IM(eps(q=1) RE(eps(q=2) ) ... 
0.000  1.8026E+01  0.0000E+00  1.7992E+01  0.0000E+00  1.4292E+01  0.0000E+00  1.3993E+01 0.0000E+00  1.7117E+01  0.0000E+00  1.7080E+01  0.0000E+00
  .... .... ...
</pre>
 
You can visualize the data using your preferred software. For instance,
<pre>
$ gnuplot
gnuplot&#62  p "tbs_2o_EXC_MDF" u 1:3 w l
</pre>
will plot the imaginary part of the macroscopic dielectric function (the absorption spectrum) for the first q-point.
You should obtain a graphic similar to the one reported below

<p align="center"><img src=./lesson_bse/tbs2_1.png></p>

<p>
Please note that these results are not converged, we postpone the discussion about convergence tests
to the next paragraphs of this tutorial. 
<!--
For the moment, we are mainly interested in discussing the physi         meaning of our results.
-->
<p>
The most important feature of the spectrum is the presence of two peaks located at around 3.4 and 4.3 eV.
To understand the nature of these peaks and the role played by the BS kernel, it is useful to compare 
the excitonic spectra with the RPA results obtained without local field effects.
<p>
Use the sequence of gnuplot command

<pre>
$ gnuplot
gnuplot&#62  p   "tbs_2o_EXC_MDF"     u 1:3 w l
gnuplot&#62  rep "tbs_2o_RPA_NLF_MDF" u 1:3 w l
gnuplot&#62  rep "tbs_2o_GW_NLF_MDF"  u 1:3 w l
</pre>

to plot the absorption spectrum obtained with the three different approaches.
The final result is reported in the figure below. 

<p align="center"><img src=./lesson_bse/tbs2_2.png></p>

<p>
The RPA-KS spectrum underestimates the experimental optical threshold due to the well know band gap problem of DFT.
Most importantly, the amplitude of the first peak is underestimated, a problem than 
is not solved when local-field effects are correctly included in the calculation.
<p>
The RPA-GW results with QP corrections simulated with <a href="../input_variables/vargw.html#soenergy" target="kwimg">soenergy</a>
does not show any significant improvement over RPA-KS: 
the RPA-GW spectrum is just shifted towards higher frequencies due to opening of the gap,
but the shape of the two spectra is very similar, in particular the amplitude of the first peak is
still underestimated.
<p>
On the contrary, the inclusion of the BS kernel leads to important changes both in the optical threshold
as well as in the amplitude of the first peak. 
This simple analysis tells us that the first peak in the absorption spectrum of silicon has a strong 
excitonic character that is not correctly described within the RPA.
Our first BS spectrum is not converged at all and it barely resembles the experimental result,
nevertheless this unconverged calculation is already able to capture the most important physics.

<h4><a name="2c"></a></h4>
<h4><b> 2.c Optional Exercises.</b></h4>
<ul>
  <li>
  <p>
  Change the value of the Lorentzian broadening <a href="../input_variables/vargw.html#zcut" target="kwimg">zcut</a>
  used to avoid divergences in the continued fraction. 
  Then restart the Haydock algorithm from the _BSR and _HAYDR_SAVE files using the appropriate variables.
  What is the main effect of the broadening on the final spectrum. Does the number of iterations
  needed to converge depend on the broadening? 
  <li>
  <p>
  Use the appropriate values for <a href="../input_variables/vargw.html#bs_exchange_term" target="kwimg">bs_exchange_term</a> and
  <a href="../input_variables/vargw.html#bs_coulomb_term" target="kwimg">bs_coulomb_term</a> 
  to calculate the BS spectrum without local field effects.
  Compare the results obtained with and without local field effects.
  <li>
  <p>
  Modify the input file tbs_2.in so that the code reads in the resonant block produced in the previous run 
  and calculates the spectrum employing the method based on the direct diagonalization
  (use <a href="../input_variables/varfil.html#irdbsreso" target="kwimg">irdbsreso</a> to
  restart the run but remember to rename the file with the resonant block). Compare the CPU time
  needed by the two algorithms as a function of the number of transitions in the transition space.
  Which one has the best scaling?
</ul>

<h4><p><b><a name="discussion_on_conv">2.d</a>
Preliminary discussion about convergence studies
</b></h4>
<p>

Converging the excitonic spectrum requires a careful analysis of many different parameters:

<ul>
<li> <a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a>
<li> <a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>
<li> <a href="../input_variables/vargw.html#ecutwfn" target="kwimg">ecutwfn</a>
<li> <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a>
<li> <a href="../input_variables/varbas.html#ngkpt" target="kwimg">ngkpt</a>
<li> <a href="../input_variables/varbas.html#nshiftk" target="kwimg">nshiftk</a>
<li> <a href="../input_variables/varbas.html#shiftk" target="kwimg">shiftk</a>
</ul>

Since the memory requirements scale quadratically with the number of k-points in the <i><b>full</b></i> Brillouin zone
<i><b>times</b></i> the number of valence bands <i><b>times</b></i> the number of conduction bands included in the transition space, 
it is very important to find a good compromise between accuracy and computational efficiency.
<p>
First of all one should select the frequency range of interest since this choice has an important
effect on the number of valence and conduction states that have to be included in the transition space.
The optical spectrum is expected to converge faster in the number of bands than the GW corrections since 
only those transitions whose energy is "close" to the frequency range under investigation are
expected to contribute.
<p>
<a href="../input_variables/vargw.html#ecutwfn" target="kwimg">ecutwfn</a> usually plays a secondary
role since it only affects the accuracy of the oscillator matrix elements.
We suggest avoiding any truncation of the initial basis set by setting 
<a href="../input_variables/vargw.html#ecutwfn" target="kwimg">ecutwfn</a> to a value slightly
larger than the value of  <a href="../input_variables/varbas.html#ecut" target="kwimg">ecut</a> used
to generate the KSS file.
One should truncate the initial planewave basis set only when experiencing memory problems although this kind of
problems can be usually solved by just increasing the number of processors or, alternatively, 
with an appropriate choice of <a href="../input_variables/vargw.html#gwmem" target="kwimg">gwmem</a>.
<p>
The value of <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a> affects the
accuracy of the matrix elements of the Coulomb term, the fundamental term that drives the creation
of the excitons. 
As a consequence  <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a> should be subject 
to an accurate convergence test. 
As a rule of thumb, <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a> can be
chosen equal or, sometimes, even smaller than the value needed to converge the GW corrections.
<p>
As already stated: optical spectra converge slowly with the Brillouin zone sampling.
The convergence in the number of k-points thus represents the most important and tedious part of our convergence study. 
For this reason, this study should be done once converged values for the other parameters have been already found.

<hr>
<h3><p><b><a name="conv_on_bands">3.</a>
Convergence with respect to the number of bands in the transition space
</b></h3>

In this section we take advantage of the multi-dataset capabilities of ABINIT to perform calculations 
with different values for 
<a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a>
and 
<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>

<p>
Before running the test take some time to read the input file ~abinit/tests/tutorial/Input/tbs_3.in.
<p>
The convergence in the number of transitions is performed by defining five datasets
with different values for <a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>
and <a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a>

<pre>
ndtset     5
bs_loband1  4 nband1  5
bs_loband2  3 nband2  6
bs_loband3  2 nband3  7
bs_loband4  2 nband4  8
bs_loband5  1 nband5  8
</pre>

<!--
<p>
In our case we are only interested in the absorption spectrum for frequencies up to 8 eV  hence 
we use reasonable values for the k-points sampling and the other parameters and we monitor
the convergence of the spectrum with respect to the number of conduction and valence states used to
construct our basis set.
Also in this case we use the Haydock technique. 
-->
<p>
The parameters defining how to build the excitonic Hamiltonian are similar to the ones used in tbs_2.in. 
The only difference is in the value used for <a href="../input_variables/vargw.html#bs_coulomb_term" target="kwimg">bs_coulomb_term</a>,  i.e.

<pre>
bs_coulomb_term      10 # Coulomb term evaluated within the diagonal approximation.
</pre>

that allows us to save some CPU time during the computation of the Coulomb term.

<p>
Also in this case, before running the test, we have to connect tbs_3.in to the KSS and the SCR file
produced in the first step.
Note that tbs_3.in uses <a href="../input_variables/varfil.html#irdkss" target="kwimg">irdkss</a> and
<a href="../input_variables/varfil.html#irdscr" target="kwimg">irdscr</a> to read the external files,
hence we have to create symbolic links for each dataset:

<pre>
ln -s 444_SCR tbs_3i_DS1_SCR
ln -s 444_SCR tbs_3i_DS2_SCR
ln -s 444_SCR tbs_3i_DS3_SCR
ln -s 444_SCR tbs_3i_DS4_SCR
ln -s 444_SCR tbs_3i_DS5_SCR
ln -s 444_shifted_KSS tbs_3i_DS1_KSS
ln -s 444_shifted_KSS tbs_3i_DS2_KSS
ln -s 444_shifted_KSS tbs_3i_DS3_KSS
ln -s 444_shifted_KSS tbs_3i_DS4_KSS
ln -s 444_shifted_KSS tbs_3i_DS5_KSS
</pre>

<p>
Now we can finally run the the test with

<pre>
abinit < tbs_3.files >& tbs3.log &
</pre>

This job should last 3-4 minutes so be patient!
<p>
Let us hope that your calculation has been completed, and that we can examine the output results.
<p>
Use the following sequence of gnuplot commands:

<pre>
$ gnuplot
gnuplot&#62 p   "tbs_3o_DS1_EXC_MDF" u 1:3 w l
gnuplot&#62 rep "tbs_3o_DS2_EXC_MDF" u 1:3 w l
gnuplot&#62 rep "tbs_3o_DS3_EXC_MDF" u 1:3 w l
gnuplot&#62 rep "tbs_3o_DS4_EXC_MDF" u 1:3 w l
gnuplot&#62 rep "tbs_3o_DS5_EXC_MDF" u 1:3 w l
</pre>

to plot on the same graphic the absorption spectrum obtained with different transition spaces.
You should obtain a graphic similar to this one:

<p align="center"><img src=./lesson_bse/tbs3.png></p>

<p>
The results obtained with (<a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a>=4,
<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>=5) are clearly unconverged 
as the basis set contains too few transitions that are not able to describe the frequency-dependence 
of the polarizability in the energy range under investigation. 
For a well converged spectrum, we have to include the three higher occupied states and the first four conduction bands 
(the blue curve corresponding to 
<a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a>=2, and
<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>=8).
<p>
Note that adding the first occupied band, curve (1-8), gives results that are almost on top of (2,8).
This is due to  the fact that, in silicon, the bottom of the first band 
is located at around 12 eV from the top of the conduction band therefore its inclusion does not 
lead to any significant improvement of the transition space in the frequency range [0,8] eV.
For completeness, we also report the results obtained in a separate calculation done with
<a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a>=2
<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>=9 to show that four empty
states are enough to converge the spectrum.
<p>
We therefore fix the number of bands for the transition space using the converged values
<a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a>=2,
<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>=8,
and we proceed to analyse the convergence of the spectrum with respect to the number of planewaves in the screening. 


<h4>For expert users:</h4>
<p>
The use of <a href="../input_variables/varfil.html#irdkss" target="kwimg">irdkss</a> and
<a href="../input_variables/varfil.html#irdscr" target="kwimg">irdscr</a> 
is not handy when we have several datasets that are reading the SAME external file 
as we are forced to use different names for the input of each dataset.
To work around this annoyance, one can introduce a fictious dataset (say dataset 99), 
and let the code use the output of this unexisting dataset as the input of the real datasets.
An example will help clarify:
Instead of using the lengthy list of links as done before, we might use the much simpler
sequence of commands

<pre>
ln -s 444_shifted_KSS tbs_3o_DS99_KSS
ln -s 444_SCR         tbs_3o_DS99_SCR
</pre>

provided that, in the input file, we replace 

<a href="../input_variables/varfil.html#irdkss" target="kwimg">irdkss</a> 
and
<a href="../input_variables/varfil.html#irdscr" target="kwimg">irdscr</a> with

<pre>
getkss  99              # Trick to read the same file tbs_o3_DS99_KSS in each dataset
getscr  99              # Same trick for the SCR file
</pre>
</div>

<hr>
<h3><p><b><a name="conv_on_ecuteps">5.</a>
Convergence with respect to the number of planewaves in the screening
</b></h3>
<p>
First of all, before running the calculation, take some time to understand what is done in 
~abinit/tests/tutorial/Input/tbs_4.in. 
<p>
The structure of the input file is very similar to the one of tbs_3.in, 
the main difference is in the first section:

<pre>
ndtset 4
ecuteps: 1 ecuteps+ 2 
bs_coulomb_term 11
</pre>

that istructs the code to execute four calculations where the direct term is constructed
using different value of <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a>.
We also relax the diagonal-only approximation for the screening by setting 
<a href="../input_variables/vargw.html#bs_coulomb_term" target="kwimg">bs_coulomb_term</a>=11
so that the non-locality of W(r,r') is correctly taken into account.
<p>
It is important to stress that it is not necessary to recalculate the SCR file from scratch just to modify 
the value of  <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a> used in the BS run.
The SCR file calculated in the <a href="lesson_bse.html#kss_scr">preparatory step</a> 
contains G-vectors whose energy extends up to ecuteps=6.0 Ha.
This is the MAXIMUM cutoff energy that can be used in our convergence tests.
If the value of <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a>  specified in the input file 
is smaller than the one stored on disk, the code will read a sublock of the initial matrix. A WARNING message is issued 
if the value specified in the input file is larger than the one available in the SCR file.
<p>
Now we can finally run the calculation.
As usual, 
we have to copy ~abinit/tests/tutorial/Input/tbs_4.files in the working directory Work_bs,
then we have to create a bunch of symbolic links for the input KSS and SCR files:

<pre>
ln -s 444_SCR tbs_4i_DS1_SCR
ln -s 444_SCR tbs_4i_DS2_SCR
ln -s 444_SCR tbs_4i_DS3_SCR
ln -s 444_SCR tbs_4i_DS4_SCR
ln -s 444_shifted_KSS tbs_4i_DS1_KSS
ln -s 444_shifted_KSS tbs_4i_DS2_KSS
ln -s 444_shifted_KSS tbs_4i_DS3_KSS
ln -s 444_shifted_KSS tbs_4i_DS4_KSS
</pre>

Now issue 

<pre>
abinit < tbs_4.files >& tbs4.log &
</pre>

to execute the test (it should take around 2 minutes).
<p>
Once the calculation is completed, plot the spectra obtained with different 
<a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a> using

<pre>
$ gnuplot
gnuplot&#62  p "tbs_4o_DS1_EXC_MDF" u 1:3 w l
gnuplot&#62  p "tbs_4o_DS2_EXC_MDF" u 1:3 w l
gnuplot&#62  p "tbs_4o_DS3_EXC_MDF" u 1:3 w l
gnuplot&#62  p "tbs_4o_DS4_EXC_MDF" u 1:3 w l
</pre>

<p align="center"><img src=./lesson_bse/tbs4.png></p>

The spectrum is found to converge quickly in <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a>.
The curves obtained with <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a>=3 and 4 Ha 
are almost indistinguishable from each other.
Our final estimate for <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a> is therefore 3 Ha.
<p>
Note that this value is smaller than the one required to converge the QP corrections within 0.01 eV
(in the  <a href="lesson_gw1.html">first lesson</a> of the GW tutorial we obtained 6.0 Ha).
This is a general behavior, in the sense that Bethe-Salpeter spectra, unlike GW corrections, 
are not usually very sensitive to truncations in the planewave expansion of W. 
Reasonable BS spectra are obtained even when W is treated within the diagonal approximation or, alternatively, 
with model dielectric functions.
<p>
Note also how the two peaks are affected in a different way by the change of <a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a>,
with the first peak affected the most.
This behavior is consistent with our affirmation that the first peak of silicon has a strong excitonic character.

<hr>
<h3><p><b><a name="conv_on_k">6.</a>
Convergence with respect to the number of k-points
</b></h3>

The last parameter that should be checked for convergence is the number of k-points.
This convergence study represents the most tedious and difficult part since it requires the generation of new KSS files
and of the new SCR file for each k-mesh (the list of k-points for the wavefunctions and the set of q-points 
in the screening must be consistent with each other).
<p>
The file ~abinit/tests/tutorial/Input/tbs_5.in gathers the different steps of a standard BS calculation 
(generation of two KSS file, screening calculation, BS run) into a single input. 
The calculation is done with the converged parameters found in the previous studies,
only <a href="../input_variables/varbas.html#ngkpt" target="kwimg">ngkpt</a> 
has been intentionally left undefined.
<!--
<a href="../input_variables/vargw.html#bs_loband" target="kwimg">bs_loband</a>,
<a href="../input_variables/varbas.html#nband" target="kwimg">nband</a>,
<a href="../input_variables/vargw.html#ecuteps" target="kwimg">ecuteps</a>,
-->
<p>
Use tbs_5.in as a template for performing BS calculations with different k-meshes.
For example, you might try to compare the three meshes 4x4x4, 5x5x5, and 6x6x6.
To facilitate the analysis of the results, we suggest to run the calculations 
in different directories so that we can keep the output results separated.
<p>
Be aware that both the CPU time as well as the memory requirements increase quickly with 
the number of divisions in the mesh.
These are, for example, the CPU times required by different k-meshes on Intel Xeon X5570:
<p>

<pre>
4x4x4:    +Overall time at end (sec) : cpu=        112.4  wall=        112.4
5x5x5:    +Overall time at end (sec) : cpu=        362.8  wall=        362.8
6x6x6:    +Overall time at end (sec) : cpu=        914.8  wall=        914.8
8x8x8:    +Overall time at end (sec) : cpu=       5813.3  wall=       5813.3
10x10x10: +Overall time at end (sec) : cpu=      20907.1  wall=      20907.1
12x12x12: +Overall time at end (sec) : cpu=      62738.2  wall=      62738.2
</pre>

6x6x6 is likely the most dense sampling you can afford on a single-CPU machine.
For you convenience, we have collected the results of the convergence test in the figure below.

<p align="center"><img src=./lesson_bse/tbs5.png></p>

<p>
As anticipated, the spectrum converges slowly with the number of k-points
and our first calculation done with the 4x4x4 grid is severely unconverged.
The most accurate results are obtained with the 12x12x12 k-mesh, but even this 
sampling leads to converged results only for frequencies below 4.5 eV.
This is a problem common to all BS computations, in the sense that it is extremely
difficult to achieve global converge in the spectra.
This analysis shows that we can trust the 12x12x12 results in the [0:4,5] eV range while
the correct description of the spectrum at higher energies would require the 
inclusion of more k-point and, possibly, more bands so that the band dispersion is correctly taken
into account (even the RPA spectrum does not coverge at high frequencies when 12x12x12 is used).
<p>
It should be stressed that <a href="../input_variables/vargw.html#zcut" target="kwimg">zcut</a> 
plays a very important role in these converge tests.
For example, the results obtained with the 8x8x8 or the 10x10x10 k-mesh 
can be brought closer to the 12x12x12 by just increasing the Lorentzian broadening.
When comparing theory with experiments, it is common to treat 
<a href="../input_variables/vargw.html#zcut" target="kwimg">zcut</a> 
as an <i>a posteriori</i> parameter chosen to produce the best agreement with the experiment. 
<!--
<p>
The results are in agreement with similar ab-initio calculations reported in the literature.
The agreement with experiments is good but not perfect. 
When comparing theoretical results with experiment, however, one should always take into 
account that there are many approximations involved in the standard Bethe-Salpeter approach
frequency dependence in the screening, temperature effects, electron-phonon interaction ...
-->

<hr>
<h3><p><b><a name="additional_exercises">6.</a>
Additional exercises (optional)
</b></h3>

<ul>
  <li>
  <!--
  <p>
  In crystalline systems, the Tamm-Dancoff approximation usually gives imaginary parts that are almost indistinguishable from
  the results obtained by solving the problem for the full Hamiltonian (see also ???)
  On the other hand the correct description of excitonic effects in isolated systems might require the
  correct inclusion of the coupling terms due to the localization of the electronic wavefunctions.
  Another case in which one should test the reliability of the TDA is represented by the calculation
  of the real part of the macroscopic dielectric function (EELS).
  -->
  <p>
  Use <a href="../input_variables/vargw.html#bs_coupling" target="kwimg">bs_coupling</a>=1 to
  perform an excitonic calculation for silicon including the coupling term. 
  Compare the imaginary part of the macroscopic dielectric function obtained with and
  without coupling. Do you find significant differences?
  (Caveat: calculations with coupling cannot use the Haydock method and are much more CPU demanding.
  You might have to decrease some input parameters to have results in reasonable time.)
  <li>
  <p>
  Calculate the one-shot GW corrections for silicon following the <a href="lesson_gw1.html">first lesson</a>
  of the GW tutorial. Then use the _GW file produced by the code to calculate the absorption spectrum.
</ul>

<hr>

<p><a name="BS_MPI"></a><br>
<h3><b> 7. Notes on the MPI implementation.</b></h3>

<p>
In this section, we discuss the approach used to parallelize the two steps of the BS run, 
<i>i.e.</i> the construction of the H matrix and the evaluation of the macroscopic dielectric function.
<p>
First of all, it is important to stress that, unlike the GW code, the BS routines do not employ any kind 
of memory distribution for the wavefunctions. The entire set of orbitals used to construct the transition space 
is stored on each node
This choiche has been dictated by the fact that the size of H is usually much larger
than the array used to store the wavefunction, hence it is much more important to distribute the 
matrix than the wavefunctions.
Besides, having all the states on each node simplifies the calculation of several intermediate
quantities needed at run-time.
<p>
The memory allocated for the wavefunctions and the screening thus will not scale with the number of processors.
However, for very memory demanding calculations, the real space orbitals
can be calculated on the fly with an increase in computational time instead. 
This option is controlled by the second digit of the input variable 
<a href="../input_variables/vargw.html#gwmem" target="kwimg">gwmem</a>.
<p>
When discussing the MPI parallelization of the Bethe-Salpeter routines, we have to consider the two steps separately.
<p>
In the first step, the upper triangle of the resonant (coupling) block is distributed among the nodes. 
Each CPU computes its own portion and stores the results in a temporary array. 
At the end of the computation, the portions of the upper triangle are communicated to 
the master node which writes the binary file BSR (BSC).
<p>
In the second step, each node reads the data stored in the external files in order to build the excitonic Hamiltonian.
<!--
<p>
The distribution of the matrix now depends on the algorithm specified by 
<a href="../input_variables/vargw.html#bs_algorithm" target="kwimg">bs_algorithm</a>.
The direct diagonalization supports the ScalaPack interface, but we prefer not to discuss this rather technical point
in this tutorial.
-->
The matrix is distributed using a column-block partitioning, so that 
the matrix-vector multiplications required in the Haydock iterative scheme can be easily performed in parallel
(see the schematic representation reported below).
A similar distribution scheme is also employed for the conjugate-gradient minimization.
For a balanced  distribution of computational work, the number of processors should divide the total number of resonant transitions. 

<p align="center"><img src=./lesson_paral_mbt/MPI_mv.png></p>

<!--
<hr>
<h3><p><b><a name="references">References</a>
</b></h3>

<ol>
<li><a name="onida_rmp">G. Onida, L. Reining, and A. Rubio, Rev. Mod. Phys. <b>48</b>, 601 (2002)
and references therein</a> </li>
<li><a name="albrecth">S. Albrecht, Ph.D. thesis, Ecole Polytechnique, Palaiseau (Paris), 1999</a> </li>
<li><a name="haydock2">R. Haydock, Comput. Phys. Comm. <b>20</b>, 11 (1980)</a> </li>
<li><a name="gruning">M. Gruning, A. Marini and X. Gonze, Nano Letters <b>9</b>, 2820 (2009) </li>
<li><a name="baroni1986">S. Baroni and R. Resta, Phys. Rev. B <b>33</b>, 7017 (1986) </li>
<li><a name="benedict">L. X. Benedict and E. L. Shirley, Phys. Rev. B <b>59</b>, 5441 (1999)</a> </li>
<li><a name="adler">S. L. Adler, Phys. Rev. <b>126</b>, 413 (1962)</a> </li>
<li><a name="wiser">N. Wiser, Phys. Rev. <b>129</b>, 72 (1963)</a> </li>
</ol>
-->

<script type="text/javascript" src="list_internal_links.js"> </script>

</body>
</html>