File: node15.html

package info (click to toggle)
espresso 6.7-4
  • links: PTS, VCS
  • area: main
  • in suites: forky, sid
  • size: 311,068 kB
  • sloc: f90: 447,429; ansic: 52,566; sh: 40,631; xml: 37,561; tcl: 20,077; lisp: 5,923; makefile: 4,503; python: 4,379; perl: 1,219; cpp: 761; fortran: 618; java: 568; awk: 128
file content (267 lines) | stat: -rw-r--r-- 9,642 bytes parent folder | download | duplicates (3)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

<!--Converted with LaTeX2HTML 2019.2 (Released June 5, 2019) -->
<HTML lang="EN">
<HEAD>
<TITLE>4.1 Execution time</TITLE>
<META NAME="description" CONTENT="4.1 Execution time">
<META NAME="keywords" CONTENT="user_guide">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
<META NAME="viewport" CONTENT="width=device-width, initial-scale=1.0">
<META NAME="Generator" CONTENT="LaTeX2HTML v2019.2">

<LINK REL="STYLESHEET" HREF="user_guide.css">

<LINK REL="next" HREF="node16.html">
<LINK REL="previous" HREF="node14.html">
<LINK REL="next" HREF="node16.html">
</HEAD>

<BODY >
<!--Navigation Panel-->
<A
 HREF="node16.html">
<IMG WIDTH="37" HEIGHT="24" ALT="next" SRC="next.png"></A> 
<A
 HREF="node14.html">
<IMG WIDTH="26" HEIGHT="24" ALT="up" SRC="up.png"></A> 
<A
 HREF="node14.html">
<IMG WIDTH="63" HEIGHT="24" ALT="previous" SRC="prev.png"></A> 
<A ID="tex2html193"
  HREF="node1.html">
<IMG WIDTH="65" HEIGHT="24" ALT="contents" SRC="contents.png"></A>  
<BR>
<B> Next:</B> <A
 HREF="node16.html">4.2 Memory requirements</A>
<B> Up:</B> <A
 HREF="node14.html">4 Performances</A>
<B> Previous:</B> <A
 HREF="node14.html">4 Performances</A>
 &nbsp; <B>  <A ID="tex2html194"
  HREF="node1.html">Contents</A></B> 
<BR>
<BR>
<!--End of Navigation Panel-->

<H2><A ID="SECTION00051000000000000000">
4.1 Execution time</A>
</H2>

<P>
The following is a rough estimate of the complexity of a plain
scf calculation with <TT>pw.x</TT>, for NCPP. USPP and PAW 
give raise additional terms to be calculated, that may add from a 
few percent 
up to 30-40% to execution time. For phonon calculations, each of the
3<I>N</I><SUB>at</SUB> modes requires a time of the same order of magnitude of
self-consistent calculation in the same system (possibly times a small multiple). 
For <TT>cp.x</TT>, each time step takes something in the order of
<!-- MATH
 $T_h + T_{orth} + T_{sub}$
 -->
<I>T</I><SUB>h</SUB> + <I>T</I><SUB>orth</SUB> + <I>T</I><SUB>sub</SUB> defined below.

<P>
The time required for the self-consistent solution at fixed ionic
positions, <I>T</I><SUB>scf</SUB> , is:
<P><!-- MATH
 \begin{displaymath}
T_{scf} = N_{iter} T_{iter} + T_{init}
\end{displaymath}
 -->
</P>
<DIV ALIGN="CENTER">
<I>T</I><SUB>scf</SUB> = <I>N</I><SUB>iter</SUB><I>T</I><SUB>iter</SUB> + <I>T</I><SUB>init</SUB>
</DIV><P></P>
where <I>N</I><SUB>iter</SUB> = number of self-consistency iterations (<TT>niter</TT>), 
<I>T</I><SUB>iter</SUB> =
time for a single iteration, <I>T</I><SUB>init</SUB> = initialization time
(usually much smaller than the first term).

<P>
The time required for a single self-consistency iteration <I>T</I><SUB>iter</SUB> is:
<P><!-- MATH
 \begin{displaymath}
T_{iter} = N_k T_{diag} +T_{rho} + T_{scf}
\end{displaymath}
 -->
</P>
<DIV ALIGN="CENTER">
<I>T</I><SUB>iter</SUB> = <I>N</I><SUB>k</SUB><I>T</I><SUB>diag</SUB> + <I>T</I><SUB>rho</SUB> + <I>T</I><SUB>scf</SUB>
</DIV><P></P>
where <I>N</I><SUB>k</SUB> = number of k-points, <I>T</I><SUB>diag</SUB> = time per 
Hamiltonian iterative diagonalization, <I>T</I><SUB>rho</SUB> = time for charge density 
calculation, <I>T</I><SUB>scf</SUB> = time for Hartree and XC potential
calculation.

<P>
The time for a Hamiltonian iterative diagonalization <I>T</I><SUB>diag</SUB> is:
<P><!-- MATH
 \begin{displaymath}
T_{diag} = N_h T_h + T_{orth} + T_{sub}
\end{displaymath}
 -->
</P>
<DIV ALIGN="CENTER">
<I>T</I><SUB>diag</SUB> = <I>N</I><SUB>h</SUB><I>T</I><SUB>h</SUB> + <I>T</I><SUB>orth</SUB> + <I>T</I><SUB>sub</SUB>
</DIV><P></P>
where <I>N</I><SUB>h</SUB> = number of <I>Hψ</I> products needed by iterative diagonalization,
<I>T</I><SUB>h</SUB> = time per <I>Hψ</I> product, <I>T</I><SUB>orth</SUB> = CPU time for 
orthonormalization, <I>T</I><SUB>sub</SUB> = CPU time for subspace diagonalization.

<P>
The time <I>T</I><SUB>h</SUB> required for a <I>Hψ</I> product is
<P><!-- MATH
 \begin{displaymath}
T_h = a_1 M N + a_2 M N_1 N_2 N_3 log(N_1 N_2 N_3 ) + a_3 M P N.
\end{displaymath}
 -->
</P>
<DIV ALIGN="CENTER">
<I>T</I><SUB>h</SUB> = <I>a</I><SUB>1</SUB><I>MN</I> + <I>a</I><SUB>2</SUB><I>MN</I><SUB>1</SUB><I>N</I><SUB>2</SUB><I>N</I><SUB>3</SUB><I>log</I>(<I>N</I><SUB>1</SUB><I>N</I><SUB>2</SUB><I>N</I><SUB>3</SUB>) + <I>a</I><SUB>3</SUB><I>MPN</I>.
</DIV><P></P>
The first term comes from the kinetic term and is usually much smaller
than the others. The second and third terms come respectively from local
and nonlocal potential. <!-- MATH
 $a_1, a_2, a_3$
 -->
<I>a</I><SUB>1</SUB>, <I>a</I><SUB>2</SUB>, <I>a</I><SUB>3</SUB> are prefactors (i.e.
small numbers <!-- MATH
 ${\cal O}(1)$
 -->
<IMG STYLE="height: 1.69ex; vertical-align: -0.10ex; " SRC="img3.png"
 ALT="$\cal {O}$">(1)), <I>M</I> = number of valence
bands (<TT>nbnd</TT>), <I>N</I> = number of PW (basis set dimension: <TT>npw</TT>), <!-- MATH
 $N_1, N_2, N_3$
 -->
<I>N</I><SUB>1</SUB>, <I>N</I><SUB>2</SUB>, <I>N</I><SUB>3</SUB> =
dimensions of the FFT grid for wavefunctions (<TT>nr1s</TT>, <TT>nr2s</TT>,
<TT>nr3s</TT>; <!-- MATH
 $N_1 N_2 N_3 \sim 8N$
 -->
<I>N</I><SUB>1</SUB><I>N</I><SUB>2</SUB><I>N</I><SUB>3</SUB>∼8<I>N</I> ), 
P = number of pseudopotential projectors, summed on all atoms, on all values of the
angular momentum <I>l</I>, and <!-- MATH
 $m = 1, . . . , 2l + 1$
 -->
<I>m</I> = 1,..., 2<I>l</I> + 1.

<P>
The time <I>T</I><SUB>orth</SUB> required by orthonormalization is
<P><!-- MATH
 \begin{displaymath}
T_{orth} = b_1 N M_x^2
\end{displaymath}
 -->
</P>
<DIV ALIGN="CENTER">
<I>T</I><SUB>orth</SUB> = <I>b</I><SUB>1</SUB><I>NM</I><SUB>x</SUB><SUP>2</SUP>
</DIV><P></P>
and the time <I>T</I><SUB>sub</SUB> required by subspace diagonalization is
<P><!-- MATH
 \begin{displaymath}
T_{sub} = b_2 M_x^3
\end{displaymath}
 -->
</P>
<DIV ALIGN="CENTER">
<I>T</I><SUB>sub</SUB> = <I>b</I><SUB>2</SUB><I>M</I><SUB>x</SUB><SUP>3</SUP>
</DIV><P></P>
where <I>b</I><SUB>1</SUB> and <I>b</I><SUB>2</SUB> are prefactors, <I>M</I><SUB>x</SUB> = number of trial wavefunctions 
(this will vary between <I>M</I> and 2÷4<I>M</I>, depending on the algorithm).

<P>
The time <I>T</I><SUB>rho</SUB> for the calculation of charge density from wavefunctions is
<P><!-- MATH
 \begin{displaymath}
T_{rho} = c_1 M N_{r1} N_{r2}N_{r3} log(N_{r1} N_{r2} N_{r3}) +
            c_2 M N_{r1} N_{r2} N_{r3} + T_{us}
\end{displaymath}
 -->
</P>
<DIV ALIGN="CENTER">
<I>T</I><SUB>rho</SUB> = <I>c</I><SUB>1</SUB><I>MN</I><SUB>r1</SUB><I>N</I><SUB>r2</SUB><I>N</I><SUB>r3</SUB><I>log</I>(<I>N</I><SUB>r1</SUB><I>N</I><SUB>r2</SUB><I>N</I><SUB>r3</SUB>) + <I>c</I><SUB>2</SUB><I>MN</I><SUB>r1</SUB><I>N</I><SUB>r2</SUB><I>N</I><SUB>r3</SUB> + <I>T</I><SUB>us</SUB>
</DIV><P></P>
where <!-- MATH
 $c_1, c_2, c_3$
 -->
<I>c</I><SUB>1</SUB>, <I>c</I><SUB>2</SUB>, <I>c</I><SUB>3</SUB> are prefactors, <!-- MATH
 $N_{r1}, N_{r2}, N_{r3}$
 -->
<I>N</I><SUB>r1</SUB>, <I>N</I><SUB>r2</SUB>, <I>N</I><SUB>r3</SUB> =
dimensions of the FFT grid for charge density (<TT>nr1</TT>,
<TT>nr2</TT>, <TT>nr3</TT>; <!-- MATH
 $N_{r1} N_{r2} N_{r3} \sim 8N_g$
 -->
<I>N</I><SUB>r1</SUB><I>N</I><SUB>r2</SUB><I>N</I><SUB>r3</SUB>∼8<I>N</I><SUB>g</SUB>,
where <I>N</I><SUB>g</SUB> = number of G-vectors for the charge density,
<TT>ngm</TT>), and 
<I>T</I><SUB>us</SUB> = time required by PAW/USPPs contribution (if any).
Note that for NCPPs the FFT grids for charge and
wavefunctions are the same.

<P>
The time <I>T</I><SUB>scf</SUB> for calculation of potential from charge density is
<P><!-- MATH
 \begin{displaymath}
T_{scf} = d_2 N_{r1} N_{r2} N_{r3} + d_3 N_{r1} N_{r2} N_{r3}
            log(N_{r1} N_{r2} N_{r3} )
\end{displaymath}
 -->
</P>
<DIV ALIGN="CENTER">
<I>T</I><SUB>scf</SUB> = <I>d</I><SUB>2</SUB><I>N</I><SUB>r1</SUB><I>N</I><SUB>r2</SUB><I>N</I><SUB>r3</SUB> + <I>d</I><SUB>3</SUB><I>N</I><SUB>r1</SUB><I>N</I><SUB>r2</SUB><I>N</I><SUB>r3</SUB><I>log</I>(<I>N</I><SUB>r1</SUB><I>N</I><SUB>r2</SUB><I>N</I><SUB>r3</SUB>)
</DIV><P></P>
where <I>d</I><SUB>1</SUB>, <I>d</I><SUB>2</SUB> are prefactors.

<P>
For hybrid DFTs, the dominant term is by far the calculation of the 
nonlocal (<I>V</I><SUB>x</SUB><I>ψ</I>) product, taking as much as
<P><!-- MATH
 \begin{displaymath}
T_{exx} = e N_k N_q M^2 N_1 N_2N_3 log(N_1 N_2 N_3)
\end{displaymath}
 -->
</P>
<DIV ALIGN="CENTER">
<I>T</I><SUB>exx</SUB> = <I>eN</I><SUB>k</SUB><I>N</I><SUB>q</SUB><I>M</I><SUP>2</SUP><I>N</I><SUB>1</SUB><I>N</I><SUB>2</SUB><I>N</I><SUB>3</SUB><I>log</I>(<I>N</I><SUB>1</SUB><I>N</I><SUB>2</SUB><I>N</I><SUB>3</SUB>)
</DIV><P></P>
where <I>N</I><SUB>q</SUB> is the number of points in the <I>k</I> + <I>q</I> grid, determined by
options <TT>nqx1,nqx2,nqx3</TT>, <I>e</I> is a prefactor.

<P>
The above estimates are for serial execution. In parallel execution,
each contribution may scale in a different manner with the number of processors (see below).

<P>
<HR>
<!--Navigation Panel-->
<A
 HREF="node16.html">
<IMG WIDTH="37" HEIGHT="24" ALT="next" SRC="next.png"></A> 
<A
 HREF="node14.html">
<IMG WIDTH="26" HEIGHT="24" ALT="up" SRC="up.png"></A> 
<A
 HREF="node14.html">
<IMG WIDTH="63" HEIGHT="24" ALT="previous" SRC="prev.png"></A> 
<A ID="tex2html193"
  HREF="node1.html">
<IMG WIDTH="65" HEIGHT="24" ALT="contents" SRC="contents.png"></A>  
<BR>
<B> Next:</B> <A
 HREF="node16.html">4.2 Memory requirements</A>
<B> Up:</B> <A
 HREF="node14.html">4 Performances</A>
<B> Previous:</B> <A
 HREF="node14.html">4 Performances</A>
 &nbsp; <B>  <A ID="tex2html194"
  HREF="node1.html">Contents</A></B> 
<!--End of Navigation Panel-->

</BODY>
</HTML>