File: index.html

<!DOCTYPE html>
<html class="writer-html5" lang="en" >
<head>
  <meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
<meta content="Nsight Compute Customization Guide." name="description" />
<meta content="User Guide" name="keywords" />

  <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <title>1. Customization Guide &mdash; NsightCompute 12.4 documentation</title>
      <link rel="stylesheet" href="../_static/pygments.css" type="text/css" />
      <link rel="stylesheet" href="../_static/css/theme.css" type="text/css" />
      <link rel="stylesheet" href="../_static/design-style.b7bb847fb20b106c3d81b95245e65545.min.css" type="text/css" />
      <link rel="stylesheet" href="../_static/omni-style.css" type="text/css" />
      <link rel="stylesheet" href="../_static/api-styles.css" type="text/css" />
    <link rel="shortcut icon" href="../_static/nsight-compute.ico"/>
  <!--[if lt IE 9]>
    <script src="../_static/js/html5shiv.min.js"></script>
  <![endif]-->
  
        <script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
        <script src="../_static/jquery.js"></script>
        <script src="../_static/underscore.js"></script>
        <script src="../_static/doctools.js"></script>
        <script src="../_static/mermaid-init.js"></script>
        <script src="../_static/design-tabs.js"></script>
        <script src="../_static/version.js"></script>
        <script src="../_static/social-media.js"></script>
    <script src="../_static/js/theme.js"></script>
    <link rel="index" title="Index" href="../genindex.html" />
    <link rel="search" title="Search" href="../search.html" />
    <link rel="next" title="2. NvRules API" href="../NvRulesAPI/index.html" />
    <link rel="prev" title="4. Nsight Compute CLI" href="../NsightComputeCli/index.html" />
 


</head>

<body class="wy-body-for-nav"> 
  <div class="wy-grid-for-nav">
    <nav data-toggle="wy-nav-shift" class="wy-nav-side">
      <div class="wy-side-scroll">
        <div class="wy-side-nav-search" >


  <a href="../index.html">
  <img src="../_static/nsight-compute.png" class="logo" alt="Logo"/>
</a>

<div role="search">
  <form id="rtd-search-form" class="wy-form" action="../search.html" method="get">
    <input type="text" name="q" placeholder="Search docs" />
    <input type="hidden" name="check_keywords" value="yes" />
    <input type="hidden" name="area" value="default" />
  </form>
</div>
        </div><div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="Navigation menu">
              <p class="caption" role="heading"><span class="caption-text">Nsight Compute</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../ReleaseNotes/index.html">1. Release Notes</a></li>
<li class="toctree-l1"><a class="reference internal" href="../ProfilingGuide/index.html">2. Kernel Profiling Guide</a></li>
<li class="toctree-l1"><a class="reference internal" href="../NsightCompute/index.html">3. Nsight Compute</a></li>
<li class="toctree-l1"><a class="reference internal" href="../NsightComputeCli/index.html">4. Nsight Compute CLI</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Developer Interfaces</span></p>
<ul class="current">
<li class="toctree-l1 current"><a class="current reference internal" href="#">1. Customization Guide</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#introduction">1.1. Introduction</a></li>
<li class="toctree-l2"><a class="reference internal" href="#metric-sections">1.2. Metric Sections</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#section-files">1.2.1. Section Files</a></li>
<li class="toctree-l3"><a class="reference internal" href="#section-definition">1.2.2. Section Definition</a></li>
<li class="toctree-l3"><a class="reference internal" href="#metric-options-and-filters">1.2.3. Metric Options and Filters</a></li>
<li class="toctree-l3"><a class="reference internal" href="#counter-domains">1.2.4. Counter Domains</a></li>
<li class="toctree-l3"><a class="reference internal" href="#missing-sections">1.2.5. Missing Sections</a></li>
<li class="toctree-l3"><a class="reference internal" href="#derived-metrics">1.2.6. Derived Metrics</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#rule-system">1.3. Rule System</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#writing-rules">1.3.1. Writing Rules</a></li>
<li class="toctree-l3"><a class="reference internal" href="#integration">1.3.2. Integration</a></li>
<li class="toctree-l3"><a class="reference internal" href="#rule-system-architecture">1.3.3. Rule System Architecture</a></li>
<li class="toctree-l3"><a class="reference internal" href="#nvrules-api">1.3.4. NvRules API</a></li>
<li class="toctree-l3"><a class="reference internal" href="#rule-file-api">1.3.5. Rule File API</a></li>
<li class="toctree-l3"><a class="reference internal" href="#rule-examples">1.3.6. Rule Examples</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#python-report-interface">1.4. Python Report Interface</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#basic-usage">1.4.1. Basic Usage</a></li>
<li class="toctree-l3"><a class="reference internal" href="#high-level-interface">1.4.2. High-Level Interface</a></li>
<li class="toctree-l3"><a class="reference internal" href="#metric-attributes">1.4.3. Metric attributes</a></li>
<li class="toctree-l3"><a class="reference internal" href="#nvtx-support">1.4.4. NVTX Support</a></li>
<li class="toctree-l3"><a class="reference internal" href="#sample-script">1.4.5. Sample Script</a></li>
</ul>
</li>
<li class="toctree-l2"><a class="reference internal" href="#source-counters">1.5. Source Counters</a></li>
<li class="toctree-l2"><a class="reference internal" href="#report-file-format">1.6. Report File Format</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#version-7-format">1.6.1. Version 7 Format</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="../NvRulesAPI/index.html">2. NvRules API</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Training</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../Training/index.html">Training</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Release Information</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../Archives/index.html">Archives</a></li>
</ul>
<p class="caption" role="heading"><span class="caption-text">Copyright and Licenses</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="../CopyrightAndLicenses/index.html">Copyright and Licenses</a></li>
</ul>

        </div>
      </div>
    </nav>

    <section data-toggle="wy-nav-shift" class="wy-nav-content-wrap"><nav class="wy-nav-top" aria-label="Mobile navigation menu" >
          <i data-toggle="wy-nav-top" class="fa fa-bars"></i>
          <a href="../index.html">NsightCompute</a>
      </nav>

      <div class="wy-nav-content">
        <div class="rst-content">
          <div role="navigation" aria-label="Page navigation">
  <ul class="wy-breadcrumbs">


<li><a href="../index.html" class="icon icon-home"></a> &raquo;</li>
<li><span class="section-number">1. </span>Customization Guide</li>

      <li class="wy-breadcrumbs-aside">
      </li>
<li class="wy-breadcrumbs-aside">


  <span>v2024.1.1 |</span>



  <a href="https://developer.nvidia.com/nsight-compute-history" class="reference external">Archive</a>


  <span>&nbsp;</span>
</li>

  </ul>
  <hr/>
</div>
          <div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
           <div itemprop="articleBody">
             
  <section id="customization-guide">
<h1><span class="section-number">1. </span>Customization Guide<a class="headerlink" href="#customization-guide" title="Permalink to this headline"></a></h1>
<p>Nsight Compute Customization Guide.</p>
<p>This user manual describes how to customize the NVIDIA Nsight Compute tools or integrate them with custom workflows. It covers writing section files, writing rules for automatic result analysis, and scripting access to report files.</p>
<section id="introduction">
<h2><span class="section-number">1.1. </span>Introduction<a class="headerlink" href="#introduction" title="Permalink to this headline"></a></h2>
<p>NVIDIA Nsight Compute is designed as a profiling tool that expert users can easily extend and customize. While the tool provides useful defaults, this extensibility allows you to adapt the reports to a specific use case or to design new ways of investigating the collected data. Everything described below is data-driven and does not require the tools to be recompiled.</p>
<p>When working with section files or rule files, it is recommended to open the <em>Metric Selection</em> tool window from the <em>Profile</em> menu item. This tool window lists all sections and rules that were loaded. Rules are grouped as children of their associated section or grouped in the <em>[Independent Rules]</em> entry. For files that failed to load, the table shows the error message. Use the <em>Reload</em> button to reload rule files from disk.</p>
</section>
<section id="metric-sections">
<span id="sections"></span><h2><span class="section-number">1.2. </span>Metric Sections<a class="headerlink" href="#metric-sections" title="Permalink to this headline"></a></h2>
<p>The <a class="reference external" href="../NsightCompute/index.html#profiler-report-details-page">Details</a> page consists of <a class="reference external" href="../ProfilingGuide/index.html#sets-and-sections">sections</a> that each focus on a specific part of the kernel analysis. Every section is defined by a corresponding <a class="reference external" href="index.html#section-files">section file</a> that specifies the data to be collected as well as the visualization used in the UI or CLI output for this data. To add to or change what is collected, modify a deployed section file.</p>
<section id="section-files">
<h3><span class="section-number">1.2.1. </span>Section Files<a class="headerlink" href="#section-files" title="Permalink to this headline"></a></h3>
<p>The section files delivered with the tool are stored in the <code class="docutils literal notranslate"><span class="pre">sections</span></code> sub-folder of the NVIDIA Nsight Compute install directory. Each section is defined in a separate file with the <code class="docutils literal notranslate"><span class="pre">.section</span></code> file extension. At runtime, the installed stock sections (and rules) are deployed to a user-writable directory. This can be disabled with an <a class="reference external" href="../NsightComputeCli/index.html#environment-variables">environment variable</a>. Section files from the deployment directory are loaded automatically at the time the UI connects to a target application or the command line profiler is launched. This way, any changes to section files become immediately available in the next profile run.</p>
<p>A section file is a text representation of a <em>Google Protocol Buffer</em> message. The full definition of all available fields of a section message is given in <a class="reference external" href="index.html#section-definition">Section Definition</a>. In short, each section consists of a unique <em>Identifier</em> (no spaces allowed), a <em>Display Name</em>, an optional <em>Order</em> value (for sorting the sections in the <a class="reference external" href="../NsightCompute/index.html#profiler-report-details-page">Details page</a>), an optional <em>Description</em> providing guidance to the user, an optional header table, an optional list of metrics to be collected but not displayed, optional bodies with additional UI elements, and other elements. See <code class="docutils literal notranslate"><span class="pre">ProfilerSection.proto</span></code> for the exact list of available elements. A small example of a very simple section is:</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>Identifier: &quot;SampleSection&quot;
DisplayName: &quot;Sample Section&quot;
Description: &quot;This sample section shows information on active warps and cycles.&quot;
Header {
  Metrics {
    Label: &quot;Active Warps&quot;
    Name: &quot;smsp__active_warps_avg&quot;
  }
  Metrics {
    Label: &quot;Active Cycles&quot;
    Name: &quot;smsp__active_cycles_avg&quot;
  }
}
</pre></div>
</div>
<p>On data collection, this section will cause the two PerfWorks metrics <code class="docutils literal notranslate"><span class="pre">smsp__active_warps_avg</span></code> and <code class="docutils literal notranslate"><span class="pre">smsp__active_cycles_avg</span></code> to be collected.</p>
<figure class="align-default" id="id2">
<img alt="../_images/section-files.png" class="image" src="../_images/section-files.png" />
<figcaption>
<p><span class="caption-text">The section as shown on the Details page</span><a class="headerlink" href="#id2" title="Permalink to this image"></a></p>
</figcaption>
</figure>
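<p>To check which metrics a section file requests, its text can be scanned for <code class="docutils literal notranslate"><span class="pre">Name</span></code> fields. The following Python sketch is an illustration only: the helper <code class="docutils literal notranslate"><span class="pre">metric_names</span></code> and the constant <code class="docutils literal notranslate"><span class="pre">SECTION_TEXT</span></code> are hypothetical names and not part of NVIDIA Nsight Compute. The scan is a plain regular-expression match rather than a Protocol Buffer parser, so it would also report <code class="docutils literal notranslate"><span class="pre">regex:</span></code> patterns and any other <code class="docutils literal notranslate"><span class="pre">Name</span></code> field verbatim.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>import re

# Text of the simple sample section shown above.
SECTION_TEXT = '''
Identifier: "SampleSection"
DisplayName: "Sample Section"
Header {
  Metrics {
    Label: "Active Warps"
    Name: "smsp__active_warps_avg"
  }
  Metrics {
    Label: "Active Cycles"
    Name: "smsp__active_cycles_avg"
  }
}
'''


def metric_names(section_text):
    """Return the values of all Name fields, in file order, without duplicates."""
    names = re.findall(r'^\s*Name:\s*"([^"]+)"', section_text, flags=re.MULTILINE)
    # dict.fromkeys keeps the first occurrence of each name and preserves order.
    return list(dict.fromkeys(names))


print(metric_names(SECTION_TEXT))
</pre></div>
</div>
<p>For the sample section, the sketch lists the two metrics named in its header. The <code class="docutils literal notranslate"><span class="pre">DisplayName</span></code> field is not picked up, because the pattern only matches lines that start with <code class="docutils literal notranslate"><span class="pre">Name:</span></code> after optional indentation.</p>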
<p>By default, metrics specified in section files that are not available only generate a warning during data collection and then show up as “N/A” in the UI or CLI. This is in contrast to metrics requested via <code class="docutils literal notranslate"><span class="pre">--metrics</span></code>, which cause an error when not available. How to mark metrics as required for data collection is described in <a class="reference external" href="index.html#metric-options-and-filters">Metric Options and Filters</a>.</p>
<p>More advanced elements can be used in the body of a section. See the <code class="docutils literal notranslate"><span class="pre">ProfilerSection.proto</span></code> file for the available elements. The example further below shows how to use some of them. Regexes are allowed only in tables and charts in the section <em>Body</em> and use the format <code class="docutils literal notranslate"><span class="pre">regex:</span></code> followed by the actual regex to match <em>PerfWorks</em> metric names.</p>
<p>The supported list of metrics that can be used in sections can be queried using the <a class="reference external" href="../NsightComputeCli/index.html#command-line-options-profile">command line interface</a> with the <code class="docutils literal notranslate"><span class="pre">--query-metrics</span></code> option. Each of these metrics can be used in any section and is collected automatically if it appears in any enabled section. Note that even if a metric is used in multiple sections, it is only collected once. Look at the shipped sections to see how they are implemented.</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>Identifier: &quot;SampleSection&quot;
DisplayName: &quot;Sample Section&quot;
Description: &quot;This sample section shows various metrics.&quot;
Header {
  Metrics {
    Label: &quot;Active Warps&quot;
    Name: &quot;smsp__active_warps_avg&quot;
  }
  Metrics {
    Label: &quot;Active Cycles&quot;
    Name: &quot;smsp__active_cycles_avg&quot;
  }
}
Body {
  Items {
    Table {
      Label: &quot;Example Table&quot;
      Rows: 2
      Columns: 1
      Metrics {
        Label: &quot;Avg. Issued Instructions Per Scheduler&quot;
        Name: &quot;smsp__inst_issued_avg&quot;
      }
      Metrics {
        Label: &quot;Avg. Executed Instructions Per Scheduler&quot;
        Name: &quot;smsp__inst_executed_avg&quot;
      }
    }
  }
  Items {
    Table {
      Label: &quot;Metrics Table&quot;
      Columns: 2
      Order: ColumnMajor
      Metrics {
        Name: &quot;regex:.*__elapsed_cycles_sum&quot;
      }
    }
  }
  Items {
    BarChart {
      Label: &quot;Metrics Chart&quot;
      CategoryAxis {
        Label: &quot;Units&quot;
      }
      ValueAxis {
        Label: &quot;Cycles&quot;
      }
      Metrics {
        Name: &quot;regex:.*__elapsed_cycles_sum&quot;
      }
    }
  }
}
</pre></div>
</div>
<img alt="../_images/section-files-2.png" class="align-center" src="../_images/section-files-2.png" />
<p>In the UI, the output of this section looks similar to the screenshot above.</p>
</section>
<section id="section-definition">
<h3><span class="section-number">1.2.2. </span>Section Definition<a class="headerlink" href="#section-definition" title="Permalink to this headline"></a></h3>
<p>Protocol buffer definitions are in the NVIDIA Nsight Compute installation directory under <code class="docutils literal notranslate"><span class="pre">extras/FileFormat</span></code>. To understand section files, start with the definitions and documentation in <code class="docutils literal notranslate"><span class="pre">ProfilerSection.proto</span></code>.</p>
<p>To see the list of available <em>PerfWorks</em> metrics for any device or chip, use the <code class="docutils literal notranslate"><span class="pre">--query-metrics</span></code> option of the <a class="reference external" href="../NsightComputeCli/index.html#command-line-options-profile">command line</a>.</p>
</section>
<section id="metric-options-and-filters">
<h3><span class="section-number">1.2.3. </span>Metric Options and Filters<a class="headerlink" href="#metric-options-and-filters" title="Permalink to this headline"></a></h3>
<p>Sections allow the user to specify alternative <em>options</em> for metrics that have a different metric name on different GPU architectures. Metric options use a min-arch/max-arch range <em>filter</em>, replacing the base metric with the first metric option for which the current GPU architecture matches the filter. While not strictly enforced, options for a base metric are expected to share the same meaning and, consequently, the same unit and other properties as the base metric.</p>
<p>In addition to its options, the base metric can be filtered by the same criteria. This is useful for metrics that are only available for certain architectures or in limited collection scopes. See <code class="docutils literal notranslate"><span class="pre">ProfilerMetricOptions.proto</span></code> for which filter options are available.</p>
<p>In the example below, the metric <code class="docutils literal notranslate"><span class="pre">dram__cycles_elapsed.avg.per_second</span></code> is collected on SM 7.0 and on SM 7.5-8.6, but not on any architecture in between. It uses the same metric name on all of these architectures.</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>Metrics {
    Label: &quot;DRAM Frequency&quot;
    Name: &quot;dram__cycles_elapsed.avg.per_second&quot;
    Filter {
      MaxArch: CC_70
    }
    Options {
      Name: &quot;dram__cycles_elapsed.avg.per_second&quot;
      Filter {
        MinArch: CC_75
        MaxArch: CC_86
      }
    }
}
</pre></div>
</div>
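<p>The selection described above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the tool's implementation: the names <code class="docutils literal notranslate"><span class="pre">arch_matches</span></code>, <code class="docutils literal notranslate"><span class="pre">select_metric</span></code> and <code class="docutils literal notranslate"><span class="pre">DRAM_FREQUENCY</span></code> are hypothetical, architectures are written as plain integers (70 for <code class="docutils literal notranslate"><span class="pre">CC_70</span></code>), filter bounds are treated as inclusive, and options are assumed to be checked in file order before falling back to the base metric.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>def arch_matches(arch, min_arch=None, max_arch=None):
    """Return True if arch lies in the inclusive range; None means unbounded."""
    lower = arch if min_arch is None else min_arch
    upper = arch if max_arch is None else max_arch
    return arch in range(lower, upper + 1)


def select_metric(arch, base):
    """Return the metric name to collect on arch, or None if no filter matches."""
    # Assumption: the first option whose filter matches replaces the base metric.
    for option in base.get("options", []):
        if arch_matches(arch, option.get("min_arch"), option.get("max_arch")):
            return option["name"]
    # Otherwise the base metric is used if its own filter matches.
    if arch_matches(arch, base.get("min_arch"), base.get("max_arch")):
        return base["name"]
    return None


# Model of the "DRAM Frequency" entry from the example above.
DRAM_FREQUENCY = {
    "name": "dram__cycles_elapsed.avg.per_second",
    "max_arch": 70,
    "options": [
        {"name": "dram__cycles_elapsed.avg.per_second", "min_arch": 75, "max_arch": 86},
    ],
}

for arch in (70, 72, 80, 90):
    print(arch, select_metric(arch, DRAM_FREQUENCY))
</pre></div>
</div>
<p>In this model, architectures 70 and 80 resolve to the metric name, while 72 and 90 resolve to <code class="docutils literal notranslate"><span class="pre">None</span></code> because neither the base filter nor the option filter matches.</p>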
<p>In the next example, the metric in the section header is only collected for launch-based collection scopes (i.e. kernel- and application replay for CUDA kernels or CUDA Graph nodes), but not in range-based scopes.</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>Header {
  Metrics {
    Label: &quot;Theoretical Occupancy&quot;
    Name: &quot;sm__maximum_warps_per_active_cycle_pct&quot;
    Filter {
      CollectionFilter {
        CollectionScopes: CollectionScope_Launch
      }
    }
  }
}
</pre></div>
</div>
<p>Similarly, <code class="docutils literal notranslate"><span class="pre">CollectionFilter</span></code>s can be used to set the <code class="docutils literal notranslate"><span class="pre">Importance</span></code> of a metric, which specifies an expectation on its availability during data collection. <code class="docutils literal notranslate"><span class="pre">Required</span></code> metrics, for instance, are expected to be collectable and would generate an error in case they are not available, whereas <code class="docutils literal notranslate"><span class="pre">Optional</span></code> metrics would only generate a warning. Here is a minimal example, illustrating the functionality:</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>Metrics {
  Label: &quot;Compute (SM) Throughput&quot;
  Name: &quot;sm__throughput.avg.pct_of_peak_sustained_elapsed&quot;
  Filter {
    CollectionFilter {
      Importance: Required
    }
  }
}
</pre></div>
</div>
<p>Filters can be applied to an entire section instead of or in addition to being set for individual metrics. If both types of filters are specified, they are combined, such that <code class="docutils literal notranslate"><span class="pre">Metrics</span></code>-scope filters take precedence over section-scope filters.</p>
</section>
<section id="counter-domains">
<h3><span class="section-number">1.2.4. </span>Counter Domains<a class="headerlink" href="#counter-domains" title="Permalink to this headline"></a></h3>
<p>PM sampling metrics are composed of one or more raw counter dependencies internally.
Each counter is associated with a <a class="reference external" href="../ProfilingGuide/index.html#pm-sampling">counter domain</a>, which describes how and where in the hardware the counter is collected.
For metrics specified in section files, the automatic domain selection can be overridden when needed to form better PM sampling metric groups.</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>Metrics {
  Label: &quot;Short Scoreboard&quot;
  Name: &quot;pmsampling:smsp__warps_issue_stalled_short_scoreboard.avg&quot;
  Groups: &quot;sampling_ws4&quot;
  CtrDomains: &quot;gpu_sm_c&quot;
}
</pre></div>
</div>
<p>Note that the <code class="docutils literal notranslate"><span class="pre">CtrDomains</span></code> field is currently only supported for the section <code class="docutils literal notranslate"><span class="pre">Metrics</span></code> field, but not for individual <a class="reference external" href="index.html#metric-options-and-filters">Options</a>.</p>
</section>
<section id="missing-sections">
<h3><span class="section-number">1.2.5. </span>Missing Sections<a class="headerlink" href="#missing-sections" title="Permalink to this headline"></a></h3>
<p>If new or updated section files are not used by NVIDIA Nsight Compute, this is most commonly due to one of two reasons:</p>
<p><strong>The file is not found:</strong> Section files must have the <code class="docutils literal notranslate"><span class="pre">.section</span></code> extension. They must also be on the section search path. The default search path is the <code class="docutils literal notranslate"><span class="pre">sections</span></code> directory within the installation directory. In NVIDIA Nsight Compute CLI, the search paths can be overridden using the <code class="docutils literal notranslate"><span class="pre">--section-folder</span></code> and <code class="docutils literal notranslate"><span class="pre">--section-folder-recursive</span></code> options. In NVIDIA Nsight Compute, the search path can be configured in the <em>Profile</em> options.</p>
<p><strong>Syntax errors:</strong> If the file is found but has syntax errors, it will not be available for metric collection. However, error messages are reported for easier debugging. In NVIDIA Nsight Compute CLI, use the <code class="docutils literal notranslate"><span class="pre">--list-sections</span></code> option to get a list of error messages, if any. In NVIDIA Nsight Compute, error messages are reported in the <em>Metric Selection</em> tool window.</p>
</section>
<section id="derived-metrics">
<h3><span class="section-number">1.2.6. </span>Derived Metrics<a class="headerlink" href="#derived-metrics" title="Permalink to this headline"></a></h3>
<p>Derived Metrics allow you to define new metrics composed of constants or existing metrics directly in a section file. The new metrics are computed at collection time and added permanently to the profile result in the report. They can then subsequently be used for any tables, charts, rules, etc.</p>
<p>NVIDIA Nsight Compute currently supports the following syntax for defining derived metrics in section files:</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>MetricDefinitions {
  MetricDefinitions {
    Name: &quot;derived_metric_name&quot;
    Expression: &quot;derived_metric_expr&quot;
  }
  MetricDefinitions {
    ...
  }
  ...
}
</pre></div>
</div>
<p>The actual metric expression is defined as follows:</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>derived_metric_expr ::= operand operator operand
operator            ::= + | - | * | /
operand             ::= metric | constant
metric              ::= (an existing metric name)
constant            ::= double | uint64
double              ::= (double-precision number of the form &quot;N.(M)?&quot;, e.g. &quot;5.&quot; or &quot;0.3109&quot;)
uint64              ::= (64-bit unsigned integer number of the form &quot;N&quot;, e.g. &quot;2029&quot;)
</pre></div>
</div>
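<p>As an illustration of this grammar, the following Python sketch splits an expression string into its two operands and the operator. It is not the parser used by NVIDIA Nsight Compute: the names <code class="docutils literal notranslate"><span class="pre">parse_operand</span></code> and <code class="docutils literal notranslate"><span class="pre">parse_expression</span></code> are hypothetical, operands and operator are assumed to be separated by whitespace, and the 64-bit range of integer constants is not checked.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>import re

OPERATORS = ("+", "-", "*", "/")
DOUBLE_RE = re.compile(r"\d+\.\d*")  # "N.(M)?", e.g. "5." or "0.3109"
UINT64_RE = re.compile(r"\d+")       # "N", e.g. "2029"


def parse_operand(token):
    """Classify a token as a double constant, a uint64 constant, or a metric name."""
    if DOUBLE_RE.fullmatch(token):
        return ("double", float(token))
    if UINT64_RE.fullmatch(token):
        return ("uint64", int(token))
    return ("metric", token)


def parse_expression(expr):
    """Split 'operand operator operand' into (lhs, operator, rhs)."""
    tokens = expr.split()
    if len(tokens) != 3 or tokens[1] not in OPERATORS:
        raise ValueError("expected 'operand operator operand', got: " + expr)
    lhs, op, rhs = tokens
    return parse_operand(lhs), op, parse_operand(rhs)


print(parse_expression("thread_inst_executed_true / inst_executed"))
print(parse_expression("0.3109 * 2029"))
</pre></div>
</div>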
<p>Operators are defined as follows:</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>For op in (+ | - | *): For each element of the metric the operator is applied to, the result is the left-hand side value op-combined with the right-hand side value.
For op in (/): As above. In addition, if the right-hand side operand is of integer type and is 0, the result is the left-hand side value.
</pre></div>
</div>
<p>Since metrics can contain regular values and/or <a class="reference external" href="../ProfilingGuide/index.html#metrics-structure">instanced values</a>, elements are combined as below. Constants are treated as metrics with only a regular value.</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>1. Regular values are operator-combined.
a + b

2. If both metrics have no correlation ids, the first N values are operator-combined, where N is the minimum of the number of elements in both metrics.
a1 + b1
a2 + b2
a3
a4

3. Else if both metrics have correlation ids, the sets of correlation ids from both metrics are joined and then operator-combined as applicable.
a1 + b1
a2
b3
a4 + b4
b5

4. Else if only the left-hand side metric has correlation ids, the right-hand side regular metric value is operator-combined with every element of the left-hand side metric.
a1 + b
a2 + b
a3 + b

5. Else if only the right-hand side metric has correlation ids, the right-hand side element values are operator-combined with the regular metric value of the left-hand side metric.
a + b1 + b2 + b3
</pre></div>
</div>
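<p>The five cases above can be sketched in Python as follows. This is a simplified model for illustration, not the tool's implementation: the function <code class="docutils literal notranslate"><span class="pre">combine</span></code> is a hypothetical name, a regular value is modelled as a number, instanced values without correlation ids as a list, and instanced values with correlation ids as a dictionary keyed by id. Value kinds are not modelled, and dropping extra right-hand side values in case 2 is an assumption, since the illustration only shows extra left-hand side values.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span>import operator


def divide(lhs, rhs):
    # Documented special case: an integer right-hand side of 0 yields the left-hand side.
    if isinstance(rhs, int) and rhs == 0:
        return lhs
    return lhs / rhs


OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": divide}


def combine(lhs, rhs, symbol):
    """Combine two operands following the five cases listed above."""
    op = OPERATORS[symbol]
    lhs_ids, rhs_ids = isinstance(lhs, dict), isinstance(rhs, dict)
    lhs_list, rhs_list = isinstance(lhs, list), isinstance(rhs, list)
    if lhs_list and rhs_list:
        # Case 2: combine the first N values; extra left-hand values pass through.
        shared = min(len(lhs), len(rhs))
        return [op(a, b) for a, b in zip(lhs, rhs)] + lhs[shared:]
    if lhs_list or rhs_list:
        raise ValueError("combination not covered by this sketch")
    if lhs_ids and rhs_ids:
        # Case 3: join the correlation ids; ids present on one side keep their value.
        joined = {}
        for cid in sorted(set(lhs) | set(rhs)):
            if cid in lhs and cid in rhs:
                joined[cid] = op(lhs[cid], rhs[cid])
            else:
                joined[cid] = lhs[cid] if cid in lhs else rhs[cid]
        return joined
    if lhs_ids:
        # Case 4: the regular right-hand value is applied to every left-hand element.
        return {cid: op(value, rhs) for cid, value in lhs.items()}
    if rhs_ids:
        # Case 5: every right-hand element is folded into the regular left-hand value.
        result = lhs
        for value in rhs.values():
            result = op(result, value)
        return result
    # Case 1: two regular values.
    return op(lhs, rhs)


print(combine({1: 1, 2: 2, 4: 4}, {1: 10, 3: 30, 4: 40, 5: 50}, "+"))
</pre></div>
</div>
<p>The call at the end mirrors case 3: ids 1 and 4 occur on both sides and are added, while ids 2, 3 and 5 occur on one side only and keep their value.</p>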
<p>In all operations, the value kind of the left-hand side operand is used. If the right-hand side operand has a different value kind, it is converted. If the left-hand side operand is a string-kind, it is returned unchanged.</p>
<p>Examples of derived metrics are <code class="docutils literal notranslate"><span class="pre">derived__avg_thread_executed</span></code>, which provides a hint on the number of threads executed on average at each instruction, and <code class="docutils literal notranslate"><span class="pre">derived__uncoalesced_l2_transactions_global</span></code>, which indicates the ratio of actual L2 transactions vs. ideal L2 transactions at each applicable instruction. The third definition below, <code class="docutils literal notranslate"><span class="pre">sm__sass_thread_inst_executed_op_ffma_pred_on_x2</span></code>, shows an existing metric multiplied by a constant.</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>MetricDefinitions {
  MetricDefinitions {
    Name: &quot;derived__avg_thread_executed&quot;
    Expression: &quot;thread_inst_executed_true / inst_executed&quot;
  }
  MetricDefinitions {
    Name: &quot;derived__uncoalesced_l2_transactions_global&quot;
    Expression: &quot;memory_l2_transactions_global / memory_ideal_l2_transactions_global&quot;
  }
  MetricDefinitions {
    Name: &quot;sm__sass_thread_inst_executed_op_ffma_pred_on_x2&quot;
    Expression: &quot;sm__sass_thread_inst_executed_op_ffma_pred_on.sum.peak_sustained * 2&quot;
  }
}
</pre></div>
</div>
</section>
</section>
<section id="rule-system">
<h2><span class="section-number">1.3. </span>Rule System<a class="headerlink" href="#rule-system" title="Permalink to this headline"></a></h2>
<p>NVIDIA Nsight Compute features a new Python-based rule system. It is designed as the successor to the <em>Expert System</em> (un)guided analysis in NVIDIA Visual Profiler, but meant to be more flexible and more easily extensible to different use cases and APIs.</p>
<section id="writing-rules">
<h3><span class="section-number">1.3.1. </span>Writing Rules<a class="headerlink" href="#writing-rules" title="Permalink to this headline"></a></h3>
<p>To create a new rule, you need to create a new text file with the extension <code class="docutils literal notranslate"><span class="pre">.py</span></code> and place it at some location that is detectable by the tool (see Nsight Compute Integration on how to specify the search path for rules). At a minimum, the rule file must implement two functions, <code class="docutils literal notranslate"><span class="pre">get_identifier</span></code> and <code class="docutils literal notranslate"><span class="pre">apply</span></code>. See Rule File API for a description of all functions supported in rule files. See NvRules for details on the interface available in the rule’s <code class="docutils literal notranslate"><span class="pre">apply</span></code> function.</p>
</section>
<section id="integration">
<h3><span class="section-number">1.3.2. </span>Integration<a class="headerlink" href="#integration" title="Permalink to this headline"></a></h3>
<p>The rule system is integrated into NVIDIA Nsight Compute as part of the profile report view. When you profile a kernel, available rules will be shown in the report’s <em>Details</em> page. You can either select to apply all available rules at once by clicking <em>Apply Rules</em> at the top of the page, or apply rules individually. Once applied, the rule results will be added to the current report. By default, all rules are applied automatically.</p>
<img alt="../_images/integration-1.png" class="image" src="../_images/integration-1.png" />
<p>Section with a single Bottleneck rule available.</p>
<img alt="../_images/integration-2.png" class="align-center" src="../_images/integration-2.png" />
<p>The same section with the Bottleneck rule applied. It added a single message to the report.</p>
<img alt="../_images/integration-3.png" class="align-center" src="../_images/integration-3.png" />
<p>The section Rule has two associated rules, Basic Template Rule and Advanced Template Rule. The latter is not yet applied. Rules can add various UI elements, including warning and error messages as well as charts and tables.</p>
<img alt="../_images/integration-4.png" class="align-center" src="../_images/integration-4.png" />
<p>Some rules are applied independently from sections. They are shown under Independent Rules.</p>
</section>
<section id="rule-system-architecture">
<h3><span class="section-number">1.3.3. </span>Rule System Architecture<a class="headerlink" href="#rule-system-architecture" title="Permalink to this headline"></a></h3>
<p>The rule system consists of the Python interpreter, the <em>NvRules C++ interface</em>, the <em>NvRules Python interface</em> (NvRules.py) and a set of rule files. Each rule file is valid Python code that imports the NvRules.py module, adheres to certain standards defined by the <a class="reference external" href="index.html#rule-file-api">Rule File API</a> and is called from the tool.</p>
<p>When applying a rule, a handle to the rule <em>Context</em> is provided to its apply function. This context captures most of the functionality that is available to rules as part of the <a class="reference external" href="index.html#nvrules-api">NvRules API</a>. In addition, some functionality is provided directly by the NvRules module, e.g. for global error reporting. Finally, since rules are valid Python code, they can use regular libraries and language functionality that ship with Python as well.</p>
<p>From the rule <em>Context</em>, multiple further objects can be accessed, e.g. the <em>Frontend</em>, <em>Ranges</em> and <em>Actions</em>. It should be noted that those are only interfaces, i.e. the actual implementation can vary between the tools that implement this functionality.</p>
<p>Naming of these interfaces is chosen to be as API-independent as possible, i.e. not to imply CUDA-specific semantics. However, since many compute and graphics APIs map to similar concepts, the names can easily be mapped to CUDA terminology, too. A <em>Range</em> refers to a CUDA stream and an <em>Action</em> refers to a single CUDA kernel instance. Each action references several <em>Metrics</em> that have been collected during profiling (e.g. <code class="docutils literal notranslate"><span class="pre">instructions</span> <span class="pre">executed</span></code>) or are statically available (e.g. the launch configuration). <em>Metrics</em> are accessed via their names from the <em>Action</em>.</p>
<p>Each CUDA stream can contain any number of kernel (or other device activity) instances and so each <em>Range</em> can reference one or more <em>Actions</em>. However, currently only a single <em>Action</em> per <em>Range</em> will be available, as only a single CUDA kernel can be profiled at once.</p>
<p>The <em>Frontend</em> provides an interface to manipulate the tool UI by adding messages, graphical elements such as line and bar charts or tables, as well as speedup estimations, focus metrics and source markers. The most common use case is for a rule to show at least one message, stating the result to the user, as illustrated in <code class="docutils literal notranslate"><span class="pre">extras/RuleTemplates/BasicRuleTemplate.py</span></code>. This could be as simple as “No issues have been detected,” or contain direct hints as to how the user could improve the code, e.g. “Memory is more heavily utilized than Compute. Consider whether it is possible for the kernel to do more compute work.” For more advanced use cases, such as adding speedup estimates, key performance indicators (a.k.a. focus metrics) or source markers to annotate individual lines of code to your rule, see the templates in <code class="docutils literal notranslate"><span class="pre">extras/RuleTemplates</span></code>.</p>
</section>
<section id="nvrules-api">
<h3><span class="section-number">1.3.4. </span>NvRules API<a class="headerlink" href="#nvrules-api" title="Permalink to this headline"></a></h3>
<p>The <em>NvRules API</em> is defined as a C/C++ style interface, which is converted to the NvRules.py Python module to be consumable by the rules. As such, C++ class interfaces are directly converted to Python classes and functions. See the <a class="reference external" href="../NvRulesAPI/index.html#abstract">NvRules API</a> documentation for the classes and functions available in this interface.</p>
</section>
<section id="rule-file-api">
<h3><span class="section-number">1.3.5. </span>Rule File API<a class="headerlink" href="#rule-file-api" title="Permalink to this headline"></a></h3>
<p>The <em>Rule File API</em> is the implicit contract between the rule Python file and the tool. It defines which functions (syntactically and semantically) the Python file must provide to properly work as a rule.</p>
<p><strong>Mandatory Functions</strong></p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">get_identifier()</span></code>: Return the unique rule identifier string.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">apply(handle)</span></code>: Apply this rule to the rule context provided by handle. Use <code class="docutils literal notranslate"><span class="pre">NvRules.get_context(handle)</span></code> to obtain the <em>Context</em> interface from handle.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">get_name()</span></code>: Return the user-consumable display name of this rule.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">get_description()</span></code>: Return the user-consumable description of this rule.</p></li>
</ul>
<p><strong>Optional Functions</strong></p>
<ul>
<li><p><code class="docutils literal notranslate"><span class="pre">get_section_identifier()</span></code>: Return the unique section identifier that maps this rule to a section. Section-mapped rules will only be available if the corresponding section was collected. They implicitly assume that the metrics requested by the section are collected when the rule is applied.</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">evaluate(handle)</span></code>:</p>
<p>Declare required metrics and rules that are necessary for this rule to be applied. Use <code class="docutils literal notranslate"><span class="pre">NvRules.require_metrics(handle,</span> <span class="pre">[...])</span></code> to declare the list of metrics that must be collected prior to applying this rule.</p>
<p>Use e.g. <code class="docutils literal notranslate"><span class="pre">NvRules.require_rules(handle,</span> <span class="pre">[...])</span></code> to declare the list of other rules that must be available before applying this rule. Those are the only rules that can be safely proposed by the <em>Controller</em> interface.</p>
</li>
</ul>
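<p>The functions listed above could be laid out in a rule file roughly as follows. This is a minimal sketch, not one of the shipped templates: the identifier, name and description strings are hypothetical, and the <code class="docutils literal notranslate"><span class="pre">NvRules</span></code> module is imported inside the functions (an assumption made here for convenience) so that the file can also be loaded outside the tool, where that module is not available. The optional <code class="docutils literal notranslate"><span class="pre">get_section_identifier()</span></code> is omitted.</p>

```python
# Hypothetical identifier and texts, chosen only for this sketch.
RULE_ID = "ExampleComputeCapability"
REQUIRED_METRICS = ["device__attribute_compute_capability_major"]


def get_identifier():
    """Return the unique rule identifier string."""
    return RULE_ID


def get_name():
    """Return the user-consumable display name of this rule."""
    return "Example Compute Capability"


def get_description():
    """Return the user-consumable description of this rule."""
    return "Reports the major compute capability of the profiled device."


def evaluate(handle):
    """Declare the metrics that must be collected before apply() runs."""
    import NvRules  # provided by the tool at rule execution time
    NvRules.require_metrics(handle, REQUIRED_METRICS)


def apply(handle):
    """Read the required metric and post a message to the frontend."""
    import NvRules  # provided by the tool at rule execution time
    ctx = NvRules.get_context(handle)
    action = ctx.range_by_idx(0).action_by_idx(0)
    cc_major = action.metric_by_name(REQUIRED_METRICS[0]).as_uint64()
    ctx.frontend().message("Major compute capability: {}".format(cc_major))
```

<p>Only <code class="docutils literal notranslate"><span class="pre">evaluate</span></code> and <code class="docutils literal notranslate"><span class="pre">apply</span></code> need the tool to be present; the remaining functions return plain strings and can be checked in any Python interpreter.</p>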
</section>
<section id="rule-examples">
<h3><span class="section-number">1.3.6. </span>Rule Examples<a class="headerlink" href="#rule-examples" title="Permalink to this headline"></a></h3>
<p>The following example rule determines on which major GPU architecture a kernel was running.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">NvRules</span>

<span class="k">def</span> <span class="nf">get_identifier</span><span class="p">():</span>
  <span class="k">return</span> <span class="s2">&quot;GpuArch&quot;</span>

<span class="k">def</span> <span class="nf">apply</span><span class="p">(</span><span class="n">handle</span><span class="p">):</span>
  <span class="n">ctx</span> <span class="o">=</span> <span class="n">NvRules</span><span class="o">.</span><span class="n">get_context</span><span class="p">(</span><span class="n">handle</span><span class="p">)</span>
  <span class="n">action</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">range_by_idx</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">action_by_idx</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
  <span class="n">ccMajor</span> <span class="o">=</span> <span class="n">action</span><span class="o">.</span><span class="n">metric_by_name</span><span class="p">(</span><span class="s2">&quot;device__attribute_compute_capability_major&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">as_uint64</span><span class="p">()</span>
  <span class="n">ctx</span><span class="o">.</span><span class="n">frontend</span><span class="p">()</span><span class="o">.</span><span class="n">message</span><span class="p">(</span><span class="s2">&quot;Running on major compute capability &quot;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">ccMajor</span><span class="p">))</span>
</pre></div>
</div>
</section>
</section>
<section id="python-report-interface">
<h2><span class="section-number">1.4. </span>Python Report Interface<a class="headerlink" href="#python-report-interface" title="Permalink to this headline"></a></h2>
<p>NVIDIA Nsight Compute features a Python-based interface to interact with exported report files.</p>
<p>The module is called <code class="docutils literal notranslate"><span class="pre">ncu_report</span></code> and works on any Python version from 3.4 onward <a class="footnote-reference brackets" href="#fn1" id="id1">1</a>. It can be found in the <code class="docutils literal notranslate"><span class="pre">extras/python</span></code> directory of your NVIDIA Nsight Compute package.</p>
<p>In order to use the Python module, you need a report file generated by NVIDIA Nsight Compute. You can obtain such a file by saving it from the graphical interface or by using the <code class="docutils literal notranslate"><span class="pre">--export</span></code> flag of the command line tool.</p>
<p>The types and functions in the <code class="docutils literal notranslate"><span class="pre">ncu_report</span></code> module are a subset of the ones available in the NvRules API. The documentation in this section serves as a tutorial. For a more formal description of the exposed API, please refer to the <a class="reference external" href="../NvRulesAPI/index.html#abstract">NvRules API</a> documentation.</p>
<dl class="footnote brackets">
<dt class="label" id="fn1"><span class="brackets"><a class="fn-backref" href="#id1">1</a></span></dt>
<dd><p>On Linux machines you will also need a GNU-compatible libc and <code class="docutils literal notranslate"><span class="pre">libgcc_s.so</span></code>.</p>
</dd>
</dl>
<section id="basic-usage">
<h3><span class="section-number">1.4.1. </span>Basic Usage<a class="headerlink" href="#basic-usage" title="Permalink to this headline"></a></h3>
<p>In order to be able to import <code class="docutils literal notranslate"><span class="pre">ncu_report</span></code> you will either have to navigate to the <code class="docutils literal notranslate"><span class="pre">extras/python</span></code> directory, or add its absolute path to the <code class="docutils literal notranslate"><span class="pre">PYTHONPATH</span></code> environment variable. Then, the module can be imported like any Python module:</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>&gt;&gt;&gt; import ncu_report
</pre></div>
</div>
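<p>As an alternative to setting <code class="docutils literal notranslate"><span class="pre">PYTHONPATH</span></code>, a script can extend the module search path itself before importing. The sketch below assumes a hypothetical installation directory; replace it with the location of your own NVIDIA Nsight Compute package.</p>

```python
import os
import sys

# Hypothetical install location; adjust to your own installation.
NCU_PYTHON_DIR = os.path.join(
    os.path.expanduser("~"), "nsight-compute", "extras", "python"
)

# Make the directory visible to the import system.
if NCU_PYTHON_DIR not in sys.path:
    sys.path.insert(0, NCU_PYTHON_DIR)

try:
    import ncu_report
except ImportError:
    # The module is only available when the path above is correct.
    ncu_report = None
    print("ncu_report not found in " + NCU_PYTHON_DIR, file=sys.stderr)
```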
<p><strong>Importing a report</strong></p>
<p>Once the module is imported, you can load a report file by calling the <code class="docutils literal notranslate"><span class="pre">load_report</span></code> function with the path to the file. This function returns an object of type <code class="docutils literal notranslate"><span class="pre">IContext</span></code> which holds all the information concerning that report.</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>&gt;&gt;&gt; my_context = ncu_report.load_report(&quot;my_report.ncu-rep&quot;)
</pre></div>
</div>
<p><strong>Querying ranges</strong></p>
<p>When working with the Python module, kernel profiling results are grouped into <em>ranges</em> which are represented by <code class="docutils literal notranslate"><span class="pre">IRange</span></code> objects. You can inspect the number of <em>ranges</em> contained in the loaded report by calling the <code class="docutils literal notranslate"><span class="pre">num_ranges()</span></code> member function of an <code class="docutils literal notranslate"><span class="pre">IContext</span></code> object and retrieve a <em>range</em> by its index using <code class="docutils literal notranslate"><span class="pre">range_by_idx(index)</span></code>.</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>&gt;&gt;&gt; my_context.num_ranges()
1
&gt;&gt;&gt; my_range = my_context.range_by_idx(0)
</pre></div>
</div>
<p><strong>Querying actions</strong></p>
<p>Inside a <em>range</em>, kernel profiling results are called <em>actions</em>. You can query the number of <em>actions</em> contained in a given <em>range</em> by using the <code class="docutils literal notranslate"><span class="pre">num_actions</span></code> method of an <code class="docutils literal notranslate"><span class="pre">IRange</span></code> object.</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>&gt;&gt;&gt; my_range.num_actions()
2
</pre></div>
</div>
<p>In the same way <em>ranges</em> can be obtained from an <code class="docutils literal notranslate"><span class="pre">IContext</span></code> object by using the <code class="docutils literal notranslate"><span class="pre">range_by_idx(index)</span></code> method, individual <em>actions</em> can be obtained from <code class="docutils literal notranslate"><span class="pre">IRange</span></code> objects by using the <code class="docutils literal notranslate"><span class="pre">action_by_idx(index)</span></code> method. The resulting <em>actions</em> are represented by the <code class="docutils literal notranslate"><span class="pre">IAction</span></code> class.</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>&gt;&gt;&gt; my_action = my_range.action_by_idx(0)
</pre></div>
</div>
<p>As mentioned previously, an <em>action</em> represents a single kernel profiling result. To query the kernel’s name you can use the <code class="docutils literal notranslate"><span class="pre">name()</span></code> member function of the <code class="docutils literal notranslate"><span class="pre">IAction</span></code> class.</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>&gt;&gt;&gt; my_action.name()
&#39;MyKernel&#39;
</pre></div>
</div>
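<p>The range and action queries above can be combined to list every kernel in a report. The helper name <code class="docutils literal notranslate"><span class="pre">kernel_names</span></code> is hypothetical; it only uses the methods introduced in this section and expects an object such as the one returned by <code class="docutils literal notranslate"><span class="pre">load_report</span></code>.</p>

```python
def kernel_names(context):
    """Return the names of all actions in all ranges of a loaded report."""
    names = []
    for range_idx in range(context.num_ranges()):
        current_range = context.range_by_idx(range_idx)
        for action_idx in range(current_range.num_actions()):
            action = current_range.action_by_idx(action_idx)
            names.append(action.name())
    return names
```

<p>A typical call would be <code class="docutils literal notranslate"><span class="pre">kernel_names(my_context)</span></code>, using the context loaded earlier.</p>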
<p><strong>Querying metrics</strong></p>
<p>To get a tuple of all metric names contained within an <em>action</em> you can use the <code class="docutils literal notranslate"><span class="pre">metric_names()</span></code> method. It is meant to be combined with the <code class="docutils literal notranslate"><span class="pre">metric_by_name()</span></code> method which returns an <code class="docutils literal notranslate"><span class="pre">IMetric</span></code> object. However, for the same task you may also use the <code class="docutils literal notranslate"><span class="pre">[]</span></code> operator, as explained in the <a class="reference external" href="index.html#python-report-interface-high-level">High-Level Interface</a> section below.</p>
<p>The metric names displayed here are the same as the ones you can use with the <code class="docutils literal notranslate"><span class="pre">--metrics</span></code> flag of NVIDIA Nsight Compute. Once you have extracted a <em>metric</em> from an <em>action</em>, you can obtain its value by using one of the following three methods:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">as_string()</span></code> to obtain its value as a Python <code class="docutils literal notranslate"><span class="pre">str</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">as_uint64()</span></code> to obtain its value as a Python <code class="docutils literal notranslate"><span class="pre">int</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">as_double()</span></code> to obtain its value as a Python <code class="docutils literal notranslate"><span class="pre">float</span></code></p></li>
</ul>
<p>For example, to print the display name of the GPU on which the kernel was profiled you can query the <code class="docutils literal notranslate"><span class="pre">device__attribute_display_name</span></code> metric.</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>&gt;&gt;&gt; display_name_metric = my_action.metric_by_name(&#39;device__attribute_display_name&#39;)
&gt;&gt;&gt; display_name_metric.as_string()
&#39;NVIDIA GeForce RTX 3060 Ti&#39;
</pre></div>
</div>
<p>Note that accessing a metric with the wrong type can lead to unexpected (conversion) results.</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>&gt;&gt;&gt; display_name_metric.as_double()
0.0
</pre></div>
</div>
<p>Therefore, it is advisable to directly use the <a class="reference external" href="index.html#python-report-interface-high-level">High-Level</a> function <code class="docutils literal notranslate"><span class="pre">value()</span></code>, as explained below.</p>
</section>
<section id="high-level-interface">
<h3><span class="section-number">1.4.2. </span>High-Level Interface<a class="headerlink" href="#high-level-interface" title="Permalink to this headline"></a></h3>
<p>On top of the low-level <a class="reference external" href="../NvRulesAPI/index.html#abstract">NvRules API</a> the Python Report Interface also implements part of the <a class="reference external" href="https://docs.python.org/3/reference/datamodel.html">Python object model</a>. By implementing special methods, the Python Report Interface’s exposed classes can be used with built-in Python mechanisms such as iteration, string formatting and length querying.</p>
<p>This allows you to access <em>metric</em> objects via the <code class="docutils literal notranslate"><span class="pre">self[key]</span></code> instance method of the <code class="docutils literal notranslate"><span class="pre">IAction</span></code> class:</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>&gt;&gt;&gt; display_name_metric = my_action[&quot;device__attribute_display_name&quot;]
</pre></div>
</div>
<p>There is also a convenience method <code class="docutils literal notranslate"><span class="pre">IMetric.value()</span></code> which allows you to query the value of a <em>metric</em> object without knowledge of its type:</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>&gt;&gt;&gt; display_name_metric.value()
&#39;NVIDIA GeForce RTX 3060 Ti&#39;
</pre></div>
</div>
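<p>Combining the <code class="docutils literal notranslate"><span class="pre">[]</span></code> operator with <code class="docutils literal notranslate"><span class="pre">metric_names()</span></code> and <code class="docutils literal notranslate"><span class="pre">value()</span></code> gives a compact way to snapshot all metrics of an action. The helper name <code class="docutils literal notranslate"><span class="pre">metric_values</span></code> is hypothetical and the function is only a sketch built from the calls described in this section.</p>

```python
def metric_values(action):
    """Map every metric name of an action to its value."""
    return {name: action[name].value() for name in action.metric_names()}
```

<p>For example, <code class="docutils literal notranslate"><span class="pre">metric_values(my_action)["device__attribute_display_name"]</span></code> would return the same string as the <code class="docutils literal notranslate"><span class="pre">value()</span></code> call shown above.</p>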
<p>All the available methods of a class, as well as their associated Python docstrings, can be looked up interactively via</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>&gt;&gt;&gt; help(ncu_report.IMetric)
</pre></div>
</div>
<p>or similarly for other classes and methods. In your code, you can access the docstrings via the <code class="docutils literal notranslate"><span class="pre">__doc__</span></code> attribute, i.e. <code class="docutils literal notranslate"><span class="pre">ncu_report.IMetric.value.__doc__</span></code>.</p>
</section>
<section id="metric-attributes">
<h3><span class="section-number">1.4.3. </span>Metric attributes<a class="headerlink" href="#metric-attributes" title="Permalink to this headline"></a></h3>
<p>Apart from the possibility to query the <code class="docutils literal notranslate"><span class="pre">name()</span></code> and <code class="docutils literal notranslate"><span class="pre">value()</span></code> of an <code class="docutils literal notranslate"><span class="pre">IMetric</span></code> object, you can also query the following additional metric attributes:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">metric_type()</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">metric_subtype()</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">rollup_operation()</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">unit()</span></code></p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">description()</span></code></p></li>
</ul>
<p>The first method <code class="docutils literal notranslate"><span class="pre">metric_type()</span></code> returns one out of three <em>enum</em> values (<code class="docutils literal notranslate"><span class="pre">IMetric.MetricType_COUNTER</span></code>, <code class="docutils literal notranslate"><span class="pre">IMetric.MetricType_RATIO</span></code>, <code class="docutils literal notranslate"><span class="pre">IMetric.MetricType_THROUGHPUT</span></code>) if the metric is a hardware metric, or <code class="docutils literal notranslate"><span class="pre">IMetric.MetricType_OTHER</span></code> otherwise (e.g. for launch or device attributes).</p>
<p>The method <code class="docutils literal notranslate"><span class="pre">metric_subtype()</span></code> returns an <em>enum</em> value representing the subtype of a metric (e.g. <code class="docutils literal notranslate"><span class="pre">IMetric.MetricSubtype_PEAK_SUSTAINED</span></code>, <code class="docutils literal notranslate"><span class="pre">IMetric.MetricSubtype_PER_CYCLE_ACTIVE</span></code>). In case a metric does not have a subtype, <code class="docutils literal notranslate"><span class="pre">None</span></code> is returned. All available values (without the necessary <code class="docutils literal notranslate"><span class="pre">IMetric.MetricSubtype_</span></code> prefix) may be found in the <a class="reference external" href="../NvRulesAPI/index.html#abstract">NvRules API</a> documentation, or may be looked up interactively by executing <code class="docutils literal notranslate"><span class="pre">help(ncu_report.IMetric)</span></code>.</p>
<p><code class="docutils literal notranslate"><span class="pre">IMetric.rollup_operation()</span></code> returns the operation which is used to accumulate different values of the same <em>metric</em> and can be one of <code class="docutils literal notranslate"><span class="pre">IMetric.RollupOperation_AVG</span></code>, <code class="docutils literal notranslate"><span class="pre">IMetric.RollupOperation_MAX</span></code>, <code class="docutils literal notranslate"><span class="pre">IMetric.RollupOperation_MIN</span></code> or <code class="docutils literal notranslate"><span class="pre">IMetric.RollupOperation_SUM</span></code> for averaging, maximum, minimum or summation, respectively. If the <em>metric</em> in question does not specify a rollup operation, <code class="docutils literal notranslate"><span class="pre">None</span></code> will be returned.</p>
<p>Lastly, <code class="docutils literal notranslate"><span class="pre">unit()</span></code> and <code class="docutils literal notranslate"><span class="pre">description()</span></code> return a (possibly empty) string of the metric’s <em>unit</em> and a short textual <em>description</em> for hardware metrics, respectively.</p>
<p>The above methods can be combined to filter through all <em>metrics</em> of a report, given certain criteria:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">metric</span> <span class="ow">in</span> <span class="n">metrics</span><span class="p">:</span>
    <span class="k">if</span> <span class="n">metric</span><span class="o">.</span><span class="n">metric_type</span><span class="p">()</span> <span class="o">==</span> <span class="n">IMetric</span><span class="o">.</span><span class="n">MetricType_COUNTER</span> <span class="ow">and</span> \
       <span class="n">metric</span><span class="o">.</span><span class="n">metric_subtype</span><span class="p">()</span> <span class="o">==</span> <span class="n">IMetric</span><span class="o">.</span><span class="n">MetricSubtype_PER_SECOND</span> <span class="ow">and</span> \
       <span class="n">metric</span><span class="o">.</span><span class="n">rollup_operation</span><span class="p">()</span> <span class="o">==</span> <span class="n">IMetric</span><span class="o">.</span><span class="n">RollupOperation_AVG</span><span class="p">:</span>
        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">metric</span><span class="o">.</span><span class="n">name</span><span class="p">()</span><span class="si">}</span><span class="s2">: </span><span class="si">{</span><span class="n">metric</span><span class="o">.</span><span class="n">value</span><span class="p">()</span><span class="si">}</span><span class="s2"> </span><span class="si">{</span><span class="n">metric</span><span class="o">.</span><span class="n">unit</span><span class="p">()</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
</pre></div>
</div>
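<p>The loop above leaves open where <code class="docutils literal notranslate"><span class="pre">metrics</span></code> and <code class="docutils literal notranslate"><span class="pre">IMetric</span></code> come from. One possible arrangement, sketched here with a hypothetical helper name, is to build the metric objects from an action and to pass the <code class="docutils literal notranslate"><span class="pre">ncu_report.IMetric</span></code> class in as a parameter, which keeps the function usable without importing the module at definition time.</p>

```python
def counters_with_rollup(action, imetric, rollup):
    """Select counter metrics of an action that use the given rollup operation.

    imetric is expected to be the ncu_report.IMetric class, rollup one of
    its RollupOperation_* values.
    """
    selected = []
    for name in action.metric_names():
        metric = action[name]
        if (metric.metric_type() == imetric.MetricType_COUNTER
                and metric.rollup_operation() == rollup):
            selected.append((metric.name(), metric.value(), metric.unit()))
    return selected
```

<p>A call could look like <code class="docutils literal notranslate"><span class="pre">counters_with_rollup(my_action,</span> <span class="pre">ncu_report.IMetric,</span> <span class="pre">ncu_report.IMetric.RollupOperation_SUM)</span></code>.</p>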
</section>
<section id="nvtx-support">
<h3><span class="section-number">1.4.4. </span>NVTX Support<a class="headerlink" href="#nvtx-support" title="Permalink to this headline"></a></h3>
<p>The <code class="docutils literal notranslate"><span class="pre">ncu_report</span></code> module supports the NVIDIA Tools Extension (NVTX). This support is provided through the <code class="docutils literal notranslate"><span class="pre">INvtxState</span></code> object, which represents the NVTX state of a profiled kernel.</p>
<p>An <code class="docutils literal notranslate"><span class="pre">INvtxState</span></code> object can be obtained from an action by using its <code class="docutils literal notranslate"><span class="pre">nvtx_state()</span></code> method. It exposes the <code class="docutils literal notranslate"><span class="pre">domains()</span></code> method which returns a tuple of integers representing the domains this kernel has state in. These integers can be used with the <code class="docutils literal notranslate"><span class="pre">domain_by_id(id)</span></code> method to get an <code class="docutils literal notranslate"><span class="pre">INvtxDomainInfo</span></code> object which represents the state of a domain.</p>
<p>The <code class="docutils literal notranslate"><span class="pre">INvtxDomainInfo</span></code> can be used to obtain a tuple of <em>Push-Pop</em>, or <em>Start-End</em> ranges using the <code class="docutils literal notranslate"><span class="pre">push_pop_ranges()</span></code> and <code class="docutils literal notranslate"><span class="pre">start_end_ranges()</span></code> methods.</p>
<p>There is also an <code class="docutils literal notranslate"><span class="pre">actions_by_nvtx</span></code> member function in the <code class="docutils literal notranslate"><span class="pre">IRange</span></code> class which allows you to get a tuple of indices of the actions matching the NVTX state described in its parameters.</p>
<p>The parameters for the <code class="docutils literal notranslate"><span class="pre">actions_by_nvtx</span></code> function are two lists of strings representing the state for which we want to query the actions. The first parameter describes the NVTX states to include while the second one describes the NVTX states to exclude. These strings are in the same format as the ones used with the <code class="docutils literal notranslate"><span class="pre">--nvtx-include</span></code> and <code class="docutils literal notranslate"><span class="pre">--nvtx-exclude</span></code> options.</p>
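<p>The NVTX accessors described in this section can be chained to collect the ranges of every domain an action has state in. The helper name is hypothetical, and the check for a missing state object is a defensive assumption of this sketch rather than documented behavior.</p>

```python
def nvtx_ranges_by_domain(action):
    """Return {domain_id: {"push_pop": ..., "start_end": ...}} for an action."""
    state = action.nvtx_state()
    if state is None:
        # Assumption: treat a missing NVTX state as "no domains".
        return {}
    result = {}
    for domain_id in state.domains():
        domain = state.domain_by_id(domain_id)
        result[domain_id] = {
            "push_pop": tuple(domain.push_pop_ranges()),
            "start_end": tuple(domain.start_end_ranges()),
        }
    return result
```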
</section>
<section id="sample-script">
<h3><span class="section-number">1.4.5. </span>Sample Script<a class="headerlink" href="#sample-script" title="Permalink to this headline"></a></h3>
<p><strong>NVTX Push-Pop range filtering</strong></p>
<p>This is a sample script which loads a report and prints the names of all the profiled kernels which were wrapped inside <code class="docutils literal notranslate"><span class="pre">BottomRange</span></code> and <code class="docutils literal notranslate"><span class="pre">TopRange</span></code> <em>Push-Pop ranges</em> of the default NVTX domain.</p>
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="ch">#!/usr/bin/env python3</span>

<span class="kn">import</span> <span class="nn">sys</span>

<span class="kn">import</span> <span class="nn">ncu_report</span>

<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">2</span><span class="p">:</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;usage: </span><span class="si">{}</span><span class="s2"> report_file&quot;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">0</span><span class="p">]),</span> <span class="n">file</span><span class="o">=</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="p">)</span>
    <span class="n">sys</span><span class="o">.</span><span class="n">exit</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>

<span class="n">report</span> <span class="o">=</span> <span class="n">ncu_report</span><span class="o">.</span><span class="n">load_report</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>

<span class="k">for</span> <span class="n">range_idx</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">report</span><span class="o">.</span><span class="n">num_ranges</span><span class="p">()):</span>
    <span class="n">current_range</span> <span class="o">=</span> <span class="n">report</span><span class="o">.</span><span class="n">range_by_idx</span><span class="p">(</span><span class="n">range_idx</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">action_idx</span> <span class="ow">in</span> <span class="n">current_range</span><span class="o">.</span><span class="n">actions_by_nvtx</span><span class="p">([</span><span class="s2">&quot;BottomRange/*/TopRange&quot;</span><span class="p">],</span> <span class="p">[]):</span>
        <span class="n">action</span> <span class="o">=</span> <span class="n">current_range</span><span class="o">.</span><span class="n">action_by_idx</span><span class="p">(</span><span class="n">action_idx</span><span class="p">)</span>
        <span class="nb">print</span><span class="p">(</span><span class="n">action</span><span class="o">.</span><span class="n">name</span><span class="p">())</span>
</pre></div>
</div>
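<p>The lookup in the sample script can also be wrapped in a small helper that accepts both the include and the exclude list. The following is a minimal sketch, not part of the shipped module: the helper name <code class="docutils literal notranslate"><span class="pre">kernel_names_by_nvtx</span></code> is hypothetical, and it only assumes that the object passed in offers <code class="docutils literal notranslate"><span class="pre">actions_by_nvtx</span></code> and <code class="docutils literal notranslate"><span class="pre">action_by_idx</span></code> as used in the sample script.</p>

```python
def kernel_names_by_nvtx(profile_range, includes, excludes):
    """Return the names of all actions in profile_range that match the filters.

    kernel_names_by_nvtx is a hypothetical helper. profile_range is expected
    to offer actions_by_nvtx() and action_by_idx(), as the range objects in
    the sample script do. includes and excludes are lists of NVTX filter
    strings in the --nvtx-include / --nvtx-exclude format.
    """
    names = []
    for action_idx in profile_range.actions_by_nvtx(includes, excludes):
        names.append(profile_range.action_by_idx(action_idx).name())
    return names
```

<p>With a loaded report, a call such as <code class="docutils literal notranslate"><span class="pre">kernel_names_by_nvtx(report.range_by_idx(0),</span> <span class="pre">[&quot;BottomRange/*/TopRange&quot;],</span> <span class="pre">[])</span></code> should yield the same names the sample script prints for the first range. Passing filter strings in the second list narrows the result further.</p>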
</section>
</section>
<section id="source-counters">
<h2><span class="section-number">1.5. </span>Source Counters<a class="headerlink" href="#source-counters" title="Permalink to this headline"></a></h2>
<p>The <em>Source</em> page provides correlation of various metrics with CUDA-C, PTX and SASS source of the application, depending on availability.</p>
<p>Which <em>Source Counter</em> metrics are collected, and the order in which they are displayed on this page, are controlled using section files, specifically the <em>ProfilerSectionMetrics</em> message type. Each <em>ProfilerSectionMetrics</em> defines one ordered group of metrics and can be assigned an optional <em>Order</em> value. This value defines the ordering among those groups on the <em>Source</em> page. For example, this lets you define a group of memory-related source counters in one section file and a group of instruction-related counters in another.</p>
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>Identifier: &quot;SourceMetrics&quot;
DisplayName: &quot;Custom Source Metrics&quot;
Metrics {
  Order: 2
  Metrics {
    Label: &quot;Instructions Executed&quot;
    Name: &quot;inst_executed&quot;
  }
  Metrics {
    Label: &quot;&quot;
    Name: &quot;collected_but_not_shown&quot;
  }
}
</pre></div>
</div>
<p>If a <em>Source Counter</em> metric is given an empty label attribute in the section file, it will be collected but not shown on the page.</p>
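<p>When several such groups are maintained, the section file text can be generated rather than written by hand. The following is a minimal sketch under that assumption: the function <code class="docutils literal notranslate"><span class="pre">render_source_section</span></code> is hypothetical and emits only the fields shown in the example above.</p>

```python
def render_source_section(identifier, display_name, order, metrics):
    """Render section file text with one ordered group of source counters.

    render_source_section is a hypothetical helper. metrics is a list of
    (label, name) pairs; an empty label means the metric is collected but
    not shown on the Source page. Values are inserted verbatim, so they
    must not contain double quotes.
    """
    lines = [
        'Identifier: "{}"'.format(identifier),
        'DisplayName: "{}"'.format(display_name),
        "Metrics {",
        "  Order: {}".format(order),
    ]
    for label, name in metrics:
        lines += [
            "  Metrics {",
            '    Label: "{}"'.format(label),
            '    Name: "{}"'.format(name),
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"
```

<p>Calling it with the identifier, display name, order and the two metrics from the example above reproduces that section file text.</p>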
<img alt="../_images/source-counters.png" class="align-center" src="../_images/source-counters.png" />
</section>
<section id="report-file-format">
<h2><span class="section-number">1.6. </span>Report File Format<a class="headerlink" href="#report-file-format" title="Permalink to this headline"></a></h2>
<p>This section documents the internals of the profiler report files (referred to as reports below) created by NVIDIA Nsight Compute. <strong>The file format is subject to change in future releases without prior notice.</strong></p>
<section id="version-7-format">
<h3><span class="section-number">1.6.1. </span>Version 7 Format<a class="headerlink" href="#version-7-format" title="Permalink to this headline"></a></h3>
<p>Reports of version 7 are a combination of raw binary data and serialized Google Protocol Buffer version 2 messages (proto). All binary entries are stored as little endian. Protocol buffer definitions are in the NVIDIA Nsight Compute installation directory under <code class="docutils literal notranslate"><span class="pre">extras/FileFormat</span></code>.</p>
<table class="table-no-stripes docutils align-default" id="id3">
<caption><span class="caption-text">Table 1. Top-level report file format</span><a class="headerlink" href="#id3" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 35%" />
<col style="width: 11%" />
<col style="width: 7%" />
<col style="width: 47%" />
</colgroup>
<thead>
<tr class="row-odd"><th class="head"><p>Offset [bytes]</p></th>
<th class="head"><p>Entry</p></th>
<th class="head"><p>Type</p></th>
<th class="head"><p>Value</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>0</p></td>
<td><p>Magic Number</p></td>
<td><p>Binary</p></td>
<td><p>NVR\0</p></td>
</tr>
<tr class="row-odd"><td><p>4</p></td>
<td><p>Integer</p></td>
<td><p>Binary</p></td>
<td><p>sizeof(File Header)</p></td>
</tr>
<tr class="row-even"><td><p>8</p></td>
<td><p>File Header</p></td>
<td><p>Proto</p></td>
<td><p>Report version</p></td>
</tr>
<tr class="row-odd"><td><p>8 + sizeof(File Header)</p></td>
<td><p>Block 0</p></td>
<td><p>Mixed</p></td>
<td><p>CUDA CUBIN source, profile results, session information</p></td>
</tr>
<tr class="row-even"><td><p>8 + sizeof(File Header) + sizeof(Block 0)</p></td>
<td><p>Block 1</p></td>
<td><p>Mixed</p></td>
<td><p>CUDA CUBIN source, profile results, session information</p></td>
</tr>
<tr class="row-odd"><td><p>…</p></td>
<td><p>…</p></td>
<td><p>…</p></td>
<td><p>…</p></td>
</tr>
</tbody>
</table>
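<p>The top-level layout in Table 1 can be read with plain byte slicing. The following is a minimal sketch, not a reference reader: the function name <code class="docutils literal notranslate"><span class="pre">read_file_header</span></code> is hypothetical, the 4-byte size field is assumed to be unsigned, and the File Header is returned as still-serialized bytes that would need to be decoded with the definitions under <code class="docutils literal notranslate"><span class="pre">extras/FileFormat</span></code>.</p>

```python
MAGIC = b"NVR\0"


def read_file_header(data):
    """Return (header_bytes, first_block_offset) for a version 7 report.

    read_file_header is a hypothetical helper. header_bytes is the
    still-serialized File Header proto message; first_block_offset is
    8 + sizeof(File Header), where Block 0 starts.
    """
    if data[0:4] != MAGIC:
        raise ValueError("bad magic number, not a report file")
    size_field = data[4:8]
    if len(size_field) != 4:
        raise ValueError("truncated size field")
    # Assumption: the size is an unsigned 4-byte little-endian integer.
    header_size = int.from_bytes(size_field, "little")
    header_bytes = data[8:8 + header_size]
    if len(header_bytes) != header_size:
        raise ValueError("truncated file header")
    return header_bytes, 8 + header_size
```

<p>The returned offset marks the start of Block 0. Walking the following blocks requires decoding each Block Header, because the payload size is stored inside that proto message.</p>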
<table class="table-no-stripes docutils align-default" id="id4">
<caption><span class="caption-text">Table 2. Per-Block report file format</span><a class="headerlink" href="#id4" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 20%" />
<col style="width: 11%" />
<col style="width: 6%" />
<col style="width: 63%" />
</colgroup>
<thead>
<tr class="row-odd"><th class="head"><p>Offset [bytes]</p></th>
<th class="head"><p>Entry</p></th>
<th class="head"><p>Type</p></th>
<th class="head"><p>Value</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>0</p></td>
<td><p>Integer</p></td>
<td><p>Binary</p></td>
<td><p>sizeof(Block Header)</p></td>
</tr>
<tr class="row-odd"><td><p>4</p></td>
<td><p>Block Header</p></td>
<td><p>Proto</p></td>
<td><p>Number of entries per payload type, payload size</p></td>
</tr>
<tr class="row-even"><td><p>4 + sizeof(Block Header)</p></td>
<td><p>Block Payload</p></td>
<td><p>Mixed</p></td>
<td><p>Payload (CUDA CUBIN sources, profile results, session information, string table)</p></td>
</tr>
</tbody>
</table>
<table class="table-no-stripes docutils align-default" id="id5">
<caption><span class="caption-text">Table 3. Block payload report file format</span><a class="headerlink" href="#id5" title="Permalink to this table"></a></caption>
<colgroup>
<col style="width: 36%" />
<col style="width: 24%" />
<col style="width: 8%" />
<col style="width: 32%" />
</colgroup>
<thead>
<tr class="row-odd"><th class="head"><p>Offset [bytes]</p></th>
<th class="head"><p>Entry</p></th>
<th class="head"><p>Type</p></th>
<th class="head"><p>Value</p></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>0</p></td>
<td><p>Integer</p></td>
<td><p>Binary</p></td>
<td><p>sizeof(Payload type 1, entry 1)</p></td>
</tr>
<tr class="row-odd"><td><p>4</p></td>
<td><p>Payload type 1, entry 1</p></td>
<td><p>Proto</p></td>
<td></td>
</tr>
<tr class="row-even"><td><p>4 + sizeof(Payload type 1, entry 1)</p></td>
<td><p>Integer</p></td>
<td><p>Binary</p></td>
<td><p>sizeof(Payload type 1, entry 2)</p></td>
</tr>
<tr class="row-odd"><td><p>8 + sizeof(Payload type 1, entry 1)</p></td>
<td><p>Payload type 1, entry 2</p></td>
<td><p>Proto</p></td>
<td></td>
</tr>
<tr class="row-even"><td><p>…</p></td>
<td><p>…</p></td>
<td><p>…</p></td>
<td><p>…</p></td>
</tr>
<tr class="row-odd"><td><p>…</p></td>
<td><p>Integer</p></td>
<td><p>Binary</p></td>
<td><p>sizeof(Payload type 2, entry 1)</p></td>
</tr>
<tr class="row-even"><td><p>…</p></td>
<td><p>Payload type 2, entry 1</p></td>
<td><p>Proto</p></td>
<td></td>
</tr>
<tr class="row-odd"><td><p>…</p></td>
<td><p>…</p></td>
<td><p>…</p></td>
<td><p>…</p></td>
</tr>
</tbody>
</table>
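<p>Table 3 describes the payload as a sequence of length-prefixed entries, grouped by payload type. The following is a minimal sketch of splitting such a payload: the function name <code class="docutils literal notranslate"><span class="pre">split_payload</span></code> is hypothetical, the size fields are assumed to be unsigned, and the per-type entry counts are assumed to have been obtained from the decoded Block Header in the order in which the types appear in the payload.</p>

```python
def split_payload(payload, entry_counts):
    """Split a block payload into serialized entries, grouped by payload type.

    split_payload is a hypothetical helper. entry_counts holds the number of
    entries for each payload type, in payload order (assumed to come from
    the decoded Block Header). Each entry is returned as raw proto bytes.
    """
    groups = []
    offset = 0
    for count in entry_counts:
        entries = []
        for _ in range(count):
            size_field = payload[offset:offset + 4]
            if len(size_field) != 4:
                raise ValueError("truncated entry size")
            # Assumption: unsigned 4-byte little-endian size.
            size = int.from_bytes(size_field, "little")
            entry = payload[offset + 4:offset + 4 + size]
            if len(entry) != size:
                raise ValueError("truncated entry")
            entries.append(entry)
            # Skip the size field and the entry itself.
            offset += 4 + size
        groups.append(entries)
    return groups
```

<p>Each returned entry is still a serialized proto message and has to be decoded with the message type that matches its payload type.</p>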
<p class="rubric-h1 rubric">Notices</p>
<p>ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE.</p>
<p>Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no responsibility for the consequences of use of such information or for any infringement of patents or other rights of third parties that may result from its use. No license is granted by implication or otherwise under any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA Corporation products are not authorized as critical components in life support devices or systems without express written approval of NVIDIA Corporation.</p>
<p class="rubric-h2 rubric">Trademarks</p>
<p>NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.</p>
</section>
</section>
</section>


           </div>
          </div>
          <footer>

  <hr/>

  <div role="contentinfo">
    <p>&#169; Copyright 2018-2024, NVIDIA Corporation &amp; Affiliates. All rights reserved.
      <span class="lastupdated">Last updated on Mar 06, 2024.
      </span></p>
  </div>

   

</footer>
        </div>
      </div>
    </section>
  </div>
  <script>
      jQuery(function () {
          SphinxRtdTheme.Navigation.enable(true);
      });
  </script>
 



</body>
</html>