Updates in 2020.1
General
- Added support for the NVIDIA GA100/SM 8.x GPU architecture 
- Removed support for the Pascal SM 6.x GPU architecture 
- Windows 7 is no longer supported as a host or target platform
- Added a rule for reporting uncoalesced memory accesses as part of the Source Counters section 
- Added support for report name placeholders %p, %q, %i and %h 
- The Kernel Profiling Guide was added to the documentation 
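The new uncoalesced-access rule flags loads and stores that touch more memory sectors than necessary. The sketch below illustrates what "uncoalesced" means using a simplified model. It is not the rule's actual implementation. It counts the distinct 32-byte sectors touched when the 32 threads of one warp each read a 4-byte element at a given stride. The helper name `sectors_per_request` is hypothetical. The model assumes a 32-thread warp and 32-byte sectors, as on recent NVIDIA GPUs.

```python
SECTOR_BYTES = 32  # assumed L1/L2 sector size (32 bytes on recent NVIDIA GPUs)
WARP_SIZE = 32     # assumed number of threads per warp


def sectors_per_request(stride_elems, elem_bytes=4, base=0):
    """Hypothetical helper: count the distinct sectors touched when thread t
    of a warp reads the element at byte address base + t * stride_elems * elem_bytes."""
    addresses = [base + t * stride_elems * elem_bytes for t in range(WARP_SIZE)]
    return len({addr // SECTOR_BYTES for addr in addresses})


if __name__ == "__main__":
    # Each request moves the same 128 useful bytes (32 threads x 4 bytes),
    # but larger strides spread those bytes over more sectors.
    for stride in (1, 2, 8, 32):
        print(f"stride {stride:2d}: {sectors_per_request(stride)} sectors")
```

With unit stride and an aligned base address, the warp touches the ideal 4 sectors. At a stride of 8 elements or more, every thread lands in its own sector, so 32 sectors are fetched for the same amount of useful data.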
NVIDIA Nsight Compute
- The UI command was renamed from nv-nsight-cu to ncu-ui. Old names remain for backwards compatibility.
- Added support for roofline analysis charts 
- Added linked hot spot tables in section bodies to indicate performance problems in the source code 
- Added section navigation links in rule results to quickly jump to the referenced section 
- Added a new option to select how kernel names are shown in the UI 
- Added new memory tables for the L1/TEX cache and the L2 cache. The old tables are still available for backwards compatibility and moved to a new section containing deprecated UI elements. 
- Memory tables now show the metric name as a tooltip 
- Source resolution now takes file properties into account when selecting a file from disk
- Results in the profile report can now be filtered by NVTX range 
- The Source page now supports collapsing views even for single files 
- The UI shows profiler error messages as dismissible banners for increased visibility 
- Improved the baseline name control in the profiler report header 
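The roofline charts follow the standard roofline model. In that model, a kernel's attainable throughput is bounded by the lower of two limits: peak compute, and arithmetic intensity (FLOP per byte moved) multiplied by peak memory bandwidth. The minimal sketch below computes that bound. The function name and the peak values (10 TFLOP/s, 1 TB/s) are hypothetical, chosen only for illustration. They are not measurements of any particular GPU and do not reflect how Nsight Compute derives its chart values.

```python
def roofline_bound(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Return (arithmetic intensity, attainable FLOP/s, limiting resource)
    under the basic roofline model."""
    intensity = flops / bytes_moved           # FLOP per byte
    memory_roof = intensity * peak_bandwidth  # FLOP/s if bandwidth-limited
    attainable = min(peak_flops, memory_roof)
    limiter = "memory" if memory_roof < peak_flops else "compute"
    return intensity, attainable, limiter


# Hypothetical device peaks; the ridge point lies at 10 FLOP/byte.
PEAK_FLOPS = 10e12  # 10 TFLOP/s
PEAK_BW = 1e12      # 1 TB/s

if __name__ == "__main__":
    # Two made-up kernels: one low-intensity, one high-intensity.
    for flops, nbytes in ((2e9, 8e9), (4e10, 1e9)):
        ai, perf, limiter = roofline_bound(flops, nbytes, PEAK_FLOPS, PEAK_BW)
        print(f"AI={ai:.2f} FLOP/B -> {perf / 1e12:.2f} TFLOP/s ({limiter}-bound)")
```

A kernel whose intensity falls left of the ridge point sits on the sloped memory roof. A kernel to the right of it is capped by the flat compute roof.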
NVIDIA Nsight Compute CLI
- The CLI command was renamed from nv-nsight-cu-cli to ncu. Old names remain for backwards compatibility.
- Queried metrics on GV100 and newer chips are sorted alphabetically 
- Multiple instances of NVIDIA Nsight Compute CLI can now run concurrently on the same system, e.g. for profiling individual MPI ranks. Profiled kernels are serialized across all processes using a system-wide file lock. 
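Serializing work across processes with a system-wide file lock is a common pattern. The sketch below shows the general idea in Python using POSIX `fcntl.flock`. It is not Nsight Compute's implementation: the lock file path and the `profile_kernel` function are hypothetical stand-ins. Because it relies on `fcntl`, it does not run on Windows. Every process that wraps its critical section in the same lock file waits for the others, so only one "kernel" is profiled at a time across the system.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock location, used only for this illustration.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "example-profiler.lock")


@contextmanager
def system_wide_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on `path`; blocks while any other
    process (or open file description) holds the same lock."""
    with open(path, "a") as lock_file:
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)


def profile_kernel(name):
    """Hypothetical stand-in for profiling one kernel; only one process
    at a time executes the body of the lock."""
    with system_wide_lock():
        print(f"[pid {os.getpid()}] profiling {name}")


if __name__ == "__main__":
    profile_kernel("vectorAdd")
```

Launching this script from several processes at once (for example, one per MPI rank) makes their `profiling` sections run one after another rather than overlapping.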
Resolved Issues
- More C++ kernel names can be properly demangled 
- Fixed a free(): invalid pointer error when profiling applications using PyTorch > 19.07
- Fixed profiling IBM Spectrum MPI applications that require PAMI GPU hooks (--smpiargs="-gpu")
- Fixed an issue where the first kernel instruction was missed when computing sass__inst_executed_per_opcode
- Reduced surplus DRAM write traffic created from flushing caches during kernel replay 
- The Compute Workload Analysis section shows the IMMA pipeline on GV11b GPUs 
- Profile reports now scroll properly on macOS when using a trackpad
- Relative output filenames for the Profile activity now use the document directory, instead of the current working directory 
- Fixed path expansion of ~ on Windows
- Memory access information is now shown properly for RED assembly instructions on the Source page 
- Fixed an issue where the user's PYTHONHOME and PYTHONPATH environment variables were picked up by NVIDIA Nsight Compute, causing locale encoding issues.