2019-02-17 Vince Weaver

  * src/ctests/attach_cpu_sys_validate.c, src/ctests/attach_cpu_validate.c: ctests: attach_cpu_*validate: fix buffer overrun. Embarrassing bug: on systems with more than 16 cores the test was running off the end of a buffer. Oddly, this did not fail on my Debian system with 32 cores.

2019-02-15 Anthony Castaldo

  * src/validation_tests/papi_tot_cyc.c: Added a 'priming' call of the naive matrix multiply, before counting the cycles on the second call. This is to overcome any first-time system overhead in the procedure, like loading the program, so the cycles will better match the 3rd and 4th calls. This corrects an error (and failure of the test) in which the first call to the routine takes over 10% more cycles to complete than the subsequent calls.
  * src/testlib/test_utils.c: Added new message to test_fail, so users will not think a test failure means their PAPI install is unusable, or that the failure should be reported to the PAPI development team.
  * src/run_tests_exclude.txt: Added new memleak_check.c in validation_tests; it is not a standalone test, but a utility to be run by valgrind when checking for memory leaks.
  * release_procedure.txt: Release guidance improved with more details on testing.

2019-02-11 Frank Winkler

  * src/ctests/zero.c, src/validation_tests/cycles_validation.c: Fixed warnings detected by clang. Replaced abs with llabs.
  * src/components/coretemp_freebsd/coretemp_freebsd.c: Fixed unused warnings.

2019-02-08 Anthony Castaldo

  * src/components/perf_event/pe_libpfm4_events.c: revert change that repaired a memory leak; it caused a problem with ARM systems, per Vince Weaver.

2019-02-08 Vince Weaver

  * src/linux-common.c: arm64: update the ARM family configuration to work with newer Linux kernels. As of Linux-3.19 the "CPU architecture" field in /proc/cpuinfo changed from "AArch64" to "8". Update the code so it properly falls back in this situation.
    Reported-by: Al Grant

2019-02-08 Frank Winkler

  * src/components/nvml/tests/Makefile: Fixed linking error.
  * src/components/nvml/linux-nvml.c: Fixed "this statement may fall through" warning.
  * src/components/pcp/tests/testPCP.c: Fixed sprintf warnings by replacing them with the safer method snprintf.
  * src/components/coretemp/linux-coretemp.c: Fixed warning: '%s' directive output may be truncated. Newer versions of gcc are more strict with regards to return values of snprintf(), so check the values.

2019-02-07 Anthony Castaldo

  * src/papi_internal.c: After PAPI_shutdown free(_papi_native_events); must reset variables and counts to ensure next PAPI_library_init() from same code will realloc() and rebuild the table.
  * src/libpfm4/Makefile, src/libpfm4/config.mk, src/libpfm4/examples/Makefile, src/libpfm4/lib/pfmlib_perf_event.c, src/libpfm4/perf_examples/Makefile: improve top makefile by separating targets
    This patch improves the Makefile structure by, at the top level, separating the various targets: all, install-lib, install-examples. The patch keeps install_examples as a backward compatible target. The patch also makes it possible to override PREFIX from cmdline.

2019-02-06 Anthony Castaldo

  * src/validation_tests/Makefile.recipies, src/validation_tests/memleak_check.c: memleak_check is a new simple test file to help expose memory leaks in PAPI main and components.
  * src/papi_internal.c, src/papi_preset.c: Patches provided by Jiali Li. Added cleanup code to prevent memory leaks.
  * src/libpfm4/lib/pfmlib_perf_event.c: Patches provided by Jiali Li. Added cleanup code to prevent memory leaks.
  * src/components/stealtime/linux-stealtime.c: Patches provided by Jiali Li. Added cleanup code to prevent memory leaks.
  * src/components/powercap/linux-powercap.c: Added cleanup code to prevent some of the memory leaks. Dynamic-library related calls leak, but we cannot prevent all of those leaks.
  * src/components/perf_event_uncore/perf_event_uncore.c: Patches provided by Jiali Li.
    Added cleanup code to prevent memory leaks.
  * src/components/perf_event/pe_libpfm4_events.c: Patches provided by Jiali Li. Added cleanup code to prevent memory leaks.
  * src/components/nvml/linux-nvml.c: Added cleanup code to prevent some of the memory leaks. Dynamic-library related calls leak, but we cannot prevent all of those leaks.
  * src/components/lustre/linux-lustre.c: Added cleanup code to prevent memory leaks.
  * src/components/lmsensors/linux-lmsensors.c: Added cleanup code to prevent some of the memory leaks. Dynamic-library related calls leak, but we cannot prevent all of those leaks.
  * src/components/infiniband_umad/linux-infiniband_umad.c: Added cleanup code to prevent some of the memory leaks. Dynamic-library related calls leak, but we cannot prevent all of those leaks.

2019-02-06 Heike Jagode

  * src/components/cuda/tests/cudaTest_cupti_only.cu, src/components/cuda/tests/likeComp_cupti_only.cu, src/components/cuda/tests/simpleMultiGPU.cu, src/components/cuda/tests/simpleMultiGPU.h, src/components/cuda/tests/timer.h: Added more details to the license statement for cuda tests.

2019-02-05 Anthony Castaldo

  * src/components/nvml/README: Expanded description of NVML component and usage.
  * src/components/README: Expanded notes to guide releases.

2019-02-01 Anthony Castaldo

  * src/components/infiniband_umad/linux-infiniband_umad.c: Corrected name of component to distinguish this one from the 'infiniband' component.
  * src/components/nvml/README, src/components/nvml/linux-nvml.c: Had a problem with undefined variables; added notes to README.

2019-02-01 Frank Winkler

  * src/components/perf_event/perf_event.c: Suppress "unused" warnings.
  * src/components/perf_event/perf_event.c: Get rid of "use of uninitialized variable" warnings.
  * src/components/infiniband_umad/README, src/components/lmsensors/README: Added component instructions.
  * src/components/cuda/Rules.cuda: Commented out target "native_clean" since it is used twice.
* src/components/lmsensors/README: Added build instructions for component lmsensors. * src/components/lmsensors/linux-lmsensors.c: Suppress "unused variables" warnings. 2019-01-31 Anthony Castaldo * man/man1/PAPI_derived_event_files.1, man/man1/papi_avail.1, man/man1/papi_clockres.1, man/man1/papi_command_line.1, man/man1/papi_component_avail.1, man/man1/papi_cost.1, man/man1/papi_decode.1, man/man1/papi_error_codes.1, man/man1/papi_event_chooser.1, man/man1/papi_hybrid_native_avail.1, man/man1/papi_mem_info.1, man/man1/papi_multiplex_cost.1, man/man1/papi_native_avail.1, man/man1/papi_version.1, man/man1/papi_xml_event_info.1, man/man3/PAPIF_accum.3, man/man3/PAPIF_accum_counters.3, man/man3/PAPIF_add_event.3, man/man3/PAPIF_add_events.3, man/man3/PAPIF_add_named_event.3, man/man3/PAPIF_assign_eventset_component.3, man/man3/PAPIF_cleanup_eventset.3, man/man3/PAPIF_create_eventset.3, man/man3/PAPIF_destroy_eventset.3, man/man3/PAPIF_enum_event.3, man/man3/PAPIF_epc.3, man/man3/PAPIF_event_code_to_name.3, man/man3/PAPIF_event_name_to_code.3, man/man3/PAPIF_flips.3, man/man3/PAPIF_flops.3, man/man3/PAPIF_get_clockrate.3, man/man3/PAPIF_get_dmem_info.3, man/man3/PAPIF_get_domain.3, man/man3/PAPIF_get_event_info.3, man/man3/PAPIF_get_exe_info.3, man/man3/PAPIF_get_granularity.3, man/man3/PAPIF_get_hardware_info.3, man/man3/PAPIF_get_multiplex.3, man/man3/PAPIF_get_preload.3, man/man3/PAPIF_get_real_cyc.3, man/man3/PAPIF_get_real_nsec.3, man/man3/PAPIF_get_real_usec.3, man/man3/PAPIF_get_virt_cyc.3, man/man3/PAPIF_get_virt_usec.3, man/man3/PAPIF_ipc.3, man/man3/PAPIF_is_initialized.3, man/man3/PAPIF_library_init.3, man/man3/PAPIF_lock.3, man/man3/PAPIF_multiplex_init.3, man/man3/PAPIF_num_cmp_hwctrs.3, man/man3/PAPIF_num_counters.3, man/man3/PAPIF_num_events.3, man/man3/PAPIF_num_hwctrs.3, man/man3/PAPIF_perror.3, man/man3/PAPIF_query_event.3, man/man3/PAPIF_query_named_event.3, man/man3/PAPIF_read.3, man/man3/PAPIF_read_ts.3, man/man3/PAPIF_register_thread.3, 
man/man3/PAPIF_remove_event.3, man/man3/PAPIF_remove_events.3, man/man3/PAPIF_remove_named_event.3, man/man3/PAPIF_reset.3, man/man3/PAPIF_set_cmp_domain.3, man/man3/PAPIF_set_cmp_granularity.3, man/man3/PAPIF_set_debug.3, man/man3/PAPIF_set_domain.3, man/man3/PAPIF_set_event_domain.3, man/man3/PAPIF_set_granularity.3, man/man3/PAPIF_set_inherit.3, man/man3/PAPIF_set_multiplex.3, man/man3/PAPIF_shutdown.3, man/man3/PAPIF_start.3, man/man3/PAPIF_start_counters.3, man/man3/PAPIF_state.3, man/man3/PAPIF_stop.3, man/man3/PAPIF_stop_counters.3, man/man3/PAPIF_thread_id.3, man/man3/PAPIF_thread_init.3, man/man3/PAPIF_unlock.3, man/man3/PAPIF_unregister_thread.3, man/man3/PAPIF_write.3, man/man3/PAPI_accum.3, man/man3/PAPI_accum_counters.3, man/man3/PAPI_add_event.3, man/man3/PAPI_add_events.3, man/man3/PAPI_add_named_event.3, man/man3/PAPI_addr_range_option_t.3, man/man3/PAPI_address_map_t.3, man/man3/PAPI_all_thr_spec_t.3, man/man3/PAPI_assign_eventset_component.3, man/man3/PAPI_attach.3, man/man3/PAPI_attach_option_t.3, man/man3/PAPI_cleanup_eventset.3, man/man3/PAPI_component_info_t.3, man/man3/PAPI_cpu_option_t.3, man/man3/PAPI_create_eventset.3, man/man3/PAPI_debug_option_t.3, man/man3/PAPI_destroy_eventset.3, man/man3/PAPI_detach.3, man/man3/PAPI_disable_component.3, man/man3/PAPI_disable_component_by_name.3, man/man3/PAPI_dmem_info_t.3, man/man3/PAPI_domain_option_t.3, man/man3/PAPI_enum_cmp_event.3, man/man3/PAPI_enum_event.3, man/man3/PAPI_epc.3, man/man3/PAPI_event_code_to_name.3, man/man3/PAPI_event_info_t.3, man/man3/PAPI_event_name_to_code.3, man/man3/PAPI_exe_info_t.3, man/man3/PAPI_flips.3, man/man3/PAPI_flops.3, man/man3/PAPI_get_cmp_opt.3, man/man3/PAPI_get_component_index.3, man/man3/PAPI_get_component_info.3, man/man3/PAPI_get_dmem_info.3, man/man3/PAPI_get_event_component.3, man/man3/PAPI_get_event_info.3, man/man3/PAPI_get_eventset_component.3, man/man3/PAPI_get_executable_info.3, man/man3/PAPI_get_hardware_info.3, man/man3/PAPI_get_multiplex.3, 
man/man3/PAPI_get_opt.3, man/man3/PAPI_get_overflow_event_index.3, man/man3/PAPI_get_real_cyc.3, man/man3/PAPI_get_real_nsec.3, man/man3/PAPI_get_real_usec.3, man/man3/PAPI_get_shared_lib_info.3, man/man3/PAPI_get_thr_specific.3, man/man3/PAPI_get_virt_cyc.3, man/man3/PAPI_get_virt_nsec.3, man/man3/PAPI_get_virt_usec.3, man/man3/PAPI_granularity_option_t.3, man/man3/PAPI_hw_info_t.3, man/man3/PAPI_inherit_option_t.3, man/man3/PAPI_ipc.3, man/man3/PAPI_is_initialized.3, man/man3/PAPI_itimer_option_t.3, man/man3/PAPI_library_init.3, man/man3/PAPI_list_events.3, man/man3/PAPI_list_threads.3, man/man3/PAPI_lock.3, man/man3/PAPI_mh_cache_info_t.3, man/man3/PAPI_mh_info_t.3, man/man3/PAPI_mh_level_t.3, man/man3/PAPI_mh_tlb_info_t.3, man/man3/PAPI_mpx_info_t.3, man/man3/PAPI_multiplex_init.3, man/man3/PAPI_multiplex_option_t.3, man/man3/PAPI_num_cmp_hwctrs.3, man/man3/PAPI_num_components.3, man/man3/PAPI_num_counters.3, man/man3/PAPI_num_events.3, man/man3/PAPI_num_hwctrs.3, man/man3/PAPI_option_t.3, man/man3/PAPI_overflow.3, man/man3/PAPI_perror.3, man/man3/PAPI_preload_info_t.3, man/man3/PAPI_profil.3, man/man3/PAPI_query_event.3, man/man3/PAPI_query_named_event.3, man/man3/PAPI_read.3, man/man3/PAPI_read_counters.3, man/man3/PAPI_read_ts.3, man/man3/PAPI_register_thread.3, man/man3/PAPI_remove_event.3, man/man3/PAPI_remove_events.3, man/man3/PAPI_remove_named_event.3, man/man3/PAPI_reset.3, man/man3/PAPI_set_cmp_domain.3, man/man3/PAPI_set_cmp_granularity.3, man/man3/PAPI_set_debug.3, man/man3/PAPI_set_domain.3, man/man3/PAPI_set_granularity.3, man/man3/PAPI_set_multiplex.3, man/man3/PAPI_set_opt.3, man/man3/PAPI_set_thr_specific.3, man/man3/PAPI_shlib_info_t.3, man/man3/PAPI_shutdown.3, man/man3/PAPI_sprofil.3, man/man3/PAPI_sprofil_t.3, man/man3/PAPI_start.3, man/man3/PAPI_start_counters.3, man/man3/PAPI_state.3, man/man3/PAPI_stop.3, man/man3/PAPI_stop_counters.3, man/man3/PAPI_strerror.3, man/man3/PAPI_thread_id.3, man/man3/PAPI_thread_init.3, 
    man/man3/PAPI_unlock.3, man/man3/PAPI_unregister_thread.3, man/man3/PAPI_write.3, release_procedure.txt: New Doc Files in preparation for release 5.7.0.0.

2019-01-30 Anthony Castaldo

  * src/configure: For new version 5.7.0.0
  * doc/Doxyfile-common, src/Makefile.in, src/configure.in, src/papi.h: Changing version number to 5.7.0.0.

2019-01-30 Anthony Castaldo

  * src/components/cuda/tests/LDLIB.src: Corrected a path name.

2019-01-30 William Cohen

  * src/run_tests.sh: Eliminating some of the SHELLCHECK_WARNINGS. Removing unused variables. Correcting printf arguments.

2019-01-29 Konstantin Stefanov

  * src/components/nvml/linux-nvml.c: Change method for detecting available NVML component events
    Previously the PAPI nvml component used the ROM version to detect the type of the GPU and find which events are supported. On some newer cards, e.g. Tesla Kepler and Tesla Pascal, this gives the wrong result. Those cards support GPU and memory utilization, for example, but this was not detected because a Kepler card may not have a powerROM, so PAPI nvml considers it an old card. So I changed the way the event availability is detected: just try to obtain the info, and if it succeeds, it is available.

2019-01-28 Anthony Castaldo

  * src/components/nvml/linux-nvml.c: Added (void)s to eliminate warnings about unused variables.
  * src/components/cuda/linux-cuda.c: Corrected field-name typo in a SUBDBG message that was not previously being compiled.
  * src/components/cuda/README, src/components/cuda/tests/LDLIB.src: Changes about access to cupti libs and includes.
  * src/components/cuda/tests/simpleMultiGPU.cu, src/components/nvml/tests/HelloWorld.cu: Corrected compile warnings for deprecated routines or compiler complaints.
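Note on the 2019-01-29 nvml entry above: the "try it and see" detection it describes can be sketched as below. This is only an illustration under stated assumptions, not the component's code. The names event_probe, probe_temperature, probe_power and detect_events are hypothetical, and the probes are stand-ins that do not call the NVML library.

```c
#include <assert.h>
#include <stddef.h>

/* A probe tries to read one value from a device; 0 means success.
   (Hypothetical signature, chosen for this sketch only.) */
typedef int (*probe_fn)(unsigned device, unsigned long long *value);

struct event_probe {
    const char *name;   /* event name to expose if the probe works */
    probe_fn    probe;  /* query that is attempted once at detection time */
};

/* Stand-in for a query the device supports. */
static int probe_temperature(unsigned device, unsigned long long *value)
{
    (void)device;
    *value = 42;
    return 0;
}

/* Stand-in for a query the device rejects as unsupported. */
static int probe_power(unsigned device, unsigned long long *value)
{
    (void)device;
    (void)value;
    return -1;
}

/* Run every probe once; an event is listed as available only if its
   probe succeeds. Returns the number of names stored in `available`,
   which must have room for n entries. */
static size_t detect_events(const struct event_probe *table, size_t n,
                            unsigned device, const char **available)
{
    size_t found = 0;
    assert(table != NULL && available != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned long long scratch = 0;
        if (table[i].probe(device, &scratch) == 0)
            available[found++] = table[i].name;
    }
    return found;
}
```

The point of the design is that availability follows from what the device actually answers, so no per-model table (such as a ROM version check) has to be kept up to date.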
2019-01-23 Vince Weaver * src/papi_events.csv: papi_events: the skylake events are actually split in two, make sure cascadelake gets both cases too 2019-01-23 Anthony Castaldo * src/components/nvml/tests/nvml_power_limiting_test.cu: structure member name was misspelt; 'cmpinfo->disabled_resaon' instead of 'cmpinfo->disabled_reason'. * src/components/infiniband_umad/tests/infiniband_umad_list_events.c: Code was missing the 'string.h' include necessary for use of 'strstr() function'. * src/components/infiniband_umad/linux-infiniband_umad.c: Header file changed; fixed prototypes for umad_get_ca() to use 'const char*' instead of 'char*'. 2019-01-22 Vince Weaver * src/papi_events.csv: papi_events: add cascade lake X support 2019-01-22 Anthony Castaldo * src/components/cuda/linux-cuda.c, src/components/cuda/tests/cudaTest_cupti_only.cu, src/libpfm4/lib/pfmlib_amd64.c, src/libpfm4/lib/pfmlib_intel_x86.c, src/libpfm4/lib/pfmlib_intel_x86_arch.c: linux-cuda.c and cudaTest_cupti_only.cu have cosmetic changes. the pfmlib changes were committed by Stephane to simplify cpuid; The push/pop were causing problems with some compiler optimizations. 2019-01-16 Anthony Castaldo * src/libpfm4/README, src/libpfm4/include/perfmon/pfmlib.h, src/libpfm4/lib/events/amd64_events_fam17h.h, src/libpfm4/lib/events/intel_skl_events.h, src/libpfm4/lib/pfmlib_common.c, src/libpfm4/lib/pfmlib_intel_skl.c, src/libpfm4/lib/pfmlib_intel_x86.c, src/libpfm4/lib/pfmlib_intel_x86_priv.h, src/libpfm4/lib/pfmlib_priv.h, src/libpfm4/tests/validate_x86.c: Three patches to libpfm4. (1) Add Intel CascadeLake X core PMU support. (2) Add get_num_events() support for Intel X86 (3) Check PMU models when validating event codes. * src/components/cuda/tests/runSMG.sh: Example file to run simpleMultiGPU. * src/components/cuda/tests/runCTCO.sh: Example file for running cudaTest_cupti_only * src/components/cuda/tests/runBW.sh: Example script to run nvlink_bandwidth on PEAK. 
* src/components/cuda/tests/runAll.sh: Example script to run nvlink_all on PEAK. * src/components/cuda/tests/simpleMultiGPU.cu: This is a PAPI version of an NVIDIA cupti-only sample program; it is a useful starting point to test a variety of metrics or events, which are specified in a simple internal table. * src/components/cuda/tests/likeComp_cupti_only.cu: This program (likeComp = likeComponent) tested if the events in a metric could be harvested and put into an event group, read and re-ordered to provide data to cuptiMetricGetValue. They can; we did this before rewriting the component to do all Metrics in this way. * src/components/cuda/tests/nvlink_bandwidth.cu: This is a tester for just 4 NVLINK bandwidth metrics. It moves data from CPU (host) to GPU, or GPU to GPU. Reporting of intermediate steps has been increased, and we retrieve the number of Async engines dynamically to optimize the number of streams used in the copies. It was previously hard-coded. * src/components/cuda/tests/nvlink_all.cu: This utility will iterate through all the available NVLINK metrics in the PAPI system, and run a test program for each of them, and report the results to stdout. The test program is memory movement, a command line argument can test CPU(host) to GPU memory movement, or GPU to GPU movement amongst available GPU devices. The report consists of all single events, followed by a list of all possible pairs of events on one GPU, and multiple GPUS. This report notes which nvlink events are incompatible pairs, and will also report if any metrics in pairs produce significantly different measurements than when they are read singly. All of this is done with PAPI. * src/components/cuda/tests/cudaTest_cupti_only.cu: This program will test a single performance metric or event that is provided on the command line, on one or more GPUs. Only cupti is used. The exercise will include both extensive memory moves NOT executed by kernel, and a kernel. 
    Options allow the user to skip the kernel execution if desired, and to optionally use cuInit() and reset the devices before beginning the test. Reports of the steps in the process are output to stdout. The purpose of this program is to show what a cupti-only result looks like, in order to see if issues with an event are in the PAPI implementation only, or also exist in a cupti-only implementation.
  * src/components/cuda/tests/Makefile: Several targets were added for new test and utility programs.
  * src/components/cuda/linux-cuda.c: Several changes were made to more efficiently (and correctly) read and compute metrics, including the newly added nvlink metrics. The previous method was not reading groups properly; and though this did not cause an error, it could result in zeros being read instead of actual values. The change is to break down all metrics and events for a device into global event list (without duplicates) and build a single event group set for everything the user has added. We repeat this each time the user adds an event; on the assumption that this overhead is less likely to occur during a performance critical time than when the user reads the event set. After reading all the resultant groups we then re-order the events and values to compute each metric, then store those values (and any other event values) back into the user-provided order.
    Outstanding Issues: We do not provide to the user the cuda metric 'branch_efficiency', there is an issue with the library code sometimes segfaulting while reading the events for that particular metric. The bug has been reported to nvidia as bug ID 2485834.
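Note on the linux-cuda.c entry above: the "global event list without duplicates, then back into user order" idea can be sketched as follows. This is a minimal illustration assuming plain integer event ids. The helper names unique_slot, build_global_list and restore_user_order are hypothetical and are not taken from the component.

```c
#include <assert.h>
#include <stddef.h>

/* Return the slot of `id` in `list`, appending it first if it is new.
   Returns -1 when the list is full. */
static int unique_slot(int *list, size_t *count, size_t cap, int id)
{
    for (size_t i = 0; i < *count; i++)
        if (list[i] == id)
            return (int)i;
    if (*count >= cap)
        return -1;
    list[*count] = id;
    return (int)(*count)++;
}

/* Build the deduplicated list from the ids the user added, in the order
   they were added. slot[i] records where user[i] lives in `global`.
   Returns the number of unique ids, or -1 if `global` is too small. */
static int build_global_list(const int *user, size_t n,
                             int *global, size_t cap, int *slot)
{
    size_t count = 0;
    assert(user != NULL && global != NULL && slot != NULL);
    for (size_t i = 0; i < n; i++) {
        slot[i] = unique_slot(global, &count, cap, user[i]);
        if (slot[i] < 0)
            return -1;
    }
    return (int)count;
}

/* After one read of the deduplicated list into `raw`, copy each value
   back to the position the user originally asked for. */
static void restore_user_order(const long long *raw, const int *slot,
                               size_t n, long long *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = raw[slot[i]];
}
```

Rebuilding the list on every add keeps the read path to a single pass over unique events, which matches the trade-off described in the entry: pay at add time, not at read time.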

2019-01-10 Vince Weaver

  * src/ctests/Makefile.recipies, src/ctests/attach_cpu_sys_validate.c: ctests: add an attach_cpu_sys test
    this tests for the Linux bug where you attach to a process with SYS granularity
  * src/components/perf_event/perf_event.c, src/components/perf_event/perf_event_lib.h, src/ctests/attach_cpu_validate.c: perf_event: fix granularity setting for attached processes
    the old code was setting the granularity wrong when attaching to a CPU.
  * src/components/perf_event/perf_event.c: perf_event: properly fall back to read() if rdpmc read attempt fails
    The code wasn't properly handling this. We now fall back to read() if *any* rdpmc call in an eventset fails. In theory it is possible to only fall back in a per-event fashion but that would make the code a lot more complex.
  * src/components/perf_event/perf_helpers.h: perf_event: internally indicate we need fallback when rdpmc not available

2018-12-07 Vince Weaver

  * src/ctests/attach_cpu_validate.c: ctests: attach_cpu_validate: fail test if all values are close to the same
  * src/ctests/Makefile.recipies, src/ctests/attach_cpu_validate.c: ctests: add attach_cpu_validate test

2018-12-03 Vince Weaver

  * src/ctests/branches.c: ctests/branches: remove code to set "sleep time" which is no longer used
  * src/ctests/branches.c: ctests/branches: make the failure message more verbose to see what was going wrong.
the issue I was seeing on Haswell was because there was some perf-related system load happening on the same machine (the perf_fuzzer) * src/ctests/branches.c: ctests: branches, update code comments to explain what test is doing trying to figure out why sometimes failing on Haswell system 2018-11-20 Anthony Castaldo * src/components/cuda/linux-cuda.c, src/components/cuda/tests/Makefile, src/components/cuda/tests/nvlink_bandwidth.cu, .../cuda/tests/nvlink_bandwidth_cupti_only.cu, src/components/cuda/tests/runBW.sh, src/components/cuda/tests/runCO.sh, src/components/cuda/tests/simpleMultiGPU.cu: Several files modified to properly utilize the NVLINK metrics added to the linux-cuda.c component. Commenting improved to aid my own understanding of the existing code. Tony C. 2018-11-20 Terry Cojean * src/components/cuda/README, src/components/nvml/README, .../nvml/tests/nvml_power_limiting_test.cu, src/components/powercap/README, src/components/powercap/tests/powercap_limit.c: Improved error handling. Fixed typos. Added details to component README files. 2018-11-05 Anara Kozhokanova * src/components/powercap/utils/powercap_plot.c: Revert "Temporary Fix: The powercap component does not properly" This reverts commit bde6c257e4af47e9267ebb194b0aa4697568e99f. The issue with incorrect values reported by powercap component was fixed in ea8fa1f. Therefore, this temporary fix is no longer needed. * src/components/powercap/linux-powercap.c: Fix the bug in powercap component introduced in 2231b36. The reported values by powercap component were not correct (read values were not subtracted from start values and without wraparound). 2018-11-04 Frank Winkler * src/components/perf_event/perf_event.c: Fixed a bug that occurred when compiling with debug flag. 
- papi_pe_buffer was undeclared 2018-10-26 Anthony Castaldo * src/components/cuda/linux-cuda.c, src/components/cuda/tests/Makefile, src/components/cuda/tests/nvlink_all.cu, src/components/cuda/tests/nvlink_bandwidth.cu, .../cuda/tests/nvlink_bandwidth_cupti_only.cu, src/components/cuda/tests/runAll.sh, src/components/cuda/tests/runBW.sh, src/components/cuda/tests/runCO.sh: repairs, new features, run files, a new utility in nvlink_all. 2018-10-10 Anthony Castaldo * src/components/cuda/linux-cuda.c, src/components/cuda/tests/LDLIB.src, src/components/cuda/tests/Makefile, src/components/cuda/tests/nvlink_all.cu, src/components/cuda/tests/nvlink_bandwidth.cu, .../cuda/tests/nvlink_bandwidth_cupti_only.cu, src/components/cuda/tests/runAll.sh, src/components/cuda/tests/runBW.sh, src/components/nvml/PeakConfigure.sh: Added several files, and rewrote the tests. I created a new test, nvlink_all.cu, with a new approach to test all nvlink events present in the component standalone, and I rewrote the original nvlink_bandwidth.cu to make it work properly with PAPI. I also added some testing scripts needed to function on the PEAK supercomputer; where this code was tested. 2018-10-03 Anara Kozhokanova * src/utils/papi_avail.c: Add a note to the output of "papi_avail -e " if preset event is not available at the host architecture. 2018-09-28 Anthony Castaldo * src/components/cuda/tests/Makefile, src/components/cuda/tests/nvlink_bandwidth.cu, src/components/cuda/tests/simpleMultiGPU.cu, src/components/nvml/tests/Makefile, src/components/nvml/tests/nvmlcap_plot.cu: New and debugged files for NVML and CUDA testing. 
2018-09-28 Vince Weaver * src/components/perf_event/perf_event.c: perf_event: remove debug printf from libpfm4 error handling code Steve Kaufmann reported this triggered sometimes and was unnecessary * src/utils/papi_avail.c: papi_avail: fix the -e option to not print spurious message the "no events available" message should not be printed if -e is being used 2018-09-27 Vince Weaver * src/components/perf_event/perf_event.c: perf_event: avoid floating point exception if running is 0 The perf_event interface isn't supposed to return 0 for running, but it happens occasionally. So be sure not to divide by zero if this happens. This makes the rdpmc code match the generic perf code in this case. This is in response to bitbucket issue #52 2018-09-25 Heike Jagode * src/components/powercap/utils/powercap_plot.c: Temporary Fix: The powercap component does not properly report energy values. At some point in Nov 2017, the read() function was rewritten, which resulted in numerous errors, such as: +++ the energy start values are not subtracted from the read values. +++ wraparound is no longer working properly. etc. This commit serves as an immediate workaround and adds a temporary fix to get the powercap_plot utility working again. However, all this should and will be fixed in the powercap component itself. 2018-09-21 Anthony Castaldo * src/components/cuda/tests/simpleMultiGPU.cu: Corrected a bug in the CUPTI_ONLY version of simpleMultiGPU.cu. This manifested specifically if the node has multiple GPUs and they are of different models or types; in which case they can have differently numbered PAPI events. We converted a scalar storing the eventID to a vector with one eventID per GPU. 2018-09-19 Anthony Castaldo * src/components/nvml/tests/Makefile, src/components/nvml/tests/benchSANVML.c, .../nvml/tests/nvml_power_limit_read_test.cu, .../nvml/tests/nvml_power_limiting_test.cu: new test files, more cleanup on failure reporting. 
  * src/components/cuda/tests/Makefile, src/components/nvml/tests/Makefile, .../nvml/tests/nvml_power_limiting_test.cu: Additions to Makefiles, and several changes to power limiting testing to correct errors when multiple GPUs are present, remove extraneous code, and provide greater clarity in output and error messages.

2018-09-14 Heike Jagode

  * src/components/cuda/linux-cuda.c: Minor fix: return correct error message if libcupti.so not found.

2018-09-13 Heike Jagode

  * src/components/cuda/linux-cuda.c: minor fix
  * src/components/cuda/linux-cuda.c: Bug fix: Instead of normalizing all the event values to represent the total number of domain instances on the device, only the last event value was normalized.

Tue Mar 20 09:37:56 2018 -0700 Steve Walk

  * src/libpfm4/lib/events/arm_cavium_tx2_events.h: Update libpfm4
    Current with
    ------------
    commit 6c9e44b95a55b8bf62cbd64009c4c9b30964a66c
    update Cavium ThunderX2 with now public events
    This patch adds new model specific events to the Cavium Thunder X2 core PMU. The updated list is based on publicly available documentation from Cavium which is available at: https://cavium.com/resources.html

2018-08-27 Vince Weaver

  * src/components/rapl/linux-rapl.c: rapl: add support for AMD Fam17h (Zen) CPUs
    AMD Fam17h chips have a new RAPL-like interface that supports energy measurement using register layouts like Intel RAPL, but at a different MSR number. This has been tested on an EPYC system and the package value seems to be plausible, but as reported by the LIKWID people the cores value seems a bit too low.

2018-08-01 Anthony Castaldo

  * src/components/pcp/tests/Makefile2, src/components/pcp/tests/README_BenchTesting.txt, src/components/pcp/tests/benchPCP.c, src/components/pcp/tests/benchPCP_script.sh, src/components/pcp/tests/benchStats.c: benchmarking files and README.

2018-07-23 Tony Castaldo

  * ChangeLogP500.txt, RELEASENOTES.txt, man/man1/papi_multiplex_cost.1, man/man3/PAPI_attach.3, man/man3/PAPI_detach.3, man/man3/PAPI_get_dmem_info.3, man/man3/PAPI_hw_info_t.3, man/man3/PAPI_overflow.3, man/man3/PAPI_profil.3, src/Makefile.inc, src/components/Makefile_comp_tests, src/components/Makefile_comp_tests.target.in, src/components/appio/tests/Makefile, src/components/appio/tests/iozone/libasync.c, src/components/appio/tests/iozone/makefile, src/components/cuda/tests/Makefile, src/components/nvml/linux-nvml.c, src/components/perfctr/perfctr-x86.c, src/configure.in, src/ctests/Makefile.recipies, src/ctests/Makefile.target.in, src/ctests/overflow_force_software.c, src/examples/Makefile, src/examples/PAPI_overflow.c, src/freebsd/map-atom.c, src/freebsd/map-core2-extreme.c, src/freebsd/map-core2.c, src/ftests/Makefile.recipies, src/ftests/Makefile.target.in, src/linux-context.h, src/linux-timer.c, src/papi.c, src/papi.h, src/papi_events.csv, src/sw_multiplex.c, src/testlib/Makefile, src/testlib/Makefile.target.in, src/utils/Makefile, src/utils/Makefile.target.in, src/utils/papi_multiplex_cost.c, src/validation_tests/Makefile.recipies, src/validation_tests/Makefile.target.in: 8 patches to make system from Andreas Beckmann

2018-06-27 Anthony Castaldo

  * src/components/pcp/linux-pcp.c: Removed a duplicated IF statement. No difference in execution.

2018-06-25 Anthony Castaldo

  * src/components/pcp/linux-pcp.c: fixed pcp_init_component to show any errors in reason for a disabled PCP component.

2018-06-22 Anthony Castaldo

  * src/components/pcp/README, src/components/pcp/linux-pcp.c, src/components/pcp/tests/testPCP.c: fixed a debug print, added 'timescope' to testPCP output, completed README.
  * src/components/pcp/README: fixed up README file.

2018-06-21 Anthony Castaldo

  * src/components/pcp/linux-pcp.c, src/components/pcp/tests/testPCP.c: removed debug code, added non-zeroing on instantaneous variables.

2018-06-19 Heike Jagode

  * src/components/cuda/sampling/Makefile, src/components/cuda/tests/Makefile: Add cuda/lib64/stubs to linker for cuda tests to link with libcuda.so.

2018-06-19 Anthony Castaldo

  * src/components/pcp/linux-pcp.c, src/components/pcp/tests/testPCP.c: Code changes necessary to work on Power9.

2018-06-18 Anthony Castaldo

  * src/components/pcp/Rules.pcp, src/components/pcp/linux-pcp.c: Corrections to allow compile and execution on Power9.

2018-06-15 Anthony Castaldo

  * src/components/pcp/README, src/components/pcp/Rules.pcp, src/components/pcp/linux-pcp.c, src/components/pcp/tests/Makefile, src/components/pcp/tests/testPCP.c: Initial coding of pcp component and tester completed.

Wed Jun 13 23:49:10 2018 -0700 Stephane Eranian

  * src/libpfm4/config.mk, src/libpfm4/debian/changelog: Update libpfm4
    Current with
    ------------
    commit 37d4628e37ba76c1ab586ab35e85340e30f7c523
    update to version 4.10.1
    - Fix build issues on Cavium Thunder X2
    - Update Skylake event table

Tue Jun 12 23:31:13 2018 -0700 Stephane Eranian

  * src/libpfm4/lib/Makefile, src/libpfm4/lib/events/intel_skl_events.h, src/libpfm4/lib/pfmlib_common.c: Update libpfm4
    Current with
    ------------
    commit fa65a75a8af5b4e2c360be41e66203e04735dfd2
    update Skylake event table
    Based on Intel's skylake_core_v40.json event table from download.01.org.
    Added PARTIAL_RAT_STALLS.SCOREBOARD
    Added ROB_MISC_EVENT.PAUSE_INST
    Fixed encodings of some umasks for L2_RQSTS

2018-06-12 Vince Weaver

  * src/components/perf_event/perf_event.c, src/ctests/Makefile.recipies, src/ctests/attach_validate.c: ctests: add new attach_validate test
    actually tries to validate the counter values when attached. We might have an issue with rdpmc() and attach, and are trying to make a test to catch it.

Thu Jun 7 11:38:48 2018 -0700 Stephane Eranian

  * src/libpfm4/config.mk, src/libpfm4/debian/changelog: Update libpfm4
    Current with
    ------------
    commit 924437778d3fe75de5f7a43374ed6f4b1c0533a7
    update to version 4.10.0
    Update version number to 4.10

2018-06-08 Steve Walk

  * src/papi_events.csv: enable Cavium ThunderX2 support

2018-06-07 Anara Kozhokanova

  * src/components/cuda/README: Update README in CUDA component: added '-i' flag to grep. Add a note about verifying whether the component is active or not before using it.

Tue Jun 5 14:22:32 2018 -0700 William Cohen

  * src/libpfm4/python/self.py, src/libpfm4/python/src/pmu.py, src/libpfm4/python/sys.py: Update libpfm4
    Current with
    ------------
    commit 3106615db87f81f220efc13df7a4e36e31f1ee64
    Import python print function
    So that code works in the same manner for python 2 and 3

Mon Jun 4 20:15:08 2018 -0700 William Cohen

  * src/libpfm4/lib/pfmlib_perf_event_pmu.c, src/libpfm4/perf_examples/syst_count.c, src/libpfm4/perf_examples/syst_smpl.c: Update libpfm4
    Current with
    ------------
    commit 29f626744df184913a200532408e205e2b0ec2ec
    Fix error: '%s' directive output may be truncated
    Newer versions of gcc are more strict with regards to return values of snprintf(), so check the values.

2018-06-01 Heike Jagode

  * src/Makefile.inc: Fixed 'make dist' step.
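Note on the "Mon Jun 4" snprintf entry above: checking the return value of snprintf() typically looks like the sketch below. The helper name join_path is hypothetical and is used only for illustration; it is not a function from libpfm4 or PAPI.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Format "dir/file" into buf.
   Returns 0 on success, -1 on an encoding error or if the result did
   not fit. snprintf() returns the length it wanted to write (without
   the terminating NUL), so a value >= size means truncation. */
static int join_path(char *buf, size_t size, const char *dir, const char *file)
{
    int n;
    assert(buf != NULL && size > 0);
    n = snprintf(buf, size, "%s/%s", dir, file);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```

Acting on the return value, rather than ignoring it, is generally what quiets the "directive output may be truncated" diagnostic, and it also gives the caller a chance to handle an over-long name instead of silently using a cut-off one.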
Mon May 28 13:50:44 2018 -0700 Stephane Eranian * src/libpfm4/include/perfmon/pfmlib.h, src/libpfm4/lib/Makefile, src/libpfm4/lib/events/arm_cavium_tx2_events.h, src/libpfm4/lib/events/intel_skl_events.h, src/libpfm4/lib/pfmlib_arm_armv8.c, src/libpfm4/lib/pfmlib_common.c, src/libpfm4/lib/pfmlib_intel_x86_arch.c, src/libpfm4/lib/pfmlib_intel_x86_perf_event.c, src/libpfm4/lib/pfmlib_intel_x86_priv.h, src/libpfm4/lib/pfmlib_perf_event.c, src/libpfm4/lib/pfmlib_priv.h, src/libpfm4/tests/validate_arm64.c, src/libpfm4/tests/validate_x86.c: Update libpfm4 Current with ------------ commit 488697d43bc5601ca51a22f7072169781d5b45b2 fix typo in BUS_ACCESS event for Cavium ThunderX2 This patch fixes a typo in event name for the Cavium ThunderX2 core PMU event list. BUS_ACCESS_LD -> BUS_ACCESS_WR Event list based on ARM Architecture Reference Manual (ARM DDI 0487C.a). 2018-05-25 Vince Weaver * src/ftests/Makefile.recipies, src/ftests/openmp.F: ftests: add an openmp test 2018-04-30 Vince Weaver * src/ctests/Makefile.recipies, src/ctests/destroy.c: ctests: add destroy test this checks to make sure that when we destroy eventsets we aren't leaking file descriptors Wed Apr 18 19:03:36 2018 +0200 André Wild * src/libpfm4/lib/events/mips_74k_events.h, src/libpfm4/lib/events/s390x_cpumf_events.h, src/libpfm4/lib/pfmlib_intel_nhm_unc.c, src/libpfm4/lib/pfmlib_intel_x86.c: Update libpfm4 Current with ------------ commit 903d1c05ed72d45e5bebc1f2a1a1ae60b3ed1ee6 (HEAD -> master, origin/master, origin/HEAD) remove duplicate assignment in pfm_nhm_unc_get_encoding pe was assigned twice for no reason. commit 37b7e406b77acf6115386cca43bab128e2a2d905 clarify intel_x86_check_pebs() This routine is not used right now because we cannot determine in the x86 code whether or not PEBS has been requested for an event. This is usually requested at the OS interface level. But the patch keeps the code around in case we need it later on. 
commit 832e1a388d25ba39444505c2fa7ffb77f7537df5 fix typo in mips74k event name OCP_WRITE_CACHEABLE REQUESTS -> OCP_WRITE_CACHEABLE_REQUESTS Reported-by: Andreas Beckmann commit 56cea590df7e77a1c1f1044e95d836cc01cfdb56 s390/cpumf: rename IBM z13/z14 counter names Change the IBM z13/z14 counter names to be in sync with all other models. Wed Apr 4 18:45:18 2018 -0400 Heike Jagode * src/libpfm4/README, src/libpfm4/include/perfmon/pfmlib.h, src/libpfm4/lib/pfmlib_common.c, src/libpfm4/lib/pfmlib_intel_knl_unc_cha.c, src/libpfm4/lib/pfmlib_intel_knl_unc_edc.c, src/libpfm4/lib/pfmlib_intel_knl_unc_imc.c, src/libpfm4/lib/pfmlib_intel_knl_unc_m2pcie.c, src/libpfm4/lib/pfmlib_intel_snbep_unc.c, src/libpfm4/lib/pfmlib_intel_snbep_unc_priv.h, src/libpfm4/lib/pfmlib_priv.h, src/libpfm4/tests/validate_x86.c: Update libpfm4 Current with ------------ commit c4de2ea3b50fa14e66129b06619775840aafab2a Add support for Intel KNM uncore events This patch adds Intel Knights Mill uncore event support for: CHA uncore PMU Integrated EDRAM uncore PMU Integrated Memory Controller (IMC) uncore PMU M2PCIe uncore PMU It is based on the Knights Landing event table, which is shared with Knights Mill. 2018-04-02 Heike Jagode * src/papi_events.csv: PAPI preset event support for Intel Knights Mill.
Mon Mar 19 23:53:23 2018 -0700 Stephane Eranian * src/libpfm4/README, src/libpfm4/docs/Makefile, src/libpfm4/docs/man3/libpfm_intel_knm.3, src/libpfm4/docs/man3/libpfm_intel_skx_unc_cha.3, src/libpfm4/docs/man3/libpfm_intel_skx_unc_iio.3, src/libpfm4/docs/man3/libpfm_intel_skx_unc_imc.3, src/libpfm4/docs/man3/libpfm_intel_skx_unc_irp.3, src/libpfm4/docs/man3/libpfm_intel_skx_unc_m2m.3, src/libpfm4/docs/man3/libpfm_intel_skx_unc_m3upi.3, src/libpfm4/docs/man3/libpfm_intel_skx_unc_pcu.3, src/libpfm4/docs/man3/libpfm_intel_skx_unc_ubo.3, src/libpfm4/docs/man3/libpfm_intel_skx_unc_upi.3, src/libpfm4/examples/check_events.c, src/libpfm4/examples/showevtinfo.c, src/libpfm4/include/perfmon/pfmlib.h, src/libpfm4/lib/Makefile, src/libpfm4/lib/events/intel_bdw_events.h, src/libpfm4/lib/events/intel_bdx_unc_cbo_events.h, src/libpfm4/lib/events/intel_bdx_unc_ha_events.h, src/libpfm4/lib/events/intel_bdx_unc_imc_events.h, src/libpfm4/lib/events/intel_bdx_unc_irp_events.h, .../lib/events/intel_bdx_unc_r3qpi_events.h, src/libpfm4/lib/events/intel_bdx_unc_sbo_events.h, .../lib/events/intel_ivbep_unc_pcu_events.h, src/libpfm4/lib/events/intel_skl_events.h, src/libpfm4/lib/events/intel_skx_unc_cha_events.h, src/libpfm4/lib/events/intel_skx_unc_iio_events.h, src/libpfm4/lib/events/intel_skx_unc_imc_events.h, src/libpfm4/lib/events/intel_skx_unc_irp_events.h, src/libpfm4/lib/events/intel_skx_unc_m2m_events.h, .../lib/events/intel_skx_unc_m3upi_events.h, src/libpfm4/lib/events/intel_skx_unc_pcu_events.h, src/libpfm4/lib/events/intel_skx_unc_ubo_events.h, src/libpfm4/lib/events/intel_skx_unc_upi_events.h, src/libpfm4/lib/pfmlib_common.c, src/libpfm4/lib/pfmlib_intel_bdx_unc_pcu.c, src/libpfm4/lib/pfmlib_intel_hswep_unc_pcu.c, src/libpfm4/lib/pfmlib_intel_ivbep_unc_pcu.c, src/libpfm4/lib/pfmlib_intel_knl.c, src/libpfm4/lib/pfmlib_intel_skx_unc_cha.c, src/libpfm4/lib/pfmlib_intel_skx_unc_iio.c, src/libpfm4/lib/pfmlib_intel_skx_unc_imc.c, src/libpfm4/lib/pfmlib_intel_skx_unc_irp.c, 
src/libpfm4/lib/pfmlib_intel_skx_unc_m2m.c, src/libpfm4/lib/pfmlib_intel_skx_unc_m3upi.c, src/libpfm4/lib/pfmlib_intel_skx_unc_pcu.c, src/libpfm4/lib/pfmlib_intel_skx_unc_ubo.c, src/libpfm4/lib/pfmlib_intel_skx_unc_upi.c, src/libpfm4/lib/pfmlib_intel_snbep_unc.c, .../lib/pfmlib_intel_snbep_unc_perf_event.c, src/libpfm4/lib/pfmlib_intel_snbep_unc_priv.h, src/libpfm4/lib/pfmlib_intel_x86.c, src/libpfm4/lib/pfmlib_intel_x86_perf_event.c, src/libpfm4/lib/pfmlib_intel_x86_priv.h, src/libpfm4/lib/pfmlib_priv.h, src/libpfm4/lib/pfmlib_s390x_cpumf.c, src/libpfm4/perf_examples/perf_util.c, src/libpfm4/python/self.py, src/libpfm4/python/src/pmu.py, src/libpfm4/python/sys.py, src/libpfm4/tests/validate.c, src/libpfm4/tests/validate_x86.c: Update libpfm4 Current with ------------ commit 7987ff8978d4ceef07a539e822c1b582f8924720 (HEAD -> master, origin/master, origin/HEAD) fix 32-bit compile on skx_cha_filt0 The bit field was too wide, so break it in two to keep gcc -m32 happy. commit d60d8955580e7f27c5a269c636d6dcb50eef287d Add support for Intel KNM core events This patch adds Intel Knights Mill core event support for libpfm4. It is based on the Knights Landing event table, which is shared with Knights Mill. commit 3fdae82b5e028a388510798f1f0c84d1139a1735 fix headers on Intel Skylake Uncore PMU files Fix the header with proper copyright line. commit e26ca9492ae26a0150b81828306af3a6e132e488 Fix empty event descriptions for Intel Broadwell-EP uncore PMUs Now that we have empty description detection, fix the ones detected in the Intel Broadwell-EP uncore PMUs. commit 05fc5910b78526d3cba3160713f467d9dcc0774b detect empty event/umask descriptions on Intel processors This patch adds a validation test to detect empty descriptions for events and umasks on Intel X86 processors. 
2018-02-28 John Henry * INSTALL.txt: Fixed typo --with_bitmode=32 changed to --with-bitmode=32 2018-02-23 Vince Weaver * src/ctests/hl_rates.c, src/ctests/inherit.c, src/papi_hl.c: ctests: change a few more test results from FAIL to SKIP when paranoid=3 there are some more that fail, but their failure errors make no sense and are coming from deep within PAPI so I am not sure I can easily fix things without making it worse. * src/ctests/attach2.c, src/ctests/attach3.c, src/ctests/attach_cpu.c: ctests: attach tests, skip instead of fail if not enough permissions * src/papi_internal.c: papi_internal: whitespace cleanup (no code changes) * src/utils/papi_avail.c, src/utils/papi_native_avail.c: utils: papi_avail/native_avail suggest papi_component_avail if no events detected If no events are detected, let the user know they should use papi_component_avail to find out why. * src/papi_internal.c: papi_internal: comment the error generation code I'm not sure why we generate things this way, but it makes it really confusing when adding a new error. * src/genpapifdef.c, src/papi.h, src/papi_common_strings.h, src/papi_internal.c: add new PAPI_ECMP_DISABLED error We can return this error if an event is added but the component involved is disabled. If a user moves working code to a system where perf_event_paranoid is set to 3 (all perf events disabled) they will now get an error indicating the component is disabled rather than an "event not found" error which was confusing. * src/ctests/zero.c: ctests: zero: print full error message if cannot add 2018-02-21 Vince Weaver * src/utils/papi_component_avail.c: utils: papi_component_avail: fix the NAME field in the auto-generated manpage Steve Kaufmann noticed that the papi_component_avail NAME field for the auto-generated manpage for some reason had info for papi_native_avail instead.
2018-02-16 Heike Jagode * src/configure, src/configure.in: Fixed compilation error that occurs with deprecated option '-openmp' when using a more current icc compiler. Replaced with '-qopenmp'. Tested with: icc/2016.0 icc/2017.4 icc/2018 icc/2018.1 Reported by Preeti Suman from Intel. Wed Feb 7 09:51:16 2018 -0800 Stephane Eranian * src/libpfm4/lib/events/intel_skl_events.h, src/libpfm4/lib/events/s390x_cpumf_events.h, src/libpfm4/lib/pfmlib_s390x_cpumf.c, src/libpfm4/lib/pfmlib_s390x_priv.h: Update libpfm4 Current with commit 8f2653b8e2e18bad44ba1acc7f92c825f226ef71 s390/cpumf: add support for IBM z14 counters Add counter definitions for the IBM z14 hardware model. With z14, the counters in the problem-state set are reduced and the counter first number version is increased accordingly. Now, the counters are processed depending on the counter facility versions. commit 96c0847f524b0b23e189478315587abf35cbf774 add CORE_SNOOP_RESPONSE event for Intel Skylake This is a newly disclosed event of Intel Skylake Core PMU. Based on download.01.org skylakex_core_v1.06.json event table. Thu Jan 25 19:23:45 2018 -0800 Stephane Eranian * src/libpfm4/config.mk, src/libpfm4/debian/changelog, src/libpfm4/lib/events/perf_events.h, src/libpfm4/lib/pfmlib_perf_event_pmu.c, src/libpfm4/tests/Makefile, src/libpfm4/tests/validate.c, src/libpfm4/tests/validate_perf.c: Update libpfm4 Current with commit 18e3c1f0254ab9323ac848643b8e042e65cf5259 Add minimal perf_events generic events validation This patch adds a small validation test suite for the generic PMU events provided by the perf_events interface. This is specific to Linux. This patch modifies the validate.c file to handle the new perf_events test suite.
2018-01-24 Vince Weaver * src/components/Makefile_comp_tests.target.in, src/components/perf_event_uncore/tests/Makefile, src/ctests/Makefile.recipies, src/ctests/Makefile.target.in, src/ftests/Makefile.target.in, src/utils/Makefile.target.in, src/validation_tests/Makefile.target.in: build: fix various LDFLAGS/CFLAGS issues. Issues were reported by Andreas Beckmann 2018-01-22 Vince Weaver * src/utils/papi_cost.c: utils: papi_cost: use getopt() to parse command line rather than open-coding one The existing code was fragile and also as far as I can tell the -b option hadn't worked for a long time. * src/utils/papi_cost.c: utils: papi_cost: various minor cleanups to the code * src/utils/cost_utils.c, src/utils/cost_utils.h, src/utils/papi_cost.c: utils: papi_cost: add -p option for printing boxplot percentages makes generating boxplots from the results much easier 2018-01-05 John Henry * release_procedure.txt: Fix typo in release_procedure.txt. Missing do