* 8f524875 RELEASENOTES.txt: Prepare release notes for a 5.4.0 release
* a8b4613b man/man1/papi_avail.1 man/man1/papi_clockres.1
man/man1/papi_command_line.1...: Rebuild the doxygen manpages
* fbea4897 src/run_tests_exclude.txt: Remove omptough from standard
run_tests.sh testing On some platforms (e.g. some AMD machines), if
OMP_NUM_THREADS is not set, then this test does not complete in a reasonable
time. That is because on these platforms too many threads are spawned inside
a large loop.
* 23c2705b src/components/bgpm/CNKunit/linux-CNKunit.c
src/components/bgpm/IOunit/linux-IOunit.h...: Fix number of counters and
events for each of the 5 BGPM units as well as emon on BG/Q
* 93c69ded src/linux-timer.c: Patch linux-timer.c to provide cycle counter
support on aarch64 (64-bit ARMv8 architecture) Thanks to William Cohen for
this patch and the message below: --- The aarch64 has cycle counter available
to userspace and this resource should be made available in papi. --- This
patch is not tested by the PAPI team (no easily available hardware).
* 038b2f31 src/components/rapl/linux-rapl.c: Extension of the RAPL energy
measurements on Intel via msr-safe.
(https://github.com/scalability-llnl/msr-safe) msr-safe is a linux kernel
module that allows user access to a whitelisted set of MSRs. It is nearly
identical in structure to the stock msr kernel module, with the important
exception that the "capabilities" check has been removed. The LLNL sysadmins
did a security review for the whitelist.
* 67e0b3f6 src/components/rapl/tests/rapl_basic.c: Fixed string null
* 2a1805ec src/components/perf_event/pe_libpfm4_events.c
src/components/perf_event/perf_event.c...: Patch to reduce cost of using
PAPI_name_to_code and add list of supported pmu names to papi_component_avail
output Thanks to Gary Mohr for this patch and its documentation: --- This
patch file contains code to look for either pmu names or component names on
the front of event strings passed to PAPI_name_to_code calls. If found the
name will be compared to values in each component info structure to see if
the component supports this event. If the pmu name or component name does
not match the values in the component info structure then there is no need to
call this component for this event. If the event string does not contain
either a pmu name or a component name then all components will be called.
This reduces the overhead in PAPI when converting event names to event codes
when either component names or pmu names are provided in the event name
string. To support the above checks, there is also code in this patch to add
an array of pmu names to the component info structure and modifications to
the core and uncore components to save the pmu names supported by each of
these components in this new array. This patch also adds code to the
papi_component_avail tool to display the pmu names supported by each active
* a91db97b src/components/net/linux-net.c src/components/nvml/linux-nvml.c
src/components/perf_event/perf_event.c...: This patch file contains
additional changes to resolve defects reported by Coverity. Thanks to Gary
Mohr for this patch. ------ This patch file contains additional changes to
resolve defects reported by Coverity. Mostly these just make sure that
character buffers get null terminated so they can be used as C strings.
There is also a change in the RAPL component to improve the message to
identify why the component may have been disabled. ------
* 3f913658 src/ctests/tenth.c: Fix percent error calculation in
ctests/tenth.c Thanks to Carl Love for this patch and the following
documentation: Do the division first then multiply by 100 when calculating
the percent error. This will keep the magnitude of the numbers closer. If
you multiply by 100 before dividing you may exceed the size the representable
number size. Additionally, by casting the values to floats then dividing we
get more accuracy in the calculation of the percent error. The integer
division will not give us a percent error less then 1.
* ba5ef24a src/papi_events.csv: PPC64, fix L1 data cache read, write and all
access equations. Thanks to Carl Love for this patch and the following
documentation: The current POWER 7 equations for all accesses over counts
because it includes non load accesses to the cache. The equation was changed
to be the sum of the reads and the writes. The read accesses to the two
units, can be counted with the single event PM_LD_REF_L1 rather then counting
the events to the two LSU units independently. The number of reads to the L1
must be adjusted by subtracting the misses as these become writes. Power 8
has four LSU units. The same equations can be used since PM_LD_REF_L1 counts
across all four LSU units.
* 882f5765 src/utils/native_avail.c: This patch file fixes two problems and
adds a performance improvement in "papi_native_avail. Thanks to Gary Mohr
for this patch and the following information: First it corrects a problem
when using the -i or -x options. The code was putting out too many event
divider lines (lines with all '-' characters). This has been corrected.
Second it improves the results from "papi_native_avail --validate" when being
used on SNBEP systems. This system has some events which require multiple
masks to be provided for them to be valid. The validate code was showing
these events as not available because it did not try to use the event with
the correct combination of masks. The fix checks to see if a valid form of
the event has not yet been found and if so then it tries the event with
specific combinations of masks that have been found to make these events
work. It also adds a check before trying to validate the event with a new
mask to see if a valid form of the event has already been found. If it has
then there is no need to try and validate the event again.
* 94985c8a src/config.h.in src/configure src/configure.in...: Fix build error
when no fortan is installed Thanks to Maynard Johnson for this patch. Fix up
the build mechanism to properly handle the case where no Fortran compiler is
installed -- i.e., don't build or install testlib/ftest_util.o or the ftests.
* de05a9d8 src/linux-common.c: PPC64 add support for the Power non
virtualized platform Thanks to Carl Love for this patch and the following
description: The Power 8 system can be run as a non-virtualized machine. In
this case, the platform is "PowerNV". This patch adds the platform to the
possible IBM platform types.
* 547f4412 src/ctests/byte_profile.c: byte_profile.c: PPC64 add support for
PPC64 Little Endian to byte_profile.c Thanks to Carl Love for this patch and
the following description: The POWER 8 platform is Little Endian. It uses
ELF version 2 which does not use function descriptors. This patch adds the
needed #ifdef support to correctly compile the test case for Big Endian or
Little Endian. This patch is untested by the PAPI developers (hardware not
* 14f70ebc src/ctests/sprofile.c: PPC64 add support for PPC64 Little Endian
to sprofile.c Thanks to Carl Love for this patch and the following
description: The POWER 8 platform is Little Endian. It uses ELF version 2
which does not use function descriptors. This patch adds the needed #ifdef
support to correctly compile the test case for Big Endian or Little Endian.
* 6d41e208 src/linux-memory.c: PPC64 sys_mem_info array size is wrong Thanks
to Carl Love for this patch and the following description: The variable
sys_mem_info is an array of type PAPI_mh_info_t. It is statically declared
as size 4. The data for POWER8 is statically declared in entry 4 of the
array which is beyond the allocated array. The array should be declared
without a size so the compiler will automatically determine the correct size
based on the number of elements being initialized. This patch makes the
* 061817e0 src/papi_events.csv: Remove stray Intel Haswell events from Intel
Ivy Bridge presets Thanks to William Cohen for this patch and the following
description: 'Commit 4c87d753ab56688acad5bf0cb3b95eae8aa80458 added some
events meant for Intel Haswell to the Intel Ivy bridge presets. This patch
removes those stray events. Without this patch on Intel Ivy Bridge machines
would see messages like the following: PAPI Error: papi_preset: Error finding
event L2_TRANS:DEMAND_DATA_RD. PAPI Error: papi_preset: Error finding event
L2_RQSTS:ALL_DEMAND_REFERENCES.' This patch was not tested by the PAPI team
(no appropriate hardware).
* 8bc1ff85 src/papi_events.csv: Update papi_events.csv to match libpfm
support for Intel family 6 model 63 (hsw_ep) Thanks to William Cohen for
this patch and its information: 'A recent September 11, 2014 patch (98c00b)
to the upstream libpfm split out Intel family 6 model 63 into its own name of
"hsw_ep". The papi_events.csv needs to be updated to support that new name.
This should have no impact for older libpfms that still identify Intel family
6 model 63 as "hswv" and "hsw_ep" map to the same papi presets.'
* 32a8b758 src/papi_events.csv: Support for the ARM X-Gene processor. Thanks
to William Cohen for this patch. The events for the Applied Micro X-Gene
processor are slightly different from other ARM processors. Thus, need to
define those presets for the X-Gene processor. Note: This patch is not
tested by the PAPI team because we do not have the appropriate hardware.
* 0a97f54e src/components/perf_event/pe_libpfm4_events.c
.../perf_event_uncore/peu_libpfm4_events.c src/papi_internal.c...: Thanks to
Gary Mohr for the patch --------------------- Fix for bugs in
PAPI_get_event_info when using the core and uncore components:
PAPI_get_event_info returns incorrect / incomplete information. The errors
were in how the code handled event masks and their descriptions so the errors
would not lead to program failures, just the possibility of incorrect
labeling of output. ---------------------
* 77960f71 src/components/perf_event/pe_libpfm4_events.c: Record
encode_failed error by setting attr.config to 0xFFFFFFF. This causes any
later validate attempts to fail, which is the desired behavior. Note: as of
2014/10/09 this has not been thoroughly tested, since a failure case is not
known. This patch simply copies a fix that was applied to the
* 00ae8c1e src/components/perf_event_uncore/peu_libpfm4_events.c: Based on
Gary Mohr's suggestion. If an event fails when we try to add it
(encode_failed), then we note that error by setting attr.config = 0xFFFFFF
for that event. Then, if there is a later check to validate this event, the
check will correctly return an error.
* 3801faaf src/utils/native_avail.c: Adding the NativeAvailValidate patch
provided by Gary Mohr. The problem being addresed is that if there were any
problems validating event masks, then those problems would result in the
entire event being invalid. The desired action was to test each event mask,
and if any basic event mask can make the event succeed, then the event should
be returned as valid and available. The solution is to create a large buffer
and write events and masks into this buffer as they are processed, tracking
their validity. At the end go back and mark the validity of the entire
event. This matches the standard output of PAPI.
* 6abc8196 src/components/emon/README src/components/emon/Rules.emon
src/components/emon/linux-emon.c: Emon power component for BG/Q
* 62b9f2a9 .../perf_event_uncore/perf_event_uncore.c: perf_event_uncore.c:
Check scheduability of events Patch by Gary Mohr, ------------------- This
patch file adds code to the uncore component to check and make sure that the
events being opened from an event set can be scheduled by the kernel. This
kind of code exists in the core component but was not moved into the uncore
component because it was felt that it would not be an issue with uncore.
Turns out the kernel has the same kind of issues when scheduling uncore
events. The symptoms of this problem will show that the kernel will report
that all events in the event set were opened successfully but when trying to
read the results one (or more) of the events will get a read failure. Seen
in the traces and on stderr (if papi configured with debug=yes) as a "short
read" error. The logic is slightly different than what is in the core
component because the events in the core component are grouped and the ones
in the uncore are not grouped. When events are grouped, you only need to
enable/disable and read results on the group leader. But when they are not
grouped you need to do these operations on each event in the event set.
* 8e6bf887 src/components/cuda/linux-cuda.c
src/components/lustre/linux-lustre.c...: Address coverity defects in
src/components Thanks Gary Mohr ---------------- This patch file contains
fixes for defects reported by Coverity in the /src/components directory.
Mostly these changes just make sure that char buffers get null terminated so
that when they get used as a C string (they usually do) we will not end up
with unpredictable results. A problem has been reported by one of our
testers that the lustre component produced very long event names with lots of
unprintable garbage in the names. It turns out this was caused by a buffer
that filled up and never got null terminated. Then string functions were
used on the buffer which picked up the whole buffer and lots more. These
changes fixed the problem.
* 266c61a4 src/linux-common.c src/papi_hl.c src/papi_internal.c...: Address
coverity reported issues in src/ Thanks to Gary Mohr -------------------
Changes in this patch file: linux-common.c: Add code to insure that cpu
info vendor_string and model_string buffers are NULL terminated strings.
Also insure that the value which gets read into mdi->exe_info.fullname gets
NULL terminated. This makes it safe to use the 'strxxx' functions on the
value (which is done immediately after it is read in). papi_hl.c: Fix call
to _hl_rate_calls() where the third argument was not the correct data type.
papi_internal.c: Add code to insure that event info name, short_desc, and
long_desc buffers are NULL terminated strings. papi_user_events.c: While
processing define symbols, insure that the 'local_line', 'name', and 'value'
buffers get NULL terminated (so we can safely use 'strxxx' functions on
them). Insure that the 'symbol' field in the user defined event ends up NULL
terminated. Rearrange code to avoid falling through from one case to the next
in a switch statement. Coverity flagged falling out the bottom of a case
statement as a potential defect but it was doing what it should.
sw_multiplex.c: Unnecessary test. The value of ESI can not be NULL when
this code is reached. x86_cpuid_info.c: The variable need_leaf4 is set but
not used. The only place it gets set returns without checking its value.
The place that checks its value never could have set its value non-zero.
* d72277fc release_procedure.txt: Update release procedure, check buildbot!
* a0e4f9a7 src/components/perf_event_uncore/perf_event_uncore.c: Uncore
component fix: By Gary Mohr, The line that sets exclude_guests in the uncore
component is there because it also there in the core component. But when I
look at it closer, it is an error in both cases. I will submit a patch to
remove them and get rid of some commented out code that no longer belongs in
the source. The uncore events do not support the concept of excluding the
host or guest os so we should never set either bit. But the core events do
support this concept and libpfm4 provides event masks "mg" and "mh" to
control counting in these domains. By default if neither is set then libpfm4
excludes counting in the guest os, if either "mh" or "mg" is provided as an
event mask then it is counted but the other is excluded, and if both are
provided then both are counted. So when the code forces the exclude_guest
bit to be set it breaks the ability to fully control what will happen with
the masks. I did not notice the uncore part of this problem when testing on
my SNB system, probably because we use an older kernel which tolerated the
bit being set (or maybe because HSW is handled differently).
* 4499fee7 src/papi_internal.c: Thanks to Gary Mohr for the patch
--------------------- Fix in papi_internal.c where it was trying to look up
the event name. The RAPL component found the event and returned a code but
papi_internal.c exited the enum loop for that component but failed to exit
the loop that checks all of the components. This caused it to keep looking
at other components until it fell out of the outer loop and returned an
error. In addition to the actual change, some formatting issues were fixed.
* f5835c26 src/components/perf_event/perf_event_lib.h: Bump NUM_MPX_COUNTERS
for linux-perf Uncore on SNB and newer systems has enough counters to go
beyond the 64 array spaces we allocate. This needs a better long term
solution. Reported by Gary Mohr --------------------- When running on snbep
systems with the uncore component enabled, if papi is configured with
debug=yes then the message "Warning! num_cntrs is more than num_mpx_cntrs"
gets written to stderr. This happens because the snbep uncore pmu's have a
total of 81 counters and PAPI is set to only accept a maximum of 64 counters.
This change increases the amount PAPI will accept to 100 (prevents the
warning message from being printed). ---------------------
* 07990f85 src/ctests/branches.c src/ctests/calibrate.c
src/ctests/describe.c...: ctests/ Address coverity reported defects Thanks
to Gary Mohr for the patch --------------------------------- he contents of
this patch file fix defects reported by Coverity in the directory
'papi/src/ctests'. The defect reported in branches.c was that a comparison
between different kinds of data was being done. The defect reported in
calibrate.c was that the variable 'papi_event_str' could end up without a
null terminator. The defects reported in describe.c, get_event_component.c,
and krentel_pthreads.c were that return values from function calls were being
stored in a variable but never being used. I also did a little clean-up in
describe.c. This test had been failing for me on Intel NHM and SNBEP but now
it runs and reports that it PASSED. ---------------------------------
* 74cb07df src/testlib/test_utils.c: testlib/test_util.c: Check enum return
value Addresses an issue found by Coverity. Thanks Gary Mohr,
---------------- The changes in this patch file fixes the only defect in the
src/testlib directory. The defect reported that the return value from a call
to PAPI_enum_cmp_event was being ignored. This call to enum events is to get
the first event for the component index passed into this function. It turns
out that the function that contains this code is only ever called by the
overflow_allcounters ctest and it only calls once and always passes a
component index of 0 (perf_event). So I added code to check the return value
and fail the test if an error was returned. ----------------
* 74041b3e src/utils/event_info.c: event_info utility: address coverity
defect From Gary Mohr -------------- This patch corrects a defect reported
by Coverity. The defect reported that the call to PAPI_enum_cmp_event was
setting retval which was never getting used before it got set again by a call
to PAPI_get_event_info. After looking at the code, I decided that we should
not be trying to get the next event inside a loop that is enumerating masks
for the current event. It makes more sense to break out of the loop to get
masks and let the outer loop that is walking the events get the next event.
* 62dceb9b src/utils/native_avail.c: Extend 'papi_native_event --validate' to
check for umasks.
* a5c2beb2 src/components/perf_event/pe_libpfm4_events.c
src/components/perf_event/perf_event.c...: perf_event[_uncore]: switch to
libpfm4 extended masks Patch due to Gary Mohr, many thanks.
------------------------------------ This patch file contains the changes to
make the perf_event and perf_event_uncore components in PAPI use the libpfm4
extended event masks. This adds a number of new masks that can be entered
with events supported by these components. They include a mask 'u' which can
be used control if counting in the user domain should be enabled, a mask 'k'
which does the same for the kernel domain, and a mask 'cpu' which will cause
counting to only occur on a specified cpu. There are also some other new
masks which may work but have not been tested yet.
* e76bbe66 src/components/perf_event/pe_libpfm4_events.c
.../perf_event_uncore/perf_event_uncore.c...: General code cleanup and
improved debugging Thanks to Gary Mohr ------------------- This patch file
does general code cleanup. It modifies the code to eliminate compiler
warnings, remove defects reported by coverity, and improve traces.
* 8f2a1cee src/utils/error_codes.c: error_codes utility: remove internal bits
Remove dependency on _papi_hwi_num_errors, just keep calling PAPI_strerror
until it fails. We shouldn't be using internal symbols anyways.
* a7136edd src/components/nvml/README: Update nvml README We changed the
options to simplify the configure line. Bad information is worse than no
* a37160c1 src/components/perf_event/perf_event.c: perf_event.c: cleanup
error messages Thanks to Gary Mohr ------------------- This patch contains
general cleanup code. Calls to PAPIERROR pass a string which does not need
to end with a new line because this function will always add one. New lines
at the end of strings passed to this function have been removed. These
changes also add some additional debug messages.
* bf55b6b7 src/papi_events.csv: Update HSW presets Thanks to Gary Mohr
------------------- Previously we sent updates to the PAPI preset event
definitions to improve the preset cache events on Haswell processors. In
checking the latest source, it looks like the L1 cache events changes did not
get applied quite right. Here is a patch to the latest source that will make
it the way we had intended.
* eeaef9fa src/papi.c: papi.c: Add information to API entry debuging Thanks
to Gary Mohr ------------------- This patch contains the results of taking a
second pass to cleanup the debug prints in the file papi.c. It adds entry
traces to more functions that can be called from an application. It also
adds lots of additional values to the trace entries so that we can see what
is being passed to these functions from the application.
* ee736151 src/run_tests.sh: run_tests.sh: more exclude cleanups Thanks Gary
Mohr ---------------- This patch removes an additional check for Makefiles in
the script. The exclude files are now used to prevent Makefiles from getting
run by this script. I missed this one when providing the previous patch to
make this change.
* c37afa23 src/papi_internal.c: papi_internal.c: change SUBDBG to INTDBG
Thanks to Gary Mohr ------------------- This patch contains changes to
replace calls to SUBDBG with calls to INTDBG in this source file. This
source file should be using the Internal debug macro rather than the
Substrate debug macro so that the PAPI debug filters work correctly. These
changes also add some new debug calls so that we will get a better picture of
what is going on in the PAPI internal layer. There are a few calls to the
SUBDBG macro that are in code that I have modified to add support for new
event level masks which are not converted by this patch. They will be
corrected when the event level mask patch is provided.
* e43b1138 src/utils/native_avail.c: native_avail.c: Bug fixes and updates
Thanks to Gary Mohr -------------------------------------------------- This
patch fixes a couple of problems found in the papi_native_avail program.
First change fixes a problem introduced when the -validate option was added.
This option causes events to get added to an event set but never removes
them. This change will remove them if the add works. This change also fixes
a coverity detected error where the return value from PAPI_destroy_eventset
was being ignored. Second change improves the delimitor check when
separating the event description from the event mask description. The
previous check only looked for a colon but some of the event descriptions
contain a colon so descriptions would get displayed incorrectly. The new
check finds the "masks:" substring which is what papi inserts to separate
these two descriptions. Third change adds code to allow the user to enter
events of the form pmu:::event or pmu::event when using the -e option in the