1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249
|
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" >
<head>
<title>Component PAPI Technology Pre-Release</title>
</head>
<body style="text-align: left">
<h1 style="text-align: center">
Component PAPI Technology Pre-Release (PAPI 3.9.0)</h1>
<h2>
</h2>
<h2>
Introduction</h2>
<p>
Component PAPI, or PAPI-C is an attempt to make the PAPI performance monitoring
programming interface available for more than just the hardware performance counters
found on the cpu. Performance counters are finding their way into a number of other
components of High Performance computing systems, such as network or memory controllers,
power or temperature monitors or even specialized processing units that may find
their way into future multicore processor implementations.</p>
<p>
The primary technical challenge for PAPI is to sever the very tight coupling between
the hardware independent layer of PAPI code and the hardware specific code necessary
to interface with the counters, and to do this without sacrificing performance.
Secondarily, once these two code layers have been functionally separated, the hardware
independent, or <em>Framework</em> layer must be modified to simultaneously support
multiple hardware dependent substrate layers, or <em>Components</em>.</p>
<p>
These changes cannot be accomplished without some modification of the PAPI user
interface. We have tried to keep these modifications minimal and transparent, and
have been successful at preserving most backward compatibility for applications
and tools that just want access to the cpu counters. We have introduced a small
number of new APIs and functionality to support the new abstractions of multiple
components. We have also modified the function of some APIs and data structures
to support a multi-Component landscape. These changes have been tabluated at the
bottom of this document, and are discussed below.</p>
<p>
This release of PAPI-C is a technology pre-release, implemented and tested on a
small number of platforms and components. The platforms include Intel Pentium III,
Pentium 4, Core2Duo, Itanium (I and II) and AMD Opteron. Work is underway to port
the other platforms currently supported by PAPI. Components (beside the cpu component)
currently available include an ACPI component for monitoring temperature where available;
a Myrinet MX component; and a 'toy' component that monitors network traffic as reported
in the linux/unix sbin/ifconfig directory.</p>
<h2>
API and Abstraction Changes</h2>
<h3>
</h3>
<h3>
EventSets</h3>
<p>
One of the key organizing data structures in PAPI is the EventSet. This serves as
a repository for all the events and settings necessary to define a counting regime.
EventSets are created, modified, added to, deleted from, and disposed of over the
life of a PAPI counting session. In traditional PAPI, multiple EventSets can exist
simultaneously, but only one can be active at any time. PAPI-C extends the concept
of an EventSet by binding it to a specific numbered Component. This component index
then signals which component the EventSet is paired with. Multiple EventSets can
be defined and active simultaneously, but only one EventSet per Component can be
enabled. We have adopted a <em>late-binding</em> model for associating an EventSet
with a Component. No changes are needed in the API call for creating an EventSet,
and the Set is bound to a Component when the first event is added. Any additional
events must then belong to the same Component. Occasionally it is desirable to modify
settings in an EventSet before an event is added. In this case, a new API, PAPI_assign_eventset_component(),
has been introduced to make this binding explicit. For now, PAPI Preset events are
only defined for the cpu component, which by convention is always component 0.</p>
<h3>
Events</h3>
<p>
For now, PAPI Preset events are only defined for the cpu component, which by convention
is always component 0. Since these event names and codes are available directly
in papi.h, they will continue to work with no modifications. Event codes for other
components are always mapped to native events available on that component and are
bound to the component with a 4-bit component ID field embedded in the event code
itself. These codes cannot be determined a priori, since they are an opaque id used
only by PAPI. They must be obtained by a call to PAPI_event_name_to_code(), which
will search all available native event tables and return a properly encoded value
if the event exists. As described above, the first event added binds an EventSet
to a Component; all following added events must belong to the same Component.</p>
<h3>
Component Housekeeping</h3>
<p>
A number of changes were made to support various housekeeping chores associated
with multiple Components. A new API, PAPI_num_components(), was added to provide
the number of active components in the current library. Also, PAPI_get_component_info()
replaces PAPI_get_substrate_info() and provides detailed information for a given
component. As mentioned above, since the cpu component is always assumed to exist,
it is always assigned as component 0. In addition, component 0 is always relied
on to provide the high resolution timer functionality behind the following APIs:
PAPI_get_real_cyc(), PAPI_get_virt_cyc(), PAPI_get_real_usec(), qnd PAPI_get_virt_usec().
One call, PAPI_num_hwctrs(), still functions as it did in traditional PAPI to provide
the number of physical cpu counters. It has been augmented by the new PAPI_num_cmp_hwctrs(),
to provide the number of counters for a specified component.</p>
<h3>
PAPI Options</h3>
<p>
The bulk of the visible changes in PAPI-C have occured in the general area of setting
and getting option values. Options can be either system-wide or component-specific.
This didn't matter in traditional PAPI with only one component. Now it does. In
order to preserve backward compatibility with code that only accesses the cpu component,
the PAPI_get_opt() and PAPI_set_opt() calls behave as before, with an implicit component
index of 0 for those options that are bound to a component. For those options that
are component specific, PAPI_get_cmp_opt() and PAPI_set_cmp_opt() take an addition
component index argument. Futher, two new convenience functions, PAPI_set_cmp_domain()
and PAPI_set_cmp_granularity() have been added for component specific setting of
these options. More subtly, two of the cases handled by PAPI_set_opt() now have
additional information included in the passed data structures. Both PAPI_DEFDOM
and PAPI_DEFGRN cases now require a component index to be provided in the passed
data structure, since available domains are component dependent and may differ widely
between cpu domains and, for example, network domains.</p>
<h2>
Building and Linking</h2>
<p>
There are very few visible changes in the build environment. As before, cpu components
are automatically detected by <em>configure</em> and included in the build. As new
components are added, each is supported by a<br />
<strong>--with-<cmp> = yes<br />
</strong>option on the <em>configure</em> command line. Currently supported component
options include:<br />
<strong>--with-acpi = yes</strong><br />
<strong>--with-mx = yes</strong><br />
<strong>--with-net = yes<br />
</strong>
</p>
<p>
It is intended that in the future, where possible, component support will be autodetected
by <em>configure</em> in a fashion similar to cpu architectures and automatically
included in the <em>make</em>.</p>
<p>
The <em>make</em> process currently compiles and links the sources for all requested
components into a single binary. This process is automatic and transparent once
the components are specified in the <em>configure</em> step. It is intended that
future releases will <em>make</em> each component independently and allow for dynamic
component loading at runtime.<br />
</p>
<h2>
Application Changes</h2>
<p>
Very few changes are needed to run existing PAPI-enabled applications under PAPI-C.
The discussion below highlights the changes we found necessary in porting our test
applications to the modified API:</p>
<ul>
<li>Any calls to PAPI_get_substrate_info() must be converted to calls to PAPI_get_component_info()
with a corresponding change in the type of the returned data structure. Correspondingly,
calls to PAPI_get_opt(SUBSTRATE_INFO) should be changed to PAPI_get_opt(COMPONENT_INFO)
or PAPI_get_cmp_opt(COMPONENT_INFO,0).</li>
<li>If an application creates an EventSet and then tries to set the domain or the mutliplex
options before adding events, the code will error. The fix is to call PAPI_assign_eventset_component()
with the desired component prior to setting options.</li>
<li>Calls to PAPI_set_opt() with either PAPI_DEFDOM or PAPI_DEFGRN options must set
the def_cidx field in the passed data structure.<br />
</li>
</ul>
<h2>
Summary of Changes</h2>
<h3>
New APIs:</h3>
<ul>
<li>const PAPI_component_info_t *PAPI_get_component_info(int
cidx)<br />
given a valid index, returns a component info structure as defined in papi.h<br />
returns NULL if out of range.<br />
Replaces PAPI_get_substrate_info()</li>
<li>int PAPI_num_components(void)</li>
<li>int PAPI_assign_eventset_component(int EventSet, int cidx)<br />
Explicitly bind an eventset to a component before events are added.<br />
Occasionally needed prior to manipulating eventset parameters like domain or multiplexing.</li>
<li>int PAPI_set_cmp_domain(int domain, int cidx) </li>
<li>int PAPI_set_cmp_granularity(int granularity, int cidx) </li>
<li>int PAPI_num_cmp_hwctrs(int cidx) </li>
<li>int PAPI_get_cmp_opt(int option, PAPI_option_t * ptr, int cidx)<br />
Handles options that explicitly require a component index:<br />
PAPI_DEF_MPX_USEC (shouldn't this one be system level?)<br />
PAPI_MAX_HWCTRS<br />
PAPI_MAX_MPX_CTRS<br />
PAPI_DEFDOM<br />
PAPI_DEFGRN<br />
PAPI_SHLIBINFO (shouldn't this one be system level?)<br />
PAPI_COMPONENTINFO </li>
</ul>
<h3>
</h3>
<h3>
Modified API Functionality:</h3>
<ul>
<li>int PAPI_enum_event(int *EventCode, int modifier)
<br />
Parses EventCode for component index.<br />
Enumerates only across component specified in EventCode</li>
<li>int PAPI_create_eventset(int *EventSet)
<br />
Eventsets are bound to components. This is ordinarily a late-binding process that
occurs when an event is added.</li>
<li>int PAPI_set_domain(int domain)
<br />
Implicitly sets domain of component 0; deprecated - maintained for backward compatibility.</li>
<li>int PAPI_set_granularity(int granularity)
<br />
Implicitly sets granularity of component 0; deprecated - maintained for backward
compatibility.</li>
<li>int PAPI_set_opt(int option, PAPI_option_t * ptr)
<br />
The PAPI_DEFDOM and PAPI_DEFGRN options now include a mandatory component index
field in the data structure</li>
<li>int PAPI_num_hwctrs(void)
<br />
Implicitly returns number of counters for component 0; deprecated - maintained for
backward compatibility.</li>
<li>int PAPI_get_opt(int option, PAPI_option_t * ptr)<br />
Behaves as before for options that don't require a component index;<br />
Implicitly returns values for component 0 for the following options:<br />
PAPI_DEF_MPX_USEC<br />
PAPI_MAX_HWCTRS<br />
PAPI_MAX_MPX_CTRS<br />
PAPI_DEFDOM<br />
PAPI_DEFGRN<br />
PAPI_SHLIBINFO (shouldn't this one be system level?)<br />
PAPI_COMPONENTINFO</li>
<li>The following 4 APIs always call the timer functions found in the cpu component
(component 0):<br />
PAPI_get_real_cyc()<br />
PAPI_get_virt_cyc()<br />
PAPI_get_real_usec()<br />
PAPI_get_virt_usec()</li>
</ul>
<h3>
Structural Changes:</h3>
<h3>
</h3>
<ul>
<li>Component 0 is always assumed to be the traditional cpu counter component.</li>
<li>Event codes now contain an embedded 4 bit COMPONENT_INDEX field to id one of 16
components<br />
No error checking is done yet to guarantee less than 16 components.</li>
<li>The PAPI_SUBSTRATEINFO case for PAPI_get_opt has been changed to PAPI_COMPONENTINFO<br />
multiplex info was moved from the component level to a separate PAPI_mpx_info_t
structure included at the hardware info level.</li>
<li>The PAPI_domain_option_t and PAPI_granularity_option_t structures now have a component
index field, def_cidx, required when setting default domains or granularities.</li>
</ul>
<h3>
New Error Messages:</h3>
<ul>
<li>PAPI_ENOINIT - PAPI hasn't been initialized yet </li>
<li>PAPI_ENOCMP - Component Index isn't set or out of rangeint PAPI_create_eventset(int
*EventSet) </li>
</ul>
</body>
</html>
|