A Program-level GrADS approach to using the Grid

Mark Mazina, Rice University

August 24, 2000

Much of this is an expansion of the August 6, 2000 Compiler/Resource Selector Interaction Scenario by Holly Dail, Otto Sievert, & Graziano Obertelli, which documented various discussions at the Chicago Workshop, including an extended discussion of resource selection among Holly Dail, Otto Sievert, Mark Mazina and John Mellor-Crummey at the end of the workshop. Without their work, my task would have driven me to hide under the covers.

This document proposes an alternative to the library-call grid invocation approach proposed by Jack Dongarra, et al. at UTK. I understand Jack's proposal is focused on that critical first demo, and I do not mean to suggest the GrADS PPS crowd has split into program-level vs. library-level camps. (I obviously believe the PPS effort includes compiler people and library people.)

We believe the program-level approach is flexible enough to encapsulate the library-level approach while sparing grid-aware library writers much "glue-it-together" work by providing standard object interfaces and default object instantiations. Besides reducing work for the library writers, we hope to address the following issues:

This description of the program-level approach is weak on the actual contract binding and runtime monitoring phases. Now that we feel we have some understanding of how to get *to* the contract binding phase, I plead for help from the experts (or future experts) in that area.

All comments / suggestions will be greatly appreciated.




Definitions:



The functionality / services provided by each component:


A very brief scenario.

Our User provides the Builder with source code (which may be annotated with resource selection or performance behavior information) *or* a handle to an existing IR code object previously created for the user.

The Builder constructs any required objects and returns a handle to the COP. Recall the COP includes the IR Code Object, AART (resource selection) Object, Mapper Object, and the Performance Model Object.
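As a minimal sketch of this Builder step, the COP can be pictured as a record bundling the four objects just listed, with the Builder handing out integer handles. Every class, field, and method name below is hypothetical, as is the dictionary-backed handle table; nothing here is a defined GrADS interface.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Optional


@dataclass
class IRCodeObject:
    source: str                          # stand-in for real intermediate code


@dataclass
class AART:
    problem_size: Optional[int] = None   # filled in later by the Application Manager


@dataclass
class Mapper:
    layout: Optional[str] = None         # decided during the binding phase


@dataclass
class PerformanceModel:
    predicted_seconds: Optional[float] = None


@dataclass
class COP:
    ir_code: IRCodeObject
    aart: AART = field(default_factory=AART)
    mapper: Mapper = field(default_factory=Mapper)
    performance_model: PerformanceModel = field(default_factory=PerformanceModel)


class Builder:
    """Hypothetical Builder: hands out integer handles to the objects it owns."""

    def __init__(self) -> None:
        self._handles = count(1)
        self._ir: Dict[int, IRCodeObject] = {}
        self._cops: Dict[int, COP] = {}

    def compile_source(self, source: str) -> int:
        """Create an IR code object from source and return its handle."""
        handle = next(self._handles)
        self._ir[handle] = IRCodeObject(source)
        return handle

    def build(self, source: Optional[str] = None,
              ir_handle: Optional[int] = None) -> int:
        """Accept source code *or* an existing IR handle; return a COP handle."""
        if (source is None) == (ir_handle is None):
            raise ValueError("provide exactly one of source or ir_handle")
        if source is not None:
            ir_handle = self.compile_source(source)
        handle = next(self._handles)
        self._cops[handle] = COP(ir_code=self._ir[ir_handle])
        return handle

    def lookup(self, cop_handle: int) -> COP:
        """Resolve a COP handle back to the COP it names."""
        return self._cops[cop_handle]
```

A user-designed Application Manager would then need only the handle, for example `cop = builder.lookup(builder.build(source="program main"))`.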

The User starts the Application Manager. This may be the standard GrADS Application Manager or a user-designed one. The Application Manager needs to be passed the handle to the COP, I/O location information, the problem size information (specifically, information to allow calculation of memory requirements), plus any desired performance metrics, such as completion time.

The Application Manager uses the handle to the COP to retrieve a pointer to the AART and feeds the AART information about the actual data size. It then passes the more completely specified AART, which has, in effect, become an abstract virtual machine model, to the Resource Selector.

The Resource Selector takes the incoming model, plus information about the state of the Grid resources, and develops a proposed concrete virtual machine with a topology that maps onto the virtual topology the AART presented to it.
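To make the selection step concrete, here is a deliberately simplified sketch. It assumes the fully specified AART reduces to a list of per-node memory requirements and that Grid state is a list of hosts with free memory; it ignores link topology entirely, which a real Resource Selector could not. The names `AbstractVM`, `Host`, and `select_resources` are hypothetical, and the greedy best-fit strategy is my choice for illustration, not a proposed algorithm.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AbstractVM:
    """Hypothetical form of a fully specified AART."""
    node_memory_mb: List[int]        # memory each virtual node needs


@dataclass
class Host:
    name: str
    free_memory_mb: int


def select_resources(model: AbstractVM,
                     grid_state: List[Host]) -> Optional[Dict[int, str]]:
    """Greedy best-fit: map virtual node index -> host name, or None if no fit."""
    # Work on a sorted copy so the caller's view of the Grid is not modified.
    available = sorted(grid_state, key=lambda h: h.free_memory_mb)
    mapping: Dict[int, str] = {}
    # Place the most demanding virtual nodes first.
    order = sorted(range(len(model.node_memory_mb)),
                   key=lambda i: model.node_memory_mb[i], reverse=True)
    for i in order:
        need = model.node_memory_mb[i]
        # Smallest remaining host that still satisfies the requirement.
        fit = next((h for h in available if h.free_memory_mb >= need), None)
        if fit is None:
            return None              # no concrete virtual machine can be proposed
        mapping[i] = fit.name
        available.remove(fit)        # one virtual node per host in this sketch
    return mapping
```

Returning `None` rather than a partial mapping lets the Application Manager decide whether to relax the request or report failure to the user.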

The Application Manager then calls the PPS Binding Phase, passing it the COP handle and the user's run-time information.

The PPS Binding Phase invokes the Mapper object to develop IR code for actual data layout, does final instantiation of the Performance Model Object and creates optimized binaries. For some Grid-aware libraries, it may need to arrange for dynamic linking to pre-built libraries for specific platforms. As early in its effort as possible, the Binding Phase should verify that the performance objectives can be met, assuming model correctness. Fail-restart occurs by notification to the Application Manager that some objective cannot be met based on the model.
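The early check and fail-restart notification might look like the following sketch. It assumes an idealized performance model (predicted time equals total work divided by aggregate host speed) and uses an exception as the notification mechanism; both are illustrative assumptions, and all names (`SimplePerformanceModel`, `ObjectiveNotMet`, `bind`, `run_application_manager`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class ObjectiveNotMet(Exception):
    """Notification that the model predicts a performance objective will be missed."""


@dataclass
class SimplePerformanceModel:
    work_units: float
    predicted_seconds: Optional[float] = None
    history: List[float] = field(default_factory=list)  # every prediction made so far


def bind(model: SimplePerformanceModel, host_speeds: List[float],
         deadline_seconds: float) -> str:
    """Instantiate the model for these hosts; fail before any expensive work."""
    if not host_speeds:
        raise ValueError("a concrete virtual machine needs at least one host")
    # Assumed model: perfect speedup across hosts (work units per second each).
    predicted = model.work_units / sum(host_speeds)
    model.predicted_seconds = predicted
    model.history.append(predicted)
    if predicted > deadline_seconds:
        raise ObjectiveNotMet(f"predicted {predicted}s exceeds {deadline_seconds}s")
    # Only now would data layout and optimized binaries be generated.
    return f"binary-for-{len(host_speeds)}-hosts"


def run_application_manager(model: SimplePerformanceModel,
                            candidate_machines: List[List[float]],
                            deadline_seconds: float) -> Optional[str]:
    """Try each proposed machine in turn; restart on failure, keeping the model."""
    for speeds in candidate_machines:
        try:
            return bind(model, speeds, deadline_seconds)
        except ObjectiveNotMet:
            continue                 # fail-restart: request another machine
    return None                      # no proposed machine met the objective
```

Because the same model object survives every restart, its `history` records each rejected prediction and remains available for later inspection.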

The Application Manager, assuming we are not re-starting, then passes pointers to the Performance Model object and the binaries to the Executor. I assume the binaries know where the input data is and where the output is to go; the Executor should not need to be concerned with that piece. Recall that at this point, barring dynamic re-configuration due to poor performance, our topology is known. Even with dynamic re-configuration, it seems more appropriate to make it the executable's task to get input, perhaps by first getting a handle from the Grid Information Repository as to where the data is now.

Then BOB finishes everything (recall the Big Opaque Box described above). Fail-restart should NOT discard the Performance Model object; off-line analysis will be desired.
