GrADSoft -- A Program-level approach to using the Grid

Mark Mazina, Rice University; Otto Sievert, Holly Dail of UCSD; with Graziano Obertelli of UCSD and John Mellor-Crummey of Rice.

February 24, 2001

Earlier source documents:

This document proposes an alternative to the library-managed grid invocation approach of the SC00 demo. Cooperative code development has been ongoing since late September 2000. The current code base documentation is here.

We believe the program-level approach is flexible enough to encapsulate the library-level approach, while sparing grid-aware library writers much of the "glue-it-together" work by providing standard object interfaces and default object instantiations. Besides reducing work for the library writers, we hope to address the following issues:

This description of the program-level approach is weak on the actual contract binding and runtime monitoring phases. Now that we feel we have some understanding of how to get *to* the contract binding phase, I plead for help from the experts (or future experts) in that area.

All comments / suggestions will be greatly appreciated.


The functionality / services provided by each component:

How does this compare to the July 2000 diagram developed at the ANL workshop?

This paper addresses a view of the GrADS universe that overlaps with the above diagram, which came out of the July workshop at ANL (aka Fran's diagram). First, the July diagram is more a high-level partitioning of the universe into logical blocks of responsibility, while we focus more on object interactions just above the actual API level. We don't have the expertise to address in any significant way

The upper-left "intimate knowledge of application area" portion of our diagram fills in the "mist" of what the PPS is doing in the ANL diagram. We also consolidated the Performance Modeling and Mapping and the Executable Preparation functionality into a single PPS binding phase. As already noted, this does not address how Contract Development interacts with the binding phase.

Our design adds an Application Manager which can be thought of as sitting between all the other components, coordinating information flow. While we show objects in the COP as being pulled directly out of repositories by the Application Manager and the binding phase, in practice, we expect the Information Management System will provide pointers or handles for retrieval of repository objects.

A very brief scenario.

Our User provides the Builder with source code (possibly annotated with resource selection or performance behavior information) *or* a handle to an existing IR Code object previously created for the user.

The Builder constructs any required objects and returns a handle to the COP. Recall the COP includes the IR Code object, AART Model object, Mapper object, and the Performance Model object.
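To make the COP handoff concrete, here is a minimal Python sketch of a Builder that stores the four COP components and returns an opaque handle. The class names `COP` and `Builder`, the in-memory dictionary standing in for a repository, and the UUID handles are all hypothetical; the text only says the Builder returns a handle and that, in practice, the Information Management System would provide such handles.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class COP:
    """Hypothetical bundle of the four COP components named in the text."""
    ir_code: Any
    aart_model: Any
    mapper: Any
    performance_model: Any


class Builder:
    """Hypothetical Builder: keeps each COP in a repository and returns a handle."""

    def __init__(self) -> None:
        # In-memory stand-in for a real object repository.
        self._repository: Dict[str, COP] = {}

    def build(self, source_or_ir: Any) -> str:
        # A real Builder would compile annotated source or reuse an existing
        # IR Code object; here the other components are placeholders.
        cop = COP(
            ir_code=source_or_ir,
            aart_model={"kind": "AART Model"},
            mapper={"kind": "Mapper"},
            performance_model={"kind": "Performance Model"},
        )
        handle = str(uuid.uuid4())
        self._repository[handle] = cop
        return handle

    def lookup(self, handle: str) -> COP:
        # The Application Manager and binding phase would retrieve objects this way.
        return self._repository[handle]


builder = Builder()
handle = builder.build("annotated source code")
print(builder.lookup(handle).ir_code)  # annotated source code
```

Handing out a handle rather than the objects themselves keeps the COP retrievable by later phases without the User ever holding the components directly.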

The User starts the Application Manager. This may be the standard GrADS Application Manager or a user-designed one. The Application Manager must be passed the handle to the COP, I/O location information, the problem size (specifically, information that allows memory requirements to be calculated), plus any desired performance metrics and other run-specific parameters.
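As one possible shape for these run-time inputs, the sketch below bundles them into a single request object. It assumes, purely for illustration, that the problem is a dense n x n matrix of double-precision values, so the memory requirement is 8 * n**2 bytes; the field names and the `max_runtime_s` metric are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

BYTES_PER_DOUBLE = 8


@dataclass
class RunRequest:
    """Hypothetical run-time inputs handed to the Application Manager."""
    cop_handle: str
    input_location: str
    output_location: str
    matrix_order: int                      # problem size: n for an n x n matrix
    max_runtime_s: Optional[float] = None  # example user performance metric
    extra: Dict[str, str] = field(default_factory=dict)  # other run parameters

    def __post_init__(self) -> None:
        if self.matrix_order <= 0:
            raise ValueError("matrix_order must be positive")

    def memory_bytes(self) -> int:
        # One dense n x n matrix of 8-byte doubles.
        return self.matrix_order ** 2 * BYTES_PER_DOUBLE


req = RunRequest("cop-handle", "in.dat", "out.dat", matrix_order=1000, max_runtime_s=60.0)
print(req.memory_bytes())  # 8000000
```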

The Application Manager uses the COP handle to retrieve a pointer to the AART Model, then uses the AART Model plus information about the actual data size to query the Resource Selector.

The Resource Selector takes the incoming model, plus information about the state of the Grid resources, and develops a proposed virtual machine. For simplicity, we assume just one in this example.
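A greedy Resource Selector is one simple way to turn the model plus grid state into a single proposed virtual machine. In this sketch the AART Model is reduced to hypothetical minimum and maximum host counts, and the grid state to each host's free memory and current speed; a real selector would use far richer models and monitoring data, and greedy selection by speed is not guaranteed to be optimal.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Host:
    name: str
    free_memory_bytes: int
    mflops: float  # current effective speed, e.g. as reported by a monitor


@dataclass(frozen=True)
class AARTModel:
    min_hosts: int  # hypothetical requirement fields
    max_hosts: int


def select_resources(model: AARTModel, required_bytes: int, grid: List[Host]) -> List[Host]:
    """Take the fastest hosts until both the host-count and the aggregate
    memory requirements are met. Returns [] if no such set is found."""
    chosen: List[Host] = []
    total_memory = 0
    for host in sorted(grid, key=lambda h: h.mflops, reverse=True):
        if len(chosen) >= model.max_hosts:
            break
        chosen.append(host)
        total_memory += host.free_memory_bytes
        if len(chosen) >= model.min_hosts and total_memory >= required_bytes:
            return chosen
    return []


grid = [
    Host("a", 2_000_000, 100.0),
    Host("b", 3_000_000, 300.0),
    Host("c", 5_000_000, 200.0),
]
vm = select_resources(AARTModel(min_hosts=1, max_hosts=3), 6_000_000, grid)
print([h.name for h in vm])  # ['b', 'c']
```

An empty result is the selector's way of saying the request cannot be satisfied with current resources, which the Application Manager would have to handle before going further.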

The Application Manager invokes methods in the Mapper object to fully instantiate said Mapper. It uses the AART Model and Virtual Machine as input in this process.
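As an illustration of what "fully instantiating" a Mapper might involve, the sketch below assigns contiguous blocks of rows to the hosts of the virtual machine in proportion to their speed. The row-block decomposition and the speed-proportional rule are assumptions chosen for the example, not something the text prescribes; leftover rows go to the hosts with the largest fractional shares so every row is assigned exactly once.

```python
from typing import Dict, List, Tuple


def map_rows(n_rows: int, hosts: List[Tuple[str, float]]) -> Dict[str, range]:
    """Assign contiguous row blocks to (name, speed) hosts, proportional to speed."""
    total_speed = sum(speed for _, speed in hosts)
    exact = [n_rows * speed / total_speed for _, speed in hosts]
    counts = [int(share) for share in exact]
    # Distribute rows lost to truncation, largest fractional part first.
    leftover = n_rows - sum(counts)
    by_fraction = sorted(range(len(hosts)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    layout: Dict[str, range] = {}
    start = 0
    for (name, _), count in zip(hosts, counts):
        layout[name] = range(start, start + count)
        start += count
    return layout


print(map_rows(10, [("b", 300.0), ("c", 200.0)]))  # {'b': range(0, 6), 'c': range(6, 10)}
```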

The Application Manager invokes methods in the Performance Model object to fully instantiate said Performance Model. It uses the Mapper, AART Model, and Virtual Machine as input in this process. At this point, the Application Manager can "run" the Performance Model and determine if the user's problem can be solved with the Grid resources available.
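A deliberately crude Performance Model makes the "can this problem be solved with the resources available" check concrete. This sketch assumes a fixed, hypothetical number of floating-point operations per row and ignores communication entirely, so the slowest host determines the predicted finish time; the result is then compared against the user's (hypothetical) runtime limit.

```python
from typing import Dict, Optional


def predict_runtime_s(rows_per_host: Dict[str, int],
                      mflops: Dict[str, float],
                      flops_per_row: float) -> float:
    """Predicted wall-clock time: the slowest host's compute time (no communication)."""
    return max(rows * flops_per_row / (mflops[host] * 1e6)
               for host, rows in rows_per_host.items())


def is_feasible(predicted_s: float, max_runtime_s: Optional[float]) -> bool:
    # With no user limit, any prediction is acceptable.
    return max_runtime_s is None or predicted_s <= max_runtime_s


t = predict_runtime_s({"b": 6, "c": 4}, {"b": 300.0, "c": 200.0}, flops_per_row=1e8)
print(t, is_feasible(t, 60.0))  # 2.0 True
```

Here both hosts finish at the same predicted time because the rows were split in proportion to speed; an unbalanced mapping would show up directly as a larger maximum.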

The Application Manager then calls the PPS Binding Phase, passing it the COP handle and the user's run-time information.

The PPS Binding Phase invokes the Mapper object for actual data layout, and creates optimized binaries. For some Grid-aware libraries, it may need to arrange for dynamic linking to pre-built libraries for specific platforms.
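The platform-dependent part of the binding phase could be planned as below: for each host in the virtual machine, link dynamically against a pre-built library when one exists for that host's platform, and otherwise fall back to generating an optimized binary. The platform strings, the library table, and the two-way link-or-compile decision are illustrative assumptions.

```python
from typing import Dict, Tuple


def plan_binding(host_platforms: Dict[str, str],
                 prebuilt_libraries: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return, per host, ("link", library) or ("compile", platform)."""
    plan: Dict[str, Tuple[str, str]] = {}
    for host, platform in host_platforms.items():
        library = prebuilt_libraries.get(platform)
        if library is not None:
            plan[host] = ("link", library)      # dynamic link to a pre-built library
        else:
            plan[host] = ("compile", platform)  # generate an optimized binary
    return plan


plan = plan_binding({"b": "sparc-solaris", "c": "x86-linux"},
                    {"x86-linux": "libsolver_x86.so"})
print(plan)  # {'b': ('compile', 'sparc-solaris'), 'c': ('link', 'libsolver_x86.so')}
```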

The Application Manager then tells the Binding Phase to pass pointers to the Performance Model object and the binaries to the Executor. I assume the binaries know where the input data is and where the output is to go; the Executor should not need to be concerned with that piece. Recall that at this point, barring dynamic re-configuration due to poor performance, our topology is known. Even with dynamic re-configuration, it seems more appropriate to make it the executable's task to get its input, perhaps by first getting a handle from the Grid Information Repository that says where the data is now.
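The division of responsibility in this step can be sketched as follows: the job handed to the Executor carries only the Performance Model handle and the binaries, while each executable asks an information service where its input currently lives. `ExecutorJob`, `InfoRepository`, and `resolve_input` are hypothetical stand-ins; the second lookup shows why this still works after a dynamic re-configuration has moved the data.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ExecutorJob:
    performance_model_handle: str
    binaries: Tuple[str, ...]  # one per host; no I/O locations on purpose


class InfoRepository:
    """Hypothetical stand-in for the Grid Information Repository."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def publish(self, dataset: str, location: str) -> None:
        self._locations[dataset] = location

    def locate(self, dataset: str) -> str:
        return self._locations[dataset]


def resolve_input(repo: InfoRepository, dataset: str) -> str:
    # Called by the executable itself, not by the Executor.
    return repo.locate(dataset)


repo = InfoRepository()
repo.publish("matrix-A", "hostX:/data/A")
job = ExecutorJob("perf-model-handle", ("bin/b", "bin/c"))
print(resolve_input(repo, "matrix-A"))  # hostX:/data/A
repo.publish("matrix-A", "hostY:/data/A")  # data moved during re-configuration
print(resolve_input(repo, "matrix-A"))  # hostY:/data/A
```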

Then BOB finishes everything (recall the Big Opaque Box described above). Fail-restart should NOT discard the Performance Model object, since off-line analysis will be desired.
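One way to honor the "do not discard the Performance Model" rule is to archive the model after every attempt, successful or not, before deciding whether to restart. The `run_with_restart` helper, the attempt limit, and the archive record layout below are hypothetical.

```python
from typing import Any, Callable, Dict, List


def run_with_restart(run_once: Callable[[], bool],
                     performance_model: Dict[str, Any],
                     archive: List[Dict[str, Any]],
                     max_attempts: int = 3) -> bool:
    """Retry a run; keep the Performance Model for off-line analysis on every attempt."""
    for attempt in range(1, max_attempts + 1):
        succeeded = run_once()
        archive.append({"attempt": attempt,
                        "succeeded": succeeded,
                        "performance_model": performance_model})
        if succeeded:
            return True
    return False


outcomes = iter([False, True])  # first attempt fails, restart succeeds
archive: List[Dict[str, Any]] = []
print(run_with_restart(lambda: next(outcomes), {"predicted_s": 2.0}, archive))  # True
print(len(archive))  # 2
```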
