Description
-----------

LoopTool is a source-to-source transformation tool designed to improve the
memory performance of large-scale scientific applications. A core set of
loop transformations is implemented in the tool in an integrated fashion.
LoopTool is intended for scientists and application programmers who use
hand-transformation methods to improve application performance. Many of
these hand-transformation techniques are performed automatically by the
tool.

One of the key features of LoopTool is the precise control it provides over
each optimization parameter. Through source code annotations and command
line parameters, a user can select individual transformations and
parameters for each loop in the program. LoopTool thus provides much
finer-grained control of transformations than is available in most
commercial compilers.

Current Release
_______________

The current release is an i686-Linux binary. The language supported by this
version is F77.

Usage:
_____

LoopTool [-h] -M modulename [-options]

  -h    Print usage information.

  -M modulename
        modulename is the name of an annotated F77 source file. See the
        Source Code Annotation section for a description of the annotation
        mechanism.

  -g    Apply guard-free code generation. This option must be turned on if
        unroll-and-jam is applied to one or more loops. Guard-free code
        generation can also be set for individual loops using source code
        annotation.

  -j    Apply unroll-and-jam. The unroll factor for each loop is taken from
        the source code annotation. The -g flag must also be turned on.

  -r    Apply storage reduction.

  -b    Apply blocking. The blocking factor for each loop is taken from the
        source code annotation.

Fusion is performed automatically.
The aggressiveness of the fusion algorithm and the candidate loops for
fusion are specified using source code annotation.

Source Code Annotation
______________________

Information about transformations and their parameters is supplied to
LoopTool through annotations in the source code. An annotation statement
appears in the code as a comment line that begins with the directive
keyword dir$. An annotation statement can be associated with any loop in
the program. To associate an annotation statement with a particular loop,
place the statement immediately before the loop header statement.

The format of an annotation statement is as follows:

    cdir$ transformation_name transformation_param

where the pair transformation_name and transformation_param is one of the
following:

    fuse  fuse_id
    uj    factor
    block factor
    clip  flag

The semantics of each of these annotations are described below.

fuse fuse_id: The fuse directive specifies a fusion group for a loop.
LoopTool will attempt to fuse loops that are in the same fusion group and
at the same nesting depth. Any integer >= 1 is a valid value for fuse_id.
If no fuse_id is specified for a loop, the loop is put into its own fusion
group and is not fused with any other loop.

uj factor: The uj directive specifies the unroll amount for a loop. If the
directive is associated with an outer loop, the value of factor is used for
unroll-and-jamming that outer loop. If it is associated with the innermost
loop, it is used for unrolling that loop. A value of n for factor means
that the loop in question will be unrolled n - 1 times, resulting in n loop
bodies. No unrolling is performed if the uj directive is not present.

block factor: The block directive specifies the blocking factor for a loop.
Any value >= 1 is legal for the blocking factor.

clip flag: The clip directive specifies whether guards will be lifted along
a particular dimension of a loop nest. A value of 1 turns on guard-free
code generation.
The default is 0. The clip flag should be set to 1 when performing
unroll-and-jam. This is required for generating preloops for loops whose
iteration counts are not exact multiples of the unroll amount.

Examples:
________

The following is an example of using LoopTool with the Livermore 14 kernel
(liv14.f).

1. Annotate the code with fusion, blocking and unroll-and-jam directives:

c dir$ fuse 1
c dir$ block 16
c dir$ uj 4
      do k= 1,n
         vx(k)= 0.0d0
         ...
      enddo
c dir$ fuse 1
      do k= 1,n
         vx(k)= vx(k) + ex1(k) + (xx(k) - xi(k)) * dex1(k)
         ...
      enddo
c dir$ fuse 1
      do k= 1,n
         rh(INT(ir(k)))= rh(INT(ir(k))) + fw - rx(k)
         ...
      enddo

2. Run LoopTool on the annotated F77 file:

      LoopTool -M liv14.f -g -j -b

3. The transformed source code is written to liv14.gen.f.
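
The effect of a "uj 4" annotation on an innermost loop, together with the
preloop needed when the trip count is not a multiple of 4, can be pictured
with the hand-written sketch below. It is not LoopTool output, and the code
LoopTool actually generates may differ. The program name ujdemo, the arrays
vx, vref and ex1, the loop body, and the choice n = 10 are all assumptions
made for illustration only.

```fortran
c     Hand-written illustration only; NOT actual LoopTool output.
c     All names and the loop body are hypothetical.
      program ujdemo
      integer n
      parameter (n = 10)
      double precision vx(n), vref(n), ex1(n)
      integer k, npre, nbad

c     Input data and a reference result from the original loop.
      do k = 1, n
         ex1(k) = dble(k)
         vref(k) = 2.0d0 * ex1(k)
      enddo

c     Preloop: run the mod(n,4) leftover iterations first, so the
c     remaining trip count is an exact multiple of the factor.
      npre = mod(n, 4)
      do k = 1, npre
         vx(k) = 2.0d0 * ex1(k)
      enddo

c     Main loop for "uj 4": unrolled 3 times, giving 4 bodies.
      do k = npre + 1, n, 4
         vx(k)   = 2.0d0 * ex1(k)
         vx(k+1) = 2.0d0 * ex1(k+1)
         vx(k+2) = 2.0d0 * ex1(k+2)
         vx(k+3) = 2.0d0 * ex1(k+3)
      enddo

c     Count elements that differ from the reference result.
      nbad = 0
      do k = 1, n
         if (vx(k) .ne. vref(k)) nbad = nbad + 1
      enddo
      print *, 'mismatches: ', nbad
      end
```

With n = 10 the preloop covers k = 1,2 and the unrolled loop covers
k = 3..10 in two passes, so the program should report zero mismatches. The
sketch uses the enddo form already used in the example above, which is a
common extension rather than strict F77.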