GBIS Benchmark Header File: poly3


   ==================================================================
   ===                                                            ===
   ===      GENESIS/PARKBENCH Distributed Memory Benchmarks       ===
   ===                                                            ===
   ===                          POLY3                             ===
   ===                                                            ===
   ===                    R-infhat and F-half                     ===
   ===                  Communication Bottleneck                  ===
   ===                                                            ===
   ===               Versions:  PVM + Std F77                     ===
   ===                                                            ===
   ===               Author     : Roger Hockney                   ===
   ===     Department of Electronics and Computer Science         ===
   ===               University of Southampton                    ===
   ===               Southampton SO9 5NH, U.K.                    ===
   ===     fax.:+44-703-593045   e-mail:rwh@uk.ac.soton.pac       ===
   ===                                  vsg@uk.ac.soton.ecs       ===
   ===                                                            ===
   ===          Last update: November 1993; Release: 1.0          ===
   ===                                                            ===
   ==================================================================


1. Description
--------------

 This benchmark measures the severity of communication bottlenecks by
 varying the amount of arithmetic per word communicated, which is called
 the computational intensity of the loop. The performance for long loop
 (vector) lengths, RINF, is represented as:
 
               RINF = RHAT/(1 + FHALF/F)                          (1)

 where   RHAT = peak Mflop/s rate of arithmetic pipeline
                approached as F goes to infinity
   and      F = computational intensity
              = ratio of floating-point operations to memory references
        FHALF = F required to obtain RINF=RHAT/2

 The loop executed is polynomial evaluation by Horner's rule, where the
 computational intensity is equal to the order of the polynomial.

 The order, and hence F, is increased from 1 to 10, and the results for RINF
 for each value of F are fitted by least squares to equation (1), giving
 the best value of the parameters RHAT (R-infinity-hat) and FHALF
 (half-performance intensity) for this fit. 
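
 The fit can be illustrated with a short sketch. It assumes equation (1)
 is rearranged into the straight line F/RINF = F/RHAT + FHALF/RHAT, so
 that an ordinary linear least-squares fit of F/RINF against F yields
 RHAT from the slope and FHALF from the intercept. The function name and
 the synthetic data are hypothetical, and the fitting procedure in the
 benchmark source may differ in detail.

```python
def fit_rhat_fhalf(f_values, rinf_values):
    """Fit RINF = RHAT/(1 + FHALF/F) using the linear form
    F/RINF = F/RHAT + FHALF/RHAT (an assumed linearisation)."""
    n = len(f_values)
    ys = [f / r for f, r in zip(f_values, rinf_values)]
    mean_f = sum(f_values) / n
    mean_y = sum(ys) / n
    sxx = sum((f - mean_f) ** 2 for f in f_values)
    sxy = sum((f - mean_f) * (y - mean_y) for f, y in zip(f_values, ys))
    slope = sxy / sxx                    # estimate of 1/RHAT
    intercept = mean_y - slope * mean_f  # estimate of FHALF/RHAT
    return 1.0 / slope, intercept / slope


# Synthetic, noise-free data generated from RHAT = 40, FHALF = 20
# (illustrative values only, not measurements).
f_values = list(range(1, 11))
rinf_values = [40.0 / (1.0 + 20.0 / f) for f in f_values]
rhat, fhalf = fit_rhat_fhalf(f_values, rinf_values)
print("RHAT = %.3f Mflop/s, FHALF = %.3f" % (rhat, fhalf))
```

 With measured data, a negative intercept in this straight-line form
 corresponds to a negative fitted FHALF.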

 In POLY3, a master node sends a vector of data over a communication link 
 to a slave processor, which evaluates the polynomial for each element 
 of the vector. The result vector is then returned to the master node.
 FHALF is then a measure of the ratio of the arithmetic performance 
 (Mflop/s) to the communication performance (Mword/s). FHALF measures
 an unwanted overhead, and a high value means that the computational
 intensity (or grain size) of the problem must be similarly high, 
 otherwise the problem performance will be limited by the communication
 bottleneck.
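
 The effect of FHALF on achieved performance can be seen by evaluating
 equation (1) directly. The sketch below uses hypothetical values
 RHAT = 40 Mflop/s and FHALF = 20, chosen only for illustration and not
 measured on any machine.

```python
def rinf(rhat, fhalf, f):
    """Equation (1): asymptotic rate at computational intensity f."""
    return rhat / (1.0 + fhalf / f)


RHAT = 40.0   # hypothetical peak arithmetic rate, Mflop/s
FHALF = 20.0  # hypothetical half-performance intensity

for f in (1, 5, 20, 80):
    rate = rinf(RHAT, FHALF, f)
    print("F = %2d  RINF = %6.2f Mflop/s  (%.0f%% of RHAT)"
          % (f, rate, 100.0 * rate / RHAT))
```

 At F = FHALF the rate is exactly RHAT/2, and only for F several times
 larger than FHALF does the rate approach RHAT.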

 For further details of the FHALF characterisation, see Hockney and Jesshope,
 Parallel Computers 2, IOP Publishing, Bristol and New York, Chapter 1.
 
2. Operating Instructions
-------------------------

To compile and link the benchmark, type: `make'. On some systems it
may be necessary to allocate the appropriate resources before running the
benchmark, e.g. on the iPSC/860, to reserve a single processor,
type:    getcube -t1

To run the benchmark, type:    poly3

Output from the benchmark is written to the file "poly3.res".
 
NITER in the file poly3.inc can be varied to alter the number of repeats
made, and increase the accuracy of the time measurement. Values of 100
or 1000 would be usual when taking measurements. Values of 1 or 10 might
be used for short runs to test execution, but are probably too small for
satisfactory timing.

The order of execution of the kernel loop should be as specified in the
Fortran code (in SUBROUTINE DOALL). Nonsense results (e.g. negative FHALF)
may be produced if the compiler tampers with the loop ordering or does 
software pipelining. The polynomial must be completely evaluated for one 
value of the loop index I (e.g. DO 310 loop) before the next value of I is 
taken.
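
The required ordering can be sketched as follows. This is an illustration
in Python, not the Fortran code of SUBROUTINE DOALL; the function name and
the coefficient layout (highest-order coefficient first) are assumptions.
The outer loop runs over the vector index I, and the inner loop completes
all Horner steps for that element before the next element is touched.

```python
def poly_eval_vector(coeffs, xs):
    """Evaluate a polynomial at every element of xs by Horner's rule.

    coeffs[0] is the highest-order coefficient (assumed layout).
    """
    ys = []
    for x in xs:                 # outer loop: vector index I
        acc = coeffs[0]
        for c in coeffs[1:]:     # inner loop: one multiply and one add per step
            acc = acc * x + c
        ys.append(acc)           # element I is finished before I+1 starts
    return ys


# 2*x**2 + 3*x + 1 evaluated at three points
print(poly_eval_vector([2.0, 3.0, 1.0], [0.0, 1.0, 2.0]))
```

For a polynomial of order n, each element costs n multiplies and n adds
(2n floating-point operations) against one word read and one word written,
which is consistent with a computational intensity of n.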

$Id: ReadMe,v 1.2 1994/05/24 13:27:41 igl Exp igl $
High Performance Computing Centre
Submitted by Mark Papiani,
last updated on 10 Jan 1995.