
Discussion

The code listing and the table of comparisons, along with the corresponding graphs, are included with this report. The contour plots for the 500 and 1000 iteration cases (for the $800\times 800$ grid) are also shown. The 1 iteration case is omitted, as nothing worthwhile appears in the plot: it reflects only the boundary conditions, which are not clearly visible.

From the PSU-SP2 run, the code appears to be approximately $90\%$ parallel (a fraction that seems to increase slowly with the number of nodes). Since the problem size is not fixed, the runs are not expected to follow Amdahl's law. The parallel fraction of the code also appears to increase with the number of processors, which is an indication that Gustafson's law is being followed. However, Gustafson's law is not completely followed either, as the parallel efficiency goes down with the number of processors rather than remaining constant. One possible reason is that the communication-to-computation ratio is not the same in all the runs. The 8 processor case uses a $4\times 2$ grid of processors, which places a higher communication workload on each node than the $2\times 2=4$ node run or the $4\times 4=16$ node run. It is also observed that although the computational time remains almost constant in each case ($\approx 3.6$ seconds for the PSU-SP2 and $\approx 0.83$ seconds for the NPACI-SP2), the communication time rises, which reduces the efficiency. Thus the trend from this minimal data falls somewhere between Amdahl's law and Gustafson's law. Runs on larger numbers of processors are recommended for a clearer picture of the trend.
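For reference, a brief worked comparison of the two models, assuming the estimated parallel fraction $f \approx 0.9$ above holds for a $p = 16$ processor run (these are illustrative numbers based on that estimate, not separate measurements): Amdahl's law predicts a fixed-size speedup of $S_A(p) = \frac{1}{(1-f) + f/p} = \frac{1}{0.1 + 0.9/16} = 6.4$, i.e. a parallel efficiency of $S_A/p = 0.40$, whereas Gustafson's law predicts a scaled speedup of $S_G(p) = p - (1-f)(p-1) = 16 - 0.1\times 15 = 14.5$, i.e. an efficiency of about $0.91$. A measured efficiency lying between these two bounds would be consistent with the mixed trend noted above.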

The comparison between HPF and MPI clearly indicates that the MPI code is superior to the HPF code. The MPI run not only has a better computational time ($40\%$ faster) for the same problem, but also offers better communication times (which appear to differ only by a constant for each run).

The MFLOPs per processor for the scaled problem case is seen to decrease, but averages around 10 ($4.0\%$ of peak) for the PSU-SP2 and 50 ($7.8\%$ of peak) for the NPACI-SP2. This contrasts somewhat with the HPF run on the PSU-SP2, for which the MFLOPs rate appears roughly constant at around 5 for the scaled problem case.
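As a check on the percentage-of-peak figures, the fraction of peak is simply the sustained rate divided by the per-processor peak rate: the quoted $4.0\%$ and $7.8\%$ imply peak ratings of roughly $10/0.040 = 250$ MFLOPs per processor for the PSU-SP2 and $50/0.078 \approx 640$ MFLOPs per processor for the NPACI-SP2. These peak values are back-calculated from the figures above rather than taken from the hardware specifications.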

Anirudh Modi
4/10/1998