
Results

The timing runs were carried out on two different SP-2 machines, the NPACI SP-2 and the Penn State (PSU) SP-2. For the PSU SP-2, a comparison between the MPI and HPF runs is also provided.

The Penn State SP2 (mcnally.cac.psu.edu) has 32 POWER2 Super Chip (PSC model 390) processors, each with 128 MBytes of memory, running at 66 MHz and capable of a peak performance of 250 MFLOPS per processor. It supports a peak bi-directional data transfer rate of 35 MB/s between each node pair.

The NPACI SP2 (sp.npaci.edu) has 128 thin-node POWER2 Super Chip (P2SC) processors, each with 256 MBytes of memory, running at 160 MHz and capable of a peak performance of 640 MFLOPS per processor. It supports a peak bi-directional data transfer rate of 110 MB/s between each node pair.
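For scale, the per-node figures above imply the following aggregate peak rates (simple products of the quoted specifications, not separately stated in the machine documentation):
\[
32 \times 250~\mathrm{MFLOPS} = 8~\mathrm{GFLOPS} \;\;\mathrm{(PSU\ SP2)}, \qquad
128 \times 640~\mathrm{MFLOPS} \approx 82~\mathrm{GFLOPS} \;\;\mathrm{(NPACI\ SP2)},
\]
so each NPACI node offers roughly $640/250 \approx 2.6$ times the peak compute rate and $110/35 \approx 3.1$ times the peak node-to-node bandwidth of a PSU node.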


 
\begin{table}
\caption{Data for the MPI run after 100 iterations on the PSU-SP2}
\begin{center}
\begin{tabular}{rrrrrrrrr}
\hline
$P$ & $N$ & Comm (s) & Calc (s) & Total (s) & MFLOPS/P & $\%$ $R$ & $\%$ Eff & MB/s \\
\hline
 1 &  400 & 0.14 & 3.63 & 3.77 & 16.99 & --   & 100  & --   \\
 4 &  800 & 1.76 & 3.85 & 5.61 & 11.40 & 83.7 & 67.1 & 2.92 \\
 8 & 1100 & 2.75 & 3.48 & 6.24 &  9.69 & 89.2 & 57.0 & 3.85 \\
16 & 1600 & 4.59 & 3.31 & 7.90 &  8.11 & 92.7 & 47.7 & 4.49 \\
\hline
\end{tabular}
\end{center}
\end{table}
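The derived columns in Tables 1-3 are consistent with the following standard definitions (reconstructed here from the tabulated values rather than taken from the program listing): the MFLOPS/P figure is consistent with 4 floating-point operations per grid point per iteration on an $N \times N$ grid, the efficiency is the per-processor rate normalized by the single-processor rate, and the parallel fraction $R$ (plotted in Figure 2) follows from inverting Amdahl's law:
\[
\mathrm{MFLOPS/P} = \frac{4 N^2 \times 100}{P \, T_{\mathrm{total}} \times 10^6}, \qquad
\mathrm{Eff} = \frac{(\mathrm{MFLOPS/P})_P}{(\mathrm{MFLOPS/P})_1}, \qquad
R = \frac{1 - 1/S}{1 - 1/P}, \quad S = P \cdot \mathrm{Eff}.
\]
For example, the $P=4$ row of Table 1 gives $\mathrm{Eff} = 11.40/16.99 = 67.1\%$, $S = 4 \times 0.671 = 2.68$ and $R = (1 - 1/2.68)/(1 - 1/4) = 83.7\%$. The MB/s column is presumably the effective communication rate sustained during the message-passing phase; its exact definition depends on the message sizes in the code and is not reconstructed here.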



 
\begin{table}
\caption{Data for the MPI run after 100 iterations on the NPACI-SP2}
\begin{center}
\begin{tabular}{rrrrrrrrr}
\hline
$P$ & $N$ & Comm (s) & Calc (s) & Total (s) & MFLOPS/P & $\%$ $R$ & $\%$ Eff & MB/s \\
\hline
 1 &  400 & 0.050 & 0.839 & 0.889 & 71.98 & --   & 100  & --    \\
 4 &  800 & 0.115 & 0.828 & 0.943 & 67.84 & 98.0 & 94.2 & 44.86 \\
 8 & 1100 & 0.460 & 0.800 & 1.260 & 48.00 & 92.9 & 66.7 & 23.05 \\
16 & 1600 & 0.289 & 0.834 & 1.123 & 57.00 & 98.2 & 79.2 & 71.35 \\
32 & 2300 & 0.729 & 0.872 & 1.601 & 41.24 & 97.6 & 57.3 & 60.87 \\
\hline
\end{tabular}
\end{center}
\end{table}
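As a cross-check of these definitions, the short C program below recomputes the derived columns of Table 2 from the raw run parameters and total times. It is only a sketch: the 4-flops-per-grid-point count and the hard-coded $(P, N, T_{\mathrm{total}})$ triples are assumptions lifted from the table above, not an excerpt from the program listing.

\begin{verbatim}
/*
 * Sketch (not from the original program listing): recompute the derived
 * columns of Table 2 from the raw run parameters and total times, assuming
 * 4 floating-point operations per grid point per iteration.
 */
#include <stdio.h>

#define ITERS        100.0   /* iterations timed              */
#define FLOPS_PER_PT 4.0     /* assumed flops per grid point  */

int main(void)
{
    /* P, N, total wall-clock time in seconds (NPACI-SP2 MPI runs, Table 2) */
    const double runs[][3] = {
        {  1.0,  400.0, 0.889 },
        {  4.0,  800.0, 0.943 },
        {  8.0, 1100.0, 1.260 },
        { 16.0, 1600.0, 1.123 },
        { 32.0, 2300.0, 1.601 },
    };
    const int nruns = (int)(sizeof runs / sizeof runs[0]);
    int i;

    /* single-processor rate, used as the reference for the efficiency */
    const double mflops1 =
        FLOPS_PER_PT * ITERS * runs[0][1] * runs[0][1] / (runs[0][2] * 1.0e6);

    printf("  P     N  MFLOPS/P    %%R  %%Eff\n");
    for (i = 0; i < nruns; i++) {
        double p = runs[i][0], n = runs[i][1], t = runs[i][2];
        double mflops_p = FLOPS_PER_PT * ITERS * n * n / (p * t * 1.0e6);
        double eff = mflops_p / mflops1;   /* parallel efficiency          */
        double s   = p * eff;              /* effective speed-up           */
        /* parallel fraction from Amdahl's law, S = 1/((1-f) + f/P)        */
        double f   = (p > 1.0) ? (1.0 - 1.0 / s) / (1.0 - 1.0 / p) : 1.0;
        printf("%3.0f %5.0f %9.2f %5.1f %5.1f\n",
               p, n, mflops_p, 100.0 * f, 100.0 * eff);
    }
    return 0;
}
\end{verbatim}

Compiling and running it with any C compiler reproduces the MFLOPS/P, $\%$ $R$ and $\%$ Eff columns of Table 2 to within rounding.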



 
\begin{table}
\caption{Data for the HPF run after 100 iterations on the PSU-SP2}
\begin{center}
\begin{tabular}{rrrrrrrr}
\hline
$P$ & $N$ & Comm (s) & Calc (s) & Total (s) & MFLOPS/P & $\%$ $R$ & $\%$ Eff \\
\hline
 1 &  400 &  6.35 & 5.36 & 11.73 & 5.46 & --   & 100  \\
 8 & 1100 &  8.06 & 5.15 & 13.26 & 4.57 & 97.2 & 83.6 \\
16 & 1600 & 10.62 & 5.39 & 16.07 & 3.99 & 97.5 & 73.0 \\
32 & 2300 &  8.67 & 5.81 & 14.52 & 4.55 & 99.4 & 83.4 \\
\hline
\end{tabular}
\end{center}
\end{table}



 
\begin{table}
\caption{Comparison of the HPF and MPI runs after 100 iterations}
\begin{center}
\begin{tabular}{rrrrr}
\hline
Case & $P$ & HPF time (s) & MPI time (s) & $\%$ Gain \\
\hline
1 &  1 & 11.73 & 3.77 & 311 \\
2 &  4 & --    & 5.61 & --  \\
3 &  8 & 13.26 & 6.24 & 213 \\
4 & 16 & 16.07 & 7.90 & 203 \\
5 & 32 & 14.52 & --   & --  \\
\hline
\end{tabular}
\end{center}
\end{table}
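The Gain column in Table 4 is consistent with the ratio of HPF to MPI execution time expressed as a percentage (again a reconstruction from the tabulated values):
\[
\%\,\mathrm{Gain} = 100 \times \frac{T_{\mathrm{HPF}}}{T_{\mathrm{MPI}}},
\qquad \mathrm{e.g.\ Case\ 1:}\;\; 100 \times \frac{11.73}{3.77} \approx 311,
\]
i.e. the MPI version runs roughly two to three times faster than the HPF version on the same number of processors.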



  
\begin{figure}
\centerline{\psfig{figure=mflops.ps,angle=-90,height=7cm,width=9cm}}
\caption{MFLOPS/proc vs number of processors}
\end{figure}


  
\begin{figure}
\centerline{\psfig{figure=R.ps,angle=-90,height=7cm,width=9cm}}
\caption{Parallel fraction vs number of processors}
\end{figure}


  
\begin{figure}
\centerline{\psfig{figure=eff.ps,angle=-90,height=7cm,width=9cm}}
\caption{Parallel efficiency vs number of processors}
\end{figure}


  
\begin{figure}
\centerline{\psfig{figure=time.ps,angle=-90,height=7cm,width=9cm}}
\caption{Execution time comparison for MPI vs HPF case}
\end{figure}


  
\begin{figure}
\centerline{\psfig{figure=comm.ps,angle=-90,height=7cm,width=9cm}}
\caption{Communication speed vs number of processors}
\end{figure}


  
\begin{figure}
\centerline{\psfig{figure=500.eps,angle=0,height=8cm,width=9cm}}
\caption{Temperature contours after 500 iterations}
\end{figure}


  
\begin{figure}
\centerline{\psfig{figure=1000.eps,angle=0,height=8cm,width=9cm}}
\caption{Temperature contours after 1000 iterations}
\end{figure}


Anirudh Modi
4/10/1998