
Results

The timing runs were carried out on two different SP-2 machines, the NPACI SP-2 and the Penn State (PSU) SP-2. For the PSU SP-2, a comparison between the MPI and HPF runs is also provided.

The Penn State SP2 (mcnally.cac.psu.edu) has 32 POWER2 Super Chip (PSC model 390) processors, each with 128 MBytes of memory, running at 66 MHz and capable of a peak performance of 250 MFLOPS per processor. It supports a peak bi-directional data transfer rate of 35 MB/s between each node pair.

The NPACI SP2 (sp.npaci.edu) has 128 thin-node POWER2 Super Chip (P2SC) processors, each with 256 MBytes of memory, running at 160 MHz and capable of a peak performance of 640 MFLOPS per processor. It supports a peak bi-directional data transfer rate of 110 MB/s between each node pair.
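For scale, the per-node figures above imply the following aggregate peak rates (simple products of the quoted specifications, not separately stated in the machine documentation):
\[
32 \times 250~\mathrm{MFLOPS} = 8~\mathrm{GFLOPS} \;\;\mathrm{(PSU\ SP2)}, \qquad
128 \times 640~\mathrm{MFLOPS} \approx 82~\mathrm{GFLOPS} \;\;\mathrm{(NPACI\ SP2)},
\]
so each NPACI node offers roughly $640/250 \approx 2.6$ times the peak compute rate and $110/35 \approx 3.1$ times the peak node-to-node bandwidth of a PSU node.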


 
\begin{table}
\caption{Data for the MPI run after 100 iterations on the PSU-SP2}
\begin{center}
\begin{tabular}{rrrrrrrrr}
\hline
$P$ & $N$ & Comm (s) & Calc (s) & Total (s) & MFLOPS/P & $\%$ $R$ & $\%$ Eff & MB/s \\
\hline
 1 &  400 & 0.14 & 3.63 & 3.77 & 16.99 & --   & 100  & --   \\
 4 &  800 & 1.76 & 3.85 & 5.61 & 11.40 & 83.7 & 67.1 & 2.92 \\
 8 & 1100 & 2.75 & 3.48 & 6.24 &  9.69 & 89.2 & 57.0 & 3.85 \\
16 & 1600 & 4.59 & 3.31 & 7.90 &  8.11 & 92.7 & 47.7 & 4.49 \\
\hline
\end{tabular}
\end{center}
\end{table}
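The derived columns in Tables 1-3 are consistent with the following standard definitions (reconstructed here from the tabulated values rather than taken from the program listing): the MFLOPS/P figure is consistent with 4 floating-point operations per grid point per iteration on an $N \times N$ grid, the efficiency is the per-processor rate normalized by the single-processor rate, and the parallel fraction $R$ (plotted in Figure 2) follows from inverting Amdahl's law:
\[
\mathrm{MFLOPS/P} = \frac{4 N^2 \times 100}{P \, T_{\mathrm{total}} \times 10^6}, \qquad
\mathrm{Eff} = \frac{(\mathrm{MFLOPS/P})_P}{(\mathrm{MFLOPS/P})_1}, \qquad
R = \frac{1 - 1/S}{1 - 1/P}, \quad S = P \cdot \mathrm{Eff}.
\]
For example, the $P=4$ row of Table 1 gives $\mathrm{Eff} = 11.40/16.99 = 67.1\%$, $S = 4 \times 0.671 = 2.68$ and $R = (1 - 1/2.68)/(1 - 1/4) = 83.7\%$. The MB/s column is presumably the effective communication rate sustained during the message-passing phase; its exact definition depends on the message sizes in the code and is not reconstructed here.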



 
\begin{table}
\caption{Data for the MPI run after 100 iterations on the NPACI-SP2}
\begin{center}
\begin{tabular}{rrrrrrrrr}
\hline
$P$ & $N$ & Comm (s) & Calc (s) & Total (s) & MFLOPS/P & $\%$ $R$ & $\%$ Eff & MB/s \\
\hline
 1 &  400 & 0.050 & 0.839 & 0.889 & 71.98 & --   & 100  & --    \\
 4 &  800 & 0.115 & 0.828 & 0.943 & 67.84 & 98.0 & 94.2 & 44.86 \\
 8 & 1100 & 0.460 & 0.800 & 1.260 & 48.00 & 92.9 & 66.7 & 23.05 \\
16 & 1600 & 0.289 & 0.834 & 1.123 & 57.00 & 98.2 & 79.2 & 71.35 \\
32 & 2300 & 0.729 & 0.872 & 1.601 & 41.24 & 97.6 & 57.3 & 60.87 \\
\hline
\end{tabular}
\end{center}
\end{table}
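As a cross-check of these definitions, the short C program below recomputes the derived columns of Table 2 from the raw run parameters and total times. It is only a sketch: the 4-flops-per-grid-point count and the hard-coded $(P, N, T_{\mathrm{total}})$ triples are assumptions lifted from the table above, not an excerpt from the program listing.

\begin{verbatim}
/*
 * Sketch (not from the original program listing): recompute the derived
 * columns of Table 2 from the raw run parameters and total times, assuming
 * 4 floating-point operations per grid point per iteration.
 */
#include <stdio.h>

#define ITERS        100.0   /* iterations timed              */
#define FLOPS_PER_PT 4.0     /* assumed flops per grid point  */

int main(void)
{
    /* P, N, total wall-clock time in seconds (NPACI-SP2 MPI runs, Table 2) */
    const double runs[][3] = {
        {  1.0,  400.0, 0.889 },
        {  4.0,  800.0, 0.943 },
        {  8.0, 1100.0, 1.260 },
        { 16.0, 1600.0, 1.123 },
        { 32.0, 2300.0, 1.601 },
    };
    const int nruns = (int)(sizeof runs / sizeof runs[0]);
    int i;

    /* single-processor rate, used as the reference for the efficiency */
    const double mflops1 =
        FLOPS_PER_PT * ITERS * runs[0][1] * runs[0][1] / (runs[0][2] * 1.0e6);

    printf("  P     N  MFLOPS/P    %%R  %%Eff\n");
    for (i = 0; i < nruns; i++) {
        double p = runs[i][0], n = runs[i][1], t = runs[i][2];
        double mflops_p = FLOPS_PER_PT * ITERS * n * n / (p * t * 1.0e6);
        double eff = mflops_p / mflops1;   /* parallel efficiency          */
        double s   = p * eff;              /* effective speed-up           */
        /* parallel fraction from Amdahl's law, S = 1/((1-f) + f/P)        */
        double f   = (p > 1.0) ? (1.0 - 1.0 / s) / (1.0 - 1.0 / p) : 1.0;
        printf("%3.0f %5.0f %9.2f %5.1f %5.1f\n",
               p, n, mflops_p, 100.0 * f, 100.0 * eff);
    }
    return 0;
}
\end{verbatim}

Compiling and running it with any C compiler reproduces the MFLOPS/P, $\%$ $R$ and $\%$ Eff columns of Table 2 to within rounding.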



 
\begin{table}
\caption{Data for the HPF run after 100 iterations on the PSU-SP2}
\begin{center}
\begin{tabular}{rrrrrrrr}
\hline
$P$ & $N$ & Comm (s) & Calc (s) & Total (s) & MFLOPS/P & $\%$ $R$ & $\%$ Eff \\
\hline
 1 &  400 &  6.35 & 5.36 & 11.73 & 5.46 & --   & 100  \\
 8 & 1100 &  8.06 & 5.15 & 13.26 & 4.57 & 97.2 & 83.6 \\
16 & 1600 & 10.62 & 5.39 & 16.07 & 3.99 & 97.5 & 73.0 \\
32 & 2300 &  8.67 & 5.81 & 14.52 & 4.55 & 99.4 & 83.4 \\
\hline
\end{tabular}
\end{center}
\end{table}



 
\begin{table}
\caption{Comparison of the HPF and MPI runs after 100 iterations}
\begin{center}
\begin{tabular}{rrrrr}
\hline
Case & $P$ & HPF time (s) & MPI time (s) & $\%$ Gain \\
\hline
1 &  1 & 11.73 & 3.77 & 311 \\
2 &  4 & --    & 5.61 & --  \\
3 &  8 & 13.26 & 6.24 & 213 \\
4 & 16 & 16.07 & 7.90 & 203 \\
5 & 32 & 14.52 & --   & --  \\
\hline
\end{tabular}
\end{center}
\end{table}
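The Gain column in Table 4 is consistent with the ratio of HPF to MPI execution time expressed as a percentage (again a reconstruction from the tabulated values):
\[
\%\,\mathrm{Gain} = 100 \times \frac{T_{\mathrm{HPF}}}{T_{\mathrm{MPI}}},
\qquad \mathrm{e.g.\ Case\ 1:}\;\; 100 \times \frac{11.73}{3.77} \approx 311,
\]
i.e. the MPI version runs roughly two to three times faster than the HPF version on the same number of processors.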



  
\begin{figure}
\centerline{\psfig{figure=mflops.ps,angle=-90,height=7cm,width=9cm}}
\caption{MFLOPS/proc vs number of processors}
\end{figure}


  
\begin{figure}
\centerline{\psfig{figure=R.ps,angle=-90,height=7cm,width=9cm}}
\caption{Parallel fraction vs number of processors}
\end{figure}


  
\begin{figure}
\centerline{\psfig{figure=eff.ps,angle=-90,height=7cm,width=9cm}}
\caption{Parallel efficiency vs number of processors}
\end{figure}


  
\begin{figure}
\centerline{\psfig{figure=time.ps,angle=-90,height=7cm,width=9cm}}
\caption{Execution time comparison for MPI vs HPF case}
\end{figure}


  
\begin{figure}
\centerline{\psfig{figure=comm.ps,angle=-90,height=7cm,width=9cm}}
\caption{Communication speed vs number of processors}
\end{figure}


  
\begin{figure}
\centerline{\psfig{figure=500.eps,angle=0,height=8cm,width=9cm}}
\caption{Temperature contours after 500 iterations}
\end{figure}


  
\begin{figure}
\centerline{\psfig{figure=1000.eps,angle=0,height=8cm,width=9cm}}
\caption{Temperature contours after 1000 iterations}
\end{figure}


Anirudh Modi
4/10/1998