Next: Conclusion Up: AE 597: HW #3 Previous: Program listing

Results

The timings for 16-processor runs on the NPACI IBM SP2 (sp.npaci.edu) are listed below.
Case   Parallel run time (sec)   Elements comm.   Bytes comm.
1      0.03526                   16384            131072
2      0.03004                   0                0
3      0.05720                   65536            524288
4      0.03524                   16384            131072
5      0.03516                   16384            131072


The NPACI SP2 has 128 thin-node POWER2 Super Chip (P2SC) processors running at 160 MHz, each with 256 MBytes of memory and a peak performance of 640 MFLOPS. The peak bi-directional data transfer rate between each node pair is 110 MB/second.

Cases 1, 2 and 3 are for CSHIFT(A,1,1) with (block,block), (*,block) and (block,*) distributions respectively. Case 4 is for CSHIFT(A,3,1) with (block,block) distribution, and case 5 is for triplet notation with (block,block) distribution. The program uses double-precision array elements of 8 bytes each, so the number of bytes communicated is simply the number of array elements communicated multiplied by 8.
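As a quick check of the element-to-byte conversion described above, the following minimal sketch recomputes the bytes communicated for each case. The element counts are copied from the first table; the names `BYTES_PER_ELEMENT`, `elements_communicated` and `bytes_communicated` are illustrative only and do not come from the program listing.

```python
# Size of one double-precision array element, as stated in the text.
BYTES_PER_ELEMENT = 8

# Elements communicated per case, copied from the first results table.
elements_communicated = {1: 16384, 2: 0, 3: 65536, 4: 16384, 5: 16384}

# Bytes communicated = elements communicated * 8.
bytes_communicated = {
    case: count * BYTES_PER_ELEMENT
    for case, count in elements_communicated.items()
}

for case in sorted(bytes_communicated):
    print(f"case {case}: {bytes_communicated[case]} bytes")
```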

Case   Comm. time (msec)   Bytes     MB/sec   % of peak MB/sec
1      5.22                131072    25.11    22.8
2      0.00                0         -        -
3      27.16               524288    19.30    17.5
4      5.20                131072    25.21    22.9
5      5.12                131072    25.60    23.3



The communication time is computed as the difference between the run time of each of cases 1, 3, 4 and 5 and the run time of case 2, which involves no communication and therefore serves as the baseline.
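The derivation above can be sketched in a few lines. This is an illustrative recomputation, not part of the original program: it takes the run times and byte counts from the first table, subtracts the case 2 baseline, and derives the achieved transfer rate and its percentage of the 110 MB/second peak. It assumes 1 MB = 10^6 bytes, which is the convention that reproduces the tabulated rates; the function name `comm_stats` and the constants are hypothetical.

```python
# Parallel run times (sec) and bytes communicated, copied from the first table.
run_time = {1: 0.03526, 2: 0.03004, 3: 0.05720, 4: 0.03524, 5: 0.03516}
bytes_comm = {1: 131072, 2: 0, 3: 524288, 4: 131072, 5: 131072}

PEAK_MB_PER_SEC = 110.0  # peak node-pair transfer rate quoted in the text
BASELINE_CASE = 2        # the case with no communication
BYTES_PER_MB = 1.0e6     # assumption: decimal megabytes


def comm_stats(case):
    """Return (communication time in sec, MB/sec, percent of peak) for a case."""
    comm_time = run_time[case] - run_time[BASELINE_CASE]
    mb_per_sec = bytes_comm[case] / comm_time / BYTES_PER_MB
    percent_of_peak = 100.0 * mb_per_sec / PEAK_MB_PER_SEC
    return comm_time, mb_per_sec, percent_of_peak


# Case 2 is skipped: its communication time is zero by construction.
for case in (1, 3, 4, 5):
    comm_time, mb_per_sec, percent = comm_stats(case)
    print(f"case {case}: {comm_time:.5f} sec, "
          f"{mb_per_sec:.2f} MB/sec, {percent:.1f}% of peak")
```

Because case 2 is the baseline, any measurement noise in its run time carries over into every derived communication time, so the rates are best read as estimates.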


Anirudh Modi
3/20/1998