<!doctype linuxdoc system>

<article>

<title>COCOA FAQ
<author>Anirudh Modi &lt;anirudh-modi@psu.edu&gt;
<date>v1.1, 1st February 1999
<abstract>
This is the FAQ (Frequently Asked Questions) for 
COCOA, the inexpensive Beowulf 
supercomputer of the Aerospace Department at Pennsylvania State University.
</abstract>

<sect>Introduction
<p><descrip>
<tag/What is <bf>COCOA</bf>?/
   <bf>COCOA</bf> stands for 
<bf>CO</bf>st effective <bf>CO</bf>mputing <bf>A</bf>rray.
It is a Beowulf class supercomputer. 
Beowulf is a multi-computer architecture which can be used for parallel
computations. Such a system usually consists of one server
node and one or more client nodes connected together via Ethernet or
some other fast network. It is built using commodity hardware
components, like any office desktop PC with standard Ethernet adapters
and switches. It does not contain any custom hardware components and is
trivially reproducible. <url url="http://cocoa.ihpca.psu.edu/">.

<tag/What hardware was used to build COCOA?/
26 WS-410 workstations from Dell <url url="http://www.dell.com">, each 
consisting of:
<enum>
<item>Dual 400 MHz Intel Pentium II Processors w/512K L2 cache 
<item>512 MB SDRAM 
<item>4 GB UW-SCSI2 Disk 
<item>3COM 3c905B Fast Ethernet adapter (100 Mbits/sec)
<item>32x SCSI CD-ROM drive
<item>1.44 MB floppy drive
<item>Cables
</enum>
In addition, the following were also used:
<enum>
<item>One Baynetworks 450T 24-way 100 Mbits/sec switch 
<item>Two 16-way Monitor/keyboard/mouse switches 
<item>Four 500 kVA Uninterruptible Power Supplies from APC.
<item>One monitor, keyboard, mouse and 54 GB of extra UW-SCSI2 hard disk space for one PC which was used as the server.
</enum>

<tag/What is the operating system on COCOA?/
	<bf>Linux</bf>! Specifically, the RedHat Linux 5.1 distribution 
<url url="http://www.redhat.com">.

Linux is a free version of the Unix operating system, and it runs on
all PC/i386 compatible computers (and now also on PowerPCs, Alphas,
Sparcs, Mips, Ataris, and Amigas). The Linux kernel is written by Linus
Torvalds &lt;torvalds@transmeta.com&gt; and other volunteers. 
Most of the programs running under Linux are generic Unix freeware, many
of them from the GNU project.

<tag/What software is installed on COCOA?/
On the server, the following software is installed:
<enum>
<item>Base packages from RedHat Linux 5.1 distribution 
<url url="http://www.redhat.com">
<item>Freeware GNU C/C++ compiler as well as Pentium optimized GNU C/C++ compiler 
(<it>gcc, pgcc</it>)
<item>Fortran 77/90 compiler and debugger by Portland Group
<item>Freeware <bf>M</bf>essage <bf>P</bf>assing <bf>I</bf>nterface (MPI) libraries for parallel programming
in C/C++/Fortran 77/Fortran 90.
<item>Scientific Visualization Software TECPLOT from Amtec Corporation
<url url="http://www.amtec.com">
</enum>

<tag/How much did COCOA cost?/
	Approximately <it>$100,000</it>!

</descrip>

<sect>Details on how COCOA was built
<sect1>Setting up the hardware
<p>
Setting up the hardware was fairly straightforward. Here are the
main steps:
<enum>
<item>Unpacked the machines, mounted them on the rack and numbered them.

<item>Set up the 24-port network switch and connected one of the 100
Mbit ports to the second ethernet adapter of the server, which was meant
for the private network. The remaining 23 ports were connected to
the ethernet adapters of the clients. An expansion card with 2
additional ports was then added to the switch to connect the remaining 2 clients.

<item>Stacked the two 16-way monitor/keyboard switches and connected the
video-out and keyboard cables of each of the 25 clients and the
server to them. A single monitor and keyboard were then hooked up to the
switches to control the entire cluster.

<item>Connected the power cords to the four UPS.

</enum>

<sect1>Setting up the software
<p>
Well, this is where the real effort came in! Here are the main steps:
<enum>

<item>The server was the first to be set up. RedHat Linux 5.1 was 
installed on it using the bundled CD-ROM. Most of the hardware
was automatically detected (including the network card), so the
main focus was on partitioning the drive and choosing the relevant
packages to be installed. A 3 GB growable root partition was created for
the system files and the packages to be installed. Two 128 MB swap
partitions were also created and the rest of the space (50 GB) was used
for various user partitions. It was later realised that a separate
<tt>/tmp</tt> partition of about 1 GB was a good idea.

<item>The latest stable Linux kernel (then <tt>2.0.36</tt>) was downloaded
and compiled with SMP support using the Pentium GNU CC compiler
<tt>pgcc</tt> <url url="http://www.goof.com/pcg/"> (which generates
highly optimised code specifically for the Pentium II processor) with
only the relevant options required for the available hardware. The
following optimisation options were used: <tt>pgcc -mpentiumpro -O6
-fno-inline-functions</tt>. Turning on SMP support was just a matter
of clicking on a button in the <it>Processor type and features</it>
menu of the kernel configurator (started by running <tt>make xconfig</tt>).

<item>The new kernel-space NFS server for linux (<it>knfsd</it>)
<url url="http://www.csua.berkeley.edu/~gam3/knfsd/"> was installed
to replace the earlier user-space NFS server to obtain improved NFS
performance. For quick and hassle-free installation, a RedHat RPM
package was obtained from <url url="http://rufus.w3.org/linux/RPM/">, a
popular RPM repository. The default options were used.

<item><tt>ssh</tt> was downloaded from <url url="http://www.cs.hut.fi/ssh/">,
compiled and installed for secure access from the outside world. 
<tt>ssh-1.2.26</tt> was preferred over the newer <tt>ssh-2.0.11</tt> 
as <tt>ssh v2.x</tt> was much slower as well as backward incompatible.
The <tt>sshd</tt> daemon was started in runlevel 3 under <tt>/etc/rc.d/rc3.d</tt>.
Recently, RedHat RPMs for <tt>ssh</tt> have started appearing in
<url url="http://rufus.w3.org/linux/RPM/"> and several other RPM
repositories, which makes it much easier to install.


<item>Both the 3c905B ethernet adapters were then configured;
one that connected to the outside world (<tt>eth1</tt>) with the
real IP address <tt>128.118.170.11</tt>, and the other
which connected to the private network (<tt>eth0</tt>) using a dummy
IP address <tt>10.0.0.1</tt>. The latest drivers for the 3COM
3c905B adapters written by Donald Becker (3c59x.c v0.99H <url
url="http://cesdis.gsfc.nasa.gov/linux/drivers/vortex.html">)
were compiled into the kernel to ensure 100 Mbit/sec full-duplex
connectivity. This was checked using the <tt>vortex-diag</tt> utility
<url url="http://cesdis.gsfc.nasa.gov/linux/diag/vortex-diag.c">.
For the configuration, the following files were modified: 
<tt>/etc/sysconfig/network</tt>,
<tt>/etc/sysconfig/network-scripts/ifcfg-eth0</tt> and
<tt>/etc/sysconfig/network-scripts/ifcfg-eth1</tt>. Here is how
they looked after modification:
<p>
<tt>/etc/sysconfig/network</tt>:
<tscreen><verb>
NETWORKING=yes
FORWARD_IPV4=no
HOSTNAME=cocoa.ihpca.psu.edu
DOMAINNAME=ihpca.psu.edu
GATEWAY=128.118.170.1
GATEWAYDEV=eth1
NISDOMAIN=ihpca.psu.edu
</verb></tscreen>

<tt>/etc/sysconfig/network-scripts/ifcfg-eth0</tt>:
<tscreen><verb>
DEVICE=eth0
IPADDR=10.0.0.1
NETMASK=255.255.255.0
NETWORK=10.0.0.0
BROADCAST=10.0.0.255
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
</verb></tscreen>

<tt>/etc/sysconfig/network-scripts/ifcfg-eth1</tt>:
<tscreen><verb>
DEVICE=eth1
IPADDR=128.118.170.11
NETMASK=255.255.255.0
NETWORK=128.118.170.0
BROADCAST=128.118.170.255
ONBOOT=yes
BOOTPROTO=none
USERCTL=no
</verb></tscreen>
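The <tt>NETWORK</tt> and <tt>BROADCAST</tt> values in the two interface
files follow directly from <tt>IPADDR</tt> and <tt>NETMASK</tt>. As a
minimal sketch (the helper name <tt>net_and_bcast</tt> is hypothetical and
was not part of the original setup), they can be derived octet by octet in
a POSIX shell:
<tscreen><verb>
#!/bin/sh
# Hypothetical helper: derive NETWORK and BROADCAST from an IP address
# and a netmask, both given as dotted quads.
net_and_bcast() {
    old_ifs=$IFS
    IFS=.
    set -- $1 $2          # split both quads: $1-$4 = address, $5-$8 = mask
    IFS=$old_ifs
    # network octet = address AND mask
    n1=$(( $1 & $5 )); n2=$(( $2 & $6 ))
    n3=$(( $3 & $7 )); n4=$(( $4 & $8 ))
    # broadcast octet = network OR inverted mask (255 - mask octet)
    b1=$(( n1 | (255 - $5) )); b2=$(( n2 | (255 - $6) ))
    b3=$(( n3 | (255 - $7) )); b4=$(( n4 | (255 - $8) ))
    echo "NETWORK=$n1.$n2.$n3.$n4"
    echo "BROADCAST=$b1.$b2.$b3.$b4"
}

net_and_bcast 10.0.0.1 255.255.255.0
net_and_bcast 128.118.170.11 255.255.255.0
</verb></tscreen>
Running it for the private and the public interface reproduces the
<tt>NETWORK</tt> and <tt>BROADCAST</tt> lines shown in the files above.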

<item>For easy and automated installation, I decided to boot each of the PCs
from the network using the BOOTP protocol. The BOOTP server was enabled
by uncommenting the following line in <tt>/etc/inetd.conf</tt>:

<tscreen><verb>
bootps  dgram   udp     wait    root    /usr/sbin/tcpd  bootpd
</verb></tscreen>

A Linux boot floppy was prepared with kernel support for the 3c905B network
adapter and was used to boot each of the client nodes to note down
its unique 48-bit network hardware address (e.g. 00C04F6BC052). Using
these addresses,
<tt>/etc/bootptab</tt> was edited to look like:

<tscreen><verb>
.default:\
        :hd=/boot:bf=install.ks:\
        :vm=auto:\
        :dn=hpc.ihpca.psu.edu:\
        :gw=10.0.0.1:\
        :rp=/boot/client/root:

node1:ht=ethernet:ha=00C04F6BC0B8:ip=10.0.0.2:tc=.default
node2:ht=ethernet:ha=00C04F79AD76:ip=10.0.0.3:tc=.default
node3:ht=ethernet:ha=00C04F79B5DC:ip=10.0.0.4:tc=.default
.
.
.
node25:ht=ethernet:ha=00C04F79B30E:ip=10.0.0.26:tc=.default
</verb></tscreen>
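With 25 clients, the per-node lines are easier to generate than to type.
The following is only a sketch, not the method actually used for COCOA:
the helper name <tt>bootptab_entry</tt> is hypothetical, and it assumes the
numbering shown above, where client N receives the address 10.0.0.(N+1).
It also rejects anything that is not 12 hexadecimal digits, i.e. a 48-bit
hardware address:
<tscreen><verb>
#!/bin/sh
# Hypothetical helper: print one /etc/bootptab line for client number N.
# Client N gets 10.0.0.(N+1), since 10.0.0.1 is the server (node0).
bootptab_entry() {
    n=$1
    mac=$2
    # Reject any character that is not a hexadecimal digit.
    case $mac in
        *[!0-9A-Fa-f]*) echo "bad address: $mac" >&2; return 1 ;;
    esac
    # A 48-bit Ethernet address is exactly 12 hexadecimal digits.
    if [ ${#mac} -ne 12 ]; then
        echo "bad address: $mac" >&2
        return 1
    fi
    echo "node$n:ht=ethernet:ha=$mac:ip=10.0.0.$((n + 1)):tc=.default"
}

bootptab_entry 1 00C04F6BC0B8
bootptab_entry 2 00C04F79AD76
</verb></tscreen>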

<item>The <tt>/etc/hosts</tt> file was edited to look like:
<tscreen><verb>
127.0.0.1       localhost       localhost.localdomain
# Server [COCOA]
128.118.170.11 cocoa.ihpca.psu.edu cocoa.aero.psu.edu cocoa

# IP address <--> NAME mappings for the individual nodes of the cluster
10.0.0.1        node0.hpc.ihpca.psu.edu node0		# Server itself!
10.0.0.2        node1.hpc.ihpca.psu.edu node1
10.0.0.3        node2.hpc.ihpca.psu.edu node2
.
.
.
10.0.0.26       node25.hpc.ihpca.psu.edu node25
</verb></tscreen>

The <tt>/etc/host.conf</tt> was modified to contain the line:
<tscreen><verb>
order hosts,bind
</verb></tscreen>
This was to force the lookup of the IP address in the <tt>/etc/hosts</tt>
file before requesting information from the DNS server.

<item>The filesystems to be exported were added to <tt>/etc/exports</tt>
file which looked like:

<tscreen><verb>
/boot 		node*.hpc.ihpca.psu.edu(ro,link_absolute)
/mnt/cdrom	node*.hpc.ihpca.psu.edu(ro,link_absolute)
/usr/local  	node*.hpc.ihpca.psu.edu(rw,no_all_squash,no_root_squash)
/home1 		node*.hpc.ihpca.psu.edu(rw,no_all_squash,no_root_squash)
/home2 		node*.hpc.ihpca.psu.edu(rw,no_all_squash,no_root_squash)
/home3 		node*.hpc.ihpca.psu.edu(rw,no_all_squash,no_root_squash)
/home4 		node*.hpc.ihpca.psu.edu(rw,no_all_squash,no_root_squash)
</verb></tscreen>

<item>For rapid, uniform and unattended installation on each of the
client nodes, RedHat 5.1 KickStart installation was ideal.
Here is what my kickstart file <tt>/boot/install.ks</tt> looked like:
<tscreen><verb>
lang en
network --bootproto bootp
nfs --server 10.0.0.1 --dir /mnt/cdrom
keyboard us
zerombr yes
clearpart --all
part / --size 1600
part /local --size 2048
part /tmp --size 400 --grow
part swap --size 127
install
mouse ps/2
timezone --utc US/Eastern
rootpw --iscrypted kQvti0Ysw4r1c
lilo --append "mem=512M" --location mbr
%packages
@ Networked Workstation
%post
rpm -i ftp://10.0.0.1/pub/CLUSTER/RPMS/wget-1.5.0-2.i386.rpm
rpm -i ftp://10.0.0.1/pub/CLUSTER/RPMS/xntp3-5.93-2.i386.rpm
/usr/bin/wget ftp://10.0.0.1/pub/CLUSTER/kernel/vmlinuz -O/boot/vmlinuz
/usr/bin/wget ftp://10.0.0.1/pub/CLUSTER/conf/lilo.conf -O/etc/lilo.conf
/sbin/lilo
/usr/bin/wget ftp://10.0.0.1/pub/CLUSTER/conf/hosts.equiv -O/etc/hosts.equiv
sed "s/required\(.*securetty\)/optional\1/g" /etc/pam.d/rlogin > /tmp/rlogin
mv /tmp/rlogin /etc/pam.d/rlogin
</verb></tscreen>

For more info on RedHat KickStart installation, look at: <url
url="http://wwwcache.ja.net/dev/kickstart/KickStart-HOWTO.html">. In
one of the post installation commands above, the first line of the
<tt>/etc/pam.d/rlogin</tt> file is modified to contain:

<tscreen><verb>
auth       optional     /lib/security/pam_securetty.so
</verb></tscreen>

This is required to enable <tt>rlogin/rsh</tt> access from the server
to the clients without a password, which is very useful for software
maintenance of the clients. Also, the <tt>/etc/hosts.equiv</tt> file
mentioned above looks like this:

<tscreen><verb>
node0
node1
node2
node3
.
.
.
node25
</verb></tscreen>
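The <tt>sed</tt> substitution from the post-installation commands can be
tried on sample input before it is trusted in an unattended install. In
this sketch the wrapper name <tt>relax_securetty</tt> and the second input
line are hypothetical; the second line is there only to show that lines
which do not mention securetty are left untouched:
<tscreen><verb>
#!/bin/sh
# Hypothetical wrapper around the substitution used in the KickStart file.
relax_securetty() {
    sed 's/required\(.*securetty\)/optional\1/g'
}

# The first line mimics /etc/pam.d/rlogin; the second is a made-up control.
printf '%s\n' \
    'auth       required     /lib/security/pam_securetty.so' \
    'auth       required     /lib/security/pam_example.so' | relax_securetty
</verb></tscreen>
Only the securetty line changes from required to optional, and the
column alignment is preserved because both words have the same length.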

The RedHat Linux 5.1 CD-ROM was then mounted as <tt>/mnt/cdrom</tt> on
the server and NFS exported to the client nodes. A new kernel with
SMP support was compiled for the client nodes in very much the same way
as for the server and was used to replace the existing kernel on the
RedHat boot diskette. This kernel, however, had fewer options compiled
in, as it was only meant to act as a client. Additionally, the option for
``kernel level autoconfiguration using BOOTP'' was enabled in the
<it>Networking options</it> menu of the kernel configurator. This was
required in order for the node to automatically get its IP address from
the server at boot time. The configuration file of the boot
diskette was modified so as to boot directly in the KickStart mode.
All that was needed to configure each client now was to insert the
boot diskette, power up the workstation and wait until the automatic
installation was completed. Simple, eh?!

<item>As soon as all the clients were rebooted after installation, the
cluster was up and running! Some useful utilities like <tt>brsh</tt>
(<url url="http://www.beowulf.org/software/RPMS/beobase-2.0-1.i386.rpm">)
were installed to allow a single identical command to be run via
<tt>rsh</tt> on each of the client nodes. This was then used to make any
fine changes to the installation. NIS could have been installed to manage
the user logins on every client node, but instead a simple shell script
was written to distribute a common <tt>/etc/passwd</tt>, <tt>/etc/shadow</tt> and
<tt>/etc/group</tt> file from the server.
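<p>
The distribution script itself is not reproduced in this FAQ. As a rough
sketch of the idea, assuming <tt>rcp</tt> over the passwordless
<tt>rsh</tt> setup described earlier and a hypothetical function name
(<tt>push_accounts</tt>), the following prints one copy command per file
and node so that the list can be reviewed first:
<tscreen><verb>
#!/bin/sh
# Hypothetical sketch: print the rcp commands that would copy the account
# files from the server to clients node1..nodeN (N defaults to 25).
push_accounts() {
    nodes=${1:-25}
    i=1
    while [ "$i" -le "$nodes" ]; do
        for f in /etc/passwd /etc/shadow /etc/group; do
            echo "rcp -p $f node$i:$f"
        done
        i=$((i + 1))
    done
}

push_accounts
</verb></tscreen>
Piping the output to <tt>sh</tt> as root on the server would carry out the
copies; printing first makes it easy to check the node list.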

<item>Most of the services were disabled in <tt>/etc/inetd.conf</tt> for
each of the client nodes as they were unnecessary. The stripped down
<tt>/etc/inetd.conf</tt> for the client nodes finally looked like:

<tscreen><verb>
shell   stream  tcp     nowait  root    /usr/sbin/tcpd  in.rshd
auth    stream  tcp     nowait  nobody  /usr/sbin/in.identd in.identd -l -e -o
</verb></tscreen>

<item>The <it>automount</it> package was installed on each of the nodes to
automatically mount the various user partitions on demand. Although this
gave slightly improved NFS performance, it was found to be buggy and
unstable. Finally, it was decided that <it>automount</it> for Linux was not 
yet ready for prime time, and it was removed in favor of conventional NFS
mounts.

<item>The Portland Group Fortran 77/90 and HPF compilers (commercial)
were then installed on the server.

<item>Source code for MPICH, a freeware implementation of the MPI library,
was downloaded from <url url="http://www.mcs.anl.gov/mpi/"> and
compiled using <tt>pgcc</tt>. Installing it on the server on the
<tt>/usr/local</tt> partition was quite straightforward with no
major hassles. The <tt>mpif77</tt> script was modified to suit our needs
and a similar <tt>mpif90</tt> was created. The 
<tt>/usr/local/mpi/util/machines/machines.LINUX</tt> was then modified
to add two entries for each client node (as they were dual-processor
SMP nodes). Jobs could now be run on the cluster using interactive
<tt>mpirun</tt> commands!
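<p>
As a sketch of the machines file change, assuming one hostname per line
and the node names from <tt>/etc/hosts</tt>, the following hypothetical
helper (<tt>machines_list</tt>) lists each client twice, once per
processor:
<tscreen><verb>
#!/bin/sh
# Hypothetical helper: list clients node1..nodeN, each one twice,
# because every client has two processors.
machines_list() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        echo "node$i"
        echo "node$i"
        i=$((i + 1))
    done
}

machines_list 25
</verb></tscreen>
For the 25 clients this gives 50 entries, one for each client processor.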

<item>A queueing system, DQS v3.0, was downloaded from 
<url url="http://www.scri.fsu.edu/~pasko/dqs.html">, compiled and
installed as <tt>/usr/local/DQS/</tt>, making it available to all
the client nodes through NFS. Appropriate server and client changes
were then made to get it functional (i.e. adding the relevant services
in <tt>/etc/services</tt>, starting <tt>qmaster</tt> on the server and
<tt>dqs_execd</tt> on the clients), although a few minor irritants were
encountered, mainly owing to the poor documentation for DQS.
It took me a long time to figure out exactly how to configure DQS
to recognize a slave node, but once that was done, setting up the
rest of the nodes was trivial. I then wrote wrapper shell scripts
for <tt>qsub</tt>, <tt>qstat</tt> and <tt>qdel</tt>, which not only
beautified the original DQS output (which was <it>ugh</it> to begin
with!), but also added a few enhancements. For example, <tt>qstat</tt>
was modified to show the number of nodes requested by each pending
job in the queue. Also, three additional shell scripts, <tt>qinfo</tt>,
<tt>qload</tt> and <tt>qmem</tt>, were written to give some useful load
data for the nodes and the cluster resource utilization.


<item>COCOA was now fully functional, up and running, and ready for
benchmarking and serious parallel jobs! As with the kernel, use of the
<tt>pgcc</tt> compiler was recommended for all C/C++ codes. In
particular, using <tt>pgcc</tt> with the options ``<tt>-mpentiumpro -O6
-funroll-all-loops</tt>'' for typical FPU-intensive number-crunching
codes resulted in a <it>30%</it> increase in execution speed over the
conventional <tt>gcc</tt> compiler.

</enum>
<p>
</sect>

<tt>This document is maintained by Anirudh Modi &lt;anirudh-modi@psu.edu&gt;.
Mail me if you have any questions and/or suggestions.</tt>

</article>

