Running Cactus on the NCSA IA32 Cluster (Platinum)

This is a short introduction to getting Cactus running on the IA32 Linux Cluster at NCSA.

This machine will have 960 IA32 1GHz processors, each with 0.75GB of memory, so that the whole machine has 720GB. Each node has two processors, and the nodes are connected with Myrinet.

At the moment, half the machine is available for production use, and the rest is in various stages of testing.

The web pages for Platinum are at

http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/IA32LinuxCluster/

Getting an Account

If you have an account on Platinum you will be able to log on with your usual NCSA (that is, modi4) username and password. If you find you don't have an account, fill out the form linked here.

Logging in to platinum

ssh platinum.ncsa.uiuc.edu -l <user_name>

Note that the name "platinum" maps to different nodes with different IP addresses, so ssh often complains that the host key has changed. When that happens you need to edit your .ssh/known_hosts file and remove the line with platinum in it, which is a real pain.

There are no Globus GSI tools installed yet.
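The known_hosts cleanup can be scripted. The following is only a sketch: `forget_host` is a made-up helper name, and it assumes host names are stored in plain text (not hashed) in the known_hosts file. It uses `command mv` so that an interactive `mv -i` alias does not get in the way.

```shell
# Hypothetical helper: drop every known_hosts line that mentions a host.
# Usage: forget_host <hostname> [known_hosts file, default ~/.ssh/known_hosts]
forget_host() {
    host=$1
    file=${2:-$HOME/.ssh/known_hosts}
    tmp=$file.tmp.$$

    # Nothing to do if the file does not exist.
    [ -f "$file" ] || return 0

    # Keep only the lines that do NOT contain the host name.
    # grep -v exits non-zero when no lines remain; that is not an error here.
    grep -v -F -- "$host" "$file" > "$tmp" || true

    # Replace the original file with the filtered copy.
    command mv -f "$tmp" "$file"
}

# Example: forget_host platinum
```

Where your OpenSSH version supports it, `ssh-keygen -R platinum.ncsa.uiuc.edu` should do the same job and also handles hashed entries.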

Setup

I followed the instructions in the initial message for adding a .forward file. You probably want to edit your .bashrc file to add the following, so as not to delete files accidentally:

alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'

Filesystem

I think there is 100MB in your home directory, which is also set up with links to scratch space. See the web pages for the limits and purge time for scratch space.

Here are some instructions from Thomas Radke for speeding up output on Platinum by using all four NFS fileservers in parallel:

It assumes that Cactus is writing chunked files, one per node. Each I/O processor has associated with it a different filesystem. This is achieved by using the template "proc%u" somewhere in the IOHDF5::outdir_HDF5 parameter setting.

Since there are only four filesystems available, $HOME/storage[1357], a mapping from each I/O processor to an actual filesystem is needed. This cannot be done by Cactus itself and should be set up beforehand. I wrote this little Perl script to do this. You just give it the number of nodes and procs, and it will create symlinks in the current directory, pointing to the individual filesystems.
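The Perl script itself is not reproduced on this page. As a rough illustration of the mapping it describes, here is a shell sketch (not the original script). It assumes the output directories are named proc0, proc1, ... to match the "proc%u" template, that a plain round-robin over storage1, storage3, storage5 and storage7 is acceptable, and it takes the number of I/O processors directly rather than nodes and procs. The name `link_proc_dirs` is made up for this example.

```shell
# Hypothetical helper: create proc0 .. proc(N-1) symlinks in the current
# directory, spread round-robin over the four storage filesystems.
# Usage: link_proc_dirs <number of I/O processors> [base dir, default $HOME]
link_proc_dirs() {
    nprocs=$1
    base=${2:-$HOME}
    i=0
    while [ "$i" -lt "$nprocs" ]; do
        # i % 4 = 0,1,2,3  ->  storage1, storage3, storage5, storage7
        fs=$(( (i % 4) * 2 + 1 ))
        # Leave any existing proc<i> entry alone rather than overwrite it.
        if [ ! -e "proc$i" ] && [ ! -L "proc$i" ]; then
            ln -s "$base/storage$fs" "proc$i"
        fi
        i=$((i + 1))
    done
}

# Example: run in the directory the parameter file's output path points at.
# link_proc_dirs 8
```

With 8 I/O processors this would give each filesystem two of the proc directories.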

So, what I'd suggest is the following:

Compiling Cactus

The standard compilation options on Platinum can be found on our Machine Configurations page.

Interactive Jobs

Type
qsub -I -V -l walltime=00:30:00,nodes=2:ppn=2:prod
and wait for the shell to start.

Once you have the shell, you are in your home directory, so cd to where you want to run and type
vmirun ./executable parfile
The run uses all the resources the job was given (in this case 2 nodes with 2 processors each).

When you are done, type
exit

You can also use the debug queue, which has up to 32 (or maybe 64) processors and allows up to half an hour of run time. So far, jobs on this queue have waited anywhere from a few minutes to a few hours before running.
There is also a sample script in /usr/local/doc/pbs/batch.sample that you could look at.

Submitting to Queues

Look at the NCSA pages for information about the different queues. Briefly, there are three queues: standard, weekend and debug.

You need a batch script to submit to the queues. I have not worked this out yet, and instead used the "submitjob" command, with something like

submitjob 4 2 00:10:00 debug pt test test.out "time /u/ac/gallen/Cactus/exe/cactus_benchmpi /u/ac/gallen/Cactus/examples/benchmpi/BenchADM_40l.par > /u/ac/gallen/out8"

The general form is something like:

submitjob
<number of nodes> (half the number of procs you want)
<number of procs per node> (probably always 2, unless you're testing on one processor)
<run time> (hours:minutes:seconds)
<queue> (debug, standard, weekend)
<pt> (I think this is always pt)
<job name> (this appears in qstat -a)
<test.out> (I never found this file!)
<run command> (needs quotes around it; I used full paths to be safe, but I think it is relative to the directory you submit from)
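For comparison, here is a guess at what a hand-written PBS batch script could look like, built from the qsub options used for interactive jobs. This is an untested sketch, not a script known to work on Platinum: `make_pbs_script` is a made-up helper that only prints the script text, and whether the `prod` node property and the `-q` queue names combine this way is an assumption. Check /usr/local/doc/pbs/batch.sample for the real conventions.

```shell
# Hypothetical helper: print a PBS batch script to stdout.
# Usage: make_pbs_script <nodes> <ppn> <walltime> <queue> <executable> <parfile>
make_pbs_script() {
    cat <<EOF
#!/bin/sh
#PBS -l walltime=$3,nodes=$1:ppn=$2:prod
#PBS -q $4
#PBS -V
# The job starts in the home directory; go back to where it was submitted.
cd \$PBS_O_WORKDIR
vmirun $5 $6
EOF
}

# Example (assumed workflow): write the script, then submit it.
# make_pbs_script 2 2 00:30:00 debug ./executable parfile > cactus.pbs
# qsub cactus.pbs
```

Generating the script from a function keeps the node count, processors per node and run time in one place, mirroring the arguments that submitjob takes.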

Check how your jobs are doing with

qstat -a

and remove them, using the job ID shown by qstat, with

qdel <job ID>

This page last modified: $Date: 2004/03/02 14:14:38 $