This is a short introduction on how to get Cactus running on the Compaq machine (lemieux) at the Pittsburgh Supercomputing Center. The machine has 750 four-processor nodes; each node has 4 GB of memory, and each processor runs at 1 GHz.
The PSC web pages for lemieux are at http://www.psc.edu/machines/tcs/lemieux.html
We have a 500,000 SU allocation on lemieux. To get a new account, fill out the form linked here. To see your account details, type xbanner.
ssh lemieux.psc.edu -l <user_name>
Your user account will have a 1 GB quota on $HOME.
Temporary data should be stored on the available scratch filesystems which are globally visible on all service and compute nodes. Use $SCRATCH and $SCRATCH2 to point to your own scratch space.
Data on $SCRATCH and $SCRATCH2 is subject to purging if the filesystem becomes full. For longer-term storage you should move your data to PSC's mass storage system (see this page for details).
Each compute node also has a $LOCAL filesystem which is shared only among the 4 processors of that node. Access to this filesystem is very fast, so you should use it in your runs for the output of chunked Cactus data: change into $LOCAL before starting Cactus in your qsub batch script, and use a relative directory name for IOHDF5::out_dir in your parameter file.
Once your Cactus job has finished, you need to copy its output files back from $LOCAL to a global filesystem using the tcscp command:
tcscp -v -r -p ${RMS_NODES} '{compute}:$LOCAL' $SCRATCH
The standard configuration options can be found on the Cactus configurations page. Just put these options in your ${HOME}/.cactus/config configuration file on lemieux.
Single-processor (non-MPI) jobs can be run interactively as usual. For MPI jobs you first have to request processors using
qsub -I [-q <queue>] [-l rmsnodes=<#nodes>:<total #procs>]
with the appropriate number of compute nodes and processors (the default is one node with 4 processors on the standard queue). Once the qsub command gives you an interactive shell on the requested nodes, you can run jobs using
prun [-N <#nodes>] [-n <total #procs>] cactus_<config> <parameter file>
(default is to run the job on all requested nodes and processors).
Note that it can take a long time until a request for interactive nodes is granted. In that case, try both the standard and the debug queue.
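Since each lemieux node has 4 processors, the node count for the rmsnodes resource follows from the total number of processors you want. The following is a minimal sketch of that arithmetic in POSIX sh. The function name rmsnodes_spec is hypothetical (it is not part of the PSC tools), and it assumes that you want nodes to be filled completely before another one is requested.

```shell
# Hypothetical helper (not a PSC tool): build the rmsnodes resource string
# for a given total number of processors, assuming 4 processors per node.
rmsnodes_spec() {
    procs=$1
    procs_per_node=4
    # round up so that a partially filled node is still requested
    nodes=$(( (procs + procs_per_node - 1) / procs_per_node ))
    echo "rmsnodes=${nodes}:${procs}"
}
```

In an sh-compatible shell, rmsnodes_spec 10 prints rmsnodes=3:10, so qsub -I -l $(rmsnodes_spec 10) would request 3 nodes with 10 processors in total.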
Check the status of the queues using qstat (you can use -a, -f, or -u <login name> for more info).
Check the status of the machine using rinfo.
To submit a batch job to the queues, create a qsub script along the lines of the example below:
#!/bin/csh
# your job's runtime in HH:MM:SS
#PBS -l walltime=0:05:00
# the number of nodes:processors requested
#PBS -l rmsnodes=1:4
# use the projects command to find out your project name
#PBS -l rmsproject=<projectname>
# notify by email when the job has finished
#PBS -m e
set cactus=${HOME}/cactus/exe/cactus_hdf5
set parfile=${HOME}/cactus/par/hdf5.par
# Cactus is started by a temporary shell script which also cd's into $LOCAL
echo 'cd ${LOCAL}' > ${SCRATCH}/shell.$$
echo "${cactus} ${parfile}" >> ${SCRATCH}/shell.$$
prun /bin/sh ${SCRATCH}/shell.$$
/bin/rm ${SCRATCH}/shell.$$
# recursively copy all files on $LOCAL back to $SCRATCH, using parallel I/O
tcscp -v -r '{compute}:$LOCAL' $SCRATCH
Important: You must make sure in your parameter file that your Cactus job terminates before your batch job uses up all of its walltime. Otherwise there will be no time left to copy your data files back from $LOCAL to $SCRATCH and, since $LOCAL is subject to purging, you will lose all of your job's output data.
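One way to choose a safe runtime limit is to derive it from the walltime requested in the batch script. The sketch below, in POSIX sh, converts a walltime given as H:MM:SS into whole minutes and subtracts a margin for the copy-back step. The function name walltime_budget_minutes and the default margin of 10 minutes are assumptions made for illustration; how long the copy really takes depends on the amount of output data.

```shell
# Hypothetical helper: turn a PBS walltime (H:MM:SS) into the number of whole
# minutes Cactus may run, leaving a margin (default 10 minutes, an arbitrary
# assumption) for copying data back from $LOCAL.
walltime_budget_minutes() {
    walltime=$1
    margin=${2:-10}
    # split H:MM:SS on colons into the positional parameters
    old_ifs=$IFS
    IFS=:
    set -- $walltime
    IFS=$old_ifs
    # strip one leading zero so that "08" is not read as an octal number
    hours=${1#0}
    minutes=${2#0}
    # an empty field counts as zero; seconds are ignored, which rounds down
    budget=$(( ${hours:-0} * 60 + ${minutes:-0} - $margin ))
    if [ "$budget" -lt 0 ]; then
        budget=0
    fi
    echo "$budget"
}
```

With the 0:05:00 walltime of the example script and a 2-minute margin, walltime_budget_minutes 0:05:00 2 prints 3. The result could then be used for a runtime-based termination condition in the parameter file, if your Cactus version offers one (for example the flesh parameters Cactus::terminate and Cactus::max_runtime, the latter given in minutes; check the documentation of your version).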
This page last modified: $Date: 2004/02/09 12:07:28 $