Running Cactus on the NERSC SP3 (Seaborg)

This is a short introduction to getting Cactus running on the IBM SP3 at NERSC in Berkeley, California.

Seaborg has 380 compute nodes, and each node has 16 POWER3 processors (6,080 compute processors in total). Additional nodes provide logins, networking, access to filesystems, etc.

The web pages for Seaborg are at http://hpcf.nersc.gov/computers/SP/

Getting an Account

Talk to Ed about getting an account on this machine.

Logging in to Seaborg

ssh seaborg.nersc.gov -l <user_name>

Setup

If tcsh is not already your default shell (type echo $SHELL to see which shell you have), log in to sadmin.nersc.gov (same username and password), type chsh, answer yes to the "Change?" question, and select /usr/bin/tcsh.

You need to load environment modules to access various software. Type

 module avail
to get a list of available modules (see man module for more information). Environment modules provide a way to manage environment variables, which makes it easy to switch between different versions of the same libraries and programs. The GNU module is particularly useful. To load it automatically, add the line

module load GNU

to .login.ext in your home directory.
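
As a minimal sketch, the line can also be added from the command line in a way that does not duplicate it on repeated runs. The helper name add_module_line is hypothetical, and the sketch is written in POSIX sh rather than tcsh, so run it under sh or adapt it.

```shell
#!/bin/sh
# Hypothetical helper: append "module load GNU" to a login file
# only if that exact line is not already present.
add_module_line() {
    file="$1"
    line="module load GNU"
    # Create the file if it does not exist yet.
    [ -f "$file" ] || : > "$file"
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if ! grep -qxF "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

For example, add_module_line "$HOME/.login.ext" would update the file named above.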

Filesystem

There are two scratch options. The environment variable $SCRATCH points to a GPFS filesystem. Each node also has a tmp directory.
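
A run directory on the GPFS scratch filesystem might be set up as in the following POSIX sh sketch. The function name make_run_dir and the one-directory-per-run layout are illustrative assumptions, not a NERSC convention.

```shell
#!/bin/sh
# Hypothetical helper: create a per-run directory under $SCRATCH
# and print its full path.
make_run_dir() {
    run_name="$1"
    # Stop early if SCRATCH is unset or empty, rather than writing to "/".
    if [ -z "${SCRATCH:-}" ]; then
        echo "SCRATCH is not set" >&2
        return 1
    fi
    run_dir="$SCRATCH/$run_name"
    mkdir -p "$run_dir" || return 1
    printf '%s\n' "$run_dir"
}
```

For example, cd "$(make_run_dir bench_run)" would create the directory if needed and change into it.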

Compiling Cactus

Cactus compiles out of the box. For MPI, use MPI=NATIVE. See the Cactus configurations page for other options.

Interactive Jobs

You can run jobs interactively using the poe command, adding the -procs option if you need more than 1 processor. Alternatively, the debug queue is usually quick for 5-minute test jobs; see the next section for this.

Often you won't get the resources immediately. In that case you can use the -retry and -retrycount options.

poe ./cactus_test par-file.par -retry 90 -retrycount 100 -nodes 1 -procs 8

This will resubmit the job every 90 seconds, up to 100 times, until it gets 8 processors on a single node.
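
To estimate how long poe keeps trying before it gives up, multiply the retry interval by the retry count. A small POSIX sh sketch of that arithmetic follows; the function name poe_max_wait_minutes is hypothetical, and the result is only a rough upper bound on the waiting time.

```shell
#!/bin/sh
# Hypothetical helper: rough upper bound, in whole minutes, on how long
# poe keeps retrying, given the -retry interval (seconds) and -retrycount.
poe_max_wait_minutes() {
    retry_seconds="$1"
    retry_count="$2"
    echo $(( $retry_seconds * $retry_count / 60 ))
}

# Values from the command above: 90 seconds x 100 attempts.
poe_max_wait_minutes 90 100   # prints 150
```

With the values used above, the command gives up after roughly 150 minutes (2.5 hours) if no processors become available.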

Submitting to Queues

The qs2 script can be used to submit jobs on Seaborg. If you want to write the script yourself, the following information is useful.

LoadLeveler is used to submit jobs, with this basic script:

#@ class = debug       
#@ shell = /usr/bin/csh
#@ node = 16 
#@ tasks_per_node = 16
#@ network.MPI = csss,not_shared,us 
#@ wall_clock_limit = 0:05:00
#@ notification = always
#@ job_type = parallel
#@ output = $(host).$(jobid).$(stepid).out
#@ error = $(host).$(jobid).$(stepid).err
#@ queue

exe/cactus_bench arrangements/CactusBench/BenchADM/par/BenchADM_40l.par

This submits to the debug queue with 256 processors: 16 nodes, each using the maximum of 16 processors. Save something like this in a file MyScript and then submit it to the queue with

llsubmit MyScript
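
When the same job has to be run at several sizes, the script can be generated instead of edited by hand. The following POSIX sh sketch writes a script with the same keywords as the one above to standard output. The function name make_ll_script and its argument order are assumptions made for illustration; it is not the qs2 script.

```shell
#!/bin/sh
# Hypothetical generator for a LoadLeveler script like the one above.
# Arguments: class, nodes, tasks per node, wall clock limit,
# followed by the command line to run.
make_ll_script() {
    class="$1"; nodes="$2"; tasks="$3"; wall="$4"
    shift 4
    # \$ keeps the LoadLeveler variables $(host) etc. literal in the output.
    cat <<EOF
#@ class = $class
#@ shell = /usr/bin/csh
#@ node = $nodes
#@ tasks_per_node = $tasks
#@ network.MPI = csss,not_shared,us
#@ wall_clock_limit = $wall
#@ notification = always
#@ job_type = parallel
#@ output = \$(host).\$(jobid).\$(stepid).out
#@ error = \$(host).\$(jobid).\$(stepid).err
#@ queue

$*
EOF
}
```

For example, the following would request 2 nodes with 16 tasks each (32 processors) in the debug queue:

make_ll_script debug 2 16 0:05:00 exe/cactus_bench arrangements/CactusBench/BenchADM/par/BenchADM_40l.par > MyScript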

To get more information about the available queues, use llclass.

Use llq to see information about submitted jobs in the queues.

The NERSC web page about queues is at http://hpcf.nersc.gov/running_jobs/ibm/.

Open Ports for HTTPD Connections

Port numbers 40000-40016 have been opened up as inbound ports for Cactus network services like HTTPD.

Other Commands

Performance Issues

To find out how your code performs, you can use hpmcount.

Note that BSSN_Benchmark results are very poor on Seaborg. We typically achieve 10% of peak performance. However, on Seaborg one can just use many nodes and make up for the speed problem ;-)


Version: $Id: Seaborg.html,v 1.14 2004/02/09 12:07:27 swhite Exp $