This is a short introduction on how to get Cactus running on the IBM Regatta at the Leibniz Rechenzentrum in Garching.
We currently have dedicated access to one node of this machine. The node has 32 processors (SP4) and, in principle, 96GB of memory, though it seems that only 64GB are accessible at the current time.
The LRZ web pages for the Regatta system are at:
http://www.rzg.mpg.de/computing/IBM_P/
Fill out the form linked from this page in order to get an account.
You can log in to psi directly from any machine in the world using
ssh <user_name>@psi.rzg.mpg.de
Psi19 is behind a firewall. Direct external access is only possible
from our origin and from all machines on AEI's virtual private
network (172.16.*.*); this includes all Xeons and most people's
laptops. Otherwise you have to log into psi first, and from there
ssh to psi19.
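The choice between a direct login and the two-step login can be sketched as a small shell function. This is only an illustration: the function name `psi19_ssh_command` and the user name `jdoe` are hypothetical, the function only prints the command rather than running it, and it checks only the 172.16.*.* range (other permitted hosts are not handled).

```shell
#!/bin/sh
# Print the ssh command to use for reaching psi19 from a given client address.
# Assumption: only the 172.16.*.* VPN range is treated as "direct access";
# any other permitted machine is not recognised by this sketch.
psi19_ssh_command() {
    user=$1
    client_ip=$2
    case "$client_ip" in
        172.16.*)
            # Inside the VPN: direct access is allowed.
            echo "ssh ${user}@psi19.rzg.mpg.de"
            ;;
        *)
            # Elsewhere: log in to psi first, then hop to psi19.
            # -t allocates a terminal so the inner ssh session is interactive.
            echo "ssh -t ${user}@psi.rzg.mpg.de ssh psi19"
            ;;
    esac
}

psi19_ssh_command jdoe 172.16.4.20
psi19_ssh_command jdoe 192.0.2.7
```

The second form simply chains the two logins described above into one command line.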
Download Cactus as usual, using GetCactus.
Create a ~/.cactus directory, and create a file called config there, which contains the lines listed for the Regatta on the configurations page. You should be able to compile your Cactus source tree using these options.
The job manager on the Regatta is poe (Parallel Operating Environment). In order to use this, you will need to set up a host.list file in your home directory listing all of the nodes on which you would like to run:
psi19.rzg.mpg.de
You will also need to add the host to your local .rhosts file:
psi19.rzg.mpg.de <user_name>
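A minimal sketch of creating both files from the shell follows. The function name `write_poe_hostfiles` and the user name `jdoe` are hypothetical, and the `chmod 600` step is an assumption based on common remote-shell behaviour rather than something stated for this machine. The example writes into a temporary directory so the result can be inspected; pass "$HOME" for real use.

```shell
#!/bin/sh
# Write host.list and .rhosts for poe into the given directory.
# Usage: write_poe_hostfiles <directory> <user_name>
write_poe_hostfiles() {
    dir=$1
    user=$2
    host=psi19.rzg.mpg.de

    # host.list: one node name per line (overwritten on each call).
    printf '%s\n' "$host" > "$dir/host.list"

    # .rhosts: "<host> <user_name>"; append only if the entry is missing.
    entry="$host $user"
    touch "$dir/.rhosts"
    grep -qxF "$entry" "$dir/.rhosts" || printf '%s\n' "$entry" >> "$dir/.rhosts"

    # Assumption: remote-shell services commonly ignore an .rhosts file
    # that other users can write to, so restrict it to the owner.
    chmod 600 "$dir/.rhosts"
}

# Example: write into a scratch directory instead of $HOME.
demo_dir=$(mktemp -d)
write_poe_hostfiles "$demo_dir" jdoe
cat "$demo_dir/host.list" "$demo_dir/.rhosts"
```

Appending to .rhosts rather than overwriting it keeps any entries you already have for other machines.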
The command
octop -c -n <node name>
will display the CPU usage on the node <node name>.
Type octop -h to see more options.
You can run an interactive job using a command such as:
poe ./cactus_test test.par -procs 4
Four queues have been set up on psi19, which differ
in the length of jobs they will allow:
| Queue name | Time limit |
|---|---|
| short | 1 hour |
| huge | 12 hours |
| lhuge | 24 hours |
| infinite | 14 days |
Use the llclass command to list the available
queues.
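To illustrate how the time limits in the table map onto a choice of queue, here is a small sketch that picks the shortest queue able to hold a run of a given length. The function name `pick_queue` is hypothetical, and the limits are copied from the table above, so they should be checked against the live configuration.

```shell
#!/bin/sh
# Print the shortest queue whose time limit covers a run of the given
# length, expressed in whole hours.
pick_queue() {
    hours=$1
    if [ "$hours" -le 1 ]; then
        echo short
    elif [ "$hours" -le 12 ]; then
        echo huge
    elif [ "$hours" -le 24 ]; then
        echo lhuge
    elif [ "$hours" -le 336 ]; then   # 14 days * 24 hours
        echo infinite
    else
        echo "no queue allows ${hours} hours" >&2
        return 1
    fi
}

pick_queue 8
pick_queue 48
```

The result can be used as the value of the class keyword in a submission script.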
Batch jobs are submitted to the LoadLeveler system. A simple submission script looks like this:
#!/bin/sh
# @ output = test-192.out
# @ error = test-192.err
# @ initialdir = /afs/ipp-garching.mpg.de/home/p/pollney/runs/test/
# @ class = huge
# @ job_type = parallel
# @ environment= COPY_ALL
# @ node_usage= shared
# @ node = 1
# @ tasks_per_node = 8
# @ resources = ConsumableCpus(1)
# @ queue
poe ./cactus_bhrun test-192.par
You can submit this script using the
llsubmit command:
llsubmit test.ll
where
test.ll is the name of the submission script
containing the above lines.
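Since most of a submission script stays the same between runs, it can be generated from a template. The sketch below uses only the keywords from the example script; the function name `make_ll_script` and its argument order are hypothetical, and it assumes the parameter file is named after the job.

```shell
#!/bin/sh
# Print a LoadLeveler submission script on standard output.
# Usage: make_ll_script <job_name> <class> <tasks_per_node> <run_dir> <executable>
# Assumption: the parameter file is called <job_name>.par.
make_ll_script() {
    name=$1
    class=$2
    tasks=$3
    run_dir=$4
    exe=$5
    cat <<EOF
#!/bin/sh
# @ output = ${name}.out
# @ error = ${name}.err
# @ initialdir = ${run_dir}
# @ class = ${class}
# @ job_type = parallel
# @ environment = COPY_ALL
# @ node_usage = shared
# @ node = 1
# @ tasks_per_node = ${tasks}
# @ resources = ConsumableCpus(1)
# @ queue
poe ${exe} ${name}.par
EOF
}

# Example: print a script for an 8-task job in the huge queue.
make_ll_script test-192 huge 8 "$HOME/runs/test" ./cactus_bhrun
```

Redirect the output to a file such as test.ll and pass that file to llsubmit.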
The qs2
script can be used to automate job submission. Use
qs2 16 cactus_test test.par 2:00:00
to submit a job on 16 nodes for 2 hours. If you leave out the time,
then the job will be submitted to the infinite queue with
a two-week time limit.
Use the command llq to see the queue.
Use the command llcancel job_# to cancel a job.
The local /batch filesystem has about 350GB of space, and the local /scratch filesystem has 70GB. You can create your own directories under these.
In order to access a Cactus job using a web browser:
Compile your Cactus executable with the thorns
CactusConnect/HTTPD
CactusConnect/HTTPDExtra
CactusConnect/Socket
CactusIO/IOJpeg
CactusExternal/IOJpeg
and include these thorns in the
ActiveThorns of
your parameter file. See the instructions in the Webserver-HOWTO
for details of how to set up the HTTPD thorns.
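Set the port for the web server in your parameter file, for example:
httpd::port = 10000
When the job starts, its output reports the address of the web server in a line such as the following (note that the port reported may differ from the one requested):
Server started on http://psi19.rzg.mpg.de:10001/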
Note: Connections to the Regatta system are only allowed from the addresses 194.94.224.*** and 80.86.11.[25,26,27,33,34,35,36]. For instance, you can start up a Netscape on sshserv and view your job, but not from your local Xeon.
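As a quick way to tell whether a given machine falls inside the permitted ranges, the address list in the note can be written as a shell pattern match. This is a sketch only: the function name `regatta_web_allowed` is hypothetical, it does not validate that its argument is a well-formed IPv4 address, and the real firewall rules may change.

```shell
#!/bin/sh
# Succeed if the given IPv4 address is in one of the ranges from the note:
# 194.94.224.* and 80.86.11.[25,26,27,33,34,35,36].
regatta_web_allowed() {
    case "$1" in
        194.94.224.*) return 0 ;;
        80.86.11.25|80.86.11.26|80.86.11.27) return 0 ;;
        80.86.11.33|80.86.11.34|80.86.11.35|80.86.11.36) return 0 ;;
        *) return 1 ;;
    esac
}

for ip in 194.94.224.17 80.86.11.34 80.86.11.28; do
    if regatta_web_allowed "$ip"; then
        echo "$ip allowed"
    else
        echo "$ip blocked"
    fi
done
```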
If you include the thorn
AEIDevelopment/Announce
then you will also be able to connect to the run via the
ASC-portal.
This page last modified: $Date: 2004/03/02 14:14:38 $