LRZ Links
This page was compiled by:
DP <pollney@aei.mpg.de>
This page can be accessed and modified by anyone in the AEI numerical group via CVS:
cvs -d :pserver:your_id@cvs.aei.mpg.de co AEIWeb
Last modified:
$Date: 2004/02/27 23:37:04 $

Using the Leibniz-Rechenzentrum Hitachi (sr8000)

Getting an Account

Fill out the form here to get a new account. The accounts have a default password which you will be asked to change the first time you log in. Get the password from Denis.

Logging in:

The Hitachi is behind a firewall and does not allow connections from arbitrary machines. You can only log in from the Origin and from machines on AEI's private internal network 172.16.*.* (this includes all Xeons and most people's laptops). Connect to one of these machines first and then type:

ssh -l username sr8000.lrz-muenchen.de
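
The address restriction described above can be checked before attempting a login. The following is only a minimal sketch, not an LRZ or AEI tool: the function name on_aei_private_net and the example address are hypothetical, and the check simply pattern-matches the textual form 172.16.*.* given above.

```shell
# Hypothetical helper: succeeds if the given IPv4 address has the form 172.16.*.*
on_aei_private_net() {
    case "$1" in
        172.16.*.*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Assumed example address, not a real AEI host.
addr="172.16.3.7"
if on_aei_private_net "$addr"; then
    echo "direct: ssh -l username sr8000.lrz-muenchen.de"
else
    echo "connect to the Origin or an internal machine first"
fi
```

The check says nothing about the Origin, which is allowed by name rather than by address range.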

The default shell for new accounts is ksh. Use chsh to change this to something sensible. The path for tcsh is /usr/local/bin/tcsh and for bash is /usr/local/gnu/bin/bash. (See the Usage page for the sr8000 for more information.)

Crosscompiling Cactus

WARNING. There is no C++ compiler available on hitcross and Hitachi currently (23 May 2002) has no plans to release a C++ cross compiler. Hence you cannot compile any thorns which require C++ on hitcross, in particular FlexIO and the thorns which depend on it (Zorro for example). A workaround might be to compile the other libs on hitcross and then copy them to the Hitachi directory, build the C++ libs on the Hitachi and link everything, but I have not tested this.

As of 20 Mar 2002 the recommended way to compile for the Hitachi is to use the machine hitcross, where compiling is much faster. You might need to update your Cactus src tree. Log in to the Origin and then connect to hitcross.lrz-muenchen.de via

ssh -1 -l username hitcross.lrz-muenchen.de

Your home directory is mounted via NFS. NFS can be a bottleneck. However, if you work around it by making the Cactus/configs directory a symlink to a local disk, you will not be able to run xar, because it is executed remotely on the Hitachi.

In order to compile you need to download the hitcross config file from the cactuscode.org architecture page. Your executable will be built in Cactus/exe and you should be able to use it from the Hitachi.

Compiling and Running Cactus on the Hitachi

The processors on the Hitachi are grouped into partitions. You need to specify the partition on which you want to run any interactive or batch job. For interactive runs, specify IAPAR. For batch runs, use PARALLEL. You can specify the default processor partition and job type using environment variables:

csh:
setenv JOBTYPE SS
setenv DEFPART IAPAR
bash:
export JOBTYPE=SS
export DEFPART=IAPAR
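
The choice between the two partitions named above could be wrapped in a small shell function. This is only a sketch for sh-compatible shells: the function name set_default_partition and its interactive/batch arguments are hypothetical, and JOBTYPE is set to SS exactly as in the example above (whether the same value is appropriate for batch runs is an assumption that has not been checked).

```shell
# Hypothetical helper: pick the default partition for a kind of run.
# IAPAR and PARALLEL are the partition names given in the text above.
set_default_partition() {
    case "$1" in
        interactive) export DEFPART=IAPAR ;;
        batch)       export DEFPART=PARALLEL ;;
        *)
            echo "usage: set_default_partition interactive|batch" >&2
            return 1
            ;;
    esac
}

# JOBTYPE as in the example above (assumed unchanged for batch runs).
export JOBTYPE=SS
set_default_partition interactive
echo "JOBTYPE=$JOBTYPE DEFPART=$DEFPART"
```

Putting the function in your shell startup file would let you switch partitions without retyping the variable names.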

Configuring Cactus:

Copy the machine configuration file from the Cactus website, then configure with:

gmake hiux-config options=options-filename

Compiling:

Please note that compiling on the Hitachi itself is a lengthy procedure. It is recommended to crosscompile on a Linux PC, which is much faster. See the documentation above.

To compile in parallel use the prun command for executing parallel non-MPI jobs:

prun -p IAPAR gmake hiux FJOBS=2 TJOBS=4

Running interactively:

To run in parallel interactively use mpiexec. The JOBTYPE environment variable has to be set, as mentioned above.

mpiexec -p IAPAR -n 2 ./cactus_hiux brbr.par

Batch jobs:

The qs2 script has been modified to work on sr8000. It is recommended to use qs2, since it automatically sets the environment variable:

setenv _MALLOC_ALGORITHM 0301

to avoid the memory problems with the new malloc routine. If you want to submit batch jobs without qs2, DO NOT forget to set this environment variable yourself. For more information on the batch system on the sr8000 see:

http://www.lrz-muenchen.de/services/compute/hlrb/jobs/

To use streaming you need to specify a port in the range 1030 to 1040 in your parameter file:

httpd::port = 1031
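
A port number can be checked against this range before it goes into the parameter file. The following is a minimal sketch: the function name valid_stream_port is hypothetical, and it assumes that both ends of the range (1030 and 1040) are allowed.

```shell
# Hypothetical helper: succeeds only for an integer in 1030..1040.
# Assumption: the range is inclusive at both ends.
valid_stream_port() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -ge 1030 ] && [ "$1" -le 1040 ]
}

port=1031
if valid_stream_port "$port"; then
    echo "httpd::port = $port"     # line to place in the parameter file
else
    echo "port $port is outside 1030-1040" >&2
fi
```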

If you are getting very poor performance on many nodes, it might be because the machine is putting adjacent processors on different nodes. Try using the option:

setenv MPIR_RANK_NO_ROUND yes

in your submission script.
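
If you write your own submission script in an sh-compatible shell rather than csh, the two settings above might look as follows. This is a sketch only: the export lines are direct translations of the setenv commands above, and the function check_batch_env is a hypothetical guard, not part of qs2 or the batch system.

```shell
# sh equivalents of the setenv lines above.
export _MALLOC_ALGORITHM=0301
export MPIR_RANK_NO_ROUND=yes

# Hypothetical guard: refuse to continue if the malloc setting is missing.
check_batch_env() {
    if [ "${_MALLOC_ALGORITHM:-}" != "0301" ]; then
        echo "warning: _MALLOC_ALGORITHM is not set to 0301" >&2
        return 1
    fi
    return 0
}

check_batch_env && echo "batch environment looks ok"
```

Calling the guard just before the job is launched makes a forgotten setting visible in the job's error output instead of showing up later as a memory problem.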

Useful tools

To see your jobs in the queue do:

qstat

To see an overview of the usage of the different partitions do:

hpstatus

To see a continuously updated (every 5 minutes) report of the performance of your job do:

userflops -j job_id

where job_id is reported by qstat.

To see a graphical overview of the performance of all nodes on the machine do:

sr8000view

An overview of activity on the machine is given by hpstatus or via a web interface:

http://www.lrz-muenchen.de/services/compute/hlrb/betriebszustand/

More detailed information on the queues can be found via web:

http://www.lrz-muenchen.de/services/compute/hlrb/betriebszustand/usageovw.html

which is updated every 10 minutes.

The first table lists the status of the job classes. More information on job classes is available. The interesting entries of the table are essentially the NX classes, where X stands for 8, 16, ... 64 nodes. A job is placed in the smallest class that can hold it, so if you requested a job with 10 nodes it will go to N16, and this is the class to watch. Longest Wait Time is the current estimate of the waiting time, which the queue algorithm derives from past run times and the maximum time requested.
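
The mapping from a node count to the class to watch can be sketched as below. The function name node_class is hypothetical, and the list of class sizes is an assumption: the text above only gives "8, 16, ... 64", which is read here as 8, 16, 32 and 64. Only the case of 10 nodes going to N16 is confirmed by the text; treating requests below 8 nodes as N8 is likewise an assumption.

```shell
# Hypothetical helper: print the smallest class NX that holds the request.
# Assumption: the available class sizes are 8, 16, 32 and 64 nodes.
node_class() {
    for size in 8 16 32 64; do
        if [ "$1" -le "$size" ]; then
            echo "N$size"
            return 0
        fi
    done
    echo "no class large enough for $1 nodes" >&2
    return 1
}

node_class 10   # prints N16, matching the example in the text
```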

A version of xgraph can be found in Denis' home directory:

/home/h/h015zaj/bin/xgraph

The filesystems and backups/archiving are described at: http://www.lrz-muenchen.de/services/compute/hlrb/files/

Information on running interactive and batch jobs can be found at:

http://www.lrz-muenchen.de/services/compute/hlrb/jobs/

Some of the following web pages require a username and password. You can get these by typing get_manuals_passwd on the sr8000.

Installed software is listed at:

http://www.lrz-muenchen.de/services/compute/hlrb/software1/

Support email address

The generic email address for all kinds of trouble is:

HLRB-Admin@lists.lrz-muenchen.de

If you are reporting an error please include the approximate time, the node and the relevant sections of log files if available.

If you want to contact a specific person, then a list of support people is also available.

