This page was compiled by: DP <pollney@aei.mpg.de>
This page can be accessed and modified by anyone in the AEI numerical group via CVS:
cvs -d :pserver:your_id@cvs.aei.mpg.de co AEIWeb
Last modified: $Date: 2004/02/27 23:37:04 $
Fill out the form here to get a new account. The accounts have a default password which you will be asked to change the first time you log in. Get the password from Denis.
The Hitachi is behind a firewall and does not accept connections from arbitrary machines. You can only log in from the Origin and from machines on AEI's private internal network 172.16.*.* (this includes all Xeons and most people's laptops). Connect to one of these machines first and then type:
ssh -l username sr8000.lrz-muenchen.de
The default shell for new accounts is ksh. Use chsh to change this to something sensible. The path for tcsh is /usr/local/bin/tcsh and the path for bash is /usr/local/gnu/bin/bash. (See the Usage page for the sr8000 for more information.)
WARNING: There is no C++ compiler available on hitcross, and Hitachi currently (23 May 2002) has no plans to release a C++ cross compiler. Hence you cannot compile any thorns which require C++ on hitcross, in particular FlexIO and the thorns which depend on it (Zorro, for example). A workaround might be to compile the other libraries on hitcross and copy them to the Hitachi directory, then build the C++ libraries on the Hitachi and link everything there, but I have not tested this.
As of 20 Mar 2002, the recommended way to compile for the Hitachi is to use the machine hitcross, on which compiling is much faster. You might need to update your Cactus src tree. Log in to the Origin and then connect to hitcross.lrz-muenchen.de via
ssh -1 -l username hitcross.lrz-muenchen.de
Your home directory is mounted via NFS, which can be a bottleneck. However, if you symlink the Cactus/configs directory to a local disk to avoid this, you will not be able to run xar, because xar is executed remotely on the Hitachi.
In order to compile, you need to download the hitcross configuration file from the cactuscode.org architecture page. Your executable will be built in Cactus/exe, and you should be able to run it on the Hitachi.
The processors on the Hitachi are grouped into partitions. You need to specify the partition on which you want to run any interactive or batch job. For interactive runs, specify IAPAR. For batch runs, use PARALLEL.
You can specify the default processor partition and job type using environment variables. For csh/tcsh:
setenv JOBTYPE SS
setenv DEFPART IAPAR
For ksh/bash:
export JOBTYPE=SS
export DEFPART=IAPAR
To compile on the Hitachi itself, copy the machine configuration file from the Cactus website and then configure with:
gmake hiux-config options=options-filename
Please note that compiling on the Hitachi itself is a lengthy procedure. It is recommended to cross-compile on a Linux PC, which is much faster, as described above.
To compile in parallel, use the prun command, which executes parallel non-MPI jobs:
prun -p IAPAR gmake hiux FJOBS=2 TJOBS=4
To run interactively in parallel, use mpiexec. The JOBTYPE environment variable must be set, as described above.
mpiexec -p IAPAR -n 2 ./cactus_hiux brbr.par
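Because forgetting JOBTYPE is an easy mistake, a small wrapper can check for it before calling mpiexec. The sketch below is a hypothetical sh/ksh/bash function (run_iapar and DRY_RUN are made-up names, not LRZ tools); it only uses the mpiexec options shown above and, by default, prints the command instead of running it.

```shell
# Hypothetical wrapper (not an LRZ tool) for interactive MPI runs.
# Refuses to start if JOBTYPE is unset; by default it only prints the
# mpiexec command. Set DRY_RUN=0 to actually run it.
run_iapar() {
    if [ $# -lt 2 ]; then
        echo "usage: run_iapar NPROCS EXECUTABLE [ARGS...]" >&2
        return 2
    fi
    nprocs=$1
    shift
    if [ -z "${JOBTYPE:-}" ]; then
        echo "error: JOBTYPE is not set (e.g. export JOBTYPE=SS)" >&2
        return 1
    fi
    # Use DEFPART if it is set, otherwise the interactive partition IAPAR.
    part=${DEFPART:-IAPAR}
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "mpiexec -p $part -n $nprocs $*"
    else
        mpiexec -p "$part" -n "$nprocs" "$@"
    fi
}
```

For example, after export JOBTYPE=SS, the call run_iapar 2 ./cactus_hiux brbr.par prints the mpiexec line shown above; with DRY_RUN=0 it runs it instead.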
The qs2 script has been modified to work on the sr8000. Using qs2 is recommended, since it automatically sets the environment variable:
setenv _MALLOC_ALGORITHM 0301
to avoid the memory problems caused by the new malloc routine. If you submit batch jobs without qs2, DO NOT forget to set this environment variable yourself. For more information on the batch system on the sr8000, see:
http://www.lrz-muenchen.de/services/compute/hlrb/jobs/
To use streaming, you need to specify a port in the range 1030 to 1040 in your parameter file:
httpd::port = 1031
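If you set the port from a script, checking the allowed range first avoids starting a run that cannot be reached. A minimal sketch, assuming a plain-text Cactus parameter file; set_httpd_port is a hypothetical helper name:

```shell
# Hypothetical helper: append "httpd::port = PORT" to a Cactus parameter
# file, accepting only ports in the 1030-1040 range allowed for streaming
# and refusing to add a second httpd::port line.
set_httpd_port() {
    parfile=$1
    port=$2
    case $port in
        ''|*[!0-9]*)
            echo "error: port must be a number" >&2
            return 1 ;;
    esac
    if [ "$port" -lt 1030 ] || [ "$port" -gt 1040 ]; then
        echo "error: port $port is outside 1030-1040" >&2
        return 1
    fi
    # Look for an existing setting, regardless of case.
    if grep -qi '^[[:space:]]*httpd::port' "$parfile" 2>/dev/null; then
        echo "error: $parfile already sets httpd::port" >&2
        return 1
    fi
    echo "httpd::port = $port" >> "$parfile"
}
```

For example, set_httpd_port brbr.par 1031 appends the line shown above, while a port such as 8080 is rejected and the file is left unchanged.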
If you get very poor performance on many nodes, it might be because the machine is placing adjacent MPI processes on different nodes. Try setting:
setenv MPIR_RANK_NO_ROUND yes
in your submission script.
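For batch jobs submitted without qs2, the two environment variables mentioned on this page can be set in one place in the job script. A sketch in sh/ksh syntax (csh users would use setenv as shown above); the function name is hypothetical, and queue directives and the mpiexec line are deliberately left out because they depend on your job:

```shell
# Hypothetical helper for batch job scripts that do not go through qs2.
# Call it near the top of the script, before mpiexec is started.
setup_sr8000_batch_env() {
    # Avoid the memory problems with the new malloc routine
    # (qs2 sets this for you).
    _MALLOC_ALGORITHM=0301
    export _MALLOC_ALGORITHM
    # Only if performance on many nodes is poor: pass "no-round".
    if [ "${1:-}" = "no-round" ]; then
        MPIR_RANK_NO_ROUND=yes
        export MPIR_RANK_NO_ROUND
    fi
}
```

Keeping MPIR_RANK_NO_ROUND behind an argument reflects that it is a troubleshooting option, whereas _MALLOC_ALGORITHM should always be set.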
To see your jobs in the queue do:
qstat
To see an overview of the usage of the different partitions do:
hpstatus
To see a continuously updated report (refreshed every 5 minutes) of the performance of your job, do:
userflops -j job_id
where job_id is reported by qstat.
To see a graphical overview of the performance of all nodes on the machine do:
sr8000view
An overview of activity on the machine is given by hpstatus
or via a web interface:
http://www.lrz-muenchen.de/services/compute/hlrb/betriebszustand/
More detailed information on the queues can be found via web:
http://www.lrz-muenchen.de/services/compute/hlrb/betriebszustand/usageovw.html
which is updated every 10 minutes.
The first table lists the status of the job classes. More information on job classes is available. Essentially, the interesting entries of the table are the classes NX, where X is the number of nodes (8, 16, ..., 64). So if you request a job with 10 nodes, it will go to N16, and this is the relevant class to watch. Longest Wait Time is the queue algorithm's current estimate of the wait, based on past run times and the maximum times requested.
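The rule for picking the class to watch (presumably the smallest NX whose X is at least the requested node count) can be written as a small sh function. This is only a sketch: job_class is a hypothetical name, and the class list 8 16 32 64 is an assumption, since the page only says X runs from 8 to 64; adjust it to the classes shown in the first table.

```shell
# Hypothetical helper: print the NX job class a request for N nodes falls
# into, i.e. the smallest class size >= N. ASSUMPTION: the classes are
# N8, N16, N32 and N64; edit SR_CLASSES to match the first table.
SR_CLASSES="8 16 32 64"

job_class() {
    nodes=$1
    for size in $SR_CLASSES; do
        if [ "$nodes" -le "$size" ]; then
            echo "N$size"
            return 0
        fi
    done
    echo "error: $nodes nodes is more than the largest class" >&2
    return 1
}
```

For example, job_class 10 prints N16, matching the example above.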
A version of xgraph can be found in Denis' home directory:
/home/h/h015zaj/bin/xgraph
The filesystems and backups/archiving are described at: http://www.lrz-muenchen.de/services/compute/hlrb/files/
Information on running interactive and batch jobs can be found at:
http://www.lrz-muenchen.de/services/compute/hlrb/jobs/
Some of the following web pages require a username and password, which you can get by typing get_manuals_passwd on the sr8000.
Installed software is listed at:
http://www.lrz-muenchen.de/services/compute/hlrb/software1/
The generic email address for all kinds of trouble is:
HLRB-Admin@lists.lrz-muenchen.de
If you are reporting an error please include the approximate time, the node and the relevant sections of log files if available.
If you want to contact a specific person, then a list of support people is also available.