Cactus on the NCSA Origin 2000

host name: modi4.ncsa.uiuc.edu
login info:
  • The NCSA Origin2000 consists of a pool of multi-processor machines
  • Only modi4 can be used for interactive work.
  • default shell: tcsh
specs:
  • modi4: 16 processors, 4 GB memory
  • all machines in total: 512 processors, 160 GB memory (technical summary).
  • The O2K operates its queues with the Load Sharing Facility (LSF), not NQS.
Queues:
  • Timeshared Queues: (queue name syntax: s,m,l = short,med,large; t,j = time,job)
    max. mem.    # of PEs   up to 5 CPU h   up to 50 CPU h   50-200 CPU h   200-400 CPU h
                            per proc        per proc         per proc       per proc
    up to 2 GB   up to 8    vst_sj          st_sj            mt_sj          lt_sj
    2-4 GB       9-16       vst_mj          st_mj            mt_mj          lt_mj
    4-8 GB       17-64      vst_lj          st_lj            mt_lj          lt_lj

  • Dedicated Queues:
    max. mem.     # of PEs   queue name      hours (wall-clock)
    up to 64 GB   128        128_ded_short    1:00
    up to 64 GB   128        128_ded_med     15:00
    up to 64 GB   128        128_ded_long    50:00 (weekend only)

  • Interactive: 16 PEs per user, 256 MB memory limit, 15 CPU minutes time limit

  • Important:
    • The CPU-time limit in the table has to be divided by the number of PEs to get the "wall-clock time" limit.
    • For the qs2 script, you specify the total CPU time if you run in the normal queues (0-64 PEs). Multiply the per-proc CPU hours in the table by the number of procs you requested!
    • For the dedicated queues (128 PEs), you specify wall-clock time in the qs2 script.
      Dedicated queues are tough on the allocation quota.
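
The arithmetic behind these rules can be sketched as follows. This is a minimal illustration, not part of qs2: the function names are made up for this example, and it assumes that the total CPU request divided by the number of PEs approximates the wall-clock limit.

```python
def total_cpu_hours(per_proc_hours, num_procs):
    """Total CPU time to request in a normal (0-64 PE) queue."""
    return per_proc_hours * num_procs


def wall_clock_hours(total_hours, num_procs):
    """Approximate wall-clock limit implied by a total CPU-time request."""
    return total_hours / num_procs


# 16 processes at 5 CPU hours each require a total request of 80 CPU hours,
# which corresponds to about 5 hours of wall-clock time.
request = total_cpu_hours(5, 16)
print(request, wall_clock_hours(request, 16))
```

For the dedicated queues no conversion is needed, since the time given there is already wall-clock time.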

  • qs2: Since the jobs are distributed to different machines, whose /scratch partitions are writable by the batch system only, the qs2 script uses the following mechanism to spool jobs:
    • The script generates a directory on the permanent storage facility named basedir_outdir, where basedir and outdir are the Cactus parameters ("nameofparfile" is expanded). It mimics the directory hierarchy with underscores, since ftp cannot create multiple directory levels.
    • The executable and the parameter file are transferred to this directory. The executable is renamed to outdir_exe.
    • At execution time, these files are transferred to the local scratch area and executed.
    • After execution, the 1D, 2D, 3D and checkpoint files are tarred up independently and moved to the permanent storage directory.
    • NOTE: qs2 will only look in the outdir directory. If you put your checkpoint files somewhere else, you need to modify the submission script.
    • NOTE: Since this mechanism involves spooling the submitted job, changes to the local parameter file will have no effect. If you don't want to lose your position in the queue, change the files in the permanent storage directory.
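
The naming scheme described above might look roughly like this. It is only a sketch of the idea of flattening a path with underscores: the function names and example paths are hypothetical, and the way the real qs2 script treats leading slashes or the "nameofparfile" expansion is not covered here.

```python
def spool_dir_name(basedir, outdir):
    """Flatten basedir/outdir into one directory name joined by underscores."""
    parts = [p for p in (basedir + "/" + outdir).split("/") if p]
    return "_".join(parts)


def spooled_exe_name(outdir):
    """Name given to the executable in the spool directory (assumed form)."""
    flat = "_".join(p for p in outdir.split("/") if p)
    return flat + "_exe"


# Hypothetical parameter values, chosen only for illustration.
print(spool_dir_name("runs/bh", "bhole"))
print(spooled_exe_name("bhole"))
```

A name built this way tells you which directory on permanent storage to edit when a queued job needs a changed parameter file.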
Job submission:
  • Running Jobs on the Origin
  • qs2 16 bhole.par 80:00 512M [optional args ...]
    submits a job requesting 16 PEs, 512 MB of memory and 80 hours of CPU time in total for all procs. Since this is 5:00 hours per process, you end up in the vst_sj queue, which is ideal for debugging, etc.
  • qs2 128 bhole.par 15:00 1G [optional args ...]
    submits a job to the 128-PE dedicated queue with a runtime of 15:00 h wall-clock, requesting 1 gigabyte of memory.
  • busage gives job resource statistics on running and completed jobs.
  • bjobs displays the status of jobs, queues, and the system.
  • bpeek jobID# gives you the output written by the job.
  • bkill jobID# deletes a job from the queue.
  • bqueues [-l name] displays queue information.
  • HINT: To see whether your job starts off properly, request about a minute of CPU time and at most 9 procs; your job will then be served nearly instantaneously in a special debug queue.
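
To check in advance which time class a request falls into, the per-process CPU time can be worked out as in the sketch below. The helper names are invented for this example. It assumes that the time argument has the form HH:MM and that the upper bounds in the queue table are inclusive. Only the time prefix (vst, st, mt, lt) is derived; the job-size suffix depends on memory and PE count as listed in the queue table.

```python
def parse_hours(hhmm):
    """Convert an 'HH:MM' string such as '80:00' into hours."""
    hours, minutes = hhmm.split(":")
    return int(hours) + int(minutes) / 60


def time_class(per_proc_hours):
    """Map per-process CPU hours onto the time prefix of a timeshared queue."""
    if per_proc_hours <= 5:
        return "vst"
    if per_proc_hours <= 50:
        return "st"
    if per_proc_hours <= 200:
        return "mt"
    if per_proc_hours <= 400:
        return "lt"
    raise ValueError("more than 400 CPU hours per process fits no timeshared queue")


# The first qs2 example: 80:00 hours in total, spread over 16 processes.
per_proc = parse_hours("80:00") / 16
print(per_proc, time_class(per_proc))
```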
Filesystem:
Type               mount point         capacity   quotas
home               /u                  17 GB      25 MB (limit for 7 days: 50 MB)
scratch            ~/scratch-modi4/    107 GB     no quota - Purge!
permanent storage  UniTree             unlimited  no quota
AFS                /afs/aei.mpg.de/    -          -

Scratch purge policy:
File size   Removed after
> 1 MB      3 days
< 1 MB      14 days
  • Except on modi4, the scratch areas of the machines in the cluster are read-only for users. You can write to those filesystems only from within the batch system.
  • HINT: For interactive runs, keep the executable in your home directory and use the parameter basedir to have Cactus send its output to the modi4 scratch area; use outdir as usual.
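
The scratch purge rule can be expressed as a small helper for estimating how long a file is likely to survive. This is a sketch under assumptions: "1 MB" is taken to mean 2**20 bytes, a file of exactly 1 MB is treated like a smaller one, and the page does not say whether the purge counts age from the last access or the last modification. The function names are illustrative.

```python
ONE_MB = 1024 * 1024  # assumption: "1 MB" means 2**20 bytes


def purge_after_days(size_bytes):
    """Retention period on scratch: 3 days above 1 MB, otherwise 14 days."""
    return 3 if size_bytes > ONE_MB else 14


def days_left(size_bytes, age_days):
    """Days remaining before a scratch file of the given age is purged."""
    return max(0.0, purge_after_days(size_bytes) - age_days)


# A 5 MB output file that is one day old, and a small log file of 20 days.
print(days_left(5 * ONE_MB, 1))
print(days_left(1000, 20))
```

Large output files should therefore be moved to permanent storage within a few days of an interactive run.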
Documentation

The information on this page was originally compiled by Gerd Lanfermann. If you find places where this document doesn't correspond with your experience, please let me know.
Denis Pollney, pollney@aei.mpg.de
Last modified: 2000.03.19

This page last modified: $Date: 2004/03/02 14:14:38 $