Job scheduling on HPC resources

Architecture of a HPC Cluster

Modern High Performance Computing (HPC) resources are usually composed of a cluster of computing nodes that provide the user the ability to parallelize tasks and greatly reduce the time it takes to perform complex operations. A node is usually defined as a discrete unit of a computer system that runs its own instance of an operating system. Modern nodes have multiple chips, often known as Central Processing Units or CPUs, which each contain multiple cores each capable of processing a separate stream of instructions (such as a single Monte Carlo run). An example cluster configuration is shown in Figure 1.

garbage

Figure 1. An example cluster configuration

To efficiently make use of a cluster’s computational resources, it is essential to allow multiple users to use the resource at one time and to have an efficient and equatable way of allocating and scheduling computing resources on a cluster. This role is done by job scheduling software. The scheduling software is accessed via a shell script called in the command line. A scheduling  script does not actually run any code, rather it provides a set of instructions for the cluster specifying what code to run and how the cluster should run it. Instructions called from a scheduling script may include but are not limited to:

  • What code would you like the cluster to run
  • How would you like to parallelize your code (ie MPI, openMP ect)
  • How many nodes would you like to run on
  • How many core per processor would you like to run (normally you would use the maximum allowable per processor)
  • Where would you like error and output files to be saved
  • Set up email notifications about the status of your job

This post will highlight two commonly used Job Scheduling Languages, PBS and SLURM and detail some simple example scripts for using them.

PBS

The Portable Batching System (PBS) was originally developed by NASA in the early 1990’s [1] to facilitate access to computing resources.  The intellectual property associated with the software is now owned by Altair Engineering. PBS is a fully open source system and the source code can be found here. PBS is the job scheduler we use for the Cube Cluster here at Cornell.

An annotated PBS submission script called “PBSexample.sh” that runs a C++ code called “triangleSimulation.cpp” on 128 cores can be found below:

#PBS -l nodes=8:ppn=16    # how many nodes, how many cores per node (ppn)
#PBS -l walltime=5:00:00  # what is the maximum walltime for this job
#PBS -N SimpleScript      # Give the job this name.
#PBS -M email.cornell.edu # email address for notifications
#PBS -j oe                # combine error and output file
#PBS -o outputfolder/output.out # name output file

cd $PBS_O_WORKDIR # change working directory to current folder

#module load openmpi/intel # load MPI (Intel implementation)
time mpirun ./triangleSimulation -m batch -r 1000 -s 1 -c 5 -b 3

To submit this PBS script via the command line one would type:

qsub PBSexample.sh

Other helpful PBS commands for UNIX can be found here. For more on PBS flags and options, see this detailed post from 2012 and for more example PBS submission scripts see Jon Herman’s Github repository here.

SLURM

A second common job scheduler is know as SLURM. SLURM stands for “Simple Linux Utility Resource Management” and is the scheduler used on many XSEDE resources such as Stampede2 and Comet.

An example SLURM submission script named “SLURMexample.sh” that runs “triangleSimulation.cpp” on 128 core can be found below:


#!/bin/bash
#SBATCH --nodes=8             # specify number of nodes
#SBATCH --ntasks-per-node=16  # specify number of core per node
#SBATCH --export=ALL
#SBATCH -t 5:00:00            # set max wallclock time
#SBATCH --job-name="triangle" # name your job #SBATCH --output="outputfolder/output.out"

#ibrun is the command for MPI
ibrun -v ./triangleSimulation -m batch -r 1000 -s 1 -c 5 -b 3 -p 2841

To submit this SLURM script from the command line one would type:

sbatch SLURM

The Cornell Center  for Advanced Computing has an excellent SLURM training module within the introduction to Stampede2 workshop that goes into detail on how to most effectively make use of SLURM. More examples of SLURM submission scripts can be found on Jon Herman’s Github. Billy also wrote a blog post last year about debugging with SLURM.

References

  1. https://en.wikipedia.org/wiki/Portable_Batch_System

3 thoughts on “Job scheduling on HPC resources

  1. Hi, nice post!

    We can also install the Torque scheduler on any multicore machine. I have it installed on my core i7 (which has a total of 8 cores) machine and I use it to run an atmospheric global circulation model. The scheduler is useful because it let me test and make low resolution experiments in my desktop machine using almost the same scripts. By doing that I avoid the long queued jobs on our supercomputer.

    Cheers,

    Carlos

  2. Hey Dave, I am following this blogpost and your presentation and in the SLURM script in the presentation you also included
    #SBATCH –partition=compute
    Is that something that’s necessary?

  3. Pingback: Ringing in the year with a lil’ bit if Tkinter – Water Programming: A Collaborative Research Blog

Leave a comment