Job Submission

Introduction

NOTE

This guide assumes you are logged in to the Arun HPCC's master node via a terminal and are comfortable with Linux bash commands. It also assumes you are familiar with basic SLURM commands for checking available resources, running jobs, node states, etc.

Before you begin, you should understand the difference between interactive jobs and batch jobs. Interactive jobs run in real time, much like a normal user session on a personal computer, while batch jobs are queued and executed by the scheduler. Use an interactive session when you want to test code, run GUI applications, or otherwise work in real time.

To start an interactive job session, you request it from the job scheduler via SLURM's srun command, specifying your required session time, node(s), and other configuration. To learn about interactive jobs click here. Batch jobs are submitted via the sbatch command. To learn about the differences between batch jobs and interactive jobs see the results here.
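
To see the difference in practice, the same one-line command can be launched either way (a sketch; on Arun-HPCC you may also need to pass your --account, as in the examples below):

[user@master ~]$ srun --ntasks=1 hostname                 # interactive: waits for an allocation, runs, prints to your terminal
[user@master ~]$ sbatch --ntasks=1 --wrap="hostname"      # batch: queues the same command and returns a job ID immediately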

Interactive job

NOTE

This part assumes you are logged in to the HPCC and familiar with the srun and module commands. To explore more, visit the official doc here.

If you want to run GUI applications on the HPCC during your interactive session, don't forget to log in via ssh with the -X flag.

[user@localmachine ~]$ ssh -X user@arun-hpcc.ru.ac.bd

To start an interactive session for real-time jobs and tests, try the command below.

[user@master ~]$ srun --time=20:00 --account=raihan.k --nodes=1 --ntasks=16 --pty $SHELL

Basically, you just asked the HPCC for a 20-minute session with 1 node and 16 tasks, with a pseudo bash shell for the session. The srun command and its arguments are explained below -

  • --time : Time limit for the session.
  • --nodes : Number of node(s) for the interactive session.
  • --ntasks : Number of tasks that will run on the node(s).
  • --account : Your account name.
  • --pty : Execute task zero in pseudo terminal mode.

You'll be given a session with the specified configuration if the resources are available. In this case we got node1 as soon as we executed the above srun command -

[user@node1 ~]$ 

Now that you have been given a session, you can start executing your interactive jobs. First, you may want to purge all loaded modules to start from a fresh environment with the command below.

[user@allocatednode ~]$ module purge

Now you can load your required modules and test or execute your interactive jobs.

Running a job in an interactive session

OVERVIEW

We are about to test an MPI job in this session. The job calculates PI using the Monte Carlo method. In this case, the code is placed in the mpi_pi directory; the main MPI program is named mpi_pi.c and the job script is named run.sh. To run this MPI job we need a couple of modules, namely openmpi and gcc.

First we need to load the openmpi and gcc modules into our session.

[user@allocatednode ~]$ module load openmpi gcc

Now that we have loaded our modules, we are ready to test/run our job -

[user@allocatednode ~]$ bash mpi_pi/run.sh
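
For reference, a run.sh along these lines would match the setup described above (a minimal sketch, not necessarily the actual script in the mpi_pi directory):

#!/bin/bash
# Minimal sketch of mpi_pi/run.sh
cd "$(dirname "$0")"        # work inside the mpi_pi directory
mpicc -o mpi_pi mpi_pi.c    # compile with the OpenMPI C compiler wrapper
mpirun -np 16 ./mpi_pi      # launch 16 MPI ranks, matching --ntasks=16 above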

Batch job

NOTE

Assuming that you have prior knowledge about batch jobs and related SLURM commands.

Job directory

Before you begin writing your first batch job script, create a directory, e.g. my_jobs, to keep your jobs organized (recommended).

Create job directory

[user@master ~]$ mkdir my_jobs/

Switch to the job directory

[user@master ~]$ cd my_jobs/

Script file

NOTE

This part assumes you are familiar with the terminal-based text editor vi or vim; prior experience with either is a plus. Otherwise, simply follow the file editing instructions with vim below. The nano text editor is also available on Arun-HPCC, so you can use it to write your script as well.

Create job script with vim

[user@master ~]$ vim first_job.sh

Sample job script

NOTE

At this point we assume you are familiar with the sbatch command and its arguments. Arguments for the sbatch command are generally specified in the job script with the #SBATCH term, followed by the argument name and its corresponding value.

Job script for generating random numbers

This sample job script uses some key arguments of the sbatch command provided by SLURM. The sbatch arguments used in the job script are -

  • --job-name : Name of the job.
  • --nodes : Minimum number of nodes needed for the job.
  • --ntasks : Maximum number of tasks the job will launch.
  • --cpus-per-task : Number of CPUs needed per task.
  • -t : Runtime limit for the job.
  • -o : Job output reporting file.
  • -e : Job error reporting file.

To learn more about sbatch arguments, see the sbatch command here. Job errors will be reported in a file like slurm.random_number_generator.node1.71.err, and similarly the output in slurm.random_number_generator.node1.71.out. In the script, %N specifies the name of the node where the job ran and %j denotes the job ID assigned by SLURM.
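
Putting the pieces together, first_job.sh might look something like this (a sketch based on the description above; the random-number loop is only an illustrative workload):

#!/bin/bash
#SBATCH --job-name=random_number_generator
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH -t 00:10:00
#SBATCH -o slurm.random_number_generator.%N.%j.out
#SBATCH -e slurm.random_number_generator.%N.%j.err

# Illustrative workload: print ten random numbers
for i in $(seq 1 10); do
    echo $RANDOM
done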

Submitting job script

WARNING

Make sure you wrote the job script correctly and provided the sbatch arguments properly, with the #SBATCH term followed by each argument and its corresponding value. Reconsider the run time you set for the job (SLURM accepts formats such as days-hours:minutes:seconds): if the job is still running when the specified run time is reached, it will be automatically canceled by the scheduler.

Before you submit your job, you can test your script with the command below.

[user@master my_jobs]$ sbatch --test-only first_job.sh

Submit the job with the sbatch command

[user@master my_jobs]$ sbatch first_job.sh 
Submitted batch job 60
[user@master my_jobs]$

Here 60 is your job ID provided by the scheduler.
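
You can check the status of that specific job by its ID with squeue's -j flag (60 here is the ID returned above):

[user@master my_jobs]$ squeue -j 60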

Managing jobs

NOTE

Assuming that you have prior knowledge about the scancel and squeue commands.

Check job queue

[user@master ~]$ squeue

Queued jobs for a user

[user@master ~]$ squeue -u user

Cancel a submitted job with the batch job id

[user@master ~]$ scancel 60

Cancel all jobs for a user

[user@master ~]$ scancel -u user