This guide assumes that you are logged in to the Arun HPCC's master node via a terminal and that you are comfortable with basic Linux bash commands. It also assumes that you are familiar with the basic SLURM commands for checking available resources, running jobs, node states, etc.
Before you begin, you should understand interactive jobs and batch jobs. Interactive jobs run in real time, much like a normal user session on a personal computer, although they work a little differently on HPCCs. Batch jobs, on the other hand, are handled by the scheduler's queue system. To test code, run GUI applications, or do anything else in real time, use an interactive session.
To start an interactive job session, you request it from the job scheduler along with your required session time, node(s), and other configuration via the
srun command provided by SLURM. To learn about interactive jobs click here. Batch jobs are controlled via the
sbatch command. To learn about the differences between batch jobs and interactive jobs, see the comparison here.
Assuming that you are logged in to the HPCC and familiar with the
module command. To explore more, visit the official documentation here.
If you want to run GUI applications on the HPCC during your interactive session, don't forget to log in via
ssh with X11 forwarding enabled:
[user@localmachine ~]$ ssh -X email@example.com
To start an interactive session for real-time jobs and tests, try the command below
[user@master ~]$ srun --time=20:00 --account=raihan.k --nodes=1 --ntasks=16 --pty $SHELL
Basically you just asked the HPCC for a 20-minute session with 1 node and 16 tasks, with a pseudo
bash shell for the session. The
srun command and its arguments are explained below -
- --time : Time limit for the session.
- --nodes : Number of node(s) for the interactive session.
- --ntasks : Number of tasks to run on the node(s).
- --account : Your account name.
- --pty : Execute task zero in pseudo-terminal mode.
You'll be given a session with the specified configuration if resources are available. In this case we got node1 as soon as we executed the above
srun command -
Now that you have been given a session, you can start executing your interactive jobs. First, you may want to clear all loaded modules to start from a fresh environment with the command below.
[user@allocatednode ~]$ module purge
Now you can load your required modules and test or execute your interactive jobs.
Running a job in an interactive session
We are about to test an MPI job in this session. The job calculates π using the Monte Carlo method. In this case, the code is placed in the mpi_pi directory: the main MPI program is named mpi_pi.c and the job script is named run.sh. To test this MPI job we need a couple of modules, namely openmpi and gcc.
First we need to load the openmpi and gcc modules into our session.
[user@allocatednode ~]$ module load openmpi gcc
Now that we have loaded our modules, we are ready to test/run our job -
[user@allocatednode ~]$ bash mpi_pi/run.sh
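The Monte Carlo idea behind mpi_pi.c can be sketched serially in the shell. This is only an illustration of the method, not the contents of the actual mpi_pi.c (which distributes the sampling across MPI ranks): sample random points in the unit square and count the fraction that falls inside the quarter circle.

```shell
# Serial Monte Carlo estimate of pi (illustrative sketch only, not mpi_pi.c):
# draw N points (x, y) uniformly in [0,1) x [0,1); a point lies inside the
# quarter circle when x^2 + y^2 <= 1, so pi ~= 4 * inside / N.
awk 'BEGIN {
    srand(42)                      # fixed seed for reproducibility
    N = 100000; inside = 0
    for (i = 0; i < N; i++) {
        x = rand(); y = rand()
        if (x * x + y * y <= 1) inside++
    }
    printf "pi is approximately %f\n", 4 * inside / N
}'
```

A typical MPI implementation splits the N samples across ranks, lets each rank count its own hits, and combines the per-rank counts (for example with MPI_Reduce) before computing the final estimate.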
This part assumes that you have prior knowledge about batch jobs and the related SLURM commands.
Before you begin to write your first batch job script, create a directory named
my_jobs or something similar to keep your jobs organized (recommended).
Create job directory
[user@master ~]$ mkdir my_jobs/
Switch to the job directory
[user@master ~]$ cd my_jobs/
For this part we assume that you are familiar with the terminal-based text editor
vim. If you have prior experience with
vim, that's a plus. Otherwise, simply follow the file editing instructions with
vim below. The
nano text editor is also available on Arun-HPCC. You can use the
nano text editor to write your script as well.
Create job script with
[user@master ~]$ vim first_job.sh
Sample Job Script
At this point we assume that you are familiar with the
sbatch command and its arguments. The arguments for the
sbatch command are generally specified via the job script with the
#SBATCH directive followed by the argument name and its corresponding value.
Job script for generating random numbers
This sample job script uses some key arguments of the
sbatch command provided by SLURM. The
sbatch arguments used in the job script are -
- --job-name : Name of the job.
- --nodes : Minimum number of nodes needed for the job.
- --ntasks : Maximum number of tasks the job will run.
- --cpus-per-task : Number of CPUs needed per task.
- -t : Runtime for the job.
- -o : Job output reporting file.
- -e : Job error reporting file.
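Putting the arguments above together, first_job.sh might look something like the following minimal sketch. The time limit and the random-number loop are illustrative assumptions, not the exact script.

```shell
#!/bin/bash
#SBATCH --job-name=random_number_generator          # name of the job
#SBATCH --nodes=1                                   # minimum number of nodes
#SBATCH --ntasks=1                                  # maximum number of tasks
#SBATCH --cpus-per-task=1                           # CPUs per task
#SBATCH -t 00:05:00                                 # run time limit (illustrative)
#SBATCH -o slurm.random_number_generator.%N.%j.out  # output file (%N = node, %j = job ID)
#SBATCH -e slurm.random_number_generator.%N.%j.err  # error file

# The job itself: generate 10 random numbers, one per line.
for i in $(seq 1 10); do
    echo $RANDOM
done
```

Since the #SBATCH lines are bash comments, the scheduler reads them as directives while bash simply ignores them when the script body runs on the allocated node.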
To learn more about
sbatch arguments, see the sbatch command documentation here. The job error(s) will be reported in a file like
slurm.random_number_generator.node1.71.err and similarly the output in the
slurm.random_number_generator.node1.71.out file. Here
%N specifies the name of the node where the job ran and
%j denotes the job ID assigned by SLURM.
Submitting the job script
Make sure you wrote the job script correctly and provided the sbatch arguments with the #SBATCH directive followed by the argument names and their corresponding values. Reconsider the run time (dd-hh:mm) you set for the job: if the job is still running when the specified run time is reached, it will be automatically cancelled by the scheduler.
Before you submit your job, you can test your script with the below command
[user@master my_jobs]$ sbatch --test-only first_job.sh
Submit the job with
[user@master my_jobs]$ sbatch first_job.sh
Submitted batch job 60
[user@master my_jobs]$
Here 60 is your job ID provided by the scheduler.
Assuming that you have prior knowledge about SLURM's job monitoring commands such as squeue and scancel.
Check job queue
[user@master ~]$ squeue
Queued jobs for a user
[user@master ~]$ squeue -u user
Cancel a submitted batch job by its job ID
[user@master ~]$ scancel 60
Cancel all jobs for a user
[user@master ~]$ scancel -u user