Slurm Quick Reference Guide
- Commands
- Slurm Scripts
- Slurm Partitions
- Job Submission
- Monitoring jobs
- Job Deletion
Commands
The table below lists the most commonly used Slurm commands. See the complete listing at <https://slurm.schedmd.com/pdfs/summary.pdf>
Command | Description
sacct | Display accounting data on jobs
salloc | Allocate resources required for a job
srun | Obtain a job allocation and execute a job
sbatch | Submit a job script for execution
scancel | Cancel a job
sinfo | View information about nodes and partitions
squeue | View information about jobs
Man pages exist for all Slurm commands, and the --help option provides a brief summary of options for each command. Note that command options are case sensitive.
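For example, the following invocations show a few typical uses; the job ID 123456 is only a placeholder, and your username is taken from the $USER environment variable:
$ sinfo                # view partitions and node states
$ squeue -u $USER      # show only your own jobs in the queue
$ sacct -j 123456      # display accounting data for job 123456
$ sbatch --help        # brief summary of sbatch options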
Slurm Scripts
Slurm jobs are usually submitted via a shell script that does the following:
• Describes the processing to be done (Input-Process-Output)
• Requests resources to use for processing
Example of a simple Slurm script (testslurm.sh):
#!/bin/bash
# set the number of nodes
#SBATCH --nodes=4
# set max wallclock time
#SBATCH --time=10:00:00
# set name of job
#SBATCH --job-name=test123
# mail alert at start, end and failure of execution
#SBATCH --mail-type=ALL
# send mail to this address
#SBATCH --mail-user=john.doe@email.edu
# run the application
srun hostname
Once the script has been saved, it can be submitted as a job using the sbatch command, e.g.
$ sbatch -o my.output ./testslurm.sh
Upon submission, Slurm will generate a job ID. Jobs remain in the queue until enough resources can be allocated for execution.
The squeue command lets you see the jobs in the queue, and the scancel <job id> command lets you cancel your job if necessary, as shown below.
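As an illustration, the sequence below submits the example script, checks the queue, and cancels the job; the job ID 123456 is a placeholder for whatever ID Slurm assigns:
$ sbatch ./testslurm.sh
Submitted batch job 123456
$ squeue -u $USER      # list your pending and running jobs
$ scancel 123456       # cancel the job if it is no longer needed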
Slurm Partitions
Slurm partitions are the various queues that can handle jobs. Each partition consists of a set of nodes. SPARKS has the following partitions defined:
• defq – The default partition consisting of 32 compute nodes (2x8 core Xeon)
• gpuq – This partition consists of 4 compute nodes with GPU accelerators (2x NVIDIA K80)
• trainq – This partition consists of 2 compute nodes (2x8 core Xeon). This partition is used for testing jobs (ensuring that scripts work) before submission to the default partition.
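Jobs can be directed to a specific partition with the --partition (-p) option of sbatch, either inside the job script or on the command line. As a sketch, assuming the trainq partition above is available to your account:
# inside the job script
#SBATCH --partition=trainq
or, equivalently, at submission time:
$ sbatch -p trainq ./testslurm.sh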