The job scheduler Slurm can report job statistics that help you decide what resources to request. Here is a short guide to the steps for resource monitoring.
_note_: Slurm makes measurements every 60 seconds, which may result in instantaneous spikes not being measured.
For pending, currently running, and completing jobs, you can use [`squeue`](https://slurm.schedmd.com/squeue.html); to list only your own jobs, run `squeue -u $USER` or `squeue --me`. The first column of the output shows the **JOBID**.
You can check only your running jobs with `squeue -u $USER --states=RUNNING`.
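If you want the running JOBIDs in a form that later commands can consume, a small filter is one option. The sketch below assumes you ask `squeue` for two columns only, the job ID (`%i`) and the compact state (`%t`); the helper name `running_job_ids` is hypothetical and not part of Slurm.

```bash
# Read "JOBID STATE" pairs on stdin and print the JOBIDs whose
# compact state code is R (running). Hypothetical helper, not a Slurm tool.
running_job_ids() {
    awk '$2 == "R" { print $1 }'
}

# On a cluster (not executed here):
#   squeue --me --noheader --format='%i %t' | running_job_ids
```

Requesting only these two columns avoids problems with job names that contain spaces, which would shift the columns of the default `squeue` output.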
With the **JOBID** for your currently running job, you can run [`sstat`](https://slurm.schedmd.com/sstat.html).
...
...
### Access the Corresponding Compute Node
You cannot connect to the compute node via ssh. Instead, you can access the node in a way similar to [running an interactive session](https://docs.s3it.uzh.ch/how-to_articles/how_to_run_an_interactive_session/):
```bash
$ srun --pty --interactive --jobid <JOBID> bash -l
```
If the job runs on multiple nodes, you can request a specific one with `--nodelist=<NODE>`.
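For a multi-node job, you first need to know which nodes it occupies. The following is a minimal sketch, not the documented procedure: the helper `attach_cmd` is hypothetical and only prints the `srun` line from this section, with `--nodelist` appended when a node name is given.

```bash
# Print the srun command used to open a shell inside a running job.
# Arguments: <JOBID> [NODE]. Hypothetical helper for illustration only.
attach_cmd() {
    local jobid="$1"
    local node="${2:-}"
    local cmd="srun --pty --interactive --jobid ${jobid}"
    if [ -n "$node" ]; then
        cmd="${cmd} --nodelist=${node}"
    fi
    printf '%s bash -l\n' "$cmd"
}

# On a cluster (not executed here), the nodes of a job can be listed with:
#   squeue --jobs <JOBID> --noheader --format='%N'   # compressed node list
#   scontrol show hostnames <NODELIST>               # one hostname per line
```

Printing the command instead of running it lets you check the target node before you attach.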
### Completed Jobs
To find the JobID of jobs that have finished running, use [`sacct`](https://slurm.schedmd.com/sacct.html). This example lists my user's jobs from the last 30 days:
From the first column, select a JobID. A job that ran for longer may be easier to understand, so if you have one, choose a job with a longer elapsed time.
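Picking the job with the longest elapsed time can also be scripted. The sketch below assumes `sacct` is asked for pipe-separated `JobID|Elapsed` lines and that the elapsed time has the form `[DD-]HH:MM:SS`; the helper name `longest_job` is hypothetical.

```bash
# Read "JobID|Elapsed" lines on stdin and print the JobID with the
# largest elapsed time. Elapsed is expected as [DD-]HH:MM:SS.
longest_job() {
    awk -F'|' '
        {
            days = 0; t = $2
            # Split off an optional leading day count ("D-").
            if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
            split(t, p, ":")
            secs = days * 86400 + p[1] * 3600 + p[2] * 60 + p[3]
            if (secs > best) { best = secs; id = $1 }
        }
        END { if (id != "") print id }
    '
}

# On a cluster (not executed here):
#   sacct --user "$USER" --starttime now-30days --allocations \
#         --noheader --parsable2 --format=JobID,Elapsed | longest_job
```

The `--allocations` flag restricts the output to one line per job, so individual job steps do not compete with the job itself.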