@@ -28,9 +28,9 @@ The aim of this guide is to introduce techniques that will help to:
- measure the total execution time of your application
- understand your code performance by examining the system utilization (CPU, Memory, GPU)
- use Slurm to understand the resource usage of your jobs (running or completed)
- find the bottlenecks by measuring the runtime and memory usage of an isolated piece of code (Python/R)
- avoid common pitfalls in assessing the performance on various systems
- use Slurm to understand the resource usage of your jobs (running or completed)
- optimize the cost
## Benchmarking
...
...
@@ -155,6 +155,16 @@ While a job runs you can also use `scontrol` to check what resources Slurm has a
$ scontrol show job <JOBID>
```
### Access the Corresponding Compute Node
You cannot connect to the compute node via ssh. Instead, you can access the node in a way similar to [running an interactive session](https://docs.s3it.uzh.ch/how-to_articles/how_to_run_an_interactive_session/):
```bash
$ srun --pty --interactive --jobid <JOBID> bash -l
```
If the job is using multiple nodes, you can request a given one using `--nodelist=<NODE>`.
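As a minimal sketch, the two variants of the command above can be assembled by a small shell function. The function name `attach_cmd`, the job ID `1234567` and the node name `node042` are hypothetical placeholders, not part of Slurm or of this cluster's setup. The function only prints the command, so you can inspect it before running it yourself.

```bash
# Hypothetical helper: print the srun command that would attach a shell
# to a running job. It does not execute anything on the cluster.
# Usage: attach_cmd <JOBID> [NODE]
attach_cmd() {
    local jobid="$1"
    local node="${2:-}"
    local cmd="srun --pty --interactive --jobid ${jobid}"
    # Add --nodelist only when a specific node was requested
    if [ -n "${node}" ]; then
        cmd="${cmd} --nodelist=${node}"
    fi
    echo "${cmd} bash -l"
}

# Single-node job (placeholder job ID)
attach_cmd 1234567
# One specific node of a multi-node job (placeholder node name)
attach_cmd 1234567 node042
```

To see which node names are valid for `--nodelist`, `scontrol show hostnames "$(squeue -j <JOBID> -h -o %N)"` should expand the job's node list to one host name per line.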
### Completed Jobs
To find the IDs of jobs that have stopped running, use [`sacct`](https://slurm.schedmd.com/sacct.html). This example lists the current user's jobs from the last 30 days: