Verified commit 6d1cfe16, authored by Andrei Plamada

add GPU to monitoring

parent 55eddca7
@@ -27,7 +27,7 @@ Moreover, with Science IT computing infrastructure, the cost contribution is bas
The aim of this guide is to introduce techniques that will help to:
- measure the total execution time of your application
-- understand your code performance by examining the system utilization (CPU and Memory)
+- understand your code performance by examining the system utilization (CPU, Memory, GPU)
- find the bottlenecks by measuring the runtime and memory usage of an isolated piece of code (Python/R)
- avoid common pitfalls in assessing the performance on various systems
- use Slurm to understand the resource usage of your jobs (running or completed)
@@ -70,7 +70,7 @@ When benchmarking:
- it is hard to compare results obtained with different inputs (we should avoid doing it)
- IO operations make the benchmarking less predictable (more later)
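The two points above can be illustrated with a minimal Python sketch. It assumes the standard-library `timeit` module is an acceptable timer; `workload` and `fixed_input` are hypothetical names used only for illustration. The input is built once and kept in memory, so every repeat measures the same work and no IO happens inside the timed region.

```python
import statistics
import timeit


def workload(data):
    """Hypothetical stand-in for the code being benchmarked."""
    return sorted(data)


# Build the input once so that every run measures the same work
# and no file or network IO happens inside the timed region.
fixed_input = list(range(100_000, 0, -1))

# Repeat the measurement: 5 independent timings of 10 calls each.
timings = timeit.repeat(lambda: workload(fixed_input), repeat=5, number=10)

print(f"best:   {min(timings):.4f} s for 10 calls")
print(f"spread: {statistics.stdev(timings):.4f} s across {len(timings)} repeats")
```

Reporting the minimum together with the spread gives a rough sense of how noisy the system was during the measurement.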
-## Resource Monitoring (CPU and Memory)
+## Resource Monitoring
First we want to understand the utilization of the system and what processes are active.
Each operating system has established tools for this:
@@ -119,6 +119,16 @@ $ exit # the session terminates when you exit (run it in the session)
More info at https://linuxize.com/post/how-to-use-linux-screen/ .
### `nvtop`
[`nvtop`](https://github.com/Syllo/nvtop) is an `htop`-like monitoring tool for GPUs and accelerators.
### `nvidia-smi`
> The NVIDIA System Management Interface (`nvidia-smi`) is a command line utility, based on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices.
Source: https://developer.nvidia.com/system-management-interface
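As a rough sketch of how `nvidia-smi` readings could be collected from a script, the Python example below calls `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` and parses the result. The choice of fields and the helper names `parse_gpu_csv` and `query_gpus` are assumptions made for illustration, not part of this guide. With `nounits`, utilization is reported in percent and memory in MiB.

```python
import shutil
import subprocess

# Fields requested from nvidia-smi (an illustrative selection).
QUERY_FIELDS = ["utilization.gpu", "memory.used", "memory.total"]


def _to_number(value):
    """Convert a CSV cell to float; unsupported readings become None."""
    try:
        return float(value)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse `csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        values = [_to_number(v.strip()) for v in line.split(",")]
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_gpus():
    """Return per-GPU readings, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    for index, gpu in enumerate(query_gpus()):
        print(index, gpu)
```

Keeping the parsing separate from the subprocess call makes it possible to check the parser on a machine without a GPU.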
## Finding the Bottlenecks
Adding timestamps to our code to identify the bottleneck can be tedious and time-consuming. We should rather use dedicated tools that measure the instructions or memory usage at the function or code-line level. This activity is called **profiling** and the tools are known as **profilers**.
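As a minimal illustration of function-level profiling in Python, the sketch below uses the standard-library `cProfile` and `pstats` modules. The functions `slow_sum` and `pipeline` are hypothetical placeholders, and the profilers covered later in this guide may differ from this choice.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical hot spot: a deliberately slow pure-Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def pipeline():
    """Hypothetical application entry point that calls the hot spot."""
    return [slow_sum(50_000) for _ in range(20)]


# Collect profiling data only around the code of interest.
profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

# Sort by cumulative time and print the five most expensive entries.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report lists call counts and time per function, which should point to `slow_sum` as the dominant cost without any manual timestamps.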