> - **efficiency** is the ratio between the speedup and the number of processors used
In technical terms, [Amdahl's law](https://en.wikipedia.org/wiki/Amdahl%27s_law) states that "the overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used". In other words, you can only parallelize your workflow up to a certain maximal point of efficiency (unique to the application / analysis code itself), after which additional resources provided for parallelization (e.g., adding more CPUs or GPUs) will not yield any further gain. In practice, beyond this "optimal threshold", adding resources may actually **significantly decrease** the efficiency of the code's execution.
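Formally, if a proportion `p` of a program's runtime can be parallelized across `N` processors, the achievable speedup is `Speedup(N) = 1 / ((1 - p) + p / N)`. Even with unlimited processors this is capped at `1 / (1 - p)`; for example, a program that is 95% parallelizable can never run more than 20x faster, no matter how many CPUs are added.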
To find the optimal level of hardware for efficiency gains, you will need to test multiple levels of provided hardware and chart/log the efficiency as resources increase.
This process is also known as benchmarking, and an example is provided in the [`TensorFlow_Benchmark_Results.ipynb` notebook](TensorFlow_Benchmark_Results.ipynb).
In this context, efficiency is measured as "Speedup", which is the non-parallelized implementation time divided by the parallelized implementation time (see the use of the `mutate` function in cell 5 of the notebook).
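Since the notebook uses `mutate`, the calculation presumably looks something like the following R / dplyr sketch. The data frame, column names (`n_cpus`, `runtime_sec`), and timing values here are hypothetical, included only to illustrate how speedup and efficiency fall out of the raw timings:

```r
library(dplyr)

# Hypothetical benchmark timings: wall-clock runtime per CPU count
results <- tibble(
  n_cpus      = c(1, 2, 4, 8, 16),
  runtime_sec = c(1200, 640, 350, 210, 230)
)

results <- results %>%
  mutate(
    # Speedup: serial (1-CPU) runtime divided by each parallel runtime
    speedup    = runtime_sec[n_cpus == 1] / runtime_sec,
    # Efficiency: speedup per processor used
    efficiency = speedup / n_cpus
  )

print(results)
```

Plotting `speedup` against `n_cpus` from a table like this is what makes the plateau, and any eventual decline, visible.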
After the "sweet spot" of parallelization is reached, no greater efficiency can be accomplished (but efficiency can greatly decrease).
This dropoff in efficiency can be seen in the Speedup charts within the notebook.