
Master-Slave Calculations

A classic example of this is the version of Amdahl's Law for the special case of a master-slave algorithm, where one ``master'' node sends out the task(s) to $P$ ``slave'' nodes. This is in the class of ``simple'' parallel tasks, so the formula above works, often with just one parallelizable subtask that the ``master'' node splits up and sends to the ``slave'' nodes to do. If you sat at the front of a room with 10 friends and a stack of 100 model airplane kits to build, a master-slave approach would be to walk around handing a kit to each friend and then kick back, tapping your feet for twenty minutes until they all finish. Then you walk around, pick up the finished airplanes, and give each friend the next kit. You might even do some work on them yourself (put them in a box to be shipped to model-airplaneless children, for example, or throw them out a window to see if they fly) while they work on the next ones, and repeat until done.
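None of this code appears in the original text; the following is only a minimal sketch of the master-slave pattern in C with MPI, under the assumption that the per-task work (here a made-up stand-in function build_airplane) and the task count are whatever your real calculation needs. Rank 0 plays the master handing out kits one at a time; the other ranks play the slaves.

/*
 * Minimal master-slave sketch (illustrative only).  Rank 0 is the master:
 * it hands out task indices and collects results.  Ranks 1..P-1 are the
 * slaves: each receives a task, "builds the airplane", and sends the
 * result back.  build_airplane() is a hypothetical stand-in for real work.
 */
#include <stdio.h>
#include <mpi.h>

#define NTASKS   100   /* the stack of 100 model airplane kits */
#define TAG_WORK 1     /* message carries a task index          */
#define TAG_DONE 2     /* message tells a slave to stop         */

/* Stand-in for the real work done on one task. */
static double build_airplane(int task)
{
    double x = 0.0;
    for (int i = 0; i < 100000; i++)
        x += (double)task / (i + 1.0);
    return x;
}

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                      /* ---- master ---- */
        int next = 0, active = 0;
        MPI_Status status;
        double result;

        /* Hand the first kit to each slave. */
        for (int s = 1; s < size && next < NTASKS; s++, next++, active++)
            MPI_Send(&next, 1, MPI_INT, s, TAG_WORK, MPI_COMM_WORLD);

        /* If there are more slaves than kits, retire the idle ones now. */
        for (int s = active + 1; s < size; s++) {
            int dummy = -1;
            MPI_Send(&dummy, 1, MPI_INT, s, TAG_DONE, MPI_COMM_WORLD);
        }

        /* Collect a finished airplane, hand out the next kit. */
        while (active > 0) {
            MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            active--;
            if (next < NTASKS) {
                MPI_Send(&next, 1, MPI_INT, status.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
                next++;
                active++;
            } else {
                int dummy = -1;
                MPI_Send(&dummy, 1, MPI_INT, status.MPI_SOURCE, TAG_DONE,
                         MPI_COMM_WORLD);
            }
        }
        printf("master: all %d tasks done\n", NTASKS);
    } else {                              /* ---- slave ---- */
        MPI_Status status;
        int task;
        while (1) {
            MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD,
                     &status);
            if (status.MPI_TAG == TAG_DONE)
                break;
            double result = build_airplane(task);
            MPI_Send(&result, 1, MPI_DOUBLE, 0, TAG_WORK, MPI_COMM_WORLD);
        }
    }

    MPI_Finalize();
    return 0;
}

Note that the master talks to one slave at a time, which is precisely the serial communications overhead that the $P/r_1$ term in the formula below charges against the speedup.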

From this we see that master-slave calculations (MSC) have an intimate relationship with embarrassingly coarse grained calculations (ECGC) - one way to implement nearly any ECGC is as a MSC. So to speak. However, not all MSCs are ECGCs. To see this, we have to learn what the words ``coarse grained'' (or ``medium grained'' or ``fine grained'', or granularity in general) mean in the general context of parallel computing. We'll do this later.

Amdahl's Law becomes:

\begin{displaymath}
\frac{R(P)}{R(1)} \le \frac{1}{S + (1-S)/P + P/r_1} . \qquad (8.1)
\end{displaymath}

In this, $S$ is the serial fraction of the code (the time you spend piling all the boxes up on a table before beginning), $(1 - S)$ is the parallel fraction (building the airplanes and throwing them out the window), and $r_1 = T_s/(P*T_{i,sp})$ is the ratio of serial compute time to serial communications time per node (in the analogy, the time you spend walking around handing out kits and collecting finished airplanes, which you can only do with one friend at a time).
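To get a feel for the numbers (a made-up illustration, not measurements from any real code): suppose $S = 0.1$, $P = 10$ slaves, and $r_1 = 100$. Then

\begin{displaymath}
\frac{R(10)}{R(1)} \le \frac{1}{0.1 + 0.9/10 + 10/100} = \frac{1}{0.29} \approx 3.4 ,
\end{displaymath}

so even with ten slaves the speedup is bounded by roughly 3.4, and with these numbers the $P/r_1$ communications term means that adding still more slaves actually makes the bound worse.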

Even THIS isn't pessimistic enough. It assumes ``perfect'' parallelizability. In real code, the parallel tasks may well need to be organized so that they complete in a particular order, with partial results available where they are needed just when they are needed. Then there may be random delays on the nodes due to other things they are doing. All of this can further degrade performance.

Which leads us to...

