Next: 5.11 Case Study: Channel Library Up: 5 Compositional C++ Previous: 5.9 Modularity

5.10 Performance Issues


CC++ programs do not explicitly send and receive messages; instead, they perform read and write operations on global pointers, make remote procedure calls, and use par, parfor, and spawn statements to create new threads of control. Nevertheless, the communication operations associated with a CC++ program can usually be determined easily. Normally, a read operation on a global pointer, a write operation on a global pointer, and an RPC each result in two communications: one to send the remote request and one to receive an acknowledgment or result. As noted in Chapter 3, the cost of each message can be specified with reasonable accuracy in terms of a startup cost and a per-word cost. It is necessary to distinguish between the communication costs incurred when communicating CC++ processes are located on different processors (interprocessor communication) and when they are located on the same processor (intraprocessor communication). Both costs can depend significantly on implementation technology. Typically, interprocessor communication costs are similar to those in Table 3.1 in Chapter 3, and intraprocessor communication is cheaper. However, on some multicomputers with fast interprocessor communication and relatively low memory bandwidth, intraprocessor communication can actually be slower than interprocessor communication.

The following issues must also be considered when examining the performance of CC++ programs.

  Reading and writing global pointers. Reading or writing a global pointer normally involves two communications: one to send the read or write request, and one to return a result and/or signal completion. Hence, global pointers must be used with caution, particularly on computers where communication is expensive. If a data structure is referenced often, it may be worthwhile to move that data structure to where it is used most often, or to replicate it. If a task requires many data items from the same processor object, it may be better to use an RPC to transfer all the required data in a single message.

Remote procedure calls. An RPC normally involves two communications: the first to transmit the procedure call and its data to the remote processor, and the second to signal completion and to return any result. In many situations, the return message is not required and hence represents overhead. This overhead can be avoided by using the spawn statement to create an asynchronous thread of control. For example, the performance of the following code from Program 5.15 below, which sends a value on a channel,

void send(int val) { inport->enqueue(val); }

can be improved in cases where one does not care when the send operation completes, by rewriting it to eliminate the reply, as follows.

void send(int val) { spawn inport->enqueue(val); }

Fairness. When two or more threads execute in the same processor object, CC++ guarantees that execution is fair: no thread that is ready to execute (that is, not blocked waiting for data) will be prevented indefinitely from executing. However, the time that a thread waits before executing can vary significantly, depending on characteristics of both the application and the particular CC++ implementation. Hence, care must be taken if application performance depends on obtaining timely responses to remote requests.

Remote operations. As a general principle, operations involving global objects (processor object creation, RPC, etc.) are more expensive than operations involving only local objects. However, the cost of these operations can vary significantly from machine to machine. An RPC is typically less expensive than a processor object creation, and a remote read or write operation is typically less expensive than an RPC. The first processor object creation on a processor is often significantly more expensive than subsequent processor object creation operations on the same processor.

Compiler optimization. Because CC++ is a programming language rather than a library, a compiler may in some situations be able to reduce communication costs by eliminating replies, coalescing messages, or otherwise reorganizing a program to improve performance.




© Copyright 1995 by Ian Foster