Communication Protocols

The various client-server communications described in the previous section consisted of sending a string of characters ending in a carriage-return and receiving another. However simple, this communication pattern defines a protocol. If we wish to communicate more complex values, such as floats, matrices of floats, a tree of arithmetic expressions, a closure, or an object, we introduce the problem of encoding these values. Many solutions exist according to the nature of the communicating programs, which can be characterized by the implementation language, the machine architecture, and in certain cases, the operating system.

Depending on the machine architecture, integers can be represented in many different ways (most significant bits on the left or on the right, use of tag bits, size of the machine word). To communicate a value between different programs, it is necessary to have a common representation of values, referred to as the external representation. More structured values, such as records, must have an external representation, just as integers do. Nonetheless, problems arise when certain languages allow constructs, such as bit-fields in C, which do not exist in other languages.

Passing functional values or objects, which contain pieces of code, poses a new difficulty. Is the code byte-compatible between the sender and receiver, and does there exist a mechanism for dynamically loading the code? As a general rule, the problem is simplified by supposing that the code exists on both sides. It is not the code itself that is transmitted, but information that allows it to be retrieved. For an object, the instance variables are communicated along with the object's type, which allows retrieval of the object's methods. For a closure, the environment is sent along with the address of its code. This implies that the two communicating programs are actually the same executable.
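To make the notion of an external representation concrete, here is a minimal sketch for integers. It assumes that both sides agree on 32-bit integers sent most significant byte first; that convention and the names encode_int32 and decode_int32 are choices made for this illustration, not something fixed by the text.

```ocaml
(* Encode a 32-bit integer as exactly 4 bytes, most significant byte first,
   whatever the byte order of the machine running the program. *)
let encode_int32 (n : int32) : string =
  let b = Bytes.create 4 in
  Bytes.set_int32_be b 0 n;
  Bytes.to_string b

(* Decode the 4 bytes produced by encode_int32. *)
let decode_int32 (s : string) : int32 =
  if String.length s <> 4 then invalid_arg "decode_int32: expected 4 bytes";
  Bytes.get_int32_be (Bytes.of_string s) 0
```

Because the byte order is fixed by the encoding and not by the machine, a sender and a receiver with different architectures recover the same value.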

A second difficulty arises from the complexity of linked exchanges and the necessity of synchronizing communications involving many programs.

We first present text protocols, later discussing acknowledgements and time limits between requests and responses. We also mention the difficulty of communicating internal values, in particular as it relates to interoperability between programs written in different languages.

Text Protocol

Text protocols, that is, communication in ASCII format, are the most common because they are the simplest to implement and the most portable. When a protocol becomes complicated, it may become difficult to implement. In this setting, we define a grammar to describe the communication format. This grammar may be rich, but it will be up to the communicating programs to handle the work of coding and interpreting the text strings sent and received.
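As an illustration of the coding and interpreting work left to the programs, the following sketch handles a deliberately tiny text protocol. The grammar (ADD, NEG, QUIT) and the names request, to_line and of_line are hypothetical and exist only for this example.

```ocaml
(* Hypothetical grammar, one request per line:
     request ::= "ADD" int int | "NEG" int | "QUIT"            *)
type request = Add of int * int | Neg of int | Quit

(* Encoding: turn a request into one line of text (without the line end). *)
let to_line = function
  | Add (a, b) -> Printf.sprintf "ADD %d %d" a b
  | Neg a -> Printf.sprintf "NEG %d" a
  | Quit -> "QUIT"

(* Decoding: split the line into words and match them against the grammar.
   Returns None when the line does not follow the grammar. *)
let of_line (line : string) : request option =
  let words =
    String.split_on_char ' ' (String.trim line)
    |> List.filter (fun w -> w <> "")
  in
  match words with
  | ["ADD"; a; b] ->
      (match int_of_string_opt a, int_of_string_opt b with
       | Some a, Some b -> Some (Add (a, b))
       | _ -> None)
  | ["NEG"; a] -> Option.map (fun a -> Neg a) (int_of_string_opt a)
  | ["QUIT"] -> Some Quit
  | _ -> None
```

Returning an option lets the receiver distinguish a well-formed request from a line it cannot interpret, instead of failing on unexpected input.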

As a general rule, a network application does not allow viewing the different layers of protocols in use. This is typified by the case of the HTTP protocol, which allows a browser to communicate with a Web site.

The HTTP Protocol

The term "HTTP" is seen frequently in advertising. It is the communication protocol used by Web applications. The protocol is completely described on the site of the W3 Consortium:

http://www.w3.org

This protocol is used to send requests from browsers (Communicator, Internet Explorer, Opera, etc.) and to return the contents of requested pages. A request made by a browser contains the name of the protocol (HTTP), the name of the machine (www.ufr-info-p6.jussieu.fr), and the path of the requested page (/Public/Localisation/index.html). Together these components define a URL (Uniform Resource Locator):

http://www.ufr-info-p6.jussieu.fr/Public/Localisation/index.html

When such a URL is requested by a browser, a connection over a socket is established between the browser and the server running on the indicated machine, by default on port 80. Then the browser sends a request in the HTTP format, like the following:

GET /index.html HTTP/1.0

The server responds, also in HTTP, with a header:

HTTP/1.1 200 OK
Date: Wed, 14 Jul 1999 22:07:48 GMT
Server: Apache/1.3.4 (Unix) PHP/3.0.6 AuthMySQL/2.20
Last-Modified: Thu, 10 Jun 1999 12:53:46 GMT
Accept-Ranges: bytes
Content-Length: 3663
Connection: close
Content-Type: text/html

This header indicates that the request has been accepted (code 200 OK), the kind of server, the modification date of the page, the length of the page sent, and the type of content which follows. Using the GET command in the protocol (HTTP/1.0), only the HTML page is transferred. The following connection with telnet allows us to see what is actually transmitted:

$ telnet www.ufr-info-p6.jussieu.fr 80
Trying 132.227.68.44...
Connected to triton.ufr-info-p6.jussieu.fr.
Escape character is '^]'.
GET


<!-- index.html -->
<HTML>
<HEAD>
<TITLE>Serveur de l'UFR d'Informatique de Pierre et Marie Curie</TITLE>
</HEAD>
<BODY>

<IMG SRC="/Icons/upmc.gif" ALT="logo-P6" ALIGN=LEFT HSPACE=30>
Unit&eacute; de Formation et de Recherche 922 - Informatique<BR>
Universit&eacute; Pierre et Marie Curie<BR>
4, place Jussieu<BR>
75252 PARIS Cedex 05, France<BR><P> 
....
</BODY>
</HTML>
<!-- index.html -->

Connection closed by foreign host.

The connection closes once the page has been copied. The base protocol is in text mode, so the transmitted content can be read and interpreted directly. Note that images are not transmitted with the page. It is up to the browser, when it parses the HTML page, to pick out anchors and images (see the IMG tag in the transmitted page). At that point the browser sends a new request for each image encountered in the HTML source; there is a new connection for each image. The images are displayed as they are received, which is why they often appear in parallel.
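The browser's second pass over the page can be sketched as follows. This is not a real HTML parser: it assumes that the attribute is written as SRC="..." with double quotes, as in the page above, and it would also pick up a SRC attribute on a tag other than IMG. The names find_from, image_sources and request_for are hypothetical.

```ocaml
(* Find the first occurrence of pat in s at or after index from. *)
let find_from (s : string) (from : int) (pat : string) : int option =
  let n = String.length s and m = String.length pat in
  let rec go i =
    if i + m > n then None
    else if String.sub s i m = pat then Some i
    else go (i + 1)
  in
  go from

(* Collect the values of all SRC="..." attributes, in document order.
   The search is done on an upper-case copy so that src="..." also matches. *)
let image_sources (html : string) : string list =
  let upper = String.uppercase_ascii html in
  let key = "SRC=\"" in
  let rec collect pos acc =
    match find_from upper pos key with
    | None -> List.rev acc
    | Some i ->
        let start = i + String.length key in
        (match String.index_from_opt html start '"' with
         | None -> List.rev acc
         | Some stop ->
             collect (stop + 1) (String.sub html start (stop - start) :: acc))
  in
  collect 0 []

(* The request to send, on a new connection, for one of the images found. *)
let request_for (path : string) : string =
  Printf.sprintf "GET %s HTTP/1.0\r\n\r\n" path
```

Applied to the page shown above, image_sources yields the path /Icons/upmc.gif, for which the browser would then issue its own GET request.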

The HTTP protocol is simple enough, but it transports information in the HTML language, which is more complex.

Protocols with Acknowledgement and Time Limits

When a protocol is complex, it is useful for the receiver of a message to indicate to the sender that it has received the message and that the message is grammatically correct. The client blocks while waiting for this response before continuing with its other tasks. If the part of the server handling the request has difficulty interpreting the message, the server must say so to the client rather than ignore the request. The HTTP protocol has a system of status codes. A correct request results in the code 200. A badly-formed request or a request for an unauthorized page results in a 4xx error code, while a failure on the server side results in a 5xx code. These codes allow the client to know what to do and allow the server to record the details of such incidents in its log files.
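On the client side, acting on such a code can be sketched as follows, assuming the first line of the response has the shape seen earlier ("HTTP/1.1 200 OK"). The type outcome and the functions status_code and classify are names chosen for this illustration.

```ocaml
type outcome = Success | Client_error | Server_error | Other

(* Extract the numeric code from a status line such as "HTTP/1.1 200 OK".
   Returns None if the line does not have the expected shape. *)
let status_code (line : string) : int option =
  match String.split_on_char ' ' (String.trim line) with
  | _version :: code :: _ -> int_of_string_opt code
  | _ -> None

(* Group codes by their first digit, which is all most clients need
   in order to decide what to do next. *)
let classify (code : int) : outcome =
  if code >= 200 && code < 300 then Success
  else if code >= 400 && code < 500 then Client_error
  else if code >= 500 && code < 600 then Server_error
  else Other
```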

When the server is in an inconsistent state, it may still accept a connection from a client, but risks never sending it a response over the socket. To avoid these blocking waits, it is useful to set a limit on the time allowed for transmission of the response. After this time has elapsed, the client supposes that the server is no longer responding. The client can then close the connection in order to go on to its other work. This is how WWW browsers work: when a request has no response after a certain time, the browser reports this to the user. Objective CAML has input-output with time limits. In the Thread library, the functions wait_timed_read and wait_timed_write suspend execution until a character can be read or written, within a certain time limit. As input, these functions take a file descriptor and a time limit in seconds: Unix.file_descr -> float -> bool. If the time limit has passed, the function returns false; otherwise it returns true and the I/O can proceed.
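The same behaviour can be sketched with Unix.select from the Unix library, used here as a stand-in for the Thread functions mentioned above, which may not be available in every version of the system. The names readable_within and read_with_timeout are hypothetical, and the sketch does not handle interruption of select by a signal.

```ocaml
(* Requires the unix library. *)

(* Wait until fd is readable, for at most limit seconds.
   Returns true if data can be read, false if the time limit has passed. *)
let readable_within (fd : Unix.file_descr) (limit : float) : bool =
  match Unix.select [fd] [] [] limit with
  | [], _, _ -> false
  | (_ :: _), _, _ -> true

(* Read at most max bytes of a response, or give up after limit seconds.
   None means that the server did not answer in time. *)
let read_with_timeout (fd : Unix.file_descr) (limit : float) (max : int)
    : string option =
  if readable_within fd limit then begin
    let buf = Bytes.create max in
    let n = Unix.read fd buf 0 max in
    Some (Bytes.sub_string buf 0 n)
  end else None
```

A client that receives None can close the socket and move on to its other work, as described above.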

Transmitting Values in their Internal Representation

The interest in transmitting internal values comes from simplifying the protocol. There is no longer any need to encode and decode data in a textual format. The difficulties inherent in sending and receiving values in their internal representation are the same as those encountered for persistent values (see the Marshal library). In effect, writing a value to a file, or reading it back, is equivalent to sending or receiving the same value over a socket.
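A minimal sketch with the Marshal module follows. To stay self-contained it passes the bytes through a string in place of a socket; over a real connection, Marshal.to_channel and Marshal.from_channel play the same roles. The matrix and the variable names are illustrative.

```ocaml
(* A matrix of floats to transmit. *)
let m : float array array = [| [| 1.0; 2.0 |]; [| 3.0; 4.0 |] |]

(* Sender side: the internal representation is flattened to a byte string. *)
let bytes_on_the_wire : string = Marshal.to_string m []

(* Receiver side: the type annotation is the receiver's promise about what
   was sent. Marshal itself performs no type checking, so a wrong annotation
   leads to undefined behaviour and not to an exception. *)
let received : float array array = Marshal.from_string bytes_on_the_wire 0
```

No grammar or parser is needed, but both sides must agree on the type of the value out of band.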

Functional Values

In the case of transmitting a closure between two Objective CAML programs, the code in the closure is not sent, only its environment and its code pointer (see figure 12.9). For this strategy to work, it is necessary that the server possess the same code in the same memory location. This implies that the same program is running on the server as on the client. Nothing, however, prevents the two programs from running different parts of the code at the same time. We adapt the matrix calculation service by sending a closure with an environment containing the data for calculation. When it is received, the server applies this closure to () and the calculation begins.
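The mechanism can be sketched within a single process, which trivially satisfies the requirement that sender and receiver be the same executable. The function make_task and the data are hypothetical, and a sum stands in for the matrix calculation.

```ocaml
(* Build a closure whose environment contains the data to work on. *)
let make_task (v : float array) : unit -> float =
  fun () -> Array.fold_left (+.) 0.0 v

let task = make_task [| 1.0; 2.0; 3.5 |]

(* Sender side. The Marshal.Closures flag is required: without it,
   marshaling a functional value raises Invalid_argument. *)
let sent : string = Marshal.to_string task [Marshal.Closures]

(* Receiver side, simulated here in the same process: only the environment
   and a reference to the code were transmitted, not the code itself. *)
let received : unit -> float = Marshal.from_string sent 0

(* The receiver starts the calculation by applying the closure to (). *)
let result : float = received ()
```

Reading the same bytes from a different executable would fail, since the code referred to by the closure would not be found there.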

Interoperating with Different Languages

The interest in text protocols is that they are independent of the implementation languages of clients and servers. In effect, every programming language can handle ASCII characters. It is therefore up to the client and the server to parse the character strings transmitted. An example of such an open protocol is the one used by the soccer-player simulation called RoboCup.

Soccer Robots

A soccer team plays against another team. Each member of the team is a client of a referee server. The players on the same team cannot communicate directly with each other. They must send information through the server, which retransmits the dialog. The server shows each player a part of the field, according to the player's position. All these communications follow a text protocol. A Web page describes the protocol, the server, and certain clients:

http://www.robocup.org/

The server is written in C. The clients are written in different languages: C, C++, Smalltalk, Objective CAML, etc. Nothing prevents a team from fielding players written in different languages.

This protocol meets the need for interoperability between programs written in different implementation languages. It is relatively simple, but it requires a specific syntax analyzer for each family of languages.
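Such a syntax analyzer might look like the following sketch on the Objective CAML side. It assumes that messages are parenthesized lists of atoms separated by blanks; the exact message set is defined by the documentation cited above, and the names sexp, tokenize and parse are hypothetical.

```ocaml
type sexp = Atom of string | List of sexp list

(* Split the input into "(", ")" and atoms separated by blanks. *)
let tokenize (s : string) : string list =
  let buf = Buffer.create 16 in
  let tokens = ref [] in
  let flush () =
    if Buffer.length buf > 0 then begin
      tokens := Buffer.contents buf :: !tokens;
      Buffer.clear buf
    end
  in
  String.iter
    (fun c ->
      match c with
      | '(' | ')' -> flush (); tokens := String.make 1 c :: !tokens
      | ' ' | '\t' | '\n' | '\r' -> flush ()
      | _ -> Buffer.add_char buf c)
    s;
  flush ();
  List.rev !tokens

(* Parse one expression; return it together with the remaining tokens. *)
let rec parse_one (tokens : string list) : (sexp * string list) option =
  match tokens with
  | [] | ")" :: _ -> None
  | "(" :: rest -> parse_list rest []
  | atom :: rest -> Some (Atom atom, rest)

(* Parse the elements of a list up to its closing parenthesis. *)
and parse_list (tokens : string list) (acc : sexp list) =
  match tokens with
  | [] -> None (* missing ")" *)
  | ")" :: rest -> Some (List (List.rev acc), rest)
  | _ ->
      (match parse_one tokens with
       | None -> None
       | Some (e, rest) -> parse_list rest (e :: acc))

(* A message is valid only if it is exactly one complete expression. *)
let parse (s : string) : sexp option =
  match parse_one (tokenize s) with
  | Some (e, []) -> Some e
  | _ -> None
```

A client written in another language would need its own equivalent of parse, but the strings exchanged with the server stay the same.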