Communication Protocols
The various client-server communications described in the previous section consisted
of sending a string of characters ending in a carriage-return and receiving another.
However simple, this communication pattern defines a protocol.
If we wish to communicate more complex values, such as floats, matrices of
floats, a tree of arithmetic expressions, a closure, or an object, we introduce
the problem of encoding these values. Many solutions exist according to
the nature of the communicating programs, which can be characterized by
the implementation language, the machine architecture, and in certain
cases, the operating system.
Depending on the machine architecture, integers can be
represented in many different ways (most significant bits on the left, on the
right, use of tag bits, and size of a machine word).
To communicate a value between different programs, it is necessary
to have a common representation of values, referred to as the external
representation.
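As a minimal sketch of such an external representation, the functions below fix one convention for 32-bit integers: four bytes, most significant byte first. The names encode_int32 and decode_int32 are illustrative, and the choice of big-endian order is an assumption made for the example, not something the text prescribes.

```ocaml
(* Encode a 32-bit integer as 4 bytes, most significant byte first
   (big-endian), whatever the byte order of the host machine. *)
let encode_int32 (n : int32) : string =
  let buf = Bytes.create 4 in
  Bytes.set_int32_be buf 0 n;
  Bytes.to_string buf

(* Decode the 4-byte big-endian representation back into an int32. *)
let decode_int32 (s : string) : int32 =
  if String.length s <> 4 then invalid_arg "decode_int32: expected 4 bytes";
  Bytes.get_int32_be (Bytes.of_string s) 0
```

Two programs that agree on this convention can exchange integers even if their machines store them differently in memory.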
More structured values, such as records, must likewise have an external
representation.
Nonetheless, there are problems when certain languages allow
constructs, such as bit-fields in C, which do not exist in other
languages.
Passing functional values or objects, which contain pieces of code, poses
a new difficulty. Is the code byte-compatible between the sender and receiver, and does
there exist a mechanism for dynamically loading the code?
As a general rule, the problem is simplified by supposing that the
code exists on both sides. It is not the code itself that is transmitted,
but information that allows it to be retrieved.
For an object, the instance variables are communicated along with the object's
type, which allows retrieval of the object's methods.
For a closure, the environment is sent along with the address of its code.
This implies that the two communicating programs are actually the same
executable.
A second difficulty arises from the complexity of linked exchanges
and the necessity of synchronizing communications involving
many programs.
We first present text protocols, later discussing acknowledgements
and time limits between requests and responses. We also mention the
difficulty of
communicating internal values, in particular as it relates to
interoperability between programs written in different languages.
Text Protocol
Text protocols, that is, communication in ASCII format,
are the most common because they are the simplest to implement and the
most portable.
When a protocol becomes complicated, it may become difficult
to implement. In that case, we define a grammar to describe
the communication format. This grammar may be rich, but it will
be up to the communicating programs to handle the work of
coding and interpreting the text strings sent and received.
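To make the idea concrete, here is a sketch of a very small text protocol and the coding and interpreting functions that each communicating program would need. The grammar, the request type, and the names encode and decode are all hypothetical and serve only as an illustration.

```ocaml
(* A hypothetical request grammar, one request per line:
     request ::= "ADD" int int | "QUIT"                  *)
type request =
  | Add of int * int
  | Quit

(* Encode a request as one line of ASCII text. *)
let encode (r : request) : string =
  match r with
  | Add (a, b) -> Printf.sprintf "ADD %d %d\n" a b
  | Quit -> "QUIT\n"

(* Decode one line; return None when it does not match the grammar. *)
let decode (line : string) : request option =
  let words =
    String.split_on_char ' ' (String.trim line)
    |> List.filter (fun w -> w <> "")
  in
  match words with
  | ["ADD"; a; b] ->
    (match int_of_string_opt a, int_of_string_opt b with
     | Some a, Some b -> Some (Add (a, b))
     | _ -> None)
  | ["QUIT"] -> Some Quit
  | _ -> None
```

Returning an option from decode lets the receiver tell a badly formed message from a valid one instead of failing silently.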
As a general rule, a network application does not allow viewing
the different layers of protocols in use. This is typified
by the case of the HTTP protocol, which allows a browser
to communicate with a Web site.
The HTTP Protocol
The term "HTTP" is seen frequently in advertising.
It corresponds to the communication protocol used by Web applications.
The protocol is completely described on the page of the
W3 Consortium:
Link
http://www.w3.org
This protocol is used to send requests from browsers (Communicator,
Internet Explorer, Opera, etc.) and to return the contents of
requested pages. A request made by a browser contains the name
of the protocol (HTTP), the name of the machine
(www.ufr-info-p6.jussieu.fr),
and the path of the requested page (/Public/Localisation/index.html).
Together these components define a URL (Uniform Resource
Locator):
http://www.ufr-info-p6.jussieu.fr/Public/Localisation/index.html
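The decomposition of such a URL into machine name and page path can be sketched as follows. The function name split_http_url is hypothetical, and the sketch deliberately ignores ports, queries, and every other URL feature not mentioned in the text.

```ocaml
(* Split an "http://machine/path" URL into (machine, path).
   Returns None if the URL does not start with "http://".
   When no path is given, "/" is assumed. *)
let split_http_url (url : string) : (string * string) option =
  let prefix = "http://" in
  let plen = String.length prefix in
  if String.length url < plen || String.sub url 0 plen <> prefix then None
  else
    let rest = String.sub url plen (String.length url - plen) in
    match String.index_opt rest '/' with
    | Some i ->
      Some (String.sub rest 0 i, String.sub rest i (String.length rest - i))
    | None -> Some (rest, "/")
```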
When such a URL is requested by a browser, a connection over a socket
is established between the browser and the server running on the
indicated machine, by default on port 80. Then the browser sends
a request in the HTTP format, like the following:
GET /index.html HTTP/1.0
The server responds in the protocol HTTP, with a header:
HTTP/1.1 200 OK
Date: Wed, 14 Jul 1999 22:07:48 GMT
Server: Apache/1.3.4 (Unix) PHP/3.0.6 AuthMySQL/2.20
Last-Modified: Thu, 10 Jun 1999 12:53:46 GMT
Accept-Ranges: bytes
Content-Length: 3663
Connection: close
Content-Type: text/html
This header indicates that the request has been accepted (code 200 OK), the kind of server,
the modification date for the page, the length of the page sent, and the type of content that
follows.
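Because the exchange is plain text, both sides can be handled with ordinary string functions. The sketch below builds a request like the one shown and splits a status line and a header line into their parts. The function names are hypothetical, and only the simple forms that appear in this example are handled.

```ocaml
(* Build a GET request for a page, terminated by an empty line. *)
let http_get (path : string) : string =
  Printf.sprintf "GET %s HTTP/1.0\r\n\r\n" path

(* Parse a status line such as "HTTP/1.1 200 OK" into
   (version, code, reason). Returns None when the code is
   missing or is not an integer. *)
let parse_status_line (line : string) : (string * int * string) option =
  match String.split_on_char ' ' (String.trim line) with
  | version :: code :: reason ->
    (match int_of_string_opt code with
     | Some c -> Some (version, c, String.concat " " reason)
     | None -> None)
  | _ -> None

(* Parse a header line such as "Content-Length: 3663" into
   (name, value), splitting at the first colon. *)
let parse_header (line : string) : (string * string) option =
  match String.index_opt line ':' with
  | Some i ->
    let name = String.sub line 0 i in
    let value = String.sub line (i + 1) (String.length line - i - 1) in
    Some (name, String.trim value)
  | None -> None
```

Splitting headers at the first colon only matters for values such as dates, which themselves contain colons.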
Using the GET command in the protocol (HTTP/1.0), only the HTML page is transferred.
The following connection with telnet allows us to see what is actually transmitted:
$ telnet www.ufr-info-p6.jussieu.fr 80
Trying 132.227.68.44...
Connected to triton.ufr-info-p6.jussieu.fr.
Escape character is '^]'.
GET
<!-- index.html -->
<HTML>
<HEAD>
<TITLE>Serveur de l'UFR d'Informatique de Pierre et Marie Curie</TITLE>
</HEAD>
<BODY>
<IMG SRC="/Icons/upmc.gif" ALT="logo-P6" ALIGN=LEFT HSPACE=30>
Unité de Formation et de Recherche 922 - Informatique<BR>
Université Pierre et Marie Curie<BR>
4, place Jussieu<BR>
75252 PARIS Cedex 05, France<BR><P>
....
</BODY>
</HTML>
<!-- index.html -->
Connection closed by foreign host.
The connection closes once the page has been copied.
The base protocol is in text mode so that the language may
be interpreted. Note that images are not transmitted with the
page. It is up to the browser, when analyzing the syntax of the
HTML page, to observe anchors and images (see the IMG tags
in the transmitted page). At this time, the browser sends a new
request for each image encountered in the HTML source; there
is a new connection for each image. The images are displayed when
they are received. For this reason, images are often
displayed in parallel.
The HTTP protocol is simple enough, but it transports information
in the HTML language, which is more complex.
Protocols with Acknowledgement and Time Limits
When a protocol is complex, it is useful for the receiver
of a message to tell the sender that it has received the
message and that the message is grammatically correct.
The client blocks while waiting for a response before
resuming its own tasks. If the part of the server handling this
request has difficulty interpreting the message, the server
must indicate this fact to the client rather than ignore the
request. The HTTP protocol has a system of error codes.
A correct request results in the code 200. A badly-formed request
or a request for an unauthorized page results in an error code 4xx or 5xx
according to the nature of the error. These error codes allow
the client to know what to do and allow the server to record the
details of such incidents in its log files.
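A client can turn these numeric codes into a small set of cases to decide what to do next. The sketch below groups codes by their hundreds digit, following the 200, 4xx, and 5xx families mentioned above. The type outcome and the function classify are hypothetical names.

```ocaml
type outcome =
  | Success            (* 2xx: the request was handled *)
  | Client_error       (* 4xx: badly formed or unauthorized request *)
  | Server_error       (* 5xx: the server failed to handle the request *)
  | Other of int       (* any other code, left to the caller *)

(* Classify a status code by its family. *)
let classify (code : int) : outcome =
  match code / 100 with
  | 2 -> Success
  | 4 -> Client_error
  | 5 -> Server_error
  | _ -> Other code
```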
When the server is in an inconsistent state, it can always
accept a connection from a client, but risks never sending
it a response over the socket. To avoid these blocking
waits, it is useful to set a limit on the time allowed for transmission
of the response. After this time has elapsed, the client
assumes that the server is no longer responding.
Then the client can close this connection in order to go
on to its other work. This is
how WWW browsers work. When a request has no response
after a certain time, the browser decides to indicate that
to the user. Objective CAML has input-output with time limits. In
the Thread library, the functions wait_timed_read and
wait_timed_write suspend execution until a
character can be read or written, within a certain time limit.
As input, these functions take a file descriptor and a time limit
in seconds:
Unix.file_descr -> float -> bool.
If the time limit expires first, the function returns
false; otherwise it returns true and the I/O can proceed.
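The same behavior can be sketched with Unix.select, which is used here in place of the Thread functions named above because those may not be available in every OCaml release. The sketch assumes a Unix-like system and a program linked with the unix library. The names wait_readable and read_with_timeout are hypothetical.

```ocaml
(* Wait until [fd] is readable or [timeout] seconds have elapsed.
   Returns true if data can be read, false if the time limit passed. *)
let wait_readable (fd : Unix.file_descr) (timeout : float) : bool =
  match Unix.select [fd] [] [] timeout with
  | [], _, _ -> false
  | _ -> true

(* Read at most [len] bytes, giving up after [timeout] seconds.
   Returns None when nothing arrived within the time limit. *)
let read_with_timeout (fd : Unix.file_descr) (len : int) (timeout : float)
  : string option =
  if wait_readable fd timeout then begin
    let buf = Bytes.create len in
    let n = Unix.read fd buf 0 len in
    Some (Bytes.sub_string buf 0 n)
  end else None
```

A client that gets None can close the connection and go on to its other work, as described above.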
Transmitting Values in their Internal Representation
The interest in transmission of internal values comes from
simplifying the protocol. There is no longer any need to
encode and decode data in a textual format. The inherent
difficulties in sending and receiving values in their
internal representation are the same as those encountered
for persistent values (see the Marshal library).
Indeed, reading or writing a value in a file is
equivalent to receiving or sending the same value over a socket.
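As a sketch of this approach, the functions below use the Marshal module to turn a matrix of floats into a string of bytes and back. The names send_matrix and receive_matrix are hypothetical. A real program would typically use Marshal.to_channel and Marshal.from_channel on the channels attached to the socket.

```ocaml
(* Serialize a matrix of floats into a string of bytes, as it would be
   written to a file or a socket. *)
let send_matrix (m : float array array) : string =
  Marshal.to_string m []

(* Rebuild the matrix from those bytes. Marshal.from_string is not
   type-safe: the annotation is a promise by the programmer that the
   bytes really encode a float matrix. *)
let receive_matrix (s : string) : float array array =
  (Marshal.from_string s 0 : float array array)
```

No textual grammar is needed, but the receiver has to trust that the sender used the expected type.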
Functional Values
In the case of transmitting a closure between two Objective CAML programs,
the code in the closure is not sent, only its environment and
its code pointer (see figure 12.9).
For this strategy to work, it is necessary that the server
possess the same code in the same memory location. This implies
that the same program is running on the server as on the client.
Nothing, however, prevents the two programs from running different
parts of the code at the same time. We adapt the matrix calculation service
by sending a closure with an environment containing the data for
calculation. When it is received, the server applies this closure
to () and the calculation begins.
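The mechanism can be sketched within a single program, which trivially satisfies the requirement that sender and receiver share the same code. The names make_task, send_task, and run_task are hypothetical, and a sum over an array stands in for the matrix calculation.

```ocaml
(* Build a closure whose environment holds the data to compute on. *)
let make_task (a : float array) : unit -> float =
  fun () -> Array.fold_left ( +. ) 0.0 a

(* Sender side: marshal the closure. The Closures flag is required,
   otherwise Marshal raises an exception on functional values. *)
let send_task (task : unit -> float) : string =
  Marshal.to_string task [Marshal.Closures]

(* Receiver side: rebuild the closure and apply it to (). This only
   works when the receiver runs exactly the same executable. *)
let run_task (bytes : string) : float =
  let task : unit -> float = Marshal.from_string bytes 0 in
  task ()
```

Only the environment (here the array) and a reference to the code travel in the byte string, so the data arrives with the request while the code must already be present on the receiving side.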
Interoperating with Different Languages
The interest in text protocols is that they are independent
of the implementation languages of clients and servers. Indeed,
every programming language can handle ASCII text.
It is therefore up to the client and the server to parse
the character strings transmitted.
An example of such an open protocol is the simulation of
soccer players called RoboCup.
Soccer Robots
A soccer team plays against another team. Each member of the team
is a client of a referee server. The players on the same team cannot communicate
directly with each other. They must send information through the server, which
retransmits the dialog. The server shows each player a part of the field,
according to the player's position.
All these communications follow a text protocol. The following Web page describes the protocol,
the server, and certain clients:
Link
http://www.robocup.org/
The server is written in C. The clients are written in different languages:
C, C++, Smalltalk, Objective CAML, etc.
Nothing prevents a team from fielding players written in different languages.
This protocol responds to the interoperability needs between programs
in different implementation languages. It is relatively simple, but
it requires a particular syntax analyzer for each family of languages.
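As an illustration of what such a syntax analyzer might look like in Objective CAML, the sketch below parses messages written as nested parenthesized lists of words. This message shape is an assumption made for the example, and the sample messages in it are hypothetical. The actual grammar must be taken from the protocol documentation.

```ocaml
type sexp =
  | Atom of string
  | List of sexp list

(* Parse one message; raise Failure on malformed input. *)
let parse (s : string) : sexp =
  let n = String.length s in
  let rec skip i = if i < n && s.[i] = ' ' then skip (i + 1) else i in
  (* Parse one expression starting at index i and return it together
     with the index of the first character after it. *)
  let rec expr i =
    let i = skip i in
    if i >= n then failwith "unexpected end of message"
    else if s.[i] = '(' then items (i + 1) []
    else if s.[i] = ')' then failwith "unexpected ')'"
    else atom i i
  and items i acc =
    let i = skip i in
    if i >= n then failwith "missing ')'"
    else if s.[i] = ')' then (List (List.rev acc), i + 1)
    else
      let (e, j) = expr i in
      items j (e :: acc)
  and atom start i =
    if i < n && s.[i] <> ' ' && s.[i] <> '(' && s.[i] <> ')'
    then atom start (i + 1)
    else (Atom (String.sub s start (i - start)), i)
  in
  let (e, i) = expr 0 in
  if skip i <> n then failwith "trailing characters" else e
```

A client written in another language would need an equivalent analyzer, but the messages on the wire would stay the same.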