"I think there is a world market for maybe five computers." - Thomas Watson, chairman of IBM, 1943
Perhaps Watson was off by four.
In the early 1990s, few people had heard of Tim Berners-Lee's World Wide Web, and, of those who had, many fewer appreciated its significance. After all, computers had been connected to the Internet since the 1970s, and transferring data among computers was commonplace. Yet the Web brought something genuinely new: the perspective of viewing the whole Internet as a single information space, where users accessing data could move seamlessly and transparently from machine to machine by following links.
A similar shift in perspective is currently underway, this time with application programs. Although distributed computing has been around for as long as there have been computer networks, only recently have applications that draw upon many interconnected machines as one vast computing medium been deployed on a large scale. What makes this possible are new protocols for distributed computing built upon HTTP, designed for programs interacting with programs rather than for people surfing with browsers.
Several such protocols exist; in this chapter we'll work with two of them: SOAP and RSS.
We're currently moving from an environment where applications are deployed on individual machines and Web servers, to a world where applications are composed of pieces — called services in the current jargon — that are spread across many different machines, and where the services interact seamlessly and transparently to produce an overall effect. While the consequences of this change could be minor, it's also possible that they could be as profound as the introduction of the Web. In any case, companies are introducing new Web service frameworks that exploit the new infrastructure. Microsoft's .NET is one such framework.
In this chapter, you'll build applications that consume Web services to combine data from your online learning community with remote data in Google and Amazon. You'll be building SOAP clients to these public services. In the final exercises, you'll be creating your own service that provides information about recent content appearing in your community. You'll make this service available both in the de jure standard of SOAP and the de facto standard of RSS, a breakout from the world of weblogs.
Depending on what tools you're using, you might never need to know what SOAP requests and replies actually look like. Nonetheless, let's start with a behind-the-scenes look at SOAP messages, which are typically sent across the network embedded in HTTP POSTs.
Here's a raw SOAP request/response pair for a hypothetical "who's online" service that returns information about users who have been active in the last N seconds:
Request (whitespace added for readability):

POST /services/WhosOnline.asmx HTTP/1.1
Host: somehost
Content-Type: text/xml; charset=utf-8
Content-Length: length
SOAPAction: "http://jpfo.org/WhosOnline"

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <WhosOnline xmlns="http://jpfo.org/">
      <n_seconds>600</n_seconds>
    </WhosOnline>
  </soap:Body>
</soap:Envelope>

Response (whitespace added for readability):

HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8
Content-Length: length

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <WhosOnlineResponse xmlns="http://jpfo.org/">
      <WhosOnlineResult>
        <user>
          <first_names>Eve</first_names>
          <last_name>Andersson</last_name>
          <email>eve@eveandersson.com</email>
        </user>
        <user>
          <first_names>Philip</first_names>
          <last_name>Greenspun</last_name>
          <email>philg@mit.edu</email>
        </user>
        <user>
          <first_names>Andrew</first_names>
          <last_name>Grumet</last_name>
          <email>aegrumet@alum.mit.edu</email>
        </user>
      </WhosOnlineResult>
    </WhosOnlineResponse>
  </soap:Body>
</soap:Envelope>
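With tools out of the picture, a raw exchange like the one above can be produced with nothing more than an HTTP client and an XML parser. Here is a sketch in Python; the endpoint URL and element names come from the hypothetical WhosOnline example, so treat the whole thing as illustrative rather than as a working service:

```python
# A hand-rolled SOAP client for the hypothetical "who's online" service.
# Real projects would normally use a SOAP toolkit, but seeing the raw
# mechanics once is instructive.
import urllib.request
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
SVC_NS = "http://jpfo.org/"

def build_whos_online_request(n_seconds):
    """Return the SOAP envelope for a WhosOnline call as a string."""
    return (
        '<?xml version="1.0" encoding="utf-8"?>'
        f'<soap:Envelope xmlns:soap="{SOAP_NS}">'
        '<soap:Body>'
        f'<WhosOnline xmlns="{SVC_NS}">'
        f'<n_seconds>{int(n_seconds)}</n_seconds>'
        '</WhosOnline>'
        '</soap:Body>'
        '</soap:Envelope>')

def call_whos_online(url, n_seconds):
    """POST the request to the service and return the raw response XML."""
    req = urllib.request.Request(
        url,
        data=build_whos_online_request(n_seconds).encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8",
                 "SOAPAction": '"http://jpfo.org/WhosOnline"'})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

def parse_whos_online_response(xml_text):
    """Extract a list of user dicts from a WhosOnlineResponse envelope."""
    root = ET.fromstring(xml_text)
    users = []
    for user in root.iter(f"{{{SVC_NS}}}user"):
        users.append({child.tag.split("}")[1]: child.text for child in user})
    return users
```

Note that the client's only knowledge of the server is a URL, an action name, and the expected element names; that loose coupling is the point of the exercise.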
Start by writing a design document that lays out your SQL data model
and how you're going to use the Amazon API (which functions to call?
which values to process?). Your recommended_books
table
probably should be keyed by the International Standard Book Number
(ISBN). For most of your career as a data modeler, it is best to use
generated keys. However, in this case there is an entire
infrastructure to ensure the uniqueness of the ISBN (see www.isbn.org) and therefore it is safe
for use as a primary key.
For each book, your data model ought to record at least the information you'll display on the reading-list pages, such as the title and description returned by Amazon, plus who recommended the book and why.
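A minimal sketch of such a data model, using an in-memory SQLite database for illustration; the columns beyond the ISBN primary key are assumptions, to be adapted to whatever Amazon fields and user keys your community actually uses:

```python
# Sketch of the recommended_books table, keyed by ISBN rather than a
# generated key, per the discussion above. Column choices are assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recommended_books (
    isbn            VARCHAR(13) PRIMARY KEY,  -- ISBNs are globally unique
    title           VARCHAR(500) NOT NULL,
    description     TEXT,                     -- pulled from the Amazon API
    recommended_by  INTEGER NOT NULL,         -- references your users table
    recommendation  TEXT,                     -- why this book is good
    date_added      TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
""")

conn.execute(
    "INSERT INTO recommended_books (isbn, title, recommended_by, recommendation) "
    "VALUES (?, ?, ?, ?)",
    ("0240804058", "Basic Photographic Materials and Processes", 1,
     "A solid technical grounding for the photo-critique forum."))

row = conn.execute(
    "SELECT title FROM recommended_books WHERE isbn = ?",
    ("0240804058",)).fetchone()
```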
Create a /reading-list/ directory for the page scripts that will make up your new module. We suggest implementing the following URLs:

- /reading-list/ : the index page, listing the books currently on the reading list and offering a text entry box for searching Amazon

- /reading-list/one-book : a page showing the full description of one book, who recommended it, and why

- /reading-list/search : the target of the text entry box on the index page, which returns a list of books from the Amazon API that match a query string; books that are already on the reading list should be displayed, but greyed-out and somehow marked as already on the list (and there shouldn't be a button to add them again!). Books that aren't on the list should be hyperlinks to an "add-book" URL. (You can make the title of the book be the hyperlink anchor; remember always to let the information be the interface.)

- /reading-list/add-book : a page that solicits a comment from the suggesting user as to why this particular book is good for other community members
A good rule of thumb is that every table you add to your data model implies roughly five user-accessible URLs and five administrative URLs. So far we're up to four user pages, and if you were to launch this feature you'd also need to build some admin pages.
In this exercise, you'll create an alternative post confirmation process that entails writing two new Web scripts and draws on both the search capabilities that you developed in the "Search" chapter and the Google Web APIs service (http://www.google.com/apis/). The goal is to put some internal and external links in front of Joe Newbie and encourage him to look at them before finalizing his question for presentation to the entire community.
Your new post confirmation process should be invoked only for questions that start a discussion thread, not for answers to a question. Our experience with online communities is that it is more important to moderate the questions, which determine what will be discussed, than individual answers.
If your current post confirmation page is at /forum/confirm, we suggest adding a -query suffix for your new script, e.g., /forum/confirm-query. In addition to the usual Confirm/Edit interface, this page should present related links for Joe to review: internal discussion threads turned up by your local search facility and external pages turned up by Google.
There are a few ways to achieve this. One is to make all of the links target a separate window using the HTML target= syntax for the anchor (<a>) tag. Novice users might become confused, however, as the extra window pops up on their screen, and they might not know how to use their browser or operating system to get back to the Confirm/Edit page. A JavaScript pop-up window of modest size might reduce the scale of this problem. Another option is to use the dreaded Frames feature of HTML, putting the Confirm/Edit page in one frame and the other material in another frame. When Joe finally decides to Confirm/Edit, the Frames syntax provides a mechanism for the server to tell the browser "go back to only one window now". A third option is a "server-side frame", in which you build pages of the form /forum/confirm-follow-link, where the full posting with Confirm/Edit buttons is carried through and the content of the external or internal link is presented inside a single page.
For the purpose of this exercise, you're free to choose any of these methods or one that we haven't thought of. Note that this exercise should not require modifying any of your database tables or existing scripts except for one link from the "ask a new question" page.
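The "server-side frame" option can be sketched roughly as follows; the function name and markup are illustrative, not prescribed by the exercise, and in production the link body would come from fetching the URL with an HTTP client:

```python
# Sketch of /forum/confirm-follow-link: re-present the pending posting and
# its Confirm/Edit buttons above the content of the followed link, all in
# one page, so Joe never leaves the confirmation flow.
import html

def confirm_follow_link_page(posting_text, link_url, link_body):
    """Build a single HTML page combining the pending posting (with
    Confirm/Edit buttons) and the content of the followed link."""
    return f"""<html><body>
<form action="/forum/confirm" method="post">
  <blockquote>{html.escape(posting_text)}</blockquote>
  <input type="submit" name="confirm" value="Confirm">
  <input type="submit" name="edit" value="Edit">
</form>
<hr>
<p>Content of {html.escape(link_url)}:</p>
{link_body}
</body></html>"""
```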
How can the server tell which books are related to a question-and-answer exchange? Start by building a procedure that goes through the question and all replies to build a list of frequently occurring words. Your procedure should exclude words that are in a stopwords list of exceedingly common English words such as "the", "and", "or", etc. Whatever full-text search tool you used in the "Search" chapter probably contains such a list somewhere, either in a file or in a database table. You can use the top few words in this list to query Amazon for a list of matching titles.
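The frequent-words procedure might look like the following sketch; the stopword list here is a tiny stand-in for the one that ships with your full-text search tool:

```python
# Count words across a thread's question and replies, ignoring stopwords,
# and return the top few to feed into an Amazon title search.
import re
from collections import Counter

# Tiny illustrative stopword list; substitute your search tool's full list.
STOPWORDS = {"the", "and", "or", "a", "an", "of", "to", "in", "is", "it",
             "that", "for", "on", "with", "my", "i", "out", "try"}

def frequent_words(messages, n_top=3):
    """Return the n_top most frequent non-stopword words in the thread."""
    counts = Counter()
    for message in messages:
        for word in re.findall(r"[a-z']+", message.lower()):
            if word not in STOPWORDS and len(word) > 2:
                counts[word] += 1
    return [word for word, _ in counts.most_common(n_top)]
```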
For the purpose of this exercise, you can fetch your Amazon data on every page load. In practice, on a production site this would be bad for your users, due to the extra latency, and bad for your relationship with Amazon, because you might be performing the same query against their services several times per second. You'd probably decide to store the related books in your local database, along with a "last message" stamp, and rebuild periodically if there were new replies to a thread.
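The caching idea can be sketched as follows; the in-memory dict stands in for your database table, and the fetch function is a stand-in for the real Amazon API call:

```python
# Cache related-book results per thread, refetching from Amazon only when
# the thread has had new replies since the cached copy was built.
_cache = {}  # thread_id -> (last_message_stamp, related_books)

def related_books(thread_id, last_message_stamp, fetch_from_amazon):
    """Return cached books unless a newer reply invalidates the cache."""
    cached = _cache.get(thread_id)
    if cached and cached[0] >= last_message_stamp:
        return cached[1]
    books = fetch_from_amazon(thread_id)
    _cache[thread_id] = (last_message_stamp, books)
    return books
```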
Each related book should have a link to the product page on Amazon.com, optionally keyed with an Amazon Associates ID. Here's an example reference:
<a href="http://www.amazon.com/exec/obidos/ASIN/0240804058/pgreenspun-20"><cite>Basic
Photographic Materials and Processes</cite></a>
The ISBN goes after the "ASIN", and the Associates ID in this example is
"pgreenspun-20".
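Constructing these references is a one-liner; a small helper, with the Associates ID optional:

```python
# Build an Amazon product link in the style shown above.
def amazon_link(isbn, associates_id=None):
    url = f"http://www.amazon.com/exec/obidos/ASIN/{isbn}"
    return f"{url}/{associates_id}" if associates_id else url
```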
Build a page script, new-content, in a directory of your choice, that reports recently added content from across your community, with timestamps in RFC 822 format (e.g., Wed, 29 Oct 2003 00:09:19 GMT). Note that it should be easy to build this page using a function drawing on the intermodule API that you defined as part of your work on the Software Modularity chapter exercises.
Expose your procedure to the wider world so that other applications can take advantage of it via remote method invocation. Install a SOAP handler that accomplishes the following:
- handles HTTP requests to /services/new-content and checks for correct SOAP syntax
- pulls the n_items parameter out of the request, if present
- executes the procedure call and fetches the results
- delivers the results as a valid SOAP response containing zero or more "item" records, with the fields listed in Exercise 5 for each item

Your development platform may provide tools that, once you've mapped the external Web service to the internal procedure call, handle the HTTP and SOAP mechanics transparently. If not, you will need to skim the examples in the SOAP specification and read the introductory articles linked below.
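If you do end up hand-rolling the mechanics, the two interesting pieces are parsing the parameter out of the request envelope and wrapping your results in a response envelope. A sketch, with namespaces and element names modeled on the WhosOnline example rather than taken from any specification:

```python
# Server-side SOAP mechanics for the new-content service: extract the
# optional n_items parameter, then wrap item records in a response
# envelope. Element and namespace names are assumptions; match them to
# your own service definition.
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SVC_NS = "http://jpfo.org/"
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def parse_n_items(request_xml, default=10):
    """Pull the optional n_items parameter out of a SOAP request."""
    root = ET.fromstring(request_xml)
    node = root.find(f".//{{{SVC_NS}}}n_items")
    return int(node.text) if node is not None else default

def new_content_response(items):
    """Wrap item dicts (title, description, guid, ...) in a SOAP envelope."""
    body = "".join(
        "<item>" +
        "".join(f"<{k}>{escape(str(v))}</{k}>" for k, v in item.items()) +
        "</item>"
        for item in items)
    return ('<?xml version="1.0" encoding="utf-8"?>'
            f'<soap:Envelope xmlns:soap="{SOAP_NS}">'
            f'<soap:Body><NewContentResponse xmlns="{SVC_NS}">'
            f'<NewContentResult>{body}</NewContentResult>'
            '</NewContentResponse></soap:Body></soap:Envelope>')
```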
Write a WSDL contract that describes the inputs and outputs for your
new-content
service. Note that if you are using Microsoft
.NET, these WSDL contracts will be automatically generated in most
cases. You need only expose them.
Your WSDL should be available either by appending ?WSDL to the URL of the service itself (convenient for Microsoft .NET users) or by appending a .wsdl extension to the service URL.
Validate your WSDL contract and SOAP methods by inviting another team to test your service. Do the same for them. Alternatively, look for and employ validation tools out on the Web.
Within a decade, however, the Web Consortium was focusing its efforts on the "Semantic Web" and the Resource Description Framework (see http://www.w3.org/RDF). Where standards committee members once talked about whether or not to facilitate adding a caption to a photograph, you now hear words like "ontology" thrown around. Web development has thus become as challenging as cracking the Artificial Intelligence problem.
Where do SOAP and WSDL sit on this continuum from the simplicity of HTML to the AI-complete problem of a semantic Web? Judging by adoption, they sit closer to RDF than to HTML: SOAP and WSDL have taken many years to catch on, in contrast to the wildfire-like spread of the human-readable Web.
The dynamic world of weblogs has settled on a standard that has spread very quickly indeed and enabled the construction of quite a few computer programs that aggregate information from multiple weblogs. This standard, pushed forward primarily by Userland's Dave Winer, is known as Really Simple Syndication or RSS and is documented at http://blogs.law.harvard.edu/tech/rss.
Provide your new-content information as an RSS 2.0 feed at /services/new-content-rss.xml. The feed should contain just the title, description, and a globally unique identifier (GUID) for each item. You are encouraged to use the fully qualified URL for the item as its GUID, if it has one.
Validate your feed using an RSS reader or the validator at http://rss.scripting.com.
Template:

<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>{site name}</title>
    <link>{site url}</link>
    <description>{site description}</description>
    <language>en-us</language>
    <copyright>Copyright {dates}</copyright>
    <lastBuildDate>{rfc822 date}</lastBuildDate>
    <managingEditor>{your email addr}</managingEditor>
    <pubDate>{rfc822 date}</pubDate>
    <item>
      <title>{item1 title}</title>
      <description>{description for item1}</description>
      <guid>{guid for item1}</guid>
      <pubDate>{rfc822 date for when item1 went live}</pubDate>
    </item>
    <item>
      <title>{item2 title}</title>
      <description>{description for item2}</description>
      <guid>{guid for item2}</guid>
      <pubDate>{rfc822 date for when item2 went live}</pubDate>
    </item>
  </channel>
</rss>
Remember to escape any markup in your titles and descriptions, so that, for example, <em>Whoa!</em> becomes &lt;em&gt;Whoa!&lt;/em&gt;.
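The escaping can be delegated to a standard library routine rather than done by hand; a sketch of generating one feed item this way (the helper name and item fields are illustrative):

```python
# Generate one RSS <item>, escaping markup in the title and description.
from xml.sax.saxutils import escape

def rss_item(title, description, guid, pub_date):
    return (
        "<item>"
        f"<title>{escape(title)}</title>"
        f"<description>{escape(description)}</description>"
        f"<guid>{escape(guid)}</guid>"
        f"<pubDate>{pub_date}</pubDate>"
        "</item>")
```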