Some common kinds of documentation---for example, manuals for computer programs---present a unique challenge to the author. In many cases, it would be nice to be able to provide online documentation in addition to a typeset manual.
One option is to maintain two different documents: one for publication and one for online access. Keeping the two in step is difficult and prone to error. As the documentation evolves, it is almost inevitable that some changes to one document will not be implemented in the other.
Another option is to include only very limited formatting information in the document designed for publication so that it is easy to “strip out” the formatting commands and produce an online manual. The unfortunate side effect of this approach is that the resulting typeset documentation doesn't have a very professional appearance.
With care, TeX can be the basis for a middle-ground approach to this problem.[109] If you are starting a new documentation project, TeXinfo and LameTeX provide two alternatives for the production of typeset and online documentation from the same source. LaTeX2HTML and latex2hy provide alternatives that may be suitable for existing documentation.
You'll find that the best results occur when you plan ahead: if you know that you need both typeset and ASCII documentation, try to use tools that will make the task easier. But, even if you try to plan ahead, it's not uncommon to find out after the fact that you need or want ASCII documentation. If you don't have the TeX sources, you'll just have to take the best results you can get with one of the tools described later in this chapter and do whatever hand-editing is required.
If you have the TeX sources for a document, here are some guidelines that can help improve the quality of the conversion to plain ASCII:
Redefine all font commands to use \tt. This makes TeX work with a fixed-width font. You will probably get many, many overfull and underfull box messages. This can't be helped. Setting \tolerance and \hbadness to large values (10000 with TeX 2.x or 100000 with TeX 3.x, for example) will reduce the number of warnings.
Similarly, redefine all the commands that change font size to select the same size (probably 10pt or 12pt will work best, but larger values may be better if you have wide margins).
Don't use any special fonts---no picture environments in LaTeX, for example. Take out rules, too.
Use \raggedright. There's no point in trying to line up the right margin.
Remove or redefine all mathematics to avoid the use of math-mode. It won't work; don't ask TeX to try.
Remove all tables (\halign in Plain TeX, tabular environments in LaTeX).
Remove floating environments; this may help.
Depending on your level of expertise and the number of documents that you have to convert, redefine footnotes and other environments to give you more marginal improvements.
The ascii.sty style for LaTeX encapsulates many of these rules for you.
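As a rough illustration of the guidelines above, the following preamble is a minimal sketch. It is not the contents of ascii.sty. It assumes a current LaTeX with the standard article class, and the macro name \mathorascii is hypothetical, introduced only for this example.

```latex
% A sketch only: assumes LaTeX2e and the article class; this is NOT ascii.sty.
\documentclass{article}

% Font commands: make every family, series, and shape resolve to
% medium upright typewriter, so \bf, \it, \sf, and so on all print in \tt.
\renewcommand{\rmdefault}{\ttdefault}
\renewcommand{\sfdefault}{\ttdefault}
\renewcommand{\familydefault}{\ttdefault}
\renewcommand{\bfdefault}{m}
\renewcommand{\itdefault}{n}
\renewcommand{\sldefault}{n}
\renewcommand{\scdefault}{n}

% Size commands: make them all select the same size as \normalsize.
\let\tiny\normalsize
\let\scriptsize\normalsize
\let\footnotesize\normalsize
\let\small\normalsize
\let\large\normalsize
\let\Large\normalsize
\let\LARGE\normalsize
\let\huge\normalsize
\let\Huge\normalsize

% No attempt to line up the right margin, and fewer box warnings.
% 10000 is the value suggested above for TeX 2.x; the text suggests
% a larger value under TeX 3.x.
\raggedright
\tolerance=10000
\hbadness=10000

% Hypothetical helper: #1 is the math-mode form, #2 is an ASCII stand-in.
% This version discards #1, so math mode is never entered.
\newcommand{\mathorascii}[2]{#2}

\begin{document}

\section{Units}

{\bf Gravity} accelerates a falling body at roughly
\mathorascii{$9.81\frac{m}{s^2}$}{9.81 m/s\textasciicircum 2}.

\end{document}
```

Keeping the math-mode form as the first argument means the same source can still produce the typeset manual if \mathorascii is redefined to use #1 instead.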
Many things that can easily be represented on paper cannot be represented in plain ASCII. One reason for this is that plain ASCII output is not proportionally spaced. Also, in ASCII you can move up or down only by rows, and left or right only by columns; you can't move down “3 points” on a terminal to typeset a subscript, for example.
These differences combine to make many things impossible. For all of its marvelous sophistication, TeX cannot help you typeset mathematics in plain ASCII. It just can't be done. Most tables can't be done in plain ASCII either (at least not if you want lines that are only 80 characters or so long).
The following sections describe tools that may help you achieve the goal of online and typeset documentation from the same sources. Each tool has its own advantages and disadvantages. The ones that work best for you will depend on the type of documentation you are producing and the amount of work you are willing to do.
TeXinfo is the document formatting system adopted by the Free Software Foundation (FSF) and the GNU Project. It is a special TeX format that is very different from standard TeX.[110] The goal of TeXinfo is to devise an input format that can be processed by TeX to produce typeset output and then be processed by another program to produce hypertext output. (The other program in this case is MakeInfo.)
TeXinfo supports ordinary text, sectioning commands, itemize and enumeration environments, footnotes, cross-references, tables of contents, lists of figures and tables, and multiple indexes.
The TeXinfo example from Chapter 4 is reproduced in Example 10.1 (the TeXinfo input), Figure 10.1 (the typeset page), and Figure 10.2 (the resulting online documentation).
The output from MakeInfo is very nearly pure ASCII. The motivation for hypertext output is to make cross references dynamic when the “info” version of the document is used for online reference.[111] The result is close enough to pure ASCII that converting it to pure ASCII is not (usually) too difficult. For example, the comp.fonts newsgroup's Frequently Asked Questions list is maintained as a TeXinfo document. It is posted as an info version that has been processed by a Perl script to “flatten” the hypertext.
The TeXinfo format is well documented, so a brief description here will suffice. First, the backslash, which is ordinarily used to introduce control sequences in TeX, is not special. Instead, the “@” sign is used. Second, a TeXinfo document is divided into “nodes.” A node corresponds roughly to a chapter or a large section of a chapter. In the online documentation, it is easy to jump between related nodes.
Because the info version of the document is ASCII text, many of the special typesetting features of TeX aren't applicable. To support them in the typeset document, TeXinfo allows you to specify that some portions of the input should be seen only by TeX and some should be seen only by MakeInfo. Example 10.1 uses this feature to typeset mathematics using the best features of both TeX and MakeInfo.
LaTeX2HTML attempts to convert LaTeX documents into HTML, the document structuring language used by the World Wide Web (WWW) project. HTML stands for HyperText Markup Language; it is a way of describing documents in terms of their structure (headings, paragraphs, lists, etc.). SGML, the Standard Generalized Markup Language, provides a framework for developing structured documentation; HTML is one specific SGML document type. HTML documents are displayed by special programs called browsers that interpret the markup and present the information in a consistent manner. Because an HTML document is described in terms of its structure and not its appearance, HTML documents can be effectively displayed by browsers in non-graphical environments.
One of the most important features of HTML documents is the ability to form hypertext links between documents. Hypertext links allow you to build dynamic relationships between documents. For example, selecting a marked word or phrase in the current document displays more information about the topic, or a list of related topics.
LaTeX2HTML preserves many of the features of a LaTeX document in HTML. Elements that are too complex to represent in HTML, such as mathematical equations and logos like “TeX,” are converted into graphic images that can be displayed online by graphical browsers. All types of cross referencing elements (including footnotes) are preserved as hypertext links.
When installed, LaTeX2HTML understands many basic LaTeX commands, but it can be customized to handle other styles. LaTeX2HTML is written in Perl.
All in all, LaTeX2HTML is one of the easiest and most effective tools for translating typeset documentation into a format suitable for online presentation. In a graphical environment like X11, Microsoft Windows, or the Macintosh, HTML documents offer very good support for online documentation.
LameTeX is a PostScript translator for a (very limited) subset of LaTeX. One of its original design goals, the inclusion of sophisticated PostScript commands directly in a LaTeX document, has been superseded by the PSTricks package. However, one of the side effects of a special-purpose translator for LaTeX is the ability of that translator to produce different kinds of output, including plain ASCII.
The primary advantage of this method is that it does not require learning an entirely foreign macro package like TeXinfo. The disadvantage is that it understands only a very small subset of LaTeX. This subset includes only the following commands:
\# | \footnotesize | \ref |
\$ | \hspace | \rm |
\% | \hspace* | \sc |
\& | \huge | \scriptsize |
\Huge | \include | \section |
\LARGE | \input | \section* |
\Large | \it | \setlength |
\_ | \item | \sf |
\addtolength | \itemize | \sl |
\backslash | \label | \small |
\begin | \large | \smallskip |
\bf | \ldots | \subparagraph |
\bigskip | \medskip | \subparagraph* |
\center | \newlength | \subsection |
\chapter | \newline | \subsection* |
\chapter* | \normalsize | \subsubsection |
\clearpage | \par | \subsubsection* |
\description | \paragraph | \tiny |
\document | \paragraph* | \today |
\documentstyle | \part | \tt |
\em | \part* | \verbatim |
\end | \quotation | \verbatim* |
\enumerate | \quote | \verse |
\flushleft | \raggedleft | \vspace |
\flushright | \raggedright | \vspace* |
In addition, unlike LaTeX, LameTeX doesn't understand any Plain TeX commands (other than the ones listed).
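To give a feel for how restrictive this subset is, here is a small document that uses only commands from the list above. It is an illustrative sketch: the section titles, label name, and the command inside the verbatim block are made up, and it has not been checked against LameTeX itself.

```latex
% Illustrative only: every command used here appears in the list above.
% The content (titles, label, program name) is invented for the example.
\documentstyle{article}
\begin{document}

\section{Installation}
\label{sec:install}

Unpack the archive and read the notes dated \today.
{\em Do not} edit the configuration file by hand.

\begin{itemize}
\item Copy the program to a directory on your path.
\item Set the variable {\tt DOC\_HOME}.
\item Keep at least 10\% of the disk free.
\end{itemize}

\subsection{Checking the result}

Run the command shown below. If it fails, repeat the steps
in section \ref{sec:install}.

\begin{verbatim}
prog --version
\end{verbatim}

\end{document}
```

Note what is missing: there is no title block, no math, no tables, and no logo macros, because none of those commands are in the supported subset.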
latex2hy is a LaTeX-to-ASCII converter. It has several options for controlling the input and output character sets. In addition, latex2hy has a number of options for improving the quality of both ASCII and printed documentation. For example, input documents can contain both TeX and ASCII representations for complex objects (like mathematical formulae). The printed documentation uses the TeX version while latex2hy uses the ASCII version. Provision is also made for “fixups,” which allow character sequences from the input text to be translated into different sequences on output. For example, “$9.81\frac{m}{s^2}$” can be automatically translated into 9.81m/s^2. By adding specific fixups to each document that you translate, you can obtain successively better approximations automatically.
latex2hy gives particular attention to cross references. Cross references are translated into “links” between topics in hypertext output formats. Currently, only the TurboVision hypertext format is supported, although several other formats are being considered.
detex is a simple program that does little more than strip control sequences and other TeXish character sequences from your document. Doing this makes the document more amenable to other kinds of processing (like spellchecking) but does a poor job of producing “online documentation.” Still, it's an option.
The dvispell program produces plain text output from a DVI file. It is not itself a spellchecker; it was designed to extract the words from a TeX document so that they could be fed to a spellchecking program that was unable to ignore TeX control sequences.
The dvispell program is part of the emTeX package, and it is remarkably sophisticated. There are many ways in which dvispell can be programmed to perform complex manipulations. One special strength of dvispell is its ability to handle conversion of accented characters and Greek symbols.
dvispell differs from the other programs described here because it works with TeX output, the DVI file. On the one hand, this provides dvispell with more information (character positions, line breaks, floating bodies, etc.). On the other hand, all of the formatting commands are missing, so it isn't easy to determine what the user had in mind. (It's a bit like reverse-engineering a piece of software---without the source code, it's not always easy to tell why things look the way they do.) But dvispell gives you access to most of the information that is present in the DVI file.
[109] Structured markup languages like SGML also address this problem, but they introduce their own set of difficulties. Regardless of whatever advantages they may hold, I'm not going to discuss them here.
[110] A LaTeX implementation, called LaTeXinfo, is also available.
[111] “Info” is the name of both the output format and a program for displaying the text in a hypertext fashion. Another common way to access info files is with the GNU emacs online help system.