
19.3 The canonify Rule Set 3

The canonify rule set 3 is the first to process every address. Beginning with V8.10 sendmail, that rule set is declared like this:

Scanonify=3

The name canonify gives a clue to its role, that of putting all addresses into focused or canonical form.

The canonify rule set 3 puts each address it gets into a form that simplifies the tasks of other rule sets. The most common method is to have the canonify rule set 3 focus an address (place angle brackets around the host part). Then later rules don't have to search for the host part because it is already highlighted. For example, consider trying to spot the recipient host in this mess:

uuhost!user%host1%host2

Here, user is eventually intended to receive the mail message on the host host1. But where should sendmail send the message first? As it happens, sendmail selects uuhost (unless the local host is itself uuhost). Focusing on this address therefore results in the following:

user%host1%host2<@uuhost.uucp>

Note that uuhost was moved to the end, the ! was changed to an @, and .uucp was appended. The @ is there so that all focused parts uniformly contain an @ just before the targeted host. Later, when we take up post-processing, we'll show how final rule set 4 moves the uuhost back to the beginning and restores the !.
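To make the transformation concrete, here is a minimal Python sketch of the focusing step just described. It is only an illustration of the string manipulation, not how sendmail implements it; the function name focus_uucp is our own invention.

```python
def focus_uucp(address: str) -> str:
    """Focus a bang-path address of the form host!rest (illustrative only)."""
    host, bang, rest = address.partition("!")
    if not bang or not host or not rest:
        # Not a bang path: leave it alone for other rules to handle.
        return address
    # Move the host to the end, swap ! for @, append .uucp, and add the focus.
    return f"{rest}<@{host}.uucp>"


print(focus_uucp("uuhost!user%host1%host2"))  # user%host1%host2<@uuhost.uucp>
```

The sketch splits on the first ! only, so everything after it (including the % parts) is carried along untouched as the user part.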

In actual practice, the role of the canonify rule set 3 is much more complex than this example. In addition to focusing, it must handle list-syntax addresses (ColonOkInAddr), missing and malformed addresses, the % hack (Section 7.4.2), and more.

See LOCAL_RULE_3 (Section 4.3.3.4) for a way to add rules to the canonify rule set 3.

19.3.1 A Special Case: From:<>

Among the rules in a typical canonify rule set 3 are those that handle empty addresses. These represent the special case of an empty or nonexistent address. Empty addresses should be turned into the address of the pseudo-user that bounces mail, MAILER-DAEMON:

R $@      $@ < @ >       empty becomes special

Here, an empty address is rewritten to be a lone @ surrounded by angle braces. Other rule sets later turn this special token into $n (which contains MAILER-DAEMON as its value).
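The two-stage handling of the empty address can be pictured with a small Python sketch. The function names and the n_macro parameter are hypothetical stand-ins for the rule shown above and for the later rule sets; they are not part of sendmail.

```python
SPECIAL_EMPTY = "<@>"  # the lone @ in angle braces that marks an empty address


def canonify_empty(workspace: str) -> str:
    """Stage 1: an empty workspace becomes the special token."""
    return SPECIAL_EMPTY if workspace.strip() == "" else workspace


def resolve_special(workspace: str, n_macro: str = "MAILER-DAEMON") -> str:
    """Stage 2 (done by later rule sets): the token becomes the value of $n."""
    return n_macro if workspace == SPECIAL_EMPTY else workspace


print(resolve_special(canonify_empty("")))      # MAILER-DAEMON
print(resolve_special(canonify_empty("user")))  # user
```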

19.3.2 Basic Textual Canonicalization

Addresses can be legally expressed in a variety of formats:

address
address (full name)
<address>
full name <address>
list:members;

When sendmail preprocesses an address that is in the third or fourth format, it needs to find the address inside an arbitrarily deep nesting of angle braces. For example, where is the address in all this?[2]

[2] We exaggerate for the purpose of this example. Technically this is not a legal RFC2822 address, but it might be a legal RFC733 address.

Full Name <x12<@zy<alt=bob@r.com<bob@r.net>r.r.net>#5>+>

The rules in a typical canonify rule set 3 will quickly cut through all this and focus on the actual address:

R $*                     $: < $1 >                       housekeeping <>
R $+ < $* >                 < $2 >                       strip excess on left
R < $* > $+                 < $1 >                       strip excess on right

Here, the first rule puts angle braces around everything so that the next two rules will still work, even if the original address had no angle braces. The second rule essentially looks for the leftmost < character and throws away everything to the left of that. Because rules are recursive, it does that until there is only one < left. The third rule completes the process by looking for the leftmost > and discarding everything after that.

You can witness this process by running sendmail in -bt rule-testing mode, using something such as the following. Note that some of the lines that sendmail outputs are wrapped to fit the page:

% /usr/sbin/sendmail -bt
ADDRESS TEST MODE (ruleset 3 NOT automatically invoked)
Enter <ruleset> <address>
> -d21.12
> canonify Full Name <x12<@zy<alt=bob@r.com<bob@r.net>r.r.net>#5>+>
... some other rules here
-----trying rule: $*
-----rule matches: $: < $1 >
rewritten as: < Full Name < x12 < @ zy < alt=bob @ r . com < bob @ your . domain > 
relay . domain > #5 > + > >
-----trying rule: $+ < $* >
-----rule matches: < $2 >
rewritten as: < x12 < @ zy < alt=bob @ r . com < bob @ your . domain > relay . domain
> #5 > + > >
-----trying rule: $+ < $* >
-----rule matches: < $2 >
rewritten as: < @ zy < alt=bob @ r . com < bob @ your . domain > relay . domain > #5
> + > >
-----trying rule: $+ < $* >
-----rule matches: < $2 >
rewritten as: < alt=bob @ r . com < bob @ your . domain > relay . domain > #5 > + > >
-----trying rule: $+ < $* >
-----rule matches: < $2 >
rewritten as: < bob @ your . domain > relay . domain > #5 > + > >
-----trying rule: $+ < $* >
----- rule fails
-----trying rule: < $* > $+
-----rule matches: < $1 >
rewritten as: < bob @ your . domain >

Notice that we first put sendmail into debugging mode so that we can watch the rules at work. Then we feed in the canonify rule set 3 followed by the address that was such a mess earlier in this section. The three rules we showed you do their job and isolate the real address from all the other nonaddress pieces of information.
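If you don't have a sendmail binary handy, the same three-step reduction can be approximated in Python. This is a rough sketch under simplifying assumptions: the tokenizer splits only on angle braces and whitespace (sendmail's real tokenizer also splits on characters such as @ and .), and the helper names are ours. The matching follows sendmail's minimum-match behavior for $+ and $*.

```python
import re


def tokenize(address):
    """Simplified tokenizer: angle braces and whitespace-separated words."""
    return re.findall(r"[<>]|[^<>\s]+", address)


def strip_left(tokens):
    """Mimic: R $+ < $* >   < $2 >   (returns None when the rule fails)."""
    if tokens[-1] != ">":
        return None
    for i in range(1, len(tokens) - 1):  # $+ needs at least one token
        if tokens[i] == "<":
            return ["<"] + tokens[i + 1:-1] + [">"]
    return None


def strip_right(tokens):
    """Mimic: R < $* > $+   < $1 >   (returns None when the rule fails)."""
    if tokens[0] != "<":
        return None
    for j in range(1, len(tokens) - 1):  # $+ needs a token after the >
        if tokens[j] == ">":
            return ["<"] + tokens[1:j] + [">"]
    return None


def apply_repeatedly(rule, tokens):
    """Reapply a rule until it fails, as sendmail does without $: or $@."""
    while (rewritten := rule(tokens)) is not None:
        tokens = rewritten
    return tokens


def strip_nesting(address):
    tokens = ["<"] + tokenize(address) + [">"]  # housekeeping <>
    tokens = apply_repeatedly(strip_left, tokens)
    tokens = apply_repeatedly(strip_right, tokens)
    return " ".join(tokens)


print(strip_nesting("Full Name <x12<@zy<alt=bob@r.com<bob@r.net>r.r.net>#5>+>"))
# < bob@r.net >
```

Each rewrite strictly shortens the token list, which is why the repeated application is guaranteed to stop.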

19.3.3 Handling Routing Addresses

Beginning with V8.10, sendmail removes route addresses by default, unless the DontPruneRoutes option is set to true.

Route addresses are addresses in the form:

@A,@B:user@C

Here, mail should be sent first to A, then from A to B, and finally from B to C.[3]

[3] Also see the F=d delivery agent flag for a way to prevent route addresses from being enclosed in angle braces.
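To see what a route address carries, and what is left once the route is removed, consider the following Python sketch. It assumes a well-formed @A,@B:user@C address, and it assumes that removing the route leaves just the final user@C part; the function names are hypothetical.

```python
def split_route(address):
    """Split '@A,@B:user@C' into the list of relay hosts and the final address."""
    if not address.startswith("@") or ":" not in address:
        # Not a route address: no hops, address unchanged.
        return [], address
    route, final = address.split(":", 1)
    hops = [hop.strip().lstrip("@") for hop in route.split(",")]
    return hops, final


def prune_route(address):
    """Drop the explicit route, keeping only the final address (our assumption)."""
    _hops, final = split_route(address)
    return final


hops, final = split_route("@A,@B:user@C")
print(hops, final)                  # ['A', 'B'] user@C
print(prune_route("@A,@B:user@C"))  # user@C
```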

19.3.4 Handling Specialty Addresses

A whole book is dedicated to the myriad forms of addressing that might face a site administrator: !%@:: A Directory of Electronic Mail Addressing & Networks by Donnalyn Frey and Rick Adams (O'Reilly & Associates, 1993). We won't duplicate that work here. Rather, we point out that most such addresses are handled nicely by existing configuration files. Consider the format of a DECnet address:

host::user

The best approach to handling such an address in the canonify rule set 3 is to convert it into the Internet user@host.domain form:

R $+ :: $+        $@ $2 @ $1.decnet

Here, we reverse the host and user and put them into Internet form. The .decnet can later be used by the parse rule set 0 to select an appropriate delivery agent.
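The effect of that rule can be sketched in a few lines of Python. This is only an analogy for the rewrite, with a function name of our own choosing; like the rule's minimum-match $+, it splits at the first :: it finds.

```python
def decnet_to_internet(address: str) -> str:
    """Rewrite host::user as user@host.decnet (illustrative only)."""
    host, sep, user = address.partition("::")
    if not sep or not host or not user:
        # Not a DECnet-style address: leave it unchanged.
        return address
    return f"{user}@{host}.decnet"


print(decnet_to_internet("vax1::alice"))  # alice@vax1.decnet
```

The host vax1 and user alice are made-up values for the example.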

This is a simple example of a special address problem from the many that can develop. In addition to DECnet, for example, your site might have to deal with Xerox Grapevine addresses, X.400 addresses, or UUCP addresses. The best way to handle such addresses is to copy what others have done.

19.3.5 Focusing for @ Syntax

The last few rules in our illustration of a typical canonify rule set 3 are used to process the Internet-style user@domain address:

# find focus for @ syntax addresses
R $+ @ $+                $: $1 <@ $2>        focus on domain
R $+ < $+ @ $+ >         $1 $2 <@ $3>        move gaze right
R $+ <@ $+ >             $@ $1 <@ $2>        already focused

For an address such as something@something, the first rule focuses on all the tokens following the first @ as the name of the host. Recall that the $: prefix to the righthand side (RHS) prevents potentially infinite recursion.

Assuming that the workspace started with:

user@host

these rules will rewrite that address to focus on the host part and become:

user<@host>

Any address that has not been handled by the canonify rule set 3 is unchanged and probably not focused. Because the parse rule set 0 expects all addresses to be focused so that it can select appropriate delivery agents, such unfocused addresses can bounce. Many configuration files allow local addresses (just a username) to be unfocused.
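The combined effect of the three focusing rules can be approximated with the following Python sketch. It works on plain strings rather than sendmail tokens, and the name focus_at is ours, so treat it as a model of the behavior rather than a description of the implementation. Note that an address with no @ falls through unchanged and unfocused, like the local addresses just mentioned.

```python
def focus_at(address: str) -> str:
    """Focus a user@domain address on its host part (illustrative only)."""
    # Rule 1: focus on everything after the first @.
    local, at, domain = address.partition("@")
    if not (at and local and domain):
        return address  # nothing to focus; the address stays unfocused
    # Rule 2: while the focus still holds another @, move the gaze right.
    while True:
        inner = "@" + domain
        k = inner.find("@", 1)
        if k == -1 or k == len(inner) - 1:
            break
        local, domain = local + inner[:k], inner[k + 1:]
    # Rule 3: the address is focused, so return it.
    return f"{local}<@{domain}>"


print(focus_at("user@host"))  # user<@host>
print(focus_at("a@b@c"))      # a@b<@c>
print(focus_at("user"))       # user
```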
