.\" refer -e -l,2 -s paper.ms | tbl | pstroff -ms -*- nroff -*- .AM .RP .ds < \v'0.2m'\s-3 .ds > \s0\v'-0.2m' .de DQ \" Double quoted string \\&\\$3\\*Q\\$1\\*U\\$2 .. .de SQ \" Single quoted string \\&\\$3`\\$1'\\$2 .. .de UC \" Uppercase string (in a smaller font) \\&\\$3\\s-1\\$1\\s+1\&\\$2 .. .de UQ \" Uppercase quoted string (in a smaller font) \\&\\$3\\*Q\\s-1\\$1\\s+1\\*U\\$2 .. .de QQ \" Quoted paragraph (possibly in a sized font) .QP .if !'\\$1'' .ps \\$1 .. .de II \" Indented, auto numbered paragraph .if !'\\$1'' .nr II \\$1-1 1 .IP [\\n+(II] .. .de JB \" Indented paragraph, bold label, extended width .IP "\fB\\$1\fR" 15 .. .de JS \" Indented paragraph, small label .IP "\s-1\\$1\s+1" .. .de AP \" Appendix .if \\n(1T .bp .RT .if \\n(1T .sp .if !\\n(1T .BG .RT .ft 3 .if n .ul 100 APPENDIX \\$1: .. .de DO \" Domain table entry, see Appendix D .br .UC \\$1 \t\\$2 .. .de X1 \" Generate 1st level index entry .br .ie '\\$3'' .ta \\n(LLu-\\w"\\$1"u \\n(LLuR .el .ta \\n(LLu-\\w'\\$3'u-1u \\n(LLu-\\w'\\$3'u \\$2\a\t\\$1 .. .de X2 \" Generate 2nd level index entry .in 3n .nr LL \\n(LL-3n .X1 "\\$1" "\\$2" .nr LL \\n(LL+3n .in 0 .. .\" ***** HERE BEGINS THE ACTUAL CODE (ie TEXT) .ND May 27, 1987 .ie n .ds LH Electronic Mail Addressing .el .ds LH Electronic Mail Addressing with The IDA Sendmail Enhancement Kit .ds CH .ds RH Lennart Lo\\*:vstrand \\(co 1987 .ds LF .ds CF \*- % \*- .ds RF .TL Electronic Mail Addressing in Theory and Practice .SM .br with The IDA Sendmail Enhancement Kit .if t \{\ .SM .br (or The Postmaster's Last Will and Testament) .\} .AU Lennart Lo\*:vstrand* .FS * New address from July 1987: Xerox EuroPARC, 61 Regent Street, Cambridge CB2 1AB, U.K. .FE <lel@ida.liu.se> .AI Department of Computer and Information Science University of Linko\*:ping S-581 83 Linko\*:ping SWEDEN .AB This paper discusses theoretical and practical aspects of handling electronic mail addresses in a heterogeneous environment. 
It argues for more intelligent Mail Transport Agents that are able to fully format addresses according to different formats and that do not unnecessarily complicate header addresses. Also described is a set of enhancements to the .UX .I sendmail program and accompanying rewriting rules used to fulfill our two main goals: (1) To provide a canonical format for handling all electronic mail addresses in which .DQ replying regularly will work and where local users do not have to depend on the recipient's explicit route or addressing syntax when submitting a message. (2) To design and implement a method for managing mail to and from local users in a machine independent way, allowing them to change their preferred actual mailboxes while maintaining the same visible surface addresses at all times. .FS .ps +1 .sp Report no. LiTH-IDA-Ex-8715 .FE .AE .NH INTRODUCTION .QQ .I While some computer-based mail addressing systems are actually easier to deal with than the paper-based model, they are the exception\*-and not the rule. .br .ti +\n(QIu Why, you might ask, has electronic mail service become so very complex? Most of the problems are simply inherent in reaching beyond a local system to connect with another. .br .R .ad r \& .[[ %A David Crocker %T Networking Considered Harmful %J Unix Review %V 5 %N 3 %D 1987 .]] .br .ad b .LP Sending electronic mail is not always as easy as it ought to be. Too many incompatible mail addressing formats exist, forcing the prospective user sending a message to know a great deal more than can be thought reasonable about the recipient mail system's idiosyncrasies. This is a widely recognized problem, which can be seen as a consequence of the ever increasing interconnectivity between different computer systems, each subscribing to a different addressing standard. 
There are gateways that do address transformation on messages passing from one network to another, but this is normally done too incompletely to get rid of the unintelligible hybrid addresses that often infest us. Even worse are the many systems that assault these mixed format addresses by rewriting them to malformed or incomplete ones. A hybrid address passing several network boundaries is often transformed in such a way that it is no longer possible to use it as a .DQ reply or error return address; not even for a human being, much less for a machine. .PP These problems are especially frequent in the .UX world. Networks like the .UC ARPANET and .UC CSNET have the advantage of being more internally coherent; both follow the Internet mail syntax specifications, described in .UC RFC 822 \& .[[ %A David Crocker %T Standard for the Format of \s-1ARPA\s+1 Internet Text Messages %S \s-1RFC\s+1\&822 %D 1982 .]]. The .UX world used to practice the .SQ ! -path addressing syntax in which all addresses are relative routes, but has recently been moving over to the domain address standard of the Internet. The present problems concern nodes that have not yet made the transition and those that .I cannot change, because their standard mailer software is unable to handle these new format addresses. Typical examples of the latter are the System V systems. Berkeley systems have the freedom of .I sendmail (8), which unfortunately does not always turn out to be a blessing. In a way, it is too easy to rewrite addresses using .I sendmail , but too hard to control the transformations. This often leads to strange and incompatible formats that don't belong to either standard. .PP This paper discusses the most common formats and functions electronic mail addresses have. It argues for more intelligent Mail Transport Agents that are able to fully format addresses according to different formats and that do not unnecessarily complicate header addresses. 
In the end, it moves over to describe the .I IDA Sendmail Enhancement Kit .R and the work and rationale that lie behind it. The Kit is made up of two parts: First, the configuration file setup and the rewriting rules contained in it. These implement a rewriting strategy based on always .I completely resolving addresses instead of being content with looking at the immediate host. The addresses are then fully transformed again according to the respective mailer's and expected ultimate recipient's format. Second, we describe a set of modifications to the .I sendmail source, giving it an extended functionality that in the opinion of this author should have been implemented long ago. Typical additions are: Direct Access to Dbm(3) Files, Separate Envelope/Header Rewritings, and Multi-Token Class Matches. The configuration file is heavily dependent on these modifications and will not function without them. .PP We have also developed a way of handling mail to or from local users in a machine independent way by hiding their actual sender and recipient addresses behind generic organization oriented addresses. This way, one may have a fixed visible address which is dynamically associated with one or more physical mailboxes. Mail sent from any of a person's .DQ "well known" accounts will appear to come from his generic address. Similarly, mail to any of his generic addresses will be forwarded to his preferred mailbox(es). Note that the generic addresses as a group have no connection to any particular machine. Instead, they are merely database entries on one or more nodes. 
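.LP
The generic address idea can be pictured as two small lookup tables. The following is only a minimal sketch in Python, not the Kit's actual implementation (which lives in the configuration file and dbm databases); the table names, function names, and all addresses are hypothetical placeholders chosen for illustration.

```python
# Illustrative tables only; every address below is a made-up placeholder.
# Outgoing direction: "well known" account -> generic (visible) address.
ACCOUNT_TO_GENERIC = {
    "auser@hosta.org.example": "A.User@org.example",
    "auser@hostb.org.example": "A.User@org.example",
}

# Incoming direction: generic address -> currently preferred mailbox(es).
GENERIC_TO_MAILBOXES = {
    "A.User@org.example": ["auser@hostb.org.example"],
}


def visible_sender(account):
    """Return the generic address to show for mail sent from an account.

    Accounts that are not registered are left unchanged.
    """
    return ACCOUNT_TO_GENERIC.get(account, account)


def physical_recipients(address):
    """Return the mailbox(es) that mail to a generic address is forwarded to.

    Addresses that are not generic are delivered as they stand.
    """
    return list(GENERIC_TO_MAILBOXES.get(address, [address]))
```

Changing a user's preferred mailbox then only means editing one entry in the second table; the visible address in the first table stays the same.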
.NH NAMES, ADDRESSES, AND ROUTES .LP Larry Kluger and John Shoch have in an excellent article .[ [ %A Larry Kluger %A John Shoch %T Names, Addresses, and Routes %J Unix Review %V 4 %N 1 %D 1986 .]] described the distinction between .I names , .I addresses , and .I routes , in short: .QQ .I The name of a resource refers to what we seek, an address indicates where the resource is, and a route tells us how to get there. .LP When dealing with electronic mail, .I names are typically used in identifying three kinds of entities: (1) The mailbox associated with the sender (originator) and recipient of a message, (2) The name space (domain) in which the sender/recipient is known, and (3) The computer system that houses a Mail Transfer Agent (MTA) capable of delivering or forwarding messages. Often, the two latter coincide by associating the domain of a set of mailboxes with the actual machine that implements them. Furthermore, an .I address would be the data structure used in directly connecting to another MTA over a computer network, such as a four-byte Internet number + TCP port number, or an ordinary telephone number. It may well happen that many names map to the same address, or that the same name has more than one address. Lastly, a .I route consists of an ordered sequence of two or more MTA names or addresses, forming an explicit path that the message should take to reach its recipient. Routes can be further divided into .I "system routes," where the MTA itself is responsible for constructing a useful path and .I "source routes," where that responsibility lies with the person sending the message. .PP The mapping from .I names to .I addresses is essentially beyond the scope of this paper, and will only briefly be mentioned in the following sections. Thus, we have taken the liberty of using the general meaning of the word .I address to denote both mailbox/domain name pairs as well as complete routes. 
Also, we are using the words .I system , .I host , and .I node to all denote MTAs somewhere in a network. It is our hope that the reader should not be confused because of this. .NH MAIL ADDRESS FORMATS .LP The absolute majority of today's mailing systems use addresses,\** .FS That is, routes or mailbox/domain name pairs. .FE represented by a simple string of characters. Some of these characters implement operators that are used to divide the address into mailbox/domain/route parts when parsed by an MTA. Different operators have different directions of associativity, making it increasingly difficult to unambiguously parse addresses produced by combining incompatible operators of different mail address syntaxes. It is hoped that at least some of these problems will be solved with the emergence of the structured attribute list addresses of .UC X .400. In the mean time, we have a variety of different formats in use, each subscribing to a different set of delimiting operators. It is not uncommon to see addresses like: .QQ mcvax!enea!liuida!obelix!p_e%seismo.css.gov@relay.cs.net .LP or even .QQ enea!seismo.\s-1CSS.GOV\s+1!!\s-1OZ.AI.MIT.EDU\s+1,!\s-1MC.LCS.MIT.EDU\s+1:ebg!\s-1REAGAN.AI.MIT.EDU\s+1 .LP turn up in message envelopes and headers. The last example comes from the envelope sender address found on a message in which the .UC RFC 822 route was incompletely translated into .UUCP .SQ ! -path syntax. Now, before delving into a discussion about how these may be resolved or preferably avoided, let's take a look at what kind of addressing formats currently exist. .NH 2 Relative Addresses .LP These types of addresses are by necessity all implemented as .I routes . In purely relative addresses, all node names are relative to each other, making path optimization or system routing difficult, if not impossible. 
For the sender of a message, this means that addresses will look different depending on his location in the network, forcing him to recompute all addresses each time he changes his location. Even worse, in a rapidly growing network, it might even happen that an address becomes invalid overnight because some link far away has been disconnected or replaced by another. All this makes it difficult for a presumptive user to continuously keep his addresses correct and up to date. .PP Relative addresses have since long been in use within the .UX community, but a great deal of work has been done by an organization called .I "The \s-1UUCP\s+1 Mapping Project" in eliminating duplicate host names, thus making it possible to use absolute addresses\** .FS See the following section. .FE in a flat name space. It is presently moving towards utilizing full domain names but is delayed by the fact that some systems, notably .I "System V" systems, cannot handle anything but .UC UUCP source routes with standard mailer software. The addressing syntax for .UX .UC UUCP .SQ ! -paths is as follows: .QQ node!\|.\|.\|.\|!node!user .LP The route sequence is read from the left to the right, with the ultimate recipient on the rightmost end. Other systems that have similar addressing formats are the Berknet and .UC VAX/VMS mail systems, which use: .QQ node:\|.\|.\|.\|:node:user .LP and .QQ node::\|.\|.\|.\|::node::user .LP respectively. .UC RFC 822 also specifies a way of constructing explicit paths using the somewhat complicated syntax: .QQ <@node,@node,\|.\|.\|.\|:user@node> .LP Here, the message should be passed through each successive node from left to right, ending up in the last user@node's mailbox. Note that the less than and greater than brackets are included in the syntax. 
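.LP
As a rough illustration of the four left-to-right route syntaxes just listed, the sketch below renders one and the same hop list in each of them. It is written in Python purely for illustration; the function names are hypothetical, and no quoting or validation of node and user names is attempted.

```python
def bang_path(nodes, user):
    """UUCP style: node!...!node!user"""
    return "!".join(list(nodes) + [user])


def berknet_path(nodes, user):
    """Berknet style: node:...:node:user"""
    return ":".join(list(nodes) + [user])


def vms_path(nodes, user):
    """VAX/VMS style: node::...::node::user"""
    return "::".join(list(nodes) + [user])


def rfc822_route(nodes, user):
    """RFC 822 style: <@node,@node,...:user@node>

    The last node holds the mailbox; all earlier nodes are relays.
    At least one node is required.
    """
    *relays, final = nodes
    if not relays:
        return "<%s@%s>" % (user, final)
    prefix = ",".join("@" + node for node in relays)
    return "<%s:%s@%s>" % (prefix, user, final)
```

For the hop list a, b, c and the user u, these give a!b!c!u, a:b:c:u, a::b::c::u, and <@a,@b:u@c> respectively.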
Another widely used but undocumented format is .I Ye Olde .UC ARPANET .SQ % -Kludge: .R .QQ user%node%\|.\|.\|.\|%node@node .LP which is interpreted from the right to the left by delivering the message to the node after the atsign and then instantiating the rightmost percent sign into a new atsign, etc. .NH 2 Absolute Addresses .QQ .nf .I The Tao that can be told of is not the Absolute Tao; The Names that can be given are not Absolute Names.\k: The Nameless is the origin of Heaven and Earth; The Named is it the Mother of all Things. .br .R \h'|\n:u-\w'[LaotseBC]'u' .[[ %A Laotse %T Tao Te Ching %S Book 1, Verse 1 %D ca 500 BC .]] .br .ad b .LP Absolute addresses have the advantage of being universally unique and thus applicable by any MTA\** .FS At least in theory\*-not all MTAs necessarily know about how to deliver to all addresses. .FE independently of where it is located. Since the names should be uniquely identified, some way of distributing them within their name space needs to be accomplished. The simplest way of doing this is by registering plain node names with some central name directory on a first-come-you-get-it service. The .I "\s-1UUCP\s+1 Project" tried this to avoid duplicate .UC UUCP node names. However, maintaining such a directory and propagating its changes easily becomes too heavy a burden to handle. Another strategy was first adopted by the .UC ARPA Internet community, the hierarchical domain naming system described by .UC RFC 882 \& .[[ %A Paul Mockapetris %T Domain Names\*-Concepts and Facilities %S \s-1RFC\s+1\&882 %D 1983 .]], .UC RFC 920 \& .[[ %A Jon Postel %A Joyce Reynolds %T Domain Requirements %S \s-1RFC\s+1\&920 %D 1984 .]] and others. .PP In this system, a labelled tree is built with each node in the tree denoting a specific domain. Some nodes correspond to actual hosts, typically the leaves in the tree, while others simply map to some organizational entity, like a group, department, or institution. 
The purpose of the domain naming system is to distribute the naming authority throughout the tree. Letting each domain have the responsibility of naming the domains immediately beneath it guarantees the uniqueness of all simple domain names relative to their parents. The full, qualified domain names are constructed by concatenating each level's simple domain name with a dot in between. For example, there might exist a certain mail computer named .UQ MC within the Laboratory of Computer Science of the Massachusetts Institute of Technology, an Educational organization. A possible domain name for this computer would be: .QQ -1 MC.LCS.MIT.EDU .LP There might be many hosts named .UQ MC, but only one within the .UQ LCS.MIT.EDU domain. The same goes for the .UQ LCS domain within the .UQ MIT.EDU domain. The global uniqueness of each fully qualified domain is thus guaranteed by its parentage. .PP The domain system is currently in use within the .UC ARPA Internet, .UC CSNET, and is in progress within the .UC UUCP world. Under its anonymous root domain, it presently has six three-letter organizational domains registered and a continuously increasing number of national two-letter domains. The organizational domains are mainly used within the U.S., and the national domains in Europe and Asia. There are also a set of .I "de facto" network based domains in use, although not officially registered. These are really mock domains used to incorporate hosts on physical networks that cannot or do not want to handle domain addresses. Examples of these are .UC BITNET and still most of the .UC UUCP world. Appendix D lists all domains currently registered with the SRI Network Information Center together with a set of otherwise frequently recognized network based domains. .NH 2 Attribute Addresses .LP With the .UC CCITT \** .FS .I Comite\*' Consultatif International Te\*'le\*'phonique et Te\*'le\*'graphique, .R i.e. 
the International Telegraph and Telephone Consultative Committee .FE .UC X .400 \& .[[ %A Malaga-Torremolinos %T Message Handling Systems: System Model\\*-Service Elements %S \s-1X\s+1.400 %D 1984 .]] series standard for electronic mail in emergence, a new kind of addressing system is being proposed. In this format, recipients are uniquely identified using a list of attribute-value pairs. Some of these, like the Organization and Country attributes, are obligatory while others may be supplied only if known by the sender. The idea is that the base attributes should be able to guide the message to a relevant directory server, while the others then are used to select the actual recipient. Attribute sets that select none or more than one recipient will probably be considered erroneous, but could be used in selecting multiple recipients. .PP It will take several more years before the attribute addressing scheme comes into widespread use. It will, however, surely come\*-if nothing else, then because it has the force of the united PTTs behind it. Already, there exist guidelines for mapping between .UC RFC 822 based addresses and .UC X .400, such as .UC RFC 987 \& .[[ %A Steven Kille %S \s-1RFC\s+1\&987 %T Mapping Between \s-1X\s+1.400 and \s-1RFC\s+1\&822 %D 1986 .]]. .NH 2 Hybrid Addresses .LP With all this in mind, let's take a look at how different formats sometimes are combined and how we can resolve them. The three major addressing formats for routing messages are: .TS l lw(2i) l. [1] T{ The .UC UUCP .SQ ! -path T} <\fInode\*<1\*>\fP!\fInode\*<2\*>\fP!\fInode\*<3\*>\fP!\fIuser\fP> [2] T{ Ye Olde .UC ARPANET .SQ % -Kludge T} <\fIuser\fP%\fInode\*<3\*>\fP%\fInode\*<2\*>\fP@\fInode\*<1\*>\fP> [3] T{ The .UC RFC 822 route syntax T} <@\fInode\*<1\*>\fP,@\fInode\*<2\*>\fP:\fIuser\fP@\fInode\*<3\*>\fP> .TE .LP where the latter mostly is used for envelope senders. 
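.LP
To make the table concrete, here is a minimal sketch, in Python, that reads each of the three pure formats back into the same ordered list of hops. It handles only unmixed addresses of exactly the shapes shown in the table; the function names are hypothetical and the hybrid cases discussed in the text are deliberately not covered.

```python
def parse_bang(addr):
    """[1] node1!node2!node3!user -> (['node1', 'node2', 'node3'], 'user')"""
    *nodes, user = addr.split("!")
    return nodes, user


def parse_percent(addr):
    """[2] user%node3%node2@node1 -> (['node1', 'node2', 'node3'], 'user')

    The node after the at sign comes first; the percent signs are then
    consumed from right to left.
    """
    rest, first = addr.rsplit("@", 1)
    user, *more = rest.split("%")
    return [first] + more[::-1], user


def parse_rfc822_route(addr):
    """[3] <@node1,@node2:user@node3> -> (['node1', 'node2', 'node3'], 'user')"""
    addr = addr.strip("<>")
    if ":" in addr:
        route, mailbox = addr.split(":", 1)
        relays = [node.lstrip("@") for node in route.split(",")]
    else:
        relays, mailbox = [], addr
    user, final = mailbox.rsplit("@", 1)
    return relays + [final], user
```

All three table entries describe the same delivery order, so the three parsers return the same hop list for them.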
.PP Combinations of the above usually appear in messages crossing one or more network boundaries with different addressing formats. Since each of these formats was independently developed, it may not be obvious how they should be interpreted when combined. Still, by reasoning a little, much can be inferred from how they are incrementally constructed. .PP Starting with the Domainist's approach to the matter, we have to give .SQ @ precedence over .SQ ! since this is implied by .UC RFC 822. This means that addresses like: .QQ node\*<2\*>!node\*<1\*>!user@domain .LP will be interpreted as: .QQ domain \(-> node\*<2\*> \(-> node\*<1\*> \(-> user .LP Now, since .SQ % is often the .I "de facto" standard routing operator on top of .SQ @ , an address like: .QQ host!user@domain .LP that is autorouted through .I relay will probably end up looking like: .QQ host!user%domain@relay .LP meaning: .QQ relay \(-> domain \(-> host \(-> user .LP This forces us to give .SQ % priority over .SQ ! . However, a .SQ ! -path address ending with a .DQ user%node, cannot be a domain address (no .SQ @ ) and should therefore be interpreted using .UC UUCP semantics by prioritizing .SQ ! over .SQ % . Thus, .QQ node\*<1\*>!node\*<2\*>!user%domain .LP should be read as: .QQ node\*<1\*> \(-> node\*<2\*> \(-> domain \(-> user .LP Mixtures with .UC RFC 822 routes may look hard to read, but are actually easy to parse. A fairly complicated address like: .QQ node\*<1\*>!node\*<2\*>!@domain\*<1\*>,@domain\*<2\*>:host!user%relay@domain\*<3\*> .LP has to be interpreted as: .QQ node\*<1\*> \(-> node\*<2\*> \(-> domain\*<1\*> \(-> domain\*<2\*> \(-> domain\*<3\*> \(-> relay \(-> host \(-> user .LP since .UC RFC 822 like .SQ ! -paths associate left-to-right, and since the last .DQ localpart@domain can be unambiguously found after the colon. .PP Now, not all of us are Domainists. Many nodes can and will only be able to interpret .UC UUCP .SQ ! -paths, which leads to complications with mixed .SQ !
- and .SQ @ -style addresses. The only workable solution to this is to try and avoid such mixtures altogether. The easiest way of doing this is to write them as .SQ ! - and .SQ % -style combinations, but even better would be to wrap them wholly around to the .SQ ! -path format. They should then be turned back into .SQ % and .SQ @ combinations when breaking the Domain Land boundary. .NH A SHORT ANATOMY OF THE ELECTRONIC MESSAGE .LP In analogy to the written letter, there are two major parts of a message: The envelope and the contents. The envelope is there specifically for the MTAs to handle and contains the sender address together with the message's actual recipients. The contents are usually further subdivided into the header lines and the actual body, where only the latter is under the sender's full control. The headers are used by the MTAs and MUAs\** .FS Mail User Agent, the program that the user directly interacts with when reading or composing messages. .FE to store various information of interest to the recipient, such as sender, all official recipients, posting date, etc. Although the body usually is left uninterpreted, some mail systems put constraints by limiting the length of each line or the whole message, or by only allowing printable .UC ASCII characters. .NH 2 The Envelope .LP The envelope contains the physical message's actual recipients, which very well may be different from those in the headers. Typically, a message sent to more than one recipient will be split into .I n copies, one for each network. These messages will have all of the original's recipients listed in their header lines, but each copy's envelope should only have those being delivered over the network in question. There is usually also the option of .I "Blind Carbon Copy" recipients, which by definition shall never show up in the headers. .PP The envelope will also contain the explicit path back to the sender for error messages and tracing purposes. 
This path should be formed by having each node that forwards the message incrementally add its name to the route, thus avoiding routing problems that otherwise may appear. The result of each rewriting should be a full route in a suitable format leading from the current node back to the originator. .PP If the envelope recipient(s) are routes, they are handled in an analogous manner to the senders by removing the local node's name from each address before propagating it further. Optionally, the address can be made fully relative to the immediate receiving node by removing its name from the route as well. This should be determined on a mailer dependent basis. The MTA has the full freedom of at any point turning a simple envelope recipient address into a route if it sees reason to do so. This could be done on the grounds that the immediate recipient node cannot perform automatic routing. It should, however, be avoided if possible since it is hard to keep routing tables fully updated with topological changes in distant parts of the network. Turning envelope routes into simple addresses should also be avoided since there usually exists a good reason for a route to be there. .NH 2 The Headers .LP Header addresses are not normally used by the MTA. Exceptions may be when headers such as .DQ "Return-Receipt-To:" exist and the MTA is doing the final delivery or when the delivery of a message fails and there exists an .DQ Errors-To: header.\** .FS These are .I sendmail specific; other MTAs may have other exceptions. .FE The MTA is also allowed to rewrite, or .DQ munge, header addresses when a message is forwarded from one network to another. This is done by first removing the addressing idiosyncrasies of the transmitting network to obtain some internal canonical format and then applying the receiving network's idiosyncrasies to produce a conforming address .[ [ %A Marshall Rose %T Proposed Standard for Message Header Munging %S \s-1RFC\s+1\&886 %D 1983 .]]. 
Of course, this should be done to both envelope and header addresses. .PP Even within one world, like the .UC UUCP pseudo-network, it may be necessary to .DQ munge addresses for them to be understandable by the recipient system. For instance, many mail systems do not recognize all domains or perhaps cannot even handle anything but pure and fully routed .UC UUCP .SQ ! -paths. If the transmitting MTA does not take this into consideration, the user sending the message has to submit full source routes with each receiving network's addressing syntax embedded. Except in the most simple cases, this task requires great knowledge\** .FS That is, a case for a .I guru ! .FE about how networks are interconnected, much more than can be considered reasonable by any casual or even experienced user. .PP .I In our opinion, this is currently the greatest obstacle in making electronic mail usable. .R Going from bad to worse, these user supplied source routes that are fully contained in the headers often get rewritten into further complicated routes. When such a message is received by its recipient, its header addresses may very well be too unintelligible to be understandable by a human being, much less by a machine. In the best case, they will just have routes with incorrect points of reference, forcing .DQ reply messages to the other recipients to first be (automatically) routed to the first node of the path before they can start on the actual route. This is often in the opposite direction, leading halfway back again. .NH ADDRESS REWRITING STRATEGIES .LP Now, given the freedom and flexibility of .I sendmail , our project's task has been to construct a configuration file that, with the necessary enhancements to the .I sendmail source, will completely resolve and canonicalize all envelope and header addresses to an internal format. 
All unqualified addresses are then officialized using the .UC TCP/IP name server function and a local .I dbm (3) based domain name table, and a route is found using a direct interface to a .I pathalias (1) routing file. Finally, using a static .I dbm (3) mailer table, again together with the .UC TCP/IP name server function, the message is dispatched to the appropriate mailer which fully rewrites the addresses according to its own idiosyncrasies. .NH 2 Sneak-In Preview .LP To give a taste of how the complete system performs with a realistic case, consider the following only partly imaginary example: .QQ .nf .ne 2.1 .B Envelope: Sender: enea!seismo!relay.cs.net!cate%busch%pany.com Recipient: obelix!p_e .ne 2.1 .B Headers: From: enea!relay.cs.net!cate%busch%pany.com To: mcvax!enea!liuida!obelix!p_e%seismo.css.gov@relay.cs.net cc: ree.pete%fidelio.uu.se%seismo.css.gov@relay.cs.net .fi .LP A user .I cate on the Company Inc's local host .I busch has sent a message to two Swedish recipients: .I p_e on the .UC UUCP host .I obelix in Linko\*:ping and to .I ree.pete on the Uppsala node .I fidelio.uu.se. If the headers were left untouched, a reply from .I p_e to both .I cate and .I ree.pete would force .I ree.pete 's copy to go all the way back to .I relay.cs.net before it could return to Sweden and Uppsala. Clearly, this is a waste of both resources and time when there might (and does) exist a much shorter path within the country. With The Kit's rewriting heuristics, the same header lines will look like the following when leaving the local node: .QQ .nf .ne 2.1 .B Envelope: Sender: @majestix.liu.se,@enea.se,@seismo:cate%busch%pany.com@relay.cs.net Recipient: p_e%obelix.liu.se@asterix.liu.se .ne 2.1 .B Headers: From: cate%busch@pany.com To: p_e@obelix.\s-1UUCP\s+1 cc: ree.pete@fidelio.uu.se .fi .LP Here, our local node's name has been added to the envelope sender path, which also has been transformed into a .UC RFC 822 route\**. 
.FS Save for the .SQ < and .SQ > brackets. .FE Other options would be to have it as a .SQ ! -path or .SQ % -path. The envelope recipient has been routed via .I asterix.liu.se, and changed into a .SQ % -path, on the basis that the message is forwarded over a .UC TCP/IP connection and this is the preferred route format for most such systems. .PP Also, the route has been removed from the header .DQ From: line, leaving the first universally qualified node there together with a .SQ % -path from that point to the recipient. The .DQ To: line has undergone even more drastic changes. First, the route to .I seismo.css.gov was removed since this is the first universally qualified node. Then a table of well-known .UC UUCP relays was consulted to further compress the path. .I Mcvax , .I enea , and .I liuida were all members of that list. This gave .DQ obelix!p_e as a result, which then was turned into the domain form .DQ p_e@obelix.\s-1UUCP\s+1. In the last line, .DQ ree.pete@fidelio.uu.se simply had its path removed since .UC \fISE\fP is a registered top domain. .NH 2 The Configuration File .LP The IDA Sendmail Master Configuration File should be sent through the .I m4 (1) macro processor to produce an actual configuration file. Several .I m4 identifiers are used to customize the file; each of them is described in .I "Appendix C: Customization Parameters" . Unlike the Berkeley version, it was not designed as a set of .I m4 fragments that .DQ sources each other to form a full configuration, but rather as a single master configuration file which holds a .I bank of all possible mailers and corresponding rewriting rulesets. The instance's actually available mailers are enabled by giving values to their corresponding .I m4 identifiers. The current version includes mailer definitions for a .UC TCP/IP mailer, three kinds of .UC UUCP mailers depending on the remote node's address handling capabilities, a mock .UC DEC net mailer, as well as the .UC LOCAL and .UC PROG mailers. 
Their design has been kept as clean as possible to make the construction of e.g. .UC BITNET or .UC CSNET mailers using these as templates straightforward. .PP The rewriting rules of the Kit's configuration file are explicitly oriented towards the domain naming syntax. They will resolve all input addresses to an internal domain based format and then rewrite them according to the selected mailer's preferences. Internally, all addresses have the same .QQ user@.domain .LP format. Note the dot after the atsign; it is there to make it easier to rewrite the address. Also note that this differs substantially from the Berkeley .DQ "whatever<@host>whatever" format. For historical reasons, both the .UC RFC 822 route syntax and .I Ye Olde .UC ARPANET .SQ % -Kludge .R are used internally to represent routes when only one of them should be sufficient. .NH 2 Canonicalizing the Address .LP Ruleset 3 canonicalizes all addresses, making them conform to our internal format. After the canonicalization, the .DQ user part may end up containing a route in either standard .UC RFC 822 format or using the .SQ % -path format. .SQ ! -, .SQ : -, and .SQ :: -style paths are rewritten into .UC RFC 822 routes. Reasonable mixtures of route formats are resolved using the strategies described in the section about .I "Hybrid Addresses" . As an option, the (untested) .UC UUCPPRECEDENCE switch may be turned on in the configuration master file. This will enable some simple heuristics that will decide between domain style and .UC UUCP .SQ ! -path prioritized unpacking depending on whether the .I domain is qualified or not. In any case, ruleset 3 will make sure that the .I domain part of every .DQ user@.domain address is mapped to its full, official domain name whenever possible using both the .UC TCP/IP name server and a dbm domaintable. It also goes through some effort to repair malformed addresses, but much of this is probably too site specific to be generally useful. .PP Since .SQ !
-paths are internally represented as .UC RFC 822 routes, you should not be surprised when you see an address like: .QQ foo!bar!baz!user .LP first be transformed into: .QQ @foo.\s-1UUCP\s+1,@bar:user@baz .LP and then to: .QQ bar:user@baz@.foo.\s-1UUCP\s+1 .LP The .UC UUCP domain of .I foo has been inferred from the .SQ ! -style syntax. If .I foo had been known by the domaintable to have a specific domain name, that name would have been used instead. Nothing can be inferred about the nodes .I bar and .I baz , since they may be local to .I foo . Now, since the pure .UC RFC 822 route doesn't conform to our internal format, i.e. it does not have a .DQ user part followed by an atsign-dot and a .DQ domain, we had to rearrange it a little. The closest node of the route was thus extracted and added to the right side of the rest of the route together with the atsign-dot. It may not be very pretty to look at, but it is easier to handle this way. .PP Note that there is a risk of confusing .UC UUCP node names with local hosts using the domaintable lookup. For example, if you had a local node .I linus with a full domain name of .I linus.liu.se and received an address like .DQ linus!user, this would be interpreted as the local .I linus and rewritten into .DQ user@linus.liu.se. This is probably right for envelope recipients, but not necessarily for header lines. You can define .UC BANGIMPLIESUUCP if you want to disable the domaintable qualification. .NH 2 Finding Route and Mailer .QQ .I .in +\n(QIu .ti -\n(QIu \*QWould you tell me, please, which way I ought to go from here?\*U .br .ti -\n(QIu \*QThat depends a good deal on where you want to get to,\*U said the Cat. .br .in -\n(QIu .R .ad r \& .[[ %A Lewis Carroll %T Alice in Wonderland %D 1896 .]] .br .ad b .LP Before ruleset 0 tries to find an applicable mailer, it digests all routes through the local host by stripping off its own name and sending the address through ruleset 3 again.
It then has four strategies for finding a suitable mailer for the address: .II 1 Try to find a mailer that will connect to the immediate host in the address. .II Try to find a route to the address' domain using a .I dbm (3) routing table and a mailer that will connect to the route's closest node. .II Use the hard-wired .UC RELAY_MAILER and .UC RELAY_HOST pairs to automatically forward the message. .II Give up; send the address to the .UC ERROR mailer. .LP The code that determines whether a mailer can deliver directly to a certain domain is found in ruleset 26.\** .FS Yes, I too wish that named rulesets would be available in .I sendmail . Perhaps somebody should convert this configuration file into .I ease . .FE It does this on a per-mailer basis with the following order of priority: .IP \s-1LOCAL\s+1 10 If the supplied domain is any of the local host's names (member of the .B $w class), or if the complete address is found in the .I aliases (5) file, the message is delivered locally. The latter type of local delivery will cause the address to be expanded to the RHS of the alias entry and the complete process to recurse. .IP \\\\k:\\fISpecial\\fP\\\\h'|\\\\n:u'\\\\v'+1'\\fIMailers\\fP\\\\v'-1' In order to override the standard mailer selection, a special dbm .I mailertable may be used to force addresses to be delivered using specific mailers. If the address' domain is found in the .I mailertable , the associated mailer will be used. The mailer table should map official domain names to .DQ mailer:host pairs, with a colon between the mailer and the host. .IP \s-1TCP/IP\s+1 With the new .I default argument of the .UC TCP/IP nameserver lookup function, it is possible to determine if an address can be delivered using this protocol family without relying on static host tables. If the address' domain is known to the .UC TCP/IP nameserver, it is returned together with its canonicalized host name.
.IP \s-1DEC\s+1net The .UC DEC net mailer does not share the network based nameserver facilities of the .UC TCP/IP mailer, and thus has to rely on a host table. This is done with a two-phase operation\*-first the domain is mapped to a .UC DEC net name, if known, then the .UC DEC net host name is checked in the list of connectable .UC DEC net hosts before it is returned. This is because some .UC DEC net nodes cannot talk across area boundaries, forcing recipient addresses to be explicitly routed over an intermediary host. .I Note: The supplied .UC DEC net mailer uses a .UC TCP/IP connection to a .UC DEC system-20 acting as gateway. A real implementation should remove the immediate node from routes before returning them, but we cannot do this. .IP \s-1UUCP\s+1 The .UC UUCP mailer is also determined with a two-phase operation\*-first the domain is mapped through the .UC UUCP translation table, returning the .UC UUCP node name, if known. The .UC UUCP mailer will then be selected only if the .UC UUCP name is known to be directly connectable by us (normally determined using the /usr/lib/uucp/L.sys file). Mail for all nodes found this way will be sent through the .DQ dumb .UC UUCP mailer. Delivery using either the .UC UUCP-A or the .UC UUCP-B mailer has to be determined using the special mailertable previously mentioned. .LP If an address needs to be routed, i.e. if the first pass through ruleset 26 fails, it is given to ruleset 22 where its domain is looked up in a .I pathalias (1) type routing table. Routes to explicit domain/host names are preferred over general (parent) domain routes. Before the new address is returned, it is sent through the canonicalization routines of ruleset 3. This renders any special .I pathalias route syntax ineffective. The normal way would be not to specify any special routing syntax at all to .I pathalias , but to invariably let it produce .SQ ! -paths.
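The order in which the four strategies and the per-mailer checks of ruleset 26 are tried can be summarized in a short Python sketch. This is only an illustration of the priority order described in this section, not the rewriting rules themselves. The dictionary layout of cfg, the function names, and the leading-dot keys used for parent domain routes are assumptions made for the example; the aliases lookup of the LOCAL mailer and the re-canonicalization of routes through ruleset 3 are left out, and the routing table is simplified to map a domain straight to the closest node of its route.

```python
def direct_mailer(domain, cfg):
    """Return (mailer, host) if some mailer can reach `domain` directly.

    Mirrors the priority order given for ruleset 26. `cfg` is a hypothetical
    dictionary of lookup tables standing in for the $w class, the
    mailertable, the nameserver and the DECnet/UUCP host tables.
    """
    # LOCAL: the domain is one of the local host's names.
    if domain in cfg["local_names"]:
        return ("LOCAL", domain)
    # Special mailers: the mailertable maps a domain to "mailer:host".
    if domain in cfg["mailertable"]:
        mailer, host = cfg["mailertable"][domain].split(":", 1)
        return (mailer, host)
    # TCP/IP: the nameserver knows the domain and gives its canonical name.
    if domain in cfg["nameserver"]:
        return ("TCP/IP", cfg["nameserver"][domain])
    # DECnet: two phases, name translation then connectability check.
    decnet = cfg["decnet_names"].get(domain)
    if decnet is not None and decnet in cfg["decnet_hosts"]:
        return ("DECnet", decnet)
    # UUCP: two phases as well; only direct neighbours qualify.
    uucp = cfg["uucp_names"].get(domain)
    if uucp is not None and uucp in cfg["uucp_neighbours"]:
        return ("UUCP", uucp)
    return None


def find_route(domain, routes):
    """Look up a next hop, preferring explicit names over parent domains.

    Assumption: parent domain routes are stored with a leading dot,
    e.g. ".UUCP"; the text does not specify the key format.
    """
    if domain in routes:
        return routes[domain]
    labels = domain.split(".")
    for i in range(1, len(labels)):
        parent = "." + ".".join(labels[i:])
        if parent in routes:
            return routes[parent]
    return None


def select_mailer(domain, cfg):
    """Apply the four strategies in order: direct, routed, relay, error."""
    hit = direct_mailer(domain, cfg)
    if hit is not None:
        return hit
    hop = find_route(domain, cfg["routes"])
    if hop is not None:
        hit = direct_mailer(hop, cfg)
        if hit is not None:
            return hit
    if cfg.get("relay") is not None:
        return cfg["relay"]          # (RELAY_MAILER, RELAY_HOST)
    return ("ERROR", domain)
```

Note that a mailertable entry takes precedence over the nameserver in this order, which is how a domain can be forced onto the UUCP-A or UUCP-B mailer even when it is also reachable by other means.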
.NH 2 Externalizing the Address .LP After a mailer has been chosen, addresses are rewritten using rulesets 1 and 2 for envelope senders/recipients and rulesets 5 and 6 for header senders/recipients. Envelope senders are left untouched by this process, but envelope recipients will have .UC RFC 822 routes turned into .SQ % -paths. Header .UC RFC 822 routes will also be turned into .SQ % -paths and then gently compressed by having paths to fully qualified domains and .UC UUCP relay-to-relay paths removed. Header senders will furthermore have their host names hidden by .UC HIDDENNAME, if defined, and their addresses filtered through the .UC GENERICFROM table, if available. .PP When this is done, the mailer specific rewriting phase starts. The .UC LOCAL and .UC PROG mailers do not do any further rewriting as supplied, but could be convinced to produce .SQ ! -paths for .UC UUCP routes if preferred [using ruleset 15 or a variant thereof]. .PP The .UC TCP/IP and .UC DEC net mailers will add a call to ruleset 24 for all envelope recipients. This will turn domains corresponding to .UC DEC net nodes into flatspaced .UC DEC net host names, since domains are not supported there. This should really not be done in the .UC TCP/IP mailer, but all our .UC DEC net traffic is presently routed over a .UC TCP/IP link. Since no special rewriting is done for envelope senders, this means that they normally will appear in .UC RFC 822 route format using these as well as any of the previous mailers. .PP There are three variants of the .UC UUCP mailer depending on the remote node's address handling capabilities. The .DQ dumb version, simply called .UC UUCP , corresponds closely to the class 1 mailer of .UC RFC 976 \& .[[ %A Mark Horton %T \s-1UUCP\s+1 Mail Interchange Format Standard %S \s-1RFC\s+1\&976 %D 1986 .]]. It will rewrite all addresses into .SQ ! -format, and make all header addresses .SQ !
-relative to the recipient node, routed through the transmitting node if necessary.\** .FS See the new .UC M_RELATIVIZE mailer flag in the following section. .FE The .UC UUCP-A mailer is closer to the .UC RFC 976 classes 2 and 3 mailers in that it will let all header addresses stay in .SQ @ -format, but change envelope addresses to .SQ ! -paths whenever applicable. The .UC UUCP-B mailer, finally, functions as the .UC UUCP-A mailer but will in addition supply envelope senders in .UC RFC 822 route format and transmit the message to a .I bsmtp program on the remote node. .PP Ruleset 4 will as usual make the address truly external. In our case, this means removing the dot after the atsign and moving the immediate domain to the head of .UC RFC 822 routes.
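To make the relation between the internal format and the external one concrete, the transformations can be sketched in Python using the addresses given in the canonicalization section. This is a minimal illustration under stated assumptions, not a transcription of rulesets 3 and 4. The function names and the dictionary standing in for the dbm domaintable are hypothetical. Only the three-node bang path, the single-node bang path, and the plain user@domain case are confirmed by the examples in the text; the handling of routes of other lengths follows the same pattern by assumption.

```python
def bang_to_route(path, domaintable=None):
    """Turn a UUCP "!"-path into an RFC 822 route.

    The first node gets the UUCP pseudo-domain unless the (hypothetical)
    `domaintable` dictionary knows a specific domain name for it.
    """
    *hosts, user = path.split("!")
    if not hosts:
        raise ValueError("not a bang path: %r" % path)
    table = domaintable or {}
    first = table.get(hosts[0], hosts[0] + ".UUCP")
    if len(hosts) == 1:
        return f"{user}@{first}"
    route = [first] + hosts[1:-1]
    return ",".join("@" + h for h in route) + f":{user}@{hosts[-1]}"


def to_internal(addr):
    """Rearrange an address into the internal "user@.domain" format."""
    if not addr.startswith("@"):
        user, _, domain = addr.rpartition("@")
        return f"{user}@.{domain}"
    # Route: extract the closest node and append it with atsign-dot.
    route, _, mailbox = addr.partition(":")
    first, *rest = route.split(",")
    closest = first[1:]
    if rest:
        # The example in the text drops the leading "@" of the remainder.
        remainder = ",".join(rest)[1:] + ":" + mailbox
    else:
        # Assumption: a one-element route leaves just the mailbox.
        remainder = mailbox
    return f"{remainder}@.{closest}"


def to_external(internal):
    """Undo to_internal: drop the dot, move the domain to the route head."""
    left, _, domain = internal.rpartition("@.")
    if ":" in left:
        return f"@{domain},@{left}"
    if "@" in left:
        return f"@{domain}:{left}"   # assumption, mirrors to_internal
    return f"{left}@{domain}"
```

With these definitions, bang_to_route("foo!bar!baz!user") yields the route @foo.UUCP,@bar:user@baz, to_internal turns that into bar:user@baz@.foo.UUCP, and to_external restores the route. Passing {"linus": "linus.liu.se"} as the domaintable reproduces the linus!user case, where the bang path is qualified as user@linus.liu.se.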