DataMuseum.dk presents historical artifacts from the history of: DKUUG/EUUG Conference tapes
Length: 6874 (0x1ada) Types: TextFile Names: »COMMENT«
└─⟦9ae75bfbd⟧ Bits:30007242 EUUGD3: Starter Kit └─⟦bfebc70e2⟧ »EurOpenD3/mail/sendmail-5.65b+IDA-1.4.3.tar.Z« └─⟦f9e35cd84⟧ └─⟦this⟧ »sendmail/ida/cf/COMMENT«
Some comments on my rewrite of Sendmail.mc.

Neil Rickert, rickert@cs.niu.edu
Thu, 7 Jun 90

(Note: These are ex post facto comments, and do not necessarily reflect my thinking during the early part of the design phase.)

1. Data structures.

It has long been my philosophy that the three most important ingredients of good programming are: Data structures, data structures, and data structures.

I think I came up with a pretty good data structure for representing addresses. You can essentially think of the internal address as being represented as an array, or more accurately as an array implementation of a pushdown stack. The bottom element of the stack is the 'user' identifier in the final destination domain. Otherwise each element is a pair consisting of a domain (the '@domain' portion) and an addressing format flag (the ',' or ':' or '!').

The majority of the .cf file deals with the top of the stack. For example, you can think of the PATHTABLE lookup as popping the top from the stack and then pushing the domains of its pathalias lookup value. The mailer specific rulesets, of course, must use the fact that the stack is implemented as an array, so that they can visit each entry and modify the flag.

There is, of course, one more overall flag, which I implemented with the choice of either '@' or '%' as the leading character for the primary domain. Think of this as an overall flag to distinguish between sender and recipient addresses. This was added when I realized I was pushing the limit of the 30 allowable rulesets, and I didn't wish to require code changes if it could be avoided. It was pretty obvious that the mailer specific rulesets for sender and recipient addresses were usually very similar, so this flag idea makes sense. The flag idea could easily have been extended to also distinguish between header and envelope addresses, but I felt that would be overreaching for the present.

2. Logic.

In my (admittedly biased) opinion, the internal logic is, in the main, simpler than in the original IDA. The original package was made overly complex by the continual reformatting from a bang path to a source route to %-path, and then possibly converting it back again.

Part of my motivation in doing this rewrite was my frustration with the complexity of the original IDA. Almost every time I studied it closely I found another logic error, usually caused by making unwarranted assumptions in the process of conversion of an address between formats. It is fortunate that most of these logic errors rarely caused serious problems.

It is much simpler to retain a consistent internal representation, and work mostly by changing the 'addressing format flag'.

Most of the remaining complexity is because the internal representation as an array of domains must be crudely simulated in a tokenized character string. Except for the mailer specific rulesets, which need considerable flexibility, this would have been much easier to implement in code than in the replacement rulesets. A code version would probably be more reliable, too.

Paul mentioned that Berkeley is working on a sendmail replacement. Perhaps they should look at this config, together with these comments, for one possible approach to address formatting.

3. Dealing with ambiguous addresses.

I largely followed the interpretation of Lennart Lovestrand in the processing of input addresses. In particular in an address containing '@', '!' and '%', I gave '%' a higher precedence than '!' (unless the address originates at a UUCP source and STRICTLY822 is NOT defined). Thus the address 'c!u%b@a' was converted to the internal form:

        @a, @b: @c! u

(where the spacing is for readability, and <> are omitted). It is entirely possible that on some occasions this is incorrect, and the address should have been interpreted as:

        @a, @c! @b: u

I permitted sufficient residual ambiguity in many of the rulesets so that, when finally processed by ruleset 4, either of these internal forms would still finish up as 'c!u%b@a' in the final output address.

4. Ruleset #4.

This ruleset is now in its fourth, and I hope final, rewrite. In each of my earlier versions I made some attempt to limit the degree of ambiguity allowed in the resultant address. But every approach had potential problems with eliminating some plausibly reasonable output address formats. I finally concluded that S4 should be totally general, and the reduction of ambiguity should be handled in the mailer specific rulesets.

There is a down side to this complete generality. It is theoretically possible to present an internal format address to ruleset #4, such that the output is completely uninterpretable. I doubt that such addresses will arise naturally, and in any case the mailer specific rulesets are designed to eliminate most of the problems that could permit such addresses.

About the only ambiguity I allowed to remain after the mailer specific rulesets is that between the precedence of '%' and '!' in a mixed address. I allowed some residual ambiguity there because, as I commented above, the original assumptions when first parsing the input address may have been incorrect.

5. Residual complexity.

Most of the remaining complexity is in rulesets #4, #7, #9, and #19. Complexity is unavoidable in rulesets #4, #7 and #9, which deal with conversion between internal and external forms. The complexity of ruleset #19 is caused by the combination of two factors: my wish to retain full domain addresses as far as possible, replacing them by UUCP names only in ruleset #4; and my decision to fully follow the logic of the original IDA in using the UUCPXTABLE lookup as one criterion in the decision to continue conversion to ! formatting.

Actually ruleset #19 (and its continuation in ruleset #20, or was it #21) was one of the places where there was a logic error in the original IDA. It never really used UUCPXTABLE the way the documentation claimed.

I have a sneaking suspicion that all of the UUCPXTABLE dependent code in ruleset #19 should be eliminated, for the sake of a reduction in complexity. Now that the unnecessary conversion back and forth between '!' and '%' formats is eliminated, the original need for this code has probably disappeared.
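
Illustration. The stack representation from section 1, and the residual ambiguity discussed in sections 3 and 4, can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the names Address, push, pop, internal and render are hypothetical and appear nowhere in the .cf file, and render is a deliberately simplified stand-in for ruleset #4, not a transcription of its rules. It assumes that '!' entries become a leading bang path, that the topmost remaining domain becomes the '@' domain, and that any other domains become '%' relays.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# One stack entry: (domain, flag), where flag is ',', ':' or '!'.
Hop = Tuple[str, str]


@dataclass
class Address:
    """Hypothetical model of the internal address stack."""
    user: str                                       # bottom of the stack
    hops: List[Hop] = field(default_factory=list)   # hops[-1] is the top

    def push(self, domain: str, flag: str) -> None:
        if flag not in (",", ":", "!"):
            raise ValueError(f"unknown flag: {flag!r}")
        self.hops.append((domain, flag))

    def pop(self) -> Hop:
        return self.hops.pop()

    def internal(self) -> str:
        """Print top of stack first, as in the examples of section 3."""
        parts = [f"@{d}{f}" for d, f in reversed(self.hops)]
        return " ".join(parts + [self.user])


def render(addr: Address) -> str:
    """Toy stand-in for ruleset #4 (an assumption, NOT the real rules)."""
    top_down = list(reversed(addr.hops))
    # Entries flagged '!' become a leading bang path.
    bang = "".join(f"{d}!" for d, f in top_down if f == "!")
    rest = [d for d, f in top_down if f != "!"]
    if not rest:
        return bang + addr.user
    # Topmost remaining domain is the '@' domain; the others are '%' relays,
    # with the relay nearest the user written first.
    primary, relays = rest[0], rest[1:]
    suffix = "".join(f"%{d}" for d in reversed(relays))
    return f"{bang}{addr.user}{suffix}@{primary}"


# First interpretation of 'c!u%b@a': '%' binds tighter than '!'.
form1 = Address("u")
form1.push("c", "!")
form1.push("b", ":")
form1.push("a", ",")

# Second interpretation: '!' binds tighter than '%'.
form2 = Address("u")
form2.push("b", ":")
form2.push("c", "!")
form2.push("a", ",")

print(form1.internal())
print(form2.internal())
print(render(form1), render(form2))
```

In this toy model the two internal forms differ only in the order of the middle stack entries, and the simplified renderer maps both to the same external string. That is the property described in section 3; how the real ruleset #4 achieves it is not shown here.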