⟦8cce5addf⟧

TextFile

              Some comments on my rewrite of Sendmail.mc.

                                                        Neil Rickert,
                                                        rickert@cs.niu.edu
                                                        Thu, 7 Jun 90

  (Note: These are ex post facto comments, and do not necessarily reflect my 
         thinking during the early part of the design phase.)

1.  Data structures.

    It has long been my philosophy that the three most important ingredients 
    of good programming are: Data structures, data structures, and data 
    structures.  I think I came up with a pretty good data structure for 
    representing addresses. 

    You can essentially think of the internal address as being represented as 
    an array, or more accurately as an array implementation of a pushdown 
    stack.  The bottom element of the stack is the 'user' identifier in the 
    final destination domain.  Otherwise each element is a pair consisting of 
    a domain (the '@domain' portion) and an addressing format flag (the ',' or 
    ':' or '!').  The majority of the .cf file deals with the top of the 
    stack.  For example, you can think of the PATHTABLE lookup as popping the 
    top from the stack and then pushing the domains of its pathalias lookup 
    value. 

    The mailer specific rulesets, of course, must use the fact that the stack 
    is implemented as an array, so that they can visit each entry and modify 
    the flag.
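
    The stack view above can be sketched in a few lines of Python.  This is 
    only an illustration, not part of the .cf file: the table contents, the 
    names expand_top and set_flags, and the choice of '!' as the flag on 
    pushed entries are all assumptions made for the example. 

```python
# Hypothetical stand-in for the PATHTABLE database: it maps a destination
# domain to the hosts on the way there, first hop first.  Entries are invented.
PATHTABLE = {"remote.example": ["relay1", "relay2", "remote.example"]}


def expand_top(stack, table=PATHTABLE):
    """Pop the top domain and push the domains of its route.

    `stack` is a list of (domain, flag) pairs with the top of the stack
    at the end of the list.  The '!' flag on pushed entries is assumed.
    """
    if not stack:
        return stack
    domain, _flag = stack[-1]
    route = table.get(domain)
    if route is None:
        return stack              # no table entry: leave the address alone
    stack.pop()
    for host in reversed(route):  # push last hop first, so first hop ends on top
        stack.append((host, "!"))
    return stack


def set_flags(stack, flag):
    """Visit every entry, as a mailer specific ruleset would, and set its flag."""
    return [(domain, flag) for domain, _old in stack]


user = "u"                         # bottom element: user in the final domain
stack = [("remote.example", ",")]  # one (domain, flag) pair above it
expand_top(stack)
print(stack[-1])                   # the new top of the stack
```

    Only expand_top treats the list as a stack.  set_flags relies on the 
    array view, which is the distinction drawn in the two paragraphs above. 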

    There is, of course, one more overall flag, which I implemented with the 
    choice of either '@' or '%' as the leading character for the primary 
    domain.  Think of this as an overall flag to distinguish between sender 
    and recipient addresses.  This was added when I realized I was pushing the 
    limit of the 30 allowable rulesets, and I didn't wish to require code 
    changes if it could be avoided.  It was pretty obvious that the mailer 
    specific rulesets for sender and recipient addresses were usually very 
    similar, so this flag idea makes sense.  The flag idea could easily have 
    been extended to also distinguish between header and envelope addresses, 
    but I felt that would be overreaching for the present. 

2.  Logic.

    In my (admittedly biased) opinion, the internal logic is, in the main, 
    simpler than in the original IDA.  The original package was made overly 
    complex by the continual reformatting from a bang path to a source route 
    to %-path, and then possibly converting it back again.  Part of my 
    motivation in doing this rewrite was my frustration with the complexity of 
    the original IDA.  Almost every time I studied it closely I found another 
    logic error, usually caused by making unwarranted assumptions in the 
    process of conversion of an address between formats.  It is fortunate that 
    most of these logic errors rarely caused serious problems. 

    It is much simpler to retain a consistent internal representation, and 
    work mostly by changing the 'addressing format flag'. 

    Most of the remaining complexity arises because the internal 
    representation, an array of domains, must be crudely simulated in a 
    tokenized character string.  Except for the mailer specific rulesets, 
    which need considerable flexibility, this would have been much easier to 
    implement in code than in the replacement rulesets.  A code version would 
    probably be more reliable, too.  Paul mentioned that Berkeley is working 
    on a sendmail replacement.  Perhaps they should look at this config, 
    together with these comments, for one possible approach to address 
    formatting. 
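
    To give a feel for what "in code" might look like, here is a minimal 
    Python sketch that moves between an array of (domain, flag) pairs and a 
    flat string of the shape shown in section 3 (for example '@a,@b:@c!u').  
    It is an assumption-laden illustration: the names encode and decode are 
    invented, and the sketch ignores the angle brackets and the overall 
    '@' or '%' flag on the primary domain. 

```python
import re

# One stack element in the flat string: '@', a domain, then one flag character.
_ELEMENT = re.compile(r"@([^@,:!]+)([,:!])")


def encode(hops, user):
    """Flatten a list of (domain, flag) pairs, top of stack first, plus the user."""
    return "".join("@" + domain + flag for domain, flag in hops) + user


def decode(text):
    """Recover (hops, user) from the flat string produced by encode."""
    hops = []
    pos = 0
    while True:
        match = _ELEMENT.match(text, pos)
        if match is None:
            break                      # what is left is the user part
        hops.append((match.group(1), match.group(2)))
        pos = match.end()
    return hops, text[pos:]


hops, user = decode("@a,@b:@c!u")
print(hops, user)
```

    With the array in hand, changing a flag or replacing the top element is 
    a one-line operation, whereas the rulesets have to pattern-match the 
    flat token string each time. 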

3.  Dealing with ambiguous addresses.

    I largely followed the interpretation of Lennart Lovestrand in the 
    processing of input addresses.  In particular in an address containing 
    '@', '!' and '%', I gave '%' a higher precedence than '!' (unless the 
    address originates at a UUCP source and STRICTLY822 is NOT defined).

    Thus the address  'c!u%b@a'  was converted to the internal form:
        @a,  @b:  @c!  u   
      (where the spacing is for readability, and <> are omitted).

    It is entirely possible that on some occasions this is incorrect, and the 
    address should have been interpreted as:

        @a,  @c!  @b:  u

    I permitted sufficient residual ambiguity in many of the rulesets so that, 
    when finally processed by ruleset 4, either of these internal forms would 
    still finish up as  'c!u%b@a'  in the final output address. 
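
    The example above can be replayed with a small Python sketch.  It is a 
    simplified model, not the actual rulesets: the function names parse and 
    render are invented, the flags are assigned merely to mimic the two 
    internal forms shown above, and render treats ',' and ':' alike, which 
    is an assumption about the output step rather than a description of 
    ruleset 4. 

```python
def parse(address, percent_first=True):
    """Return (hops, user); hops is a list of (domain, flag), top of stack first.

    percent_first=True gives '%' precedence over '!'.
    percent_first=False models the alternative reading.
    """
    hops = []
    rest = address
    if "@" in rest:
        rest, domain = rest.rsplit("@", 1)
        hops.append((domain, ","))
    while True:
        has_pct = "%" in rest
        has_bang = "!" in rest
        if has_pct and (percent_first or not has_bang):
            rest, domain = rest.rsplit("%", 1)     # rightmost '%' is the next hop
            hops.append((domain, ":"))
        elif has_bang:
            domain, rest = rest.split("!", 1)      # leftmost '!' is the next hop
            hops.append((domain, "!"))
        else:
            return hops, rest


def render(hops, user):
    """Assumed output step: '!' entries become a bang prefix, the rest a %/@ suffix."""
    bang = "".join(domain + "!" for domain, flag in hops if flag == "!")
    others = [domain for domain, flag in hops if flag != "!"]
    local = user + "".join("%" + domain for domain in reversed(others[1:]))
    tail = "@" + others[0] if others else ""
    return bang + local + tail


form1 = parse("c!u%b@a", percent_first=True)    # @a,  @b:  @c!  u
form2 = parse("c!u%b@a", percent_first=False)   # @a,  @c!  @b:  u
print(form1, form2)
print(render(*form1), render(*form2))
```

    Both internal forms come back out as 'c!u%b@a' under this model, so a 
    wrong guess at parse time does not change what the next host sees. 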

4.  Ruleset #4.

    This ruleset is now in its fourth, and I hope final, rewrite.  In each of 
    my earlier versions I made some attempt to limit the degree of ambiguity 
    allowed in the resultant address.  But every approach risked eliminating 
    some plausibly reasonable output address formats.  I finally concluded 
    that ruleset #4 should be totally general, and that the reduction of 
    ambiguity should be handled in the mailer specific rulesets. 

    There is a down side to this complete generality.  It is theoretically 
    possible to present an internal format address to ruleset #4, such that 
    the output is completely uninterpretable.  I doubt that such addresses 
    will arise naturally, and in any case the mailer specific rulesets are 
    designed to eliminate most of the problems that could permit such 
    addresses.

    About the only ambiguity I allowed to remain after the mailer specific 
    rulesets is that between the precedence of '%' and '!' in a mixed address. 
 
    I allowed some residual ambiguity there because, as I commented above, the 
    original assumptions when first parsing the input address may have been 
    incorrect. 

5.  Residual complexity. 

    Most of the remaining complexity is in rulesets #4, #7, #9, and #19.  
    Complexity is unavoidable in rulesets #4, #7 and #9, which deal with 
    conversion between internal and external forms. 

    The complexity of ruleset #19 is caused by the combination of two 
    factors: my wish to retain full domain addresses as far as possible, 
    replacing them by UUCP names only in ruleset #4; and my decision to fully 
    follow the logic of the original IDA in using the UUCPXTABLE lookup as one 
    criterion in the decision to continue conversion to ! formatting. 

    Actually ruleset #19 (and its continuation in ruleset #20, or was it #21) 
    was one of the places where there was a logic error in the original IDA.  
    It never really used UUCPXTABLE the way the documentation claimed.  I have 
    a sneaking suspicion that all of the UUCPXTABLE dependent code in ruleset 
    #19 should be eliminated, for the sake of a reduction in complexity.  Now 
    that the unnecessary conversion back and forth between '!' and '%' formats 
    is eliminated, the original need for this code has probably disappeared.