Some comments on my rewrite of Sendmail.mc.
Neil Rickert,
rickert@cs.niu.edu
Thu, 7 Jun 90
(Note: These are ex post facto comments, and do not necessarily reflect my
thinking during the early part of the design phase.)
1. Data structures.
It has long been my philosophy that the three most important ingredients
of good programming are: Data structures, data structures, and data
structures. I think I came up with a pretty good data structure for
representing addresses.
You can essentially think of the internal address as being represented as
an array, or more accurately as an array implementation of a pushdown
stack. The bottom element of the stack is the 'user' identifier in the
final destination domain. Each of the remaining elements is a pair consisting of
a domain (the '@domain' portion) and an addressing format flag (the ',' or
':' or '!'). The majority of the .cf file deals with the top of the
stack. For example, you can think of the PATHTABLE lookup as popping the
top from the stack and then pushing the domains of its pathalias lookup
value.
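The stack can be sketched in Python to make the PATHTABLE analogy concrete. This is a minimal illustration, not a translation of the .cf rules: the class name AddressStack, the helper pathtable_lookup, the table contents, the shape of a lookup value (a list of hops, first hop first, as pathalias would print them in 'a!b!c!%s'), and the choice of '!' as the flag on pushed hops are all assumptions.

```python
from typing import Dict, List, Tuple, Union

Hop = Tuple[str, str]  # (domain, flag); flag is one of ",", ":", "!"


class AddressStack:
    """Array implementation of a pushdown stack for one address.

    items[0] is the user identifier in the final destination domain;
    items[1:] are (domain, flag) hops, and items[-1] is the top.
    """

    def __init__(self, user: str) -> None:
        self.items: List[Union[str, Hop]] = [user]

    def depth(self) -> int:
        return len(self.items) - 1  # number of hops above the user

    def push(self, domain: str, flag: str) -> None:
        self.items.append((domain, flag))

    def pop(self) -> Hop:
        if self.depth() == 0:
            raise IndexError("only the user identifier remains")
        return self.items.pop()

    def top(self) -> Hop:
        if self.depth() == 0:
            raise IndexError("no domain on the stack")
        return self.items[-1]

    def __str__(self) -> str:
        # Top-first, in the notation used in this note: "@a, @b: @c! u".
        hops = [f"@{d}{f}" for d, f in reversed(self.items[1:])]
        return " ".join(hops + [self.items[0]])


# Hypothetical table contents: domain -> hops in pathalias order,
# so a route like "a!b!c!%s" is stored as ["a", "b", "c"].
PATHTABLE: Dict[str, List[str]] = {"c": ["a", "b", "c"]}


def pathtable_lookup(addr: AddressStack, table: Dict[str, List[str]]) -> bool:
    """Pop the top domain and push its path, leaving the first hop on top."""
    if addr.depth() == 0:
        return False
    domain, _flag = addr.top()
    route = table.get(domain)
    if route is None:
        return False
    addr.pop()
    for hop in reversed(route):
        addr.push(hop, "!")  # assumption: path hops get the UUCP '!' flag
    return True


addr = AddressStack("u")
addr.push("c", ":")
pathtable_lookup(addr, PATHTABLE)
print(addr)  # @a! @b! @c! u
```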
The mailer specific rulesets, of course, must use the fact that the stack
is implemented as an array, so that they can visit each entry and modify
the flag.
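A similar sketch shows a mailer-specific pass walking the whole array and rewriting only the flags. Both passes and their flag conventions are hypothetical, chosen only to show the traversal; the user is kept separate from the hop list for brevity.

```python
from typing import List, Tuple

Hop = Tuple[str, str]  # (domain, flag)


def show(user: str, hops: List[Hop]) -> str:
    """hops are stored bottom-first (hops[-1] is the top); print top-first."""
    return " ".join([f"@{d}{f}" for d, f in reversed(hops)] + [user])


def to_uucp(hops: List[Hop]) -> List[Hop]:
    """Hypothetical UUCP-mailer pass: visit every entry and set its flag to '!'."""
    return [(domain, "!") for domain, _flag in hops]


def to_route(hops: List[Hop]) -> List[Hop]:
    """Hypothetical SMTP-mailer pass: ':' on the innermost hop, ',' elsewhere."""
    return [(domain, ":" if i == 0 else ",") for i, (domain, _flag) in enumerate(hops)]


hops = [("c", "!"), ("b", ":"), ("a", ",")]  # bottom-first: @a, @b: @c! u
print(show("u", hops))            # @a, @b: @c! u
print(show("u", to_uucp(hops)))   # @a! @b! @c! u
print(show("u", to_route(hops)))  # @a, @b, @c: u
```

The domains never move; only the flag in each entry changes, which is what keeps a single internal representation workable.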
There is, of course, one more overall flag, which I implemented with the
choice of either '@' or '%' as the leading character for the primary
domain. Think of this as an overall flag to distinguish between sender
and recipient addresses. This was added when I realized I was pushing the
limit of the 30 allowable rulesets, and I didn't wish to require code
changes if it could be avoided. It was pretty obvious that the mailer
specific rulesets for sender and recipient addresses were usually very
similar, so this flag idea makes sense. The flag idea could easily have
been extended to also distinguish between header and envelope addresses,
but I felt that would be overreaching for the present.
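The role flag can be sketched as a single leading character that one shared routine strips before doing its work and restores afterwards. The note does not say which character marks which role, so SENDER_MARK and RECIPIENT_MARK below are arbitrary assumptions, as are the helper names and the example rewrite.

```python
from typing import Callable

SENDER_MARK = "%"     # assumption: the note does not say which mark is which
RECIPIENT_MARK = "@"


def is_sender(addr: str) -> bool:
    return addr.startswith(SENDER_MARK)


def with_role(addr: str, sender: bool) -> str:
    """Replace the leading character of the primary domain with the role mark."""
    return (SENDER_MARK if sender else RECIPIENT_MARK) + addr[1:]


def shared_mailer_ruleset(addr: str, rewrite: Callable[[str, bool], str]) -> str:
    """One routine for both roles: normalize the mark, rewrite, restore the mark."""
    sender = is_sender(addr)
    body = rewrite(RECIPIENT_MARK + addr[1:], sender)
    return with_role(body, sender)


def example_rewrite(addr: str, sender: bool) -> str:
    # Hypothetical shared logic; `sender` is there for the few places roles differ.
    return addr.replace(":", "!")


print(shared_mailer_ruleset("%a, @b: u", example_rewrite))  # %a, @b! u
print(shared_mailer_ruleset("@a, @b: u", example_rewrite))  # @a, @b! u
```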
2. Logic.
In my (admittedly biased) opinion, the internal logic is, in the main,
simpler than in the original IDA. The original package was made overly
complex by the continual reformatting from a bang path to a source route
to a %-path, and then possibly back again. Part of my
motivation in doing this rewrite was my frustration with the complexity of
the original IDA. Almost every time I studied it closely I found another
logic error, usually caused by unwarranted assumptions made while
converting an address between formats. It is fortunate that
most of these logic errors rarely caused serious problems.
It is much simpler to retain a consistent internal representation, and
work mostly by changing the 'addressing format flag'.
Most of the remaining complexity arises because the internal representation as
an array of domains must be crudely simulated in a tokenized character
string. Except for the mailer specific rulesets, which need considerable
flexibility, this would have been much easier to implement in code than in
the replacement rulesets. A code version would probably be more reliable,
too. Paul mentioned that Berkeley is working on a sendmail replacement.
Perhaps they should look at this config, together with these comments, for
one possible approach to address formatting.
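To see why the string simulation is awkward, here is a rough Python sketch of recovering the array from a tokenized address. The operator set and the helper names are assumptions (real sendmail splits on more operator characters); the point is that every stack entry spans three tokens, so a rule that wants one entry has to match that whole triple.

```python
import re
from typing import List, Tuple

# Assumption: only these operators are split out; real sendmail uses a longer list.
TOKEN = re.compile(r"[<>@,:!]|[^\s<>@,:!]+")


def tokenize(text: str) -> List[str]:
    """Split an address into operator and word tokens, dropping whitespace."""
    return TOKEN.findall(text)


def to_array(tokens: List[str]) -> Tuple[List[Tuple[str, str]], str]:
    """Recover (hops top-first, user) from tokens like < @ a , @ b : @ c ! u >.

    Each stack entry occupies three tokens ("@", domain, flag), which is the
    kind of pattern a rewriting rule has to match to find one entry.
    """
    body = [t for t in tokens if t not in ("<", ">")]
    hops: List[Tuple[str, str]] = []
    i = 0
    while i + 2 < len(body) and body[i] == "@" and body[i + 2] in (",", ":", "!"):
        hops.append((body[i + 1], body[i + 2]))
        i += 3
    return hops, "".join(body[i:])


toks = tokenize("<@a, @b: @c! u>")
print(toks)            # ['<', '@', 'a', ',', '@', 'b', ':', '@', 'c', '!', 'u', '>']
print(to_array(toks))  # ([('a', ','), ('b', ':'), ('c', '!')], 'u')
```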
3. Dealing with ambiguous addresses.
I largely followed the interpretation of Lennart Lovestrand in the
processing of input addresses. In particular, in an address containing
'@', '!' and '%', I gave '%' a higher precedence than '!' (unless the
address originates at a UUCP source and STRICTLY822 is NOT defined).
Thus the address 'c!u%b@a' was converted to the internal form:
@a, @b: @c! u
(where the spacing is for readability, and <> are omitted).
It is entirely possible that on some occasions this is incorrect, and the
address should have been interpreted as:
@a, @c! @b: u
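The first reading is what a parser produces when it gives '@' precedence over '%' and '%' over '!'. The Python sketch below is a simplified illustration, not the IDA ruleset: it ignores the UUCP-source and STRICTLY822 case, rejects a '!' appearing to the right of a '%', and assigns flags by an assumed rule (',' on route hops, ':' on the innermost route hop, '!' on UUCP hops) chosen only because it reproduces the example above.

```python
from typing import List, Tuple

Hop = Tuple[str, str]  # (domain, flag), listed top-first


def parse(addr: str) -> Tuple[List[Hop], str]:
    """Split addr giving '@' precedence over '%', and '%' over '!'."""
    kinds: List[Hop] = []  # (domain, separator it came from), top-first
    local = addr
    if "@" in local:
        local, domain = local.rsplit("@", 1)
        kinds.append((domain, "@"))
    while "%" in local:  # rightmost '%' is the next hop
        local, domain = local.rsplit("%", 1)
        if "!" in domain:
            raise ValueError("'!' to the right of '%' is not handled in this sketch")
        kinds.append((domain, "%"))
    while "!" in local:  # leftmost '!' is the next hop
        domain, local = local.split("!", 1)
        kinds.append((domain, "!"))
    # Assumed flag rule: '!' stays '!'; other hops get ',' except the innermost, ':'.
    inner = max((i for i, (_, k) in enumerate(kinds) if k != "!"), default=-1)
    hops = [(d, "!" if k == "!" else (":" if i == inner else ","))
            for i, (d, k) in enumerate(kinds)]
    return hops, local


def show(hops: List[Hop], user: str) -> str:
    return " ".join([f"@{d}{f}" for d, f in hops] + [user])


print(show(*parse("c!u%b@a")))  # @a, @b: @c! u
```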
I permitted sufficient residual ambiguity in many of the rulesets so that,
when finally processed by ruleset 4, either of these internal forms would
still finish up as 'c!u%b@a' in the final output address.
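A minimal sketch of an external-form renderer shows how both internal forms can collapse to the same output. The rendering rule here is an assumption and much simpler than ruleset #4: working outward from the user, a '!' hop is prefixed as 'domain!', the top hop is otherwise appended as '@domain', and any other hop is appended as '%domain'.

```python
from typing import List, Tuple


def render(hops: List[Tuple[str, str]], user: str) -> str:
    """Render top-first (domain, flag) hops plus a user as one external address."""
    text = user
    for depth, (domain, flag) in enumerate(reversed(hops)):  # innermost hop first
        is_top = depth == len(hops) - 1
        if flag == "!":
            text = f"{domain}!{text}"   # UUCP hop: prefix "domain!"
        elif is_top:
            text = f"{text}@{domain}"   # primary domain: suffix "@domain"
        else:
            text = f"{text}%{domain}"   # any other hop: suffix "%domain"
    return text


form1 = [("a", ","), ("b", ":"), ("c", "!")]  # @a, @b: @c! u
form2 = [("a", ","), ("c", "!"), ("b", ":")]  # @a, @c! @b: u
print(render(form1, "u"), render(form2, "u"))  # c!u%b@a c!u%b@a
```

Because 'c!u%b' can be read as either (c!u)%b or c!(u%b), the rendered text does not record which internal form produced it.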
4. Ruleset #4.
This ruleset is now in its fourth, and I hope final, rewrite. In each of
my earlier versions I made some attempt to limit the degree of ambiguity
allowed in the resultant address. But every approach had potential
problems with eliminating some plausibly reasonable output address
formats. I finally concluded that S4 should be totally general, and the
reduction of ambiguity should be handled in the mailer specific rulesets.
There is a down side to this complete generality. It is theoretically
possible to present an internal format address to ruleset #4, such that
the output is completely uninterpretable. I doubt that such addresses
will arise naturally, and in any case the mailer specific rulesets are
designed to eliminate most of the problems that could permit such
addresses.
About the only ambiguity I allowed to remain after the mailer specific
rulesets is that between the precedence of '%' and '!' in a mixed address.
I allowed some residual ambiguity there because, as I commented above, the
original assumptions when first parsing the input address may have been
incorrect.
5. Residual complexity.
Most of the remaining complexity is in rulesets #4, #7, #9, and #19.
Complexity is unavoidable in rulesets #4, #7 and #9, which deal with
conversion between internal and external forms.
The complexity of ruleset #19 is caused by the combination of two
factors: my wish to retain full domain addresses as far as possible,
replacing them by UUCP names only in ruleset #4; and my decision to fully
follow the logic of the original IDA in using the UUCPXTABLE lookup as one
criterion in the decision to continue conversion to '!' formatting.
Actually ruleset #19 (and its continuation in ruleset #20, or was it #21)
was one of the places where there was a logic error in the original IDA.
It never really used UUCPXTABLE the way the documentation claimed. I have
a sneaking suspicion that all of the UUCPXTABLE dependent code in ruleset
#19 should be eliminated, for the sake of a reduction in complexity. Now
that the unnecessary conversion back and forth between '!' and '%' formats
is eliminated, the original need for this code has probably disappeared.