DataMuseum.dk

Presents historical artifacts from the history of:

DKUUG/EUUG Conference tapes

This is an automatic "excavation" of a thematic subset of
artifacts from Datamuseum.dk's BitArchive.




⟦00eae3d30⟧ TextFile

    Length: 2400 (0x960)
    Types: TextFile
    Names: »markov3.6«

Derivation

└─⟦b20c6495f⟧ Bits:30007238 EUUGD18: Wien-båndet, efterår 1987
    └─⟦this⟧ »EUUGD18/General/Article/markov3.6« 

TextFile

.\" markov3
.TH MARKOV3 6 "17 Feb 1987"
.UC 4
.SH NAME
markov3 \- Digest and spit out quasi-random Usenet articles
.SH SYNOPSIS
.B markov3
[
.B \-pv
] [
.B \-n
.I n_articles
] [
.B \-d
.I dumpfile
] [
.B \-s
.I seed
]
files
.SH DESCRIPTION
.PP
.I Markov3
digests Usenet articles and builds an internal data structure that
models the articles as if they came from a random process, where
each word is determined by the previous two.  It then emits a series
of articles on the standard output that have the same distribution
of words, word pairs, and word triplets as do the input files.
The name
.I markov3
comes from the fact that this structure is called a Markov chain,
and that the statistics for word triplets are modeled.
Here, a "word" is a sequence of printable characters surrounded by
whitespace.  Paragraph breaks (blank lines) are also treated as a
"word".
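.PP
As an illustration of the word-triplet model described above, here is a
minimal sketch in C.  It is not the data structure that
.I markov3
builds: it simply rescans the whole training sequence for each word, which
is slow but easy to check.  The names next_word and generate are
hypothetical, and rand(3) with a modulo is assumed to be good enough for
illustration.
.nf

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Pick a successor for the word pair (a, b) by scanning the training
 * sequence words[0..n-1].  Every occurrence of "a b x" is (up to the
 * small bias of rand() % k) equally likely to be chosen, so x follows
 * the observed triplet frequencies.
 * Returns NULL if the pair never occurs with a following word. */
const char *next_word(const char **words, size_t n,
                      const char *a, const char *b)
{
    const char *choice = NULL;
    size_t seen = 0;

    assert(words != NULL);
    for (size_t i = 0; i + 2 < n; i++) {
        if (strcmp(words[i], a) == 0 && strcmp(words[i + 1], b) == 0) {
            seen++;
            /* Reservoir sampling: keep the k-th match with chance 1/k. */
            if ((size_t)rand() % seen == 0)
                choice = words[i + 2];
        }
    }
    return choice;
}

/* Fill out[] with up to max_words words, seeded with the first two
 * training words.  Stops early when the current pair has no successor.
 * Returns the number of words written. */
size_t generate(const char **words, size_t n,
                const char **out, size_t max_words)
{
    size_t len = 2;

    assert(out != NULL);
    if (n < 2 || max_words < 2)
        return 0;
    out[0] = words[0];
    out[1] = words[1];
    while (len < max_words) {
        const char *w = next_word(words, n, out[len - 2], out[len - 1]);
        if (w == NULL)
            break;
        out[len++] = w;
    }
    return len;
}
```

.fi
A caller would split the input into words, pass the resulting array to
generate, and print the words it returns.  A real implementation would
presumably index the word pairs instead of rescanning the input for each
word.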
.PP
By default, the program expects to be fed Usenet articles; it strips
off headers, included text, and signatures (or at least it tries).
The
.B \-p
(plain) option disables the header-stripping feature (otherwise
everything is skipped until a blank line is encountered).
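.PP
The header-skipping rule can be sketched as follows.  This is only an
illustration of "skip everything up to the first blank line"; the function
name skip_headers and its plain argument are hypothetical, the article is
assumed to be held in memory as one string, and no attempt is made to
strip included text or signatures.
.nf

```c
#include <assert.h>
#include <string.h>

/* Return a pointer to the body of an article held in memory.
 * If plain is nonzero, the whole text is the body (as with -p).
 * Otherwise everything up to and including the first blank line is
 * skipped; if there is no blank line, the result is the empty string
 * at the end of the text.  A line holding only spaces or tabs is NOT
 * treated as blank in this sketch. */
const char *skip_headers(const char *article, int plain)
{
    const char *p;

    assert(article != NULL);
    if (plain)
        return article;
    if (article[0] == '\n')          /* blank line right at the start */
        return article + 1;
    p = strstr(article, "\n\n");     /* end of a line, then an empty line */
    return p != NULL ? p + 2 : article + strlen(article);
}
```

.fi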
.PP
By default, 10 articles, separated by form feeds, are written on the
standard output.  The
.B \-n
option lets you specify a different number.
.PP
The
.B \-d
(dump) option dumps a representation of the internal data structure
built by
.I markov3
on the named file.
.PP
Finally, the
.B \-v
(verbose)
option prints some statistics on the standard error.
.SH "CAVEATS"
This program allocates lots of memory if given large amounts of input.
On virtual memory systems, the paging behavior is atrocious because
pointers tend to point every which way, and many pointers are dereferenced
for every word processed.  This could be improved, I'm sure.
.PP
Posting articles generated by
.I markov3
to the net may be hazardous to your health.
.PP
Not as smart as Mark V. Shaney.
.SH "PORTABILITY"
An effort has been made to make this program as portable as possible;
however, you need lex(1).  It uses the rand(3) function for random
number generation; suspect rand(3) first if the output seems too
predictable.  If there is a problem, you will have to rewrite the
function roll(n), which generates a random integer between 0 and
n\-1 inclusive.  I'm not certain how portable rand(3) is.
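.PP
If roll(n) does need replacing, one possible version built only on rand(3)
is sketched below.  The signature is an assumption based on the
description above, not a copy of the original source.  It uses rejection
sampling to avoid the slight bias of a plain modulo.
.nf

```c
#include <assert.h>
#include <stdlib.h>

/* Return a random integer in the range 0..n-1 using rand(3).
 * Assumes 0 < n <= RAND_MAX.  Values of rand() at or above the largest
 * multiple of n that fits are rejected, so every result is equally
 * likely (as far as rand() itself is uniform). */
int roll(int n)
{
    int limit, r;

    assert(n > 0 && n <= RAND_MAX);
    limit = RAND_MAX - (RAND_MAX % n);   /* a multiple of n */
    do {
        r = rand();
    } while (r >= limit);
    return r % n;
}
```

.fi
Call srand(3) once beforehand to vary the sequence from run to run.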
.PP
If you don't have lex, you'll need to rewrite the lexical analyzer,
but most of the program is in C.