Length: 2400 (0x960) Types: TextFile Names: »markov3.6«
└─⟦b20c6495f⟧ Bits:30007238 EUUGD18: Wien-båndet, efterår 1987 └─⟦this⟧ »EUUGD18/General/Article/markov3.6«
.\" markov3
.TH MARKOV3 6 "17 Feb 1987"
.UC 4
.SH NAME
markov3 \- Digest and spit out quasi-random Usenet articles
.SH SYNOPSIS
.B markov3
[
.B \-pv
] [
.B \-n
.I n_articles
] [
.B \-d
.I dumpfile
] [
.B \-s
.I seed
]
files
.SH DESCRIPTION
.PP
.I Markov3
digests Usenet articles and builds an internal data structure that
models the articles as if they came from a random process, where
each word is determined by the previous two.
It then emits a series of articles on the standard output that have
the same distribution of words, word pairs, and word triplets as do
the input files.
The name
.I markov3
comes from the fact that this structure is called a Markov chain,
and that the statistics for word triplets are modeled.
Here, a "word" is a sequence of printable characters surrounded by
whitespace.
Paragraph breaks (blank lines) are also treated as a "word".
.PP
By default, the program expects to be fed Usenet articles; it strips
off headers, included text, and signatures (or at least it tries).
The
.B \-p
(plain) option disables the header-stripping feature (otherwise
everything is skipped until a blank line is encountered).
.PP
By default, 10 articles, separated by form feeds, are written on the
standard output.
The
.B \-n
option lets you specify a different number.
.PP
The
.B \-d
(dump) option dumps a representation of the internal data structure
built by
.I markov3
on the named file.
.PP
Finally, the
.B \-v
(verbose) option prints some statistics on the standard error.
.SH "CAVEATS"
This program allocates lots of memory if given large amounts of input.
On virtual memory systems, the paging behavior is atrocious because
pointers tend to point every which way, and many pointers are
dereferenced for every word processed.
This could be improved, I'm sure.
.PP
Posting articles generated by
.I markov3
to the net may be hazardous to your health.
.PP
Not as smart as Mark V. Shaney.
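.SH "EXAMPLE"
The word-triplet model described under DESCRIPTION can be illustrated
with a minimal C sketch.
This is not the program's actual data structure or code, which this
page does not show.
The names digest(), successor(), generate() and MAX_WORDS are
hypothetical, the input is simply kept in an array and rescanned for
every word, and paragraph breaks and header stripping are left out.
The point is only that choosing each word from the words that
followed the previous two in the input reproduces the triplet
statistics.
.nf
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORDS 1024          /* capacity of this toy model (assumed) */

static const char *words[MAX_WORDS];    /* input words, in order */
static int n_words;

/* Record the next word of the input text. */
void digest(const char *w)
{
    assert(n_words < MAX_WORDS);
    words[n_words++] = w;
}

/*
 * Pick a word that followed the pair (a, b) somewhere in the input.
 * Every occurrence of the triplet is an equally likely candidate, so
 * frequent triplets are chosen more often.  Returns NULL if the pair
 * was never followed by anything.
 */
const char *successor(const char *a, const char *b)
{
    const char *pick = NULL;
    int seen = 0;

    for (int i = 0; i + 2 < n_words; i++) {
        if (strcmp(words[i], a) == 0 && strcmp(words[i + 1], b) == 0) {
            seen++;
            /* Reservoir sampling: keep the k-th candidate with
             * probability 1/k (ignoring the small bias of rand() % k). */
            if (rand() % seen == 0)
                pick = words[i + 2];
        }
    }
    return pick;
}

/*
 * Write up to max_out words into out[], starting from the first two
 * input words.  Stops early at a pair with no successor.
 * Returns the number of words written.
 */
int generate(const char **out, int max_out)
{
    int n = 0;

    if (n_words < 2 || max_out < 2)
        return 0;
    out[n++] = words[0];
    out[n++] = words[1];
    while (n < max_out) {
        const char *next = successor(out[n - 2], out[n - 1]);
        if (next == NULL)
            break;
        out[n++] = next;
    }
    return n;
}
```
.fi
.PP
A caller would pass each input word to digest() and then call
generate() once per output article.
Rescanning the whole input for every word is slow; a real program
would index the pairs, which is presumably where the pointer-heavy
structure mentioned under CAVEATS comes from.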
.SH "PORTABILITY"
An effort has been made to make this program as portable as possible;
however, you need lex(1).
It uses the rand(3) function for random number generation; look here
if the output seems too predictable.
If there is a problem, you'll just have to redo the function roll(n),
which generates a random integer between 0 and n-1.
I'm not certain how portable rand(3) is.
.PP
If you don't have lex, you'll need to rewrite the lexical analyzer
but most of the program is in C.
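.PP
If roll(n) does need to be redone, one possible replacement is
sketched below.
It is an assumption about what such a function could look like, not
the version shipped with the program, and it still relies on the
standard C rand().
It discards values from the uneven tail of rand()'s range so that
every result from 0 to n-1 is equally likely, and it requires n to be
between 1 and RAND_MAX+1.
.nf
```c
#include <assert.h>
#include <stdlib.h>

/*
 * Return a pseudo-random integer in the range 0..n-1.
 * Values of rand() at or above the largest multiple of n are rejected,
 * which avoids the slight bias of a plain rand() % n.
 */
int roll(int n)
{
    unsigned long range = (unsigned long)RAND_MAX + 1UL;
    unsigned long limit;
    unsigned long r;

    assert(n > 0);
    assert((unsigned long)n <= range);

    /* Largest multiple of n that does not exceed the size of the range. */
    limit = range - range % (unsigned long)n;
    do {
        r = (unsigned long)rand();
    } while (r >= limit);

    return (int)(r % (unsigned long)n);
}
```
.fi
.PP
Calling srand() with a different value before the first roll() gives
a different sequence.
If the output still seems too predictable, the call to rand() is the
one line to swap for whatever better generator the local system
provides.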