DataMuseum.dk presents historical artifacts from the history of: DKUUG/EUUG Conference tapes
Length: 7544 (0x1d78) Types: TextFile Names: »wc.w«
└─⟦52210d11f⟧ Bits:30007239 EUUGD2: TeX 3 1992-12 └─⟦63303ae94⟧ »unix3.14/TeX3.14.tar.Z« └─⟦c58930e5c⟧ └─⟦this⟧ »TeX3.14/CWEB/examples/wc.w«
\nocon % omit table of contents

@* An example of {\tt CWEB}. This example presents the ``word count''
program from \UNIX. It was rewritten in \.{CWEB} to demonstrate
literate programming in \Cee. The level of detail is high for didactic
purposes in this document; many of the things spelled out here don't
have to be explained in other programs.

The purpose of \.{wc} is to count lines, words, and/or characters in a
list of files. The number of characters is the file length in bytes; the
number of lines is the number of newline characters in the file.
A ``word'' is a maximal sequence of consecutive characters other than
newline, space, or tab, containing at least one visible ASCII code.
(We assume that the standard ASCII code is in use.)

@ Most \.{CWEB} programs share a common structure. It's probably a
good idea to have one module explicitly stating this common structure,
even though the functions could all be introduced in unnamed \.{WEB}
modules if they don't need to appear in any special order.

@c
@<Global |#include|s@>@/
@<Global variables@>@/
@<Functions@>@/
@<The main program@>

@ We must include the standard I/O definitions to use formatted output
to |stdout| and |stderr|.

@<Global |#in...@>=
#include <stdio.h>

@ The |status| variable tells the operating system if the run was
successful or not, and |prog_name| is used in case there's an error
message to be printed.

@d OK 0 /* |status| code for successful run */
@d usage_error 1 /* |status| code for improper syntax */
@d cannot_open_file 2 /* |status| code for file access error */

@<Global variables@>=
int status=OK; /* exit status of command, initially |OK| */
char *prog_name; /* who we are */

@ Now we come to the general layout of the |main| function.
@<The main...@>=
main (argc,argv)
int argc; /* number of arguments on the \UNIX\ command line */
char **argv; /* the arguments themselves, an array of strings */
{
  @<Variables local to |main|@>@;
  prog_name=argv[0];
  @<Set up option selection@>;
  @<Process all the files@>;
  @<Print the grand totals if there were multiple files @>;
  exit(status);
}

@ If the first argument begins with a \.{'-'}, the user is choosing
the desired counts and specifying the order in which they should be
displayed. Each selection is given by the initial character
(lines, words, or characters). We do not process this string now.
It is sufficient just to suppress unwanted figures at output time.

@<Var...@>=
int file_count; /* how many files there are */
char *which; /* which counts to print */

@ @<Set up o...@>=
which="lwc"; /* if no option is given, print all three values */
if (argc>1 && *argv[1] == '-') { which=++argv[1]; argc--; argv++; }
file_count=argc-1;

@ Now we scan the remaining arguments and try to open a file, if
possible. The file is processed and its statistics are given.
We use the |do| \dots\ |while| loop because we should read from the
standard input if no file name is given.

@<Process...@>=
argc--;
do {
  @<If a file is given try to open |*(++argv)|; |continue| if unsuccessful@>;
  @<Initialize pointers and counters@>;
  @<Scan file@>;
  @<Write statistics for file@>;
  @<Close file@>;
  @<Update grand totals@>; /* even if there is only one file */
} while (--argc>0);

@ Here's the code to open the file. A special trick allows us to
handle input from |stdin| when no name is given.
Recall that the file descriptor to |stdin| is 0, so that's what we
initialize our file descriptor to.
@<Variabl...@>=
int fd=0; /* file descriptor, initialized to |stdin| */

@ @d READ_ONLY 0 /* read access code for system |open| routine */

@<If a fi...@>=
if (file_count>0 && (fd=open(*(++argv),READ_ONLY))<0) {
  fprintf (stderr, "%s: cannot open file %s\n", prog_name, *argv);
@.cannot open file@>
  status|=cannot_open_file;
  file_count--;
  continue;
}

@ @<Close file@>=
close(fd);

@ We will do some homemade buffering in order to speed things up: Characters
will be read into the |buffer| array before we process them.
To do this we set up appropriate pointers and counters.

@d buf_size BUFSIZ /* \.{stdio.h}'s |BUFSIZ| is chosen for efficiency*/

@<Var...@>=
char buffer[buf_size]; /* we read the input into this */
register char *ptr, *buf_end; /* pointers into |buffer| */
register int c; /* character read or |EOF| */
int in_word; /* are we within a word? */
long word_count, line_count, char_count;
  /* number of words, lines, and characters in a file */

@ @<Init...@>=
ptr=buf_end=buffer; line_count=word_count=char_count=0; in_word=0;

@ The grand totals must be initialized to zero at the beginning of the
program. If we made these variables local to |main|, we would have to
do this initialization explicitly; however, \Cee's globals are automatically
zeroed. (Or rather, `statically zeroed'.) (Get it?)

@<Global var...@>=
long tot_word_count, tot_line_count, tot_char_count;
  /* total number of words, lines and chars */

@ @<Scan...@>=
while (1) {
  @<Fill |buffer| if it is empty; |break| at end of file@>;
  c=*ptr++;
  if (c>' ' && c<0177) { /* visible ASCII codes */
    if (!in_word) {word_count++; in_word++;}
    continue;
  }
  if (c=='\n') line_count++;
  else if (c!=' ' && c!='\t') continue;
  in_word=0; /* |c| is newline, space, or tab */
}

@ Using buffered I/O makes it very easy to count the number of
characters, almost for free.
@<Fill |buff...@>=
if (ptr>=buf_end) {
  ptr=buffer; c=read(fd,ptr,buf_size);
  if (c<=0) break;
  char_count+=c; buf_end=buffer+c;
}

@ It's convenient to output the statistics by defining a new function
|wc_print|; then the same function can be used for the totals.
Additionally we must decide here if we know the name of the file
we have processed or if it was just |stdin|.

@<Write...@>=
wc_print(which, char_count, word_count, line_count);
if (file_count) printf (" %s\n", *argv);
else printf ("\n");

@ @<Upda...@>=
tot_line_count+=line_count;
tot_word_count+=word_count;
tot_char_count+=char_count;

@ We might as well improve a bit on \UNIX's \.{wc} by counting the
files too.

@<Print the...@>=
if (file_count>1) {
  wc_print(which, tot_char_count, tot_word_count, tot_line_count);
  printf(" total in %d files\n",file_count);
}

@ Here now is the function that prints the values according to the
specified options. If an invalid option character is found we inform
the reader about proper usage of the command. Counts are printed in
8-digit fields so that they will line up in columns.

@d print_count(n) printf("%8ld",n)

@<Fun...@>=
wc_print(which, char_count, word_count, line_count)
char *which; /* which counts to print */
long char_count, word_count, line_count; /* given totals */
{
  while (*which)
    switch (*which++) {
    case 'l': print_count(line_count); break;
    case 'w': print_count(word_count); break;
    case 'c': print_count(char_count); break;
    default: if ((status & usage_error)==0) {
        fprintf (stderr, "usage: %s [-lwc] [filename ...]\n", prog_name);
@.usage: ...@>
        status|=usage_error;
      }
    }
}

@ Incidentally, a test of this program against the system \.{wc}
command on a SPARCstation showed that the ``official'' \.{wc} was slower.
Furthermore, although that \.{wc} gave an appropriate error message
for the options `\.{-abc}', it made no complaints about the options
`\.{-labc}'! Perhaps the system routine would have been better if its
programmer had been more literate?

@* Index.
Here is a list of the identifiers used, and where they appear. Underlined entries indicate the place of definition. Error messages are also shown.
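
For readers who want to try the counting rules outside of CWEB, the following is a minimal sketch in plain ANSI C of the same state machine that the scanning section uses, applied to a buffer already in memory. The type wc_counts and the function wc_count_buffer are hypothetical names introduced only for this illustration; they do not appear in wc.w, and the sketch leaves out file handling and option parsing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type (not part of wc.w): the three tallies for one input. */
struct wc_counts {
    long lines; /* number of newline characters */
    long words; /* maximal runs containing at least one visible ASCII code */
    long chars; /* length in bytes */
};

/* Count lines, words, and characters in buf[0..len-1], following the
   rules stated in the introduction of wc.w. */
static struct wc_counts wc_count_buffer(const char *buf, size_t len)
{
    struct wc_counts n = {0, 0, 0};
    int in_word = 0; /* are we within a word? */
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        int c = (unsigned char)buf[i];
        if (c > ' ' && c < 0177) { /* visible ASCII code */
            if (!in_word) { n.words++; in_word = 1; }
            continue;
        }
        if (c == '\n') n.lines++;
        else if (c != ' ' && c != '\t') continue; /* other bytes neither start nor end a word */
        in_word = 0; /* c is newline, space, or tab */
    }
    n.chars = (long)len;
    return n;
}
```

Calling wc_count_buffer("hello world\n", 12) should report one line, two words, and twelve characters. Control characters and bytes outside the ASCII range are counted as characters but neither begin nor end a word, which mirrors the two |continue| statements in the scanning loop.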