DataMuseum.dk presents historical artifacts from the history of: DKUUG/EUUG Conference tapes
Length: 48221 (0xbc5d) Types: TextFile Names: »gawk-info-5«
Info file gawk-info, produced by Makeinfo, -*- Text -*- from input
file gawk.texinfo.

This file documents `awk', a program that you can use to select
particular records in a file and perform operations upon them.

Copyright (C) 1989 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.


File: gawk-info, Node: String Functions, Next: I/O Functions, Prev: Numeric Functions, Up: Built-in

Built-in Functions for String Manipulation
==========================================

The functions in this section look at the text of one or more
strings.

`index(IN, FIND)'
     This searches the string IN for the first occurrence of the
     string FIND, and returns the position where that occurrence
     begins in the string IN. For example:

          awk 'BEGIN { print index("peanut", "an") }'

     prints `3'. If FIND is not found, `index' returns 0.

`length(STRING)'
     This gives you the number of characters in STRING. If STRING is
     a number, the length of the digit string representing that
     number is returned. For example, `length("abcde")' is 5. By
     contrast, `length(15 * 35)' works out to 3. How? Well, 15 * 35 =
     525, and 525 is then converted to the string `"525"', which has
     three characters.

     If no argument is supplied, `length' returns the length of `$0'.
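The results quoted for `index' and `length' above can be checked from the shell. This is a minimal sketch that should work with any POSIX-style `awk', not only `gawk'; the wrapper name `string_demo' is made up for this illustration.

```shell
# string_demo is a hypothetical wrapper name, used only for this example.
string_demo() {
    awk 'BEGIN {
        print index("peanut", "an")     # position where "an" starts
        print index("peanut", "xyz")    # 0, because "xyz" does not occur
        print length("abcde")           # number of characters
        print length(15 * 35)           # 525 is converted to "525" first
    }'
}

string_demo
```

The four lines printed are `3', `0', `5' and `3'.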
`match(STRING, REGEXP)'
     The `match' function searches the string, STRING, for the
     longest, leftmost substring matched by the regular expression,
     REGEXP. It returns the character position, or "index", of where
     that substring begins (1, if it starts at the beginning of
     STRING). If no match is found, it returns 0.

     The `match' function sets the built-in variable `RSTART' to the
     index. It also sets the built-in variable `RLENGTH' to the
     length of the matched substring. If no match is found, `RSTART'
     is set to 0, and `RLENGTH' to -1.

     For example:

          awk '{
                 if ($1 == "FIND")
                   regex = $2
                 else {
                   where = match($0, regex)
                   if (where)
                     print "Match of", regex, "found at", where, "in", $0
                 }
          }'

     This program looks for lines that match the regular expression
     stored in the variable `regex'. This regular expression can be
     changed. If the first word on a line is `FIND', `regex' is
     changed to be the second word on that line. Therefore, given:

          FIND fo*bar
          My program was a foobar
          But none of it would doobar
          FIND Melvin
          JF+KM
          This line is property of The Reality Engineering Co.
          This file created by Melvin.

     `awk' prints:

          Match of fo*bar found at 18 in My program was a foobar
          Match of Melvin found at 22 in This file created by Melvin.

`split(STRING, ARRAY, FIELDSEP)'
     This divides STRING up into pieces separated by FIELDSEP, and
     stores the pieces in ARRAY. The first piece is stored in
     `ARRAY[1]', the second piece in `ARRAY[2]', and so forth. The
     string value of the third argument, FIELDSEP, is used as a
     regexp to find the places to split STRING. If FIELDSEP is
     omitted, the value of `FS' is used. `split' returns the number
     of elements created.

     The `split' function, then, splits strings into pieces in a
     manner similar to the way input lines are split into fields. For
     example:

          split("auto-da-fe", a, "-")

     splits the string `auto-da-fe' into three fields using `-' as
     the separator.
     It sets the contents of the array `a' as follows:

          a[1] = "auto"
          a[2] = "da"
          a[3] = "fe"

     The value returned by this call to `split' is 3.

`sprintf(FORMAT, EXPRESSION1,...)'
     This returns (without printing) the string that `printf' would
     have printed out with the same arguments (*note Printf::.). For
     example:

          sprintf("pi = %.2f (approx.)", 22/7)

     returns the string `"pi = 3.14 (approx.)"'.

`sub(REGEXP, REPLACEMENT, TARGET)'
     The `sub' function alters the value of TARGET. It searches this
     value, which should be a string, for the leftmost substring
     matched by the regular expression, REGEXP, extending this match
     as far as possible. Then the entire string is changed by
     replacing the matched text with REPLACEMENT. The modified string
     becomes the new value of TARGET.

     This function is peculiar because TARGET is not simply used to
     compute a value, and not just any expression will do: it must be
     a variable, field or array reference, so that `sub' can store a
     modified value there. If this argument is omitted, then the
     default is to use and alter `$0'.

     For example:

          str = "water, water, everywhere"
          sub(/at/, "ith", str)

     sets `str' to `"wither, water, everywhere"', by replacing the
     leftmost, longest occurrence of `at' with `ith'.

     The `sub' function returns the number of substitutions made
     (either one or zero).

     If the special character `&' appears in REPLACEMENT, it stands
     for the precise substring that was matched by REGEXP. (If the
     regexp can match more than one string, then this precise
     substring may vary.) For example:

          awk '{ sub(/candidate/, "& and his wife"); print }'

     changes the first occurrence of `candidate' to `candidate and
     his wife' on each input line.

     The effect of this special character can be turned off by
     putting a backslash before it in the string. As usual, to insert
     one backslash in the string, you must write two backslashes.
     Therefore, write `\\&' in a string constant to include a literal
     `&' in the replacement.
     For example, here is how to replace the first `|' on each line
     with an `&':

          awk '{ sub(/\|/, "\\&"); print }'

     *Note:* as mentioned above, the third argument to `sub' must be
     an lvalue. Some versions of `awk' allow the third argument to be
     an expression which is not an lvalue. In such a case, `sub'
     would still search for the pattern and return 0 or 1, but the
     result of the substitution (if any) would be thrown away because
     there is no place to put it. Such versions of `awk' accept
     expressions like this:

          sub(/USA/, "United States", "the USA and Canada")

     But that is considered erroneous in `gawk'.

`gsub(REGEXP, REPLACEMENT, TARGET)'
     This is similar to the `sub' function, except `gsub' replaces
     *all* of the longest, leftmost, *nonoverlapping* matching
     substrings it can find. The `g' in `gsub' stands for ``global'',
     which means replace everywhere. For example:

          awk '{ gsub(/Britain/, "United Kingdom"); print }'

     replaces all occurrences of the string `Britain' with `United
     Kingdom' for all input records.

     The `gsub' function returns the number of substitutions made. If
     the variable to be searched and altered, TARGET, is omitted,
     then the entire input record, `$0', is used.

     As in `sub', the characters `&' and `\' are special, and the
     third argument must be an lvalue.

`substr(STRING, START, LENGTH)'
     This returns a LENGTH-character-long substring of STRING,
     starting at character number START. The first character of a
     string is character number one. For example,
     `substr("washington", 5, 3)' returns `"ing"'.

     If LENGTH is not present, this function returns the whole suffix
     of STRING that begins at character number START. For example,
     `substr("washington", 5)' returns `"ington"'.

`tolower(STRING)'
     This returns a copy of STRING, with each upper-case character in
     the string replaced with its corresponding lower-case character.
     Nonalphabetic characters are left unchanged. For example,
     `tolower("MiXeD cAsE 123")' returns `"mixed case 123"'.
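The substitution and substring functions described above can be tried together in one short program. This is only a sketch, restricted to features that POSIX-style `awk' implementations share; the wrapper name `substitution_demo' and the sample strings (other than those quoted from the text) are made up for this illustration.

```shell
# substitution_demo is a hypothetical wrapper name, used only for this example.
substitution_demo() {
    awk 'BEGIN {
        str = "water, water, everywhere"
        n = gsub(/at/, "ith", str)      # replaces every "at" and counts them
        print n, str

        line = "the candidate spoke"
        sub(/candidate/, "& and his wife", line)   # & is the matched text
        print line

        print substr("washington", 5, 3)
        print tolower("MiXeD cAsE 123")
    }'
}

substitution_demo
```

Unlike the `sub' call shown earlier, `gsub' changes both occurrences of `at', so the first line printed is `2 wither, wither, everywhere'. Both calls pass a variable as the third argument, because it must be an lvalue.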
`toupper(STRING)'
     This returns a copy of STRING, with each lower-case character in
     the string replaced with its corresponding upper-case character.
     Nonalphabetic characters are left unchanged. For example,
     `toupper("MiXeD cAsE 123")' returns `"MIXED CASE 123"'.


File: gawk-info, Node: I/O Functions, Prev: String Functions, Up: Built-in

Built-in Functions For Input/Output
===================================

`close(FILENAME)'
     Close the file FILENAME, for input or output. The argument may
     alternatively be a shell command that was used for redirecting
     to or from a pipe; then the pipe is closed.

     *Note Close Input::, regarding closing input files and pipes.
     *Note Close Output::, regarding closing output files and pipes.

`system(COMMAND)'
     The `system' function allows the user to execute operating
     system commands and then return to the `awk' program. The
     `system' function executes the command given by the string
     COMMAND. It returns, as its value, the status returned by the
     command that was executed.

     For example, if the following fragment of code is put in your
     `awk' program:

          END {
               system("mail -s 'awk run done' operator < /dev/null")
          }

     the system operator will be sent mail when the `awk' program
     finishes processing input and begins its end-of-input
     processing.

     Note that much the same result can be obtained by redirecting
     `print' or `printf' into a pipe. However, if your `awk' program
     is interactive, `system' is useful for cranking up large
     self-contained programs, such as a shell or an editor.

     Some operating systems cannot implement the `system' function.
     `system' causes a fatal error if it is not supported.


File: gawk-info, Node: User-defined, Next: Built-in Variables, Prev: Built-in, Up: Top

User-defined Functions
**********************

Complicated `awk' programs can often be simplified by defining your
own functions.
User-defined functions can be called just like built-in ones (*note
Function Calls::.), but it is up to you to define them--to tell `awk'
what they should do.

* Menu:

* Definition Syntax::   How to write definitions and what they mean.
* Function Example::    An example function definition and what it does.
* Function Caveats::    Things to watch out for.
* Return Statement::    Specifying the value a function returns.


File: gawk-info, Node: Definition Syntax, Next: Function Example, Prev: User-defined, Up: User-defined

Syntax of Function Definitions
==============================

Definitions of functions can appear anywhere between the rules of the
`awk' program. Thus, the general form of an `awk' program is extended
to include sequences of rules *and* user-defined function
definitions.

The definition of a function named NAME looks like this:

     function NAME (PARAMETER-LIST) {
          BODY-OF-FUNCTION
     }

The keyword `function' may be abbreviated `func'.

NAME is the name of the function to be defined. A valid function name
is like a valid variable name: a sequence of letters, digits and
underscores, not starting with a digit.

PARAMETER-LIST is a list of the function's arguments and local
variable names, separated by commas. When the function is called, the
argument names are used to hold the argument values given in the
call. The local variables are initialized to the null string.

The BODY-OF-FUNCTION consists of `awk' statements. It is the most
important part of the definition, because it says what the function
should actually *do*. The argument names exist to give the body a way
to talk about the arguments; local variables, to give the body places
to keep temporary values.

Argument names are not distinguished syntactically from local
variable names; instead, the number of arguments supplied when the
function is called determines how many argument variables there are.
Thus, if three argument values are given, the first three names in
PARAMETER-LIST are arguments, and the rest are local variables.

It follows that if the number of arguments is not the same in all
calls to the function, some of the names in PARAMETER-LIST may be
arguments on some occasions and local variables on others. Another
way to think of this is that omitted arguments default to the null
string.

Usually when you write a function you know how many names you intend
to use for arguments and how many you intend to use as locals. By
convention, you should write an extra space between the arguments and
the locals, so that other people can follow how your function is
supposed to be used.

During execution of the function body, the arguments and local
variable values hide or "shadow" any variables of the same names used
in the rest of the program. The shadowed variables are not accessible
in the function definition, because there is no way to name them
while their names have been taken away for the local variables. All
other variables used in the `awk' program can be referenced or set
normally in the function definition.

The arguments and local variables last only as long as the function
body is executing. Once the body finishes, the shadowed variables
come back.

The function body can contain expressions which call functions. They
can even call this function, either directly or by way of another
function. When this happens, we say the function is "recursive".

There is no need in `awk' to put the definition of a function before
all uses of the function. This is because `awk' reads the entire
program before starting to execute any of it.


File: gawk-info, Node: Function Example, Next: Function Caveats, Prev: Definition Syntax, Up: User-defined

Function Definition Example
===========================

Here is an example of a user-defined function, called `myprint', that
takes a number and prints it in a specific format.
     function myprint(num)
     {
          printf "%6.3g\n", num
     }

To illustrate, here is an `awk' rule which uses our `myprint'
function:

     $3 > 0     { myprint($3) }

This program prints, in our special format, all the third fields that
contain a positive number in our input. Therefore, when given:

      1.2   3.4   5.6   7.8
      9.10 11.12 13.14 15.16
     17.18 19.20 21.22 23.24

this program, using our function to format the results, prints:

        5.6
       13.1
       21.2

Here is a rather contrived example of a recursive function. It prints
a string backwards:

     function rev (str, len) {
         if (len == 0) {
             printf "\n"
             return
         }
         printf "%c", substr(str, len, 1)
         rev(str, len - 1)
     }


File: gawk-info, Node: Function Caveats, Next: Return Statement, Prev: Function Example, Up: User-defined

Calling User-defined Functions
==============================

"Calling a function" means causing the function to run and do its
job. A function call is an expression, and its value is the value
returned by the function.

A function call consists of the function name followed by the
arguments in parentheses. What you write in the call for the
arguments are `awk' expressions; each time the call is executed,
these expressions are evaluated, and the values are the actual
arguments. For example, here is a call to `foo' with three arguments:

     foo(x y, "lose", 4 * z)

*Note:* whitespace characters (spaces and tabs) are not allowed
between the function name and the open-parenthesis of the argument
list. If you write whitespace by mistake, `awk' might think that you
mean to concatenate a variable with an expression in parentheses.
However, it notices that you used a function name and not a variable
name, and reports an error.

When a function is called, it is given a *copy* of the values of its
arguments. This is called "call by value". The caller may use a
variable as the expression for the argument, but the called function
does not know this: all it knows is what value the argument had.
For example, if you write this code:

     foo = "bar"
     z = myfunc(foo)

then you should not think of the argument to `myfunc' as being ``the
variable `foo'''. Instead, think of the argument as the string value,
`"bar"'.

If the function `myfunc' alters the values of its local variables,
this has no effect on any other variables. In particular, if `myfunc'
does this:

     function myfunc (win) {
       print win
       win = "zzz"
       print win
     }

to change its first argument variable `win', this *does not* change
the value of `foo' in the caller. The role of `foo' in calling
`myfunc' ended when its value, `"bar"', was computed. If `win' also
exists outside of `myfunc', the function body cannot alter this outer
value, because it is shadowed during the execution of `myfunc' and
cannot be seen or changed from there.

However, when arrays are the parameters to functions, they are *not*
copied. Instead, the array itself is made available for direct
manipulation by the function. This is usually called "call by
reference". Changes made to an array parameter inside the body of a
function *are* visible outside that function. *This can be very
dangerous if you don't watch what you are doing.* For example:

     function changeit (array, ind, nvalue) {
          array[ind] = nvalue
     }

     BEGIN {
                a[1] = 1 ; a[2] = 2 ; a[3] = 3
                changeit(a, 2, "two")
                printf "a[1] = %s, a[2] = %s, a[3] = %s\n", a[1], a[2], a[3]
           }

prints `a[1] = 1, a[2] = two, a[3] = 3', because calling `changeit'
stores `"two"' in the second element of `a'.


File: gawk-info, Node: Return Statement, Prev: Function Caveats, Up: User-defined

The `return' Statement
======================

The body of a user-defined function can contain a `return' statement.
This statement returns control to the rest of the `awk' program. It
can also be used to return a value for use in the rest of the `awk'
program. It looks like this:

     return EXPRESSION

The EXPRESSION part is optional. If it is omitted, then the returned
value is undefined and, therefore, unpredictable.
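As a small sketch of `return' with an expression, the following program also revisits the argument-passing rules from the previous node: a scalar argument is copied, while an array argument is changed in place. The names `twice', `fill' and `return_demo' are made up for this illustration and do not come from the manual.

```shell
# return_demo, twice and fill are hypothetical names used only here.
return_demo() {
    awk '
    # The parameter n is a copy, so changing it leaves the caller alone.
    function twice(n) { n = n * 2; return n }

    # Arrays are not copied; this assignment is visible to the caller.
    function fill(arr, k) { arr[k] = "set" }

    BEGIN {
        x = 21
        y = twice(x)
        print x, y

        a[1] = "old"
        fill(a, 1)
        print a[1]
    }'
}

return_demo
```

The first line printed is `21 42', showing that `x' kept its value; the second is `set', showing that the array element was overwritten.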
A `return' statement with no value expression is assumed at the end
of every function definition. So if control reaches the end of the
function definition, then the function returns an unpredictable
value.

Here is an example of a user-defined function that returns a value
for the largest number among the elements of an array:

     function maxelt (vec,   i, ret) {
          for (i in vec) {
               if (ret == "" || vec[i] > ret)
                    ret = vec[i]
          }
          return ret
     }

You call `maxelt' with one argument, an array name. The local
variables `i' and `ret' are not intended to be arguments; while there
is nothing to stop you from passing two or three arguments to
`maxelt', the results would be strange. The extra space before `i' in
the function parameter list is to indicate that `i' and `ret' are not
supposed to be arguments. This is a convention which you should
follow when you define functions.

Here is a program that uses our `maxelt' function. It loads an array,
calls `maxelt', and then reports the maximum number in that array:

     awk '
     function maxelt (vec,   i, ret) {
          for (i in vec) {
               if (ret == "" || vec[i] > ret)
                    ret = vec[i]
          }
          return ret
     }

     # Load all fields of each record into nums.
     {
               for(i = 1; i <= NF; i++)
                    nums[NR, i] = $i
     }

     END {
          print maxelt(nums)
     }'

Given the following input:

     1 5 23 8 16 44 3 5 2 8 26 256 291 1396 2962 100 -6 467 998 1101 99385 11 0 225

our program tells us (predictably) that:

     99385

is the largest number in our array.


File: gawk-info, Node: Built-in Variables, Next: Command Line, Prev: User-defined, Up: Top

Built-in Variables
******************

Most `awk' variables are available for you to use for your own
purposes; they never change except when your program assigns them,
and never affect anything except when your program examines them.

A few variables have special built-in meanings. Some of them `awk'
examines automatically, so that they enable you to tell `awk' how to
do certain things.
Others are set automatically by `awk', so that they carry information
from the internal workings of `awk' to your program.

This chapter documents all the built-in variables of `gawk'. Most of
them are also documented in the chapters where their areas of
activity are described.

* Menu:

* User-modified::   Built-in variables that you change to control `awk'.
* Auto-set::        Built-in variables where `awk' gives you information.


File: gawk-info, Node: User-modified, Next: Auto-set, Prev: Built-in Variables, Up: Built-in Variables

Built-in Variables That Control `awk'
=====================================

This is a list of the variables which you can change to control how
`awk' does certain things.

`FS'
     `FS' is the input field separator (*note Field Separators::.).
     The value is a regular expression that matches the separations
     between fields in an input record.

     The default value is `" "', a string consisting of a single
     space. As a special exception, this value actually means that
     any sequence of spaces and tabs is a single separator. It also
     causes spaces and tabs at the beginning or end of a line to be
     ignored.

     You can set the value of `FS' on the command line using the `-F'
     option:

          awk -F, 'PROGRAM' INPUT-FILES

`IGNORECASE'
     If `IGNORECASE' is nonzero, then *all* regular expression
     matching is done in a case-independent fashion. In particular,
     regexp matching with `~' and `!~', and the `gsub', `index',
     `match', `split' and `sub' functions all ignore case when doing
     their particular regexp operations. *Note:* since field
     splitting with the value of the `FS' variable is also a regular
     expression operation, that too is done with case ignored. *Note
     Case-sensitivity::.

     If `gawk' is in compatibility mode (*note Command Line::.), then
     `IGNORECASE' has no special meaning, and regexp operations are
     always case-sensitive.

`OFMT'
     This string is used by `awk' to control conversion of numbers to
     strings (*note Conversion::.).
     It works by being passed, in effect, as the first argument to
     the `sprintf' function. Its default value is `"%.6g"'.

`OFS'
     This is the output field separator (*note Output Separators::.).
     It is output between the fields output by a `print' statement.
     Its default value is `" "', a string consisting of a single
     space.

`ORS'
     This is the output record separator. It is output at the end of
     every `print' statement. Its default value is a string
     containing a single newline character, which could be written as
     `"\n"'. (*Note Output Separators::).

`RS'
     This is `awk''s record separator. Its default value is a string
     containing a single newline character, which means that an input
     record consists of a single line of text. (*Note Records::.)

`SUBSEP'
     `SUBSEP' is a subscript separator. It has the default value of
     `"\034"', and is used to separate the parts of the name of a
     multi-dimensional array. Thus, if you access `foo[12,3]', it
     really accesses `foo["12\0343"]'. (*Note Multi-dimensional::).


File: gawk-info, Node: Auto-set, Prev: User-modified, Up: Built-in Variables

Built-in Variables That Convey Information to You
=================================================

This is a list of the variables that are set automatically by `awk'
on certain occasions so as to provide information for your program.

`ARGC'
`ARGV'
     The command-line arguments available to `awk' are stored in an
     array called `ARGV'. `ARGC' is the number of command-line
     arguments present. `ARGV' is indexed from zero to `ARGC - 1'.
     *Note Command Line::. For example:

          awk '{ print ARGV[$1] }' inventory-shipped BBS-list

     In this example, `ARGV[0]' contains `"awk"', `ARGV[1]' contains
     `"inventory-shipped"', and `ARGV[2]' contains `"BBS-list"'. The
     value of `ARGC' is 3, one more than the index of the last
     element in `ARGV' since the elements are numbered from zero.

     Notice that the `awk' program is not entered in `ARGV'. The
     other special command line options, with their arguments, are
     also not entered.
     But variable assignments on the command line *are* treated as
     arguments, and do show up in the `ARGV' array.

     Your program can alter `ARGC' and the elements of `ARGV'. Each
     time `awk' reaches the end of an input file, it uses the next
     element of `ARGV' as the name of the next input file. By storing
     a different string there, your program can change which files
     are read. You can use `"-"' to represent the standard input. By
     storing additional elements and incrementing `ARGC' you can
     cause additional files to be read.

     If you decrease the value of `ARGC', that eliminates input files
     from the end of the list. By recording the old value of `ARGC'
     elsewhere, your program can treat the eliminated arguments as
     something other than file names.

     To eliminate a file from the middle of the list, store the null
     string (`""') into `ARGV' in place of the file's name. As a
     special feature, `awk' ignores file names that have been
     replaced with the null string.

`ENVIRON'
     This is an array that contains the values of the environment.
     The array indices are the environment variable names; the values
     are the values of the particular environment variables. For
     example, `ENVIRON["HOME"]' might be `/u/close'. Changing this
     array does not affect the environment passed on to any programs
     that `awk' may spawn via redirection or the `system' function.
     (In a future version of `gawk', it may do so.)

     Some operating systems may not have environment variables. On
     such systems, the array `ENVIRON' is empty.

`FILENAME'
     This is the name of the file that `awk' is currently reading. If
     `awk' is reading from the standard input (in other words, there
     are no files listed on the command line), `FILENAME' is set to
     `"-"'. `FILENAME' is changed each time a new file is read (*note
     Reading Files::.).

`FNR'
     `FNR' is the current record number in the current file. `FNR' is
     incremented each time a new record is read (*note Getline::.).
     It is reinitialized to 0 each time a new input file is started.
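The `ARGC' and `ARGV' behavior described above, in particular skipping a file by storing the null string in its `ARGV' slot, can be sketched as follows. The wrapper name `argv_demo' and the two scratch files are made up for this illustration, and the sketch assumes a `mktemp' command is available.

```shell
# argv_demo and the scratch files a and b are hypothetical, used only here.
argv_demo() {
    dir=$(mktemp -d) || return 1
    printf 'one\n' > "$dir/a"
    printf 'two\n' > "$dir/b"

    # ARGC counts the command name plus the two file names.
    # Storing "" in ARGV[1] makes awk skip the first file.
    awk 'BEGIN { print ARGC; ARGV[1] = "" }
         { print FNR, $0 }' "$dir/a" "$dir/b"

    rm -rf "$dir"
}

argv_demo
```

This prints `3', then `1 two': only the second file is read, and `FNR' is 1 for its first record.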
`NF'
     `NF' is the number of fields in the current input record. `NF'
     is set each time a new record is read, when a new field is
     created, or when `$0' changes (*note Fields::.).

`NR'
     This is the number of input records `awk' has processed since
     the beginning of the program's execution (*note Records::.).
     `NR' is set each time a new record is read.

`RLENGTH'
     `RLENGTH' is the length of the substring matched by the `match'
     function (*note String Functions::.). `RLENGTH' is set by
     invoking the `match' function. Its value is the length of the
     matched string, or -1 if no match was found.

`RSTART'
     `RSTART' is the start-index of the substring matched by the
     `match' function (*note String Functions::.). `RSTART' is set by
     invoking the `match' function. Its value is the position of the
     string where the matched substring starts, or 0 if no match was
     found.


File: gawk-info, Node: Command Line, Next: Language History, Prev: Built-in Variables, Up: Top

Invocation of `awk'
*******************

There are two ways to run `awk': with an explicit program, or with
one or more program files. Here are templates for both of them; items
enclosed in `[...]' in these templates are optional.

     awk [`-FFS'] [`-v VAR=VAL'] [`-V'] [`-C'] [`-c'] [`-a'] [`-e'] [`--'] 'PROGRAM' FILE ...
     awk [`-FFS'] `-f SOURCE-FILE' [`-f SOURCE-FILE ...'] [`-v VAR=VAL'] [`-V'] [`-C'] [`-c'] [`-a'] [`-e'] [`--'] FILE ...

* Menu:

* Options::            Command line options and their meanings.
* Other Arguments::    Input file names and variable assignments.
* AWKPATH Variable::   Searching directories for `awk' programs.


File: gawk-info, Node: Options, Next: Other Arguments, Prev: Command Line, Up: Command Line

Command Line Options
====================

Options begin with a minus sign, and consist of a single character.
The options and their meanings are as follows:

`-FFS'
     Sets the `FS' variable to FS (*note Field Separators::.).
`-f SOURCE-FILE'
     Indicates that the `awk' program is to be found in SOURCE-FILE
     instead of in the first non-option argument.

`-v VAR=VAL'
     Sets the variable VAR to the value VAL *before* execution of the
     program begins. Such variable values are available inside the
     `BEGIN' rule (see below for a fuller explanation).

     The `-v' option only has room to set one variable, but you can
     use it more than once, setting another variable each time, like
     this: `-v foo=1 -v bar=2'.

`-a'
     Specifies use of traditional `awk' syntax for regular
     expressions. This means that `\' can be used to quote any
     regular expression operators inside of square brackets, just as
     it can be outside of them. This mode is currently the default;
     the `-a' option is useful in shell scripts so that they will not
     break if the default is changed. *Note Regexp Operators::.

`-e'
     Specifies use of `egrep' syntax for regular expressions. This
     means that `\' does not serve as a quoting character inside of
     square brackets; idiosyncratic techniques are needed to include
     various special characters within them. This mode may become the
     default at some time in the future. *Note Regexp Operators::.

`-c'
     Specifies "compatibility mode", in which the GNU extensions in
     `gawk' are disabled, so that `gawk' behaves just like Unix
     `awk'. These extensions are noted below, where their usage is
     explained. *Note Compatibility Mode::.

`-V'
     Prints version information for this particular copy of `gawk'.
     This is so you can determine if your copy of `gawk' is up to
     date with respect to whatever the Free Software Foundation is
     currently distributing. This option may disappear in a future
     version of `gawk'.

`-C'
     Prints the short version of the General Public License. This
     option may disappear in a future version of `gawk'.

`--'
     Signals the end of the command line options. The following
     arguments are not treated as options even if they begin with
     `-'. This interpretation of `--' follows the POSIX argument
     parsing conventions.
     This is useful if you have file names that start with `-', or in
     shell scripts, if you have file names that will be specified by
     the user and that might start with `-'.

Any other options are flagged as invalid with a warning message, but
are otherwise ignored.

In compatibility mode, as a special case, if the value of FS supplied
to the `-F' option is `t', then `FS' is set to the tab character
(`"\t"'). Also, the `-C' and `-V' options are not recognized.

If the `-f' option is *not* used, then the first non-option command
line argument is expected to be the program text.

The `-f' option may be used more than once on the command line. Then
`awk' reads its program source from all of the named files, as if
they had been concatenated together into one big file. This is useful
for creating libraries of `awk' functions. Useful functions can be
written once, and then retrieved from a standard place, instead of
having to be included into each individual program. You can still
type in a program at the terminal and use library functions, by
specifying `-f /dev/tty'. `awk' will read a file from the terminal to
use as part of the `awk' program. After typing your program, type
`Control-d' (the end-of-file character) to terminate it.


File: gawk-info, Node: Other Arguments, Next: AWKPATH Variable, Prev: Options, Up: Command Line

Other Command Line Arguments
============================

Any additional arguments on the command line are normally treated as
input files to be processed in the order specified. However, an
argument that has the form `VAR=VALUE', means to assign the value
VALUE to the variable VAR--it does not specify a file at all.

All these arguments are made available to your `awk' program in the
`ARGV' array (*note Built-in Variables::.). Command line options and
the program text (if present) are omitted from the `ARGV' array. All
other arguments, including variable assignments, are included.
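A short sketch of these rules: values given with `-v' are already set when the `BEGIN' rule runs, and a `VAR=VALUE' argument appears in `ARGV' alongside file names. The wrapper name `assign_demo' is made up for this illustration; because the program has only a `BEGIN' rule, `datafile' is never opened and need not exist.

```shell
# assign_demo is a hypothetical wrapper name, used only for this example.
assign_demo() {
    awk -v foo=1 -v bar=2 'BEGIN {
        print foo, bar                  # -v values are set before BEGIN runs
        for (i = 1; i < ARGC; i++)      # options and program text are omitted
            print "ARGV[" i "] = " ARGV[i]
    }' pass=1 datafile
}

assign_demo
```

The output is `1 2', followed by `ARGV[1] = pass=1' and `ARGV[2] = datafile'.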
The distinction between file name arguments and variable-assignment
arguments is made when `awk' is about to open the next input file. At
that point in execution, it checks the ``file name'' to see whether
it is really a variable assignment; if so, `awk' sets the variable
instead of reading a file.

Therefore, the variables actually receive the specified values after
all previously specified files have been read. In particular, the
values of variables assigned in this fashion are *not* available
inside a `BEGIN' rule (*note BEGIN/END::.), since such rules are run
before `awk' begins scanning the argument list.

In some earlier implementations of `awk', when a variable assignment
occurred before any file names, the assignment would happen *before*
the `BEGIN' rule was executed. Some applications came to depend upon
this ``feature''. When `awk' was changed to be more consistent, the
`-v' option was added to accommodate applications that depended upon
this old behaviour.

The variable assignment feature is most useful for assigning to
variables such as `RS', `OFS', and `ORS', which control input and
output formats, before scanning the data files. It is also useful for
controlling state if multiple passes are needed over a data file. For
example:

     awk 'pass == 1  { PASS 1 STUFF }
          pass == 2  { PASS 2 STUFF }' pass=1 datafile pass=2 datafile


File: gawk-info, Node: AWKPATH Variable, Prev: Other Arguments, Up: Command Line

The `AWKPATH' Environment Variable
==================================

The previous section described how `awk' program files can be named
on the command line with the `-f' option. In some `awk'
implementations, you must supply a precise path name for each program
file, unless the file is in the current directory.

But in `gawk', if the file name supplied in the `-f' option does not
contain a `/', then `gawk' searches a list of directories (called the
"search path"), one by one, looking for a file with the specified
name.

The search path is actually a string containing directory names
separated by colons. `gawk' gets its search path from the `AWKPATH'
environment variable. If that variable does not exist, `gawk' uses the
default path, which is `.:/usr/lib/awk:/usr/local/lib/awk'.

The search path feature is particularly useful for building up
libraries of useful `awk' functions. The library files can be placed in
a standard directory that is in the default path, and then specified on
the command line with a short file name. Otherwise, the full file name
would have to be typed for each file.

Path searching is not done if `gawk' is in compatibility mode. *Note
Command Line::.

*Note:* if you want files in the current directory to be found, you
must include the current directory in the path, either by writing `.'
as an entry in the path, or by writing a null entry in the path. (A
null entry is indicated by starting or ending the path with a colon, or
by placing two colons next to each other (`::').) If the current
directory is not included in the path, then files cannot be found in
the current directory. This path search mechanism is identical to the
shell's.

▶1f◀
File: gawk-info, Node: Language History, Next: Gawk Summary, Prev: Command Line, Up: Top

The Evolution of the `awk' Language
***********************************

This manual describes the GNU implementation of `awk', which is
patterned after the System V Release 4 version. Many `awk' users are
only familiar with the original `awk' implementation in Version 7 Unix,
which is also the basis for the version in Berkeley Unix. This chapter
briefly describes the evolution of the `awk' language.

* Menu:

* V7/S5R3.1::   The major changes between V7 and System V Release 3.1.
* S5R4::        The minor changes between System V Releases 3.1 and 4.
* S5R4/GNU::    The extensions in `gawk' not in System V Release 4.

▶1f◀
File: gawk-info, Node: V7/S5R3.1, Next: S5R4, Prev: Language History, Up: Language History

Major Changes Between V7 and S5R3.1
===================================

The `awk' language evolved considerably between the release of Version
7 Unix (1978) and the new version first made widely available in System
V Release 3.1 (1987). This section summarizes the changes, with
cross-references to further details.

   * The requirement for `;' to separate rules on a line (*note
     Statements/Lines::.).

   * User-defined functions, and the `return' statement (*note
     User-defined::.).

   * The `delete' statement (*note Delete::.).

   * The `do'-`while' statement (*note Do Statement::.).

   * The built-in functions `atan2', `cos', `sin', `rand' and `srand'
     (*note Numeric Functions::.).

   * The built-in functions `gsub', `sub', and `match' (*note String
     Functions::.).

   * The built-in functions `close' and `system' (*note I/O
     Functions::.).

   * The `ARGC', `ARGV', `FNR', `RLENGTH', `RSTART', and `SUBSEP'
     built-in variables (*note Built-in Variables::.).

   * The conditional expression using the operators `?' and `:' (*note
     Conditional Exp::.).

   * The exponentiation operator `^' (*note Arithmetic Ops::.) and its
     assignment operator form `^=' (*note Assignment Ops::.).

   * C-compatible operator precedence, which breaks some old `awk'
     programs (*note Precedence::.).

   * Regexps as the value of `FS' (*note Field Separators::.), or as
     the third argument to the `split' function (*note String
     Functions::.).

   * Dynamic regexps as operands of the `~' and `!~' operators (*note
     Regexp Usage::.).

   * Escape sequences (*note Constants::.) in regexps.

   * The escape sequences `\b', `\f', and `\r' (*note Constants::.).

   * Redirection of input for the `getline' function (*note
     Getline::.).

   * Multiple `BEGIN' and `END' rules (*note BEGIN/END::.).

   * Simulation of multidimensional arrays (*note
     Multi-dimensional::.).

▶1f◀
File: gawk-info, Node: S5R4, Next: S5R4/GNU, Prev: V7/S5R3.1, Up: Language History

Minor Changes between S5R3.1 and S5R4
=====================================

The System V Release 4 version of Unix `awk' added these features:

   * The `ENVIRON' variable (*note Built-in Variables::.).

   * Multiple `-f' options on the command line (*note Command
     Line::.).

   * The `-v' option for assigning variables before program execution
     begins (*note Command Line::.).

   * The `--' option for terminating command line options.

   * The `\a', `\v', and `\x' escape sequences (*note Constants::.).

   * A defined return value for the `srand' built-in function (*note
     Numeric Functions::.).

   * The `toupper' and `tolower' built-in string functions for case
     translation (*note String Functions::.).

   * A cleaner specification for the `%c' format-control letter in the
     `printf' function (*note Printf::.).

   * The use of constant regexps such as `/foo/' as expressions, where
     they are equivalent to use of the matching operator, as in `$0 ~
     /foo/'.

▶1f◀
File: gawk-info, Node: S5R4/GNU, Prev: S5R4, Up: Language History

Extensions In `gawk' Not In S5R4
================================

The GNU implementation, `gawk', adds these features:

   * The `AWKPATH' environment variable for specifying a path search
     for the `-f' command line option (*note Command Line::.).

   * The `-C' and `-V' command line options (*note Command Line::.).

   * The `IGNORECASE' variable and its effects (*note
     Case-sensitivity::.).

   * The `/dev/stdin', `/dev/stdout', `/dev/stderr', and `/dev/fd/N'
     file name interpretation (*note Special Files::.).

   * The `-c' option to turn off these extensions (*note Command
     Line::.).

   * The `-a' and `-e' options to specify the syntax of regular
     expressions that `gawk' will accept (*note Command Line::.).

▶1f◀
File: gawk-info, Node: Gawk Summary, Next: Sample Program, Prev: Language History, Up: Top

`gawk' Summary
**************

This appendix provides a brief summary of the `gawk' command line and
the `awk' language.
It is designed to serve as a ``quick reference.'' It is therefore
terse, but complete.

* Menu:

* Command Line Summary::  Recapitulation of the command line.
* Language Summary::      A terse review of the language.
* Variables/Fields::      Variables, fields, and arrays.
* Rules Summary::         Patterns and Actions, and their component parts.
* Functions Summary::     Defining and calling functions.

▶1f◀
File: gawk-info, Node: Command Line Summary, Next: Language Summary, Prev: Gawk Summary, Up: Gawk Summary

Command Line Options Summary
============================

The command line consists of options to `gawk' itself, the `awk'
program text (if not supplied via the `-f' option), and values to be
made available in the `ARGC' and `ARGV' predefined `awk' variables:

     awk [`-FFS'] [`-v VAR=VAL'] [`-V'] [`-C'] [`-c'] [`-a'] [`-e']
         [`--'] 'PROGRAM' FILE ...
     awk [`-FFS'] `-f SOURCE-FILE' [`-f SOURCE-FILE ...']
         [`-v VAR=VAL'] [`-V'] [`-C'] [`-c'] [`-a'] [`-e'] [`--']
         FILE ...

The options that `gawk' accepts are:

`-FFS'
     Use FS for the input field separator (the value of the `FS'
     predefined variable).

`-f PROGRAM-FILE'
     Read the `awk' program source from the file PROGRAM-FILE, instead
     of from the first command line argument.

`-v VAR=VAL'
     Assign the variable VAR the value VAL before program execution
     begins.

`-a'
     Specifies use of traditional `awk' syntax for regular expressions.
     This means that `\' can be used to quote regular expression
     operators inside of square brackets, just as it can be outside of
     them.

`-e'
     Specifies use of `egrep' syntax for regular expressions. This
     means that `\' does not serve as a quoting character inside of
     square brackets.

`-c'
     Specifies compatibility mode, in which `gawk' extensions are
     turned off.

`-V'
     Print version information for this particular copy of `gawk' on
     the error output. This option may disappear in a future version
     of `gawk'.

`-C'
     Print the short version of the General Public License on the
     error output.
     This option may disappear in a future version of `gawk'.

`--'
     Signal the end of options. This is useful to allow further
     arguments to the `awk' program itself to start with a `-'. This
     is mainly for consistency with the argument parsing conventions
     of POSIX.

Any other options are flagged as invalid, but are otherwise ignored.
*Note Command Line::, for more details.

▶1f◀
File: gawk-info, Node: Language Summary, Next: Variables/Fields, Prev: Command Line Summary, Up: Gawk Summary

Language Summary
================

An `awk' program consists of a sequence of pattern-action statements
and optional function definitions.

     PATTERN    { ACTION STATEMENTS }

     function NAME(PARAMETER LIST)     { ACTION STATEMENTS }

`gawk' first reads the program source from the PROGRAM-FILE(s) if
specified, or from the first non-option argument on the command line.
The `-f' option may be used multiple times on the command line. `gawk'
reads the program text from all the PROGRAM-FILE files, effectively
concatenating them in the order they are specified. This is useful for
building libraries of `awk' functions, without having to include them
in each new `awk' program that uses them. To use a library function in
a file from a program typed in on the command line, specify `-f
/dev/tty'; then type your program, and end it with a `C-d'. *Note
Command Line::.

The environment variable `AWKPATH' specifies a search path to use when
finding source files named with the `-f' option. If the variable
`AWKPATH' is not set, `gawk' uses the default path,
`.:/usr/lib/awk:/usr/local/lib/awk'. If a file name given to the `-f'
option contains a `/' character, no path search is performed. *Note
AWKPATH Variable::, for a full description of the `AWKPATH' environment
variable.

`gawk' compiles the program into an internal form, and then proceeds to
read each file named in the `ARGV' array. If there are no files named
on the command line, `gawk' reads the standard input.
If a ``file'' named on the command line has the form `VAR=VAL', it is
treated as a variable assignment: the variable VAR is assigned the
value VAL.

For each line in the input, `gawk' tests to see if it matches any
PATTERN in the `awk' program. For each pattern that the line matches,
the associated ACTION is executed.

▶1f◀
File: gawk-info, Node: Variables/Fields, Next: Rules Summary, Prev: Language Summary, Up: Gawk Summary

Variables and Fields
====================

`awk' variables are dynamic; they come into existence when they are
first used. Their values are either floating-point numbers or strings.
`awk' also has one-dimensional arrays; multidimensional arrays may be
simulated. There are several predefined variables that `awk' sets as a
program runs; these are summarized below.

* Menu:

* Fields Summary::     Input field splitting.
* Built-in Summary::   `awk''s built-in variables.
* Arrays Summary::     Using arrays.
* Data Type Summary::  Values in `awk' are numbers or strings.

▶1f◀
File: gawk-info, Node: Fields Summary, Next: Built-in Summary, Prev: Variables/Fields, Up: Variables/Fields

Fields
------

As each input line is read, `gawk' splits the line into FIELDS, using
the value of the `FS' variable as the field separator. If `FS' is a
single character, fields are separated by that character. Otherwise,
`FS' is expected to be a full regular expression. In the special case
that `FS' is a single blank, fields are separated by runs of blanks
and/or tabs. Note that the value of `IGNORECASE' (*note
Case-sensitivity::.) also affects how fields are split when `FS' is a
regular expression.

Each field in the input line may be referenced by its position, `$1',
`$2', and so on. `$0' is the whole line. The value of a field may be
assigned to as well. Field numbers need not be constants:

     n = 5
     print $n

prints the fifth field in the input line. The variable `NF' is set to
the total number of fields in the input line.

References to nonexistent fields (i.e., fields after `$NF') return the
null string. However, assigning to a nonexistent field (e.g.,
`$(NF+2) = 5') increases the value of `NF', creates any intervening
fields with the null string as their value, and causes the value of
`$0' to be recomputed, with the fields being separated by the value of
`OFS'.

*Note Reading Files::, for a full description of the way `awk' defines
and uses fields.
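
The behaviour of nonexistent fields described above can be tried out
with a short sketch such as the one below. The input record `a b c',
the choice of `-' for `OFS', and the shell variable name `rebuilt' are
assumptions made only for this illustration.

```shell
# Reading $(NF+1) yields the null string and leaves NF alone.
# Assigning to $(NF+2) raises NF from 3 to 5, creates an empty
# field 4, and rebuilds $0 with OFS between the fields.
rebuilt=$(echo 'a b c' | awk 'BEGIN { OFS = "-" }
    { print "[" $(NF+1) "]"; $(NF+2) = 5; print NF; print $0 }')
printf '%s\n' "$rebuilt"
```

With this input the three lines printed are `[]', `5', and `a-b-c--5';
the doubled `-' marks the empty fourth field.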