Length: 49904 (0xc2f0) Types: TextFile Names: »gawk-info-3«
Info file gawk-info, produced by Makeinfo, -*- Text -*- from input file gawk.texinfo.

This file documents `awk', a program that you can use to select particular records in a file and perform operations upon them.

Copyright (C) 1989 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the Foundation.

File: gawk-info, Node: Patterns, Next: Actions, Prev: One-liners, Up: Top

Patterns
********

Patterns in `awk' control the execution of rules: a rule is executed when its pattern matches the current input record. This chapter tells all about how to write patterns.

* Menu:

* Kinds of Patterns::    A list of all kinds of patterns. The following subsections describe them in detail.
* Empty::                The empty pattern, which matches every record.
* Regexp::               Regular expressions such as `/foo/'.
* Comparison Patterns::  Comparison expressions such as `$1 > 10'.
* Boolean Patterns::     Combining comparison expressions.
* Expression Patterns::  Any expression can be used as a pattern.
* Ranges::               Using pairs of patterns to specify record ranges.
* BEGIN/END::            Specifying initialization and cleanup rules.

File: gawk-info, Node: Kinds of Patterns, Next: Empty, Prev: Patterns, Up: Patterns

Kinds of Patterns
=================

Here is a summary of the types of patterns supported in `awk'.

`/REGULAR EXPRESSION/'
     A regular expression as a pattern.
     It matches when the text of the input record fits the regular expression. (*Note Regular Expressions as Patterns: Regexp.)

`EXPRESSION'
     A single expression. It matches when its value is nonzero (if a number) or nonnull (if a string). (*Note Expression Patterns::.)

`PAT1, PAT2'
     A pair of patterns separated by a comma, specifying a range of records. (*Note Specifying Record Ranges With Patterns: Ranges.)

`BEGIN'
`END'
     Special patterns to supply start-up or clean-up information to `awk'. (*Note BEGIN/END::.)

`NULL'
     The empty pattern matches every input record. (*Note The Empty Pattern: Empty.)

File: gawk-info, Node: Empty, Next: Regexp, Prev: Kinds of Patterns, Up: Patterns

The Empty Pattern
=================

An empty pattern is considered to match *every* input record. For example, the program:

     awk '{ print $1 }' BBS-list

prints just the first field of every record.

File: gawk-info, Node: Regexp, Next: Comparison Patterns, Prev: Empty, Up: Patterns

Regular Expressions as Patterns
===============================

A "regular expression", or "regexp", is a way of describing a class of strings. A regular expression enclosed in slashes (`/') is an `awk' pattern that matches every input record whose text belongs to that class.

The simplest regular expression is a sequence of letters, numbers, or both. Such a regexp matches any string that contains that sequence. Thus, the regexp `foo' matches any string containing `foo'. Therefore, the pattern `/foo/' matches any input record containing `foo'. Other kinds of regexps let you specify more complicated classes of strings.

* Menu:

* Usage: Regexp Usage.          How regexps are used in patterns.
* Operators: Regexp Operators.  How to write a regexp.
* Case-sensitivity::            How to do case-insensitive matching.
File: gawk-info, Node: Regexp Usage, Next: Regexp Operators, Prev: Regexp, Up: Regexp

How to Use Regular Expressions
------------------------------

A regular expression can be used as a pattern by enclosing it in slashes. Then the regular expression is matched against the entire text of each record. (Normally, it only needs to match some part of the text in order to succeed.) For example, this prints the second field of each record that contains `foo' anywhere:

     awk '/foo/ { print $2 }' BBS-list

Regular expressions can also be used in comparison expressions. Then you can specify the string to match against; it need not be the entire current input record. These comparison expressions can be used as patterns or in `if' and `while' statements.

`EXP ~ /REGEXP/'
     This is true if the expression EXP (taken as a character string) is matched by REGEXP. The following example matches, or selects, all input records with the upper-case letter `J' somewhere in the first field:

          awk '$1 ~ /J/' inventory-shipped

     So does this:

          awk '{ if ($1 ~ /J/) print }' inventory-shipped

`EXP !~ /REGEXP/'
     This is true if the expression EXP (taken as a character string) is *not* matched by REGEXP. The following example matches, or selects, all input records whose first field *does not* contain the upper-case letter `J':

          awk '$1 !~ /J/' inventory-shipped

The right hand side of a `~' or `!~' operator need not be a constant regexp (i.e., a string of characters between slashes). It may be any expression. The expression is evaluated, and converted if necessary to a string; the contents of the string are used as the regexp. A regexp that is computed in this way is called a "dynamic regexp". For example:

     identifier_regexp = "[A-Za-z_][A-Za-z_0-9]*"
     $0 ~ identifier_regexp

sets `identifier_regexp' to a regexp that describes `awk' variable names, and tests if the input record contains a match for this regexp.
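The matching operators described in this node can be tried on a few lines of made-up input. The following is only a sketch: the `sample_records' helper, its three records, and the variable name `re' are assumptions invented for illustration, and are not taken from the `inventory-shipped' file.

```shell
# Hypothetical sample data; these three records are invented for illustration.
sample_records() {
    printf '%s\n' 'Jan 13' 'feb 15' 'Jul 24'
}

# Constant regexp with `~': records whose first field contains an upper-case J.
with_J=$(sample_records | awk '$1 ~ /J/')

# `!~' selects the complement: records whose first field has no upper-case J.
without_J=$(sample_records | awk '$1 !~ /J/')

# Dynamic regexp: the right-hand side is a string computed at run time.
# The variable name `re` is an assumption chosen for this sketch.
dynamic=$(sample_records | awk 'BEGIN { re = "^[a-z]+$" } $1 ~ re { print $1 }')

printf '%s\n' "$with_J" "$without_J" "$dynamic"
```

Storing the regexp in a variable, as in the last command, is what makes it possible to choose the regexp while the program runs.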
File: gawk-info, Node: Regexp Operators, Next: Case-sensitivity, Prev: Regexp Usage, Up: Regexp

Regular Expression Operators
----------------------------

You can combine regular expressions with the following characters, called "regular expression operators", or "metacharacters", to increase the power and versatility of regular expressions.

Here is a table of metacharacters. All characters not listed in the table stand for themselves.

`^'
     This matches the beginning of the string or the beginning of a line within the string. For example:

          ^@chapter

     matches the `@chapter' at the beginning of a string, and can be used to identify chapter beginnings in Texinfo source files.

`$'
     This is similar to `^', but it matches only at the end of a string or the end of a line within the string. For example:

          p$

     matches a record that ends with a `p'.

`.'
     This matches any single character except a newline. For example:

          .P

     matches any single character followed by a `P' in a string. Using concatenation we can make regular expressions like `U.A', which matches any three-character sequence that begins with `U' and ends with `A'.

`[...]'
     This is called a "character set". It matches any one of the characters that are enclosed in the square brackets. For example:

          [MVX]

     matches any of the characters `M', `V', or `X' in a string.

     Ranges of characters are indicated by using a hyphen between the beginning and ending characters, and enclosing the whole thing in brackets. For example:

          [0-9]

     matches any digit.

     To include the character `\', `]', `-' or `^' in a character set, put a `\' in front of it. For example:

          [d\]]

     matches either `d' or `]'.

     This treatment of `\' is compatible with other `awk' implementations but incompatible with the proposed POSIX specification for `awk'. The current draft specifies the use of the same syntax used in `egrep'.

     We may change `gawk' to fit the standard, once we are sure it will no longer change.
     For the meanwhile, the `-a' option specifies the traditional `awk' syntax described above (which is also the default), while the `-e' option specifies `egrep' syntax. *Note Options::.

     In `egrep' syntax, backslash is not syntactically special within square brackets. This means that special tricks have to be used to represent the characters `]', `-' and `^' as members of a character set.

     To match `-', write it as `---', which is a range containing only `-'. You may also give `-' as the first or last character in the set. To match `^', put it anywhere except as the first character of a set. To match a `]', make it the first character in the set. For example:

          []d^]

     matches either `]', `d' or `^'.

`[^ ...]'
     This is a "complemented character set". The first character after the `[' *must* be a `^'. It matches any character *except* those in the square brackets. For example:

          [^0-9]

     matches any character that is not a digit.

`|'
     This is the "alternation operator" and it is used to specify alternatives. For example:

          ^P|[0-9]

     matches any string that matches either `^P' or `[0-9]'. This means it matches any string that contains a digit or starts with `P'.

     The alternation applies to the largest possible regexps on either side.

`(...)'
     Parentheses are used for grouping in regular expressions as in arithmetic. They can be used to concatenate regular expressions containing the alternation operator, `|'.

`*'
     This symbol means that the preceding regular expression is to be repeated as many times as possible to find a match. For example:

          ph*

     applies the `*' symbol to the preceding `h' and looks for matches to one `p' followed by any number of `h's. This will also match just `p' if no `h's are present.

     The `*' repeats the *smallest* possible preceding expression. (Use parentheses if you wish to repeat a larger expression.) It finds as many repetitions as possible.
     For example:

          awk '/\(c[ad][ad]*r x\)/ { print }' sample

     prints every record in the input containing a string of the form `(car x)', `(cdr x)', `(cadr x)', and so on.

`+'
     This symbol is similar to `*', but the preceding expression must be matched at least once. This means that:

          wh+y

     would match `why' and `whhy' but not `wy', whereas `wh*y' would match all three of these strings. This is a simpler way of writing the last `*' example:

          awk '/\(c[ad]+r x\)/ { print }' sample

`?'
     This symbol is similar to `*', but the preceding expression can be matched once or not at all. For example:

          fe?d

     will match `fed' or `fd', but nothing else.

`\'
     This is used to suppress the special meaning of a character when matching. For example:

          \$

     matches the character `$'.

     The escape sequences used for string constants (*note Constants::.) are valid in regular expressions as well; they are also introduced by a `\'.

In regular expressions, the `*', `+', and `?' operators have the highest precedence, followed by concatenation, and finally by `|'. As in arithmetic, parentheses can change how operators are grouped.

File: gawk-info, Node: Case-sensitivity, Prev: Regexp Operators, Up: Regexp

Case-sensitivity in Matching
----------------------------

Case is normally significant in regular expressions, both when matching ordinary characters (i.e., not metacharacters), and inside character sets. Thus a `w' in a regular expression matches only a lower case `w' and not an upper case `W'.

The simplest way to do a case-independent match is to use a character set: `[Ww]'. However, this can be cumbersome if you need to use it often; and it can make the regular expressions harder for humans to read. There are two other alternatives that you might prefer.

One way to do a case-insensitive match at a particular point in the program is to convert the data to a single case, using the `tolower' or `toupper' built-in string functions (which we haven't discussed yet; *note String Functions::.).
For example:

     tolower($1) ~ /foo/ { ... }

converts the first field to lower case before matching against it.

Another method is to set the variable `IGNORECASE' to a nonzero value (*note Built-in Variables::.). When `IGNORECASE' is not zero, *all* regexp operations ignore case. Changing the value of `IGNORECASE' dynamically controls the case sensitivity of your program as it runs. Case is significant by default because `IGNORECASE' (like most variables) is initialized to zero.

     x = "aB"
     if (x ~ /ab/) ...   # this test will fail

     IGNORECASE = 1

     if (x ~ /ab/) ...   # now it will succeed

You cannot generally use `IGNORECASE' to make certain rules case-insensitive and other rules case-sensitive, because there is no way to set `IGNORECASE' just for the pattern of a particular rule. To do this, you must use character sets or `tolower'. However, one thing you can do only with `IGNORECASE' is turn case-sensitivity on or off dynamically for all the rules at once.

`IGNORECASE' can be set on the command line, or in a `BEGIN' rule. Setting `IGNORECASE' from the command line is a way to make a program case-insensitive without having to edit it.

The value of `IGNORECASE' has no effect if `gawk' is in compatibility mode (*note Command Line::.). Case is always significant in compatibility mode.

File: gawk-info, Node: Comparison Patterns, Next: Boolean Patterns, Prev: Regexp, Up: Patterns

Comparison Expressions as Patterns
==================================

"Comparison patterns" test relationships such as equality between two strings or numbers. They are a special case of expression patterns (*note Expression Patterns::.). They are written with "relational operators", which are a superset of those in C. Here is a table of them:

`X < Y'
     True if X is less than Y.

`X <= Y'
     True if X is less than or equal to Y.

`X > Y'
     True if X is greater than Y.

`X >= Y'
     True if X is greater than or equal to Y.

`X == Y'
     True if X is equal to Y.

`X != Y'
     True if X is not equal to Y.
`X ~ Y'
     True if X matches the regular expression described by Y.

`X !~ Y'
     True if X does not match the regular expression described by Y.

The operands of a relational operator are compared as numbers if they are both numbers. Otherwise they are converted to, and compared as, strings (*note Conversion::.). Strings are compared by comparing the first character of each, then the second character of each, and so on, until there is a difference. If the two strings are equal until the shorter one runs out, the shorter one is considered to be less than the longer one. Thus, `"10"' is less than `"9"'.

The left operand of the `~' and `!~' operators is a string. The right operand is either a constant regular expression enclosed in slashes (`/REGEXP/'), or any expression, whose string value is used as a dynamic regular expression (*note Regexp Usage::.).

The following example prints the second field of each input record whose first field is precisely `foo'.

     awk '$1 == "foo" { print $2 }' BBS-list

Contrast this with the following regular expression match, which would accept any record with a first field that contains `foo':

     awk '$1 ~ "foo" { print $2 }' BBS-list

or, equivalently, this one:

     awk '$1 ~ /foo/ { print $2 }' BBS-list

File: gawk-info, Node: Boolean Patterns, Next: Expression Patterns, Prev: Comparison Patterns, Up: Patterns

Boolean Operators and Patterns
==============================

A "boolean pattern" is an expression which combines other patterns using the "boolean operators" ``or'' (`||'), ``and'' (`&&'), and ``not'' (`!'). Whether the boolean pattern matches an input record depends on whether its subpatterns match.

For example, the following command prints all records in the input file `BBS-list' that contain both `2400' and `foo'.

     awk '/2400/ && /foo/' BBS-list

The following command prints all records in the input file `BBS-list' that contain *either* `2400' or `foo', or both.

     awk '/2400/ || /foo/' BBS-list

The following command prints all records in the input file `BBS-list' that do *not* contain the string `foo'.

     awk '! /foo/' BBS-list

Note that boolean patterns are a special case of expression patterns (*note Expression Patterns::.); they are expressions that use the boolean operators. For complete information on the boolean operators, see *Note Boolean Ops::.

The subpatterns of a boolean pattern can be constant regular expressions, comparisons, or any other `gawk' expressions. Range patterns are not expressions, so they cannot appear inside boolean patterns. Likewise, the special patterns `BEGIN' and `END', which never match any input record, are not expressions and cannot appear inside boolean patterns.

File: gawk-info, Node: Expression Patterns, Next: Ranges, Prev: Boolean Patterns, Up: Patterns

Expressions as Patterns
=======================

Any `awk' expression is also valid as a pattern in `gawk'. The pattern then ``matches'' if the expression's value is nonzero (if a number) or nonnull (if a string).

The expression is reevaluated each time the rule is tested against a new input record. If the expression uses fields such as `$1', the value depends directly on the new input record's text; otherwise, it depends only on what has happened so far in the execution of the `awk' program, but that may still be useful.

Comparison patterns are actually a special case of this. For example, the expression `$5 == "foo"' has the value 1 when the value of `$5' equals `"foo"', and 0 otherwise; therefore, this expression as a pattern matches when the two values are equal.

Boolean patterns are also special cases of expression patterns.

A constant regexp as a pattern is also a special case of an expression pattern. `/foo/' as an expression has the value 1 if `foo' appears in the current input record; thus, as a pattern, `/foo/' matches any record containing `foo'.
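To make the idea of an expression pattern concrete, here is a small sketch run against invented input. The `sample_records' helper and its four records are assumptions made up for this illustration; the sketch also assumes an `awk' that accepts arbitrary expressions as patterns, as `gawk' does.

```shell
# Hypothetical sample data: a name in field 1 and, usually, a number in field 2.
sample_records() {
    printf '%s\n' 'a 1' 'b 0' 'c 3' 'd'
}

# An arithmetic expression that does not use the fields at all:
# NR % 2 is nonzero for odd-numbered records, so those records match.
odd=$(sample_records | awk 'NR % 2 { print $1 }')

# An arithmetic expression that uses a field:
# $2 - 1 is nonzero unless the second field equals 1.
not_one=$(sample_records | awk '$2 - 1 { print $1 }')

# A boolean combination of a constant regexp and a comparison.
both=$(sample_records | awk '/c/ && $2 > 2 { print $1 }')

printf '%s\n' "$odd" "$not_one" "$both"
```

In the second command the record `d' has no second field, so `$2' is the null string, which counts as zero in arithmetic; the expression is then -1, which is nonzero, and the record matches.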
Other implementations of `awk' are less general than `gawk': they allow comparison expressions, and boolean combinations thereof (optionally with parentheses), but not necessarily other kinds of expressions.

File: gawk-info, Node: Ranges, Next: BEGIN/END, Prev: Expression Patterns, Up: Patterns

Specifying Record Ranges With Patterns
======================================

A "range pattern" is made of two patterns separated by a comma, of the form `BEGPAT, ENDPAT'. It matches ranges of consecutive input records. The first pattern BEGPAT controls where the range begins, and the second one ENDPAT controls where it ends. For example,

     awk '$1 == "on", $1 == "off"'

prints every record between `on'/`off' pairs, inclusive.

In more detail, a range pattern starts out by matching BEGPAT against every input record; when a record matches BEGPAT, the range pattern becomes "turned on". The range pattern matches this record. As long as it stays turned on, it automatically matches every input record read. But meanwhile, it also matches ENDPAT against every input record, and when that succeeds, the range pattern is turned off again for the following record. Now it goes back to checking BEGPAT against each record.

The record that turns on the range pattern and the one that turns it off both match the range pattern. If you don't want to operate on these records, you can write `if' statements in the rule's action to distinguish them.

It is possible for a pattern to be turned both on and off by the same record, if both conditions are satisfied by that record. Then the action is executed for just that record.

File: gawk-info, Node: BEGIN/END, Prev: Ranges, Up: Patterns

`BEGIN' and `END' Special Patterns
==================================

`BEGIN' and `END' are special patterns. They are not used to match input records. Rather, they are used for supplying start-up or clean-up information to your `awk' script.
A `BEGIN' rule is executed, once, before the first input record has been read. An `END' rule is executed, once, after all the input has been read. For example:

     awk 'BEGIN { print "Analysis of \"foo\"" }
          /foo/ { ++foobar }
          END   { print "\"foo\" appears in " foobar " records." }' BBS-list

This program finds out how many records in the input file `BBS-list' contain the string `foo'. The `BEGIN' rule prints a title for the report. There is no need to use the `BEGIN' rule to initialize the counter `foobar' to zero, as `awk' does this for us automatically (*note Variables::.).

The second rule increments the variable `foobar' every time a record containing the pattern `foo' is read. The `END' rule prints the value of `foobar' at the end of the run.

The special patterns `BEGIN' and `END' cannot be used in ranges or with boolean operators.

An `awk' program may have multiple `BEGIN' and/or `END' rules. They are executed in the order they appear, all the `BEGIN' rules at start-up and all the `END' rules at termination.

Multiple `BEGIN' and `END' sections are useful for writing library functions, since each library can have its own `BEGIN' or `END' rule to do its own initialization and/or cleanup. Note that the order in which library functions are named on the command line controls the order in which their `BEGIN' and `END' rules are executed. Therefore you have to be careful to write such rules in library files so that it doesn't matter what order they are executed in. *Note Command Line::, for more information on using library functions.

If an `awk' program only has a `BEGIN' rule, and no other rules, then the program exits after the `BEGIN' rule has been run. (Older versions of `awk' used to keep reading and ignoring input until end of file was seen.) However, if an `END' rule exists as well, then the input will be read, even if there are no other rules in the program. This is necessary in case the `END' rule checks the `NR' variable.
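The behavior just described can be checked with three short commands. This is a minimal sketch; the input words and the shell variable names are invented for illustration.

```shell
# BEGIN-only program: awk exits after the BEGIN rule without reading input.
begin_only=$(awk 'BEGIN { print "start" }' </dev/null)

# END-only program: the input is still read, so NR holds the record count.
count=$(printf '%s\n' one two three | awk 'END { print NR }')

# Multiple BEGIN rules run in the order in which they appear.
ordered=$(awk 'BEGIN { printf "first " } BEGIN { print "second" }' </dev/null)

printf '%s\n' "$begin_only" "$count" "$ordered"
```

The second command is a common way to count the records in a file: it has no rule that matches any record, yet `NR' is correct in the `END' rule because the input was read anyway.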
`BEGIN' and `END' rules must have actions; there is no default action for these rules since there is no current record when they run.

File: gawk-info, Node: Actions, Next: Expressions, Prev: Patterns, Up: Top

Actions: Overview
*****************

An `awk' "program" or "script" consists of a series of "rules" and function definitions, interspersed. (Functions are described later; see *Note User-defined::.)

A rule contains a pattern and an "action", either of which may be omitted. The purpose of the action is to tell `awk' what to do once a match for the pattern is found. Thus, the entire program looks somewhat like this:

     [PATTERN] [{ ACTION }]
     [PATTERN] [{ ACTION }]
     ...
     function NAME (ARGS) { ... }
     ...

An action consists of one or more `awk' "statements", enclosed in curly braces (`{' and `}'). Each statement specifies one thing to be done. The statements are separated by newlines or semicolons.

The curly braces around an action must be used even if the action contains only one statement, or even if it contains no statements at all. However, if you omit the action entirely, omit the curly braces as well. (An omitted action is equivalent to `{ print $0 }'.)

Here are the kinds of statement supported in `awk':

   * Expressions, which can call functions or assign values to variables (*note Expressions::.). Executing this kind of statement simply computes the value of the expression and then ignores it. This is useful when the expression has side effects (*note Assignment Ops::.).

   * Control statements, which specify the control flow of `awk' programs. The `awk' language gives you C-like constructs (`if', `for', `while', and so on) as well as a few special ones (*note Statements::.).

   * Compound statements, which consist of one or more statements enclosed in curly braces. A compound statement is used in order to put several statements together in the body of an `if', `while', `do' or `for' statement.

   * Input control, using the `getline' function (*note Getline::.), and the `next' statement (*note Next Statement::.).

   * Output statements, `print' and `printf'. *Note Printing::.

   * Deletion statements, for deleting array elements. *Note Delete::.

File: gawk-info, Node: Expressions, Next: Statements, Prev: Actions, Up: Top

Actions: Expressions
********************

Expressions are the basic building block of `awk' actions. An expression evaluates to a value, which you can print, test, store in a variable or pass to a function.

But, beyond that, an expression can assign a new value to a variable or a field, with an assignment operator.

An expression can serve as a statement on its own. Most other kinds of statement contain one or more expressions which specify data to be operated on. As in other languages, expressions in `awk' include variables, array references, constants, and function calls, as well as combinations of these with various operators.

* Menu:

* Constants::        String, numeric, and regexp constants.
* Variables::        Variables give names to values for later use.
* Arithmetic Ops::   Arithmetic operations (`+', `-', etc.)
* Concatenation::    Concatenating strings.
* Comparison Ops::   Comparison of numbers and strings with `<', etc.
* Boolean Ops::      Combining comparison expressions using boolean operators `||' (``or''), `&&' (``and'') and `!' (``not'').
* Assignment Ops::   Changing the value of a variable or a field.
* Increment Ops::    Incrementing the numeric value of a variable.
* Conversion::       The conversion of strings to numbers and vice versa.
* Conditional Exp::  Conditional expressions select between two subexpressions under control of a third subexpression.
* Function Calls::   A function call is an expression.
* Precedence::       How various operators nest.
File: gawk-info, Node: Constants, Next: Variables, Prev: Expressions, Up: Expressions

Constant Expressions
====================

The simplest type of expression is the "constant", which always has the same value. There are three types of constant: numeric constants, string constants, and regular expression constants.

A "numeric constant" stands for a number. This number can be an integer, a decimal fraction, or a number in scientific (exponential) notation. Note that all numeric values are represented within `awk' in double-precision floating point. Here are some examples of numeric constants, which all have the same value:

     105
     1.05e+2
     1050e-1

A string constant consists of a sequence of characters enclosed in double-quote marks. For example:

     "parrot"

represents the string whose contents are `parrot'. Strings in `gawk' can be of any length and they can contain all the possible 8-bit ASCII characters including ASCII NUL. Other `awk' implementations may have difficulty with some character codes.

Some characters cannot be included literally in a string constant. You represent them instead with "escape sequences", which are character sequences beginning with a backslash (`\').

One use of an escape sequence is to include a double-quote character in a string constant. Since a plain double-quote would end the string, you must use `\"' to represent a single double-quote character as a part of the string. Backslash itself is another character that can't be included normally; you write `\\' to put one backslash in the string. Thus, the string whose contents are the two characters `"\' must be written `"\"\\"'.

Another use of backslash is to represent unprintable characters such as newline. While there is nothing to stop you from writing most of these characters directly in a string constant, they may look ugly.

Here is a table of all the escape sequences used in `awk':

`\\'
     Represents a literal backslash, `\'.
`\a'
     Represents the ``alert'' character, control-g, ASCII code 7.

`\b'
     Represents a backspace, control-h, ASCII code 8.

`\f'
     Represents a formfeed, control-l, ASCII code 12.

`\n'
     Represents a newline, control-j, ASCII code 10.

`\r'
     Represents a carriage return, control-m, ASCII code 13.

`\t'
     Represents a horizontal tab, control-i, ASCII code 9.

`\v'
     Represents a vertical tab, control-k, ASCII code 11.

`\NNN'
     Represents the octal value NNN, where NNN are one to three digits between 0 and 7. For example, the code for the ASCII ESC (escape) character is `\033'.

`\xHH...'
     Represents the hexadecimal value HH, where HH are hexadecimal digits (`0' through `9' and either `A' through `F' or `a' through `f'). Like the same construct in ANSI C, the escape sequence continues until the first non-hexadecimal digit is seen. However, using more than two hexadecimal digits produces undefined results.

A constant regexp is a regular expression description enclosed in slashes, such as `/^beginning and end$/'. Most regexps used in `awk' programs are constant, but the `~' and `!~' operators can also match computed or ``dynamic'' regexps (*note Regexp Usage::.).

Constant regexps are useful only with the `~' and `!~' operators; you cannot assign them to variables or print them. They are not truly expressions in the usual sense.

File: gawk-info, Node: Variables, Next: Arithmetic Ops, Prev: Constants, Up: Expressions

Variables
=========

Variables let you give names to values and refer to them later. You have already seen variables in many of the examples. The name of a variable must be a sequence of letters, digits and underscores, but it may not begin with a digit. Case is significant in variable names; `a' and `A' are distinct variables.

A variable name is a valid expression by itself; it represents the variable's current value. Variables are given new values with "assignment operators" and "increment operators". *Note Assignment Ops::.

A few variables have special built-in meanings, such as `FS', the field separator, and `NF', the number of fields in the current input record. *Note Built-in Variables::, for a list of them. These built-in variables can be used and assigned just like all other variables, but their values are also used or changed automatically by `awk'. Each built-in variable's name is made entirely of upper case letters.

Variables in `awk' can be assigned either numeric values or string values. By default, variables are initialized to the null string, which is effectively zero if converted to a number. So there is no need to ``initialize'' each variable explicitly in `awk', the way you would need to do in C or most other traditional programming languages.

* Menu:

* Assignment Options::  Setting variables on the command line and a summary of command line syntax. This is an advanced method of input.

File: gawk-info, Node: Assignment Options, Prev: Variables, Up: Variables

Assigning Variables on the Command Line
---------------------------------------

You can set any `awk' variable by including a "variable assignment" among the arguments on the command line when you invoke `awk' (*note Command Line::.). Such an assignment has this form:

     VARIABLE=TEXT

With it, you can set a variable either at the beginning of the `awk' run or in between input files.

If you precede the assignment with the `-v' option, like this:

     -v VARIABLE=TEXT

then the variable is set at the very beginning, before even the `BEGIN' rules are run. The `-v' option and its assignment must precede all the file name arguments.

Otherwise, the variable assignment is performed at a time determined by its position among the input file arguments: after the processing of the preceding input file argument. For example:

     awk '{ print $n }' n=4 inventory-shipped n=2 BBS-list

prints the value of field number `n' for all input records. Before the first file is read, the command line sets the variable `n' equal to 4.
This causes the fourth field to be printed in lines from the file `inventory-shipped'. After the first file has finished, but before the second file is started, `n' is set to 2, so that the second field is printed in lines from `BBS-list'.

Command line arguments are made available for explicit examination by the `awk' program in an array named `ARGV' (*note Built-in Variables::.).

File: gawk-info, Node: Arithmetic Ops, Next: Concatenation, Prev: Variables, Up: Expressions

Arithmetic Operators
====================

The `awk' language uses the common arithmetic operators when evaluating expressions. All of these arithmetic operators follow normal precedence rules, and work as you would expect them to. This example divides field three by field four, adds field two, stores the result into field one, and prints the resulting altered input record:

     awk '{ $1 = $2 + $3 / $4; print }' inventory-shipped

The arithmetic operators in `awk' are:

`X + Y'
     Addition.

`X - Y'
     Subtraction.

`- X'
     Negation.

`X * Y'
     Multiplication.

`X / Y'
     Division. Since all numbers in `awk' are double-precision floating point, the result is not rounded to an integer: `3 / 4' has the value 0.75.

`X % Y'
     Remainder. The quotient is rounded toward zero to an integer, multiplied by Y and this result is subtracted from X. This operation is sometimes known as ``trunc-mod''. The following relation always holds:

          b * int(a / b) + (a % b) == a

     One undesirable effect of this definition of remainder is that `X % Y' is negative if X is negative. Thus,

          -17 % 8 = -1

     In other `awk' implementations, the signedness of the remainder may be machine dependent.

`X ^ Y'
`X ** Y'
     Exponentiation: X raised to the Y power. `2 ^ 3' has the value 8. The character sequence `**' is equivalent to `^'.

File: gawk-info, Node: Concatenation, Next: Comparison Ops, Prev: Arithmetic Ops, Up: Expressions

String Concatenation
====================

There is only one string operation: concatenation.
It does not have a specific operator to represent it. Instead, concatenation is performed by writing expressions next to one another, with no operator. For example: awk '{ print "Field number one: " $1 }' BBS-list produces, for the first record in `BBS-list': Field number one: aardvark Without the space in the string constant after the `:', the line would run together. For example: awk '{ print "Field number one:" $1 }' BBS-list produces, for the first record in `BBS-list': Field number one:aardvark Since string concatenation does not have an explicit operator, it is often necessary to ensure that it happens where you want it to by enclosing the items to be concatenated in parentheses. For example, the following code fragment does not concatenate `file' and `name' as you might expect: file = "file" name = "name" print "something meaningful" > file name It is necessary to use the following: print "something meaningful" > (file name) We recommend you use parentheses around concatenation in all but the most common contexts (such as in the right-hand operand of `='). ▶1f◀ File: gawk-info, Node: Comparison Ops, Next: Boolean Ops, Prev: Concatenation, Up: Expressions Comparison Expressions ====================== "Comparison expressions" compare strings or numbers for relationships such as equality. They are written using "relational operators", which are a superset of those in C. Here is a table of them: `X < Y' True if X is less than Y. `X <= Y' True if X is less than or equal to Y. `X > Y' True if X is greater than Y. `X >= Y' True if X is greater than or equal to Y. `X == Y' True if X is equal to Y. `X != Y' True if X is not equal to Y. `X ~ Y' True if the string X matches the regexp denoted by Y. `X !~ Y' True if the string X does not match the regexp denoted by Y. `SUBSCRIPT in ARRAY' True if array ARRAY has an element with the subscript SUBSCRIPT. Comparison expressions have the value 1 if true and 0 if false.
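As a small sketch of the table above, the following command stores the results of a few comparisons in variables and prints them, showing that each comparison yields 1 or 0. The variable and array names (`num_lt', `str_lt', `has_x', `has_y', `seen') are made up for this illustration; the program runs entirely in a `BEGIN' rule, so it needs no input file.

```shell
out=$(awk 'BEGIN {
    seen["x"] = 1                  # hypothetical array with one element
    num_lt = (2 < 10)              # both operands are numbers: numeric comparison
    str_lt = ("abc" < "abd")       # both operands are strings: string comparison
    has_x  = ("x" in seen)         # subscript "x" exists in the array
    has_y  = ("y" in seen)         # subscript "y" does not exist
    print num_lt, str_lt, has_x, has_y
}')
echo "$out"
```

Because the results are ordinary numbers, they can be stored, printed, or used in arithmetic like any other value.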
The operands of a relational operator are compared as numbers if they are both numbers. Otherwise they are converted to, and compared as, strings (*note Conversion::.). Strings are compared by comparing the first character of each, then the second character of each, and so on. Thus, `"10"' is less than `"9"'. For example, $1 == "foo" has the value of 1, or is true, if the first field of the current input record is precisely `foo'. By contrast, $1 ~ /foo/ has the value 1 if the first field contains `foo'. The right hand operand of the `~' and `!~' operators may be either a constant regexp (`/.../'), or it may be an ordinary expression, in which case the value of the expression as a string is a dynamic regexp (*note Regexp Usage::.). In very recent implementations of `awk', a constant regular expression in slashes by itself is also an expression. The regexp `/REGEXP/' is an abbreviation for this comparison expression: $0 ~ /REGEXP/ In some contexts it may be necessary to write parentheses around the regexp to avoid confusing the `gawk' parser. For example, `(/x/ - /y/) > threshold' is not allowed, but `((/x/) - (/y/)) > threshold' parses properly. One special place where `/foo/' is *not* an abbreviation for `$0 ~ /foo/' is when it is the right-hand operand of `~' or `!~'! ▶1f◀ File: gawk-info, Node: Boolean Ops, Next: Assignment Ops, Prev: Comparison Ops, Up: Expressions Boolean Expressions =================== A "boolean expression" is a combination of comparison expressions or matching expressions, using the "boolean operators" ``or'' (`||'), ``and'' (`&&'), and ``not'' (`!'), along with parentheses to control nesting. The truth of the boolean expression is computed by combining the truth values of the component expressions. Boolean expressions can be used wherever comparison and matching expressions can be used. They can be used in `if' and `while' statements.
They have numeric values (1 if true, 0 if false), which come into play if the result of the boolean expression is stored in a variable, or used in arithmetic. In addition, every boolean expression is also a valid boolean pattern, so you can use it as a pattern to control the execution of rules. Here are descriptions of the three boolean operators, with an example of each. It may be instructive to compare these examples with the analogous examples of boolean patterns (*note Boolean Patterns::.), which use the same boolean operators in patterns instead of expressions. `BOOLEAN1 && BOOLEAN2' True if both BOOLEAN1 and BOOLEAN2 are true. For example, the following statement prints the current input record if it contains both `2400' and `foo'. if ($0 ~ /2400/ && $0 ~ /foo/) print The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is true. This can make a difference when BOOLEAN2 contains expressions that have side effects: in the case of `$0 ~ /foo/ && ($2 == bar++)', the variable `bar' is not incremented if there is no `foo' in the record. `BOOLEAN1 || BOOLEAN2' True if at least one of BOOLEAN1 and BOOLEAN2 is true. For example, the following command prints all records in the input file `BBS-list' that contain *either* `2400' or `foo', or both. awk '{ if ($0 ~ /2400/ || $0 ~ /foo/) print }' BBS-list The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is false. This can make a difference when BOOLEAN2 contains expressions that have side effects. `!BOOLEAN' True if BOOLEAN is false. For example, the following program prints all records in the input file `BBS-list' that do *not* contain the string `foo'. awk '{ if (! ($0 ~ /foo/)) print }' BBS-list ▶1f◀ File: gawk-info, Node: Assignment Ops, Next: Increment Ops, Prev: Boolean Ops, Up: Expressions Assignment Expressions ====================== An "assignment" is an expression that stores a new value into a variable.
For example, let's assign the value 1 to the variable `z': z = 1 After this expression is executed, the variable `z' has the value 1. Whatever old value `z' had before the assignment is forgotten. Assignments can store string values also. For example, this would store the value `"this food is good"' in the variable `message': thing = "food" predicate = "good" message = "this " thing " is " predicate (This also illustrates concatenation of strings.) The `=' sign is called an "assignment operator". It is the simplest assignment operator because the value of the right-hand operand is stored unchanged. Most operators (addition, concatenation, and so on) have no effect except to compute a value. If you ignore the value, you might as well not use the operator. An assignment operator is different; it does produce a value, but even if you ignore the value, the assignment still makes itself felt through the alteration of the variable. We call this a "side effect". The left-hand operand of an assignment need not be a variable (*note Variables::.); it can also be a field (*note Changing Fields::.) or an array element (*note Arrays::.). These are all called "lvalues", which means they can appear on the left-hand side of an assignment operator. The right-hand operand may be any expression; it produces the new value which the assignment stores in the specified variable, field or array element. It is important to note that variables do *not* have permanent types. The type of a variable is simply the type of whatever value it happens to hold at the moment. In the following program fragment, the variable `foo' has a numeric value at first, and a string value later on: foo = 1 print foo foo = "bar" print foo When the second assignment gives `foo' a string value, the fact that it previously had a numeric value is forgotten. An assignment is an expression, so it has a value: the same value that is assigned. Thus, `z = 1' as an expression has the value 1. 
One consequence of this is that you can write multiple assignments together: x = y = z = 0 stores the value 0 in all three variables. It does this because the value of `z = 0', which is 0, is stored into `y', and then the value of `y = z = 0', which is 0, is stored into `x'. You can use an assignment anywhere an expression is called for. For example, it is valid to write `x != (y = 1)' to set `y' to 1 and then test whether `x' equals 1. But this style tends to make programs hard to read; except in a one-shot program, you should rewrite it to get rid of such nesting of assignments. This is never very hard. Aside from `=', there are several other assignment operators that do arithmetic with the old value of the variable. For example, the operator `+=' computes a new value by adding the right-hand value to the old value of the variable. Thus, the following assignment adds 5 to the value of `foo': foo += 5 This is precisely equivalent to the following: foo = foo + 5 Use whichever one makes the meaning of your program clearer. Here is a table of the arithmetic assignment operators. In each case, the right-hand operand is an expression whose value is converted to a number. `LVALUE += INCREMENT' Adds INCREMENT to the value of LVALUE to make the new value of LVALUE. `LVALUE -= DECREMENT' Subtracts DECREMENT from the value of LVALUE. `LVALUE *= COEFFICIENT' Multiplies the value of LVALUE by COEFFICIENT. `LVALUE /= QUOTIENT' Divides the value of LVALUE by QUOTIENT. `LVALUE %= MODULUS' Sets LVALUE to its remainder by MODULUS. `LVALUE ^= POWER' `LVALUE **= POWER' Raises LVALUE to the power POWER. ▶1f◀ File: gawk-info, Node: Increment Ops, Next: Conversion, Prev: Assignment Ops, Up: Expressions Increment Operators =================== "Increment operators" increase or decrease the value of a variable by 1. 
You could do the same thing with an assignment operator, so the increment operators add no power to the `awk' language; but they are convenient abbreviations for something very common. The operator to add 1 is written `++'. It can be used to increment a variable either before or after taking its value. To pre-increment a variable V, write `++V'. This adds 1 to the value of V and that new value is also the value of this expression. The assignment expression `V += 1' is completely equivalent. Writing the `++' after the variable specifies post-increment. This increments the variable value just the same; the difference is that the value of the increment expression itself is the variable's *old* value. Thus, if `foo' has value 4, then the expression `foo++' has the value 4, but it changes the value of `foo' to 5. The post-increment `foo++' is nearly equivalent to writing `(foo += 1) - 1'. It is not perfectly equivalent because all numbers in `awk' are floating point: in floating point, `foo + 1 - 1' does not necessarily equal `foo'. But the difference is minute as long as you stick to numbers that are fairly small (less than a trillion). Any lvalue can be incremented. Fields and array elements are incremented just like variables. The decrement operator `--' works just like `++' except that it subtracts 1 instead of adding. Like `++', it can be used before the lvalue to pre-decrement or after it to post-decrement. Here is a summary of increment and decrement expressions. `++LVALUE' This expression increments LVALUE and the new value becomes the value of this expression. `LVALUE++' This expression causes the contents of LVALUE to be incremented. The value of the expression is the *old* value of LVALUE. `--LVALUE' Like `++LVALUE', but instead of adding, it subtracts. It decrements LVALUE and delivers the value that results. `LVALUE--' Like `LVALUE++', but instead of adding, it subtracts. It decrements LVALUE. The value of the expression is the *old* value of LVALUE. 
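The difference between the pre- and post- forms summarized above can be seen in a short sketch. The names `post', `pre', `old' and `arr' are invented for this illustration, and the starting value 4 for `foo' simply follows the example in the text; the program runs in a `BEGIN' rule and reads no input.

```shell
out=$(awk 'BEGIN {
    foo = 4
    post = foo++          # value of the expression is the old value, 4; foo becomes 5
    pre  = ++foo          # foo becomes 6, and 6 is also the value of the expression
    arr["k"] = 10         # array elements are lvalues too
    old = arr["k"]--      # old receives 10; the element becomes 9
    print post, pre, foo, old, arr["k"]
}')
echo "$out"
```

The first two numbers printed differ only because of where the `++' was written; in both cases `foo' itself was increased by exactly 1.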
▶1f◀ File: gawk-info, Node: Conversion, Next: Conditional Exp, Prev: Increment Ops, Up: Expressions Conversion of Strings and Numbers ================================= Strings are converted to numbers, and numbers to strings, if the context of the `awk' program demands it. For example, if the value of either `foo' or `bar' in the expression `foo + bar' happens to be a string, it is converted to a number before the addition is performed. If numeric values appear in string concatenation, they are converted to strings. Consider this: two = 2; three = 3 print (two three) + 4 This eventually prints the (numeric) value 27. The numeric values of the variables `two' and `three' are converted to strings and concatenated together, and the resulting string is converted back to the number 23, to which 4 is then added. If, for some reason, you need to force a number to be converted to a string, concatenate the null string with that number. To force a string to be converted to a number, add zero to that string. Strings are converted to numbers by interpreting them as numerals: `"2.5"' converts to 2.5, and `"1e3"' converts to 1000. Strings that can't be interpreted as valid numbers are converted to zero. The exact manner in which numbers are converted into strings is controlled by the `awk' built-in variable `OFMT' (*note Built-in Variables::.). Numbers are converted using a special version of the `sprintf' function (*note Built-in::.) with `OFMT' as the format specifier. `OFMT''s default value is `"%.6g"', which prints a value with at most six significant digits. For some applications you will want to change it to specify more precision. Double precision on most modern machines gives you 16 or 17 decimal digits of precision. Strange results can happen if you set `OFMT' to a string that doesn't tell `sprintf' how to format floating point numbers in a useful way. For example, if you forget the `%' in the format, all numbers will be converted to the same constant string.
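The conversions described in this section can be tried out with a sketch like the following. The variable names (`sum', `a', `b', `c', `x') and the sample values are chosen only for illustration, and the format `"%.2f"' is just one possible setting of `OFMT'. The results are stored in variables before printing so that the `print' statements stay unambiguous.

```shell
out=$(awk 'BEGIN {
    two = 2; three = 3
    sum = (two three) + 4      # "23" converted back to the number 23, plus 4
    a = "2.5" + 0              # adding zero forces string-to-number conversion
    b = "1e3" + 0              # exponent notation is understood
    c = "abc" + 0              # not a valid number, so it converts to zero
    print sum, a, b, c
    x = 3.1415926535
    print x                    # formatted with the default OFMT, "%.6g"
    OFMT = "%.2f"
    print x                    # formatted with the new OFMT
}')
echo "$out"
```

With the default `OFMT' the value of `x' is printed with six significant digits; after the assignment to `OFMT' it is printed with two digits after the decimal point.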
▶1f◀ File: gawk-info, Node: Conditional Exp, Next: Function Calls, Prev: Conversion, Up: Expressions Conditional Expressions ======================= A "conditional expression" is a special kind of expression with three operands. It allows you to use one expression's value to select one of two other expressions. The conditional expression looks the same as in the C language: SELECTOR ? IF-TRUE-EXP : IF-FALSE-EXP There are three subexpressions. The first, SELECTOR, is always computed first. If it is ``true'' (not zero) then IF-TRUE-EXP is computed next and its value becomes the value of the whole expression. Otherwise, IF-FALSE-EXP is computed next and its value becomes the value of the whole expression. For example, this expression produces the absolute value of `x': x > 0 ? x : -x Each time the conditional expression is computed, exactly one of IF-TRUE-EXP and IF-FALSE-EXP is computed; the other is ignored. This is important when the expressions contain side effects. For example, this conditional expression examines element `i' of either array `a' or array `b', and increments `i'. x == y ? a[i++] : b[i++] This is guaranteed to increment `i' exactly once, because each time one or the other of the two increment expressions is executed, and the other is not.
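Both examples from this section can be combined into one small sketch. The values given to `x', `y', `i' and the array elements are assumptions made up for the demonstration, as are the names `abs' and `picked'; the program runs in a `BEGIN' rule and reads no input.

```shell
out=$(awk 'BEGIN {
    x = -7
    abs = (x > 0 ? x : -x)                 # absolute value of x
    y = x                                  # make the selector below true
    i = 1
    a[1] = "from a"; b[1] = "from b"
    picked = (x == y ? a[i++] : b[i++])    # only the a[i++] branch is computed
    print abs, picked, i
}')
echo "$out"
```

The final value of `i' is 2, not 3, which shows that only one of the two increment expressions was executed.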