
⟦3a3441689⟧ TextFile

    Length: 49904 (0xc2f0)
    Types: TextFile
    Names: »gawk-info-3«

Derivation

└─⟦9ae75bfbd⟧ Bits:30007242 EUUGD3: Starter Kit
    └─⟦f133efdaf⟧ »EurOpenD3/gnu/gawk/gawk-doc-2.11.tar.Z« 
└─⟦a05ed705a⟧ Bits:30007078 DKUUG GNU 2/12/89
    └─⟦f133efdaf⟧ »./gawk-doc-2.11.tar.Z« 
        └─⟦8f64183b0⟧ 
            └─⟦this⟧ »gawk-2.11-doc/gawk-info-3« 

TextFile

Info file gawk-info, produced by Makeinfo, -*- Text -*- from input
file gawk.texinfo.

This file documents `awk', a program that you can use to select
particular records in a file and perform operations upon them.

Copyright (C) 1989 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.


▶1f◀
File: gawk-info,  Node: Patterns,  Next: Actions,  Prev: One-liners,  Up: Top

Patterns
********

Patterns in `awk' control the execution of rules: a rule is executed
when its pattern matches the current input record.  This chapter
tells all about how to write patterns.


* Menu:

* Kinds of Patterns::    A list of all kinds of patterns.
                         The following subsections describe them in detail.

* Empty::                The empty pattern, which matches every record.

* Regexp::               Regular expressions such as `/foo/'.

* Comparison Patterns::  Comparison expressions such as `$1 > 10'.

* Boolean Patterns::     Combining comparison expressions.

* Expression Patterns::  Any expression can be used as a pattern.

* Ranges::               Using pairs of patterns to specify record ranges.

* BEGIN/END::            Specifying initialization and cleanup rules.

 
▶1f◀
File: gawk-info,  Node: Kinds of Patterns,  Next: Empty,  Prev: Patterns,  Up: Patterns

Kinds of Patterns
=================

Here is a summary of the types of patterns supported in `awk'.

`/REGULAR EXPRESSION/'
     A regular expression as a pattern.  It matches when the text of
     the input record fits the regular expression.  (*Note Regular
     Expressions as Patterns: Regexp.)

`EXPRESSION'
     A single expression.  It matches when its value is nonzero (if a
     number) or nonnull (if a string).
     (*Note Expression Patterns::.)

`PAT1, PAT2'
     A pair of patterns separated by a comma, specifying a range of
     records.  (*Note Specifying Record Ranges With Patterns: Ranges.)

`BEGIN'
`END'
     Special patterns to supply start-up or clean-up information to
     `awk'.  (*Note BEGIN/END::.)

`NULL'
     The empty pattern (that is, no pattern at all) matches every
     input record.  (*Note The Empty Pattern: Empty.)


▶1f◀
File: gawk-info,  Node: Empty,  Next: Regexp,  Prev: Kinds of Patterns,  Up: Patterns

The Empty Pattern
=================

An empty pattern is considered to match *every* input record.  For
example, the program:

     awk '{ print $1 }' BBS-list

prints just the first field of every record.


▶1f◀
File: gawk-info,  Node: Regexp,  Next: Comparison Patterns,  Prev: Empty,  Up: Patterns

Regular Expressions as Patterns
===============================

A "regular expression", or "regexp", is a way of describing a class
of strings.  A regular expression enclosed in slashes (`/') is an
`awk' pattern that matches every input record whose text belongs to
that class.

The simplest regular expression is a sequence of letters, numbers, or
both.  Such a regexp matches any string that contains that sequence. 
Thus, the regexp `foo' matches any string containing `foo'. 
Therefore, the pattern `/foo/' matches any input record containing
`foo'.  Other kinds of regexps let you specify more complicated
classes of strings.


* Menu:

* Usage: Regexp Usage.          How regexps are used in patterns.
* Operators: Regexp Operators.  How to write a regexp.
* Case-sensitivity::            How to do case-insensitive matching.

 
▶1f◀
File: gawk-info,  Node: Regexp Usage,  Next: Regexp Operators,  Prev: Regexp,  Up: Regexp

How to Use Regular Expressions
------------------------------

A regular expression can be used as a pattern by enclosing it in
slashes.  Then the regular expression is matched against the entire
text of each record.  (Normally, it only needs to match some part of
the text in order to succeed.)  For example, this prints the second
field of each record that contains `foo' anywhere:

     awk '/foo/ { print $2 }' BBS-list

Regular expressions can also be used in comparison expressions.  Then
you can specify the string to match against; it need not be the
entire current input record.  These comparison expressions can be
used as patterns or in `if' and `while' statements.

`EXP ~ /REGEXP/'
     This is true if the expression EXP (taken as a character string)
     is matched by REGEXP.  The following example matches, or
     selects, all input records with the upper-case letter `J'
     somewhere in the first field:

          awk '$1 ~ /J/' inventory-shipped

     So does this:

          awk '{ if ($1 ~ /J/) print }' inventory-shipped

`EXP !~ /REGEXP/'
     This is true if the expression EXP (taken as a character string)
     is *not* matched by REGEXP.  The following example matches, or
     selects, all input records whose first field *does not* contain
     the upper-case letter `J':

          awk '$1 !~ /J/' inventory-shipped

The right hand side of a `~' or `!~' operator need not be a constant
regexp (i.e., a string of characters between slashes).  It may be any
expression.  The expression is evaluated, and converted if necessary
to a string; the contents of the string are used as the regexp.  A
regexp that is computed in this way is called a "dynamic regexp". 
For example:

     identifier_regexp = "[A-Za-z_][A-Za-z_0-9]*"
     $0 ~ identifier_regexp

sets `identifier_regexp' to a regexp that describes `awk' variable
names, and tests if the input record matches this regexp.
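
As a hedged illustration of a dynamic regexp, the sketch below keeps
the regexp in a variable that is set in a `BEGIN' rule and then uses
that variable on the right hand side of `~'.  The names
`digits_regexp' and `out' and the sample input are assumptions made
for this example only.

```shell
# Sample input and variable names are invented for this sketch.
out=$(printf 'abc\n42\nx9y\n' |
  awk 'BEGIN { digits_regexp = "^[0-9]+$" }   # regexp held in a string
       $0 ~ digits_regexp { print "number: " $0 }')
printf '%s\n' "$out"
```

Only the record `42' consists entirely of digits, so only that record
is reported.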


▶1f◀
File: gawk-info,  Node: Regexp Operators,  Next: Case-sensitivity,  Prev: Regexp Usage,  Up: Regexp

Regular Expression Operators
----------------------------

You can combine regular expressions with the following characters,
called "regular expression operators", or "metacharacters", to
increase the power and versatility of regular expressions.

Here is a table of metacharacters.  All characters not listed in the
table stand for themselves.

`^'
     This matches the beginning of the string or the beginning of a
     line within the string.  For example:

          ^@chapter

     matches the `@chapter' at the beginning of a string, and can be
     used to identify chapter beginnings in Texinfo source files.

`$'
     This is similar to `^', but it matches only at the end of a
     string or the end of a line within the string.  For example:

          p$

     matches a record that ends with a `p'.

`.'
     This matches any single character except a newline.  For example:

          .P

     matches any single character followed by a `P' in a string. 
     Using concatenation we can make regular expressions like `U.A',
     which matches any three-character sequence that begins with `U'
     and ends with `A'.

`[...]'
     This is called a "character set".  It matches any one of the
     characters that are enclosed in the square brackets.  For example:

          [MVX]

     matches any of the characters `M', `V', or `X' in a string.

     Ranges of characters are indicated by using a hyphen between the
     beginning and ending characters, and enclosing the whole thing
     in brackets.  For example:

          [0-9]

     matches any digit.

     To include the character `\', `]', `-' or `^' in a character
     set, put a `\' in front of it.  For example:

          [d\]]

     matches either `]', or `d'.

     This treatment of `\' is compatible with other `awk'
     implementations but incompatible with the proposed POSIX
     specification for `awk'.  The current draft specifies the use of
     the same syntax used in `egrep'.

     We may change `gawk' to fit the standard, once we are sure it
     will no longer change.  For the meanwhile, the `-a' option
     specifies the traditional `awk' syntax described above (which is
     also the default), while the `-e' option specifies `egrep' syntax.
     *Note Options::.

     In `egrep' syntax, backslash is not syntactically special within
     square brackets.  This means that special tricks have to be used
     to represent the characters `]', `-' and `^' as members of a
     character set.

     To match `-', write it as `---', which is a range containing only
     `-'.  You may also give `-' as the first or last character in
     the set.  To match `^', put it anywhere except as the first
     character of a set.  To match a `]', make it the first character
     in the set.  For example:

          []d^]

     matches either `]', `d' or `^'.

`[^ ...]'
     This is a "complemented character set".  The first character
     after the `[' *must* be a `^'.  It matches any character
     *except* those in the square brackets.  For example:

          [^0-9]

     matches any character that is not a digit.

`|'
     This is the "alternation operator" and it is used to specify
     alternatives.  For example:

          ^P|[0-9]

     matches any string that matches either `^P' or `[0-9]'.  This
     means it matches any string that contains a digit or starts with
     `P'.

     The alternation applies to the largest possible regexps on
     either side.

`(...)'
     Parentheses are used for grouping in regular expressions as in
     arithmetic.  They can be used to concatenate regular expressions
     containing the alternation operator, `|'.

`*'
     This symbol means that the preceding regular expression is to be
     repeated as many times as possible to find a match.  For example:

          ph*

     applies the `*' symbol to the preceding `h' and looks for
     matches to one `p' followed by any number of `h's.  This will
     also match just `p' if no `h's are present.

     The `*' repeats the *smallest* possible preceding expression. 
     (Use parentheses if you wish to repeat a larger expression.)  It
     finds as many repetitions as possible.  For example:

          awk '/\(c[ad][ad]*r x\)/ { print }' sample

     prints every record in the input containing a string of the form
     `(car x)', `(cdr x)', `(cadr x)', and so on.

`+'
     This symbol is similar to `*', but the preceding expression must
     be matched at least once.  This means that:

          wh+y

     would match `why' and `whhy' but not `wy', whereas `wh*y' would
     match all three of these strings.  This is a simpler way of
     writing the last `*' example:

          awk '/\(c[ad]+r x\)/ { print }' sample

`?'
     This symbol is similar to `*', but the preceding expression can
     be matched once or not at all.  For example:

          fe?d

     will match `fed' or `fd', but nothing else.

`\'
     This is used to suppress the special meaning of a character when
     matching.  For example:

          \$

     matches the character `$'.

     The escape sequences used for string constants (*note
     Constants::.) are valid in regular expressions as well; they are
     also introduced by a `\'.

In regular expressions, the `*', `+', and `?' operators have the
highest precedence, followed by concatenation, and finally by `|'. 
As in arithmetic, parentheses can change how operators are grouped.
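
The precedence rule can be sketched with the alternation example from
the table.  The input lines and the shell variable names `input',
`loose' and `anchored' are assumptions chosen for illustration.

```shell
# Sample records are invented for this sketch.
input='Paris
a1
9lives
apple'
# Alternation binds last, so this reads as (^P) or ([0-9]).
loose=$(printf '%s\n' "$input" | awk '/^P|[0-9]/')
# Parentheses make the ^ apply to both alternatives.
anchored=$(printf '%s\n' "$input" | awk '/^(P|[0-9])/')
printf 'loose:\n%s\nanchored:\n%s\n' "$loose" "$anchored"
```

The first pattern selects `Paris', `a1' and `9lives'; the second
selects only `Paris' and `9lives', because `a1' does not begin with a
digit.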


▶1f◀
File: gawk-info,  Node: Case-sensitivity,  Prev: Regexp Operators,  Up: Regexp

Case-sensitivity in Matching
----------------------------

Case is normally significant in regular expressions, both when
matching ordinary characters (i.e., not metacharacters), and inside
character sets.  Thus a `w' in a regular expression matches only a
lower case `w' and not an upper case `W'.

The simplest way to do a case-independent match is to use a character
set: `[Ww]'.  However, this can be cumbersome if you need to use it
often; and it can make the regular expressions harder for humans to
read.  There are two other alternatives that you might prefer.

One way to do a case-insensitive match at a particular point in the
program is to convert the data to a single case, using the `tolower'
or `toupper' built-in string functions (which we haven't discussed
yet; *note String Functions::.).  For example:

     tolower($1) ~ /foo/  { ... }

converts the first field to lower case before matching against it.

Another method is to set the variable `IGNORECASE' to a nonzero value
(*note Built-in Variables::.).  When `IGNORECASE' is not zero, *all*
regexp operations ignore case.  Changing the value of `IGNORECASE'
dynamically controls the case sensitivity of your program as it runs.
Case is significant by default because `IGNORECASE' (like most
variables) is initialized to zero.

     x = "aB"
     if (x ~ /ab/) ...   # this test will fail
     
     IGNORECASE = 1
     if (x ~ /ab/) ...   # now it will succeed

You cannot generally use `IGNORECASE' to make certain rules
case-insensitive and other rules case-sensitive, because there is no
way to set `IGNORECASE' just for the pattern of a particular rule. 
To do this, you must use character sets or `tolower'.  However, one
thing you can do only with `IGNORECASE' is turn case-sensitivity on
or off dynamically for all the rules at once.

`IGNORECASE' can be set on the command line, or in a `BEGIN' rule. 
Setting `IGNORECASE' from the command line is a way to make a program
case-insensitive without having to edit it.

The value of `IGNORECASE' has no effect if `gawk' is in compatibility
mode (*note Command Line::.).  Case is always significant in
compatibility mode.
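
Here is a minimal sketch of the `tolower' approach applied to one
rule only.  It uses `tolower' rather than `IGNORECASE' so that it does
not depend on a `gawk'-specific variable; the sample input and the
shell variable `out' are assumptions.

```shell
# Sample records are invented for this sketch.
# The first field is folded to lower case before the match,
# so Foo and FOO are both selected.
out=$(printf 'Foo 1\nbar 2\nFOO 3\n' |
  awk 'tolower($1) ~ /foo/ { print $2 }')
printf '%s\n' "$out"
```

This prints `1' and `3'.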


▶1f◀
File: gawk-info,  Node: Comparison Patterns,  Next: Boolean Patterns,  Prev: Regexp,  Up: Patterns

Comparison Expressions as Patterns
==================================

"Comparison patterns" test relationships such as equality between two
strings or numbers.  They are a special case of expression patterns
(*note Expression Patterns::.).  They are written with "relational
operators", which are a superset of those in C.  Here is a table of
them:

`X < Y'
     True if X is less than Y.

`X <= Y'
     True if X is less than or equal to Y.

`X > Y'
     True if X is greater than Y.

`X >= Y'
     True if X is greater than or equal to Y.

`X == Y'
     True if X is equal to Y.

`X != Y'
     True if X is not equal to Y.

`X ~ Y'
     True if X matches the regular expression described by Y.

`X !~ Y'
     True if X does not match the regular expression described by Y.

The operands of a relational operator are compared as numbers if they
are both numbers.  Otherwise they are converted to, and compared as,
strings (*note Conversion::.).  Strings are compared by comparing the
first character of each, then the second character of each, and so
on, until there is a difference.  If the two strings are equal until
the shorter one runs out, the shorter one is considered to be less
than the longer one.  Thus, `"10"' is less than `"9"'.

The left operand of the `~' and `!~' operators is a string.  The
right operand is either a constant regular expression enclosed in
slashes (`/REGEXP/'), or any expression, whose string value is used
as a dynamic regular expression (*note Regexp Usage::.).

The following example prints the second field of each input record
whose first field is precisely `foo'.

     awk '$1 == "foo" { print $2 }' BBS-list

Contrast this with the following regular expression match, which
would accept any record with a first field that contains `foo':

     awk '$1 ~ "foo" { print $2 }' BBS-list

or, equivalently, this one:

     awk '$1 ~ /foo/ { print $2 }' BBS-list
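
The difference between string and numeric comparison can be sketched
as follows.  The variable names `s', `n', `p' and `out' are
assumptions for this example.

```shell
# Variable names are invented for this sketch.
out=$(awk 'BEGIN {
  s = ("10" < "9")       # two strings: compared character by character
  n = (10 < 9)           # two numbers: compared numerically
  p = ("abc" < "abcd")   # equal until the shorter string runs out
  print s, n, p
}')
printf '%s\n' "$out"
```

This prints `1 0 1': the string `"10"' sorts before `"9"', the number
10 does not, and the shorter string is less than the longer one.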


▶1f◀
File: gawk-info,  Node: Boolean Patterns,  Next: Expression Patterns,  Prev: Comparison Patterns,  Up: Patterns

Boolean Operators and Patterns
==============================

A "boolean pattern" is an expression which combines other patterns
using the "boolean operators" ``or'' (`||'), ``and'' (`&&'), and
``not'' (`!').  Whether the boolean pattern matches an input record
depends on whether its subpatterns match.

For example, the following command prints all records in the input
file `BBS-list' that contain both `2400' and `foo'.

     awk '/2400/ && /foo/' BBS-list

The following command prints all records in the input file `BBS-list'
that contain *either* `2400' or `foo', or both.

     awk '/2400/ || /foo/' BBS-list

The following command prints all records in the input file `BBS-list'
that do *not* contain the string `foo'.

     awk '! /foo/' BBS-list

Note that boolean patterns are a special case of expression patterns
(*note Expression Patterns::.); they are expressions that use the
boolean operators.  For complete information on the boolean
operators, see *Note Boolean Ops::.

The subpatterns of a boolean pattern can be constant regular
expressions, comparisons, or any other `gawk' expressions.  Range
patterns are not expressions, so they cannot appear inside boolean
patterns.  Likewise, the special patterns `BEGIN' and `END', which
never match any input record, are not expressions and cannot appear
inside boolean patterns.
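
As a sketch of mixing subpattern kinds, the rule below combines a
comparison with a negated constant regexp.  The three sample records
and the shell variable `out' are assumptions.

```shell
# Sample records are invented for this sketch.
# Select records whose second field is at least 2400
# and which do not contain foo anywhere.
out=$(printf 'alpha 2400 foo\nbeta 1200 foo\ngamma 2400 bar\n' |
  awk '$2 >= 2400 && ! /foo/ { print $1 }')
printf '%s\n' "$out"
```

Only the third record satisfies both conditions, so this prints
`gamma'.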


▶1f◀
File: gawk-info,  Node: Expression Patterns,  Next: Ranges,  Prev: Boolean Patterns,  Up: Patterns

Expressions as Patterns
=======================

Any `awk' expression is also valid as a pattern in `gawk'.  Then the
pattern ``matches'' if the expression's value is nonzero (if a
number) or nonnull (if a string).

The expression is reevaluated each time the rule is tested against a
new input record.  If the expression uses fields such as `$1', the
value depends directly on the new input record's text; otherwise, it
depends only on what has happened so far in the execution of the
`awk' program, but that may still be useful.

Comparison patterns are actually a special case of this.  For
example, the expression `$5 == "foo"' has the value 1 when the value
of `$5' equals `"foo"', and 0 otherwise; therefore, this expression
as a pattern matches when the two values are equal.

Boolean patterns are also special cases of expression patterns.

A constant regexp as a pattern is also a special case of an
expression pattern.  `/foo/' as an expression has the value 1 if
`foo' appears in the current input record; thus, as a pattern,
`/foo/' matches any record containing `foo'.

Other implementations of `awk' are less general than `gawk': they
allow comparison expressions, and boolean combinations thereof
(optionally with parentheses), but not necessarily other kinds of
expressions.
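
A small sketch of an arithmetic expression used as a pattern: the
value of `NR % 2' is nonzero for odd-numbered records, so only those
records match.  The sample input and the shell variable `out' are
assumptions; as noted above, an old `awk' that is less general than
`gawk' might reject such a pattern.

```shell
# Sample records are invented for this sketch.
# NR % 2 is 1 for records 1, 3, ... and 0 otherwise.
out=$(printf 'a\nb\nc\nd\n' | awk 'NR % 2')
printf '%s\n' "$out"
```

This prints `a' and `c'.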


▶1f◀
File: gawk-info,  Node: Ranges,  Next: BEGIN/END,  Prev: Expression Patterns,  Up: Patterns

Specifying Record Ranges With Patterns
======================================

A "range pattern" is made of two patterns separated by a comma, of
the form `BEGPAT, ENDPAT'.  It matches ranges of consecutive input
records.  The first pattern BEGPAT controls where the range begins,
and the second one ENDPAT controls where it ends.  For example,

     awk '$1 == "on", $1 == "off"'

prints every record between `on'/`off' pairs, inclusive.

In more detail, a range pattern starts out by matching BEGPAT against
every input record; when a record matches BEGPAT, the range pattern
becomes "turned on".  The range pattern matches this record.  As long
as it stays turned on, it automatically matches every input record
read.  But meanwhile, it also matches ENDPAT against every input
record, and when that succeeds, the range pattern is turned off again
for the following record.  Now it goes back to checking BEGPAT
against each record.

The record that turns on the range pattern and the one that turns it
off both match the range pattern.  If you don't want to operate on
these records, you can write `if' statements in the rule's action to
distinguish them.

It is possible for a range pattern to be turned both on and off by
the same record, if both conditions are satisfied by that record. 
Then the action is executed for just that record.
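
The behavior described above can be sketched with a five-record
input.  The first command prints the whole range, end points
included; the second uses an `if' statement in the action to leave
out the records that turn the range on and off.  The sample input and
the shell variable names are assumptions.

```shell
# Sample records are invented for this sketch.
input='skip
on
data
off
skip'
# The range includes the records that turn it on and off.
inclusive=$(printf '%s\n' "$input" |
  awk '$1 == "on", $1 == "off" { print NR ": " $0 }')
# An if statement in the action filters out the two end points.
inner=$(printf '%s\n' "$input" |
  awk '$1 == "on", $1 == "off" { if ($1 != "on" && $1 != "off") print }')
printf '%s\n--\n%s\n' "$inclusive" "$inner"
```

The first command prints records 2 through 4; the second prints only
`data'.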


▶1f◀
File: gawk-info,  Node: BEGIN/END,  Prev: Ranges,  Up: Patterns

`BEGIN' and `END' Special Patterns
==================================

`BEGIN' and `END' are special patterns.  They are not used to match
input records.  Rather, they are used for supplying start-up or
clean-up information to your `awk' script.  A `BEGIN' rule is
executed, once, before the first input record has been read.  An
`END' rule is executed, once, after all the input has been read.  For
example:

     awk 'BEGIN { print "Analysis of \"foo\"" }
          /foo/ { ++foobar }
          END   { print "\"foo\" appears " foobar " times." }' BBS-list

This program counts the records in the input file `BBS-list' that
contain the string `foo'.  The `BEGIN' rule prints a title for the
report.  There is no need to use the `BEGIN' rule to initialize the
counter `foobar' to zero, as `awk' does this for us automatically
(*note Variables::.).

The second rule increments the variable `foobar' every time a record
containing the pattern `foo' is read.  The `END' rule prints the
value of `foobar' at the end of the run.

The special patterns `BEGIN' and `END' cannot be used in ranges or
with boolean operators.

An `awk' program may have multiple `BEGIN' and/or `END' rules.  They
are executed in the order they appear, all the `BEGIN' rules at
start-up and all the `END' rules at termination.

Multiple `BEGIN' and `END' sections are useful for writing library
functions, since each library can have its own `BEGIN' or `END' rule
to do its own initialization and/or cleanup.  Note that the order in
which library functions are named on the command line controls the
order in which their `BEGIN' and `END' rules are executed.  Therefore
you have to be careful to write such rules in library files so that
it doesn't matter what order they are executed in.  *Note Command
Line::, for more information on using library functions.

If an `awk' program only has a `BEGIN' rule, and no other rules, then
the program exits after the `BEGIN' rule has been run.  (Older
versions of `awk' used to keep reading and ignoring input until end
of file was seen.)  However, if an `END' rule exists as well, then
the input will be read, even if there are no other rules in the
program.  This is necessary in case the `END' rule checks the `NR'
variable.

`BEGIN' and `END' rules must have actions; there is no default action
for these rules since there is no current record when they run.
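
A short sketch of the ordering rules: two `BEGIN' rules run in the
order they appear, and an `END' rule can inspect `NR' even though the
program has no other rules.  The sample input and the shell variable
`out' are assumptions.

```shell
# Sample records are invented for this sketch.
out=$(printf 'x\ny\nz\n' |
  awk 'BEGIN { print "first" }
       BEGIN { print "second" }
       END   { print NR " records" }')
printf '%s\n' "$out"
```

This prints `first', `second' and then `3 records', which shows that
the input was read because an `END' rule is present.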


▶1f◀
File: gawk-info,  Node: Actions,  Next: Expressions,  Prev: Patterns,  Up: Top

Actions: Overview
*****************

An `awk' "program" or "script" consists of a series of "rules" and
function definitions, interspersed.  (Functions are described later;
see *Note User-defined::.)

A rule contains a pattern and an "action", either of which may be
omitted.  The purpose of the action is to tell `awk' what to do once
a match for the pattern is found.  Thus, the entire program looks
somewhat like this:

     [PATTERN] [{ ACTION }]
     [PATTERN] [{ ACTION }]
     ...
     function NAME (ARGS) { ... }
     ...

An action consists of one or more `awk' "statements", enclosed in
curly braces (`{' and `}').  Each statement specifies one thing to be
done.  The statements are separated by newlines or semicolons.

The curly braces around an action must be used even if the action
contains only one statement, or even if it contains no statements at
all.  However, if you omit the action entirely, omit the curly braces
as well.  (An omitted action is equivalent to `{ print $0 }'.)

Here are the kinds of statement supported in `awk':

   * Expressions, which can call functions or assign values to
     variables (*note Expressions::.).  Executing this kind of
     statement simply computes the value of the expression and then
     ignores it.  This is useful when the expression has side effects
     (*note Assignment Ops::.).

   * Control statements, which specify the control flow of `awk'
     programs.  The `awk' language gives you C-like constructs (`if',
     `for', `while', and so on) as well as a few special ones (*note
     Statements::.).

   * Compound statements, which consist of one or more statements
     enclosed in curly braces.  A compound statement is used in order
     to put several statements together in the body of an `if',
     `while', `do' or `for' statement.

   * Input control, using the `getline' function (*note Getline::.),
     and the `next' statement (*note Next Statement::.).

   * Output statements, `print' and `printf'.  *Note Printing::.

   * Deletion statements, for deleting array elements.  *Note Delete::.
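
The difference between an omitted action and an empty action can be
sketched as follows.  The sample input and the shell variable names
`omitted' and `empty' are assumptions.

```shell
# Sample records are invented for this sketch.
# No action at all: equivalent to { print $0 }.
omitted=$(printf 'keep me\ndrop\n' | awk '/keep/')
# Braces with no statements: the rule matches but does nothing.
empty=$(printf 'keep me\ndrop\n' | awk '/keep/ { }')
printf 'omitted=[%s] empty=[%s]\n' "$omitted" "$empty"
```

The first command prints the matching record; the second prints
nothing.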


▶1f◀
File: gawk-info,  Node: Expressions,  Next: Statements,  Prev: Actions,  Up: Top

Actions: Expressions
********************

Expressions are the basic building block of `awk' actions.  An
expression evaluates to a value, which you can print, test, store in
a variable or pass to a function.

But, beyond that, an expression can assign a new value to a variable
or a field, with an assignment operator.

An expression can serve as a statement on its own.  Most other kinds
of statement contain one or more expressions which specify data to be
operated on.  As in other languages, expressions in `awk' include
variables, array references, constants, and function calls, as well
as combinations of these with various operators.


* Menu:

* Constants::       String, numeric, and regexp constants.
* Variables::       Variables give names to values for later use.
* Arithmetic Ops::  Arithmetic operations (`+', `-', etc.)
* Concatenation::   Concatenating strings.
* Comparison Ops::  Comparison of numbers and strings with `<', etc.
* Boolean Ops::     Combining comparison expressions using boolean operators
                    `||' (``or''), `&&' (``and'') and `!' (``not'').

* Assignment Ops::  Changing the value of a variable or a field.
* Increment Ops::   Incrementing the numeric value of a variable.

* Conversion::      The conversion of strings to numbers and vice versa.
* Conditional Exp:: Conditional expressions select between two subexpressions
                    under control of a third subexpression.
* Function Calls::  A function call is an expression.
* Precedence::      How various operators nest.

 
▶1f◀
File: gawk-info,  Node: Constants,  Next: Variables,  Prev: Expressions,  Up: Expressions

Constant Expressions
====================

The simplest type of expression is the "constant", which always has
the same value.  There are three types of constant: numeric
constants, string constants, and regular expression constants.

A "numeric constant" stands for a number.  This number can be an
integer, a decimal fraction, or a number in scientific (exponential)
notation.  Note that all numeric values are represented within `awk'
in double-precision floating point.  Here are some examples of
numeric constants, which all have the same value:

     105
     1.05e+2
     1050e-1

A string constant consists of a sequence of characters enclosed in
double-quote marks.  For example:

     "parrot"

represents the string whose contents are `parrot'.  Strings in `gawk'
can be of any length and they can contain all the possible 8-bit
characters, including ASCII NUL.  Other `awk' implementations
may have difficulty with some character codes.

Some characters cannot be included literally in a string constant. 
You represent them instead with "escape sequences", which are
character sequences beginning with a backslash (`\').

One use of an escape sequence is to include a double-quote character
in a string constant.  Since a plain double-quote would end the
string, you must use `\"' to represent a single double-quote
character as a part of the string.  Backslash itself is another
character that can't be included normally; you write `\\' to put one
backslash in the string.  Thus, the string whose contents are the two
characters `"\' must be written `"\"\\"'.

Another use of backslash is to represent unprintable characters such
as newline.  While there is nothing to stop you from writing most of
these characters directly in a string constant, they may look ugly.

Here is a table of all the escape sequences used in `awk':

`\\'
     Represents a literal backslash, `\'.

`\a'
     Represents the ``alert'' character, control-g, ASCII code 7.

`\b'
     Represents a backspace, control-h, ASCII code 8.

`\f'
     Represents a formfeed, control-l, ASCII code 12.

`\n'
     Represents a newline, control-j, ASCII code 10.

`\r'
     Represents a carriage return, control-m, ASCII code 13.

`\t'
     Represents a horizontal tab, control-i, ASCII code 9.

`\v'
     Represents a vertical tab, control-k, ASCII code 11.

`\NNN'
     Represents the octal value NNN, where NNN are one to three
     digits between 0 and 7.  For example, the code for the ASCII ESC
     (escape) character is `\033'.

`\xHH...'
     Represents the hexadecimal value HH, where HH are hexadecimal
     digits (`0' through `9' and either `A' through `F' or `a'
     through `f').  Like the same construct in ANSI C, the escape
     sequence continues until the first non-hexadecimal digit is
     seen.  However, using more than two hexadecimal digits produces
     undefined results.

A constant regexp is a regular expression description enclosed in
slashes, such as `/^beginning and end$/'.  Most regexps used in `awk'
programs are constant, but the `~' and `!~' operators can also match
computed or ``dynamic'' regexps (*note Regexp Usage::.).

Constant regexps are useful only with the `~' and `!~' operators; you
cannot assign them to variables or print them.  They are not truly
expressions in the usual sense.
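
The following sketch exercises a few of the constants described in
this node: the three spellings of 105, the two-character string made
of a double-quote and a backslash, and the `\t' escape.  The shell
variable names `same', `quote' and `tab' are assumptions.

```shell
# Variable names are invented for this sketch.
# Three spellings of the same numeric constant compare equal.
same=$(awk 'BEGIN { print (105 == 1.05e+2 && 1.05e+2 == 1050e-1) }')
# A string holding a double-quote and a backslash needs both escaped.
quote=$(awk 'BEGIN { print "\"\\" }')
# The \t escape puts a tab between the two letters.
tab=$(awk 'BEGIN { print "a\tb" }')
printf '%s\n%s\n%s\n' "$same" "$quote" "$tab"
```

The first line of output is `1', the second is the two characters
`"\', and the third is `a' and `b' separated by a tab.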


▶1f◀
File: gawk-info,  Node: Variables,  Next: Arithmetic Ops,  Prev: Constants,  Up: Expressions

Variables
=========

Variables let you give names to values and refer to them later.  You
have already seen variables in many of the examples.  The name of a
variable must be a sequence of letters, digits and underscores, but
it may not begin with a digit.  Case is significant in variable
names; `a' and `A' are distinct variables.

A variable name is a valid expression by itself; it represents the
variable's current value.  Variables are given new values with
"assignment operators" and "increment operators".  *Note Assignment
Ops::.

A few variables have special built-in meanings, such as `FS', the
field separator, and `NF', the number of fields in the current input
record.  *Note Built-in Variables::, for a list of them.  These
built-in variables can be used and assigned just like all other
variables, but their values are also used or changed automatically by
`awk'.  Each built-in variable's name is made entirely of upper case
letters.

Variables in `awk' can be assigned either numeric values or string
values.  By default, variables are initialized to the null string,
which is effectively zero if converted to a number.  So there is no
need to ``initialize'' each variable explicitly in `awk', the way you
would need to do in C or most other traditional programming languages.
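
A minimal sketch of the default value: a variable that has never been
assigned acts as the null string in a string context and as zero in a
numeric context.  The names `unset_var' and `out' are assumptions.

```shell
# unset_var is a hypothetical name that is never assigned.
out=$(awk 'BEGIN { print "[" unset_var "]", unset_var + 0, length(unset_var) }')
printf '%s\n' "$out"
```

This prints `[] 0 0'.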


* Menu:

* Assignment Options::  Setting variables on the command line and a summary
                        of command line syntax.  This is an advanced method
                        of input.

 
▶1f◀
File: gawk-info,  Node: Assignment Options,  Prev: Variables,  Up: Variables

Assigning Variables on the Command Line
---------------------------------------

You can set any `awk' variable by including a "variable assignment"
among the arguments on the command line when you invoke `awk' (*note
Command Line::.).  Such an assignment has this form:

     VARIABLE=TEXT

With it, you can set a variable either at the beginning of the `awk'
run or in between input files.

If you precede the assignment with the `-v' option, like this:

     -v VARIABLE=TEXT

then the variable is set at the very beginning, before even the
`BEGIN' rules are run.  The `-v' option and its assignment must
precede all the file name arguments.

Otherwise, the variable assignment is performed at a time determined
by its position among the input file arguments: after the processing
of the preceding input file argument.  For example:

     awk '{ print $n }' n=4 inventory-shipped n=2 BBS-list

prints the value of field number `n' for all input records.  Before
the first file is read, the command line sets the variable `n' equal
to 4.  This causes the fourth field to be printed in lines from the
file `inventory-shipped'.  After the first file has finished, but
before the second file is started, `n' is set to 2, so that the
second field is printed in lines from `BBS-list'.
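
The timing rules above can be tried with the sketch below.  It
assumes a POSIX-style shell with `mktemp' and an `awk' that accepts
`-v'.  The files `first' and `second' and the shell variable names
are hypothetical, created only for this illustration.

```shell
# Hypothetical one-line input files, created only for this illustration.
tmp=$(mktemp -d)
printf 'a1 a2\n' > "$tmp/first"
printf 'b1 b2\n' > "$tmp/second"

# n is 1 while "first" is read and 2 while "second" is read.
between=$(awk '{ print $n }' n=1 "$tmp/first" n=2 "$tmp/second")
echo "$between"

# A plain assignment argument is not yet in effect in BEGIN; -v is.
plain=$(awk 'BEGIN { print "[" n "]" }' n=1)
early=$(awk -v n=1 'BEGIN { print "[" n "]" }')
echo "$plain $early"
```

The second part shows why `-v' exists: a program that needs the value
inside a `BEGIN' rule cannot rely on an ordinary assignment argument.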

Command line arguments are made available for explicit examination by
the `awk' program in an array named `ARGV' (*note Built-in
Variables::.).


▶1f◀
File: gawk-info,  Node: Arithmetic Ops,  Next: Concatenation,  Prev: Variables,  Up: Expressions

Arithmetic Operators
====================

The `awk' language uses the common arithmetic operators when
evaluating expressions.  All of these arithmetic operators follow
normal precedence rules, and work as you would expect them to.  This
example divides field three by field four, adds field two, stores the
result into field one, and prints the resulting altered input record:

     awk '{ $1 = $2 + $3 / $4; print }' inventory-shipped

The arithmetic operators in `awk' are:

`X + Y'
     Addition.

`X - Y'
     Subtraction.

`- X'
     Negation.

`X * Y'
     Multiplication.

`X / Y'
     Division.  Since all numbers in `awk' are double-precision
     floating point, the result is not rounded to an integer: `3 / 4'
     has the value 0.75.

`X % Y'
     Remainder.  The quotient is rounded toward zero to an integer,
     multiplied by Y and this result is subtracted from X.  This
     operation is sometimes known as ``trunc-mod''.  The following
     relation always holds:

          b * int(a / b) + (a % b) == a

     One undesirable effect of this definition of remainder is that
     `X % Y' is negative if X is negative.  Thus,

          -17 % 8 = -1

     In other `awk' implementations, the signedness of the remainder
     may be machine dependent.

`X ^ Y'
`X ** Y'
     Exponentiation: X raised to the Y power.  `2 ^ 3' has the value
     8.  The character sequence `**' is equivalent to `^'.
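
The operators in the table can be checked with a short sketch like
the following, assuming a POSIX-style shell and an `awk' on the
search path.  The shell variable names `arith' and `relation' are
chosen only for this illustration.

```shell
# Division is not rounded; the remainder takes the sign of the left operand.
arith=$(awk 'BEGIN { print 3 / 4, -17 % 8, 17 % -8, 2 ^ 3 }')
echo "$arith"

# The relation b * int(a / b) + (a % b) == a, checked for a = -17, b = 8.
relation=$(awk 'BEGIN { a = -17; b = 8; print (b * int(a / b) + (a % b) == a) }')
echo "$relation"
```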


▶1f◀
File: gawk-info,  Node: Concatenation,  Next: Comparison Ops,  Prev: Arithmetic Ops,  Up: Expressions

String Concatenation
====================

There is only one string operation: concatenation.  It does not have
a specific operator to represent it.  Instead, concatenation is
performed by writing expressions next to one another, with no
operator.  For example:

     awk '{ print "Field number one: " $1 }' BBS-list

produces, for the first record in `BBS-list':

     Field number one: aardvark

Without the space in the string constant after the `:', the line
would run together.  For example:

     awk '{ print "Field number one:" $1 }' BBS-list

produces, for the first record in `BBS-list':

     Field number one:aardvark

Since string concatenation does not have an explicit operator, it is
often necessary to ensure that it happens where you want it to by
enclosing the items to be concatenated in parentheses.  For example,
the following code fragment does not concatenate `file' and `name' as
you might expect:

     file = "file"
     name = "name"
     print "something meaningful" > file name

It is necessary to use the following:

     print "something meaningful" > (file name)

We recommend you use parentheses around concatenation in all but the
most common contexts (such as in the right-hand operand of `=').
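
Here is a minimal sketch of the parenthesized form in use, assuming a
POSIX-style shell with `mktemp' and an `awk' that accepts `-v'.  The
temporary directory and the variable `dir' are hypothetical additions
so that the example does not write into the current directory.

```shell
dir=$(mktemp -d)
awk -v dir="$dir" 'BEGIN {
    file = "file"
    name = "name"
    # The parentheses make the whole concatenation the output file name.
    print "something meaningful" > (dir "/" file name)
}'
cat "$dir/filename"
```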


▶1f◀
File: gawk-info,  Node: Comparison Ops,  Next: Boolean Ops,  Prev: Concatenation,  Up: Expressions

Comparison Expressions
======================

"Comparison expressions" compare strings or numbers for relationships
such as equality.  They are written using "relational operators",
which are a superset of those in C.  Here is a table of them:

`X < Y'
     True if X is less than Y.

`X <= Y'
     True if X is less than or equal to Y.

`X > Y'
     True if X is greater than Y.

`X >= Y'
     True if X is greater than or equal to Y.

`X == Y'
     True if X is equal to Y.

`X != Y'
     True if X is not equal to Y.

`X ~ Y'
     True if the string X matches the regexp denoted by Y.

`X !~ Y'
     True if the string X does not match the regexp denoted by Y.

`SUBSCRIPT in ARRAY'
     True if array ARRAY has an element with the subscript SUBSCRIPT.

Comparison expressions have the value 1 if true and 0 if false.

The operands of a relational operator are compared as numbers if they
are both numbers.  Otherwise they are converted to, and compared as,
strings (*note Conversion::.).  Strings are compared by comparing the
first character of each, then the second character of each, and so on.
Thus, `"10"' is less than `"9"'.
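
The difference between the two kinds of comparison can be seen in a
sketch like this one, assuming a POSIX-style shell and an `awk' on
the search path.  The shell variable `cmp' is chosen only for this
illustration.

```shell
# Numeric constants compare as numbers; string constants compare
# character by character.
cmp=$(awk 'BEGIN { print (10 < 9), ("10" < "9"), ("abc" < "abd") }')
echo "$cmp"
```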

For example,

     $1 == "foo"

has the value 1 (that is, it is true) if the first field of the
current input record is precisely `foo'.  By contrast,

     $1 ~ /foo/

has the value 1 if the first field contains `foo'.

The right hand operand of the `~' and `!~' operators may be either a
constant regexp (`/.../'), or it may be an ordinary expression, in
which case the value of the expression as a string is a dynamic
regexp (*note Regexp Usage::.).

In very recent implementations of `awk', a constant regular
expression in slashes by itself is also an expression.  The regexp
`/REGEXP/' is an abbreviation for this comparison expression:

     $0 ~ /REGEXP/

In some contexts it may be necessary to write parentheses around the
regexp to avoid confusing the `gawk' parser.  For example, `(/x/ -
/y/) > threshold' is not allowed, but `((/x/) - (/y/)) > threshold'
parses properly.

One special place where `/foo/' is *not* an abbreviation for `$0 ~
/foo/' is when it is the right-hand operand of `~' or `!~'!
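
As a sketch of the abbreviation, assuming a POSIX-style shell and an
`awk' that treats a regexp in slashes as an expression, the following
stores the result of each match in a variable.  What is stored is the
value 1 or 0 of the comparison, not the regexp itself.  The names
`a', `b', `c' and `match_demo' are chosen only for this illustration.

```shell
match_demo=$(echo "foo bar" | awk '{
    a = /foo/             # same as $0 ~ /foo/
    b = ($0 ~ /foo/)
    c = /baz/             # no match in this record
    print a, b, c
}')
echo "$match_demo"
```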


▶1f◀
File: gawk-info,  Node: Boolean Ops,  Next: Assignment Ops,  Prev: Comparison Ops,  Up: Expressions

Boolean Expressions
===================

A "boolean expression" is a combination of comparison expressions or
matching expressions, using the "boolean operators" ``or'' (`||'),
``and'' (`&&'), and ``not'' (`!'), along with parentheses to control
nesting.  The truth of the boolean expression is computed by
combining the truth values of the component expressions.

Boolean expressions can be used wherever comparison and matching
expressions can be used.  They can be used in `if' and `while'
statements.  They have numeric values (1 if true, 0 if false), which
come into play if the result of the boolean expression is stored in
a variable, or used in arithmetic.

In addition, every boolean expression is also a valid boolean
pattern, so you can use it as a pattern to control the execution of
rules.

Here are descriptions of the three boolean operators, with an example
of each.  It may be instructive to compare these examples with the
analogous examples of boolean patterns (*note Boolean Patterns::.),
which use the same boolean operators in patterns instead of
expressions.

`BOOLEAN1 && BOOLEAN2'
     True if both BOOLEAN1 and BOOLEAN2 are true.  For example, the
     following statement prints the current input record if it
     contains both `2400' and `foo'.

          if ($0 ~ /2400/ && $0 ~ /foo/) print

     The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is
     true.  This can make a difference when BOOLEAN2 contains
     expressions that have side effects: in the case of `$0 ~ /foo/
     && ($2 == bar++)', the variable `bar' is not incremented if
     there is no `foo' in the record.

`BOOLEAN1 || BOOLEAN2'
     True if at least one of BOOLEAN1 and BOOLEAN2 is true.  For
     example, the following command prints all records in the input
     file `BBS-list' that contain *either* `2400' or `foo', or both.

          awk '{ if ($0 ~ /2400/ || $0 ~ /foo/) print }' BBS-list

     The subexpression BOOLEAN2 is evaluated only if BOOLEAN1 is
     false.  This can make a difference when BOOLEAN2 contains
     expressions that have side effects.

`!BOOLEAN'
     True if BOOLEAN is false.  For example, the following program
     prints all records in the input file `BBS-list' that do *not*
     contain the string `foo'.

          awk '{ if (! ($0 ~ /foo/)) print }' BBS-list
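
The rule that BOOLEAN2 is evaluated only when needed can be observed
with a sketch like the following, assuming a POSIX-style shell and an
`awk' on the search path.  The names `bar', `r1', `r2', `r3' and
`shortcut' are chosen only for this illustration.

```shell
shortcut=$(awk 'BEGIN {
    bar = 0
    r1 = (0 && bar++)    # bar++ is skipped: the left side is already false
    r2 = (1 || bar++)    # bar++ is skipped: the left side is already true
    r3 = (1 && bar++)    # bar++ runs; its value is the old bar, 0
    print r1, r2, r3, bar
}')
echo "$shortcut"
```

Only the third expression increments `bar', so its final value is 1.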


▶1f◀
File: gawk-info,  Node: Assignment Ops,  Next: Increment Ops,  Prev: Boolean Ops,  Up: Expressions

Assignment Expressions
======================

An "assignment" is an expression that stores a new value into a
variable.  For example, let's assign the value 1 to the variable `z':

     z = 1

After this expression is executed, the variable `z' has the value 1. 
Whatever old value `z' had before the assignment is forgotten.

Assignments can store string values also.  For example, this would
store the value `"this food is good"' in the variable `message':

     thing = "food"
     predicate = "good"
     message = "this " thing " is " predicate

(This also illustrates concatenation of strings.)

The `=' sign is called an "assignment operator".  It is the simplest
assignment operator because the value of the right-hand operand is
stored unchanged.

Most operators (addition, concatenation, and so on) have no effect
except to compute a value.  If you ignore the value, you might as
well not use the operator.  An assignment operator is different; it
does produce a value, but even if you ignore the value, the
assignment still makes itself felt through the alteration of the
variable.  We call this a "side effect".

The left-hand operand of an assignment need not be a variable (*note
Variables::.); it can also be a field (*note Changing Fields::.) or
an array element (*note Arrays::.).  These are all called "lvalues",
which means they can appear on the left-hand side of an assignment
operator.  The right-hand operand may be any expression; it produces
the new value which the assignment stores in the specified variable,
field or array element.

It is important to note that variables do *not* have permanent types.
The type of a variable is simply the type of whatever value it
happens to hold at the moment.  In the following program fragment,
the variable `foo' has a numeric value at first, and a string value
later on:

     foo = 1
     print foo
     foo = "bar"
     print foo

When the second assignment gives `foo' a string value, the fact that
it previously had a numeric value is forgotten.

An assignment is an expression, so it has a value: the same value
that is assigned.  Thus, `z = 1' as an expression has the value 1. 
One consequence of this is that you can write multiple assignments
together:

     x = y = z = 0

stores the value 0 in all three variables.  It does this because the
value of `z = 0', which is 0, is stored into `y', and then the value
of `y = z = 0', which is 0, is stored into `x'.

You can use an assignment anywhere an expression is called for.  For
example, it is valid to write `x != (y = 1)' to set `y' to 1 and then
test whether `x' differs from 1.  But this style tends to make programs
hard to read; except in a one-shot program, you should rewrite it to
get rid of such nesting of assignments.  This is never very hard.
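
A minimal sketch of assignments used as expressions, assuming a
POSIX-style shell and an `awk' on the search path, follows.  The
names `r' and `nested' are chosen only for this illustration.

```shell
nested=$(awk 'BEGIN {
    x = y = z = 0            # all three variables become 0
    x = 1
    r = (x != (y = 1))       # sets y to 1, then compares x with 1
    print x, y, z, r
}')
echo "$nested"
```

The comparison yields 0 here because `x' is equal to the value just
assigned to `y'.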

Aside from `=', there are several other assignment operators that do
arithmetic with the old value of the variable.  For example, the
operator `+=' computes a new value by adding the right-hand value to
the old value of the variable.  Thus, the following assignment adds 5
to the value of `foo':

     foo += 5

This is precisely equivalent to the following:

     foo = foo + 5

Use whichever one makes the meaning of your program clearer.

Here is a table of the arithmetic assignment operators.  In each
case, the right-hand operand is an expression whose value is
converted to a number.

`LVALUE += INCREMENT'
     Adds INCREMENT to the value of LVALUE to make the new value of
     LVALUE.

`LVALUE -= DECREMENT'
     Subtracts DECREMENT from the value of LVALUE.

`LVALUE *= COEFFICIENT'
     Multiplies the value of LVALUE by COEFFICIENT.

`LVALUE /= DIVISOR'
     Divides the value of LVALUE by DIVISOR.

`LVALUE %= MODULUS'
     Sets LVALUE to its remainder by MODULUS.

`LVALUE ^= POWER'
`LVALUE **= POWER'
     Raises LVALUE to the power POWER.
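
The table can be exercised with a sketch like this one, assuming a
POSIX-style shell and an `awk' on the search path.  Each assignment
is wrapped in parentheses so that its value can be saved.  The names
`a' through `f' and `compound' are chosen only for this illustration.

```shell
compound=$(awk 'BEGIN {
    foo = 7
    a = (foo += 5)    # 12
    b = (foo -= 2)    # 10
    c = (foo *= 3)    # 30
    d = (foo /= 4)    # 7.5
    e = (foo %= 2)    # 1.5
    f = (foo ^= 2)    # 2.25
    print a, b, c, d, e, f
}')
echo "$compound"
```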


▶1f◀
File: gawk-info,  Node: Increment Ops,  Next: Conversion,  Prev: Assignment Ops,  Up: Expressions

Increment Operators
===================

"Increment operators" increase or decrease the value of a variable by
1.  You could do the same thing with an assignment operator, so the
increment operators add no power to the `awk' language; but they are
convenient abbreviations for something very common.

The operator to add 1 is written `++'.  It can be used to increment a
variable either before or after taking its value.

To pre-increment a variable V, write `++V'.  This adds 1 to the value
of V and that new value is also the value of this expression.  The
assignment expression `V += 1' is completely equivalent.

Writing the `++' after the variable specifies post-increment.  This
increments the variable value just the same; the difference is that
the value of the increment expression itself is the variable's *old*
value.  Thus, if `foo' has value 4, then the expression `foo++' has
the value 4, but it changes the value of `foo' to 5.

The post-increment `foo++' is nearly equivalent to writing `(foo +=
1) - 1'.  It is not perfectly equivalent because all numbers in `awk'
are floating point: in floating point, `foo + 1 - 1' does not
necessarily equal `foo'.  But the difference is minute as long as you
stick to numbers that are fairly small (less than a trillion).

Any lvalue can be incremented.  Fields and array elements are
incremented just like variables.

The decrement operator `--' works just like `++' except that it
subtracts 1 instead of adding.  Like `++', it can be used before the
lvalue to pre-decrement or after it to post-decrement.

Here is a summary of increment and decrement expressions.

`++LVALUE'
     This expression increments LVALUE and the new value becomes the
     value of this expression.

`LVALUE++'
     This expression causes the contents of LVALUE to be incremented.
     The value of the expression is the *old* value of LVALUE.

`--LVALUE'
     Like `++LVALUE', but instead of adding, it subtracts.  It
     decrements LVALUE and delivers the value that results.

`LVALUE--'
     Like `LVALUE++', but instead of adding, it subtracts.  It
     decrements LVALUE.  The value of the expression is the *old*
     value of LVALUE.
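
The four forms can be compared side by side with a sketch like the
following, assuming a POSIX-style shell and an `awk' on the search
path.  The names `a', `b', `c', `d' and `steps' are chosen only for
this illustration.

```shell
steps=$(awk 'BEGIN {
    foo = 4
    a = foo++    # a is the old value 4; foo becomes 5
    b = ++foo    # foo becomes 6; b is the new value 6
    c = foo--    # c is the old value 6; foo becomes 5
    d = --foo    # foo becomes 4; d is the new value 4
    print a, b, c, d, foo
}')
echo "$steps"
```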


▶1f◀
File: gawk-info,  Node: Conversion,  Next: Conditional Exp,  Prev: Increment Ops,  Up: Expressions

Conversion of Strings and Numbers
=================================

Strings are converted to numbers, and numbers to strings, if the
context of the `awk' program demands it.  For example, if the value
of either `foo' or `bar' in the expression `foo + bar' happens to be
a string, it is converted to a number before the addition is
performed.  If numeric values appear in string concatenation, they
are converted to strings.  Consider this:

     two = 2; three = 3
     print (two three) + 4

This eventually prints the (numeric) value 27.  The numeric values of
the variables `two' and `three' are converted to strings and
concatenated together, and the resulting string is converted back to
the number 23, to which 4 is then added.

If, for some reason, you need to force a number to be converted to a
string, concatenate the null string with that number.  To force a
string to be converted to a number, add zero to that string.

Strings are converted to numbers by interpreting them as numerals:
`"2.5"' converts to 2.5, and `"1e3"' converts to 1000.  Strings that
can't be interpreted as valid numbers are converted to zero.
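
These conversions can be tried with a sketch like the following,
assuming a POSIX-style shell and an `awk' on the search path.  The
names `s', `n' and `converted' are chosen only for this illustration.

```shell
converted=$(awk 'BEGIN {
    s = 12 ""         # number to string: concatenate the null string
    n = "2.5" + 0     # string to number: add zero
    print length(s), n, "1e3" + 0, "abc" + 0, (2 3) + 4
}')
echo "$converted"
```

The last value repeats the earlier example: `2' and `3' are
concatenated into the string `23', which is converted back to a
number before 4 is added.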

The exact manner in which numbers are converted into strings is
controlled by the `awk' built-in variable `OFMT' (*note Built-in
Variables::.).  Numbers are converted using a special version of the
`sprintf' function (*note Built-in::.) with `OFMT' as the format
specifier.

`OFMT''s default value is `"%.6g"', which prints a value with at
most six significant digits.  For some applications you will want to
change it to specify more precision.  Double precision on most modern
machines gives you 16 or 17 decimal digits of precision.

Strange results can happen if you set `OFMT' to a string that doesn't
tell `sprintf' how to format floating point numbers in a useful way. 
For example, if you forget the `%' in the format, all numbers will be
converted to the same constant string.
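
The effect of `OFMT' on output can be sketched as follows, assuming a
POSIX-style shell and an `awk' on the search path.  The example
sticks to `print' on purpose, because some later `awk' versions use a
separate variable for conversions made outside `print'.  The names
`x' and `formatted' are chosen only for this illustration.

```shell
formatted=$(awk 'BEGIN {
    x = 3.14159265358979
    print x              # default OFMT, "%.6g"
    OFMT = "%.2f"
    print x              # two digits after the decimal point
    OFMT = "%.15g"
    print x              # fifteen significant digits
}')
echo "$formatted"
```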


▶1f◀
File: gawk-info,  Node: Conditional Exp,  Next: Function Calls,  Prev: Conversion,  Up: Expressions

Conditional Expressions
=======================

A "conditional expression" is a special kind of expression with three
operands.  It allows you to use one expression's value to select one
of two other expressions.

The conditional expression looks the same as in the C language:

     SELECTOR ? IF-TRUE-EXP : IF-FALSE-EXP

There are three subexpressions.  The first, SELECTOR, is always
computed first.  If it is ``true'' (not zero) then IF-TRUE-EXP is
computed next and its value becomes the value of the whole expression.
Otherwise, IF-FALSE-EXP is computed next and its value becomes the
value of the whole expression.

For example, this expression produces the absolute value of `x':

     x > 0 ? x : -x

Each time the conditional expression is computed, exactly one of
IF-TRUE-EXP and IF-FALSE-EXP is computed; the other is ignored.  This
is important when the expressions contain side effects.  For example,
this conditional expression examines element `i' of either array `a'
or array `b', and increments `i'.

     x == y ? a[i++] : b[i++]

This is guaranteed to increment `i' exactly once, because each time
one or the other of the two increment expressions is executed, and
the other is not.
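
Both examples can be run with a sketch like the following, assuming a
POSIX-style shell and an `awk' on the search path.  The conditional
expressions are parenthesized, which also keeps a `>' inside them
from being read as output redirection if they are moved into a
`print' statement.  The names `abs', `picked' and `cond' and the
array contents are chosen only for this illustration.

```shell
cond=$(awk 'BEGIN {
    x = -3
    abs = (x > 0 ? x : -x)                 # absolute value of x
    i = 0
    a[0] = "from a"
    b[0] = "from b"
    y = -3
    picked = (x == y ? a[i++] : b[i++])    # only one i++ is executed
    print abs, picked, i
}')
echo "$cond"
```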