DataMuseum.dk

Presents historical artifacts from the history of:

DKUUG/EUUG Conference tapes

This is an automatic "excavation" of a thematic subset of
artifacts from Datamuseum.dk's BitArchive.

See our Wiki for more about DKUUG/EUUG Conference tapes

Excavated with: AutoArchaeologist - Free & Open Source Software.



⟦343c0f354⟧ TextFile

    Length: 48221 (0xbc5d)
    Types: TextFile
    Names: »gawk-info-5«

Derivation

└─⟦9ae75bfbd⟧ Bits:30007242 EUUGD3: Starter Kit
    └─⟦f133efdaf⟧ »EurOpenD3/gnu/gawk/gawk-doc-2.11.tar.Z« 
└─⟦a05ed705a⟧ Bits:30007078 DKUUG GNU 2/12/89
    └─⟦f133efdaf⟧ »./gawk-doc-2.11.tar.Z« 
        └─⟦8f64183b0⟧ 
            └─⟦this⟧ »gawk-2.11-doc/gawk-info-5« 

TextFile

Info file gawk-info, produced by Makeinfo, -*- Text -*- from input
file gawk.texinfo.

This file documents `awk', a program that you can use to select
particular records in a file and perform operations upon them.

Copyright (C) 1989 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.


File: gawk-info,  Node: String Functions,  Next: I/O Functions,  Prev: Numeric Functions,  Up: Built-in

Built-in Functions for String Manipulation
==========================================

  The functions in this section look at the text of one or more strings.

`index(IN, FIND)'
     This searches the string IN for the first occurrence of the
     string FIND, and returns the position where that occurrence
     begins in the string IN.  For example:

          awk 'BEGIN { print index("peanut", "an") }'

     prints `3'.  If FIND is not found, `index' returns 0.

`length(STRING)'
     This gives you the number of characters in STRING.  If STRING is
     a number, the length of the digit string representing that
     number is returned.  For example, `length("abcde")' is 5.  By
     contrast, `length(15 * 35)' works out to 3.  How?  Well, 15 * 35
     = 525, and 525 is then converted to the string `"525"', which
     has three characters.

     If no argument is supplied, `length' returns the length of `$0'.

`match(STRING, REGEXP)'
     The `match' function searches the string, STRING, for the
     longest, leftmost substring matched by the regular expression,
     REGEXP.  It returns the character position, or "index", of where
     that substring begins (1, if it starts at the beginning of
     STRING).  If no match is found, it returns 0.

     The `match' function sets the built-in variable `RSTART' to the
     index.  It also sets the built-in variable `RLENGTH' to the
     length of the matched substring.  If no match is found, `RSTART'
     is set to 0, and `RLENGTH' to -1.

     For example:

          awk '{
                 if ($1 == "FIND")
                   regex = $2
                 else {
                   where = match($0, regex)
                   if (where)
                     print "Match of", regex, "found at", where, "in", $0
                 }
          }'

     This program looks for lines that match the regular expression
     stored in the variable `regex'.  This regular expression can be
     changed.  If the first word on a line is `FIND', `regex' is
     changed to be the second word on that line.  Therefore, given:

          FIND fo*bar
          My program was a foobar
          But none of it would doobar
          FIND Melvin
          JF+KM
          This line is property of The Reality Engineering Co.
          This file created by Melvin.

     `awk' prints:

          Match of fo*bar found at 18 in My program was a foobar
          Match of Melvin found at 22 in This file created by Melvin.
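
     Because `match' leaves the position and length of the match in
     `RSTART' and `RLENGTH', a common follow-up is to hand both to
     `substr' to pull out the matched text itself.  The sketch below
     assumes a POSIX `awk' and a POSIX shell; the wrapper name
     `match_text' is hypothetical, and the regexp is passed in with the
     `-v' option described under Options.

```shell
# match_text (hypothetical helper): print the part of each input line
# matched by the regexp given as $1, using RSTART and RLENGTH from match().
match_text() {
    awk -v re="$1" '{
        if (match($0, re))
            print substr($0, RSTART, RLENGTH)
    }'
}

# Prints only "foobar"; the second line has no match and prints nothing.
printf 'My program was a foobar\nBut none of it would doobar\n' | match_text 'fo*bar'
```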

`split(STRING, ARRAY, FIELDSEP)'
     This divides STRING up into pieces separated by FIELDSEP, and
     stores the pieces in ARRAY.  The first piece is stored in
     `ARRAY[1]', the second piece in `ARRAY[2]', and so forth.  The
     string value of the third argument, FIELDSEP, is used as a
     regexp to search for to find the places to split STRING.  If the
     FIELDSEP is omitted, the value of `FS' is used.  `split' returns
     the number of elements created.

     The `split' function, then, splits strings into pieces in a
     manner similar to the way input lines are split into fields. 
     For example:

          split("auto-da-fe", a, "-")

     splits the string `auto-da-fe' into three fields using `-' as
     the separator.  It sets the contents of the array `a' as follows:

          a[1] = "auto"
          a[2] = "da"
          a[3] = "fe"

     The value returned by this call to `split' is 3.
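
     Since FIELDSEP is treated as a regexp, one call can split on a
     whole class of separators.  This is a minimal sketch assuming a
     POSIX `awk'; the wrapper name `split_demo' and the input string
     are made up for illustration.

```shell
# split_demo (hypothetical name): split on a regexp rather than a single
# character; "[-:]+" matches any run of dashes and colons.
split_demo() {
    awk 'BEGIN {
        n = split("auto-da--fe:x", parts, "[-:]+")
        for (i = 1; i <= n; i++)
            print i ": " parts[i]
        print "returned " n
    }'
}

split_demo
```

     The doubled `--' is one separator here, so no empty element
     appears between `da' and `fe'.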

`sprintf(FORMAT, EXPRESSION1,...)'
     This returns (without printing) the string that `printf' would
     have printed out with the same arguments (*note Printf::.).  For
     example:

          sprintf("pi = %.2f (approx.)", 22/7)

     returns the string `"pi = 3.14 (approx.)"'.

`sub(REGEXP, REPLACEMENT, TARGET)'
     The `sub' function alters the value of TARGET.  It searches this
     value, which should be a string, for the leftmost substring
     matched by the regular expression, REGEXP, extending this match
     as far as possible.  Then the entire string is changed by
     replacing the matched text with REPLACEMENT.  The modified
     string becomes the new value of TARGET.

     This function is peculiar because TARGET is not simply used to
     compute a value, and not just any expression will do: it must be
     a variable, field or array reference, so that `sub' can store a
     modified value there.  If this argument is omitted, then the
     default is to use and alter `$0'.

     For example:

          str = "water, water, everywhere"
          sub(/at/, "ith", str)

     sets `str' to `"wither, water, everywhere"', by replacing the
     leftmost, longest occurrence of `at' with `ith'.

     The `sub' function returns the number of substitutions made
     (either one or zero).

     If the special character `&' appears in REPLACEMENT, it stands
     for the precise substring that was matched by REGEXP.  (If the
     regexp can match more than one string, then this precise
     substring may vary.)  For example:

          awk '{ sub(/candidate/, "& and his wife"); print }'

     changes the first occurrence of `candidate' to `candidate and
     his wife' on each input line.

     The effect of this special character can be turned off by
     putting a backslash before it in the string.  As usual, to
     insert one backslash in the string, you must write two
     backslashes.  Therefore, write `\\&' in a string constant to
     include a literal `&' in the replacement.  For example, here is
     how to replace the first `|' on each line with an `&':

          awk '{ sub(/\|/, "\\&"); print }'

     *Note:* as mentioned above, the third argument to `sub' must be
     an lvalue.  Some versions of `awk' allow the third argument to
     be an expression which is not an lvalue.  In such a case, `sub'
     would still search for the pattern and return 0 or 1, but the
     result of the substitution (if any) would be thrown away because
     there is no place to put it.  Such versions of `awk' accept
     expressions like this:

          sub(/USA/, "United States", "the USA and Canada")

     But that is considered erroneous in `gawk'.

`gsub(REGEXP, REPLACEMENT, TARGET)'
     This is similar to the `sub' function, except `gsub' replaces
     *all* of the longest, leftmost, *nonoverlapping* matching
     substrings it can find.  The `g' in `gsub' stands for
     ``global'', which means replace everywhere.  For example:

          awk '{ gsub(/Britain/, "United Kingdom"); print }'

     replaces all occurrences of the string `Britain' with `United
     Kingdom' for all input records.

     The `gsub' function returns the number of substitutions made. 
     If the variable to be searched and altered, TARGET, is omitted,
     then the entire input record, `$0', is used.

     As in `sub', the characters `&' and `\' are special, and the
     third argument must be an lvalue.
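
     The sketch below puts `sub' and `gsub' side by side on the same
     string, so their return values and the treatment of `&' and `\\&'
     can be compared directly.  It assumes a POSIX `awk'; the wrapper
     name `sub_gsub_demo' and the variable `line' are hypothetical.

```shell
# sub_gsub_demo (hypothetical name): compare the return values of sub and
# gsub, and show "&" (the matched text) versus "\\&" (a literal ampersand).
sub_gsub_demo() {
    echo "water, water, everywhere" | awk '{
        line = $0
        n = sub(/at/, "ith", line)       # leftmost match only
        print n, line
        line = $0
        n = gsub(/at/, "[&]", line)      # every match; & is the matched "at"
        print n, line
        line = $0
        n = gsub(/,/, " \\&", line)      # "\\&" inserts a literal &
        print n, line
    }'
}

sub_gsub_demo
```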

`substr(STRING, START, LENGTH)'
     This returns a LENGTH-character-long substring of STRING,
     starting at character number START.  The first character of a
     string is character number one.  For example,
     `substr("washington", 5, 3)' returns `"ing"'.

     If LENGTH is not present, this function returns the whole suffix
     of STRING that begins at character number START.  For example,
     `substr("washington", 5)' returns `"ington"'.
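
     `index' and the two-argument form of `substr' combine naturally:
     find a delimiter, then keep everything after it.  This is a
     minimal sketch assuming a POSIX `awk'; `after_equals' is a
     hypothetical helper name and the sample input is invented.

```shell
# after_equals (hypothetical helper): print everything after the first "="
# on each line; lines without "=" are skipped.
after_equals() {
    awk '{
        pos = index($0, "=")
        if (pos > 0)
            print substr($0, pos + 1)   # no LENGTH: keep the whole suffix
    }'
}

printf 'HOME=/u/close\nno separator here\nPATH=/bin:/usr/bin\n' | after_equals
```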

`tolower(STRING)'
     This returns a copy of STRING, with each upper-case character in
     the string replaced with its corresponding lower-case character.
     Nonalphabetic characters are left unchanged.  For example,
     `tolower("MiXeD cAsE 123")' returns `"mixed case 123"'.

`toupper(STRING)'
     This returns a copy of STRING, with each lower-case character in
     the string replaced with its corresponding upper-case character.
     Nonalphabetic characters are left unchanged.  For example,
     `toupper("MiXeD cAsE 123")' returns `"MIXED CASE 123"'.
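
     Folding both operands with `tolower' (or `toupper') gives a
     case-insensitive comparison that does not depend on the
     gawk-specific `IGNORECASE' variable.  A minimal sketch, assuming a
     POSIX `awk' and shell; `same_ignoring_case' is a hypothetical name.

```shell
# same_ignoring_case (hypothetical helper): exit with status 0 if the two
# arguments are equal apart from letter case, and 1 otherwise.
same_ignoring_case() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(tolower(a) == tolower(b)) }'
}

if same_ignoring_case "MiXeD cAsE 123" "mixed case 123"; then
    echo "equal apart from case"
fi
```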


File: gawk-info,  Node: I/O Functions,  Prev: String Functions,  Up: Built-in

Built-in Functions For Input/Output
===================================

`close(FILENAME)'
     Close the file FILENAME, for input or output.  The argument may
     alternatively be a shell command that was used for redirecting
     to or from a pipe; then the pipe is closed.

     *Note Close Input::, regarding closing input files and pipes. 
     *Note Close Output::, regarding closing output files and pipes.

`system(COMMAND)'
     The system function allows the user to execute operating system
     commands and then return to the `awk' program.  The `system'
     function executes the command given by the string COMMAND.  It
     returns, as its value, the status returned by the command that
     was executed.

     For example, if the following fragment of code is put in your
     `awk' program:

          END {
               system("mail -s 'awk run done' operator < /dev/null")
          }

     the system operator will be sent mail when the `awk' program
     finishes processing input and begins its end-of-input processing.

     Note that much the same result can be obtained by redirecting
     `print' or `printf' into a pipe.  However, if your `awk' program
     is interactive, `system' is useful for cranking up large
     self-contained programs, such as a shell or an editor.

     Some operating systems cannot implement the `system' function. 
     `system' causes a fatal error if it is not supported.
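
     Because `system' returns the command's status, a program can
     branch on whether the command succeeded.  This sketch assumes a
     POSIX `awk' and the standard `true' and `false' utilities; the
     exact nonzero value may differ between `awk' implementations, so
     it only distinguishes zero from nonzero.  The name
     `system_status_demo' is hypothetical.

```shell
# system_status_demo (hypothetical name): 0 means the command succeeded,
# anything else means it failed.
system_status_demo() {
    awk 'BEGIN {
        ok  = system("true")     # true always succeeds
        bad = system("false")    # false always fails
        print ok, (bad != 0)
    }'
}

system_status_demo
```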


File: gawk-info,  Node: User-defined,  Next: Built-in Variables,  Prev: Built-in,  Up: Top

User-defined Functions
**********************

Complicated `awk' programs can often be simplified by defining your
own functions.  User-defined functions can be called just like
built-in ones (*note Function Calls::.), but it is up to you to
define them--to tell `awk' what they should do.


* Menu:

* Definition Syntax::   How to write definitions and what they mean.
* Function Example::    An example function definition and what it does.
* Function Caveats::    Things to watch out for.
* Return Statement::    Specifying the value a function returns.

 
File: gawk-info,  Node: Definition Syntax,  Next: Function Example,  Prev: User-defined,  Up: User-defined

Syntax of Function Definitions
==============================

Definitions of functions can appear anywhere between the rules of the
`awk' program.  Thus, the general form of an `awk' program is
extended to include sequences of rules *and* user-defined function
definitions.

The definition of a function named NAME looks like this:

     function NAME (PARAMETER-LIST) {
          BODY-OF-FUNCTION
     }

The keyword `function' may be abbreviated `func'.

NAME is the name of the function to be defined.  A valid function
name is like a valid variable name: a sequence of letters, digits and
underscores, not starting with a digit.

PARAMETER-LIST is a list of the function's arguments and local
variable names, separated by commas.  When the function is called,
the argument names are used to hold the argument values given in the
call.  The local variables are initialized to the null string.

The BODY-OF-FUNCTION consists of `awk' statements.  It is the most
important part of the definition, because it says what the function
should actually *do*.  The argument names exist to give the body a
way to talk about the arguments; local variables, to give the body
places to keep temporary values.

Argument names are not distinguished syntactically from local
variable names; instead, the number of arguments supplied when the
function is called determines how many argument variables there are. 
Thus, if three argument values are given, the first three names in
PARAMETER-LIST are arguments, and the rest are local variables.

It follows that if the number of arguments is not the same in all
calls to the function, some of the names in PARAMETER-LIST may be
arguments on some occasions and local variables on others.  Another
way to think of this is that omitted arguments default to the null
string.

Usually when you write a function you know how many names you intend
to use for arguments and how many you intend to use as locals.  By
convention, you should write an extra space between the arguments and
the locals, so that other people can follow how your function is
supposed to be used.

During execution of the function body, the arguments and local
variable values hide or "shadow" any variables of the same names used
in the rest of the program.  The shadowed variables are not
accessible in the function definition, because there is no way to
name them while their names have been taken away for the local
variables.  All other variables used in the `awk' program can be
referenced or set normally in the function definition.

The arguments and local variables last only as long as the function
body is executing.  Once the body finishes, the shadowed variables
come back.

The function body can contain expressions which call functions.  They
can even call this function, either directly or by way of another
function.  When this happens, we say the function is "recursive".

There is no need in `awk' to put the definition of a function before
all uses of the function.  This is because `awk' reads the entire
program before starting to execute any of it.
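
The following sketch exercises several of these rules at once: an
omitted argument arriving as the null string, the extra-space
convention separating arguments from locals, shadowing of global
variables with the same names, and a call that appears before the
function's definition.  It assumes a POSIX `awk' and shell; the names
`locals_demo' and `join' are hypothetical.

```shell
# locals_demo (hypothetical name).  The BEGIN rule comes first and calls
# join() before its definition appears, which awk allows.
locals_demo() {
    awk '
    BEGIN {
        n = split("a b c", parts)
        print join(parts, n)          # sep omitted: arrives as the null string
        print join(parts, n, ",")
        s = "outer"; i = "outer"
        join(parts, n)
        print s, i                    # the globals s and i were only shadowed
    }

    # arr and n are arguments, sep is an optional argument, and s and i
    # (after the extra spaces) are locals.
    function join(arr, n, sep,    s, i) {
        if (sep == "")
            sep = " "
        s = arr[1]
        for (i = 2; i <= n; i++)
            s = s sep arr[i]
        return s
    }'
}

locals_demo
```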


File: gawk-info,  Node: Function Example,  Next: Function Caveats,  Prev: Definition Syntax,  Up: User-defined

Function Definition Example
===========================

Here is an example of a user-defined function, called `myprint', that
takes a number and prints it in a specific format.

     function myprint(num)
     {
          printf "%6.3g\n", num
     }

To illustrate, here is an `awk' rule which uses our `myprint' function:

     $3 > 0     { myprint($3) }

This program prints, in our special format, all the third fields that
contain a positive number in our input.  Therefore, when given:

      1.2   3.4   5.6   7.8
      9.10 11.12 13.14 15.16
     17.18 19.20 21.22 23.24

this program, using our function to format the results, prints:

        5.6
       13.1
       21.2

Here is a rather contrived example of a recursive function.  It
prints a string backwards:

     function rev (str, len) {
         if (len == 0) {
             printf "\n"
             return
         }
         printf "%c", substr(str, len, 1)
         rev(str, len - 1)
     }
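
To use `rev', start it with LEN set to the full length of the string,
so that the recursion walks from the last character down to the first.
A minimal sketch, assuming a POSIX `awk' and shell; the wrapper name
`rev_demo' is hypothetical.

```shell
# rev_demo (hypothetical name): the rev function from the text, applied
# to every input line.
rev_demo() {
    awk '
    function rev (str, len) {
        if (len == 0) {
            printf "\n"
            return
        }
        printf "%c", substr(str, len, 1)
        rev(str, len - 1)
    }

    { rev($0, length($0)) }'
}

echo "peanut" | rev_demo
```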


File: gawk-info,  Node: Function Caveats,  Next: Return Statement,  Prev: Function Example,  Up: User-defined

Calling User-defined Functions
==============================

"Calling a function" means causing the function to run and do its job.
A function call is an expression, and its value is the value returned
by the function.

A function call consists of the function name followed by the
arguments in parentheses.  What you write in the call for the
arguments are `awk' expressions; each time the call is executed,
these expressions are evaluated, and the values are the actual
arguments.  For example, here is a call to `foo' with three arguments:

     foo(x y, "lose", 4 * z)

*Note:* whitespace characters (spaces and tabs) are not allowed
between the function name and the open-parenthesis of the argument
list.  If you write whitespace by mistake, `awk' might think that you
mean to concatenate a variable with an expression in parentheses. 
However, it notices that you used a function name and not a variable
name, and reports an error.

When a function is called, it is given a *copy* of the values of its
arguments.  This is called "call by value".  The caller may use a
variable as the expression for the argument, but the called function
does not know this: all it knows is what value the argument had.  For
example, if you write this code:

     foo = "bar"
     z = myfunc(foo)

then you should not think of the argument to `myfunc' as being ``the
variable `foo'''.  Instead, think of the argument as the string
value, `"bar"'.

If the function `myfunc' alters the values of its local variables,
this has no effect on any other variables.  In particular, if
`myfunc' does this:

     function myfunc (win) {
       print win
       win = "zzz"
       print win
     }

to change its first argument variable `win', this *does not* change
the value of `foo' in the caller.  The role of `foo' in calling
`myfunc' ended when its value, `"bar"', was computed.  If `win' also
exists outside of `myfunc', the function body cannot alter this outer
value, because it is shadowed during the execution of `myfunc' and
cannot be seen or changed from there.
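
Here is a runnable check of that behavior, using a variant of `myfunc'
that returns its modified argument instead of printing it.  It assumes
a POSIX `awk' and shell; the wrapper name `by_value_demo' is
hypothetical.

```shell
# by_value_demo (hypothetical name): assigning to the parameter win changes
# only the function's copy, so foo still holds "bar" afterwards.
by_value_demo() {
    awk '
    function myfunc (win) {
        win = "zzz"
        return win
    }

    BEGIN {
        foo = "bar"
        z = myfunc(foo)
        print foo, z
    }'
}

by_value_demo
```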

However, when arrays are the parameters to functions, they are *not*
copied.  Instead, the array itself is made available for direct
manipulation by the function.  This is usually called "call by
reference".  Changes made to an array parameter inside the body of a
function *are* visible outside that function.  *This can be very
dangerous if you don't watch what you are doing.*  For example:

     function changeit (array, ind, nvalue) {
          array[ind] = nvalue
     }
     
     BEGIN {
          a[1] = 1 ; a[2] = 2 ; a[3] = 3
          changeit(a, 2, "two")
          printf "a[1] = %s, a[2] = %s, a[3] = %s\n", a[1], a[2], a[3]
     }

prints `a[1] = 1, a[2] = two, a[3] = 3', because calling `changeit'
stores `"two"' in the second element of `a'.


File: gawk-info,  Node: Return Statement,  Prev: Function Caveats,  Up: User-defined

The `return' Statement
======================

The body of a user-defined function can contain a `return' statement.
This statement returns control to the rest of the `awk' program.  It
can also be used to return a value for use in the rest of the `awk'
program.  It looks like this:

     return EXPRESSION

The EXPRESSION part is optional.  If it is omitted, then the returned
value is undefined and, therefore, unpredictable.

A `return' statement with no value expression is assumed at the end
of every function definition.  So if control reaches the end of the
function definition, then the function returns an unpredictable value.

Here is an example of a user-defined function that returns a value
for the largest number among the elements of an array:

     function maxelt (vec,   i, ret) {
          for (i in vec) {
               if (ret == "" || vec[i] > ret)
                    ret = vec[i]
          }
          return ret
     }

You call `maxelt' with one argument, an array name.  The local
variables `i' and `ret' are not intended to be arguments; while there
is nothing to stop you from passing two or three arguments to
`maxelt', the results would be strange.  The extra space before `i'
in the function parameter list is to indicate that `i' and `ret' are
not supposed to be arguments.  This is a convention which you should
follow when you define functions.

Here is a program that uses our `maxelt' function.  It loads an
array, calls `maxelt', and then reports the maximum number in that
array:

     awk '
     function maxelt (vec,   i, ret) {
          for (i in vec) {
               if (ret == "" || vec[i] > ret)
                    ret = vec[i]
          }
          return ret
     }
     
     # Load all fields of each record into nums.
     {
          for (i = 1; i <= NF; i++)
               nums[NR, i] = $i
     }
     
     END {
          print maxelt(nums)
     }'

Given the following input:

      1 5 23 8 16
     44 3 5 2 8 26
     256 291 1396 2962 100
     -6 467 998 1101
     99385 11 0 225

our program tells us (predictably) that:

     99385

is the largest number in our array.


File: gawk-info,  Node: Built-in Variables,  Next: Command Line,  Prev: User-defined,  Up: Top

Built-in Variables
******************

Most `awk' variables are available for you to use for your own
purposes; they never change except when your program assigns them,
and never affect anything except when your program examines them.

A few variables have special built-in meanings.  Some of them `awk'
examines automatically, so that they enable you to tell `awk' how to
do certain things.  Others are set automatically by `awk', so that
they carry information from the internal workings of `awk' to your
program.

This chapter documents all the built-in variables of `gawk'.  Most of
them are also documented in the chapters where their areas of
activity are described.


* Menu:

* User-modified::  Built-in variables that you change to control `awk'.

* Auto-set::       Built-in variables where `awk' gives you information.

 
File: gawk-info,  Node: User-modified,  Next: Auto-set,  Prev: Built-in Variables,  Up: Built-in Variables

Built-in Variables That Control `awk'
=====================================

This is a list of the variables which you can change to control how
`awk' does certain things.

`FS'
     `FS' is the input field separator (*note Field Separators::.). 
     The value is a regular expression that matches the separations
     between fields in an input record.

     The default value is `" "', a string consisting of a single
     space.  As a special exception, this value actually means that
     any sequence of spaces and tabs is a single separator.  It also
     causes spaces and tabs at the beginning or end of a line to be
     ignored.

     You can set the value of `FS' on the command line using the `-F'
     option:

          awk -F, 'PROGRAM' INPUT-FILES
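
     The sketch below contrasts the two cases: the default `FS'
     collapses runs of blanks and ignores leading and trailing ones,
     while `-F,' makes every comma count, so `,,' yields an empty
     field.  It assumes a POSIX `awk' and shell; `fs_demo' is a
     hypothetical name.

```shell
# fs_demo (hypothetical name): print NF and some fields under each FS.
fs_demo() {
    printf '  a \t b   c  \n' | awk '{ print NF ":" $1 }'
    printf 'x,,y\n' | awk -F, '{ print NF ":" $2 ":" $3 }'
}

fs_demo
```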

`IGNORECASE'
     If `IGNORECASE' is nonzero, then *all* regular expression
     matching is done in a case-independent fashion.  In particular,
     regexp matching with `~' and `!~', and the `gsub', `index',
     `match', `split' and `sub' functions all ignore case when doing
     their particular regexp operations.  *Note:* since field
     splitting with the value of the `FS' variable is also a regular
     expression operation, that too is done with case ignored.  *Note
     Case-sensitivity::.

     If `gawk' is in compatibility mode (*note Command Line::.), then
     `IGNORECASE' has no special meaning, and regexp operations are
     always case-sensitive.

`OFMT'
     This string is used by `awk' to control conversion of numbers to
     strings (*note Conversion::.).  It works by being passed, in
     effect, as the first argument to the `sprintf' function.  Its
     default value is `"%.6g"'.

`OFS'
     This is the output field separator (*note Output Separators::.).
     It is output between the fields output by a `print' statement. 
     Its default value is `" "', a string consisting of a single space.

`ORS'
     This is the output record separator.  It is output at the end of
     every `print' statement.  Its default value is a string
     containing a single newline character, which could be written as
     `"\n"'.  (*Note Output Separators::).
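
     `OFS' is only inserted between the comma-separated items of a
     `print' statement; printing an unmodified `$0' keeps the record's
     original spacing, while `ORS' ends every record.  A minimal
     sketch, assuming a POSIX `awk' and shell; `ofs_ors_demo' is a
     hypothetical name.

```shell
# ofs_ors_demo (hypothetical name): OFS between print items, ORS after
# each print; $0 is not rebuilt because no field was assigned.
ofs_ors_demo() {
    printf 'a b c\n' | awk 'BEGIN { OFS = "-"; ORS = ";\n" }
                            { print $1, $2, $3; print $0 }'
}

ofs_ors_demo
```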

`RS'
     This is `awk''s record separator.  Its default value is a string
     containing a single newline character, which means that an input
     record consists of a single line of text.  (*Note Records::.)

`SUBSEP'
     `SUBSEP' is a subscript separator.  It has the default value of
     `"\034"', and is used to separate the parts of the name of a
     multi-dimensional array.  Thus, if you access `foo[12,3]', it
     really accesses `foo["12\0343"]'.  (*Note Multi-dimensional::).
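
     Because the combined subscript is an ordinary string joined with
     `SUBSEP', `split' with `SUBSEP' as the separator recovers the
     original parts when looping over such an array.  A minimal sketch,
     assuming a POSIX `awk'; `subsep_demo' is a hypothetical name.

```shell
# subsep_demo (hypothetical name): prints the number of parts, the parts,
# and 1 if the key equals "12" SUBSEP "3".
subsep_demo() {
    awk 'BEGIN {
        foo[12, 3] = "x"
        for (key in foo) {
            n = split(key, parts, SUBSEP)
            print n, parts[1], parts[2], (key == "12" SUBSEP "3")
        }
    }'
}

subsep_demo
```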


File: gawk-info,  Node: Auto-set,  Prev: User-modified,  Up: Built-in Variables

Built-in Variables That Convey Information to You
=================================================

This is a list of the variables that are set automatically by `awk'
on certain occasions so as to provide information for your program.

`ARGC'
`ARGV'
     The command-line arguments available to `awk' are stored in an
     array called `ARGV'.  `ARGC' is the number of command-line
     arguments present.  `ARGV' is indexed from zero to `ARGC - 1'. 
     *Note Command Line::.  For example:

          awk '{ print ARGV[$1] }' inventory-shipped BBS-list

     In this example, `ARGV[0]' contains `"awk"', `ARGV[1]' contains
     `"inventory-shipped"', and `ARGV[2]' contains `"BBS-list"'.  The
     value of `ARGC' is 3, one more than the index of the last
     element in `ARGV' since the elements are numbered from zero.

     Notice that the `awk' program is not entered in `ARGV'.  The
     other special command line options, with their arguments, are
     also not entered.  But variable assignments on the command line
     *are* treated as arguments, and do show up in the `ARGV' array.

     Your program can alter `ARGC' and the elements of `ARGV'.  Each
     time `awk' reaches the end of an input file, it uses the next
     element of `ARGV' as the name of the next input file.  By
     storing a different string there, your program can change which
     files are read.  You can use `"-"' to represent the standard
     input.  By storing additional elements and incrementing `ARGC'
     you can cause additional files to be read.

     If you decrease the value of `ARGC', that eliminates input files
     from the end of the list.  By recording the old value of `ARGC'
     elsewhere, your program can treat the eliminated arguments as
     something other than file names.

     To eliminate a file from the middle of the list, store the null
     string (`""') into `ARGV' in place of the file's name.  As a
     special feature, `awk' ignores file names that have been
     replaced with the null string.
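
     Here is one way to apply the `ARGC' technique: the `BEGIN' rule
     saves the last argument as a search word and lowers `ARGC' so awk
     never tries to open it as a file.  This is a sketch assuming a
     POSIX `awk' and shell; `grep_word' is a hypothetical helper name.

```shell
# grep_word (hypothetical helper).  Usage: grep_word FILE... WORD
# Prints each line containing WORD, prefixed with the file name.
grep_word() {
    awk 'BEGIN {
        word = ARGV[ARGC - 1]   # remember the last argument
        ARGC--                  # and keep awk from reading it as a file
    }
    index($0, word) { print FILENAME ": " $0 }' "$@"
}
```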

`ENVIRON'
     This is an array that contains the values of the environment. 
     The array indices are the environment variable names; the values
     are the values of the particular environment variables.  For
     example, `ENVIRON["HOME"]' might be `/u/close'.  Changing this
     array does not affect the environment passed on to any programs
     that `awk' may spawn via redirection or the `system' function. 
     (In a future version of `gawk', it may do so.)

     Some operating systems may not have environment variables.  On
     such systems, the array `ENVIRON' is empty.

`FILENAME'
     This is the name of the file that `awk' is currently reading. 
     If `awk' is reading from the standard input (in other words,
     there are no files listed on the command line), `FILENAME' is
     set to `"-"'.  `FILENAME' is changed each time a new file is
     read (*note Reading Files::.).

`FNR'
     `FNR' is the current record number in the current file.  `FNR'
     is incremented each time a new record is read (*note Getline::.).
     It is reinitialized to 0 each time a new input file is started.

`NF'
     `NF' is the number of fields in the current input record.  `NF'
     is set each time a new record is read, when a new field is
     created, or when `$0' changes (*note Fields::.).

`NR'
     This is the number of input records `awk' has processed since
     the beginning of the program's execution (*note Records::.).
     `NR' is set each time a new record is read.

`RLENGTH'
     `RLENGTH' is the length of the substring matched by the `match'
     function (*note String Functions::.).  `RLENGTH' is set by
     invoking the `match' function.  Its value is the length of the
     matched string, or -1 if no match was found.

`RSTART'
     `RSTART' is the start-index of the substring matched by the
     `match' function (*note String Functions::.).  `RSTART' is set
     by invoking the `match' function.  Its value is the position of
     the string where the matched substring starts, or 0 if no match
     was found.


File: gawk-info,  Node: Command Line,  Next: Language History,  Prev: Built-in Variables,  Up: Top

Invocation of `awk'
*******************

There are two ways to run `awk': with an explicit program, or with
one or more program files.  Here are templates for both of them;
items enclosed in `[...]' in these templates are optional.

     awk [`-FFS'] [`-v VAR=VAL'] [`-V'] [`-C'] [`-c'] [`-a'] [`-e'] [`--'] 'PROGRAM' FILE ...
     awk [`-FFS'] `-f SOURCE-FILE' [`-f SOURCE-FILE ...'] [`-v VAR=VAL'] [`-V'] [`-C'] [`-c'] [`-a'] [`-e'] [`--'] FILE ...

 
* Menu:

* Options::             Command line options and their meanings.
* Other Arguments::     Input file names and variable assignments.
* AWKPATH Variable::    Searching directories for `awk' programs.

 
File: gawk-info,  Node: Options,  Next: Other Arguments,  Prev: Command Line,  Up: Command Line

Command Line Options
====================

Options begin with a minus sign, and consist of a single character. 
The options and their meanings are as follows:

`-FFS'
     Sets the `FS' variable to FS (*note Field Separators::.).

`-f SOURCE-FILE'
     Indicates that the `awk' program is to be found in SOURCE-FILE
     instead of in the first non-option argument.

`-v VAR=VAL'
     Sets the variable VAR to the value VAL *before* execution of the
     program begins.  Such variable values are available inside the
     `BEGIN' rule (see below for a fuller explanation).

     The `-v' option only has room to set one variable, but you can
     use it more than once, setting another variable each time, like
     this: `-v foo=1 -v bar=2'.
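
     The sketch below shows the `-v' values already in place inside
     `BEGIN', and, for contrast, a `VAR=VALUE' operand that has not
     been assigned yet when `BEGIN' runs.  It assumes a POSIX `awk' and
     shell; `v_demo' is a hypothetical name.

```shell
# v_demo (hypothetical name): the first line prints 3; the second prints
# "[]" because the operand x=5 is only processed after BEGIN.
v_demo() {
    awk -v foo=1 -v bar=2 'BEGIN { print foo + bar }'
    awk 'BEGIN { print "[" x "]" }' x=5
}

v_demo
```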

`-a'
     Specifies use of traditional `awk' syntax for regular expressions.
     This means that `\' can be used to quote any regular expression
     operators inside of square brackets, just as it can be outside
     of them.  This mode is currently the default; the `-a' option is
     useful in shell scripts so that they will not break if the
     default is changed.  *Note Regexp Operators::.

`-e'
     Specifies use of `egrep' syntax for regular expressions.  This
     means that `\' does not serve as a quoting character inside of
     square brackets; idiosyncratic techniques are needed to include
     various special characters within them.  This mode may become
     the default at some time in the future.  *Note Regexp Operators::.

`-c'
     Specifies "compatibility mode", in which the GNU extensions in
     `gawk' are disabled, so that `gawk' behaves just like Unix
     `awk'.  These extensions are noted below, where their usage is
     explained.  *Note Compatibility Mode::.

`-V'
     Prints version information for this particular copy of `gawk'. 
     This is so you can determine if your copy of `gawk' is up to
     date with respect to whatever the Free Software Foundation is
     currently distributing.  This option may disappear in a future
     version of `gawk'.

`-C'
     Prints the short version of the General Public License.  This
     option may disappear in a future version of `gawk'.

`--'
     Signals the end of the command line options.  The following
     arguments are not treated as options even if they begin with
     `-'.  This interpretation of `--' follows the POSIX argument
     parsing conventions.

     This is useful if you have file names that start with `-', or in
     shell scripts, if you have file names that will be specified by
     the user and that might start with `-'.

Any other options are flagged as invalid with a warning message, but
are otherwise ignored.

In compatibility mode, as a special case, if the value of FS supplied
to the `-F' option is `t', then `FS' is set to the tab character
(`"\t"').  Also, the `-C' and `-V' options are not recognized.

If the `-f' option is *not* used, then the first non-option command
line argument is expected to be the program text.

The `-f' option may be used more than once on the command line.  Then
`awk' reads its program source from all of the named files, as if
they had been concatenated together into one big file.  This is
useful for creating libraries of `awk' functions.  Useful functions
can be written once, and then retrieved from a standard place,
instead of having to be included into each individual program.  You
can still type in a program at the terminal and use library
functions, by specifying `-f /dev/tty'.  `awk' will read a file from
the terminal to use as part of the `awk' program.  After typing your
program, type `Control-d' (the end-of-file character) to terminate it.
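The option rules above can be summarized as a small model.  The
following Python sketch is only an illustration, not `gawk''s own
code.  The function `parse_command_line' and its result dictionary are
hypothetical, compatibility mode is passed in as the `compat' parameter
instead of being derived from `-c', and treating a lone `-' as an
operand is an assumption of the sketch.

```python
import sys

def parse_command_line(args, compat=False):
    """Hypothetical model of the option rules above (not gawk's own code).

    `compat` stands in for compatibility mode so that the sketch does not
    have to guess how gawk handles the position of -c among the options.
    """
    result = {"assignments": [], "program_files": [], "flags": [],
              "program_text": None, "operands": []}
    # In compatibility mode -V and -C are not recognized.
    known_flags = {"-a", "-e", "-c"} if compat else {"-a", "-e", "-c", "-V", "-C"}
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "--":                   # POSIX: end of options
            i += 1
            break
        if not arg.startswith("-") or arg == "-":
            break                         # first non-option argument (lone "-" assumed an operand)
        if arg.startswith("-F"):          # -FFS: the value is attached
            fs = arg[2:]
            if compat and fs == "t":      # compatibility-mode special case
                fs = "\t"
            result["assignments"].append(("FS", fs))
        elif arg in ("-f", "-v"):
            if i + 1 == len(args):
                raise ValueError(f"option {arg} needs an argument")
            i += 1
            if arg == "-f":               # may be repeated; order is kept
                result["program_files"].append(args[i])
            else:                         # one VAR=VAL per -v
                var, _, val = args[i].partition("=")
                result["assignments"].append((var, val))
        elif arg in known_flags:
            result["flags"].append(arg)
        else:                             # warn, then otherwise ignore
            print(f"warning: invalid option {arg!r} ignored", file=sys.stderr)
        i += 1
    rest = args[i:]
    if not result["program_files"]:       # no -f: first operand is the program
        if not rest:
            raise ValueError("no program text given")
        result["program_text"], rest = rest[0], rest[1:]
    result["operands"] = rest
    return result
```

For example, calling it with `-F: -v n=3 -- '{ print $1 }' -data'
records `FS' and `n' as early assignments, takes `{ print $1 }' as the
program text, and leaves `-data' as an ordinary operand because it
follows `--'.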


▶1f◀
File: gawk-info,  Node: Other Arguments,  Next: AWKPATH Variable,  Prev: Options,  Up: Command Line

Other Command Line Arguments
============================

Any additional arguments on the command line are normally treated as
input files to be processed in the order specified.  However, an
argument that has the form `VAR=VALUE' means to assign the value
VALUE to the variable VAR--it does not specify a file at all.

All these arguments are made available to your `awk' program in the
`ARGV' array (*note Built-in Variables::.).  Command line options and
the program text (if present) are omitted from the `ARGV' array.  All
other arguments, including variable assignments, are included.

The distinction between file name arguments and variable-assignment
arguments is made when `awk' is about to open the next input file. 
At that point in execution, it checks the ``file name'' to see
whether it is really a variable assignment; if so, `awk' sets the
variable instead of reading a file.

Therefore, the variables actually receive the specified values after
all previously specified files have been read.  In particular, the
values of variables assigned in this fashion are *not* available
inside a `BEGIN' rule (*note BEGIN/END::.), since such rules are run
before `awk' begins scanning the argument list.
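The timing rule can be illustrated with a short model.  In this Python
sketch, `run_operands', `read_file', `on_record' and `begin' are
hypothetical names, reading standard input when there are no file
operands is not modeled, and the pattern used to recognize `VAR=VALUE'
is a simplified assumption about what counts as a variable name.  Each
operand is examined only when the previous file has been read, so an
assignment affects only the files after it and is never visible to
`BEGIN'.

```python
import re

# Simplified assumption about what counts as an awk variable name:
# a letter or underscore, then letters, digits or underscores.
ASSIGNMENT = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)=(.*)", re.DOTALL)

def run_operands(operands, variables, read_file, on_record, begin=None):
    """Hypothetical model of how awk walks its operand list."""
    if begin is not None:
        begin(variables)              # BEGIN runs before any operand is examined
    for operand in operands:
        # An operand is examined only when the next input file is needed.
        m = ASSIGNMENT.fullmatch(operand)
        if m:                         # VAR=VALUE: set the variable, read no file
            variables[m.group(1)] = m.group(2)
            continue
        for record in read_file(operand):
            on_record(record, variables)
```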

In some earlier implementations of `awk', when a variable assignment
occurred before any file names, the assignment would happen *before*
the `BEGIN' rule was executed.  Some applications came to depend upon
this ``feature''.  When `awk' was changed to be more consistent, the
`-v' option was added to accommodate applications that depended upon
this old behaviour.

The variable assignment feature is most useful for assigning to
variables such as `RS', `OFS', and `ORS', which control input and
output formats, before scanning the data files.  It is also useful
for controlling state if multiple passes are needed over a data file.
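For example, the following program counts the records in `datafile'
on the first pass, and then prints each record with its position on
the second pass:

     awk 'pass == 1  { n++ }
          pass == 2  { print FNR "/" n ": " $0 }' pass=1 datafile pass=2 datafile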


▶1f◀
File: gawk-info,  Node: AWKPATH Variable,  Prev: Other Arguments,  Up: Command Line

The `AWKPATH' Environment Variable
==================================

The previous section described how `awk' program files can be named
on the command line with the `-f' option.  In some `awk'
implementations, you must supply a precise path name for each program
file, unless the file is in the current directory.

But in `gawk', if the file name supplied in the `-f' option does not
contain a `/', then `gawk' searches a list of directories (called the
"search path"), one by one, looking for a file with the specified name.

The search path is actually a string containing directory names
separated by colons.  `gawk' gets its search path from the `AWKPATH'
environment variable.  If that variable does not exist, `gawk' uses
the default path, which is `.:/usr/lib/awk:/usr/local/lib/awk'.

The search path feature is particularly useful for building up
libraries of useful `awk' functions.  The library files can be placed
in a standard directory that is in the default path, and then
specified on the command line with a short file name.  Otherwise, the
full file name would have to be typed for each file.

Path searching is not done if `gawk' is in compatibility mode.  *Note
Command Line::.

*Note:* if you want files in the current directory to be found, you
must include the current directory in the path, either by writing `.'
as an entry in the path, or by writing a null entry in the path.  (A
null entry is indicated by starting or ending the path with a colon,
or by placing two colons next to each other (`::').)  If the current
directory is not included in the path, then files cannot be found in
the current directory.  This path search mechanism is identical to
the shell's.
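The search can be sketched as follows.  This Python function models
the rules in this section and is not `gawk''s own code.  The name
`find_program_file', the injectable `environ' and `exists' parameters
(present only to make it testable), and returning `None' when nothing
is found are all assumptions of the sketch.

```python
import os

DEFAULT_AWKPATH = ".:/usr/lib/awk:/usr/local/lib/awk"

def find_program_file(name, environ=None, exists=os.path.isfile, compat=False):
    """Hypothetical model of the -f path search described above.

    Returns the path to use, or None if the search finds nothing.
    """
    if compat or "/" in name:
        return name                   # no path search in these cases
    env = os.environ if environ is None else environ
    for directory in env.get("AWKPATH", DEFAULT_AWKPATH).split(":"):
        if directory == "":
            directory = "."           # a null entry means the current directory
        candidate = os.path.join(directory, name)
        if exists(candidate):
            return candidate
    return None
```

Note how `AWKPATH=/opt/awk' finds nothing in the current directory,
while `AWKPATH=/opt/awk:' (ending with a colon) does.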


▶1f◀
File: gawk-info,  Node: Language History,  Next: Gawk Summary,  Prev: Command Line,  Up: Top

The Evolution of the `awk' Language
***********************************

This manual describes the GNU implementation of `awk', which is
patterned after the System V Release 4 version.  Many `awk' users are
only familiar with the original `awk' implementation in Version 7
Unix, which is also the basis for the version in Berkeley Unix.  This
chapter briefly describes the evolution of the `awk' language.


* Menu:

* V7/S5R3.1::   The major changes between V7 and System V Release 3.1.

* S5R4::        The minor changes between System V Releases 3.1 and 4.

* S5R4/GNU::    The extensions in `gawk' not in System V Release 4.

 
▶1f◀
File: gawk-info,  Node: V7/S5R3.1,  Next: S5R4,  Prev: Language History,  Up: Language History

Major Changes Between V7 and S5R3.1
===================================

The `awk' language evolved considerably between the release of
Version 7 Unix (1978) and the new version first made widely available
in System V Release 3.1 (1987).  This section summarizes the changes,
with cross-references to further details.

   * The requirement for `;' to separate rules on a line (*note
     Statements/Lines::.).

   * User-defined functions, and the `return' statement (*note
     User-defined::.).

   * The `delete' statement (*note Delete::.).

   * The `do'-`while' statement (*note Do Statement::.).

   * The built-in functions `atan2', `cos', `sin', `rand' and `srand'
     (*note Numeric Functions::.).

   * The built-in functions `gsub', `sub', and `match' (*note String
     Functions::.).

   * The built-in functions `close' and `system' (*note I/O
     Functions::.).

   * The `ARGC', `ARGV', `FNR', `RLENGTH', `RSTART', and `SUBSEP'
     built-in variables (*note Built-in Variables::.).

   * The conditional expression using the operators `?' and `:'
     (*note Conditional Exp::.).

   * The exponentiation operator `^' (*note Arithmetic Ops::.) and
     its assignment operator form `^=' (*note Assignment Ops::.).

   * C-compatible operator precedence, which breaks some old `awk'
     programs (*note Precedence::.).

   * Regexps as the value of `FS' (*note Field Separators::.), or as
     the third argument to the `split' function (*note String
     Functions::.).

   * Dynamic regexps as operands of the `~' and `!~' operators (*note
     Regexp Usage::.).

   * Escape sequences (*note Constants::.) in regexps.

   * The escape sequences `\b', `\f', and `\r' (*note Constants::.).

   * Redirection of input for the `getline' function (*note
     Getline::.).

   * Multiple `BEGIN' and `END' rules (*note BEGIN/END::.).

   * Simulation of multidimensional arrays (*note
     Multi-dimensional::.).


▶1f◀
File: gawk-info,  Node: S5R4,  Next: S5R4/GNU,  Prev: V7/S5R3.1,  Up: Language History

Minor Changes between S5R3.1 and S5R4
=====================================

The System V Release 4 version of Unix `awk' added these features:

   * The `ENVIRON' variable (*note Built-in Variables::.).

   * Multiple `-f' options on the command line (*note Command Line::.).

   * The `-v' option for assigning variables before program execution
     begins (*note Command Line::.).

   * The `--' option for terminating command line options.

   * The `\a', `\v', and `\x' escape sequences (*note Constants::.).

   * A defined return value for the `srand' built-in function (*note
     Numeric Functions::.).

   * The `toupper' and `tolower' built-in string functions for case
     translation (*note String Functions::.).

   * A cleaner specification for the `%c' format-control letter in
     the `printf' function (*note Printf::.).

   * The use of constant regexps such as `/foo/' as expressions,
     where they are equivalent to use of the matching operator, as in
     `$0 ~ /foo/'.


▶1f◀
File: gawk-info,  Node: S5R4/GNU,  Prev: S5R4,  Up: Language History

Extensions In `gawk' Not In S5R4
================================

The GNU implementation, `gawk', adds these features:

   * The `AWKPATH' environment variable for specifying a path search
     for the `-f' command line option (*note Command Line::.).

   * The `-C' and `-V' command line options (*note Command Line::.).

   * The `IGNORECASE' variable and its effects (*note
     Case-sensitivity::.).

   * The `/dev/stdin', `/dev/stdout', `/dev/stderr', and `/dev/fd/N'
     file name interpretation (*note Special Files::.).

   * The `-c' option to turn off these extensions (*note Command
     Line::.).

   * The `-a' and `-e' options to specify the syntax of regular
     expressions that `gawk' will accept (*note Command Line::.).


▶1f◀
File: gawk-info,  Node: Gawk Summary,  Next: Sample Program,  Prev: Language History,  Up: Top

`gawk' Summary
**************

This appendix provides a brief summary of the `gawk' command line and
the `awk' language.  It is designed to serve as a ``quick reference.''
It is therefore terse, but complete.


* Menu:

* Command Line Summary::  Recapitulation of the command line.
* Language Summary::      A terse review of the language.
* Variables/Fields::      Variables, fields, and arrays.
* Rules Summary::         Patterns and Actions, and their component parts.
* Functions Summary::     Defining and calling functions.

 
▶1f◀
File: gawk-info,  Node: Command Line Summary,  Next: Language Summary,  Prev: Gawk Summary,  Up: Gawk Summary

Command Line Options Summary
============================

The command line consists of options to `gawk' itself, the `awk'
program text (if not supplied via the `-f' option), and values to be
made available in the `ARGC' and `ARGV' predefined `awk' variables:

     awk [`-FFS'] [`-v VAR=VAL'] [`-V'] [`-C'] [`-c'] [`-a'] [`-e'] [`--'] 'PROGRAM' FILE ...
     awk [`-FFS'] `-f SOURCE-FILE' [`-f SOURCE-FILE ...'] [`-v VAR=VAL'] [`-V'] [`-C'] [`-c'] [`-a'] [`-e'] [`--'] FILE ...

The options that `gawk' accepts are:

`-FFS'
     Use FS for the input field separator (the value of the `FS'
     predefined variable).

`-f PROGRAM-FILE'
     Read the `awk' program source from the file PROGRAM-FILE,
     instead of from the first non-option command line argument.

`-v VAR=VAL'
     Assign the variable VAR the value VAL before program execution
     begins.

`-a'
     Specifies use of traditional `awk' syntax for regular expressions.
     This means that `\' can be used to quote regular expression
     operators inside of square brackets, just as it can be outside
     of them.

`-e'
     Specifies use of `egrep' syntax for regular expressions.  This
     means that `\' does not serve as a quoting character inside of
     square brackets.

`-c'
     Specifies compatibility mode, in which `gawk' extensions are
     turned off.

`-V'
     Print version information for this particular copy of `gawk' on
     the error output.  This option may disappear in a future version
     of `gawk'.

`-C'
     Print the short version of the General Public License on the
     error output.  This option may disappear in a future version of
     `gawk'.

`--'
     Signal the end of options.  This is useful to allow further
     arguments to the `awk' program itself to start with a `-'.  This
     is mainly for consistency with the argument parsing conventions
     of POSIX.

Any other options are flagged as invalid, but are otherwise ignored. 
*Note Command Line::, for more details.


▶1f◀
File: gawk-info,  Node: Language Summary,  Next: Variables/Fields,  Prev: Command Line Summary,  Up: Gawk Summary

Language Summary
================

An `awk' program consists of a sequence of pattern-action statements
and optional function definitions.

     PATTERN    { ACTION STATEMENTS }
     
     function NAME(PARAMETER LIST)     { ACTION STATEMENTS }

`gawk' first reads the program source from the PROGRAM-FILE(s) if
specified, or from the first non-option argument on the command line.
The `-f' option may be used multiple times on the command line. 
`gawk' reads the program text from all the PROGRAM-FILE files,
effectively concatenating them in the order they are specified.  This
is useful for building libraries of `awk' functions, without having
to include them in each new `awk' program that uses them.  To use
library functions from a program typed in at the terminal, name the
library files with `-f' and also specify `-f /dev/tty'; then type your
program, and end it with a `C-d' (the end-of-file character).
*Note Command Line::.

The environment variable `AWKPATH' specifies a search path to use
when finding source files named with the `-f' option.  If the
variable `AWKPATH' is not set, `gawk' uses the default path,
`.:/usr/lib/awk:/usr/local/lib/awk'.  If a file name given to the
`-f' option contains a `/' character, no path search is performed. 
*Note AWKPATH Variable::, for a full description of the `AWKPATH'
environment variable.

`gawk' compiles the program into an internal form, and then proceeds
to read each file named in the `ARGV' array.  If there are no files
named on the command line, `gawk' reads the standard input.

If a ``file'' named on the command line has the form `VAR=VAL', it is
treated as a variable assignment: the variable VAR is assigned the
value VAL.

For each line in the input, `gawk' tests to see if it matches any
PATTERN in the `awk' program.  For each pattern that the line
matches, the associated ACTION is executed.
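The per-record loop can be sketched as below.  This Python model is a
simplified illustration: rules are hypothetical (PATTERN, ACTION) pairs
of Python callables, a pattern of `None' stands for a rule with no
pattern (which matches every record in `awk'), and `BEGIN', `END' and
statements such as `next' are left out.

```python
def run_rules(rules, records):
    """Hypothetical model: try every rule, in program order, on every record.

    A pattern is a predicate on the record, or None for a rule with no
    pattern, which matches every record.
    """
    for record in records:
        for pattern, action in rules:
            if pattern is None or pattern(record):
                action(record)
```

A rule such as `/foo/ { print }' would correspond, roughly, to the pair
`(lambda r: re.search("foo", r) is not None, print)'.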


▶1f◀
File: gawk-info,  Node: Variables/Fields,  Next: Rules Summary,  Prev: Language Summary,  Up: Gawk Summary

Variables and Fields
====================

`awk' variables are dynamic; they come into existence when they are
first used.  Their values are either floating-point numbers or strings.
`awk' also has one-dimensional arrays; multidimensional arrays may
be simulated.  There are several predefined variables that `awk' sets
as a program runs; these are summarized below.


* Menu:

* Fields Summary::      Input field splitting.
* Built-in Summary::    `awk''s built-in variables.
* Arrays Summary::      Using arrays.
* Data Type Summary::   Values in `awk' are numbers or strings.

 
▶1f◀
File: gawk-info,  Node: Fields Summary,  Next: Built-in Summary,  Prev: Variables/Fields,  Up: Variables/Fields

Fields
------

As each input line is read, `gawk' splits the line into FIELDS, using
the value of the `FS' variable as the field separator.  If `FS' is a
single character, fields are separated by that character.  Otherwise,
`FS' is expected to be a full regular expression.  In the special
case that `FS' is a single blank, fields are separated by runs of
blanks and/or tabs.  Note that the value of `IGNORECASE' (*note
Case-sensitivity::.) also affects how fields are split when `FS' is a
regular expression.
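The three `FS' cases can be modeled like this.  The Python function
`split_fields' is hypothetical, and it uses Python's `re' module, whose
syntax agrees with `awk' regexps only for simple patterns.  It also
assumes two details not spelled out above, both standard `awk'
behavior: an empty line has no fields, and with the default `FS'
leading and trailing blanks and tabs do not create empty fields.
`IGNORECASE' is honored only in the regexp case, as stated above.

```python
import re

def split_fields(line, fs=" ", ignorecase=False):
    """Hypothetical model of the FS rules above (not gawk's own code)."""
    if line == "":
        return []                     # an empty record has no fields
    if fs == " ":
        # Default case: runs of blanks and/or tabs separate fields, and
        # leading or trailing blanks and tabs create no empty fields.
        stripped = line.strip(" \t")
        return re.split(r"[ \t]+", stripped) if stripped else []
    if len(fs) == 1:
        return line.split(fs)         # any other single character, literally
    # Otherwise FS is a regexp; IGNORECASE makes the match case-insensitive.
    flags = re.IGNORECASE if ignorecase else 0
    fields, start = [], 0
    for m in re.finditer(fs, line, flags):
        if m.end() > m.start():       # this sketch skips empty matches
            fields.append(line[start:m.start()])
            start = m.end()
    fields.append(line[start:])
    return fields
```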

Each field in the input line may be referenced by its position, `$1',
`$2', and so on.  `$0' is the whole line.  The value of a field may
be assigned to as well.  Field numbers need not be constants:

     n = 5
     print $n

prints the fifth field in the input line.  The variable `NF' is set
to the total number of fields in the input line.

References to nonexistent fields (i.e., fields after `$NF') return
the null string.  However, assigning to a nonexistent field (e.g.,
`$(NF+2) = 5') increases the value of `NF', creates any intervening
fields with the null string as their value, and causes the value of
`$0' to be recomputed, with the fields being separated by the value
of `OFS'.
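The effect of assigning past `$NF' can be modeled with a list of
fields.  In this Python sketch `assign_field' is a hypothetical helper,
the record is held as a list in which element 0 plays the role of `$1',
and the new value is assumed to be a string already.  The sketch
rebuilds `$0' with `OFS' on every assignment; `awk' does the same when
an existing field is assigned.

```python
def assign_field(fields, n, value, ofs=" "):
    """Hypothetical model of `$n = value` (n >= 1) on a list of fields.

    fields[0] plays the role of $1; value is assumed to be a string.
    Returns the rebuilt $0 and the new NF.
    """
    if n < 1:
        raise ValueError("this sketch handles only $1, $2, ...")
    while len(fields) < n:
        fields.append("")             # intervening fields get the null string
    fields[n - 1] = value
    return ofs.join(fields), len(fields)  # $0 is rebuilt using OFS
```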

*Note Reading Files::, for a full description of the way `awk'
defines and uses fields.