DataMuseum.dk presents historical artifacts from the history of: DKUUG/EUUG Conference tapes
Length: 48221 (0xbc5d)
Types: TextFile
Names: »gawk-info-5«
└─⟦9ae75bfbd⟧ Bits:30007242 EUUGD3: Starter Kit
└─⟦f133efdaf⟧ »EurOpenD3/gnu/gawk/gawk-doc-2.11.tar.Z«
└─⟦a05ed705a⟧ Bits:30007078 DKUUG GNU 2/12/89
└─⟦f133efdaf⟧ »./gawk-doc-2.11.tar.Z«
└─⟦8f64183b0⟧
└─⟦this⟧ »gawk-2.11-doc/gawk-info-5«
Info file gawk-info, produced by Makeinfo, -*- Text -*- from input
file gawk.texinfo.
This file documents `awk', a program that you can use to select
particular records in a file and perform operations upon them.
Copyright (C) 1989 Free Software Foundation, Inc.
Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.
Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.
Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.
▶1f◀
File: gawk-info, Node: String Functions, Next: I/O Functions, Prev: Numeric Functions, Up: Built-in
Built-in Functions for String Manipulation
==========================================
The functions in this section look at the text of one or more strings.
`index(IN, FIND)'
This searches the string IN for the first occurrence of the
string FIND, and returns the position where that occurrence
begins in the string IN. For example:
awk 'BEGIN { print index("peanut", "an") }'
prints `3'. If FIND is not found, `index' returns 0.
`length(STRING)'
This gives you the number of characters in STRING. If STRING is
a number, the length of the digit string representing that
number is returned. For example, `length("abcde")' is 5. By
contrast, `length(15 * 35)' works out to 3. How? Well, 15 * 35
= 525, and 525 is then converted to the string `"525"', which
has three characters.
If no argument is supplied, `length' returns the length of `$0'.
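As a small sketch of the three cases just described, the following command calls `length' on a string, on a number, and with no argument. The input line `hello world' is invented for illustration, and a POSIX-style shell is assumed:

```shell
# Illustrative input only; "hello world" has 11 characters.
out=$(echo "hello world" | awk '{ print length("abcde"), length(15 * 35), length }')
echo "$out"
```

The last value is the length of `$0', the whole input record.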
`match(STRING, REGEXP)'
The `match' function searches the string, STRING, for the
longest, leftmost substring matched by the regular expression,
REGEXP. It returns the character position, or "index", of where
that substring begins (1, if it starts at the beginning of
STRING). If no match is found, it returns 0.
The `match' function sets the built-in variable `RSTART' to the
index. It also sets the built-in variable `RLENGTH' to the
length of the matched substring. If no match is found, `RSTART'
is set to 0, and `RLENGTH' to -1.
For example:
awk '{
    if ($1 == "FIND")
        regex = $2
    else {
        where = match($0, regex)
        if (where)
            print "Match of", regex, "found at", where, "in", $0
    }
}'
This program looks for lines that match the regular expression
stored in the variable `regex'. This regular expression can be
changed. If the first word on a line is `FIND', `regex' is
changed to be the second word on that line. Therefore, given:
FIND fo*bar
My program was a foobar
But none of it would doobar
FIND Melvin
JF+KM
This line is property of The Reality Engineering Co.
This file created by Melvin.
`awk' prints:
Match of fo*bar found at 18 in My program was a foobar
Match of Melvin found at 22 in This file created by Melvin.
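One common use of `RSTART' and `RLENGTH' is to pull out the matched text with `substr'. The sketch below assumes a POSIX-style shell and reuses one input line from the example above; the pattern `xyz' is chosen only because it cannot match:

```shell
out=$(echo "My program was a foobar" | awk '{
    # On success, RSTART and RLENGTH describe the matched substring.
    if (match($0, /fo*bar/))
        print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    # On failure, match returns 0 and sets RSTART to 0, RLENGTH to -1.
    if (match($0, /xyz/) == 0)
        print RSTART, RLENGTH
}')
echo "$out"
```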
`split(STRING, ARRAY, FIELDSEP)'
This divides STRING up into pieces separated by FIELDSEP, and
stores the pieces in ARRAY. The first piece is stored in
`ARRAY[1]', the second piece in `ARRAY[2]', and so forth. The
string value of the third argument, FIELDSEP, is used as a
regexp to search for to find the places to split STRING. If the
FIELDSEP is omitted, the value of `FS' is used. `split' returns
the number of elements created.
The `split' function, then, splits strings into pieces in a
manner similar to the way input lines are split into fields.
For example:
split("auto-da-fe", a, "-")
splits the string `auto-da-fe' into three fields using `-' as
the separator. It sets the contents of the array `a' as follows:
a[1] = "auto"
a[2] = "da"
a[3] = "fe"
The value returned by this call to `split' is 3.
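Because FIELDSEP is used as a regexp, a separator may match text of varying length. Here is a minimal sketch (the sample string and the array name `parts' are invented for illustration) that splits on a comma followed by any number of spaces, and uses the return value to loop over the pieces:

```shell
out=$(awk 'BEGIN {
    # ", *" matches a comma and any spaces that follow it.
    n = split("alpha, beta,gamma", parts, ", *")
    for (i = 1; i <= n; i++)
        printf "%d=%s ", i, parts[i]
    print n
}')
echo "$out"
```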
`sprintf(FORMAT, EXPRESSION1,...)'
This returns (without printing) the string that `printf' would
have printed out with the same arguments (*note Printf::.). For
example:
sprintf("pi = %.2f (approx.)", 22/7)
returns the string `"pi = 3.14 (approx.)"'.
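Since `sprintf' returns its result instead of printing it, the string can be stored and used later. The following sketch builds a fixed-width record; the format and the values are made up for illustration:

```shell
out=$(awk 'BEGIN {
    # Zero-padded number, left-justified string, three decimal places.
    rec = sprintf("%05d|%-6s|%.3f", 42, "ab", 2 / 3)
    print rec
}')
echo "$out"
```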
`sub(REGEXP, REPLACEMENT, TARGET)'
The `sub' function alters the value of TARGET. It searches this
value, which should be a string, for the leftmost substring
matched by the regular expression, REGEXP, extending this match
as far as possible. Then the entire string is changed by
replacing the matched text with REPLACEMENT. The modified
string becomes the new value of TARGET.
This function is peculiar because TARGET is not simply used to
compute a value, and not just any expression will do: it must be
a variable, field or array reference, so that `sub' can store a
modified value there. If this argument is omitted, then the
default is to use and alter `$0'.
For example:
str = "water, water, everywhere"
sub(/at/, "ith", str)
sets `str' to `"wither, water, everywhere"', by replacing the
leftmost, longest occurrence of `at' with `ith'.
The `sub' function returns the number of substitutions made
(either one or zero).
If the special character `&' appears in REPLACEMENT, it stands
for the precise substring that was matched by REGEXP. (If the
regexp can match more than one string, then this precise
substring may vary.) For example:
awk '{ sub(/candidate/, "& and his wife"); print }'
changes the first occurrence of `candidate' to `candidate and
his wife' on each input line.
The effect of this special character can be turned off by
putting a backslash before it in the string. As usual, to
insert one backslash in the string, you must write two
backslashes. Therefore, write `\\&' in a string constant to
include a literal `&' in the replacement. For example, here is
how to replace the first `|' on each line with an `&':
awk '{ sub(/\|/, "\\&"); print }'
*Note:* as mentioned above, the third argument to `sub' must be
an lvalue. Some versions of `awk' allow the third argument to
be an expression which is not an lvalue. In such a case, `sub'
would still search for the pattern and return 0 or 1, but the
result of the substitution (if any) would be thrown away because
there is no place to put it. Such versions of `awk' accept
expressions like this:
sub(/USA/, "United States", "the USA and Canada")
But that is considered erroneous in `gawk'.
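To see the return value and the `&' character working together, consider this sketch, which brackets the first match and then tries a pattern that does not occur (`zzz' is an arbitrary choice):

```shell
out=$(awk 'BEGIN {
    str = "water, water, everywhere"
    n = sub(/at/, "[&]", str)     # & stands for the matched text
    print n, str
    n = sub(/zzz/, "x", str)      # no match: str is left unchanged
    print n, str
}')
echo "$out"
```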
`gsub(REGEXP, REPLACEMENT, TARGET)'
This is similar to the `sub' function, except `gsub' replaces
*all* of the longest, leftmost, *nonoverlapping* matching
substrings it can find. The `g' in `gsub' stands for
``global'', which means replace everywhere. For example:
awk '{ gsub(/Britain/, "United Kingdom"); print }'
replaces all occurrences of the string `Britain' with `United
Kingdom' for all input records.
The `gsub' function returns the number of substitutions made.
If the variable to be searched and altered, TARGET, is omitted,
then the entire input record, `$0', is used.
As in `sub', the characters `&' and `\' are special, and the
third argument must be an lvalue.
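The count returned by `gsub' is often useful in itself. This sketch, with an invented input line, reports how many replacements were made in each record:

```shell
out=$(echo "Britain and Britain" |
      awk '{ n = gsub(/Britain/, "United Kingdom"); print n ": " $0 }')
echo "$out"
```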
`substr(STRING, START, LENGTH)'
This returns a LENGTH-character-long substring of STRING,
starting at character number START. The first character of a
string is character number one. For example,
`substr("washington", 5, 3)' returns `"ing"'.
If LENGTH is not present, this function returns the whole suffix
of STRING that begins at character number START. For example,
`substr("washington", 5)' returns `"ington"'.
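`substr' combines naturally with `index' to cut a string at a delimiter. The string `name=washington' and the variable names below are assumptions made for the sake of the sketch:

```shell
out=$(awk 'BEGIN {
    s = "name=washington"
    eq = index(s, "=")            # position of the delimiter
    print substr(s, 1, eq - 1), substr(s, eq + 1)
}')
echo "$out"
```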
`tolower(STRING)'
This returns a copy of STRING, with each upper-case character in
the string replaced with its corresponding lower-case character.
Nonalphabetic characters are left unchanged. For example,
`tolower("MiXeD cAsE 123")' returns `"mixed case 123"'.
`toupper(STRING)'
This returns a copy of STRING, with each lower-case character in
the string replaced with its corresponding upper-case character.
Nonalphabetic characters are left unchanged. For example,
`toupper("MiXeD cAsE 123")' returns `"MIXED CASE 123"'.
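One possible use of these functions is a comparison that ignores case: convert one side to a known case before comparing. The three input lines here are invented for illustration:

```shell
out=$(printf 'Melvin\nMELVIN\nmarvin\n' |
      awk 'tolower($0) == "melvin" { count++ } END { print count }')
echo "$out"
```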
▶1f◀
File: gawk-info, Node: I/O Functions, Prev: String Functions, Up: Built-in
Built-in Functions For Input/Output
===================================
`close(FILENAME)'
Close the file FILENAME, for input or output. The argument may
alternatively be a shell command that was used for redirecting
to or from a pipe; then the pipe is closed.
*Note Close Input::, regarding closing input files and pipes.
*Note Close Output::, regarding closing output files and pipes.
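As a rough sketch of why `close' matters, the program below writes a file, closes it, and then reads the same file back with `getline'. The temporary file comes from `mktemp', which is assumed to be available, and the variable `f' is a name chosen for this example:

```shell
tmp=$(mktemp)
out=$(awk -v f="$tmp" 'BEGIN {
    print "first line" > f
    close(f)                      # finish writing before reading
    while ((getline line < f) > 0)
        print "read back:", line
    close(f)
}')
rm -f "$tmp"
echo "$out"
```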
`system(COMMAND)'
The `system' function allows the user to execute operating system
commands and then return to the `awk' program. The `system'
function executes the command given by the string COMMAND. It
returns, as its value, the status returned by the command that
was executed.
For example, if the following fragment of code is put in your
`awk' program:
END {
    system("mail -s 'awk run done' operator < /dev/null")
}
the system operator will be sent mail when the `awk' program
finishes processing input and begins its end-of-input processing.
Note that much the same result can be obtained by redirecting
`print' or `printf' into a pipe. However, if your `awk' program
is interactive, `system' is useful for cranking up large
self-contained programs, such as a shell or an editor.
Some operating systems cannot implement the `system' function.
`system' causes a fatal error if it is not supported.
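The value returned by `system' can be tested like any other number. In this sketch, which assumes a Unix-style shell underneath `awk', the commands `true' and `exit 3' are chosen only because their statuses are known in advance:

```shell
out=$(awk 'BEGIN {
    ok  = system("true")          # succeeds, status 0
    bad = system("exit 3")        # the shell exits with status 3
    print ok, bad
}')
echo "$out"
```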
▶1f◀
File: gawk-info, Node: User-defined, Next: Built-in Variables, Prev: Built-in, Up: Top
User-defined Functions
**********************
Complicated `awk' programs can often be simplified by defining your
own functions. User-defined functions can be called just like
built-in ones (*note Function Calls::.), but it is up to you to
define them--to tell `awk' what they should do.
* Menu:
* Definition Syntax:: How to write definitions and what they mean.
* Function Example:: An example function definition and what it does.
* Function Caveats:: Things to watch out for.
* Return Statement:: Specifying the value a function returns.
▶1f◀
File: gawk-info, Node: Definition Syntax, Next: Function Example, Prev: User-defined, Up: User-defined
Syntax of Function Definitions
==============================
Definitions of functions can appear anywhere between the rules of the
`awk' program. Thus, the general form of an `awk' program is
extended to include sequences of rules *and* user-defined function
definitions.
The definition of a function named NAME looks like this:
function NAME (PARAMETER-LIST) {
    BODY-OF-FUNCTION
}
The keyword `function' may be abbreviated `func'.
NAME is the name of the function to be defined. A valid function
name is like a valid variable name: a sequence of letters, digits and
underscores, not starting with a digit.
PARAMETER-LIST is a list of the function's arguments and local
variable names, separated by commas. When the function is called,
the argument names are used to hold the argument values given in the
call. The local variables are initialized to the null string.
The BODY-OF-FUNCTION consists of `awk' statements. It is the most
important part of the definition, because it says what the function
should actually *do*. The argument names exist to give the body a
way to talk about the arguments; local variables, to give the body
places to keep temporary values.
Argument names are not distinguished syntactically from local
variable names; instead, the number of arguments supplied when the
function is called determines how many argument variables there are.
Thus, if three argument values are given, the first three names in
PARAMETER-LIST are arguments, and the rest are local variables.
It follows that if the number of arguments is not the same in all
calls to the function, some of the names in PARAMETER-LIST may be
arguments on some occasions and local variables on others. Another
way to think of this is that omitted arguments default to the null
string.
Usually when you write a function you know how many names you intend
to use for arguments and how many you intend to use as locals. By
convention, you should write an extra space between the arguments and
the locals, so that other people can follow how your function is
supposed to be used.
During execution of the function body, the arguments and local
variable values hide or "shadow" any variables of the same names used
in the rest of the program. The shadowed variables are not
accessible in the function definition, because there is no way to
name them while their names have been taken away for the local
variables. All other variables used in the `awk' program can be
referenced or set normally in the function definition.
The arguments and local variables last only as long as the function
body is executing. Once the body finishes, the shadowed variables
come back.
The function body can contain expressions which call functions. They
can even call this function, either directly or by way of another
function. When this happens, we say the function is "recursive".
There is no need in `awk' to put the definition of a function before
all uses of the function. This is because `awk' reads the entire
program before starting to execute any of it.
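The following sketch pulls several of these points together. The function `sum_to' is hypothetical: it takes one argument, `n', and uses two local variables, `i' and `total', set apart by extra space in the parameter list. The global `i' is shadowed while the function runs and comes back unchanged. Note also that the function is called from a rule and defined separately from it.

```shell
out=$(awk '
function sum_to(n,    i, total) {   # i and total are locals
    for (i = 1; i <= n; i++)
        total += i                  # total starts out as the null string
    return total
}
BEGIN {
    i = "outer"
    print sum_to(4), i              # the global i is untouched
}')
echo "$out"
```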
▶1f◀
File: gawk-info, Node: Function Example, Next: Function Caveats, Prev: Definition Syntax, Up: User-defined
Function Definition Example
===========================
Here is an example of a user-defined function, called `myprint', that
takes a number and prints it in a specific format.
function myprint(num)
{
    printf "%6.3g\n", num
}
To illustrate, here is an `awk' rule which uses our `myprint' function:
$3 > 0 { myprint($3) }
This program prints, in our special format, all the third fields that
contain a positive number in our input. Therefore, when given:
1.2 3.4 5.6 7.8
9.10 11.12 13.14 15.16
17.18 19.20 21.22 23.24
this program, using our function to format the results, prints:
   5.6
  13.1
  21.2
Here is a rather contrived example of a recursive function. It
prints a string backwards:
function rev (str, len) {
    if (len == 0) {
        printf "\n"
        return
    }
    printf "%c", substr(str, len, 1)
    rev(str, len - 1)
}
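To try `rev', you must pass it both the string and the string's length. A possible call, with the sample string `gawk' chosen arbitrarily, looks like this:

```shell
out=$(awk '
function rev(str, len) {
    if (len == 0) {
        printf "\n"
        return
    }
    printf "%c", substr(str, len, 1)
    rev(str, len - 1)
}
BEGIN { rev("gawk", length("gawk")) }')
echo "$out"
```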
▶1f◀
File: gawk-info, Node: Function Caveats, Next: Return Statement, Prev: Function Example, Up: User-defined
Calling User-defined Functions
==============================
"Calling a function" means causing the function to run and do its job.
A function call is an expression, and its value is the value returned
by the function.
A function call consists of the function name followed by the
arguments in parentheses. What you write in the call for the
arguments are `awk' expressions; each time the call is executed,
these expressions are evaluated, and the values are the actual
arguments. For example, here is a call to `foo' with three arguments:
foo(x y, "lose", 4 * z)
*Note:* whitespace characters (spaces and tabs) are not allowed
between the function name and the open-parenthesis of the argument
list. If you write whitespace by mistake, `awk' might think that you
mean to concatenate a variable with an expression in parentheses.
However, it notices that you used a function name and not a variable
name, and reports an error.
When a function is called, it is given a *copy* of the values of its
arguments. This is called "call by value". The caller may use a
variable as the expression for the argument, but the called function
does not know this: all it knows is what value the argument had. For
example, if you write this code:
foo = "bar"
z = myfunc(foo)
then you should not think of the argument to `myfunc' as being ``the
variable `foo'''. Instead, think of the argument as the string
value, `"bar"'.
If the function `myfunc' alters the values of its local variables,
this has no effect on any other variables. In particular, if
`myfunc' does this:
function myfunc (win) {
    print win
    win = "zzz"
    print win
}
to change its first argument variable `win', this *does not* change
the value of `foo' in the caller. The role of `foo' in calling
`myfunc' ended when its value, `"bar"', was computed. If `win' also
exists outside of `myfunc', the function body cannot alter this outer
value, because it is shadowed during the execution of `myfunc' and
cannot be seen or changed from there.
However, when arrays are the parameters to functions, they are *not*
copied. Instead, the array itself is made available for direct
manipulation by the function. This is usually called "call by
reference". Changes made to an array parameter inside the body of a
function *are* visible outside that function. *This can be very
dangerous if you don't watch what you are doing.* For example:
function changeit (array, ind, nvalue) {
    array[ind] = nvalue
}
BEGIN {
    a[1] = 1 ; a[2] = 2 ; a[3] = 3
    changeit(a, 2, "two")
    printf "a[1] = %s, a[2] = %s, a[3] = %s\n", a[1], a[2], a[3]
}
prints `a[1] = 1, a[2] = two, a[3] = 3', because calling `changeit'
stores `"two"' in the second element of `a'.
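The two kinds of argument passing can be seen side by side in one small sketch. The function `modify' and the variable names are invented for illustration; it assigns to both a scalar parameter and an array parameter:

```shell
out=$(awk '
function modify(scalar, array) {
    scalar = "changed"            # changes only the local copy
    array["key"] = "changed"      # changes the array in the caller
}
BEGIN {
    s = "original"
    a["key"] = "original"
    modify(s, a)
    print s, a["key"]
}')
echo "$out"
```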
▶1f◀
File: gawk-info, Node: Return Statement, Prev: Function Caveats, Up: User-defined
The `return' Statement
======================
The body of a user-defined function can contain a `return' statement.
This statement returns control to the rest of the `awk' program. It
can also be used to return a value for use in the rest of the `awk'
program. It looks like this:
return EXPRESSION
The EXPRESSION part is optional. If it is omitted, then the returned
value is undefined and, therefore, unpredictable.
A `return' statement with no value expression is assumed at the end
of every function definition. So if control reaches the end of the
function definition, then the function returns an unpredictable value.
Here is an example of a user-defined function that returns a value
for the largest number among the elements of an array:
function maxelt (vec,   i, ret) {
    for (i in vec) {
        if (ret == "" || vec[i] > ret)
            ret = vec[i]
    }
    return ret
}
You call `maxelt' with one argument, an array name. The local
variables `i' and `ret' are not intended to be arguments; while there
is nothing to stop you from passing two or three arguments to
`maxelt', the results would be strange. The extra space before `i'
in the function parameter list is to indicate that `i' and `ret' are
not supposed to be arguments. This is a convention which you should
follow when you define functions.
Here is a program that uses our `maxelt' function. It loads an
array, calls `maxelt', and then reports the maximum number in that
array:
awk '
function maxelt (vec,   i, ret) {
    for (i in vec) {
        if (ret == "" || vec[i] > ret)
            ret = vec[i]
    }
    return ret
}
# Load all fields of each record into nums.
{
    for(i = 1; i <= NF; i++)
        nums[NR, i] = $i
}
END {
    print maxelt(nums)
}'
Given the following input:
1 5 23 8 16
44 3 5 2 8 26
256 291 1396 2962 100
-6 467 998 1101
99385 11 0 225
our program tells us (predictably) that:
99385
is the largest number in our array.
▶1f◀
File: gawk-info, Node: Built-in Variables, Next: Command Line, Prev: User-defined, Up: Top
Built-in Variables
******************
Most `awk' variables are available for you to use for your own
purposes; they never change except when your program assigns them,
and never affect anything except when your program examines them.
A few variables have special built-in meanings. Some of them `awk'
examines automatically, so that they enable you to tell `awk' how to
do certain things. Others are set automatically by `awk', so that
they carry information from the internal workings of `awk' to your
program.
This chapter documents all the built-in variables of `gawk'. Most of
them are also documented in the chapters where their areas of
activity are described.
* Menu:
* User-modified:: Built-in variables that you change to control `awk'.
* Auto-set:: Built-in variables where `awk' gives you information.
▶1f◀
File: gawk-info, Node: User-modified, Next: Auto-set, Prev: Built-in Variables, Up: Built-in Variables
Built-in Variables That Control `awk'
=====================================
This is a list of the variables which you can change to control how
`awk' does certain things.
`FS'
`FS' is the input field separator (*note Field Separators::.).
The value is a regular expression that matches the separations
between fields in an input record.
The default value is `" "', a string consisting of a single
space. As a special exception, this value actually means that
any sequence of spaces and tabs is a single separator. It also
causes spaces and tabs at the beginning or end of a line to be
ignored.
You can set the value of `FS' on the command line using the `-F'
option:
awk -F, 'PROGRAM' INPUT-FILES
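Here is a brief sketch of both behaviors, using invented input. The first command splits on commas with `-F'; the second relies on the default `FS', so leading, trailing and repeated spaces do not produce empty fields:

```shell
out1=$(printf 'a,b,c\n' | awk -F, '{ print $2, NF }')
out2=$(echo "  x   y  " | awk '{ print NF, $1 }')
echo "$out1"
echo "$out2"
```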
`IGNORECASE'
If `IGNORECASE' is nonzero, then *all* regular expression
matching is done in a case-independent fashion. In particular,
regexp matching with `~' and `!~', and the `gsub', `index',
`match', `split' and `sub' functions all ignore case when doing
their particular regexp operations. *Note:* since field
splitting with the value of the `FS' variable is also a regular
expression operation, that too is done with case ignored. *Note
Case-sensitivity::.
If `gawk' is in compatibility mode (*note Command Line::.), then
`IGNORECASE' has no special meaning, and regexp operations are
always case-sensitive.
`OFMT'
This string is used by `awk' to control conversion of numbers to
strings (*note Conversion::.). It works by being passed, in
effect, as the first argument to the `sprintf' function. Its
default value is `"%.6g"'.
`OFS'
This is the output field separator (*note Output Separators::.).
It is output between the fields output by a `print' statement.
Its default value is `" "', a string consisting of a single space.
`ORS'
This is the output record separator. It is output at the end of
every `print' statement. Its default value is a string
containing a single newline character, which could be written as
`"\n"'. (*Note Output Separators::).
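The three output variables described so far can be seen working together in a single `print' statement. The values assigned below are arbitrary choices for this sketch:

```shell
out=$(awk 'BEGIN {
    OFMT = "%.2f"                 # how print formats non-integer numbers
    OFS  = "-"                    # goes between the items
    ORS  = "!\n"                  # goes after the last item
    print "a", "b", 3.14159265
}')
echo "$out"
```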
`RS'
This is `awk''s record separator. Its default value is a string
containing a single newline character, which means that an input
record consists of a single line of text. (*Note Records::.)
`SUBSEP'
`SUBSEP' is a subscript separator. It has the default value of
`"\034"', and is used to separate the parts of the name of a
multi-dimensional array. Thus, if you access `foo[12,3]', it
really accesses `foo["12\0343"]'. (*Note Multi-dimensional::).
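Since the subscripts are joined into one string, a program can take them apart again by splitting on `SUBSEP'. The sketch below does so, and also tests for the element with a hand-built subscript; the array `parts' is a name invented for the example:

```shell
out=$(awk 'BEGIN {
    foo[12, 3] = "x"
    for (key in foo) {
        split(key, parts, SUBSEP)     # recover the two subscripts
        print parts[1], parts[2], ((12 SUBSEP 3) in foo)
    }
}')
echo "$out"
```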
▶1f◀
File: gawk-info, Node: Auto-set, Prev: User-modified, Up: Built-in Variables
Built-in Variables That Convey Information to You
=================================================
This is a list of the variables that are set automatically by `awk'
on certain occasions so as to provide information for your program.
`ARGC'
`ARGV'
The command-line arguments available to `awk' are stored in an
array called `ARGV'. `ARGC' is the number of command-line
arguments present. `ARGV' is indexed from zero to `ARGC - 1'.
*Note Command Line::. For example:
awk '{ print ARGV[$1] }' inventory-shipped BBS-list
In this example, `ARGV[0]' contains `"awk"', `ARGV[1]' contains
`"inventory-shipped"', and `ARGV[2]' contains `"BBS-list"'. The
value of `ARGC' is 3, one more than the index of the last
element in `ARGV' since the elements are numbered from zero.
Notice that the `awk' program is not entered in `ARGV'. The
other special command line options, with their arguments, are
also not entered. But variable assignments on the command line
*are* treated as arguments, and do show up in the `ARGV' array.
Your program can alter `ARGC' and the elements of `ARGV'. Each
time `awk' reaches the end of an input file, it uses the next
element of `ARGV' as the name of the next input file. By
storing a different string there, your program can change which
files are read. You can use `"-"' to represent the standard
input. By storing additional elements and incrementing `ARGC'
you can cause additional files to be read.
If you decrease the value of `ARGC', that eliminates input files
from the end of the list. By recording the old value of `ARGC'
elsewhere, your program can treat the eliminated arguments as
something other than file names.
To eliminate a file from the middle of the list, store the null
string (`""') into `ARGV' in place of the file's name. As a
special feature, `awk' ignores file names that have been
replaced with the null string.
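As a sketch of the last point, this program prints `ARGC' and then removes the first input file from consideration by storing the null string in `ARGV[1]'. The two files are created in a temporary directory (assuming `mktemp -d' is available), and their names are invented:

```shell
dir=$(mktemp -d)
echo "from one" > "$dir/one"
echo "from two" > "$dir/two"
# ARGV[0] is the command name, so two file names give ARGC = 3.
out=$(awk 'BEGIN { print ARGC; ARGV[1] = "" } { print }' "$dir/one" "$dir/two")
rm -rf "$dir"
echo "$out"
```

Only the second file is read.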
`ENVIRON'
This is an array that contains the values of the environment.
The array indices are the environment variable names; the values
are the values of the particular environment variables. For
example, `ENVIRON["HOME"]' might be `/u/close'. Changing this
array does not affect the environment passed on to any programs
that `awk' may spawn via redirection or the `system' function.
(In a future version of `gawk', it may do so.)
Some operating systems may not have environment variables. On
such systems, the array `ENVIRON' is empty.
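A minimal sketch of reading the environment follows. The variable `GREETING' is hypothetical; it is placed in the environment of this one `awk' command by the shell:

```shell
out=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')
echo "$out"
```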
`FILENAME'
This is the name of the file that `awk' is currently reading.
If `awk' is reading from the standard input (in other words,
there are no files listed on the command line), `FILENAME' is
set to `"-"'. `FILENAME' is changed each time a new file is
read (*note Reading Files::.).
`FNR'
`FNR' is the current record number in the current file. `FNR'
is incremented each time a new record is read (*note Getline::.).
It is reinitialized to 0 each time a new input file is started.
`NF'
`NF' is the number of fields in the current input record. `NF'
is set each time a new record is read, when a new field is
created, or when `$0' changes (*note Fields::.).
`NR'
This is the number of input records `awk' has processed since
the beginning of the program's execution (*note Records::.).
`NR' is set each time a new record is read.
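The difference between `NR' and `FNR' shows up only when there is more than one input file. In this sketch, two small files with invented contents are read in turn, and each record is printed with both counters:

```shell
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/f1"
printf 'c\n' > "$dir/f2"
# NR keeps counting across files; FNR starts over for each file.
out=$(awk '{ print $0, NR, FNR }' "$dir/f1" "$dir/f2")
rm -rf "$dir"
echo "$out"
```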
`RLENGTH'
`RLENGTH' is the length of the substring matched by the `match'
function (*note String Functions::.). `RLENGTH' is set by
invoking the `match' function. Its value is the length of the
matched string, or -1 if no match was found.
`RSTART'
`RSTART' is the start-index of the substring matched by the
`match' function (*note String Functions::.). `RSTART' is set
by invoking the `match' function. Its value is the position of
the string where the matched substring starts, or 0 if no match
was found.
▶1f◀
File: gawk-info, Node: Command Line, Next: Language History, Prev: Built-in Variables, Up: Top
Invocation of `awk'
*******************
There are two ways to run `awk': with an explicit program, or with
one or more program files. Here are templates for both of them;
items enclosed in `[...]' in these templates are optional.
awk [`-FFS'] [`-v VAR=VAL'] [`-V'] [`-C'] [`-c'] [`-a'] [`-e'] [`--'] 'PROGRAM' FILE ...
awk [`-FFS'] `-f SOURCE-FILE' [`-f SOURCE-FILE ...'] [`-v VAR=VAL'] [`-V'] [`-C'] [`-c'] [`-a'] [`-e'] [`--'] FILE ...
* Menu:
* Options:: Command line options and their meanings.
* Other Arguments:: Input file names and variable assignments.
* AWKPATH Variable:: Searching directories for `awk' programs.
▶1f◀
File: gawk-info, Node: Options, Next: Other Arguments, Prev: Command Line, Up: Command Line
Command Line Options
====================
Options begin with a minus sign, and consist of a single character.
The options and their meanings are as follows:
`-FFS'
Sets the `FS' variable to FS (*note Field Separators::.).
`-f SOURCE-FILE'
Indicates that the `awk' program is to be found in SOURCE-FILE
instead of in the first non-option argument.
`-v VAR=VAL'
Sets the variable VAR to the value VAL *before* execution of the
program begins. Such variable values are available inside the
`BEGIN' rule (see below for a fuller explanation).
The `-v' option only has room to set one variable, but you can
use it more than once, setting another variable each time, like
this: `-v foo=1 -v bar=2'.
`-a'
Specifies use of traditional `awk' syntax for regular expressions.
This means that `\' can be used to quote any regular expression
operators inside of square brackets, just as it can be outside
of them. This mode is currently the default; the `-a' option is
useful in shell scripts so that they will not break if the
default is changed. *Note Regexp Operators::.
`-e'
Specifies use of `egrep' syntax for regular expressions. This
means that `\' does not serve as a quoting character inside of
square brackets; idiosyncratic techniques are needed to include
various special characters within them. This mode may become
the default at some time in the future. *Note Regexp Operators::.
`-c'
Specifies "compatibility mode", in which the GNU extensions in
`gawk' are disabled, so that `gawk' behaves just like Unix
`awk'. These extensions are noted below, where their usage is
explained. *Note Compatibility Mode::.
`-V'
Prints version information for this particular copy of `gawk'.
This is so you can determine if your copy of `gawk' is up to
date with respect to whatever the Free Software Foundation is
currently distributing. This option may disappear in a future
version of `gawk'.
`-C'
Prints the short version of the General Public License. This
option may disappear in a future version of `gawk'.
`--'
Signals the end of the command line options. The following
arguments are not treated as options even if they begin with
`-'. This interpretation of `--' follows the POSIX argument
parsing conventions.
This is useful if you have file names that start with `-', or in
shell scripts, if you have file names that will be specified by
the user and that might start with `-'.
Any other options are flagged as invalid with a warning message, but
are otherwise ignored.
In compatibility mode, as a special case, if the value of FS supplied
to the `-F' option is `t', then `FS' is set to the tab character
(`"\t"'). Also, the `-C' and `-V' options are not recognized.
If the `-f' option is *not* used, then the first non-option command
line argument is expected to be the program text.
The `-f' option may be used more than once on the command line. Then
`awk' reads its program source from all of the named files, as if
they had been concatenated together into one big file. This is
useful for creating libraries of `awk' functions. Useful functions
can be written once, and then retrieved from a standard place,
instead of having to be included into each individual program. You
can still type in a program at the terminal and use library
functions, by specifying `-f /dev/tty'. `awk' will read a file from
the terminal to use as part of the `awk' program. After typing your
program, type `Control-d' (the end-of-file character) to terminate it.
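Here is a sketch of the library idea using two `-f' options. The file names `lib.awk' and `main.awk' and the function `double' are invented for illustration, and the files are written to a temporary directory:

```shell
dir=$(mktemp -d)
cat > "$dir/lib.awk" <<'EOF'
function double(x) { return 2 * x }
EOF
cat > "$dir/main.awk" <<'EOF'
BEGIN { print double(21) }
EOF
# Both files are read as if they were one program.
out=$(awk -f "$dir/lib.awk" -f "$dir/main.awk")
rm -rf "$dir"
echo "$out"
```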
▶1f◀
File: gawk-info, Node: Other Arguments, Next: AWKPATH Variable, Prev: Options, Up: Command Line
Other Command Line Arguments
============================
Any additional arguments on the command line are normally treated as
input files to be processed in the order specified. However, an
argument that has the form `VAR=VALUE' means to assign the value
VALUE to the variable VAR--it does not specify a file at all.
All these arguments are made available to your `awk' program in the
`ARGV' array (*note Built-in Variables::.). Command line options and
the program text (if present) are omitted from the `ARGV' array. All
other arguments, including variable assignments, are included.
The distinction between file name arguments and variable-assignment
arguments is made when `awk' is about to open the next input file.
At that point in execution, it checks the ``file name'' to see
whether it is really a variable assignment; if so, `awk' sets the
variable instead of reading a file.
Therefore, the variables actually receive the specified values after
all previously specified files have been read. In particular, the
values of variables assigned in this fashion are *not* available
inside a `BEGIN' rule (*note BEGIN/END::.), since such rules are run
before `awk' begins scanning the argument list.
In some earlier implementations of `awk', when a variable assignment
occurred before any file names, the assignment would happen *before*
the `BEGIN' rule was executed. Some applications came to depend upon
this ``feature''. When `awk' was changed to be more consistent, the
`-v' option was added to accommodate applications that depended upon
this old behaviour.
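The difference in timing can be seen with a short experiment. This is
only a sketch; the variable name `x' and the one-line input are
arbitrary choices.

```shell
prog='BEGIN { print "BEGIN: x=" x } { print "main: x=" x }'

# Assignment given as an ordinary argument: not yet set in BEGIN.
late=$(echo line | awk "$prog" x=5)
printf '%s\n' "$late"

# Assignment given with -v: already set when BEGIN runs.
early=$(echo line | awk -v x=5 "$prog")
printf '%s\n' "$early"
```

In the first run the `BEGIN' rule sees an empty `x'; in the second it
sees `5'. The main rule sees `5' in both cases.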
The variable assignment feature is most useful for assigning to
variables such as `RS', `OFS', and `ORS', which control input and
output formats, before scanning the data files. It is also useful
for controlling state if multiple passes are needed over a data file.
For example:
awk 'pass == 1 { PASS 1 STUFF }
pass == 2 { PASS 2 STUFF }' pass=1 datafile pass=2 datafile
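One way to fill in that outline, offered as a sketch rather than a
recommended idiom: the first pass totals a column of numbers, and the
second pass prints each number with its share of the total. The data
values and the temporary file are invented for the example.

```shell
data=$(mktemp)
printf '10\n30\n60\n' > "$data"

# pass=1 is assigned before the first reading of the file,
# pass=2 before the second.
out=$(awk 'pass == 1 { total += $1 }
pass == 2 { printf "%d %d%%\n", $1, 100 * $1 / total }' pass=1 "$data" pass=2 "$data")
printf '%s\n' "$out"

rm "$data"
```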
▶1f◀
File: gawk-info, Node: AWKPATH Variable, Prev: Other Arguments, Up: Command Line
The `AWKPATH' Environment Variable
==================================
The previous section described how `awk' program files can be named
on the command line with the `-f' option. In some `awk'
implementations, you must supply a precise path name for each program
file, unless the file is in the current directory.
But in `gawk', if the file name supplied in the `-f' option does not
contain a `/', then `gawk' searches a list of directories (called the
"search path"), one by one, looking for a file with the specified name.
The search path is actually a string containing directory names
separated by colons. `gawk' gets its search path from the `AWKPATH'
environment variable. If that variable does not exist, `gawk' uses
the default path, which is `.:/usr/lib/awk:/usr/local/lib/awk'.
The search path feature is particularly useful for building up
libraries of useful `awk' functions. The library files can be placed
in a standard directory that is in the default path, and then
specified on the command line with a short file name. Otherwise, the
full file name would have to be typed for each file.
Path searching is not done if `gawk' is in compatibility mode. *Note
Command Line::.
*Note:* if you want files in the current directory to be found, you
must include the current directory in the path, either by writing `.'
as an entry in the path, or by writing a null entry in the path. (A
null entry is indicated by starting or ending the path with a colon,
or by placing two colons next to each other (`::').) If the current
directory is not included in the path, then files cannot be found in
the current directory. This path search mechanism is identical to
the shell's.
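A sketch of a path search follows. It assumes that a program named
`gawk' is installed, and it skips the demonstration otherwise. The
directory, the file names `twice.awk' and `main.awk', and the function
`twice' are hypothetical.

```shell
if command -v gawk >/dev/null 2>&1; then
    dir=$(mktemp -d)
    printf 'function twice(x) { return 2 * x }\n' > "$dir/twice.awk"
    printf 'BEGIN { print twice(21) }\n' > "$dir/main.awk"

    # Neither file name contains a `/', so gawk looks for both along
    # AWKPATH.  The command runs from `/' to show that the files are
    # not being picked up from the current directory.
    out=$(cd / && AWKPATH="$dir" gawk -f twice.awk -f main.awk)
    rm -r "$dir"
else
    out='gawk not found'
fi
printf '%s\n' "$out"
```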
▶1f◀
File: gawk-info, Node: Language History, Next: Gawk Summary, Prev: Command Line, Up: Top
The Evolution of the `awk' Language
***********************************
This manual describes the GNU implementation of `awk', which is
patterned after the System V Release 4 version. Many `awk' users are
only familiar with the original `awk' implementation in Version 7
Unix, which is also the basis for the version in Berkeley Unix. This
chapter briefly describes the evolution of the `awk' language.
* Menu:
* V7/S5R3.1:: The major changes between V7 and System V Release 3.1.
* S5R4:: The minor changes between System V Releases 3.1 and 4.
* S5R4/GNU:: The extensions in `gawk' not in System V Release 4.
▶1f◀
File: gawk-info, Node: V7/S5R3.1, Next: S5R4, Prev: Language History, Up: Language History
Major Changes Between V7 and S5R3.1
===================================
The `awk' language evolved considerably between the release of
Version 7 Unix (1978) and the new version first made widely available
in System V Release 3.1 (1987). This section summarizes the changes,
with cross-references to further details.
* The requirement for `;' to separate rules on a line (*note
Statements/Lines::.).
* User-defined functions, and the `return' statement (*note
User-defined::.).
* The `delete' statement (*note Delete::.).
* The `do'-`while' statement (*note Do Statement::.).
* The built-in functions `atan2', `cos', `sin', `rand' and `srand'
(*note Numeric Functions::.).
* The built-in functions `gsub', `sub', and `match' (*note String
Functions::.).
* The built-in functions `close' and `system' (*note I/O
Functions::.).
* The `ARGC', `ARGV', `FNR', `RLENGTH', `RSTART', and `SUBSEP'
built-in variables (*note Built-in Variables::.).
* The conditional expression using the operators `?' and `:'
(*note Conditional Exp::.).
* The exponentiation operator `^' (*note Arithmetic Ops::.) and
its assignment operator form `^=' (*note Assignment Ops::.).
* C-compatible operator precedence, which breaks some old `awk'
programs (*note Precedence::.).
* Regexps as the value of `FS' (*note Field Separators::.), or as
the third argument to the `split' function (*note String
Functions::.).
* Dynamic regexps as operands of the `~' and `!~' operators (*note
Regexp Usage::.).
* Escape sequences (*note Constants::.) in regexps.
* The escape sequences `\b', `\f', and `\r' (*note Constants::.).
* Redirection of input for the `getline' function (*note
Getline::.).
* Multiple `BEGIN' and `END' rules (*note BEGIN/END::.).
* Simulation of multidimensional arrays (*note
Multi-dimensional::.).
▶1f◀
File: gawk-info, Node: S5R4, Next: S5R4/GNU, Prev: V7/S5R3.1, Up: Language History
Minor Changes between S5R3.1 and S5R4
=====================================
The System V Release 4 version of Unix `awk' added these features:
* The `ENVIRON' variable (*note Built-in Variables::.).
* Multiple `-f' options on the command line (*note Command Line::.).
* The `-v' option for assigning variables before program execution
begins (*note Command Line::.).
* The `--' option for terminating command line options.
* The `\a', `\v', and `\x' escape sequences (*note Constants::.).
* A defined return value for the `srand' built-in function (*note
Numeric Functions::.).
* The `toupper' and `tolower' built-in string functions for case
translation (*note String Functions::.).
* A cleaner specification for the `%c' format-control letter in
the `printf' function (*note Printf::.).
* The use of constant regexps such as `/foo/' as expressions,
where they are equivalent to use of the matching operator, as in
`$0 ~ /foo/'.
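Two of these additions can be tried together in a short sketch: the
value of a constant regexp used as an expression is compared with the
explicit match against `$0', and `toupper' translates the line. The
input lines and the variable names `a' and `b' are arbitrary.

```shell
# a and b should always agree: /foo/ alone means $0 ~ /foo/.
out=$(printf 'foo\nbar\n' |
      awk '{ a = /foo/; b = ($0 ~ /foo/); print toupper($0), a, b }')
printf '%s\n' "$out"
```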
▶1f◀
File: gawk-info, Node: S5R4/GNU, Prev: S5R4, Up: Language History
Extensions In `gawk' Not In S5R4
================================
The GNU implementation, `gawk', adds these features:
* The `AWKPATH' environment variable for specifying a path search
for the `-f' command line option (*note Command Line::.).
* The `-C' and `-V' command line options (*note Command Line::.).
* The `IGNORECASE' variable and its effects (*note
Case-sensitivity::.).
* The `/dev/stdin', `/dev/stdout', `/dev/stderr', and `/dev/fd/N'
file name interpretation (*note Special Files::.).
* The `-c' option to turn off these extensions (*note Command
Line::.).
* The `-a' and `-e' options to specify the syntax of regular
expressions that `gawk' will accept (*note Command Line::.).
▶1f◀
File: gawk-info, Node: Gawk Summary, Next: Sample Program, Prev: Language History, Up: Top
`gawk' Summary
**************
This appendix provides a brief summary of the `gawk' command line and
the `awk' language. It is designed to serve as a ``quick reference.''
It is therefore terse, but complete.
* Menu:
* Command Line Summary:: Recapitulation of the command line.
* Language Summary:: A terse review of the language.
* Variables/Fields:: Variables, fields, and arrays.
* Rules Summary:: Patterns and Actions, and their component parts.
* Functions Summary:: Defining and calling functions.
▶1f◀
File: gawk-info, Node: Command Line Summary, Next: Language Summary, Prev: Gawk Summary, Up: Gawk Summary
Command Line Options Summary
============================
The command line consists of options to `gawk' itself, the `awk'
program text (if not supplied via the `-f' option), and values to be
made available in the `ARGC' and `ARGV' predefined `awk' variables:
awk [`-FFS'] [`-v VAR=VAL'] [`-V'] [`-C'] [`-c'] [`-a'] [`-e'] [`--'] 'PROGRAM' FILE ...
awk [`-FFS'] `-f PROGRAM-FILE' [`-f PROGRAM-FILE ...'] [`-v VAR=VAL'] [`-V'] [`-C'] [`-c'] [`-a'] [`-e'] [`--'] FILE ...
The options that `gawk' accepts are:
`-FFS'
Use FS for the input field separator (the value of the `FS'
predefined variable).
`-f PROGRAM-FILE'
Read the `awk' program source from the file PROGRAM-FILE,
instead of from the first command line argument.
`-v VAR=VAL'
Assign the variable VAR the value VAL before program execution
begins.
`-a'
Specifies use of traditional `awk' syntax for regular expressions.
This means that `\' can be used to quote regular expression
operators inside of square brackets, just as it can be outside
of them.
`-e'
Specifies use of `egrep' syntax for regular expressions. This
means that `\' does not serve as a quoting character inside of
square brackets.
`-c'
Specifies compatibility mode, in which `gawk' extensions are
turned off.
`-V'
Print version information for this particular copy of `gawk' on
the error output. This option may disappear in a future version
of `gawk'.
`-C'
Print the short version of the General Public License on the
error output. This option may disappear in a future version of
`gawk'.
`--'
Signal the end of options. This is useful to allow further
arguments to the `awk' program itself to start with a `-'. This
is mainly for consistency with the argument parsing conventions
of POSIX.
Any other options are flagged as invalid, but are otherwise ignored.
*Note Command Line::, for more details.
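As a brief illustration of the two most commonly used options, the
sketch below sets the field separator with `-F' and passes a column
number with `-v'. The variable name `col' and the sample input are
invented for the example.

```shell
# Fields are separated by colons; col selects which field to print.
out=$(printf 'root:x:0\nguest:x:405\n' | awk -F: -v col=3 '{ print $1, $col }')
printf '%s\n' "$out"
```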
▶1f◀
File: gawk-info, Node: Language Summary, Next: Variables/Fields, Prev: Command Line Summary, Up: Gawk Summary
Language Summary
================
An `awk' program consists of a sequence of pattern-action statements
and optional function definitions.
PATTERN { ACTION STATEMENTS }
function NAME(PARAMETER LIST) { ACTION STATEMENTS }
`gawk' first reads the program source from the PROGRAM-FILE(s) if
specified, or from the first non-option argument on the command line.
The `-f' option may be used multiple times on the command line.
`gawk' reads the program text from all the PROGRAM-FILE files,
effectively concatenating them in the order they are specified. This
is useful for building libraries of `awk' functions, without having
to include them in each new `awk' program that uses them. To use a
library function in a file from a program typed in on the command
line, specify `-f /dev/tty'; then type your program, and end it with
a `C-d'. *Note Command Line::.
The environment variable `AWKPATH' specifies a search path to use
when finding source files named with the `-f' option. If the
variable `AWKPATH' is not set, `gawk' uses the default path,
`.:/usr/lib/awk:/usr/local/lib/awk'. If a file name given to the
`-f' option contains a `/' character, no path search is performed.
*Note AWKPATH Variable::, for a full description of the `AWKPATH'
environment variable.
`gawk' compiles the program into an internal form, and then proceeds
to read each file named in the `ARGV' array. If there are no files
named on the command line, `gawk' reads the standard input.
If a ``file'' named on the command line has the form `VAR=VAL', it is
treated as a variable assignment: the variable VAR is assigned the
value VAL.
For each line in the input, `gawk' tests to see if it matches any
PATTERN in the `awk' program. For each pattern that the line
matches, the associated ACTION is executed.
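A small sketch of both program elements: one function definition and
one pattern-action statement. The function name `square' and the input
numbers are illustrative only.

```shell
# The action runs only for lines whose first field exceeds 4.
out=$(printf '3\n8\n5\n' | awk 'function square(x) { return x * x }
$1 > 4 { print $1, square($1) }')
printf '%s\n' "$out"
```

The line containing `3' matches no pattern, so nothing is printed for
it.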
▶1f◀
File: gawk-info, Node: Variables/Fields, Next: Rules Summary, Prev: Language Summary, Up: Gawk Summary
Variables and Fields
====================
`awk' variables are dynamic; they come into existence when they are
first used. Their values are either floating-point numbers or strings.
`awk' also has one-dimensional arrays; multi-dimensional arrays may
be simulated. There are several predefined variables that `awk' sets
as a program runs; these are summarized below.
* Menu:
* Fields Summary:: Input field splitting.
* Built-in Summary:: `awk''s built-in variables.
* Arrays Summary:: Using arrays.
* Data Type Summary:: Values in `awk' are numbers or strings.
▶1f◀
File: gawk-info, Node: Fields Summary, Next: Built-in Summary, Prev: Variables/Fields, Up: Variables/Fields
Fields
------
As each input line is read, `gawk' splits the line into FIELDS, using
the value of the `FS' variable as the field separator. If `FS' is a
single character, fields are separated by that character. Otherwise,
`FS' is expected to be a full regular expression. In the special
case that `FS' is a single blank, fields are separated by runs of
blanks and/or tabs. Note that the value of `IGNORECASE' (*note
Case-sensitivity::.) also affects how fields are split when `FS' is a
regular expression.
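The three kinds of `FS' value can be compared side by side. This is a
sketch with made-up input lines; the shell variable names are
arbitrary.

```shell
# A single character: fields are separated by that character.
single=$(printf 'a:b:c\n' | awk 'BEGIN { FS = ":" } { print $2 }')

# A regular expression: any run of digits separates fields.
regexp=$(printf 'a1b22c\n' | awk 'BEGIN { FS = "[0-9]+" } { print $3 }')

# The default single blank: runs of blanks and tabs separate fields,
# and leading white space is ignored.
blank=$(printf '  x \t  y\n' | awk '{ print NF, $2 }')

printf '%s\n%s\n%s\n' "$single" "$regexp" "$blank"
```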
Each field in the input line may be referenced by its position, `$1',
`$2', and so on. `$0' is the whole line. The value of a field may
be assigned to as well. Field numbers need not be constants:
n = 5
print $n
prints the fifth field in the input line. The variable `NF' is set
to the total number of fields in the input line.
References to nonexistent fields (i.e., fields after `$NF') return
the null string. However, assigning to a nonexistent field (e.g.,
`$(NF+2) = 5') increases the value of `NF', creates any intervening
fields with the null string as their value, and causes the value of
`$0' to be recomputed, with the fields being separated by the value
of `OFS'.
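The effect can be observed with a three-field line. In this sketch
`OFS' is set to `-' only to make the rebuilt record easy to read.

```shell
out=$(echo 'a b c' | awk 'BEGIN { OFS = "-" }
{
    print "[" $5 "]"     # reference to a nonexistent field: null string
    $(NF+2) = "e"        # assignment creates $5 and an empty $4
    print NF
    print $0             # record rebuilt with OFS between fields
}')
printf '%s\n' "$out"
```

Merely referring to `$5' leaves `NF' at 3; only the assignment raises
it to 5.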
*Note Reading Files::, for a full description of the way `awk'
defines and uses fields.