DataMuseum.dk

Presents historical artifacts from the history of:

DKUUG/EUUG Conference tapes

This is an automatic "excavation" of a thematic subset of
artifacts from Datamuseum.dk's BitArchive.


⟦bfa30052a⟧ TextFile

    Length: 50259 (0xc453)
    Types: TextFile
    Names: »gawk-info-1«

Derivation

└─⟦9ae75bfbd⟧ Bits:30007242 EUUGD3: Starter Kit
    └─⟦f133efdaf⟧ »EurOpenD3/gnu/gawk/gawk-doc-2.11.tar.Z« 
└─⟦a05ed705a⟧ Bits:30007078 DKUUG GNU 2/12/89
    └─⟦f133efdaf⟧ »./gawk-doc-2.11.tar.Z« 
        └─⟦8f64183b0⟧ 
            └─⟦this⟧ »gawk-2.11-doc/gawk-info-1« 


Info file gawk-info, produced by Makeinfo, -*- Text -*- from input
file gawk.texinfo.

This file documents `awk', a program that you can use to select
particular records in a file and perform operations upon them.

Copyright (C) 1989 Free Software Foundation, Inc.

Permission is granted to make and distribute verbatim copies of this
manual provided the copyright notice and this permission notice are
preserved on all copies.

Permission is granted to copy and distribute modified versions of
this manual under the conditions for verbatim copying, provided that
the entire resulting derived work is distributed under the terms of a
permission notice identical to this one.

Permission is granted to copy and distribute translations of this
manual into another language, under the above conditions for modified
versions, except that this permission notice may be stated in a
translation approved by the Foundation.


▶1f◀
File: gawk-info,  Node: Top,  Next: Preface,  Prev: (dir),  Up: (dir)

This file documents `awk', a program that you can use to select
particular records in a file and perform operations upon them; it
contains the following chapters:


* Menu:

* Preface::            What you can do with `awk'; brief history
                       and acknowledgements.

* Copying::            Your right to copy and distribute `gawk'.

* This Manual::        Using this manual. 
                       Includes sample input files that you can use.
* Getting Started::    A basic introduction to using `awk'.
                       How to run an `awk' program.  Command line syntax.

* Reading Files::      How to read files and manipulate fields.

* Printing::           How to print using `awk'.  Describes the
                       `print' and `printf' statements.  
                       Also describes redirection of output.

* One-liners::         Short, sample `awk' programs.

* Patterns::           The various types of patterns explained in detail.

* Actions::            The various types of actions are introduced here.
                       Describes expressions and the various operators in
                       detail.  Also describes comparison expressions.

* Expressions::        Expressions are the basic building blocks of statements.

* Statements::         The various control statements are described in
                       detail.

* Arrays::             The description and use of arrays.  Also includes
                       array-oriented control statements.

* Built-in::           The built-in functions are summarized here.

* User-defined::       User-defined functions are described in detail.

* Var: Built-in Variables.  The built-in variables are summarized here.

* Command Line::       How to run `gawk'.

* Language History::   The evolution of the `awk' language.

* Gawk Summary::       `gawk' Options and Language Summary.

* Sample Program::     A sample `awk' program with a complete explanation.

* Notes::              Something about the implementation of `gawk'.

* Glossary::           An explanation of some unfamiliar terms.

* Index::


▶1f◀
File: gawk-info,  Node: Preface,  Next: Copying,  Prev: Top,  Up: Top

Preface
*******

If you are like many computer users, you frequently would like to
make changes in various text files wherever certain patterns appear,
or extract data from parts of certain lines while discarding the
rest.  To write a program to do this in a language such as C or
Pascal is a time-consuming inconvenience that may take many lines of
code.  The job may be easier with `awk'.

The `awk' utility interprets a special-purpose programming language
that makes it possible to handle simple data-reformatting jobs easily
with just a few lines of code.

The GNU implementation of `awk' is called `gawk'; it is fully upward
compatible with the System V Release 3.1 and later versions of `awk'.
All properly written `awk' programs should work with `gawk', so we
usually don't distinguish between `gawk' and other `awk'
implementations in this manual.

This manual teaches you what `awk' does and how you can use `awk'
effectively.  You should already be familiar with basic system
commands such as `ls'.  Using `awk' you can:

   * manage small, personal databases,

   * generate reports,

   * validate data,

   * produce indexes, and perform other document preparation tasks,

   * even experiment with algorithms that can be adapted later to
     other computer languages!


* Menu:

* History::  The history of `gawk' and `awk'.  Acknowledgements.

 
▶1f◀
File: gawk-info,  Node: History,  Prev: Preface,  Up: Preface

History of `awk' and `gawk'
===========================

The name `awk' comes from the initials of its designers: Alfred V. 
Aho, Peter J. Weinberger, and Brian W. Kernighan.  The original
version of `awk' was written in 1977.  In 1985 a new version made the
programming language more powerful, introducing user-defined
functions, multiple input streams, and computed regular expressions. 
This new version became generally available with System V Release 3.1.
The version in System V Release 4 added some new features and also
cleaned up the behaviour in some of the ``dark corners'' of the
language.

The GNU implementation, `gawk', was written in 1986 by Paul Rubin and
Jay Fenlason, with advice from Richard Stallman.  John Woods
contributed parts of the code as well.  In 1988 and 1989, David
Trueman, with help from Arnold Robbins, thoroughly reworked `gawk'
for compatibility with the newer `awk'.

Many people need to be thanked for their assistance in producing this
manual.  Jay Fenlason contributed many ideas and sample programs. 
Richard Mlynarik and Robert Chassell gave helpful comments on drafts
of this manual.  The paper ``A Supplemental Document for `awk''' by
John W. Pierce of the Chemistry Department at UC San Diego
pinpointed several issues, relevant both to `awk' implementation and
to this manual, that would otherwise have escaped us.

Finally, we would like to thank Brian Kernighan of Bell Labs for
invaluable assistance during the testing and debugging of `gawk', and
for help in clarifying several points about the language.


▶1f◀
File: gawk-info,  Node: Copying,  Next: This Manual,  Prev: Preface,  Up: Top

GNU General Public License
**************************

                        Version 1, February 1989

     Copyright (C) 1989 Free Software Foundation, Inc.
     675 Mass Ave, Cambridge, MA 02139, USA
     
     Everyone is permitted to copy and distribute verbatim copies
     of this license document, but changing it is not allowed.

 Preamble
=========

  The license agreements of most software companies try to keep users
at the mercy of those companies.  By contrast, our General Public
License is intended to guarantee your freedom to share and change
free software--to make sure the software is free for all its users. 
The General Public License applies to the Free Software Foundation's
software and to any other program whose authors commit to using it. 
You can use it for your programs, too.

  When we speak of free software, we are referring to freedom, not
price.  Specifically, the General Public License is designed to make
sure that you have the freedom to give away or sell copies of free
software, that you receive source code or can get it if you want it,
that you can change the software or use pieces of it in new free
programs; and that you know you can do these things.

  To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if
you distribute copies of the software, or if you modify it.

  For example, if you distribute copies of a such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have.  You must make sure that they, too, receive or can get the
source code.  And you must tell them their rights.

  We protect your rights with two steps: (1) copyright the software,
and (2) offer you this license which gives you legal permission to
copy, distribute and/or modify the software.

  Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software.  If the software is modified by someone else and passed on,
we want its recipients to know that what they have is not the
original, so that any problems introduced by others will not reflect
on the original authors' reputations.

  The precise terms and conditions for copying, distribution and
modification follow.

                          TERMS AND CONDITIONS

  1. This License Agreement applies to any program or other work
     which contains a notice placed by the copyright holder saying it
     may be distributed under the terms of this General Public
     License.  The ``Program'', below, refers to any such program or
     work, and a ``work based on the Program'' means either the
     Program or any work containing the Program or a portion of it,
     either verbatim or with modifications.  Each licensee is
     addressed as ``you''.

  2. You may copy and distribute verbatim copies of the Program's
     source code as you receive it, in any medium, provided that you
     conspicuously and appropriately publish on each copy an
     appropriate copyright notice and disclaimer of warranty; keep
     intact all the notices that refer to this General Public License
     and to the absence of any warranty; and give any other
     recipients of the Program a copy of this General Public License
     along with the Program.  You may charge a fee for the physical
     act of transferring a copy.

  3. You may modify your copy or copies of the Program or any portion
     of it, and copy and distribute such modifications under the
     terms of Paragraph 1 above, provided that you also do the
     following:

        * cause the modified files to carry prominent notices stating
          that you changed the files and the date of any change; and

        * cause the whole of any work that you distribute or publish,
          that in whole or in part contains the Program or any part
          thereof, either with or without modifications, to be
          licensed at no charge to all third parties under the terms
          of this General Public License (except that you may choose
          to grant warranty protection to some or all third parties,
          at your option).

        * If the modified program normally reads commands
          interactively when run, you must cause it, when started
          running for such interactive use in the simplest and most
          usual way, to print or display an announcement including an
          appropriate copyright notice and a notice that there is no
          warranty (or else, saying that you provide a warranty) and
          that users may redistribute the program under these
          conditions, and telling the user how to view a copy of this
          General Public License.

        * You may charge a fee for the physical act of transferring a
          copy, and you may at your option offer warranty protection
          in exchange for a fee.

     Mere aggregation of another independent work with the Program
     (or its derivative) on a volume of a storage or distribution
     medium does not bring the other work under the scope of these
     terms.

  4. You may copy and distribute the Program (or a portion or
     derivative of it, under Paragraph 2) in object code or
     executable form under the terms of Paragraphs 1 and 2 above
     provided that you also do one of the following:

        * accompany it with the complete corresponding
          machine-readable source code, which must be distributed
          under the terms of Paragraphs 1 and 2 above; or,

        * accompany it with a written offer, valid for at least three
          years, to give any third party free (except for a nominal
          charge for the cost of distribution) a complete
          machine-readable copy of the corresponding source code, to
          be distributed under the terms of Paragraphs 1 and 2 above;
          or,

        * accompany it with the information you received as to where
          the corresponding source code may be obtained.  (This
          alternative is allowed only for noncommercial distribution
          and only if you received the program in object code or
          executable form alone.)

     Source code for a work means the preferred form of the work for
     making modifications to it.  For an executable file, complete
     source code means all the source code for all modules it
     contains; but, as a special exception, it need not include
     source code for modules which are standard libraries that
     accompany the operating system on which the executable file
     runs, or for standard header files or definitions files that
     accompany that operating system.

  5. You may not copy, modify, sublicense, distribute or transfer the
     Program except as expressly provided under this General Public
     License.  Any attempt otherwise to copy, modify, sublicense,
     distribute or transfer the Program is void, and will
     automatically terminate your rights to use the Program under
     this License.  However, parties who have received copies, or
     rights to use copies, from you under this General Public License
     will not have their licenses terminated so long as such parties
     remain in full compliance.

  6. By copying, distributing or modifying the Program (or any work
     based on the Program) you indicate your acceptance of this
     license to do so, and all its terms and conditions.

  7. Each time you redistribute the Program (or any work based on the
     Program), the recipient automatically receives a license from
     the original licensor to copy, distribute or modify the Program
     subject to these terms and conditions.  You may not impose any
     further restrictions on the recipients' exercise of the rights
     granted herein.

  8. The Free Software Foundation may publish revised and/or new
     versions of the General Public License from time to time.  Such
     new versions will be similar in spirit to the present version,
     but may differ in detail to address new problems or concerns.

     Each version is given a distinguishing version number.  If the
     Program specifies a version number of the license which applies
     to it and ``any later version'', you have the option of
     following the terms and conditions either of that version or of
     any later version published by the Free Software Foundation.  If
     the Program does not specify a version number of the license,
     you may choose any version ever published by the Free Software
     Foundation.

  9. If you wish to incorporate parts of the Program into other free
     programs whose distribution conditions are different, write to
     the author to ask for permission.  For software which is
     copyrighted by the Free Software Foundation, write to the Free
     Software Foundation; we sometimes make exceptions for this.  Our
     decision will be guided by the two goals of preserving the free
     status of all derivatives of our free software and of promoting
     the sharing and reuse of software generally.

                                   NO WARRANTY

 10. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO
     WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE
     LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
     HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM ``AS IS''
     WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
     INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
     MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE
     ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS
     WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE
     COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

 11. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
     WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY
     MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE
     LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL,
     INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
     INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS
     OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
     YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH
     ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN
     ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

                      END OF TERMS AND CONDITIONS

Appendix: Using These Terms in New Programs
===========================================

  If you develop a new program, and you want it to be of the greatest
possible use to humanity, the best way to achieve this is to make it
free software which everyone can redistribute and change under these
terms.

  To do so, attach the following notices to the program.  It is safest
to attach them to the start of each source file to most effectively
convey the exclusion of warranty; and each file should have at least
the ``copyright'' line and a pointer to where the full notice is found.

     ONE LINE TO GIVE THE PROGRAM'S NAME AND A BRIEF IDEA OF WHAT IT DOES.
     Copyright (C) 19YY  NAME OF AUTHOR
     
     This program is free software; you can redistribute it and/or modify
     it under the terms of the GNU General Public License as published by
     the Free Software Foundation; either version 1, or (at your option)
     any later version.
     
     This program is distributed in the hope that it will be useful,
     but WITHOUT ANY WARRANTY; without even the implied warranty of
     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
     GNU General Public License for more details.
     
     You should have received a copy of the GNU General Public License
     along with this program; if not, write to the Free Software
     Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

 Also add information on how to contact you by electronic and paper
mail.

If the program is interactive, make it output a short notice like
this when it starts in an interactive mode:

     Gnomovision version 69, Copyright (C) 19YY NAME OF AUTHOR
     Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
     This is free software, and you are welcome to redistribute it
     under certain conditions; type `show c' for details.

 The hypothetical commands `show w' and `show c' should show the
appropriate parts of the General Public License.  Of course, the
commands you use may be called something other than `show w' and
`show c'; they could even be mouse-clicks or menu items--whatever
suits your program.

You should also get your employer (if you work as a programmer) or
your school, if any, to sign a ``copyright disclaimer'' for the
program, if necessary.  Here a sample; alter the names:

     Yoyodyne, Inc., hereby disclaims all copyright interest in the
     program `Gnomovision' (a program to direct compilers to make passes
     at assemblers) written by James Hacker.
     
     SIGNATURE OF TY COON, 1 April 1989
     Ty Coon, President of Vice

That's all there is to it!


▶1f◀
File: gawk-info,  Node: This Manual,  Next: Getting Started,  Prev: Copying,  Up: Top

Using This Manual
*****************

The term `gawk' refers to a particular program (a version of `awk',
developed as part of the GNU project), and to the language you use to
tell this program what to do.  When we need to be careful, we call
the program ``the `awk' utility'' and the language ``the `awk'
language''.  The purpose of this manual is to explain the `awk'
language and how to run the `awk' utility.

The term "`awk' program" refers to a program written by you in the
`awk' programming language.

*Note Getting Started::, for the bare essentials you need to know to
start using `awk'.

Some useful ``one-liners'' are included to give you a feel for the
`awk' language (*note One-liners::.).

A sizable sample `awk' program has been provided for you (*note
Sample Program::.).

If you find terms that you aren't familiar with, try looking them up
in the glossary (*note Glossary::.).

Most of the time complete `awk' programs are used as examples, but in
some of the more advanced sections, only the part of the `awk'
program that illustrates the concept being described is shown.


* Menu:

This chapter contains the following sections:

* Sample Data Files::  Sample data files for use in the `awk' programs
                       illustrated in this manual.

 
▶1f◀
File: gawk-info,  Node: Sample Data Files,  Prev: This Manual,  Up: This Manual

Data Files for the Examples
===========================

Many of the examples in this manual take their input from two sample
data files.  The first, called `BBS-list', represents a list of
computer bulletin board systems and information about those systems. 
The second data file, called `inventory-shipped', contains
information about shipments on a monthly basis.  Each line of these
files is one "record".

In the file `BBS-list', each record contains the name of a computer
bulletin board, its phone number, the board's baud rate, and a code
for the number of hours it is operational.  An `A' in the last column
means the board operates 24 hours all week.  A `B' in the last column
means the board operates only during evening and weekend hours.  A `C'
means the board operates only on weekends.

     aardvark     555-5553     1200/300          B
     alpo-net     555-3412     2400/1200/300     A
     barfly       555-7685     1200/300          A
     bites        555-1675     2400/1200/300     A
     camelot      555-0542     300               C
     core         555-2912     1200/300          C
     fooey        555-1234     2400/1200/300     B
     foot         555-6699     1200/300          B
     macfoo       555-6480     1200/300          A
     sdace        555-3430     2400/1200/300     A
     sabafoo      555-2127     1200/300          C

The second data file, called `inventory-shipped', represents
information about shipments during the year.  Each line of this file
is also one record.  Each record contains the month of the year, the
number of green crates shipped, the number of red boxes shipped, the
number of orange bags shipped, and the number of blue packages
shipped, respectively.  There are 16 entries, covering the 12 months
of one year and 4 months of the next year.

     Jan  13  25  15 115
     Feb  15  32  24 226
     Mar  15  24  34 228
     Apr  31  52  63 420
     May  16  34  29 208
     Jun  31  42  75 492
     Jul  24  34  67 436
     Aug  15  34  47 316
     Sep  13  55  37 277
     Oct  29  54  68 525
     Nov  20  87  82 577
     Dec  17  35  61 401
     
     Jan  21  36  64 620
     Feb  26  58  80 652
     Mar  24  75  70 495
     Apr  21  70  74 514

If you are reading this in GNU Emacs using Info, you can copy the
regions of text showing these sample files into your own test files. 
This way you can try out the examples shown in the remainder of this
document.  You do this by using the command `M-x write-region' to
copy text from the Info file into a file for use with `awk' (see your
``GNU Emacs Manual'' for more information).  Using this information,
create your own `BBS-list' and `inventory-shipped' files, and
practice what you learn in this manual.
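
Outside Emacs, one way to create the two files is with shell
here-documents.  The following is a minimal sketch, assuming a
Bourne-compatible shell and the `mktemp' utility; the scratch
directory and the variable names are illustrative and are not part of
this manual's examples.

```shell
# Work in a scratch directory (an assumption of this sketch).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The quoted 'EOF' delimiter stops the shell from expanding the data.
cat > BBS-list <<'EOF'
aardvark     555-5553     1200/300          B
alpo-net     555-3412     2400/1200/300     A
barfly       555-7685     1200/300          A
bites        555-1675     2400/1200/300     A
camelot      555-0542     300               C
core         555-2912     1200/300          C
fooey        555-1234     2400/1200/300     B
foot         555-6699     1200/300          B
macfoo       555-6480     1200/300          A
sdace        555-3430     2400/1200/300     A
sabafoo      555-2127     1200/300          C
EOF

cat > inventory-shipped <<'EOF'
Jan  13  25  15 115
Feb  15  32  24 226
Mar  15  24  34 228
Apr  31  52  63 420
May  16  34  29 208
Jun  31  42  75 492
Jul  24  34  67 436
Aug  15  34  47 316
Sep  13  55  37 277
Oct  29  54  68 525
Nov  20  87  82 577
Dec  17  35  61 401

Jan  21  36  64 620
Feb  26  58  80 652
Mar  24  75  70 495
Apr  21  70  74 514
EOF

# Count the records in BBS-list (NR is the number of lines read).
bbs_lines=$(awk 'END { print NR }' BBS-list)

# Count only the non-blank lines of inventory-shipped (NF is the
# number of fields on a line; it is zero for the blank separator).
inv_lines=$(awk 'NF { n++ } END { print n }' inventory-shipped)

echo "BBS-list: $bbs_lines records; inventory-shipped: $inv_lines entries"
```

The two counts serve only as a quick check that the files were typed
in completely.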


▶1f◀
File: gawk-info,  Node: Getting Started,  Next: Reading Files,  Prev: This Manual,  Up: Top

Getting Started With `awk'
**************************

The basic function of `awk' is to search files for lines (or other
units of text) that contain certain patterns.  When a line matches
one of the patterns, `awk' performs specified actions on that line. 
`awk' keeps processing input lines in this way until the end of the
input file is reached.

When you run `awk', you specify an `awk' "program" which tells `awk'
what to do.  The program consists of a series of "rules".  (It may
also contain "function definitions", but that is an advanced feature,
so let's ignore it for now.  *Note User-defined::.)  Each rule
specifies one pattern to search for, and one action to perform when
that pattern is found.

Syntactically, a rule consists of a pattern followed by an action. 
The action is enclosed in curly braces to separate it from the pattern.
Rules are usually separated by newlines.  Therefore, an `awk' program
looks like this:

     PATTERN { ACTION }
     PATTERN { ACTION }
     ...

 
* Menu:

* Very Simple::      A very simple example.
* Two Rules::        A less simple one-line example with two rules.
* More Complex::     A more complex example.
* Running gawk::     How to run `gawk' programs; includes command line syntax.
* Comments::         Adding documentation to `gawk' programs.
* Statements/Lines:: Subdividing or combining statements into lines.
* When::             When to use `gawk' and when to use other things.

 
▶1f◀
File: gawk-info,  Node: Very Simple,  Next: Two Rules,  Prev: Getting Started,  Up: Getting Started

A Very Simple Example
=====================

The following command runs a simple `awk' program that searches the
input file `BBS-list' for the string of characters: `foo'.  (A string
of characters is usually called, quite simply, a "string".  The term
"string" is perhaps based on similar usage in English, such as ``a
string of pearls,'' or, ``a string of cars in a train.'')

     awk '/foo/ { print $0 }' BBS-list

When lines containing `foo' are found, they are printed, because
`print $0' means print the current line.  (Just `print' by itself
also means the same thing, so we could have written that instead.)

You will notice that slashes, `/', surround the string `foo' in the
actual `awk' program.  The slashes indicate that `foo' is a pattern
to search for.  This type of pattern is called a "regular
expression", and is covered in more detail later (*note Regexp::.). 
There are single-quotes around the `awk' program so that the shell
won't interpret any of it as special shell characters.

Here is what this program prints:

     fooey        555-1234     2400/1200/300     B
     foot         555-6699     1200/300          B
     macfoo       555-6480     1200/300          A
     sabafoo      555-2127     1200/300          C

In an `awk' rule, either the pattern or the action can be omitted,
but not both.  If the pattern is omitted, then the action is
performed for *every* input line.  If the action is omitted, the
default action is to print all lines that match the pattern.

Thus, we could leave out the action (the `print' statement and the
curly braces) in the above example, and the result would be the same:
all lines matching the pattern `foo' would be printed.  By
comparison, omitting the `print' statement but retaining the curly
braces makes an empty action that does nothing; then no lines would
be printed.
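
The four cases just described can be compared side by side.  This is
a minimal sketch, assuming a Bourne-compatible shell and the `mktemp'
utility; it uses a three-line excerpt of `BBS-list' in a scratch file
so that it does not depend on files you have already created, and the
variable names are illustrative.

```shell
# A three-line excerpt of BBS-list in a scratch file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
camelot      555-0542     300               C
fooey        555-1234     2400/1200/300     B
foot         555-6699     1200/300          B
EOF

# Pattern and action both present: print lines that contain foo.
with_action=$(awk '/foo/ { print $0 }' "$sample")

# Action omitted: the default action prints each matching line.
no_action=$(awk '/foo/' "$sample")

# Pattern omitted: the action is performed for every input line.
no_pattern=$(awk '{ print $0 }' "$sample")

# Braces kept but left empty: an empty action, so nothing is printed.
empty_action=$(awk '/foo/ { }' "$sample")

echo "$with_action"
```

Here `with_action' and `no_action' hold the same two lines,
`no_pattern' holds all three lines, and `empty_action' is empty.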


▶1f◀
File: gawk-info,  Node: Two Rules,  Next: More Complex,  Prev: Very Simple,  Up: Getting Started

An Example with Two Rules
=========================

The `awk' utility reads the input files one line at a time.  For each
line, `awk' tries the patterns of all the rules.  If several patterns
match then several actions are run, in the order in which they appear
in the `awk' program.  If no patterns match, then no actions are run.

After processing all the rules (perhaps none) that match the line,
`awk' reads the next line (however, *note Next Statement::.).  This
continues until the end of the file is reached.

For example, the `awk' program:

     /12/  { print $0 }
     /21/  { print $0 }

contains two rules.  The first rule has the string `12' as the
pattern and `print $0' as the action.  The second rule has the string
`21' as the pattern and also has `print $0' as the action.  Each
rule's action is enclosed in its own pair of braces.

This `awk' program prints every line that contains the string `12'
*or* the string `21'.  If a line contains both strings, it is printed
twice, once by each rule.

If we run this program on our two sample data files, `BBS-list' and
`inventory-shipped', as shown here:

     awk '/12/ { print $0 }
          /21/ { print $0 }' BBS-list inventory-shipped

we get the following output:

     aardvark     555-5553     1200/300          B
     alpo-net     555-3412     2400/1200/300     A
     barfly       555-7685     1200/300          A
     bites        555-1675     2400/1200/300     A
     core         555-2912     1200/300          C
     fooey        555-1234     2400/1200/300     B
     foot         555-6699     1200/300          B
     macfoo       555-6480     1200/300          A
     sdace        555-3430     2400/1200/300     A
     sabafoo      555-2127     1200/300          C
     sabafoo      555-2127     1200/300          C
     Jan  21  36  64 620
     Apr  21  70  74 514

Note how the line in `BBS-list' beginning with `sabafoo' was printed
twice, once for each rule.
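
The double printing can be reproduced on a smaller input.  This is a
minimal sketch, assuming a Bourne-compatible shell; the three input
lines are taken from the sample files, and the variable names are
illustrative.

```shell
# camelot matches neither rule; sabafoo matches both (its phone
# number 555-2127 contains 21 and 12); the Jan line matches only /21/.
input='camelot      555-0542     300               C
sabafoo      555-2127     1200/300          C
Jan  21  36  64 620'

output=$(printf '%s\n' "$input" | awk '/12/ { print $0 }
                                       /21/ { print $0 }')
echo "$output"
```

The output has three lines: the `sabafoo' line twice, then the `Jan'
line once.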


▶1f◀
File: gawk-info,  Node: More Complex,  Next: Running gawk,  Prev: Two Rules,  Up: Getting Started

A More Complex Example
======================

Here is an example to give you an idea of what typical `awk' programs
do.  This example shows how `awk' can be used to summarize, select,
and rearrange the output of another utility.  It uses features that
haven't been covered yet, so don't worry if you don't understand all
the details.

     ls -l | awk '$5 == "Nov" { sum += $4 }
                  END { print sum }'

This command prints the total number of bytes in all the files in the
current directory that were last modified in November (of any year). 
(In the C shell you would need to type a semicolon and then a
backslash at the end of the first line; in the Bourne shell or the
Bourne-Again shell, you can type the example as shown.)

The `ls -l' part of this example is a command that gives you a full
listing of all the files in a directory, including file size and date.
Its output looks like this:

     -rw-r--r--  1 close        1933 Nov  7 13:05 Makefile
     -rw-r--r--  1 close       10809 Nov  7 13:03 gawk.h
     -rw-r--r--  1 close         983 Apr 13 12:14 gawk.tab.h
     -rw-r--r--  1 close       31869 Jun 15 12:20 gawk.y
     -rw-r--r--  1 close       22414 Nov  7 13:03 gawk1.c
     -rw-r--r--  1 close       37455 Nov  7 13:03 gawk2.c
     -rw-r--r--  1 close       27511 Dec  9 13:07 gawk3.c
     -rw-r--r--  1 close        7989 Nov  7 13:03 gawk4.c

The first field contains read-write permissions, the second field
contains the number of links to the file, and the third field
identifies the owner of the file.  The fourth field contains the size
of the file in bytes.  The fifth, sixth, and seventh fields contain
the month, day, and time, respectively, that the file was last
modified.  Finally, the eighth field contains the name of the file.

The `$5 == "Nov"' in our `awk' program is an expression that tests
whether the fifth field of the output from `ls -l' matches the string
`Nov'.  Each time a line has the string `Nov' in its fifth field, the
action `{ sum += $4 }' is performed.  This adds the fourth field (the
file size) to the variable `sum'.  As a result, when `awk' has
finished reading all the input lines, `sum' is the sum of the sizes
of files whose lines matched the pattern.

After the last line of output from `ls' has been processed, the `END'
rule is executed, and the value of `sum' is printed.  In this
example, the value of `sum' would be 80600.
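
To try the example without depending on the contents of your own
directory, you can feed the sample listing shown above to the same
`awk' program.  This is only a sketch: the variable names `listing'
and `total' are our own, and the data is copied from this section
rather than produced by `ls'.

```shell
# Sample data copied from the listing in this section.
listing='-rw-r--r--  1 close        1933 Nov  7 13:05 Makefile
-rw-r--r--  1 close       10809 Nov  7 13:03 gawk.h
-rw-r--r--  1 close         983 Apr 13 12:14 gawk.tab.h
-rw-r--r--  1 close       31869 Jun 15 12:20 gawk.y
-rw-r--r--  1 close       22414 Nov  7 13:03 gawk1.c
-rw-r--r--  1 close       37455 Nov  7 13:03 gawk2.c
-rw-r--r--  1 close       27511 Dec  9 13:07 gawk3.c
-rw-r--r--  1 close        7989 Nov  7 13:03 gawk4.c'

# Same program as in the text: add up field 4 on every line
# whose field 5 is the string Nov, then print the total.
total=$(printf '%s\n' "$listing" | awk '$5 == "Nov" { sum += $4 }
                                       END { print sum }')
echo "$total"
```

For this data the command prints 80600.  If your `ls -l' prints a
group column as well, the size and the month each move one field to
the right, and the field numbers in the program must be adjusted to
match.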

These more advanced `awk' techniques are covered in later sections
(*note Actions::.).  Before you can move on to more advanced `awk'
programming, you have to know how `awk' interprets your input and
displays your output.  By manipulating fields and using `print'
statements, you can produce some very useful and spectacular-looking
reports.


▶1f◀
File: gawk-info,  Node: Running gawk,  Next: Comments,  Prev: More Complex,  Up: Getting Started

How to Run `awk' Programs
=========================

There are several ways to run an `awk' program.  If the program is
short, it is easiest to include it in the command that runs `awk',
like this:

     awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ...

where PROGRAM consists of a series of patterns and actions, as
described earlier.

When the program is long, you would probably prefer to put it in a
file and run it with a command like this:

     awk -f PROGRAM-FILE INPUT-FILE1 INPUT-FILE2 ...

 
* Menu:

* One-shot::            Running a short throw-away `awk' program.
* Read Terminal::       Using no input files (input from terminal instead).
* Long::                Putting permanent `awk' programs in files.
* Executable Scripts::  Making self-contained `awk' programs.

 
▶1f◀
File: gawk-info,  Node: One-shot,  Next: Read Terminal,  Prev: Running gawk,  Up: Running gawk

One-shot Throw-away `awk' Programs
----------------------------------

Once you are familiar with `awk', you will often type simple programs
at the moment you want to use them.  Then you can write the program
as the first argument of the `awk' command, like this:

     awk 'PROGRAM' INPUT-FILE1 INPUT-FILE2 ...

where PROGRAM consists of a series of PATTERNS and ACTIONS, as
described earlier.

This command format tells the shell to start `awk' and use the
PROGRAM to process records in the input file(s).  There are single
quotes around the PROGRAM so that the shell doesn't interpret any
`awk' characters as special shell characters.  They cause the shell
to treat all of PROGRAM as a single argument for `awk'.  They also
allow PROGRAM to be more than one line long.
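
The effect of the quotes can be seen in a small experiment.  The
following sketch assumes a Bourne-style shell; the argument
`shell-arg' and the variable names are made up for illustration.

```shell
# Give the shell a positional parameter, so that $1 means something to it.
set -- shell-arg

# Single quotes: awk receives { print $1 } and prints the first field.
quoted=$(echo 'alpha beta' | awk '{ print $1 }')

# Double quotes: the shell replaces $1 first, so awk receives
# { print shell-arg }, which subtracts two unset awk variables.
unquoted=$(echo 'alpha beta' | awk "{ print $1 }")

echo "$quoted"
echo "$unquoted"
```

The first command prints `alpha'.  The second prints `0', which is
unlikely to be what was intended.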

This format is also useful for running short or medium-sized `awk'
programs from shell scripts, because it avoids the need for a
separate file for the `awk' program.  A self-contained shell script
is more reliable since there are no other files to misplace.


▶1f◀
File: gawk-info,  Node: Read Terminal,  Next: Long,  Prev: One-shot,  Up: Running gawk

Running `awk' without Input Files
---------------------------------

You can also use `awk' without any input files.  If you type the
command line:

     awk 'PROGRAM'

then `awk' applies the PROGRAM to the "standard input", which usually
means whatever you type on the terminal.  This continues until you
indicate end-of-file by typing `Control-d'.

For example, if you execute this command:

     awk '/th/'

whatever you type next is taken as data for that `awk' program.  If
you go on to type the following data:

     Kathy
     Ben
     Tom
     Beth
     Seth
     Karen
     Thomas
     `Control-d'

then `awk' prints this output:

     Kathy
     Beth
     Seth

as matching the pattern `th'.  Notice that it did not recognize
`Thomas' as matching the pattern.  The `awk' language is "case
sensitive", and matches patterns exactly.  (However, you can override
this with the variable `IGNORECASE'.  *Note Case-sensitivity::.)
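
You do not have to type the data by hand to repeat this experiment. 
In the following sketch the same names are supplied on the standard
input through a pipe; the `printf' command and the variable `matches'
are our additions, not part of the original example.

```shell
# Each name becomes one input line; awk reads them from standard input.
matches=$(printf '%s\n' Kathy Ben Tom Beth Seth Karen Thomas | awk '/th/')
echo "$matches"
```

This prints `Kathy', `Beth', and `Seth', one per line, just as when
the names are typed at the terminal.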


▶1f◀
File: gawk-info,  Node: Long,  Next: Executable Scripts,  Prev: Read Terminal,  Up: Running gawk

Running Long Programs
---------------------

Sometimes your `awk' programs can be very long.  In this case it is
more convenient to put the program into a separate file.  To tell
`awk' to use that file for its program, you type:

     awk -f SOURCE-FILE INPUT-FILE1 INPUT-FILE2 ...

The `-f' tells the `awk' utility to get the `awk' program from the
file SOURCE-FILE.  Any file name can be used for SOURCE-FILE.  For
example, you could put the program:

     /th/

into the file `th-prog'.  Then this command:

     awk -f th-prog

does the same thing as this one:

     awk '/th/'

which was explained earlier (*note Read Terminal::.).  Note that you
don't usually need single quotes around the file name that you
specify with `-f', because most file names don't contain any of the
shell's special characters.
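
The two forms can be compared directly.  This sketch assumes that the
`mktemp' utility is available for making a scratch directory; the
variable names and the sample names fed to `awk' are ours.

```shell
# Work in a scratch directory so no existing file is overwritten.
dir=$(mktemp -d)

# Put the one-line program into the file th-prog.
printf '%s\n' '/th/' > "$dir/th-prog"

# Run the program from the file, then run the same program inline.
from_file=$(printf '%s\n' Kathy Ben Beth | awk -f "$dir/th-prog")
inline=$(printf '%s\n' Kathy Ben Beth | awk '/th/')

echo "$from_file"
rm -r "$dir"
```

Both commands select the same lines, `Kathy' and `Beth'.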

If you want to identify your `awk' program files clearly as such, you
can add the extension `.awk' to the file name.  This doesn't affect
the execution of the `awk' program, but it does make ``housekeeping''
easier.


▶1f◀
File: gawk-info,  Node: Executable Scripts,  Prev: Long,  Up: Running gawk

Executable `awk' Programs
-------------------------

Once you have learned `awk', you may want to write self-contained
`awk' scripts, using the `#!' script mechanism.  You can do this on
BSD Unix systems and (someday) on GNU.

For example, you could create a text file named `hello', containing
the following (where `BEGIN' is a feature we have not yet discussed):

     #! /bin/awk -f
     
     # a sample awk program
     BEGIN    { print "hello, world" }

After making this file executable (with the `chmod' command), you can
simply type:

     hello

at the shell, and the system will arrange to run `awk' as if you had
typed:

     awk -f hello

Self-contained `awk' scripts are useful when you want to write a
program which users can invoke without knowing that the program is
written in `awk'.
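
As a sketch, the `hello' file can be created and tried as follows. 
The location of `awk' differs between systems, so the example asks
the shell for it with `command -v' rather than assuming `/bin/awk';
the scratch directory made with `mktemp' is likewise our own choice. 
Since the first line begins with `#', `awk' itself treats it as a
comment, so the file can also be run with `awk -f', which is what the
example does.

```shell
dir=$(mktemp -d)
awk_path=$(command -v awk)

# Write the script; the shell fills in the path to awk on the first line.
cat > "$dir/hello" <<EOF
#! $awk_path -f

# a sample awk program
BEGIN    { print "hello, world" }
EOF
chmod +x "$dir/hello"

# Run it the way the system would.  Typing "$dir/hello" alone should
# do the same on systems that support the #! mechanism.
greeting=$(awk -f "$dir/hello")
echo "$greeting"
rm -r "$dir"
```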

If your system does not support the `#!' mechanism, you can get a
similar effect using a regular shell script.  It would look something
like this:

     : The colon makes sure this script is executed by the Bourne shell.
     awk 'PROGRAM' "$@"

Using this technique, it is *vital* to enclose the PROGRAM in single
quotes to protect it from interpretation by the shell.  If you omit
the quotes, only a shell wizard can predict the result.

The `"$@"' causes the shell to forward all the command line arguments
to the `awk' program, without interpretation.  The first line, which
starts with a colon, is used so that this shell script will work even
if invoked by a user who uses the C shell.
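
Here is a small sketch of such a wrapper.  The script name
`first-field', the one-line PROGRAM inside it, and the sample data
are invented for the illustration, and `mktemp' is assumed to be
available.

```shell
dir=$(mktemp -d)

# The wrapper script.  The quoted EOF keeps the shell from touching
# the $1 and "$@" while the file is being written.
cat > "$dir/first-field" <<'EOF'
: The colon makes sure this script is executed by the Bourne shell.
awk '{ print $1 }' "$@"
EOF

# Some sample input.
printf '%s\n' 'one two' 'three four' > "$dir/data"

# Run the wrapper with sh; its argument is passed on to awk by "$@".
fields=$(sh "$dir/first-field" "$dir/data")
echo "$fields"
rm -r "$dir"
```

The wrapper prints the first field of each line of the file named on
its command line, here `one' and `three'.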


▶1f◀
File: gawk-info,  Node: Comments,  Next: Statements/Lines,  Prev: Running gawk,  Up: Getting Started

Comments in `awk' Programs
==========================

A "comment" is some text that is included in a program for the sake
of human readers, and that is not really part of the program. 
Comments can explain what the program does, and how it works.  Nearly
all programming languages have provisions for comments, because
programs are hard to understand without the extra help they give.

In the `awk' language, a comment starts with the sharp sign
character, `#', and continues to the end of the line.  The `awk'
language ignores the rest of a line following a sharp sign.  For
example, we could have put the following into `th-prog':

     # This program finds records containing the pattern `th'.  This is how
     # you continue comments on additional lines.
     /th/

You can put comment lines into keyboard-composed throw-away `awk'
programs also, but this usually isn't very useful; the purpose of a
comment is to help you or another person understand the program at
another time.
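
A comment does not have to occupy a line by itself, since everything
from the `#' to the end of the line is ignored.  The following sketch
shows both placements; the scratch directory and the sample names are
our own, and `mktemp' is assumed to be available.

```shell
dir=$(mktemp -d)

# A program file with a comment line and a trailing comment.
cat > "$dir/th-prog" <<'EOF'
# This program finds records containing the pattern th.
/th/    # a comment can also follow a rule on the same line
EOF

commented=$(printf '%s\n' Kathy Ben Beth | awk -f "$dir/th-prog")
echo "$commented"
rm -r "$dir"
```

The comments make no difference to the result: the program still
prints `Kathy' and `Beth'.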


▶1f◀
File: gawk-info,  Node: Statements/Lines,  Next: When,  Prev: Comments,  Up: Getting Started

`awk' Statements versus Lines
=============================

Most often, each line in an `awk' program is a separate statement or
separate rule, like this:

     awk '/12/  { print $0 }
          /21/  { print $0 }' BBS-list inventory-shipped

But sometimes statements can be more than one line, and lines can
contain several statements.  You can split a statement into multiple
lines by inserting a newline after any of the following:

     ,    {    ?    :    ||    &&    do    else

A newline at any other point is considered the end of the statement.
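
For instance, the single rule in this sketch is spread over several
lines, with each line break placed after `&&', `{', or a comma.  The
input line and the variable name are made up for the illustration.

```shell
# One rule, broken only at points where a newline is permitted.
first_last=$(echo 'a b c' | awk 'NF == 3 &&
                                 $1 == "a" {
                                     print $1,
                                           $3
                                 }')
echo "$first_last"
```

This prints `a c'.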

If you would like to split a single statement into two lines at a
point where a newline would terminate it, you can "continue" it by
ending the first line with a backslash character, `\'.  This is
allowed absolutely anywhere in the statement, even in the middle of a
string or regular expression.  For example:

     awk '/This program is too long, so continue it\
      on the next line/ { print $1 }'

We have generally not used backslash continuation in the sample
programs in this manual.  Since there is no limit on the length of a
line, it is never strictly necessary; it just makes programs
prettier.  We have preferred to make them prettier still by keeping
the statements short.  Backslash continuation is most useful when
your `awk' program is in a separate source file, instead of typed in
on the command line.

*Warning: backslash continuation does not work as described above
with the C shell.*  Continuation with backslash works for `awk'
programs in files, and also for one-shot programs *provided* you are
using the Bourne shell or the Bourne-again shell.  But the C shell
used on Berkeley Unix behaves differently!  There, you must use two
backslashes in a row, followed by a newline.
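
Here is a sketch of backslash continuation in a one-shot program.  It
assumes the Bourne shell or a compatible shell, and the input and
variable names are ours.  Without the backslash, the newline after
`$1' would end the assignment.

```shell
# The backslash joins the two lines into one assignment statement.
sum=$(echo '3 4' | awk '{ total = $1 \
                                + $2
                          print total }')
echo "$sum"
```

This prints `7'.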

When `awk' statements within one rule are short, you might want to
put more than one of them on a line.  You do this by separating the
statements with semicolons, `;'.  This also applies to the rules
themselves.  Thus, the above example program could have been written:

     /12/ { print $0 } ; /21/ { print $0 }

*Note:* the requirement that rules on the same line must be separated
with a semicolon is a recent change in the `awk' language; it was
done for consistency with the treatment of statements within an action.
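
The following sketch uses semicolons in both ways: between the
statements of an action, and between rules on the same line.  The
three input lines and the counter `n' are invented for the example.

```shell
# Two statements per action, and three rules on one line.
report=$(printf '%s\n' 'item 12' 'item 21' 'item 99' |
    awk '/12/ { n++; print $0 } ; /21/ { n++; print $0 } ; END { print n }')
echo "$report"
```

This prints the two matching lines, followed by the count `2'.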


▶1f◀
File: gawk-info,  Node: When,  Prev: Statements/Lines,  Up: Getting Started

When to Use `awk'
=================

What use is all of this to me, you might ask?  Using additional
utility programs, more advanced patterns, field separators,
arithmetic statements, and other selection criteria, you can produce
much more complex output.  The `awk' language is very useful for
producing reports from large amounts of raw data; for example, it can
summarize the output of other utility programs such as `ls'. 
(*Note A More Complex Example: More Complex.)

Programs written with `awk' are usually much smaller than they would
be in other languages.  This makes `awk' programs easy to compose and
use.  Often `awk' programs can be quickly composed at your terminal,
used once, and thrown away.  Since `awk' programs are interpreted,
you can avoid the usually lengthy edit-compile-test-debug cycle of
software development.

Complex programs have been written in `awk', including a complete
retargetable assembler for 8-bit microprocessors (*note Glossary::.,
for more information) and a microcode assembler for a special purpose
Prolog computer.  However, `awk''s capabilities are strained by tasks
of such complexity.

If you find yourself writing `awk' scripts of more than, say, a few
hundred lines, you might consider using a different programming
language.  Emacs Lisp is a good choice if you need sophisticated
string or pattern matching capabilities.  The shell is also good at
string and pattern matching; in addition, it allows powerful use of
the system utilities.  More conventional languages, such as C, C++,
and Lisp, offer better facilities for system programming and for
managing the complexity of large programs.  Programs in these
languages may require more lines of source code than the equivalent
`awk' programs, but they are easier to maintain and usually run more
efficiently.


▶1f◀
File: gawk-info,  Node: Reading Files,  Next: Printing,  Prev: Getting Started,  Up: Top

Reading Input Files
*******************

In the typical `awk' program, all input is read either from the
standard input (usually the keyboard) or from files whose names you
specify on the `awk' command line.  If you specify input files, `awk'
reads data from the first one until it reaches the end; then it reads
the second file until it reaches the end, and so on.  The name of the
current input file can be found in the built-in variable `FILENAME'
(*note Built-in Variables::.).

The input is read in units called "records", and processed by the
rules one record at a time.  By default, each record is one line. 
Each record read is split automatically into "fields", to make it
more convenient for a rule to work on parts of the record under
consideration.

On rare occasions you will need to use the `getline' command, which
can do explicit input from any number of files (*note Getline::.).


* Menu:

* Records::             Controlling how data is split into records.
* Fields::              An introduction to fields.
* Non-Constant Fields:: Non-constant Field Numbers.
* Changing Fields::     Changing the Contents of a Field.
* Field Separators::    The field separator and how to change it.
* Multiple Line::       Reading multi-line records.

* Getline::             Reading files under explicit program control
                        using the `getline' command.

* Close Input::         Closing an input file (so you can read from
                        the beginning once more).

 
▶1f◀
File: gawk-info,  Node: Records,  Next: Fields,  Prev: Reading Files,  Up: Reading Files

How Input is Split into Records
===============================

The `awk' language divides its input into records and fields. 
Records are separated by a character called the "record separator". 
By default, the record separator is the newline character. 
Therefore, normally, a record is a line of text.

Sometimes you may want to use a different character to separate your
records.  You can use different characters by changing the built-in
variable `RS'.

The value of `RS' is a string that says how to separate records; the
default value is `"\n"', the string of just a newline character. 
This is why records are, by default, single lines.

`RS' can have any string as its value, but only the first character
of the string is used as the record separator.  The other characters
are ignored.  `RS' is exceptional in this regard; `awk' uses the full
value of all its other built-in variables.

You can change the value of `RS' in the `awk' program with the
assignment operator, `=' (*note Assignment Ops::.).  The new
record-separator character should be enclosed in quotation marks to
make a string constant.  Often the right time to do this is at the
beginning of execution, before any input has been processed, so that
the very first record will be read with the proper separator.  To do
this, use the special `BEGIN' pattern (*note BEGIN/END::.).  For
example:

     awk 'BEGIN { RS = "/" } ; { print $0 }' BBS-list

changes the value of `RS' to `"/"', before reading any input.  This
is a string whose first character is a slash; as a result, records
are separated by slashes.  Then the input file is read, and the
second rule in the `awk' program (the action with no pattern) prints
each record.  Since each `print' statement adds a newline at the end
of its output, the effect of this `awk' program is to copy the input
with each slash changed to a newline.
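
The same program can be tried on a short string instead of a file. 
In this sketch the input `a/b/c' and the variable `records' are our
own.

```shell
# Three records separated by slashes; print puts each on its own line.
records=$(printf 'a/b/c' | awk 'BEGIN { RS = "/" } ; { print $0 }')
echo "$records"
```

This prints `a', `b', and `c' on separate lines.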

Another way to change the record separator is on the command line,
using the variable-assignment feature (*note Command Line::.).

     awk '...' RS="/" INPUT-FILE

This sets `RS' to `/' before processing INPUT-FILE.
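
As a sketch, the following commands write a small data file and read
it with the record separator set on the command line.  The file name
`data', its contents, and the use of `mktemp' are assumptions made
for the example.

```shell
dir=$(mktemp -d)
printf 'x/y/z' > "$dir/data"

# RS is assigned before the data file is opened, so the whole file
# is split at slashes.
numbered=$(awk '{ print NR ": " $0 }' RS="/" "$dir/data")
echo "$numbered"
rm -r "$dir"
```

This prints `1: x', `2: y', and `3: z'.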

The empty string (a string of no characters) has a special meaning as
the value of `RS': it means that records are separated only by blank
lines.  *Note Multiple Line::, for more details.

The `awk' utility keeps track of the number of records that have been
read so far from the current input file.  This value is stored in a
built-in variable called `FNR'.  It is reset to zero when a new file
is started.  Another built-in variable, `NR', is the total number of
input records read so far from all files.  It starts at zero but is
never automatically reset to zero.
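
The difference between the two counters shows up as soon as there is
more than one input file.  In this sketch the files `one' and `two'
are created just for the demonstration, and `mktemp' is assumed to be
available.

```shell
dir=$(mktemp -d)
printf '%s\n' a b > "$dir/one"
printf '%s\n' c d > "$dir/two"

# Print the file name, the per-file count, and the overall count.
counts=$(cd "$dir" && awk '{ print FILENAME, FNR, NR }' one two)
echo "$counts"
rm -r "$dir"
```

The output is `one 1 1', `one 2 2', `two 1 3', and `two 2 4': `FNR'
starts over for the second file, while `NR' keeps counting.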

If you change the value of `RS' in the middle of an `awk' run, the
new value is used to delimit subsequent records, but the record
currently being processed (and records already finished) are not
affected.


▶1f◀
File: gawk-info,  Node: Fields,  Next: Non-Constant Fields,  Prev: Records,  Up: Reading Files

Examining Fields
================

When `awk' reads an input record, the record is automatically
separated or "parsed" by the interpreter into pieces called "fields".
By default, fields are separated by whitespace, like words in a line.
Whitespace in `awk' means any string of one or more spaces and/or
tabs; other characters such as newline, formfeed, and so on, that are
considered whitespace by other languages are *not* considered
whitespace by `awk'.

The purpose of fields is to make it more convenient for you to refer
to these pieces of the record.  You don't have to use them--you can
operate on the whole record if you wish--but fields are what make
simple `awk' programs so powerful.

To refer to a field in an `awk' program, you use a dollar-sign, `$',
followed by the number of the field you want.  Thus, `$1' refers to
the first field, `$2' to the second, and so on.  For example, suppose
the following is a line of input:

     This seems like a pretty nice example.

Here the first field, or `$1', is `This'; the second field, or `$2',
is `seems'; and so on.  Note that the last field, `$7', is
`example.'.  Because there is no space between the `e' and the `.',
the period is considered part of the seventh field.

No matter how many fields there are, the last field in a record can
be represented by `$NF'.  So, in the example above, `$NF' would be
the same as `$7', which is `example.'.  Why this works is explained
below (*note Non-Constant Fields::.).  If you try to refer to a field
beyond the last one, such as `$8' when the record has only 7 fields,
you get the empty string.

Plain `NF', with no `$', is a built-in variable whose value is the
number of fields in the current record.

`$0', which looks like an attempt to refer to the zeroth field, is a
special case: it represents the whole input record.  This is what you
would use when you aren't interested in fields.
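
The field references described so far can be tried on the sample
line.  This is only a sketch; the variable names and the square
brackets printed around `$8' are our own.

```shell
line='This seems like a pretty nice example.'

fields=$(echo "$line" | awk '{
    print $1              # first field
    print $NF             # last field, here the same as $7
    print NF              # number of fields
    print "[" $8 "]"      # a field beyond the last one is empty
    print $0              # the whole record
}')
echo "$fields"
```

This prints `This', `example.', `7', `[]', and finally the whole
line.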

Here are some more examples:

     awk '$1 ~ /foo/ { print $0 }' BBS-list

This example prints each record in the file `BBS-list' whose first
field contains the string `foo'.  The operator `~' is called a
"matching operator" (*note Comparison Ops::.); it tests whether a
string (here, the field `$1') contains a match for a given regular
expression.

By contrast, the following example:

     awk '/foo/ { print $1, $NF }' BBS-list

looks for `foo' in *the entire record* and prints the first field and
the last field for each input record containing a match.
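
The difference between the two programs can be seen on a pair of
made-up records (not taken from `BBS-list'), one with `foo' in its
first field and one with `foo' only in its last field.

```shell
# Hypothetical sample data, two records of three fields each.
data='foot 555-1234 fast
bar 555-0000 foo'

# Test only the first field.
first_only=$(printf '%s\n' "$data" | awk '$1 ~ /foo/ { print $0 }')

# Test the entire record.
whole_record=$(printf '%s\n' "$data" | awk '/foo/ { print $1, $NF }')

echo "$first_only"
echo "$whole_record"
```

The first program prints only the `foot' record.  The second matches
both records and prints `foot fast' and `bar foo'.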