DataMuseum.dk

Presents historical artifacts from the history of:

DKUUG/EUUG Conference tapes

This is an automatic "excavation" of a thematic subset of
artifacts from Datamuseum.dk's BitArchive.

See our Wiki for more about DKUUG/EUUG Conference tapes

Excavated with: AutoArchaeologist - Free & Open Source Software.



⟦754e1b811⟧ TextFile

    Length: 14588 (0x38fc)
    Types: TextFile
    Names: »syntax.texinfo«

Derivation

└─⟦a05ed705a⟧ Bits:30007078 DKUUG GNU 2/12/89
    └─⟦c06c473ab⟧ »./UNRELEASED/lispref.tar.Z« 
        └─⟦1b57a2ffe⟧ 
            └─⟦this⟧ »syntax.texinfo« 


@setfilename ../info/syntax
@node Syntax Tables, Lisp Expressions, Searching and Matching, Top
@chapter Syntax Tables

@cindex text parsing

  A @dfn{syntax table} provides Emacs with the information it needs to
determine the syntactic role of each character in a buffer.  The parsing
commands, the complex movement commands, and others use this information
to determine where words, symbols, and other syntactic constructs begin
and end.


@c !!! perhaps should list which commands  that are described
@c elsewhere are affected by syntax  tables, such as the word commands,
@c the list and sexp commands, Lisp parsing functions, such as 
@c @code{parse-partial-sexp}, and  others.

  A syntax table is a vector of 256 elements, one for each possible
value of an 8-bit byte (character codes 0 through 255).  Each element is
an integer that encodes the syntax of the character in question.

  Syntax tables are used only for moving across text, not for the GNU
Emacs Lisp reader.  GNU Emacs Lisp uses built-in C code to read Lisp
expressions and does not use syntax tables.

  Each different buffer may be in a different major mode, and each
different major mode may have a different definition of the syntactic
class of a given character.  For example, in Lisp mode, the character
@samp{;} begins a comment, but in C mode, it terminates a statement.
Syntax tables are local to each buffer, but a major mode will usually
provide the same syntax table to all buffers that use the mode.
@xref{Major Modes}, for an example of how to set up a syntax table.
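
  As a sketch, assuming a recent GNU Emacs (which provides the
@code{with-syntax-table} macro), the following asks two syntax tables
for the class of @samp{;}: the Emacs Lisp mode table, and the standard
table used by Fundamental mode.

@example
;; Class of `;' in two different syntax tables.
(list
 ;; Emacs Lisp mode: `;' is a comment starter.
 (with-syntax-table emacs-lisp-mode-syntax-table
   (char-to-string (char-syntax ?\;)))
 ;; Fundamental mode (the standard table): `;' is punctuation.
 (with-syntax-table (standard-syntax-table)
   (char-to-string (char-syntax ?\;))))
     @result{} ("<" ".")
@end example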

@defun syntax-table-p object
  This function returns @code{t} if @var{object} is a vector of 256
elements, and @code{nil} otherwise.  This means only that the vector may
be a syntax table: according to this test, any vector of length 256 is
considered a syntax table, no matter what its contents.
@end defun

@menu
* Syntax Classes::	
* Syntax Table Functions::	
* Some Standard Syntax Tables::	
* Syntax Table Internals::	
@end menu

@node Syntax Classes, Syntax Table Functions, Syntax Tables, Syntax Tables
@section Syntax Classes

  Each character belongs to one of twelve @dfn{syntax classes}.
Different syntax tables often put a character into different classes;
the class of a character in one table need not bear any relation to its
class in any other.

  Most of the functions that operate on syntax tables use
characters to represent the classes; the character that represents each
class is chosen as a mnemonic.

  Here is a summary of the classes, and the characters that stand for
the classes:

@table @kbd
@item @key{SPC}
  Whitespace syntax

@item w
  Word constituent

@item _
  Symbol constituent

@item .
  Punctuation

@item (
  Open-parenthesis

@item )
  Close-parenthesis

@item "
  String quote

@item \
  Character quote

@item $
  Paired delimiter

@item '
  Expression prefix operator

@item <
  Comment starter

@item >
  Comment ender

@end table
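
  The mnemonic characters are also what @code{char-syntax} returns.
The sketch below, assuming a recent GNU Emacs, looks up one sample
character for several of the classes in the standard syntax table; the
sample characters are illustrative, and other tables may classify them
differently.

@example
;; One sample character per class, looked up in the standard table.
(with-syntax-table (standard-syntax-table)
  (mapconcat (lambda (c) (char-to-string (char-syntax c)))
             '(?\s ?a ?_ ?. ?\( ?\) ?\" ?\\)
             ""))
     @result{} " w_.()\"\\"
@end example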

@deffn Syntax-class whitespace characters

@cindex whitespace syntax
@dfn{Whitespace characters} separate symbols and words from each other.
Typically, whitespace characters have no other syntactic use, and
multiple whitespace characters are treated as one.  Space, tab, newline,
and formfeed are almost always whitespace.
@end deffn

@deffn Syntax-class word constituents
@cindex word syntax
@dfn{Word constituents} are parts of normal English words and are
typically used in variable and command names in programs.  All upper and
lower case letters and the digits are typically word constituents.
@end deffn

@deffn Syntax-class symbol constituents
@cindex symbol syntax
@dfn{Symbol constituents} are the extra characters that, along with word
constituents, are used in variable and command names.  The symbol
constituents class may be used, for example, to allow Lisp symbols to
include extra characters without changing the notion that a word is an
element of the English language.  In Lisp, symbol constituents are
@samp{$&*+-_<>}.  In standard C, the only non-word-constituent character
that is valid in symbols is underscore (@samp{_}).
@end deffn
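
  The difference between words and symbols shows up in motion.  In this
sketch, assuming a recent GNU Emacs, @samp{-} is a symbol constituent in
the Emacs Lisp mode table, so skipping word constituents stops at the
hyphen, while skipping words and symbols together does not.
@code{skip-syntax-forward} returns the distance moved.

@example
;; `-' is a symbol constituent in the Emacs Lisp mode table.
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "foo-bar baz")
  (list
   (progn (goto-char (point-min))
          (skip-syntax-forward "w"))     ; word constituents only
   (progn (goto-char (point-min))
          (skip-syntax-forward "w_"))))  ; words and symbols
     @result{} (3 7)
@end example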

@deffn Syntax-class punctuation characters
@dfn{Punctuation characters} are those characters that are used as
punctuation in English, or are used in some way in a programming
language to separate symbols from one another.  Most programming
language modes, including Emacs Lisp mode, do not have any characters in
this class since the few characters that are not symbol or word
constituents all have other uses.
@end deffn

@deffn Syntax-class parenthesis characters
Open and close @dfn{parenthesis characters} come in pairs of matching
characters that allow one to surround sentences or expressions.  In
English text, and in C code, these are @samp{()}, @samp{[]}, and
@samp{@{@}}.  In Lisp, the list and vector delimiting characters
(@samp{()} and @samp{[]}) are parenthesis characters.

Normally, Emacs shows a matching open parenthesis when you type a
closing parenthesis.  @xref{Blinking}.
@end deffn
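
  A sketch of how a parenthesis entry records its partner, assuming a
recent GNU Emacs: @code{matching-paren} reports the matching character,
and @code{scan-sexps} uses the pairs to move over a balanced list.  Both
consult the current buffer's syntax table, here the standard one.

@example
(with-temp-buffer
  (set-syntax-table (standard-syntax-table))
  (insert "(a [b c] d) e")
  (list (matching-paren ?\()    ; partner of `('
        (matching-paren ?\[)    ; partner of `['
        (scan-sexps 1 1)))      ; position just after the list
     @result{} (41 93 12)
@end example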

@deffn Syntax-class string quote
In many programming languages, including Lisp, a pair of @dfn{string
quote characters} delimits a string of characters.  English text has no
string quote characters because it is not a programming language.  Emacs
Lisp has two string quote characters: double quote (@samp{"}) and
vertical bar (@samp{|}), although Emacs Lisp makes no use of the
@samp{|} character.  C also has two string quote characters: double
quote for strings, and single quote (@samp{'}) for character constants.
@end deffn
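
  The following sketch, assuming a recent GNU Emacs, shows that a
@samp{;} inside a string does not start a comment in Emacs Lisp mode.
In the state returned by @code{parse-partial-sexp}, element 3 is the
character that will terminate the current string (or @code{nil}), and
element 4 is non-@code{nil} inside a comment.

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(message \"a;b\")")
  ;; Parse up to just after the `;' inside the string.
  (let ((state (parse-partial-sexp (point-min) 13)))
    (list (nth 3 state)      ; in a string ended by a double quote
          (nth 4 state))))   ; not in a comment
     @result{} (34 nil)
@end example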

@deffn Syntax-class character quote
A @dfn{character quote character} quotes the following character so
that it loses its normal syntactic meaning.  In Lisp and C, the
backslash (@samp{\}) does this.
@end deffn
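
  In this sketch, assuming a recent GNU Emacs and the Emacs Lisp mode
table, the backslash keeps the second double quote from ending the
string; only the third one closes it.  The buffer text is
@samp{"a\"b" c}.

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "\"a\\\"b\" c")
  (list (nth 3 (parse-partial-sexp 1 6))    ; before the last quote
        (nth 3 (parse-partial-sexp 1 7))))  ; after it
     @result{} (34 nil)
@end example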

@deffn Syntax-class delimiter characters
Paired @dfn{delimiter characters} serve a purpose similar to string
quote characters, but differ in that the syntactic properties of the
characters between the delimiters are not suppressed.  Only TeX mode
uses paired delimiters at present: the @samp{$} that both begins and
ends math mode.
@end deffn

@deffn Syntax-class expression prefix
An @dfn{expression prefix operator} is a character that can appear just
before an expression without becoming part of it, even when the
expression is a symbol.  In Lisp, examples are the apostrophe
(@samp{'}) used for quoting and the comma (@samp{,}) used in macros.
@end deffn

@deffn Syntax-class comment starter
@cindex comment syntax
The @dfn{comment starter} and @dfn{comment ender} characters are used in
different languages to delimit comments.  English text has no
comment characters.  In Lisp, the semicolon (@samp{;}) starts a comment
and a newline or formfeed ends one.
@end deffn
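
  A sketch of comment syntax in action, assuming a recent GNU Emacs:
with the Emacs Lisp mode table, @code{parse-partial-sexp} reports being
inside a comment after the @samp{;}, and outside it again after the
newline.

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(a) ; note\n(b)")
  (list (nth 4 (parse-partial-sexp 1 8))     ; inside the comment
        (nth 4 (parse-partial-sexp 1 12))))  ; after the newline
     @result{} (t nil)
@end example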

@cindex syntax flags
In addition to these classes, entries for characters in a syntax table can
include flags.  At present there are four possible flags, all of which
deal with two-character comment delimiters.  The four flags
(represented by the characters @samp{1}, @samp{2}, @samp{3}, and @samp{4})
indicate that the character for which the entry is being made can
@emph{also} be part of a comment sequence.  Thus an asterisk (used for
multiplication in C) is a punctuation character, @emph{and} the second
character of a start-of-comment sequence (@samp{/*}), @emph{and} the first
character of an end-of-comment sequence (@samp{*/}).

The flags for a character @var{c} are:

@table @code
@item 1
means @var{c} is the start of a two-character comment start sequence.

@item 2
means @var{c} is the second character of such a sequence.

@item 3
means @var{c} is the start of a two-character comment end sequence.

@item 4
means @var{c} is the second character of such a sequence.
@end table

  Thus, the entry for the character @samp{@kbd{*}} in the C syntax table
is @code{.@: 23} (i.e., punctuation, second character of a
comment-starter, first character of a comment-ender), and the entry for
@samp{/} is @code{.@: 14} (i.e., punctuation, first character of a
comment-starter, second character of a comment-ender).  The space in
each entry fills the position reserved for a matching parenthesis, which
these characters do not have.

@node Syntax Table Functions, Some Standard Syntax Tables, Syntax Classes, Syntax Tables
@section Syntax Table Functions

  The syntax table functions all work with strings of characters that
represent a syntax class.  The representative characters are chosen to
be mnemonic.

@defun make-syntax-table  &optional table
  This function constructs a copy of @var{table} and returns it.  If
@var{table} is not supplied, it returns a copy of the current syntax
table.

  It is an error if @var{table} is not a syntax table.
@end defun

@defun copy-syntax-table &optional table
  This function is identical to @code{make-syntax-table}.
@end defun
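
  Because the result is a separate table, changing it does not affect
the original.  A sketch, assuming a recent GNU Emacs and passing the
table to copy explicitly:

@example
(let ((table (copy-syntax-table emacs-lisp-mode-syntax-table)))
  ;; Make `-' a word constituent in the copy only.
  (modify-syntax-entry ?- "w" table)
  (list (with-syntax-table table
          (char-to-string (char-syntax ?-)))
        (with-syntax-table emacs-lisp-mode-syntax-table
          (char-to-string (char-syntax ?-)))))
     @result{} ("w" "_")
@end example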

@deffn Command modify-syntax-entry char syntax-string  &optional table
  This function sets the syntax entry for @var{char} according to
@var{syntax-string}.  The syntax is changed only for @var{table}, which
defaults to the current buffer's syntax table.  The syntax string defines
the new syntax for the character according to the definitions for the
representative characters (@pxref{Syntax Classes}).

  The old syntax information in the table for this character is completely
forgotten.

  This function always returns @code{nil}.  It is an error if the first
character of the syntax string is not one of the twelve syntax class
characters.  It is an error if @var{char} is not a character.

  The first example makes the @key{SPC} character an element of the
class of whitespace.  The second example makes @samp{@kbd{$}} an open
parenthesis character, with @samp{@kbd{^}} as its matching close
parenthesis character.  The third example makes @samp{@kbd{^}} a close
parenthesis character, with @samp{@kbd{$}} as its matching open
parenthesis character.  The fourth example makes @samp{@kbd{/}} a
punctuation character, the first character of a start-comment sequence,
and the first character of an end-comment sequence.

@example
(modify-syntax-entry ?\  " ")  ; space
     @result{} nil

(modify-syntax-entry ?$ "(^")
     @result{} nil
(modify-syntax-entry ?^ ")$")
     @result{} nil

(modify-syntax-entry ?/ ". 13")
     @result{} nil
@end example
@end deffn
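
  To see the effect of the second and third examples, this sketch,
assuming a recent GNU Emacs, applies them to a scratch table instead of
the current one, then moves over a @samp{$abc^} group with
@code{scan-sexps}.

@example
;; The second and third examples, applied to a scratch table.
(let ((table (make-syntax-table)))
  (modify-syntax-entry ?$ "(^" table)
  (modify-syntax-entry ?^ ")$" table)
  (with-temp-buffer
    (set-syntax-table table)
    (insert "$abc^ x")
    (list (matching-paren ?$)   ; the partner of `$' is `^'
          (scan-sexps 1 1))))   ; just after the `^'
     @result{} (94 6)
@end example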

@defun char-syntax character
  This function returns the syntax class of @var{character}, represented
by its mnemonic character.  This @emph{only} returns the class, not any
matching parentheses, or flags.

  It is an error if @var{character} is not a character.

  The first example shows that the syntax class of space is whitespace
(represented by a space).  The second example shows that the syntax of
@samp{@kbd{/}} is punctuation in C mode.  This does not show the fact
that it is also a comment sequence character.  The third example shows
that open parenthesis is in the class of open parentheses.  This does
not show the fact that it has a matching character, @samp{@kbd{)}}.

@example
(char-to-string (char-syntax ?\ ))
     @result{} " "

(char-to-string (char-syntax ?/))
     @result{} "."

(char-to-string (char-syntax ?\())
     @result{} "("
@end example
@end defun

@defun set-syntax-table table
  This function makes @var{table} the syntax table for the current buffer.
It returns @var{table}.
@end defun

@defun syntax-table
  This function returns the current syntax table.  This is the table for
the current buffer.
@end defun
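
  A sketch of the buffer-local behavior, assuming a recent GNU Emacs:
setting the table in one temporary buffer leaves a second, freshly
created buffer with the standard table.

@example
(with-temp-buffer
  (list
   ;; `set-syntax-table' returns its argument.
   (eq (set-syntax-table emacs-lisp-mode-syntax-table)
       emacs-lisp-mode-syntax-table)
   (eq (syntax-table) emacs-lisp-mode-syntax-table)
   ;; A second, new buffer still has the standard table.
   (with-temp-buffer
     (eq (syntax-table) (standard-syntax-table)))))
     @result{} (t t t)
@end example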

@defun backward-prefix-chars
  This function moves point backward over any number of characters with
expression prefix syntax.  @xref{Syntax Classes}, for more about
expression prefix characters.
@end defun
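
  For example, in the Emacs Lisp mode table the apostrophe has
expression prefix syntax, so starting just before @samp{foo} in
@samp{(list 'foo)} the function moves back over the apostrophe and
stops at the space.  A sketch, assuming a recent GNU Emacs:

@example
(with-temp-buffer
  (set-syntax-table emacs-lisp-mode-syntax-table)
  (insert "(list 'foo)")
  (goto-char 8)              ; just before `foo'
  (backward-prefix-chars)
  (point))                   ; now just before the apostrophe
     @result{} 7
@end example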

@deffn Command describe-syntax
  This function describes the syntax specifications of the current syntax
table.  It writes a listing into the @samp{*Help*} buffer and then pops
up a window to display it.

  It returns @code{nil}.

A portion of a description is shown below.
@example
(describe-syntax)
     @result{} nil

---------- Buffer: *Help* ----------
C-q             \       which means: escape
C-r .. C-_              which means: whitespace
!               .       which means: punctuation
(               ()      which means: open, matches )
)               )(      which means: close, matches (
* .. +          _       which means: symbol
,               .       which means: punctuation
-               _       which means: symbol
.               .       which means: punctuation
/               . 13    which means: punctuation,
          is the first character of a comment-start sequence,
          is the first character of a comment-end sequence
0 .. 9          w       which means: word
---------- Buffer: *Help* ----------
@end example
@end deffn

@node Some Standard Syntax Tables, Syntax Table Internals, Syntax Table Functions, Syntax Tables
@section Some Standard Syntax Tables

  Many major modes in Emacs have their own syntax tables.  Here are
several of them:

@defun standard-syntax-table
  This function returns the standard syntax table.  This is the syntax
table used in Fundamental mode.
@end defun

@defvar text-mode-syntax-table
  The value of this variable is the syntax table used in text mode.
@end defvar

@defvar c-mode-syntax-table
  The value of this variable is the syntax table used in C mode.
@end defvar

@defvar emacs-lisp-mode-syntax-table
  The value of this variable is the syntax table used in Emacs Lisp mode
by editing commands.  (It has no effect on the Lisp @code{read}
function.)
@end defvar
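
  A sketch, assuming a recent GNU Emacs, confirming which table each
mode installs in a fresh buffer:

@example
(list
 (with-temp-buffer                     ; Fundamental mode
   (eq (syntax-table) (standard-syntax-table)))
 (with-temp-buffer
   (text-mode)
   (eq (syntax-table) text-mode-syntax-table))
 (with-temp-buffer
   (emacs-lisp-mode)
   (eq (syntax-table) emacs-lisp-mode-syntax-table)))
     @result{} (t t t)
@end example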

@node Syntax Table Internals,  , Some Standard Syntax Tables, Syntax Tables
@section Syntax Table Internals
@cindex syntax table internals

  Each element of a syntax table is an integer that encodes the full
meaning of the entry: class, possible matching character, and flags.
However, programmers rarely work with entries directly in this form,
since the syntax table functions all expect a string of representative
characters instead.  In such a string, the first character is always
the class of the character; the second character is the matching
parenthesis (if it is a parenthesis character), or a space if there is
none; and any remaining characters are the flags.
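
  The string layout can be illustrated with a small helper.
@code{my-split-syntax-descriptor} is hypothetical, written only for this
sketch; it is not an Emacs function.  It assumes that a space (or a
missing character) in the second position means there is no matching
parenthesis.

@example
;; Hypothetical helper, for illustration only (not part of Emacs).
(defun my-split-syntax-descriptor (descriptor)
  "Split DESCRIPTOR into a list (CLASS MATCH FLAGS).
MATCH is nil when the second character is a space or absent."
  (let ((class (aref descriptor 0))
        (match (and (> (length descriptor) 1)
                    (/= (aref descriptor 1) ?\s)
                    (aref descriptor 1)))
        (flags (if (> (length descriptor) 2)
                   (substring descriptor 2)
                 "")))
    (list class match flags)))

(my-split-syntax-descriptor "()")
     @result{} (40 41 "")
(my-split-syntax-descriptor ". 23")
     @result{} (46 nil "23")
@end example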

  The low 8 bits of each element of a syntax table indicate the syntax
class, as follows:

@table @code
@item Integer
Class
@item 0
whitespace
@item 1
punctuation
@item 2
word
@item 3
symbol
@item 4
open paren
@item 5
close paren
@item 6
expression prefix
@item 7
string quote
@item 8
paired delimiter
@item 9
character quote
@item 10

@item 11
comment-start
@item 12
comment-end
@end table

  The next 8 bits (bits 8 through 15) hold the matching parenthesis
character if the character has parenthesis syntax; otherwise they are
not meaningful.  The next 4 bits (bits 16 through 19) hold the flags.
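
  The following sketch packs and unpacks entries in this layout.  The
helpers @code{my-encode-syntax-entry} and @code{my-decode-syntax-entry}
are hypothetical, and the assumption that flag @var{n} occupies bit
15+@var{n} (so flag 1 is bit 16) is ours.  Other Emacs versions may
store syntax entries differently, so this models the layout described
here rather than any particular implementation.

@example
;; Hypothetical helpers, for illustration only (not part of Emacs).
;; Layout as described above: class in bits 0-7, matching
;; character in bits 8-15, flags in bits 16-19.  We assume that
;; flag N occupies bit 15+N.
(defun my-encode-syntax-entry (class match flags)
  "Pack CLASS (0-12), MATCH (a character or nil) and FLAGS (1-4)."
  (let ((entry (logior class (ash (or match 0) 8))))
    (dolist (flag flags entry)
      (setq entry (logior entry (ash 1 (+ 15 flag)))))))

(defun my-decode-syntax-entry (entry)
  "Unpack ENTRY into a list (CLASS MATCH FLAGS)."
  (let ((match (logand (ash entry -8) 255))
        (flags nil))
    (dolist (flag '(4 3 2 1))
      (unless (zerop (logand entry (ash 1 (+ 15 flag))))
        (push flag flags)))
    (list (logand entry 255)
          (if (zerop match) nil match)
          flags)))

;; `*' in C: punctuation (class 1) with flags 2 and 3.
(my-encode-syntax-entry 1 nil '(2 3))
     @result{} 393217
(my-decode-syntax-entry 393217)
     @result{} (1 nil (2 3))
;; `(': open parenthesis (class 4) matching `)'.
(my-decode-syntax-entry (my-encode-syntax-entry 4 ?\) nil))
     @result{} (4 41 nil)
@end example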