@c Archived copy of syntax.texinfo (14588 bytes), from
@c ./UNRELEASED/lispref.tar.Z on the DKUUG GNU tape of 2/12/89
@c (DataMuseum.dk, Bits:30007078).
@setfilename ../info/syntax
@node Syntax Tables, Lisp Expressions, Searching and Matching, Top
@chapter Syntax Tables
@cindex text parsing

A @dfn{syntax table} tells Emacs the syntactic role of each character
in a buffer. The parsing commands, the complex movement commands, and
others use this information to determine where words, symbols, and
other syntactic constructs begin and end.

@c !!! perhaps should list which commands that are described
@c elsewhere are affected by syntax tables, such as the word commands,
@c the list and sexp commands, Lisp parsing functions, such as
@c @code{parse-partial-sexp}, and others.

A syntax table is a vector of 256 elements: one entry for each of the
256 character codes that fit in an 8-bit byte. Each element is an
integer that encodes the syntax of the corresponding character.

Syntax tables are used only for moving across text, not by the GNU
Emacs Lisp reader. Emacs Lisp reads Lisp expressions with built-in C
code that does not consult syntax tables.

Different buffers may be in different major modes, and each major mode
may assign a given character a different syntactic class. For example,
in Lisp mode the character @samp{;} begins a comment, but in C mode it
terminates a statement. Each buffer has its own syntax table, but a
major mode usually gives the same syntax table to every buffer that
uses the mode. @xref{Major Modes}, for an example of how to set up a
syntax table.

@defun syntax-table-p object
This function returns @code{t} if @var{object} is a vector of 256
elements, which means that it may be a syntax table. Because only the
length is checked, any 256-element vector passes this test, whatever
its contents.
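A brief illustration of this test: the table returned by
@code{syntax-table} passes it, while an object that is not a vector at
all, such as a string, does not. The string used here is an arbitrary
example chosen for the sketch.

@example
;; The current buffer's syntax table passes the test.
(syntax-table-p (syntax-table))         ; @result{} t
;; A string is not a vector at all, so it fails.
(syntax-table-p "not a syntax table")   ; @result{} nil
@end example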
@end defun

@menu
* Syntax Classes::
* Syntax Table Functions::
* Some Standard Syntax Tables::
* Syntax Table Internals::
@end menu

@node Syntax Classes, Syntax Table Functions, Syntax Tables, Syntax Tables
@section Syntax Classes

A character belongs to one of twelve @dfn{syntax classes}. Different
tables often put the same character into different classes; there need
not be any relation between the class of a character in one table and
its class in any other.

Most of the functions that operate on syntax tables use characters to
represent the classes; each class is represented by a mnemonic
character. Here is a summary of the classes and the characters that
stand for them:

@table @kbd
@item @key{SPC}
Whitespace syntax
@item w
Word constituent
@item _
Symbol constituent
@item .
Punctuation
@item (
Open-parenthesis
@item )
Close-parenthesis
@item "
String quote
@item \
Character quote
@item $
Paired delimiter
@item '
Expression prefix operator
@item <
Comment starter
@item >
Comment ender
@end table

@deffn Syntax-class whitespace characters
@cindex whitespace syntax
@dfn{Whitespace characters} separate symbols and words from each other.
Typically, whitespace characters have no other syntactic use, and
multiple whitespace characters are treated as one. Space, tab,
newline, and formfeed are almost always whitespace.
@end deffn

@deffn Syntax-class word constituents
@cindex word syntax
@dfn{Word constituents} are parts of normal English words and are
typically used in variable and command names in programs. All upper-
and lower-case letters and the digits are typically word constituents.
@end deffn

@deffn Syntax-class symbol constituents
@cindex symbol syntax
@dfn{Symbol constituents} are the extra characters that, along with
word constituents, are used in variable and command names.
This class lets Lisp symbols include extra characters without changing
the notion that a word is made up of the characters of English words.
In Lisp, the symbol constituents are @samp{$&*+-_<>}. In standard C,
the only non-word-constituent character that is valid in symbols is
underscore (@samp{_}).
@end deffn

@deffn Syntax-class punctuation characters
@dfn{Punctuation characters} are those characters that are used as
punctuation in English, or that are used in a programming language to
separate symbols from one another. Most programming language modes,
including Emacs Lisp mode, have no characters in this class, since the
few characters that are not symbol or word constituents all have other
uses.
@end deffn

@deffn Syntax-class parenthesis characters
Open and close @dfn{parenthesis characters} come in matching pairs that
surround sentences or expressions. In English text, and in C code,
these are @samp{()}, @samp{[]}, and @samp{@{@}}. In Lisp, the list and
vector delimiters (@samp{()} and @samp{[]}) are parenthesis characters.
Normally, Emacs shows the matching open parenthesis when you type a
closing parenthesis. @xref{Blinking}.
@end deffn

@deffn Syntax-class string quote
In many programming languages, including Lisp, a pair of @dfn{string
quote characters} delimits a string of characters. English text has no
string quote characters. Emacs Lisp has two string quote characters:
double quote (@samp{"}) and vertical bar (@samp{|}), although Emacs
Lisp makes no use of @samp{|}. C also has two string quote characters:
double quote for strings, and single quote (@samp{'}) for character
constants.
@end deffn

@deffn Syntax-class character quote
A @dfn{character quote character} quotes the following character so
that it loses its normal syntactic meaning. In both C and Lisp, for
example, the backslash (@samp{\}) is a character quote.
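The class that a particular table gives the backslash can be checked
with @code{char-syntax}. This sketch assumes an Emacs that provides the
@code{with-temp-buffer} macro, which is not described in this chapter;
it installs @code{emacs-lisp-mode-syntax-table} in a temporary buffer so
that no real buffer's syntax table is changed.

@example
(with-temp-buffer
  ;; Install the Emacs Lisp table in a scratch buffer only.
  (set-syntax-table emacs-lisp-mode-syntax-table)
  ;; char-syntax reports the class designator for backslash.
  (char-to-string (char-syntax ?\\)))   ; @result{} "\\"
@end example

The result is a one-character string holding a backslash, the
designator of the character quote class.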
@end deffn

@deffn Syntax-class delimiter characters
Paired @dfn{delimiter characters} serve a similar purpose to string
quote characters, but differ in that the same character is used at the
beginning and end. Only TeX mode uses paired delimiters presently: the
@samp{$} that begins and ends math mode.
@end deffn

@deffn Syntax-class expression prefix
An @dfn{expression prefix operator} is a character that goes next to an
expression without becoming part of a symbol that it touches. Lisp
uses this class for the apostrophe (@samp{'}), used for quoting, and
the comma (@samp{,}), used in macros.
@end deffn

@deffn Syntax-class comment starter
@cindex comment syntax
The @dfn{comment starter} and @dfn{comment ender} characters are used
in different languages to delimit comments. English text has no
comment characters. In Lisp, the semicolon (@samp{;}) starts a comment
and a newline or formfeed ends one.
@end deffn

@cindex syntax flags
In addition to these classes, entries for characters in a syntax table
can include flags. At present there are four possible flags, all
intended to deal with two-character comment delimiters. The four flags
(represented by the characters @samp{1}, @samp{2}, @samp{3}, and
@samp{4}) indicate that the character for which the entry is being made
can @emph{also} be part of a comment sequence. For example, in C the
asterisk (used for multiplication) is a punctuation character,
@emph{and} the second character of a start-of-comment sequence
(@samp{/*}), @emph{and} the first character of an end-of-comment
sequence (@samp{*/}).

The flags for a character @var{c} are:

@table @code
@item 1
means @var{c} is the start of a two-character comment start sequence.
@item 2
means @var{c} is the second character of such a sequence.
@item 3
means @var{c} is the start of a two-character comment end sequence.
@item 4
means @var{c} is the second character of such a sequence.
@end table

Thus, the entry for the character @samp{*} in the C syntax table is
described by the string @code{". 23"}: punctuation, no matching
character (the space), second character of a comment starter, and first
character of a comment ender. The entry for @samp{/} is described by
@code{". 14"}: punctuation, no matching character, first character of a
comment starter, and second character of a comment ender.

@node Syntax Table Functions, Some Standard Syntax Tables, Syntax Classes, Syntax Tables
@section Syntax Table Functions

The syntax table functions describe a character's syntax with a string
of the mnemonic characters listed in the previous section.

@defun make-syntax-table &optional table
This function constructs a copy of @var{table} and returns it. If
@var{table} is not supplied, it returns a copy of the current syntax
table. It is an error if @var{table} is not a syntax table.
@end defun

@defun copy-syntax-table &optional table
This function is identical to @code{make-syntax-table}.
@end defun

@deffn Command modify-syntax-entry char syntax-string &optional table
This function sets the syntax entry for @var{char} according to
@var{syntax-string}. The syntax is changed only for @var{table}, which
defaults to the current buffer's syntax table. The first character of
@var{syntax-string} is the class designator (@pxref{Syntax Classes});
the second character, if present, is the matching character, or a space
if there is none; any remaining characters are flags. The old syntax
information in the table for this character is completely forgotten.

This function always returns @code{nil}. It is an error if the first
character of the syntax string is not one of the twelve syntax class
characters, or if @var{char} is not a character.

The first example makes the @key{SPC} character an element of the
whitespace class. The second example makes @samp{$} an open
parenthesis character, with @samp{^} as its matching close parenthesis
character.
The third example makes @samp{^} a close parenthesis character, with
@samp{$} as its matching open parenthesis character. The fourth
example makes @samp{/} a punctuation character, the first character of
a start-comment sequence, and the second character of an end-comment
sequence.

@example
(modify-syntax-entry ?\  " ")     ; space
     @result{} nil
(modify-syntax-entry ?$ "(^")
     @result{} nil
(modify-syntax-entry ?^ ")$")
     @result{} nil
(modify-syntax-entry ?/ ". 14")
     @result{} nil
@end example
@end deffn

@defun char-syntax character
This function returns the syntax class of @var{character}, represented
by its mnemonic character. It returns @emph{only} the class, not any
matching parenthesis or flags. It is an error if @var{character} is
not a character.

The first example shows that the syntax class of space is whitespace
(represented by a space). The second example shows that the syntax of
@samp{/} is punctuation in C mode; it does not show that @samp{/} is
also a comment sequence character. The third example shows that open
parenthesis is in the class of open parentheses; it does not show that
its matching character is @samp{)}.

@example
(char-to-string (char-syntax ?\ ))
     @result{} " "
(char-to-string (char-syntax ?/))
     @result{} "."
(char-to-string (char-syntax ?( ))
     @result{} "("
@end example
@end defun

@defun set-syntax-table table
This function makes @var{table} the syntax table for the current
buffer. It returns @var{table}.
@end defun

@defun syntax-table
This function returns the current syntax table, which is the table for
the current buffer.
@end defun

@defun backward-prefix-chars
This function moves point backward over any number of characters with
expression prefix syntax. @xref{Syntax Classes}, for more about
expression prefix characters.
@end defun

@deffn Command describe-syntax
This function describes the syntax specifications of the current syntax
table.
It makes a listing in the @samp{*Help*} buffer and then pops up a
window to display it. It returns @code{nil}. Part of a typical listing
is shown below.

@example
(describe-syntax)
     @result{} nil

---------- Buffer: *Help* ----------
C-q             \       which means: escape
C-r .. C-_              which means: whitespace
!               .       which means: punctuation
(               ()      which means: open, matches )
)               )(      which means: close, matches (
* .. +          _       which means: symbol
,               .       which means: punctuation
-               _       which means: symbol
.               .       which means: punctuation
/               . 13    which means: punctuation,
          is the first character of a comment-start sequence,
          is the first character of a comment-end sequence
0 .. 9          w       which means: word
---------- Buffer: *Help* ----------
@end example
@end deffn

@node Some Standard Syntax Tables, Syntax Table Internals, Syntax Table Functions, Syntax Tables
@section Some Standard Syntax Tables

Each of the major modes in Emacs has its own syntax table. Here are
several of them:

@defun standard-syntax-table
This function returns the standard syntax table, which is the syntax
table used in Fundamental mode.
@end defun

@defvar text-mode-syntax-table
The value of this variable is the syntax table used in Text mode.
@end defvar

@defvar c-mode-syntax-table
The value of this variable is the syntax table used in C mode buffers.
@end defvar

@defvar emacs-lisp-mode-syntax-table
The value of this variable is the syntax table used by editing commands
in Emacs Lisp mode. (It has no effect on the Lisp @code{read}
function.)
@end defvar

@node Syntax Table Internals, , Some Standard Syntax Tables, Syntax Tables
@section Syntax Table Internals
@cindex syntax table internals

Each element of a syntax table is an integer that encodes the full
meaning of the entry: the class, the matching character (if any), and
the flags. Programmers rarely work with the entries directly in this
form, since the syntax table functions all expect a string of
representative characters instead.
In such a string, the first character is always the class of the
character; the second character, if present, is the matching
parenthesis (a space if there is none); and any subsequent characters
are the flags.

The low 8 bits of each element of a syntax table indicate the syntax
class, using these codes:

@table @code
@item 0
whitespace
@item 1
punctuation
@item 2
word
@item 3
symbol
@item 4
open paren
@item 5
close paren
@item 6
expression prefix
@item 7
string quote
@item 8
paired delimiter
@item 9
character quote
@item 10
@item 11
comment-start
@item 12
comment-end
@end table

The next 8 bits hold the matching opposite parenthesis (if the
character has parenthesis syntax); otherwise, they are not meaningful.
The next 4 bits are the flags.
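The layout above can be illustrated with a pair of helper functions.
The names @code{my-syntax-entry-encode} and @code{my-syntax-entry-decode}
are hypothetical, not part of Emacs, and the functions only do
arithmetic on an integer; they do not read or change any real syntax
table. The text says only that the flags occupy the next 4 bits, so the
bit order used here (flag 1 in the lowest of those bits, flag 4 in the
highest) is an assumption of this sketch.

@example
(defun my-syntax-entry-encode (class match flags)
  "Pack CLASS, MATCH and FLAGS into one syntax table integer.
CLASS is a class code from 0 to 12, MATCH is the matching character
or 0, and FLAGS is a list of comment flags, each from 1 to 4."
  (let ((entry (logior class (ash match 8))))
    ;; Assumed layout: flag N is stored in bit 15+N (bits 16 to 19).
    (while flags
      (setq entry (logior entry (ash 1 (+ 15 (car flags)))))
      (setq flags (cdr flags)))
    entry))

(defun my-syntax-entry-decode (entry)
  "Split the syntax table integer ENTRY into (CLASS MATCH FLAGS)."
  (let ((class (logand entry 255))              ; low 8 bits
        (match (logand (ash entry -8) 255))     ; next 8 bits
        (flags nil)
        (n 4))
    ;; Test flags 4, 3, 2, 1 in turn so FLAGS ends up in ascending order.
    (while (> n 0)
      (if (/= 0 (logand entry (ash 1 (+ 15 n))))
          (setq flags (cons n flags)))
      (setq n (1- n)))
    (list class match flags)))
@end example

Applied to the C-mode asterisk described under Syntax Classes
(punctuation, no matching character, flags 2 and 3),
@code{(my-syntax-entry-encode 1 0 '(2 3))} returns 393217, and
@code{(my-syntax-entry-decode 393217)} returns @code{(1 0 (2 3))}.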