DataMuseum.dk Presents historical artifacts from the history of: DKUUG/EUUG Conference tapes
This is an automatic "excavation" of a thematic subset of
See our Wiki for more about DKUUG/EUUG Conference tapes. Excavated with: AutoArchaeologist - Free & Open Source Software.
Length: 40056 (0x9c78) Types: TextFile Names: »searching.texinfo«
└─⟦a05ed705a⟧ Bits:30007078 DKUUG GNU 2/12/89 └─⟦c06c473ab⟧ »./UNRELEASED/lispref.tar.Z« └─⟦1b57a2ffe⟧ └─⟦this⟧ »searching.texinfo«
@setfilename ../info/searching
@node Searching and Matching, Syntax Tables, Text, Top
@chapter Searching and Matching
@cindex searching
@cindex matching

GNU Emacs provides two ways to search through a buffer for specified
text: exact string searches and regular expression searches.  After a
regular expression search, you can use @dfn{matching} functions to find
the positions of the text matched by the whole expression and by each
of its sub-expressions.  This chapter also describes replacement
functions.

@menu
* Searching for Strings::
* Regular Expressions::
* Regular Expression Searching::
* Replacement::
* Match Data::
* Standard Regexps::
* Searching and Case::
@end menu

@node Searching for Strings, Regular Expressions, Searching and Matching, Searching and Matching
@section Searching for Strings
@cindex string search

@deffn Command search-forward string &optional limit noerror repeat
This function searches forward from point for an exact match for
@var{string}.  It sets point to the end of the occurrence found, and
returns @code{t}.

In this example, point is at the beginning of the line.  Then
@code{(search-forward "fox")} is evaluated in the minibuffer, and point
is left after the last letter of @samp{fox}:

@example
---------- Buffer: foo ----------
@point{}The quick brown fox jumped over the lazy dog.
---------- Buffer: foo ----------

(search-forward "fox")
     @result{} t

---------- Buffer: foo ----------
The quick brown fox@point{} jumped over the lazy dog.
---------- Buffer: foo ----------
@end example

If @var{limit} is non-@code{nil} (it must be a position in the current
buffer), then it is the upper bound to the search.  No match extending
after that position is accepted.

@cindex search-failed error
If the search fails and @var{noerror} is @code{nil}, a
@code{search-failed} error is signaled.  If @var{noerror} is @code{t},
a failed search just returns @code{nil} and does not signal an error.
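Passing @code{t} for @var{noerror} is the usual way to loop over every
occurrence of a string, because the final, failing search then ends the
loop instead of signaling an error.  The following sketch counts the
occurrences from point to the end of the buffer.  The name
@code{my-count-occurrences} is hypothetical, not part of Emacs, and the
sketch assumes @var{string} is non-empty (an empty string would match at
point forever).

@example
;; my-count-occurrences is a hypothetical helper, not part of Emacs.
;; STRING is assumed to be non-empty.
(defun my-count-occurrences (string)
  "Return the number of occurrences of STRING after point."
  (save-excursion
    (let ((count 0))
      ;; With NOERROR set to t, the last (failing) search returns nil
      ;; instead of signaling search-failed, which ends the loop.
      (while (search-forward string nil t)
        (setq count (1+ count)))
      count)))

(with-temp-buffer
  (insert "the cat in the hat")
  (goto-char (point-min))
  (my-count-occurrences "the"))
     ; @result{} 2
@end example

Because of @code{save-excursion}, point is back where it started when
the count is returned.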
If @var{noerror} is neither @code{nil} nor @code{t}, a failed search
moves point to @var{limit} and returns @code{nil}.

If @var{repeat} is non-@code{nil}, the search is repeated that many
times, and point is left at the end of the last match.
@end deffn

@deffn Command search-backward string &optional limit noerror repeat
This function searches backward from point for @var{string}.  It is the
exact analog of @code{search-forward}, except that it leaves point at
the beginning of the string matched.
@end deffn

@deffn Command word-search-forward string &optional limit noerror repeat
This function searches forward from point for a ``word'' match for
@var{string}.  It sets point to the end of the occurrence found, and
returns @code{t}.

A word search differs from a simple string search in that a word search
@strong{requires} that the words it searches for are separate words
(searching for the word @samp{ball} will not match the word
@samp{balls}), and punctuation and spacing are ignored (searching for
@samp{ball boy} will match @samp{ball. Boy!}).

In this example, point is first placed at the beginning of the buffer;
the search leaves it between the @kbd{y} and the @kbd{!}.

@example
---------- Buffer: foo ----------
@point{}He said ``Please! Find the ball boy!''
---------- Buffer: foo ----------

(word-search-forward "Please find the ball, boy.")
     @result{} t

---------- Buffer: foo ----------
He said ``Please! Find the ball boy@point{}!''
---------- Buffer: foo ----------
@end example

If @var{limit} is non-@code{nil} (it must be a position in the current
buffer), then it is the upper bound to the search.  The match found must
not extend after that position.

If @var{noerror} is @code{t}, then @code{word-search-forward} returns
@code{nil} when a search fails, instead of signaling an error.  If
@var{noerror} is neither @code{nil} nor @code{t}, then
@code{word-search-forward} moves point to @var{limit} and returns
@code{nil}.
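The difference between a word search and a plain string search can be
seen in a scratch buffer.  In this sketch the sample text is made up for
illustration, and @code{(and @dots{} t)} merely turns any
non-@code{nil} result into @code{t} so that the three results are easy
to compare.

@example
;; Compare a plain string search with a word search on sample text.
(with-temp-buffer
  (insert "He said: Find the balls, boy.")
  (list
   ;; A string search finds "ball" inside the word "balls" ...
   (progn (goto-char (point-min))
          (and (search-forward "ball" nil t) t))
   ;; ... but a word search requires "ball" to be a separate word.
   (progn (goto-char (point-min))
          (and (word-search-forward "ball" nil t) t))
   ;; Punctuation and spacing between the words are ignored.
   (progn (goto-char (point-min))
          (and (word-search-forward "balls boy" nil t) t))))
     ; @result{} (t nil t)
@end example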
If @var{repeat} is non-@code{nil}, then the search is repeated that many
times, and point is left at the end of the last match.

When @code{word-search-forward} is called interactively, Emacs prompts
you for the search string; @var{limit} and @var{noerror} are set to
@code{nil}, and @var{repeat} is set to 1.
@end deffn

@deffn Command word-search-backward string &optional limit noerror repeat
This function searches backward from point for a word match to
@var{string}.  This function is the exact analog to
@code{word-search-forward}.
@end deffn

@node Regular Expressions, Regular Expression Searching, Searching for Strings, Searching and Matching
@section Regular Expression Syntax
@cindex patterns
@cindex regular expression syntax

A @dfn{regular expression} (@dfn{regexp}, for short) is a pattern that
denotes a set of strings, possibly an infinite set.  Searching for
matches for a regexp is a very powerful operation.  In GNU Emacs, you
can search for the next match for a regexp either incrementally or not.
Incremental search commands are described in @cite{The GNU Emacs
Manual}.  @xref{Regexp Search, , Regular Expression Search, emacs, The GNU Emacs Manual}.
See also @ref{Major Modes}.

@menu
* Complex Regexp Example::  Illustrates regular expression syntax.
@end menu

@cindex invalid-regexp error
Regular expressions have a syntax in which a few characters are special
constructs and the rest are @dfn{ordinary}.  An ordinary character is a
simple regular expression which matches that character and nothing
else.  The special characters are @samp{$}, @samp{^}, @samp{.},
@samp{*}, @samp{+}, @samp{?}, @samp{[}, @samp{]} and @samp{\}; no new
special characters will be defined in the future.  Any other character
appearing in a regular expression is ordinary, unless a @samp{\}
precedes it.  If a regular expression is malformed, an
@code{invalid-regexp} error is signaled.
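Because a malformed regexp signals @code{invalid-regexp}, a program that
accepts a regexp from the user can check it with @code{condition-case}.
In this sketch the name @code{my-valid-regexp-p} is hypothetical, not
part of Emacs; it relies on @code{string-match} compiling its regexp
before matching, so even a match against the empty string reports a
malformed pattern.

@example
;; my-valid-regexp-p is a hypothetical helper, not part of Emacs.
(defun my-valid-regexp-p (regexp)
  "Return t if REGEXP is well formed, nil if it is malformed."
  (condition-case nil
      (progn
        ;; string-match compiles REGEXP first, so a malformed
        ;; pattern signals invalid-regexp even on an empty string.
        (string-match regexp "")
        t)
    (invalid-regexp nil)))

(my-valid-regexp-p "ca*r")     ; @result{} t
(my-valid-regexp-p "\\(foo")   ; @result{} nil, unmatched \(
(my-valid-regexp-p "[abc")     ; @result{} nil, unmatched [
@end example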
For example, @samp{f} is not a special character, so it is ordinary, and
therefore @samp{f} is a regular expression that matches the string
@samp{f} and no other string.  (It does @emph{not} match the string
@samp{ff}.)  Likewise, @samp{o} is a regular expression that matches
only @samp{o}.@refill

Any two regular expressions @var{a} and @var{b} can be concatenated.
The result is a regular expression which matches a string if @var{a}
matches some amount of the beginning of that string and @var{b} matches
the rest of the string.@refill

As a simple example, we can concatenate the regular expressions
@samp{f} and @samp{o} to get the regular expression @samp{fo}, which
matches only the string @samp{fo}.  Still trivial.  To do something
nontrivial, you need to use one of the special characters.  Here is a
list of them:

@table @kbd
@item .@: @r{(Period)}
@cindex @samp{.} in regexp
is a special character that matches any single character except a
newline.  Using concatenation, we can make regular expressions like
@samp{a.b}, which matches any three-character string that begins with
@samp{a} and ends with @samp{b}.@refill

@item *
@cindex @samp{*} in regexp
is not a construct by itself; it is a suffix, which means the preceding
regular expression is to be repeated as many times as possible.  In
@samp{fo*}, the @samp{*} applies to the @samp{o}, so @samp{fo*} matches
one @samp{f} followed by any number of @samp{o}s.  The case of zero
@samp{o}s is allowed: @samp{fo*} does match @samp{f}.@refill

@samp{*} always applies to the @emph{smallest} possible preceding
expression.  Thus, @samp{fo*} has a repeating @samp{o}, not a repeating
@samp{fo}.@refill

The matcher processes a @samp{*} construct by matching, immediately, as
many repetitions as can be found.  Then it continues with the rest of
the pattern.  If that fails, backtracking occurs, discarding some of the
matches of the @samp{*}-modified construct in case that makes it
possible to match the rest of the pattern.
For example, in matching @samp{ca*ar} against the string @samp{caaar},
the @samp{a*} first tries to match all three @samp{a}s; but the rest of
the pattern is @samp{ar} and there is only @samp{r} left to match, so
this try fails.  The next alternative is for @samp{a*} to match only two
@samp{a}s.  With this choice, the rest of the regexp matches
successfully.@refill

@item +
@cindex @samp{+} in regexp
is a suffix character similar to @samp{*} except that it requires that
the preceding expression be matched at least once.  So, for example,
@samp{ca+r} will match the strings @samp{car} and @samp{caaaar} but not
the string @samp{cr}, whereas @samp{ca*r} would match all three
strings.@refill

@item ?
@cindex @samp{?} in regexp
is a suffix character similar to @samp{*} except that it can match the
preceding expression either once or not at all.  For example,
@samp{ca?r} will match @samp{car} or @samp{cr}; nothing else.

@item [ @dots{} ]
@cindex @samp{[} in regexp
@cindex @samp{]} in regexp
@samp{[} begins a @dfn{character set}, which is terminated by a
@samp{]}.  In the simplest case, the characters between the two form
the set.  Thus, @samp{[ad]} matches either one @samp{a} or one
@samp{d}, and @samp{[ad]*} matches any string composed of just
@samp{a}s and @samp{d}s (including the empty string), from which it
follows that @samp{c[ad]*r} matches @samp{cr}, @samp{car}, @samp{cdr},
@samp{caddaar}, etc.@refill

Character ranges can also be included in a character set, by writing
two characters with a @samp{-} between them.  Thus, @samp{[a-z]}
matches any lower-case letter.  Ranges may be intermixed freely with
individual characters, as in @samp{[a-z$%.]}, which matches any
lower-case letter or @samp{$}, @samp{%} or period.@refill

Note that the usual special characters are not special any more inside
a character set.
A completely different set of special characters exists inside
character sets: @samp{]}, @samp{-} and @samp{^}.@refill

To include a @samp{]} in a character set, you must make it the first
character.  For example, @samp{[]a]} matches @samp{]} or @samp{a}.  To
include a @samp{-}, write @samp{---}, which is a range containing only
@samp{-}.  To include @samp{^}, make it other than the first character
in the set.@refill

@item [^ @dots{} ]
@cindex @samp{^} in regexp
@samp{[^} begins a @dfn{complement character set}, which matches any
character except the ones specified.  Thus, @samp{[^a-z0-9A-Z]} matches
all characters @emph{except} letters and digits.@refill

@samp{^} is not special in a character set unless it is the first
character.  The character following the @samp{^} is treated as if it
were first (@samp{-} and @samp{]} are not special there).

Note that a complement character set can match a newline, unless
newline is mentioned as one of the characters not to match.

@item ^
@cindex @samp{^} in regexp
@cindex beginning of line
is a special character that matches the empty string, but only at the
beginning of a line in the text being matched.  Otherwise it fails to
match anything.  Thus, @samp{^foo} matches a @samp{foo} that occurs at
the beginning of a line.

When matching a string, @samp{^} matches at the beginning of the string
or after a newline character @samp{\n}.

@item $
@cindex @samp{$} in regexp
is similar to @samp{^} but matches only at the end of a line.  Thus,
@samp{xx*$} matches a string of one or more @samp{x}s at the end of a
line.

When matching a string, @samp{$} matches at the end of the string or
before a newline character @samp{\n}.

@item \
@cindex @samp{\} in regexp
has two functions: it quotes the special characters (including
@samp{\}), and it introduces additional special constructs.
Because @samp{\} quotes special characters, @samp{\$} is a regular
expression which matches only @samp{$}, and @samp{\[} is a regular
expression which matches only @samp{[}, and so on.

Note that @samp{\} also has special meaning inside the read syntax of
Lisp strings (@pxref{String Type}).  The regular expression that matches
a single @samp{\} character is @samp{\\}, and each of those two
backslashes must itself be doubled when written in a Lisp string, so the
string constant is @code{"\\\\"}.@refill
@end table

Note: for historical compatibility, special characters are treated as
ordinary ones if they are in contexts where their special meanings make
no sense.  For example, @samp{*foo} treats @samp{*} as ordinary since
there is no preceding expression on which the @samp{*} can act.  It is
poor practice to depend on this behavior; it is better to quote the
special character anyway, regardless of where it appears.@refill

For the most part, @samp{\} followed by any character matches only that
character.  However, there are several exceptions: characters which,
when preceded by @samp{\}, are special constructs.  Such characters are
always ordinary when encountered on their own.  Here is a table of
@samp{\} constructs.

@table @kbd
@item \|
@cindex @samp{|} in regexp
@cindex regexp alternative
specifies an alternative.  Two regular expressions @var{a} and @var{b}
with @samp{\|} in between form an expression that matches anything that
either @var{a} or @var{b} will match.@refill

Thus, @samp{foo\|bar} matches either @samp{foo} or @samp{bar} but no
other string.@refill

@samp{\|} applies to the largest possible surrounding expressions.  Only
a surrounding @samp{\( @dots{} \)} grouping can limit the grouping power
of @samp{\|}.@refill

Full backtracking capability exists to handle multiple uses of
@samp{\|}.
@item \( @dots{} \)
@cindex @samp{(} in regexp
@cindex @samp{)} in regexp
@cindex regexp grouping
is a grouping construct that serves three purposes:

@enumerate
@item
To enclose a set of @samp{\|} alternatives for other operations.
Thus, @samp{\(foo\|bar\)x} matches either @samp{foox} or @samp{barx}.

@item
To enclose a complicated expression for the postfix @samp{*} to operate
on.  Thus, @samp{ba\(na\)*} matches @samp{bananana}, etc., with any
(zero or more) number of @samp{na} strings.@refill

@item
To mark a matched substring for future reference.
@end enumerate

This last application is not a consequence of the idea of a
parenthetical grouping; it is a separate feature which happens to be
assigned as a second meaning to the same @samp{\( @dots{} \)} construct
because there is no conflict in practice between the two meanings.
Here is an explanation of this feature:

@item \@var{digit}
After the end of a @samp{\( @dots{} \)} construct, the matcher
remembers the beginning and end of the text matched by that construct.
Then, later on in the regular expression, you can use @samp{\} followed
by @var{digit} to mean ``match the same text that was matched by the
@var{digit}'th @samp{\( @dots{} \)} construct.''@refill

The strings matching the first nine @samp{\( @dots{} \)} constructs
appearing in a regular expression are assigned numbers 1 through 9 in
the order in which their open-parentheses appear in the regular
expression.  @samp{\1} through @samp{\9} may be used to refer to the
text matched by the corresponding @samp{\( @dots{} \)} construct.

For example, @samp{\(.*\)\1} matches any newline-free string that is
composed of two identical halves.  The @samp{\(.*\)} matches the first
half, which may be anything, but the @samp{\1} that follows must match
the same exact text.

@item \`
@cindex @samp{`} in regexp
matches the empty string, provided it is at the beginning of the buffer.

@item \'
@cindex @samp{'} in regexp
matches the empty string, provided it is at the end of the buffer.
@item \b
@cindex @samp{\b} in regexp
matches the empty string, provided it is at the beginning or end of a
word.  Thus, @samp{\bfoo\b} matches any occurrence of @samp{foo} as a
separate word.  @samp{\bballs?\b} matches @samp{ball} or @samp{balls} as
a separate word.@refill

@item \B
@cindex @samp{\B} in regexp
matches the empty string, provided it is @emph{not} at the beginning or
end of a word.

@item \<
@cindex @samp{\<} in regexp
matches the empty string, provided it is at the beginning of a word.

@item \>
@cindex @samp{\>} in regexp
matches the empty string, provided it is at the end of a word.

@item \w
@cindex @samp{\w} in regexp
matches any word-constituent character.  The current syntax table
determines which characters these are.

@item \W
@cindex @samp{\W} in regexp
matches any character that is not a word constituent.

@item \s@var{code}
@cindex @samp{\s} in regexp
matches any character whose syntax is @var{code}.  @var{code} is a
character which represents a syntax code: thus, @samp{w} for word
constituent, @samp{-} for whitespace, @samp{(} for open-parenthesis,
etc.  @xref{Syntax Tables}.@refill

@item \S@var{code}
@cindex @samp{\S} in regexp
matches any character whose syntax is not @var{code}.
@end table

@node Complex Regexp Example, , Regular Expressions, Regular Expressions
@comment  node-name,  next,  previous,  up
@subsection Complex Regexp Example

Here is a complicated regexp, used by Emacs to recognize the end of a
sentence together with any whitespace that follows.  It is the value of
the variable @code{sentence-end}.

First, the regexp is given in Lisp syntax to enable you to distinguish
the spaces from the tab characters.  In Lisp syntax, the string
constant begins and ends with a double-quote.  @samp{\"} stands for a
double-quote as part of the regexp, @samp{\\} for a backslash as part
of the regexp, @samp{\t} for a tab and @samp{\n} for a newline.
@example
"[.?!][]\"')@}]*\\($\\|\t\\|  \\)[\n]*"
@end example

In contrast, if you evaluate the variable @code{sentence-end}, you will
see the following:

@example
sentence-end
     @result{} "[.?!][]\"')@}]*\\($\\|	\\|  \\)[
]*"
@end example

@noindent
In this case, the tab and newline are the actual characters.

This regular expression contains four parts in succession and can be
deciphered as follows:

@table @code
@item [.?!]
The first part of the pattern consists of three characters, a period, a
question mark and an exclamation mark, within square brackets.  The
match must begin with one of these three characters.

@item []\"')@}]*
The second part of the pattern is the group of closing brackets,
quotation marks, parentheses and braces, which can appear zero or more
times.  These may follow the period, question mark or exclamation mark.
The @code{\"} is Lisp syntax for a double quote in a string.  The
asterisk, @samp{*}, indicates that the items in the previous group (the
group surrounded by square brackets, @samp{[]}) may be repeated zero or
more times.

@item \\($\\|\t\\|  \\)
The third part of the pattern is one of the following: the end of a
line, a tab, or two blank spaces.  The double backslashes are Lisp
string syntax for a single backslash, so the regexp itself contains
@samp{\(}, @samp{\|} and @samp{\)}: the parentheses group the
alternatives, and the vertical bars separate them.  The dollar sign
matches the end of a line.  The @key{TAB} character is written as
@kbd{\t} and the two spaces are written as is.

@item [\n]*
Finally, the last part of the pattern indicates that the end of the
line or the whitespace following the period, question mark or
exclamation mark may, but need not, be followed by any number of
newlines.
@end table

@defun regexp-quote string
This function returns a regular expression string which matches exactly
@var{string} and nothing else.
This allows you to request an exact string match when calling a
function that wants a regular expression.

@example
(regexp-quote "^The cat$")
     @result{} "\\^The cat\\$"
@end example

One use of @code{regexp-quote} is to combine an exact string match with
context described as a regular expression.  For example, this searches
for the string which is the value of @code{string}, surrounded by
whitespace:

@example
(re-search-forward (concat "\\s " (regexp-quote string) "\\s "))
@end example
@end defun

@node Regular Expression Searching, Replacement, Regular Expressions, Searching and Matching
@section Regular Expression Searching
@cindex regular expression searching

A cluster of functions provides various features involved with regular
expression searches.  The primary function is @code{re-search-forward}.

@deffn Command re-search-forward regexp &optional limit noerror repeat
This function searches forward in the current buffer for a string of
text that is matched by the regular expression @var{regexp}.  It skips
over any amount of text that is not matched by @var{regexp}; if it finds
a match, it leaves point at the end of the matched text and returns
@code{t}.

If there is no text matched by @var{regexp}, a @code{search-failed}
error is signaled.  However, if @var{noerror} is @code{t}, a failed
search makes @code{re-search-forward} return @code{nil} without
signaling an error.

If @var{limit} is supplied (it must be a number or a marker), it is the
maximum position in the buffer that point can be moved to: point is left
at or before @var{limit}, and the match found must not extend after that
position, so nothing beyond @var{limit} can be found.
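A @var{limit} combined with a @var{noerror} of @code{t} makes it easy to
scan a bounded stretch of text.  The following sketch collects
@samp{@var{name}=@var{number}} pairs on the current line only.  The
helper name @code{my-line-assignments} and its pattern are made up for
illustration; the parenthesized groups are read back with
@code{match-beginning} and @code{match-end} (@pxref{Match Data}).

@example
;; my-line-assignments is a hypothetical helper, not part of Emacs.
(defun my-line-assignments ()
  "Return NAME=NUMBER pairs found between point and the end of line."
  (let ((limit (save-excursion (end-of-line) (point)))
        (pairs nil))
    (save-excursion
      ;; LIMIT confines every match to the current line; NOERROR = t
      ;; makes the final, failing search return nil.
      (while (re-search-forward "\\([a-z]+\\)=\\([0-9]+\\)" limit t)
        (setq pairs
              (cons (cons (buffer-substring (match-beginning 1)
                                            (match-end 1))
                          (buffer-substring (match-beginning 2)
                                            (match-end 2)))
                    pairs))))
    (nreverse pairs)))

(with-temp-buffer
  (insert "a=1 bb=22\nc=3")
  (goto-char (point-min))
  (my-line-assignments))
     ; @result{} (("a" . "1") ("bb" . "22"))
@end example

The pair @samp{c=3} on the second line is not found, because it lies
beyond @var{limit}.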
If the search fails and @var{noerror} is neither @code{nil} nor
@code{t}, then point is moved to @var{limit} and left there, and the
function returns @code{nil}.

If @var{repeat} is supplied (it must be a positive number), then the
search is repeated that many times, and point is left at the end of the
last match found.

When called interactively, Emacs prompts you for @var{regexp} in the
minibuffer.

In the example, point is located directly before the @samp{T}.  After
evaluating the form, it is located at the end of that line (between the
@samp{t} of @samp{hat} and the newline).

@example
---------- Buffer: foo ----------
I read "@point{}The cat in the hat
comes back" twice.
---------- Buffer: foo ----------

(re-search-forward "[a-z]+" nil t 5)
     @result{} t

---------- Buffer: foo ----------
I read "The cat in the hat@point{}
comes back" twice.
---------- Buffer: foo ----------
@end example
@end deffn

@deffn Command re-search-backward regexp &optional limit noerror repeat
This function searches backward in the current buffer for a string of
text that is matched by the regular expression @var{regexp}, leaving
point at the beginning of the first text found.  This function is the
exact analog of @code{re-search-forward}.
@end deffn

@defun string-match regexp string &optional start
This function returns the index of the start of the first match for the
regular expression @var{regexp} in @var{string}, or @code{nil} if there
is no match.  If @var{start} is non-@code{nil}, the search starts at
that index in @var{string}.

For example,

@example
(string-match "quick" "The quick brown fox jumped quickly.")
     @result{} 4
(string-match "quick" "The quick brown fox jumped quickly." 8)
     @result{} 27
@end example

@noindent
The index of the first character of the string is 0, the index of the
second character is 1, and so on.

After this function returns, the index of the first character beyond
the match is available as @code{(match-end 0)}.
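Passing the end of one match as @var{start} for the next call is a
common way to find every match in a string.  The following sketch does
this; @code{my-all-matches} is a hypothetical name, not part of Emacs.
The extra step past an empty match keeps the loop from repeating forever
when @var{regexp} can match the empty string.

@example
;; my-all-matches is a hypothetical helper, not part of Emacs.
(defun my-all-matches (regexp string)
  "Return a list of the substrings of STRING that match REGEXP."
  (let ((start 0)
        (matches nil))
    (while (and (<= start (length string))
                (string-match regexp string start))
      (setq matches (cons (substring string (match-beginning 0)
                                     (match-end 0))
                          matches))
      ;; Resume at the end of this match, stepping one character
      ;; further after an empty match so the loop always advances.
      (setq start (if (= (match-beginning 0) (match-end 0))
                      (1+ (match-end 0))
                    (match-end 0))))
    (nreverse matches)))

(my-all-matches "quick" "The quick brown fox jumped quickly.")
     ; @result{} ("quick" "quick")
@end example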
@example
(string-match "quick" "The quick brown fox jumped quickly." 8)
     @result{} 27
(match-end 0)
     @result{} 32
@end example

The @code{match-end} function is described along with
@code{match-beginning} (@pxref{Match Data}).
@end defun

@defun looking-at regexp
This function determines whether the text in the current buffer directly
following point matches the regular expression @var{regexp}.
``Directly following'' means precisely that: the search is ``anchored''
and it must succeed starting with the first character following point.
The result is @code{t} if so, @code{nil} otherwise.

Point is not moved, but the match data is updated and can be used with
@code{match-beginning} or @code{match-end}.

In this example, point is located directly before the @samp{T}.  If it
were anywhere else, the result would be @code{nil}.

@example
---------- Buffer: foo ----------
I read "@point{}The cat in the hat
comes back" twice.
---------- Buffer: foo ----------

(looking-at "The cat in the hat$")
     @result{} t
@end example
@end defun

@ignore
@deffn Command delete-matching-lines regexp
This function is identical to @code{delete-non-matching-lines}, except
that it deletes what @code{delete-non-matching-lines} keeps.

In the example below, point is located on the first line of text.

@example
---------- Buffer: foo ----------
We hold these truths
to be self-evident,
that all men are created
equal, and that they are
---------- Buffer: foo ----------

(delete-matching-lines "the")
     @result{} nil

---------- Buffer: foo ----------
to be self-evident,
that all men are created
---------- Buffer: foo ----------
@end example
@end deffn

@deffn Command flush-lines regexp
This function is the same as @code{delete-matching-lines}.
@end deffn

@defun delete-non-matching-lines regexp
This function deletes all lines following point which don't contain a
match for the regular expression @var{regexp}.
@end defun

@deffn Command keep-lines regexp
This function is the same as @code{delete-non-matching-lines}.
@end deffn

@deffn Command how-many regexp
This function counts the matches for @var{regexp} in the current buffer
following point.  It prints this number in the echo area, returning the
string printed.
@end deffn

@deffn Command count-matches regexp
This function is a synonym of @code{how-many}.
@end deffn

@deffn Command list-matching-lines regexp nlines
This function is a synonym of @code{occur}.  It shows all lines
following point containing a match for @var{regexp}.  Each line is
displayed with @var{nlines} lines before and after, or
@code{-}@var{nlines} before if @var{nlines} is negative.  @var{nlines}
defaults to @code{list-matching-lines-default-context-lines}.
Interactively it is the prefix argument.

The lines are shown in a buffer named @samp{*Occur*}.  It serves as a
menu to find any of the occurrences in this buffer.  @kbd{C-h m}
(@code{describe-mode}) in that buffer gives help.
@end deffn

@defopt list-matching-lines-default-context-lines
Default value is 0.  Default number of context lines to include around
a @code{list-matching-lines} match.  A negative number means to include
that many lines before the match.  A positive number means to include
that many lines both before and after.
@end defopt
@end ignore

@node Replacement, Match Data, Regular Expression Searching, Searching and Matching
@section Replacement
@cindex replacement

Emacs has several replacement commands for interactive use.  For a
description of these, see @ref{Replace, , Replacement Commands, emacs, The GNU Emacs Manual}.
The commands include:

@table @code
@item replace-regexp
This function replaces every match of @var{regexp} occurring between
point and the end of the accessible portion of the buffer by
@var{replacement}, which must be a string.

@item replace-string
This function replaces occurrences of @var{string} with
@var{replacement}.
@end table

The following function replaces characters, not strings, but is
included here since it involves replacement.

@defun subst-char-in-region start end old-char new-char &optional noundo
@cindex replace characters
This function replaces all occurrences of the character @var{old-char}
with the character @var{new-char} in the region of the current buffer
defined by @var{start} and @var{end}.

@cindex Outline mode
@cindex undo avoidance
If @var{noundo} is non-@code{nil}, then @code{subst-char-in-region} does
not record the change for undo and does not mark the buffer as
modified.  This optional argument is used for obscure purposes, for
example, in Outline mode to change visible lines to invisible lines and
vice versa.

@code{subst-char-in-region} does not move point and returns @code{nil}.

@example
---------- Buffer: foo ----------
This is the contents of the buffer before.
---------- Buffer: foo ----------

(subst-char-in-region 1 20 ?i ?X)
     @result{} nil

---------- Buffer: foo ----------
ThXs Xs the contents of the buffer before.
---------- Buffer: foo ----------
@end example
@end defun

@ignore
@deffn Command replace-regexp regexp replacement delimited
This function replaces every match of @var{regexp} occurring between
point and the end of the accessible portion of the buffer by
@var{replacement}, which must be a string.  The special treatment of
@samp{\} in @var{replacement} is the same as for @code{replace-match}.

If @var{delimited} is non-@code{nil}, then it replaces only matches
surrounded by word boundaries.  The case of the replacement text is
determined by the same rules that @code{replace-match} uses.
@end deffn

@deffn Command replace-string string replacement &optional delimited
This function replaces occurrences of @var{string} with
@var{replacement}.
@end deffn

@defvar query-replace-help
The value of this variable is a help message to print when the user
types @kbd{?} in @code{query-replace}.
@end defvar
@end ignore

@node Match Data, Standard Regexps, Replacement, Searching and Matching
@section Match Data

Emacs keeps track of the positions of the start and end of segments of
text found during a regular expression search.  This means, for
example, that you can search for a complex pattern, such as a date in
an Rmail message, and extract different parts of it.

@defun match-beginning count
This function returns the position of the start of text matched by the
last regular expression searched for.  @var{count}, a number, specifies
which subexpression to return the start position of.  If @var{count} is
zero, then it returns the position of the start of the text matched by
the whole regexp.  If @var{count} is greater than zero, then it returns
the position of the beginning of the text matched by the
@var{count}'th subexpression, or @code{nil} if that subexpression was
not used in the match.

Subexpressions are those expressions grouped inside of parentheses,
@samp{\(@dots{}\)}.  The @var{count}'th subexpression is found by
counting occurrences of @samp{\(} from the beginning of the whole
regular expression.  The first subexpression is 1, the second is 2, and
so on.

The @code{match-end} function is similar to the @code{match-beginning}
function except that it returns the position of the end of the matched
text.

(In the example, the positions in the text are numbered to make the
results more apparent.)
@example
(string-match "\\(qu\\)\\(ick\\)" "The quick brown fox jumped quickly.")
                                  ;^^^^^^^^^^
                                  ;0123456789
     @result{} 4

(match-beginning 1)       ; @r{The beginning of the match}
     @result{} 4          ; @r{with @samp{qu} is at index 4.}

(match-beginning 2)       ; @r{The beginning of the match}
     @result{} 6          ; @r{with @samp{ick} is at index 6.}

(match-end 1)             ; @r{The end of the match}
     @result{} 6          ; @r{with @samp{qu} is at index 6.}

(match-end 2)             ; @r{The end of the match}
     @result{} 9          ; @r{with @samp{ick} is at index 9.}
@end example

@noindent
The @code{match-end} function is used in functions such as
@code{rmail-make-basic-summary-line}.

Here is another example.  Before the form is evaluated, point is
located at the beginning of the line.  After evaluating the search
form, it is located on the line between the space and the word
@samp{in}.  The beginning of the entire match is at the 9th character of
the buffer (@samp{T}), and the beginning of the match for the first
subexpression is at the 13th character (@samp{c}).

@example
(list
  (re-search-forward "The \\(cat \\)")
  (match-beginning 0)
  (match-beginning 1))
     @result{} (t 9 13)

---------- Buffer: foo ----------
I read "The cat @point{}in the hat comes back" twice.
        ^   ^
        9  13
---------- Buffer: foo ----------
@end example

@noindent
(Note that in this case, the index returned is a buffer position; the
first character of the buffer counts as 1.)

It is essential that @code{match-beginning} be called after the search
desired and before any other searches are performed.
@code{match-beginning} may not give the desired results if called in a
separate command from the search.  The example below is the wrong way
to call @code{match-beginning}.
@example
(re-search-forward "The \\(cat \\)")
     @result{} t
(foo)                   ; @r{Perhaps @code{foo} does more regexp searching.}
(match-beginning 0)
     @result{} 61       ; @r{Unexpected result!}
@end example

See the discussion of @code{store-match-data} for an example of how to
save the match data and restore it after an intervening search.
@end defun

@defun match-end count
This function returns the position of the end of text matched by the
last regular expression searched for.  This function is the exact
analog of @code{match-beginning}.
@end defun

@defun replace-match replacement &optional fixedcase literal
This function replaces the text matched by the last search with
@var{replacement}.

@cindex case in replacements
If @var{fixedcase} is non-@code{nil}, then the case of the replacement
text is not changed.  Otherwise the replacement text is converted to a
different case depending upon the capitalization of the text to be
replaced.  If the original text is all upper case, then the replacement
text is converted to upper case, except when all of the words in the
original text are only one character long; in that event, the
replacement text is capitalized.  If @emph{all} of the words in the
original text are capitalized, then all of the words in the replacement
text are capitalized.

If @var{literal} is non-@code{nil}, then @var{replacement} is inserted
exactly as it is, the only alteration being a possible change in case.
If it is @code{nil} (the default), then the character @samp{\} is
treated specially.  If a @samp{\} appears in @var{replacement}, then it
must be part of one of the following sequences:

@table @asis
@item @kbd{\&}
@cindex @samp{&} in replacement
@kbd{\&} is replaced by the entire text that was matched.

@item @kbd{\@var{n}}
@cindex @samp{\@var{n}} in replacement
@var{n} is a digit.  @kbd{\@var{n}} is replaced by the text that matched
the @var{n}'th subexpression in the original regexp.  Subexpressions
are those expressions grouped inside of @samp{\(@dots{}\)}.
@item @kbd{\\}
@cindex @samp{\} in replacement
@samp{\\} is replaced by @samp{\}.
@end table

@code{replace-match} leaves point at the end of the replacement text,
and returns @code{t}.
@end defun

@defun match-data
This function returns a new list containing all the information on what
the last search matched.  The zero'th element is the beginning of the
match for the whole expression; the first element is the end of the
match for the expression.  The next two elements are the beginning and
end of the match for the first subexpression.  In general, element
2@var{n} corresponds to @code{(match-beginning @var{n})}, and element
2@var{n} + 1 corresponds to @code{(match-end @var{n})}.

All the elements are markers, or @code{nil} if there was no match for
that subexpression.  As with @code{match-beginning} and
@code{match-end}, there must be no possibility of intervening searches
between the search and the call to @code{match-data} that is intended
to save the match data for that search.

@example
(match-data)
     @result{}  (#<marker at 9 in foo> #<marker at 17 in foo>
          #<marker at 13 in foo> #<marker at 17 in foo>)
@end example
@end defun

@defun store-match-data match-list
This function sets the internal data structure for the ``last search
match'' to the elements of @var{match-list}.  @var{match-list} should
have been created by a previous call to @code{match-data}.

Together with @code{match-data}, @code{store-match-data} can be used to
keep a regexp search from changing the match data that a caller relies
on.  This is useful when such searches occur in subroutines whose
callers may not expect searches to go on.  The following example
illustrates the canonical use of these two functions.

@example
(let ((data (match-data)))
  (unwind-protect
      @dots{}   ; @r{May change the original match data.}
    (store-match-data data)))
@end example

All asynchronous process functions (filters and sentinels) and some
modes that use @code{recursive-edit} should save and restore the match
data if they do a search or if they let a user make a search.  Here is
a function that restores the match data only if the buffer its markers
point into still exists.

@example
(defun restore-match-data (data)
  "Restore the match data DATA unless the buffer is missing."
  (catch 'foo
    (let ((d data))
      (while d
        (and (car d)
             (null (marker-buffer (car d)))
             ;; @r{The buffer of this marker has been deleted.}
             (throw 'foo nil))
        (setq d (cdr d)))
      (store-match-data data))))
@end example
@end defun

@node Standard Regexps, Searching and Case, Match Data, Searching and Matching
@section Standard Regular Expressions Used in Editing
@cindex regular expressions used standardly in editing
@cindex standard regular expressions used in editing

@defvar page-delimiter
This is the regexp describing line-beginnings that separate pages.  The
default value is @code{"^\014"} (i.e., @code{"^^L"} or @code{"^\C-l"}).
@end defvar

@defvar paragraph-separate
This is the regular expression for the beginning of a line that
separates paragraphs.  (If you change this, you may have to change
@code{paragraph-start} also.)  The default value is
@code{"^[ \t\f]*$"}, which matches a line that consists entirely of
spaces, tabs, and form feeds (including an empty line).
@end defvar

@defvar paragraph-start
This is the regular expression for the beginning of a line that starts
@emph{or} separates paragraphs.  The default value is
@code{"^[ \t\n\f]"}, which matches a line whose first character is a
space, tab, or form feed, as well as an empty line (whose first
character is the newline that ends it).
@end defvar

@defvar sentence-end
This is the regular expression describing the end of a sentence.  (All
paragraph boundaries also end sentences, regardless.)  The default
value is @code{"[.?!][]\"')@}]*\\($\\|\t\\|  \\)[ \t\n]*"}.  This means a
period, question mark, or exclamation mark, followed by any number of
closing brackets, quotation marks, apostrophes, parentheses, or braces,
followed by the end of a line, a tab, or two spaces, and then by any
amount of further whitespace (spaces, tabs, and newlines).
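As an illustration, the following calls @code{string-match} with the
default value written out as a literal string (rather than through the
variable, whose value a user may have changed).  The match starts at
the period that ends the first sentence and extends over the two spaces
that follow it:

@example
(string-match "[.?!][]\"')@}]*\\($\\|\t\\|  \\)[ \t\n]*"
              "Hello there.  How are you?")
     @result{} 11
(match-end 0)
     @result{} 14
@end example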
For a full description of this regular expression, see @ref{Complex Regexp Example}.
@end defvar

@node Searching and Case, , Standard Regexps, Searching and Matching
@section Searching and Case
@cindex searching and case

By default, searches in Emacs ignore the case of the text they are
searching through; if you specify searching for @samp{FOO}, then
@samp{Foo} and @samp{foo} are also considered a match.  This applies to
regexps as well, and in particular to character sets: @samp{[aB]} would
match @samp{a} or @samp{A} or @samp{b} or @samp{B}.@refill

If you do not want this feature, set the variable
@code{case-fold-search} to @code{nil}.  Then all letters must match
exactly, including case.  This is a buffer-local variable; altering the
variable affects only the current buffer.  (@xref{Buffer Local Variables}.)
Alternatively, you may change the value of
@code{default-case-fold-search}, which is the default value of
@code{case-fold-search} for buffers that do not override it.

@defopt case-replace
This variable determines whether @code{query-replace} should preserve
case in replacements.  If the variable is @code{nil}, then case is not
preserved.
@end defopt

@defopt case-fold-search
This buffer-local variable determines whether searches should ignore
case.  If the variable is @code{nil}, searches are case-sensitive; if it
is non-@code{nil}, they ignore case.
@end defopt

@defvar default-case-fold-search
The value of this variable is the default value for
@code{case-fold-search} in buffers that do not override it.  This is
the same as @code{(default-value 'case-fold-search)}.
@end defvar
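The effect of @code{case-fold-search} can be seen by binding it with
@code{let} around a call to @code{string-match}, which, like the buffer
search functions, consults this variable.  This short sketch compares
the two settings, including the character set case described above:

@example
(let ((case-fold-search t))
  (string-match "foo" "The FOO bar"))
     @result{} 4
(let ((case-fold-search nil))
  (string-match "foo" "The FOO bar"))
     @result{} nil
(let ((case-fold-search t))
  (string-match "[aB]" "xyzb"))
     @result{} 3
@end example

@noindent
Binding the variable with @code{let}, rather than setting it with
@code{setq}, changes it only while the body is evaluated; the buffer's
own setting is restored afterward.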