DataMuseum.dk presents historical artifacts from the history of: DKUUG/EUUG Conference tapes
Length: 21841 (0x5551) Types: TextFile Names: »strings.texinfo«
└─⟦a05ed705a⟧ Bits:30007078 DKUUG GNU 2/12/89 └─⟦c06c473ab⟧ »./UNRELEASED/lispref.tar.Z« └─⟦1b57a2ffe⟧ └─⟦this⟧ »strings.texinfo«
@setfilename ../info/strings
@node Strings and Characters, Lists, Numbers, Top
@comment  node-name,  next,  previous,  up
@chapter Strings and Characters
@cindex strings
@cindex character arrays
@cindex characters
@cindex bytes

  Strings are used to send messages to users, to hold extracts from
buffers, and for many other purposes.  The print names of symbols are
strings.  Because strings are so important, a large number of functions
are provided expressly for manipulating them.  In Emacs Lisp,
programmers often use strings, but they seldom use characters, except
when defining keymaps.

  The length of a string (like any array) is fixed, and cannot be
altered.  In particular, strings in Lisp are @emph{not} terminated by
any particular character, such as @sc{ASCII} code 0, as they are in C.
This means that any character, including the null character
(@sc{ASCII} code 0), is a valid element of a string.@refill

  Since strings are considered arrays, you can operate on them with the
general array functions.  (@xref{Sequences Arrays Vectors}.)  For
example, you can access or change individual characters in a string by
using the @code{aref} and @code{aset} functions (@pxref{Arrays}).

  On the other hand, although strings are considered arrays, they
require only 8 bits for each element rather than a minimum of 32.  This
means that a string takes up much less memory than a vector of the same
length.

  @xref{Text}, for information about functions that display strings or
copy them into buffers.  @xref{Character Type}, and @ref{String Type},
for information about the syntax of characters and strings.
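
  As a brief illustration of the points above, the following sketch
treats a string as an array.  The variable name @code{greeting} is
chosen only for this example.  The string is copied with
@code{copy-sequence} before it is changed so that the constant in the
source text is left alone.  The last form shows that a null character,
written here as @samp{\000}, counts toward the length like any other
character.

@example
(setq greeting (copy-sequence "hello"))
     @result "hello"
(aref greeting 0)
     @result 104
(aset greeting 0 ?j)
     @result 106
greeting
     @result "jello"
(length "a\000b")
     @result 3
@end example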
@menu
* Predicates for Strings::
* Creating Strings::
* Comparison of Characters and Strings::
* Conversion of Characters and Strings::
* Formatting Strings::
* Character Case::
@end menu

@node Predicates for Strings, Creating Strings, Strings and Characters, Strings and Characters
@section The Predicates for Strings

  For more information about general sequence and array predicates,
see @ref{Sequences Arrays Vectors}, and @ref{Arrays}.

@defun stringp object
  This function returns @code{t} if the object is a string, @code{nil}
otherwise.
@end defun

@defun char-or-string-p object
  This function returns @code{t} if the object is a string or a
character (i.e., an integer), @code{nil} otherwise.
@end defun

@node Creating Strings, Comparison of Characters and Strings, Predicates for Strings, Strings and Characters
@section Creating Strings

  The following functions return a string.  For example, in
@file{emacs/lisp/undigest.el}, the @code{make-string} function is used
to generate a string of similar characters to use in searching for the
separator line in a mail digest.  In the @code{dired-uncompress}
function in @file{emacs/lisp/dired.el}, the @code{substring} function
is used to extract a filename without its @samp{.Z} extension, whereas
in @code{dired-compress}, the @code{concat} function is used to append
@samp{.Z} to a filename.

@defun make-string integer character
  This function returns a string made up of @var{integer} repetitions of
@var{character}.

@example
(make-string 5 ?x)
     @result "xxxxx"
(make-string 0 ?x)
     @result ""
@end example

  Compare this function to @code{char-to-string}, @code{make-vector},
and @code{make-list}.  (@xref{Conversion of Characters and Strings},
@pxref{Vectors}, and @pxref{Building Cons Cells and Lists}.)
@end defun

@defun substring string start &optional end
  This function returns a new string which consists of those characters
from @var{string} in the range from (and including) the character at
the index @var{start} up to (but excluding) the character at the index
@var{end}.
The first character is numbered zero.

@group
@example
(substring "abcdefg" 0 3)
     @result "abc"
@end example
@end group

@noindent
In the example, the index for @samp{a} is 0, the index for @samp{b} is
1, and the index for @samp{c} is 2.  Thus, three letters, @samp{abc},
are copied from the full string.  The index @samp{3} marks the
character position up to which the substring is copied.  The character
whose index is three is actually the fourth character in the string.

  Note that an index refers to the location just before a character, in
the same way that point is just before the character on which the
cursor appears; index 0 is the location just before the first
character.  This is exactly analogous to the location of point in a
buffer, except that the position of the first character in a buffer is
1, but the index of the first character in a string is 0.
(@xref{Positions}.)

  A negative number counts from the end of the string, so that
@minus{}1 is the index of the last character of the string.

@group
@example
(substring "abcdefg" -3 -1)
     @result "ef"
@end example
@end group

@noindent
In this example, the index for @samp{e} is @minus{}3, the index for
@samp{f} is @minus{}2, and the index for @samp{g} is @minus{}1.  More
precisely, the starting index, @minus{}3, falls between the letters
@samp{d} and @samp{e}; and the ending index, @minus{}1, falls between
the letters @samp{f} and @samp{g}.  That is why only the two characters
between the two index positions are copied, @samp{e} and @samp{f}.

  When @code{nil} is used as an index, it falls after the last character
in the string.  Thus:

@group
@example
(substring "abcdefg" -3 nil)
     @result "efg"
@end example
@end group

  When given 0 for @var{start} and no @var{end} argument, the
@code{substring} function returns the whole string passed to it as an
argument.

@group
@example
(substring "abcdefg" 0)
     @result "abcdefg"
@end example
@end group

@noindent
(But don't use @code{substring} to copy a whole string; use
@code{copy-sequence} instead.)
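
  For instance, the following sketch copies a whole string with
@code{copy-sequence}.  The variable names @code{original} and
@code{duplicate} are chosen only for illustration.  The copy has the
same contents as the original but is a distinct object, so it is not
@code{eq} to it.

@example
(setq original "abcdefg")
     @result "abcdefg"
(setq duplicate (copy-sequence original))
     @result "abcdefg"
(eq original duplicate)
     @result nil
(string= original duplicate)
     @result t
@end example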

@cindex args-out-of-range error
  A @code{wrong-type-argument} error results if @var{start} is not an
integer, or if @var{end} is neither an integer nor @code{nil}.  An
@code{args-out-of-range} error results if @var{start} indicates a
character following @var{end}, or if either integer is out of range for
the string.@refill
@end defun

@defun concat &rest sequences
@cindex copying sequences
@cindex copying strings
@cindex concatenation
  This function returns a new string consisting of the characters in the
arguments passed to it.  The arguments are not changed.  The arguments
may be strings, lists of numbers, or vectors of numbers.  If no
arguments are passed to @code{concat}, the function returns an empty
string.

@example
(concat "abc" "-def")
     @result "abc-def"
(concat "abc" -123 (list 120 (+ 256 121)) [122])
     @result "abc-123xyz"
(concat "The " "quick brown " "fox.")
     @result "The quick brown fox."
(concat)
     @result ""
@end example

  The @code{concat} function always constructs a new string which is not
@code{eq} to any existing string.

@cindex integer to decimal
  As a special feature, if an argument is an integer (not a sequence of
integers), it is converted to a string of digits making up the decimal
print representation of the integer.

@group
@example
(concat 137)
     @result "137"
(concat 54 321)
     @result "54321"
@end example
@end group

  For information about other concatenation functions, see
@code{mapconcat} in @ref{Mapping Functions}, @code{vconcat} in
@ref{Vectors}, and @code{append} in @ref{Building Cons Cells and Lists}.
@end defun

@node Comparison of Characters and Strings, Conversion of Characters and Strings, Creating Strings, Strings and Characters
@section Comparison of Characters and Strings
@cindex string equality

@defun char-equal character1 character2
  This function returns @code{t} if the arguments represent the same
character, @code{nil} otherwise.  This is done by comparing two integers
modulo 256.

@example
(char-equal ?x ?x)
     @result t
(char-to-string (+ 256 ?x))
     @result "x"
(char-equal ?x (+ 256 ?x))
     @result t
@end example
@end defun

@defun string= string1 string2
  This function returns @code{t} if the characters of the two strings
match exactly.  Case is significant.  @code{string-equal} is the same
as @code{string=}.

@example
(string= "abc" "abc")
     @result t
(string= "abc" "ABC")
     @result nil
(string= "ab" "ABC")
     @result nil
@end example
@end defun

@defun string-equal string1 string2
  @code{string-equal} is another name for @code{string=}.
@end defun

@cindex lexical comparison
@defun string< string1 string2
@c (findex string< causes problems for permuted index!!)
  This function compares two strings and finds the first pair of
characters, if any, in the two strings that do not match.  If the
numeric value of the @sc{ASCII} code of the character from the first
string is less than the numeric value of the @sc{ASCII} code of the
character from the second string, then this function returns @code{t};
if it is greater, the function returns @code{nil}.  If the strings
match, the value is @code{nil}.

  Lower case letters have higher numeric values in the @sc{ASCII}
character set than their upper case counterparts; digits and many
punctuation characters have a lower numeric value than upper case
letters.

@group
@example
(string< "abc" "abd")
     @result t
(string< "abd" "abc")
     @result nil
(string< "123" "abc")
     @result t
@end example
@end group

  When the strings have different lengths, and they match up to the
length of @var{string1}, then the result is @code{t}.  If they match up
to the length of @var{string2}, the result is @code{nil}.  A string
without any characters in it is the smallest possible string.

@group
@example
(string< "" "abc")
     @result t
(string< "ab" "abc")
     @result t
(string< "abc" "")
     @result nil
(string< "abc" "ab")
     @result nil
(string< "" "")
     @result nil
@end example
@end group
@end defun

@defun string-lessp string1 string2
  @code{string-lessp} is another name for @code{string<}.
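
  Because it is an ordinary predicate of two arguments,
@code{string-lessp} can be handed to a sorting function.  The following
minimal sketch assumes the standard @code{sort} function for lists,
which is not described in this chapter.  The result follows the
@sc{ASCII} ordering described above: digits first, then upper case
letters, then lower case letters.

@example
(sort (list "pear" "Apple" "apple" "123") 'string-lessp)
     @result ("123" "Apple" "apple" "pear")
@end example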
@end defun

@node Conversion of Characters and Strings, Formatting Strings, Comparison of Characters and Strings, Strings and Characters
@comment  node-name,  next,  previous,  up
@section Conversion of Characters and Strings
@cindex conversion

  Characters and strings may be converted into each other and into
integers.  @code{format} and @code{prin1-to-string} (@pxref{Output
Functions}) may also be used to convert Lisp objects into strings.
@code{read-from-string} (@pxref{Input Functions}) may be used to
``convert'' a string representation of a Lisp object into an object.
In addition, @code{concat}, @code{append}, and @code{vconcat} convert an
integer to its decimal representation as a special feature.@refill

  @xref{Documentation}, for a description of the functions
@code{single-key-description} and @code{text-char-description}, which
return a string representing the Emacs standard notation of the
argument character.  These functions are used primarily for printing
help messages.@refill

@defun char-to-string character
@cindex character to string
  This function returns a new string with a length of one character.
The value of @var{character}, modulo 256, is used to initialize the
element of the string.

  This function is similar to @code{make-string} with an integer
argument of 1.  (@xref{Creating Strings}.)

@example
(char-to-string ?x)
     @result "x"
(char-to-string (+ 256 ?x))
     @result "x"
(make-string 1 ?x)
     @result "x"
@end example
@end defun

@defun string-to-char string
@cindex string to character
  This function returns the decimal numeric value of the first character
in the string.  Expressed another way, this function returns the first
character of the string in the print syntax for characters.
(@xref{Character Type}.)

  If the string is empty, the function returns 0, which is identical to
the result if the string begins with the null character, @sc{ASCII}
code 0.

@example
(string-to-char "ABC")
     @result 65
(string-to-char "xyz")
     @result 120
(string-to-char "")
     @result 0
(string-to-char "\000")
     @result 0
@end example
@end defun

@defun int-to-string integer
@cindex integer to string
@cindex integer to decimal
  This function returns a string consisting of the digits of
@var{integer}, base ten.  When passed a positive integer as an
argument, this function returns a string with no sign.  When passed a
negative integer, the function returns a string with a leading minus
sign.

@example
(int-to-string 256)
     @result "256"
(int-to-string -23)
     @result "-23"
@end example
@end defun

@defun string-to-int string
@cindex string to integer
  This function returns the integer value of the characters in
@var{string}, read as a number in base ten.

  The string is read starting from (and including) the first character,
and it is read until a non-digit is encountered.  If the first
character is not a digit, this function returns 0.

@example
(string-to-int "256")
     @result 256
(string-to-int "25 is a perfect square.")
     @result 25
(string-to-int "X256")
     @result 0
@end example
@end defun

@node Formatting Strings, Character Case, Conversion of Characters and Strings, Strings and Characters
@comment  node-name,  next,  previous,  up
@section Formatting Strings
@cindex formatting strings
@cindex strings, formatting them

  The @code{message} and @code{error} functions use conversion
specification sequences that are identical to those described here for
@code{format}.

@defun format string &rest objects
@cindex formatting
  This function returns a new string that is made up from the characters
in @var{string} with any conversion specification sequences in it
replaced by the print representations of the matching @var{objects}.

  A conversion specification sequence is a string of characters
beginning with a @samp{%}.  For example, if there is a @samp{%s} in
@var{string}, the @code{format} function does not print the @samp{%s}
as such, but substitutes one of the other arguments there.
It inserts the print representation of that argument in place of the
@samp{%s}.

@group
@example
(format "The value of fill-column is %d." fill-column)
     @result "The value of fill-column is 72."
@end example
@end group

  If @var{string} contains more than one conversion specification, the
conversion specifications are matched in order with successive
arguments.  Thus, the first conversion specification in @var{string} is
matched with the first argument that follows @var{string}; the second
conversion specification is matched with the second argument that
follows @var{string}, and so on.  Any extra conversion specifications
(those for which there are no matching arguments) will have
unpredictable behavior.  Any extra objects will be ignored.

  Although conversion specifications are supposed to specify the types
of object with which they are matched, no error is reported if there is
a mismatch between the type specified by the conversion specification
and the type of its corresponding object.

@cindex @samp{%} in format
  The character @samp{%} begins a conversion specification.  The
characters following it indicate how the object should be represented.
The characters recognized are described below:

@table @samp
@item s
@cindex print representation
@cindex object to string
Replace the specification with the print representation of the object.
If there is no corresponding object, the empty string is used.

@item o
@cindex integer to octal
Replace the specification with the base eight representation of an
integer.

@item d
@cindex integer to decimal
Replace the specification with the base ten representation of an
integer.

@item x
@cindex integer to hexadecimal
Replace the specification with the base sixteen representation of an
integer.

@item c
@cindex character to string
Replace the specification with the print representation of a character.

@item %
A single @samp{%} is placed in the string.
@end table

@cindex invalid format error
  Any other conversion character results in an @samp{Invalid format
operation} error.

  Here are several examples:

@group
@example
(format "The name of this buffer is %s." (buffer-name))
     @result "The name of this buffer is strings.texinfo."

(format "In hash notation, this buffer is %s." (current-buffer))
     @result "In hash notation, this buffer is #<buffer strings.texinfo>."

(format "The octal value of 18 is %o, and the hex value is %x." 18 18)
     @result "The octal value of 18 is 22, and the hex value is 12."
@end example
@end group

@cindex numeric prefix
@cindex field width
@cindex padding
  All the specification characters allow an optional numeric prefix
between the @samp{%} and the character.  The optional numeric prefix
defines the minimum width for the object.  If the print representation
of the object contains fewer characters than this, then it is padded.
The padding character is normally a space, but if the numeric prefix
starts with a zero, zeros are used for padding.  The padding is on the
left if the prefix is positive (or starts with zero) and on the right
if the prefix is negative.

@example
(format "%06d will be padded on the left with zeros" 123)
     @result "000123 will be padded on the left with zeros"

(format "%-6d will be padded on the right" 123)
     @result "123    will be padded on the right"
@end example

  No matter what the prefix, nothing in the print representation will be
truncated.  This allows the programmer to specify spacing exactly
without knowing how many characters there are in the object's print
representation.

  In the three following examples, @samp{%7s} specifies a minimum width
of 7.  In the first case, the word inserted in place of @samp{%7s} has
only 3 letters, so 4 blank spaces are inserted for padding.  In the
second case, the word ``specification'' is 13 letters wide but is not
truncated.  In the third case, the padding is on the right.  (This does
not work in version 18, but does work in version 19.)

@example
(format "The word `%7s' actually has %d letters in it." "foo" (length "foo"))
     @result "The word `    foo' actually has 3 letters in it."

(format "The word `%7s' actually has %d letters in it."
        "specification" (length "specification"))
     @result "The word `specification' actually has 13 letters in it."

(format "The word `%-7s' actually has %d letters in it." "foo" (length "foo"))
     @result "The word `foo    ' actually has 3 letters in it."
;; @r{%-7s fails to work in version 18, but does work in version 19.}
;; @r{In version 18, padding is not inserted.}
@end example
@end defun

@node Character Case, , Formatting Strings, Strings and Characters
@comment  node-name,  next,  previous,  up
@section Character Case
@cindex upper case
@cindex lower case
@cindex character case

  The character case functions change the case of both single characters
and characters within strings.  The functions only convert alphabetic
characters (the letters @samp{A} through @samp{Z} and @samp{a} through
@samp{z}); other characters are not converted.  The functions do not
modify the strings which are passed to them as arguments.

  The examples below use the characters @samp{X} and @samp{x}, which
have @sc{ASCII} values 88 and 120 respectively.

@defun downcase string-or-char
  This function converts a character or a string to lower case.  When
the argument is a string, a string is returned.  When the argument is a
character, the returned value is a Lisp object that is represented in
the print syntax for characters; it is the decimal number of the
@sc{ASCII} code for the character.

  When the argument to @code{downcase} is a string, the function creates
a new string in which each letter in the argument that is upper case is
converted to lower case.@refill

  When the argument to @code{downcase} is a character, @code{downcase}
returns the print representation of the lower case of that character if
it is a letter, or the print representation of the character itself if
not.

@example
(downcase "The cat in the hat")
     @result "the cat in the hat"
(downcase ?X)
     @result 120
@end example
@end defun

@defun upcase string-or-char
  This function converts a character or a string to upper case.  When
the argument is a string, a string is returned.  When the argument is a
character, the returned value is a Lisp object that is represented in
the print syntax for characters; it is the decimal number of the
@sc{ASCII} code for the character.

  When the argument to @code{upcase} is a string, the function creates a
new string in which each letter in the argument that is lower case is
converted to upper case.@refill

  When the argument to @code{upcase} is a character, @code{upcase}
returns the print representation of the upper case of that character if
it is a letter, or the print representation of the character itself if
not.

@example
(upcase "The cat in the hat")
     @result "THE CAT IN THE HAT"
(upcase ?x)
     @result 88
@end example
@end defun

@defun capitalize string-or-char
@cindex capitalization
  This function capitalizes strings or characters.  If
@var{string-or-char} is a string, the function capitalizes each word in
@var{string-or-char}.  This means that the first character of each word
is converted to upper case and the rest are converted to lower case.

  A word is any sequence of consecutive characters that are assigned to
the @code{word} category in the current syntax table (@pxref{Syntax
Classes}).

  When the argument to @code{capitalize} is a character,
@code{capitalize} returns the upper case of that character if it is a
letter, or the character itself if not.  In this case, the returned
value is a Lisp object that is represented in the print syntax for
characters; it is the decimal number of the @sc{ASCII} code for the
character.  When the argument is a string, a string is returned.

@example
(capitalize "The cat in the hat")
     @result "The Cat In The Hat"
(capitalize "THE 77TH-HATTED CAT")
     @result "The 77th-Hatted Cat"
(capitalize ?x)
     @result 88
@end example
@end defun
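
  The case functions can be combined with the comparison functions
described earlier in this chapter.  As a minimal sketch, the following
definition compares two strings while ignoring case by converting both
to lower case first.  The name @code{string-equal-ignoring-case} is
hypothetical and is used only for this illustration; it is not a
built-in function.

@example
(defun string-equal-ignoring-case (string1 string2)
  "Return t if STRING1 and STRING2 match when case is ignored."
  ;; @r{@code{downcase} returns new strings, so neither argument is modified.}
  (string= (downcase string1) (downcase string2)))

(string-equal-ignoring-case "The Cat" "THE CAT")
     @result t
(string-equal-ignoring-case "cat" "hat")
     @result nil
@end example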